jogamp - Re: putMapBuffer - data In/Out performance issue

Posted by notzed on Jul 06, 2011; 1:45am
URL: https://forum.jogamp.org/putMapBuffer-data-In-Out-performance-issue-tp3126498p3143228.html

Are you timing a single run or multiple subsequent invocations? Mapping memory is probably lazy, i.e. the actual mapping only happens when you first access it, which makes the first access slow.
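To separate a one-off first-access cost from the steady-state cost, it helps to time several invocations and look at the first one on its own. The following is only a rough sketch using plain Java: LazyResource and timeRuns are hypothetical names made up for illustration, and the lazily allocated array merely stands in for whatever the driver does on first access to a mapped buffer.

```java
public class WarmupTiming {
    // Hypothetical stand-in for something that is set up lazily on first access.
    static final class LazyResource {
        private float[] data;   // allocated only when get() is first called
        int initCount = 0;      // how many times the expensive setup ran

        float[] get() {
            if (data == null) {
                data = new float[1 << 20];
                initCount++;
            }
            return data;
        }
    }

    // Runs op 'runs' times and returns the elapsed nanoseconds of each run.
    static long[] timeRuns(Runnable op, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            op.run();
            nanos[i] = System.nanoTime() - t0;
        }
        return nanos;
    }

    public static void main(String[] args) {
        LazyResource r = new LazyResource();
        long[] t = timeRuns(() -> r.get()[0] += 1f, 5);
        // Report the first run separately from the later ones.
        System.out.println("first run: " + t[0] + " ns");
        for (int i = 1; i < t.length; i++) {
            System.out.println("run " + i + ": " + t[i] + " ns");
        }
    }
}
```

If only the first number is large, the cost is probably setup rather than the per-call transfer.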

Also note that the code below doesn't actually do any copying. It just creates a new pointer which associates CPU memory with the GPU object, so all you're timing is a little bit of Java code and a new().

//Data copy in App buffer (very efficient in terms of time)
long time2 = nanoTime();
clBufferC = clBufferC.cloneWith(h_data3.asFloatBuffer());
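The same "view, not copy" behaviour can be seen with plain NIO, independent of JOCL. This is a minimal sketch, assuming h_data3 is a direct ByteBuffer as the asFloatBuffer() call suggests; the class and method names here are made up for illustration. asFloatBuffer() returns a view over the same memory, so it costs almost nothing, whereas an explicit put() into a second buffer really does move the data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ViewNotCopy {
    // Returns { value seen through the byte buffer after writing via the view,
    //           value held by the explicit copy after the view changes again,
    //           value seen through the byte buffer after that second write }.
    static float[] demo() {
        ByteBuffer bytes = ByteBuffer.allocateDirect(4 * Float.BYTES)
                                     .order(ByteOrder.nativeOrder());

        // A view shares the underlying memory: no data is copied here.
        FloatBuffer view = bytes.asFloatBuffer();
        view.put(0, 42.0f);
        float seenThroughBytes = bytes.getFloat(0);

        // An explicit copy transfers the elements into separate storage.
        FloatBuffer copy = FloatBuffer.allocate(view.capacity());
        copy.put(view.duplicate());

        // Later writes through the view reach 'bytes' but not 'copy'.
        view.put(0, 7.0f);
        return new float[] { seenThroughBytes, copy.get(0), bytes.getFloat(0) };
    }

    public static void main(String[] args) {
        float[] r = demo();
        System.out.println("bytes after write via view: " + r[0]);
        System.out.println("copy after second write:    " + r[1]);
        System.out.println("bytes after second write:   " + r[2]);
    }
}
```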

In general you seem to be doing excessive copies anyway. You already have data3, so why copy it to clBufferC? And why copy bufferA to the mapped copy of A?

And from what I can tell from the spec, you need to unmap the buffer before executing the kernel. The spec wording is a little convoluted (OpenCL 1.1 spec, section 5.4.2.1), but it states that mapped memory cannot be accessed from kernels.

The AMD OpenCL programming guide, section 4.4, covers a lot of this in detail for AMD's implementation, much of which I imagine is applicable to other GPUs. http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf

FWIW I only use putWriteBuffer/putReadBuffer, as I find that model more intuitive. I haven't had any performance issues related to it.