CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().
Posted by Emily Leiviskä on Nov 15, 2016; 11:47am
URL: https://forum.jogamp.org/CLMemory-getNIOSize-should-use-buffer-limit-instead-of-buffer-capacity-tp4037407.html
User story:
We are dealing with video processing. We use native code to load images from streams, which requires the image buffers to be aligned on 16-byte boundaries. The images are stored in direct NIO buffers owned by our Java program.
To obtain aligned NIO buffers we allocate a slightly larger direct ByteBuffer, read out the NIO address using reflection, set the position so that it is aligned, slice the buffer, and set the limit to the desired size. This yields a buffer whose capacity is slightly larger than its limit: the limit is set to the desired size, and the start address has the desired alignment in the native domain.
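The allocation steps above can be sketched roughly as follows. This is a minimal sketch, not our actual code: it swaps the reflection-based address lookup for `ByteBuffer.alignmentOffset`, which assumes Java 9 or later, and the class and method names (`AlignedBuffers`, `allocateAligned`) are made up for illustration.

```java
import java.nio.ByteBuffer;

public class AlignedBuffers {
    // Hypothetical helper; not part of JOCL and not the poster's real code.
    // 'alignment' must be a power of two (a requirement of alignmentOffset).
    public static ByteBuffer allocateAligned(int size, int alignment) {
        // Over-allocate so that an aligned start address exists inside the buffer.
        ByteBuffer raw = ByteBuffer.allocateDirect(size + alignment);
        // alignmentOffset (Java 9+) gives (address + index) modulo alignment.
        int misalignment = raw.alignmentOffset(0, alignment);
        int start = (misalignment == 0) ? 0 : alignment - misalignment;
        raw.position(start);
        // The slice starts at the aligned address; its capacity is whatever
        // remains of the over-allocated buffer, which is more than 'size'.
        ByteBuffer aligned = raw.slice();
        // Limit is the desired size; capacity stays larger.
        aligned.limit(size);
        return aligned;
    }

    public static void main(String[] args) {
        ByteBuffer b = allocateAligned(30, 16);
        System.out.println("limit=" + b.limit() + " capacity=" + b.capacity());
    }
}
```

The exact capacity depends on the address the allocator happens to return, but it is always larger than the limit, which is the situation described in this post.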
We keep a ring buffer of the most recent frames as device buffers and need to transfer the image data we read into the next available device buffer for processing, without constantly freeing and allocating device buffers. So we would like to reuse our CLResources, and we would also like to avoid any memcopies along the way.
We allocate the device memory based on the image size and pixel mode, associate the aligned NIO buffer with the correct device CLBuffer, and attempt to write to the device. This fails with CL_INVALID_VALUE because the capacity of the direct buffer is larger than the size of the device buffer.
Suggestion:
I debugged the error and found the following code in CLMemory line 155:
    public int getNIOSize() {
        if(buffer == null) {
            return 0;
        }
        return getElementSize() * buffer.capacity();
    }
In the above, I strongly believe that buffer.capacity() should be buffer.limit(), as limit() more closely represents the user's intent for the buffer dimensions and how much to copy.
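To make the difference concrete, here is a small NIO-only sketch (no JOCL involved) that applies the same arithmetic as the method above, once with capacity() and once with limit(). The class and method names are hypothetical. The last step shows a possible workaround, re-slicing so that capacity equals limit; I have not verified that JOCL accepts the result.

```java
import java.nio.ByteBuffer;

public class LimitVsCapacity {
    // Same arithmetic as the quoted getNIOSize(), based on capacity().
    public static int sizeFromCapacity(ByteBuffer b, int elementSize) {
        return elementSize * b.capacity();
    }

    // The proposed variant, based on limit().
    public static int sizeFromLimit(ByteBuffer b, int elementSize) {
        return elementSize * b.limit();
    }

    public static void main(String[] args) {
        ByteBuffer padded = ByteBuffer.allocateDirect(30 + 16);
        padded.limit(30);
        System.out.println(sizeFromCapacity(padded, 1)); // 46
        System.out.println(sizeFromLimit(padded, 1));    // 30

        // Slicing from position 0 keeps the start address and yields a
        // buffer whose capacity equals the old limit.
        ByteBuffer exact = padded.slice();
        System.out.println(sizeFromCapacity(exact, 1));  // 30
    }
}
```

With the capacity-based size the write request is 46 bytes against a 30-byte device buffer, which matches the CL_INVALID_VALUE we observe.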
Short, Self-Contained Correct Example:
import java.nio.ByteBuffer;
import com.jogamp.opencl.CLBuffer;
import com.jogamp.opencl.CLCommandQueue;
import com.jogamp.opencl.CLContext;
import com.jogamp.opencl.CLDevice;

public class SSCCE {
    public static void main(String[] args) {
        final CLContext context = CLContext.create();
        final CLDevice device = context.getMaxFlopsDevice();
        final CLCommandQueue queue = device.createCommandQueue();
        final int size = 30;
        final int padding = 16; // Change to 0 and both work
        final ByteBuffer directBuffer = ByteBuffer.allocateDirect(size + padding);
        directBuffer.limit(size);
        // This is what we want to do, but it fails with CL_INVALID_VALUE
        final CLBuffer<?> buffer = context.createBuffer(size).cloneWith(directBuffer);
        // This works but not what we want, it allocates a new device buffer every time.
        // final CLBuffer<ByteBuffer> buffer = context.createBuffer(directBuffer);
        queue.putWriteBuffer(buffer, true);
    }
}