I'm having trouble writing a range of an array to an offset within a GPU-side buffer in OpenCL. In other words, I want to read a given number of elements from my Java array, starting at a given offset, and write them to the CLBuffer on the GPU at a separately specified offset.
The original C API makes this possible: clEnqueueWriteBuffer takes the offset into the buffer object (on the GPU side) at which to begin writing, as well as an explicit length. Unfortunately, the source for CLCommandQueue.putWriteBuffer() hard-codes the offset to zero, which closes that option off.
I tried calling CLBuffer.getBuffer().position(start).limit(end) and then invoking putWriteBuffer(...), but ended up with a segfault. I found that CLMemory.getNIOSize() uses Buffer.capacity(), and I believe this results in an overflow (a read past the end of the host buffer).
There was an older post on this forum suggesting that getNIOSize() use limit() instead of capacity() to determine the write length. I disagree with this: if position() is used as the host offset when calculating a pointer from the NIO buffer, position() + limit() can exceed capacity() and again result in an overflow. I believe the proper value to use is remaining() * getElementSize() for the number of bytes to transfer, and perhaps position() * getElementSize() for the offset on the GPU. Alternatively, a method could be written that takes a separate GPU offset (in elements), which would make the offsets on the host and the GPU independent.
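To make the arithmetic concrete, here is a rough sketch that uses only java.nio and does not call JOCL at all. The class and method names (TransferRange, hostOffsetBytes, lengthBytes, deviceOffsetBytes, overrunBytes) are made up for illustration, and ELEMENT_SIZE is only a stand-in for getElementSize() under the assumption of a float buffer. It compares the three candidate lengths when the host pointer is taken from position().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Plain java.nio only; nothing here calls JOCL. All names are hypothetical.
class TransferRange {
    // Stand-in for the getElementSize() mentioned above (floats assumed).
    static final int ELEMENT_SIZE = Float.BYTES;

    // Host-side byte offset, if the native pointer starts at position().
    static long hostOffsetBytes(FloatBuffer b) {
        return (long) b.position() * ELEMENT_SIZE;
    }

    // Proposed transfer length: only the elements between position() and limit().
    static long lengthBytes(FloatBuffer b) {
        return (long) b.remaining() * ELEMENT_SIZE;
    }

    // Device-side byte offset, given independently of the host offset (in elements).
    static long deviceOffsetBytes(long offsetInElements) {
        return offsetInElements * ELEMENT_SIZE;
    }

    // Bytes that would be read past the end of the host buffer for a given length.
    static long overrunBytes(FloatBuffer b, long lengthBytes) {
        long capacityBytes = (long) b.capacity() * ELEMENT_SIZE;
        return Math.max(0L, hostOffsetBytes(b) + lengthBytes - capacityBytes);
    }

    public static void main(String[] args) {
        FloatBuffer host = ByteBuffer.allocateDirect(1024 * ELEMENT_SIZE)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        host.position(512);
        host.limit(768);

        long byCapacity = (long) host.capacity() * ELEMENT_SIZE;
        long byLimit = (long) host.limit() * ELEMENT_SIZE;
        long byRemaining = lengthBytes(host);

        System.out.println("overrun using capacity():  " + overrunBytes(host, byCapacity));
        System.out.println("overrun using limit():     " + overrunBytes(host, byLimit));
        System.out.println("overrun using remaining(): " + overrunBytes(host, byRemaining));
        System.out.println("device offset for element 128: " + deviceOffsetBytes(128));
    }
}
```

With a 1024-float buffer, position 512 and limit 768, the capacity-based and limit-based lengths both run past the end of the host buffer, while the remaining()-based length stays inside it. Whether JOCL actually derives the host pointer from position() is the assumption this rests on.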
If you can give us a test that reproduces the bug and a patch that fixes it, I'd be happy to commit it for you. JOCL is a little tricky because the different platform implementations all seem to behave a little differently, so a test that exposes something like this would definitely be valuable.