CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Emily Leiviskä
User story:
We are dealing with video processing. We use native code to load images from streams which requires the image buffers to be aligned on 16 byte boundaries. The images are stored in direct NIO buffers and owned by our java program.

To obtain aligned NIO buffers we allocate a slightly larger direct ByteBuffer, read out the NIO address using reflection set the position to be aligned, slice the buffer and set the limit to the desired size. This yields us a buffer that has slightly larger capacity than the limit, the limit is set to the desired size and the start address has the desired alignment in the native domain.

We have a ring buffer of the recent frames as device buffers and need to transfer the read image data into the next available device buffer for processing without constantly freeing and allocating the device buffers. So we would like to reuse our CLResources and we would also like to avoid the any memcopies on the way.

We allocate the device memory based on the image size and pixel mode and associate the aligned NIO buffer with the  correct device CLBuffer and attempt to write to the device. This fails with CL_INVALID_VALUE because the capacity of the direct buffer is larger than the size of the device buffer.

Suggestion:
I debugged the error and found the following code in CLMemory line 155:

    public int getNIOSize() {
        if(buffer == null) {
            return 0;
        }
        return getElementSize() * buffer.capacity();
    }

In the above I strongly believe that buffer.capacity() should be buffer.limit() as limit() more closely represents the users desire for the buffer dimension and how much to copy.

Short, Self-Contained Correct Example:

import java.nio.ByteBuffer;

import com.jogamp.opencl.CLBuffer;
import com.jogamp.opencl.CLCommandQueue;
import com.jogamp.opencl.CLContext;
import com.jogamp.opencl.CLDevice;

public class SSCCE {

    public static void main(String[] args) {
        final CLContext context = CLContext.create();
        final CLDevice device = context.getMaxFlopsDevice();
        final CLCommandQueue queue = device.createCommandQueue();

        final int size = 30;
        final int padding = 16; // Change to 0 and both work
        final ByteBuffer directBuffer = ByteBuffer.allocateDirect(size + padding);
        directBuffer.limit(size);

        // This is what we want to do, but it fails with CL_INVALID_VALUE
        final CLBuffer<?> buffer = context.createBuffer(size).cloneWith(directBuffer);

        // This works but not what we want, it allocates a new device buffer every time.
        // final CLBuffer<ByteBuffer> buffer = context.createBuffer(directBuffer);

        queue.putWriteBuffer(buffer, true);
    }
}
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Wade Walker
Administrator
This sounds reasonable -- the original author probably just didn't consider the case where the user calls limit(int) on their buffer.

The question is what to do about it. I will commit your change to the repo if you'll submit a pull request on Github, but right now our build server is out of order, so there wouldn't automatically be a new build. Do you need help building a new version of JOCL?
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Emily Leiviskä
I can do the patch. I haven't build jogamp from source before so I guess I could use some help with that.

Is this the correct repository to fork: https://github.com/WadeWalker/jocl?
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Wade Walker
Administrator
Yes, that's the right repo. Build instructions are at http://jogamp.org/jocl/doc/HowToBuild.html, with more detail at https://jogamp.org/wiki/index.php/Building_JOGL_on_the_command_line if you need it. Let me know if you run into any problems.
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Emily Leiviskä
For any one who finds this by google, I hit the problem below when compiling jogl and fixed it by modifying the mingw system include file (both 32 and 64bit if you need them):

/mingw64/x86_64-w64-mingw32/include/setupapi.h

by adding #include <Devpropdef.h> as required here: https://msdn.microsoft.com/en-us/library/windows/hardware/dn315031(v=vs.85).aspx .

inside the #if/endif where DEVPROPKEY is used

How it looks after changes:

#if _WIN32_WINNT >= 0x0600
#include <Devpropdef.h>
  WINSETUPAPI WINBOOL WINAPI SetupDiGetDevicePropertyW(HDEVINFO DeviceInfoSet, PSP_DEVINFO_DATA DeviceInfoData, const DEVPROPKEY *PropertyKey, DEVPROPTYPE *PropertyType, PBYTE PropertyBuffer, DWORD PropertyBufferSize, PDWORD RequiredSize, DWORD Flags);
  WINSETUPAPI WINBOOL WINAPI SetupDiGetDevicePropertyKeys(HDEVINFO DeviceInfoSet, PSP_DEVINFO_DATA DeviceInfoData, DEVPROPKEY *PropertyKeyArray, DWORD PropertyKeyCount, PDWORD RequiredPropertyKeyCount, DWORD Flags);
#endif

Error message:

c.build.newt.windowlib:
     [echo] Output lib name = newt
     [echo] Compiling newt
       [cc] Starting dependency analysis for 2 files.
       [cc] 2 files are up to date.
       [cc] 0 files to be recompiled from dependency analysis.
       [cc] 1 total files to be compiled.
       [cc] cc1.exe: warning: command line option '-fno-rtti' is valid for C++/ObjC++ but not for C
       [cc] In file included from D:\work_java\jogl\src\newt\native\WindowsEDID.c:51:0:
       [cc] D:/opt/msys2/mingw64/x86_64-w64-mingw32/include/SetupApi.h:1879:119: error: unknown type name 'DEVPROPKEY'
       [cc]    WINSETUPAPI WINBOOL WINAPI SetupDiGetDevicePropertyW(HDEVINFO DeviceInfoSet, PSP_DEVINFO_DATA DeviceInfoData, const DEVPROPKEY *PropertyKey, DEVPROPTYPE *PropertyType, PBYTE PropertyBuffer, DWORD PropertyBufferSize, PDWORD RequiredSize, DWORD Flags);
       [cc]                                                                                                                        ^~~~~~~~~~
       [cc] D:/opt/msys2/mingw64/x86_64-w64-mingw32/include/SetupApi.h:1879:144: error: unknown type name 'DEVPROPTYPE'
       [cc]    WINSETUPAPI WINBOOL WINAPI SetupDiGetDevicePropertyW(HDEVINFO DeviceInfoSet, PSP_DEVINFO_DATA DeviceInfoData, const DEVPROPKEY *PropertyKey, DEVPROPTYPE *PropertyType, PBYTE PropertyBuffer, DWORD PropertyBufferSize, PDWORD RequiredSize, DWORD Flags);
       [cc]                                                                                                                                                 ^~~~~~~~~~~
       [cc] D:/opt/msys2/mingw64/x86_64-w64-mingw32/include/SetupApi.h:1880:116: error: unknown type name 'DEVPROPKEY'
       [cc]    WINSETUPAPI WINBOOL WINAPI SetupDiGetDevicePropertyKeys(HDEVINFO DeviceInfoSet, PSP_DEVINFO_DATA DeviceInfoData, DEVPROPKEY *PropertyKeyArray, DWORD PropertyKeyCount, PDWORD RequiredPropertyKeyCount, DWORD Flags);
       [cc]                                                                                                                     ^~~~~~~~~~


$ gcc --version
gcc.exe (Rev2, Built by MSYS2 project) 6.2.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Emily Leiviskä
In reply to this post by Wade Walker
I tried to run ant junit.run in jocl and it fails because it cannot find 7z. Which is rubbish because 7z is installed and on my path:

$ which 7z
/usr/bin/7z

Same problem also occurs for jogl but those unit tests took me like 45 minutes to run so I'm not running them again  :D

BUILD FAILED
D:\work_java\jocl\make\build.xml:866: The following error occurred while executing this line:
D:\work_java\jocl\make\build-test.xml:409: The following error occurred while executing this line:
D:\work_java\jocl\make\build-test.xml:418: The following error occurred while executing this line:
D:\work_java\gluegen\make\jogamp-archivetasks.xml:23: Execute failed: java.io.IOException: Cannot run program "7z" (in directory "D:\work_java\jocl\build\test\results"): CreateProcess error=2, The system cannot find the file specified
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at java.lang.Runtime.exec(Runtime.java:620)
        at org.apache.tools.ant.taskdefs.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
-snip-
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
        at java.lang.ProcessImpl.start(ProcessImpl.java:137)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 62 more

Total time: 31 seconds

Any ideas?
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Wade Walker
Administrator
Are you compiling on Windows with MinGW? I've seen cases where having your project on a path with spaces in it causes an error in invoking external tools sometimes.

Regardless, the 7z failure shouldn't stop you -- it's run after all the tests finish to compress the results files, which is not necessary. As long as the results say "passed", the tests are OK.

Note that some tests may occasionally fail on some platforms due to differences between drivers and hardware for Nvidia and AMD. This is also normal, unfortunately. As long as the basic tests pass, you should be OK. If you have a specific test you're worried about, I can help you debug it if I have a platform similar to yours.
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Emily Leiviskä
Yes, Win7 pro 64bit on MinGW64 under MSYS2. There are no spaces in any of the involved paths... so that's not it. Anyway I disabled the test result archive task. Now I can run the tests.

However I'm encountering an issue writing my own tests. I copied one of the existing tests and only changed it to use "cloneWith" and now the test fails. I cannot see what I'm doing wrong...

        final int elements = NUM_ELEMENTS;
        final CLContext context = CLContext.create();

        // These two lines instead of commented line below causes the test to fail. The contents of clBufferB.buffer are rubbish.
        final ByteBuffer buffer = ByteBuffer.allocateDirect(elements*SIZEOF_INT);
        final CLBuffer<ByteBuffer> clBufferA = context.createBuffer(elements*SIZEOF_INT, Mem.READ_ONLY).cloneWith(buffer);

        // This works and IMHO should be equivalent to the above...
        // final CLBuffer<ByteBuffer> clBufferA = context.createByteBuffer(elements*SIZEOF_INT, Mem.READ_ONLY);


        final CLBuffer<ByteBuffer> clBufferB = context.createByteBuffer(elements*SIZEOF_INT, Mem.READ_ONLY);

        fillBuffer(clBufferA.buffer, 12345);

        final CLCommandQueue queue = context.getDevices()[0].createCommandQueue();
        queue.putWriteBuffer(clBufferA, false)                                 // write A
             .putCopyBuffer(clBufferA, clBufferB, clBufferA.buffer.capacity()) // copy A -> B
             .putReadBuffer(clBufferB, true)                                   // read B
             .finish();

        context.release();
        checkIfEqual(clBufferA.buffer, clBufferB.buffer, elements);
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Emily Leiviskä
Never mind, I found it. The host buffer had to be set to native byte order.

Pull request incoming :)
Reply | Threaded
Open this post in threaded view
|

Re: CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().

Wade Walker
Administrator
Pull request merged -- thanks for your help! :)