To debug my JOCL-based app I switched to OCLGrind as my device to find possible issues. It typically works fine but I suddenly started getting this exception on invocations of putNDRangeKernel():
java.lang.NullPointerException
    at com.jogamp.opencl.llb.impl.CLAbstractImpl.dispatch_clEnqueueNDRangeKernel0(Native Method)
    at com.jogamp.opencl.llb.impl.CLAbstractImpl.clEnqueueNDRangeKernel(CLAbstractImpl.java:1346)
    at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1656)
    at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1523)
    at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1493)

I suspect this has something to do with OCLGrind, but the origin of the NullPointerException (apparently a deliberately thrown Java exception on the native side, something of which OCLGrind has no concept) is puzzling.

Here's one example of the offending code, with asserts to try to catch something amiss:

    assert numWorkItems > 0;
    assert queue.getID() > 0;
    assert kernel.getID() > 0;
    assert queue.getContext().getID() > 0;
    assert queue.getContext().getCL() != null;
    queue.put1DRangeKernel(kernel, 0, numWorkItems, 0);

My POM deps have:

    <dependency>
        <groupId>org.jocl</groupId>
        <artifactId>jocl</artifactId>
        <version>2.0.4</version>
    </dependency>
    <dependency>
        <groupId>org.jogamp.gluegen</groupId>
        <artifactId>gluegen-rt-main</artifactId>
        <version>2.3.2</version>
    </dependency>
    <dependency>
        <groupId>org.jogamp.jocl</groupId>
        <artifactId>jocl-main</artifactId>
        <version>2.3.2</version>
    </dependency>

This exception is happening throughout the app at persistent but bafflingly random selections of kernels. Switching back to the Intel GPU as the device removes the issue and results in normal function.

Any idea as to what is throwing this NPE?
Administrator
|
Hello
Please use version 2.4.0; it's available in our own Maven repository:
https://jogamp.org/deployment/maven/org/jogamp/jocl/jocl-main/2.4.0/

Maybe this Maven example can help:
https://gouessej.wordpress.com/2014/11/22/ardor3d-est-mort-vive-jogamps-ardor3d-continuation-ardor3d-is-dead-long-life-to-jogamps-ardor3d-continuation/#maven

I'm not a JOCL expert, but if you find something wrong, we'll ask you to use the latest version first anyway. By the way, we don't support Optimus and similar technologies. If your laptop can switch between GPUs at runtime, it will cause problems. It might be the root cause.
Julien Gouesse | Personal blog | Website
|
Thank you; I have now added the repo and updated my versions. The issue persists; below is the new trace:

java.lang.NullPointerException
    at com.jogamp.opencl.llb.impl.CLImpl11.dispatch_clEnqueueNDRangeKernel0(Native Method)
    at com.jogamp.opencl.llb.impl.CLImpl11.clEnqueueNDRangeKernel(CLImpl11.java:1354)
    at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1628)
    at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1495)
    at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1465)

It had to do with passing a 1.6MB READ_ONLY buffer as a constant arg, which is bigger than OCLGrind's 64KB limit allows. The Intel driver's limit was 3GB, so it just slid by. Replacing constant with global const * fixed it.

What still doesn't make sense is why this manifests as a NullPointerException. I do not see any code in JOCL's native source that would throw a NullPointerException for that function. Could the JVM be catching an uncaught C++ exception from the native call and converting it to a Java exception? OCLGrind itself is written in C++. I haven't yet found anything about Temurin 11 doing automatic exception conversion.
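For anyone hitting the same thing, a guard along these lines might catch an oversized constant argument before the kernel is enqueued. This is only a minimal sketch: the class and method names are hypothetical and not part of JOCL, and the limit is passed in as a plain number. In a real app it would presumably come from the device's CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE query. The sizes are the approximate ones mentioned in this thread.

```java
// Hypothetical helper, not part of JOCL: decides which address space a
// kernel argument of a given size can use on a given device.
final class ConstantArgCheck {

    /** True if a buffer of bufferBytes fits within the device's constant-buffer limit. */
    static boolean fitsInConstantMemory(long bufferBytes, long maxConstantBufferBytes) {
        if (bufferBytes < 0 || maxConstantBufferBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        return bufferBytes <= maxConstantBufferBytes;
    }

    /** Suggests the kernel-side qualifier for an argument of the given size. */
    static String addressSpaceFor(long bufferBytes, long maxConstantBufferBytes) {
        return fitsInConstantMemory(bufferBytes, maxConstantBufferBytes)
                ? "__constant"
                : "__global const";
    }

    public static void main(String[] args) {
        long oclgrindLimit = 64L * 1024; // 64KB limit reported for OCLGrind in this thread
        long bufferBytes = 1_600_000L;   // roughly the 1.6MB buffer from this thread (assumed value)
        System.out.println(addressSpaceFor(bufferBytes, oclgrindLimit));
    }
}
```

With those numbers the helper suggests the global address space, which matches the fix that worked here.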
Administrator
|
Without looking at the generated dispatch code: it is highly likely that a Java object reference is null at the point where it is used (dereferenced). Otherwise we would get a SIGSEGV, not a Java NPE. Hence, is it possible that you pass 'null' to the CL method?

Now, looking at the generated C code: we dereference all buffers if they are not null, i.e. we call `GetDirectBufferAddress()` inside an if( NULL != ... ) check. Then we have one C assert on the native function pointer (disabled) and call into (*ptr_clEnqueueNDRangeKernel)(...). This latter call can't produce a Java NPE, hence it must come from one of the earlier buffer usages.

If you have a small reproducing test case, I would like to have a look at it.
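One way to test the null-reference theory from the Java side might be to list which references are null just before the enqueue call. The sketch below is purely illustrative: the class, method and argument names are hypothetical and nothing here is JOCL API. Some arguments may legitimately be null, so a hit is a lead to follow rather than proof of a bug.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of JOCL: reports which of the
// references about to be handed to a native call are null.
final class NullArgProbe {

    /** Returns the names of all arguments whose value is null, in call order. */
    static List<String> nullArgs(String[] names, Object[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                missing.add(names[i]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-ins for the real objects; only their null-ness matters here.
        Object kernel = new Object();
        Object globalWorkOffset = new Object();
        Object events = null;

        System.out.println(nullArgs(
                new String[] {"kernel", "globalWorkOffset", "events"},
                new Object[] {kernel, globalWorkOffset, events}));
    }
}
```

Logging that list right before each put1DRangeKernel() call should show whether the failing invocations share a null argument that the working ones do not.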
Administrator
|
In reply to this post by bananafish
Thank you for the feedback. Maybe a global jobject used across JNI calls has become null but I wouldn't bet on it. At least you know the root cause.
Julien Gouesse | Personal blog | Website
|
Administrator
|
In reply to this post by Sven Gothel
Would a null kernel identifier or null event identifiers cause that?
Julien Gouesse | Personal blog | Website
|
Administrator
|
In reply to this post by bananafish
Correct, same conclusion. Same thing here: weird. We neither use C++ nor catch C++ exceptions here. Only in Direct-BT did I use a C++ -> Java mapping that maps C++ exceptions to Java exceptions. But our GlueGen compiler, used for all of JogAmp, uses plain old C for the same API.