NullPointerException from native method


bananafish
To debug my JOCL-based app I switched to OCLGrind as my device to find possible issues. It typically works fine but I suddenly started getting this exception on invocations of putNDRangeKernel():

java.lang.NullPointerException
	at com.jogamp.opencl.llb.impl.CLAbstractImpl.dispatch_clEnqueueNDRangeKernel0(Native Method)
	at com.jogamp.opencl.llb.impl.CLAbstractImpl.clEnqueueNDRangeKernel(CLAbstractImpl.java:1346)
	at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1656)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1523)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1493)

I suspect this has something to do with OCLGrind, but the origin of the NullPointerException is puzzling: it is apparently a deliberately thrown Java exception on the native side, something of which OCLGrind has no concept. Here's one example of the offending code, with asserts to try to catch something amiss:

assert numWorkItems > 0;
assert queue.getID() > 0;
assert kernel.getID() > 0;
assert queue.getContext().getID() > 0;
assert queue.getContext().getCL() != null;
queue.put1DRangeKernel(kernel, 0, numWorkItems, 0);

My POM deps have:
<dependency>
 <groupId>org.jocl</groupId>
 <artifactId>jocl</artifactId>
 <version>2.0.4</version>
</dependency>
<dependency>
 <groupId>org.jogamp.gluegen</groupId>
 <artifactId>gluegen-rt-main</artifactId>
 <version>2.3.2</version>
</dependency>
<dependency>
 <groupId>org.jogamp.jocl</groupId>
 <artifactId>jocl-main</artifactId>
 <version>2.3.2</version>
</dependency>

This exception occurs throughout the app, at persistent but bafflingly random selections of kernels. Switching back to the Intel GPU as the device removes the issue and restores normal function.

...any idea as to what is throwing this NPE?

Re: NullPointerException from native method

gouessej
Administrator
Hello

Please use version 2.4.0; it's available in our own Maven repository:
https://jogamp.org/deployment/maven/org/jogamp/jocl/jocl-main/2.4.0/

Maybe this Maven example can help:
https://gouessej.wordpress.com/2014/11/22/ardor3d-est-mort-vive-jogamps-ardor3d-continuation-ardor3d-is-dead-long-life-to-jogamps-ardor3d-continuation/#maven

I'm not a JOCL expert, but if you find something wrong, we'll ask you to use the latest version first anyway.

By the way, we don't support Optimus and similar technologies. If your laptop can switch between GPUs at runtime, it will cause problems. It might be the root cause.
Julien Gouesse | Personal blog | Website

Re: NullPointerException from native method

bananafish
Thank you; I have now added the repo and updated my versions. The issue persists; below is the new trace:

java.lang.NullPointerException
	at com.jogamp.opencl.llb.impl.CLImpl11.dispatch_clEnqueueNDRangeKernel0(Native Method)
	at com.jogamp.opencl.llb.impl.CLImpl11.clEnqueueNDRangeKernel(CLImpl11.java:1354)
	at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1628)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1495)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1465)
       


The cause turned out to be passing a 1.6 MB READ_ONLY buffer as a constant kernel argument, which is larger than OCLGrind's 64 KB limit for constant buffers. The Intel driver's limit was 3 GB, so it just slid by. Replacing constant with global const * fixed it. What still doesn't make sense is why this manifests as a NullPointerException. I do not see any code in JOCL's native source that would throw a NullPointerException for that function. Could the JVM be catching an uncaught C++ exception from the native call and converting it to a Java exception? OCLGrind itself is written in C++. I haven't yet found anything about Temurin 11 doing automatic exception conversion.
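A guard like the following could catch this kind of mismatch before the kernel is enqueued. It is only a minimal sketch: the class and method names are hypothetical, and the limit is passed in as a plain number. In a real app it would presumably come from the device's CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE query, which is not shown here.

```java
public class ConstantArgGuard {

    /** True if a buffer of bufferBytes fits within the given constant-buffer limit. */
    static boolean fitsConstantLimit(long bufferBytes, long maxConstantBufferBytes) {
        return bufferBytes >= 0 && bufferBytes <= maxConstantBufferBytes;
    }

    /** Suggests an address-space qualifier for a read-only kernel argument of this size. */
    static String suggestQualifier(long bufferBytes, long maxConstantBufferBytes) {
        return fitsConstantLimit(bufferBytes, maxConstantBufferBytes)
                ? "constant"
                : "global const";
    }

    public static void main(String[] args) {
        long bufferBytes = 1_600_000L;                 // roughly the 1.6 MB buffer from this thread
        long oclgrindLimit = 64L * 1024;               // 64 KB, as reported for OCLGrind
        long intelLimit = 3L * 1024 * 1024 * 1024;     // 3 GB, as reported for the Intel driver

        System.out.println("OCLGrind: " + suggestQualifier(bufferBytes, oclgrindLimit));
        System.out.println("Intel:    " + suggestQualifier(bufferBytes, intelLimit));
    }
}
```

Running the check against the smallest limit among the devices the app targets would flag the oversized argument on any of them, rather than only on the strictest one.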

Re: NullPointerException from native method

Sven Gothel
Administrator
Without looking at the generated dispatch code, it is highly likely that a Java object reference
is null when it is used (dereferenced). Otherwise we would have a SIGSEGV, not a Java NPE.

Hence, is it possible that you pass 'null' to the CL method?

Now looking at the generated C code, we dereference all buffers only if they are not null,
i.e. we call `GetDirectBufferAddress()` if( NULL != ... ).

Then we have one C assert on the native function pointer (disabled)
and call into (*ptr_clEnqueueNDRangeKernel)(...).
This latter call can't raise a Java NPE, hence it must come from one of the earlier buffer usages.

If you have a small reproducing test case, I would like to have a look at it.
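To narrow down which Java reference is null before the native call, a small pre-flight check along these lines may help. It is a sketch only: the class, the method and the argument labels ("queue", "kernel", "globalWorkSize") are hypothetical stand-ins, not JOCL API. Note that some OpenCL arguments, such as the local work size, may legitimately be null, so the helper only reports and does not throw.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NullArgCheck {

    /** Returns the label of the first null-valued entry, or null if every reference is set. */
    static String firstNull(Map<String, Object> namedArgs) {
        for (Map.Entry<String, Object> entry : namedArgs.entrySet()) {
            if (entry.getValue() == null) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order and permits null values.
        Map<String, Object> namedArgs = new LinkedHashMap<>();
        namedArgs.put("queue", new Object());      // stand-in for the real command queue
        namedArgs.put("kernel", new Object());     // stand-in for the real kernel
        namedArgs.put("globalWorkSize", null);     // simulated missing buffer

        String missing = firstNull(namedArgs);
        System.out.println(missing == null
                ? "all references are set"
                : "null reference before native call: " + missing);
    }
}
```

Logging the result right before the enqueue call would show whether any Java-side reference is actually null at the moment the NPE is raised.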

Re: NullPointerException from native method

gouessej
Administrator
In reply to this post by bananafish
Thank you for the feedback. Maybe a global jobject used across JNI calls has become null, but I wouldn't bet on it. At least you know the root cause.

Re: NullPointerException from native method

gouessej
Administrator
In reply to this post by Sven Gothel
Would a null kernel identifier or null event identifiers cause that?

Re: NullPointerException from native method

Sven Gothel
Administrator
In reply to this post by bananafish
bananafish wrote
Intel driver's limit was 3GB so it just slid by. Replacing constant with global const * fixed it. What still doesn't make sense is why this manifests as a NullPointerException. I do not see any code in JOCL's native source that would throw a NullPointerException for that function.
Correct, same conclusion.

bananafish wrote
Could the JVM be catching an uncaught C++ exception from the native call and converting it to a java exception? OCLGrind itself is written in C++. I haven't yet found anything about Temurin 11 doing automatic exception conversion.
Same thing here, weird.

We neither use nor catch C++ stuff here.
Only in Direct-BT did I use a C++ -> Java mapping that maps C++ exceptions to Java exceptions.
But our GlueGen compiler, used for all of JogAmp, emits plain old C for the same API.