NullPointerException from native method

bananafish

Apr 11, 2023; 9:34pm

NullPointerException from native method

To debug my JOCL-based app I switched to OCLGrind as my device to find possible issues. It typically works fine but I suddenly started getting this exception on invocations of putNDRangeKernel():

java.lang.NullPointerException
	at com.jogamp.opencl.llb.impl.CLAbstractImpl.dispatch_clEnqueueNDRangeKernel0(Native Method)
	at com.jogamp.opencl.llb.impl.CLAbstractImpl.clEnqueueNDRangeKernel(CLAbstractImpl.java:1346)
	at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1656)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1523)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1493)

I suspect this has something to do with OCLGrind, but the origin of the NullPointerException is puzzling: it is apparently a deliberately thrown Java exception on the native side, something of which OCLGrind has no concept. Here's one example of the offending code, with asserts to try to catch something amiss:

assert numWorkItems > 0;
assert queue.getID() > 0;
assert kernel.getID() > 0;
assert queue.getContext().getID() > 0;
assert queue.getContext().getCL() != null;
queue.put1DRangeKernel(kernel, 0, numWorkItems, 0);
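Note that Java `assert` statements are skipped unless the JVM is started with `-ea`, so checks like the ones above may never run. A minimal sketch of always-on equivalents follows. The class and method names are hypothetical, and it treats a handle of 0 as invalid (rather than testing `> 0`) on the assumption that the IDs are native pointers stored in a `long`:

```java
/**
 * Hypothetical always-on replacements for the asserts above.
 * Possible use before the enqueue call:
 *   EnqueueChecks.requirePositive(numWorkItems, "numWorkItems");
 *   EnqueueChecks.requireHandle(queue.getID(), "queue");
 *   EnqueueChecks.requireHandle(kernel.getID(), "kernel");
 */
final class EnqueueChecks {
    private EnqueueChecks() {}

    /** Rejects a zero handle; assumes 0 means "no native object". */
    static long requireHandle(long handle, String name) {
        if (handle == 0L) {
            throw new IllegalStateException(name + " has a null (0) native handle");
        }
        return handle;
    }

    /** Rejects zero or negative counts such as an empty work size. */
    static long requirePositive(long value, String name) {
        if (value <= 0L) {
            throw new IllegalArgumentException(name + " must be > 0 but was " + value);
        }
        return value;
    }
}
```

Unlike `assert`, these throw regardless of JVM flags, and the message names the offending value.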

My POM deps have:

<dependency>
 <groupId>org.jocl</groupId>
 <artifactId>jocl</artifactId>
 <version>2.0.4</version>
</dependency>
<dependency>
 <groupId>org.jogamp.gluegen</groupId>
 <artifactId>gluegen-rt-main</artifactId>
 <version>2.3.2</version>
</dependency>
<dependency>
 <groupId>org.jogamp.jocl</groupId>
 <artifactId>jocl-main</artifactId>
 <version>2.3.2</version>
</dependency>

This exception occurs throughout the app, on a persistent but bafflingly random selection of kernels. Switching back to the Intel GPU as the device removes the issue and restores normal function.

...any idea as to what is throwing this NPE?

gouessej

Apr 12, 2023; 11:04am

Re: NullPointerException from native method

Administrator

This post was updated on Apr 12, 2023; 9:18pm.

Hello

Please use version 2.4.0; it's available in our own Maven repository:
https://jogamp.org/deployment/maven/org/jogamp/jocl/jocl-main/2.4.0/

Maybe this Maven example can help:
https://gouessej.wordpress.com/2014/11/22/ardor3d-est-mort-vive-jogamps-ardor3d-continuation-ardor3d-is-dead-long-life-to-jogamps-ardor3d-continuation/#maven

I'm not a JOCL expert, but if you report a problem, we'll ask you to try the latest version first anyway.

By the way, we don't support Optimus and similar technologies. If your laptop can switch between GPUs at runtime, it will cause problems. It might be the root cause.

Julien Gouesse | Personal blog | Website

bananafish

Apr 12, 2023; 8:00pm

Re: NullPointerException from native method

Thank you; I have now added the repo and updated my versions. The issue persists; below is the new trace:

java.lang.NullPointerException
	at com.jogamp.opencl.llb.impl.CLImpl11.dispatch_clEnqueueNDRangeKernel0(Native Method)
	at com.jogamp.opencl.llb.impl.CLImpl11.clEnqueueNDRangeKernel(CLImpl11.java:1354)
	at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1628)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1495)
	at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1465)

It turned out to be caused by passing a 1.6 MB READ_ONLY buffer as a constant arg, which is bigger than the 64 KB that OCLGrind allows. The Intel driver's limit was 3 GB, so it just slid by. Replacing constant with global const * fixed it.

What still doesn't make sense is why this manifests as a NullPointerException. I do not see any code in JOCL's native source that would throw a NullPointerException for that function. Could the JVM be catching an uncaught C++ exception from the native call and converting it to a Java exception? OCLGrind itself is written in C++. I haven't yet found anything about Temurin 11 doing automatic exception conversion.
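For reference, the per-device limit involved here is what OpenCL reports as CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE. A minimal sketch of a pre-flight check follows, assuming the buffer size and the device limit are already available as byte counts. The class and method names are hypothetical, and no JOCL call is made, so how the two numbers are obtained from the buffer and the device is left open:

```java
/** Hypothetical helper: decides whether a buffer can be passed as a constant argument. */
final class ConstantArgCheck {
    private ConstantArgCheck() {}

    /** True if the buffer fits within the device's constant-buffer limit. */
    static boolean fitsConstantSpace(long bufferBytes, long maxConstantBufferBytes) {
        return bufferBytes >= 0L && bufferBytes <= maxConstantBufferBytes;
    }

    /**
     * Suggests the kernel-argument qualifier: "constant" when the buffer fits,
     * otherwise "global const", which is the replacement described in this post.
     */
    static String qualifierFor(long bufferBytes, long maxConstantBufferBytes) {
        return fitsConstantSpace(bufferBytes, maxConstantBufferBytes)
                ? "constant"
                : "global const";
    }
}
```

With the figures from this post, a buffer of roughly 1,600,000 bytes fails the check against a 64 KB limit and passes it against a 3 GB limit, which matches the difference in behaviour between the two devices.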

Sven Gothel

Apr 12, 2023; 9:25pm

Re: NullPointerException from native method

Administrator

Without looking at the generated dispatch code, it is most likely a Java object reference that was null
when something tried to use (dereference) it. Otherwise we would see a SIGSEGV, not a Java NPE.

Hence: is it possible that you pass 'null' to the CL method?

Now, looking at the generated C code: we dereference each buffer only if it is not null,
i.e. we call `GetDirectBufferAddress()` inside an if( NULL != ... ) guard.

Then there is one C assert on the native function pointer (disabled),
followed by the call into (*ptr_clEnqueueNDRangeKernel)(...).
That call cannot raise a Java NPE, so it must come from one of the earlier buffer uses.

If you have a small test case that reproduces it, I would like to have a look.
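One way to narrow down which reference is null before the native call is to name each argument and report the null ones. This is a minimal sketch with hypothetical names; it is plain Java and does not touch the JOCL API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: lists which named references are null. */
final class NullArgReport {
    private NullArgReport() {}

    /**
     * Returns the names whose value is null, in the map's iteration order.
     * Build the map with LinkedHashMap or HashMap; Map.of rejects null values.
     */
    static List<String> nullNames(Map<String, ?> namedRefs) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, ?> entry : namedRefs.entrySet()) {
            if (entry.getValue() == null) {
                names.add(entry.getKey());
            }
        }
        return names;
    }
}
```

Logging the returned list just before each enqueue would show whether a null reference on the Java side coincides with the failing calls.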

gouessej

Apr 12, 2023; 9:26pm

Re: NullPointerException from native method

Administrator

In reply to this post by bananafish

Thank you for the feedback. Maybe a global jobject used across JNI calls has become null, but I wouldn't bet on it. At least you know the root cause.


gouessej

Apr 12, 2023; 9:29pm

Re: NullPointerException from native method

Administrator

In reply to this post by Sven Gothel

Would a null kernel identifier or null event identifiers cause that?


Sven Gothel

Apr 12, 2023; 9:29pm

Re: NullPointerException from native method

Administrator

In reply to this post by bananafish

bananafish wrote

Intel driver's limit was 3GB so it just slid by. Replacing constant with global const * fixed it. What still doesn't make sense is why this manifests as a NullPointerException. I do not see any code in JOCL's native source that would throw a NullPointerException for that function.

Correct, same conclusion.

bananafish wrote

Could the JVM be catching an uncaught C++ exception from the native call and converting it to a Java exception? OCLGrind itself is written in C++. I haven't yet found anything about Temurin 11 doing automatic exception conversion.

Same thing here, weird.

We neither use nor catch any C++ here.
Only in Direct-BT did I use a C++ -> Java binding that maps C++ exceptions to Java exceptions.
Our GlueGen compiler, which is used for all of JogAmp, uses plain old C for the same API.