using __local

gmseed

Nov 02, 2013; 10:53am

Hi

I have a triangle-ray intersection kernel that worked fine using only global memory. Since I added a __local kernel argument, it throws an exception when I try to read the returned intersection point, which is returned in a buffer of 1 float.

Kernel:

__kernel void triangles_ray_intersection(__global float* vxs, __global float* vys, __global float* vzs,
__global float* ray_origin, __global float* ray_direction,
__global float* t_nearest_output,
__local float* t_loc)

buffers:

CLBuffer<FloatBuffer> clBuffer_xs = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
CLBuffer<FloatBuffer> clBuffer_ys = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
CLBuffer<FloatBuffer> clBuffer_zs = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
CLBuffer<FloatBuffer> clBuffer_rayOrigin = context.createFloatBuffer(3, CLMemory.Mem.READ_ONLY);
CLBuffer<FloatBuffer> clBuffer_rayDir = context.createFloatBuffer(3, CLMemory.Mem.READ_ONLY);
CLBuffer<FloatBuffer> clBuffer_t = context.createFloatBuffer(1, CLMemory.Mem.WRITE_ONLY);

Argument assignment. Note that I use putNullArg() for the __local argument; is that correct?

kernel.putArgs(clBuffer_xs,clBuffer_ys,clBuffer_zs,clBuffer_rayOrigin,clBuffer_rayDir,clBuffer_t)
.putNullArg(localWorkSize);

and finally trying to read back the result:

queue.putWriteBuffer(clBuffer_xs, false)
.putWriteBuffer(clBuffer_ys, false)
.putWriteBuffer(clBuffer_zs, false)
.putWriteBuffer(clBuffer_rayOrigin, false)
.putWriteBuffer(clBuffer_rayDir, false)
.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
.putReadBuffer(clBuffer_t, true);

It throws the exception at the last putReadBuffer() call:

Exception in thread "main" com.jogamp.opencl.CLException$CLOutOfResourcesException: can not enqueue read-buffer: CLBuffer [id: 422414000 buffer: java.nio.DirectFloatBufferU[pos=0 lim=1 cap=1]] with
cond.: null events: null [error: CL_OUT_OF_RESOURCES]
at com.jogamp.opencl.CLException.newException(CLException.java:79)
at com.jogamp.opencl.CLCommandQueue.putReadBuffer(CLCommandQueue.java:185)
at com.jogamp.opencl.CLCommandQueue.putReadBuffer(CLCommandQueue.java:155)
at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.nearestIntersection_OpenCL(SimpleTriangleMeshRayIntersection3D.java:149)
at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.nearestIntersection(SimpleTriangleMeshRayIntersection3D.java:51)
at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.main(SimpleTriangleMeshRayIntersection3D.java:253)

Thanks

Graham

Wade Walker

Nov 02, 2013; 4:11pm

Re: using __local

Administrator

CL_OUT_OF_RESOURCES can be hard to debug. Have you checked the amount of local memory available on your device to make sure you're not exceeding it? Only 32KB is required by the spec. You might try reducing the amount of work you're enqueuing; if that works, you're just not doing the right checks to stay within your hardware constraints.
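
A minimal sketch of the kind of check suggested above, in plain Java with no JOCL dependency. It assumes the kernel wants one float of __local storage per work-item, which the thread does not confirm. The class and method names are hypothetical, and the 32KB limit is a placeholder; on a real device the limit could presumably be read from CLDevice.getLocalMemSize(). Also note that OpenCL's clSetKernelArg takes the size of a __local argument in bytes, so if putNullArg() forwards its argument unchanged (an assumption worth verifying against the JOCL source), passing localWorkSize would reserve only a quarter of the space a float array of that length needs.

```java
public class LocalMemCheck {

    /** Bytes needed for a __local float array with one element per work-item. */
    public static long localFloatBytes(int localWorkSize) {
        return (long) localWorkSize * Float.BYTES;
    }

    /** True if the requested __local allocation is non-empty and fits the device limit. */
    public static boolean fitsInLocalMemory(long requestedBytes, long deviceLocalMemBytes) {
        return requestedBytes > 0 && requestedBytes <= deviceLocalMemBytes;
    }

    public static void main(String[] args) {
        int localWorkSize = 256;                 // hypothetical work-group size
        long deviceLocalMemBytes = 32 * 1024L;   // placeholder; query the real device instead

        long needed = localFloatBytes(localWorkSize);
        System.out.println("local bytes needed: " + needed
                + ", fits: " + fitsInLocalMemory(needed, deviceLocalMemBytes));
    }
}
```

If the check passes, the byte count (not the element count) is what would be handed to the __local kernel argument.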

adem9438

Nov 12, 2013; 7:59am

Re: using __local

In reply to this post by gmseed

It's just that the jogamp-jocl tutorial page is citing a paper comparing itself against jocl and javacl, in which the basis and results of the test are questionable.

adem

Xerxes Rånby

Nov 12, 2013; 8:39am

Re: using __local

adem9438 wrote

It's just that the jogamp-jocl tutorial page is citing a paper comparing itself against jocl and javacl, in which the basis and results of the test are questionable.

The linked paper does a good job of describing JogAmp JOCL configuration and setup; that's why I added it to the tutorial section.

If you want a more reliable OpenCL benchmark then please port http://www.luxrender.net/wiki/LuxMark to all three Java OpenCL bindings and compare results. LuxMark will give a better benchmark setup.

gouessej

Nov 14, 2013; 3:52pm

Re: using __local

Administrator

In reply to this post by adem9438

He already said that here and meanwhile I haven't changed my mind.

darla

Sep 19, 2014; 6:49am

Re: using __local

So are the multiple GPUs cooperating somehow in this algorithm, or are they supposed to be operating separately on separate buffers? If they're cooperating, I can see how there might be a problem, since you have to use clFlush()/clFinish() to synchronize between different command queues (I assume you have both devices in one context, but a separate command queue for each one). There's also the issue of copying a single buffer to two devices and then getting separate sets of results back to the host without clobbering one.

If your devices are supposed to be operating separately, and all the buffers and host memory are separate, then I'm not sure what's going on.