using __local

gmseed
Hi

I have a triangle-ray intersection kernel that was working fine using global memory, but when I add a __local kernel argument it throws an exception when I try to read the returned intersection point, which is returned in a buffer of 1 float.

Kernel:

__kernel void triangles_ray_intersection(__global float* vxs, __global float* vys, __global float* vzs,
                                         __global float* ray_origin, __global float* ray_direction,
                                         __global float* t_nearest_output,
                                         __local float* t_loc)

buffers:

                CLBuffer<FloatBuffer> clBuffer_xs = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
                CLBuffer<FloatBuffer> clBuffer_ys = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
                CLBuffer<FloatBuffer> clBuffer_zs = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
                CLBuffer<FloatBuffer> clBuffer_rayOrigin = context.createFloatBuffer(3, CLMemory.Mem.READ_ONLY);
                CLBuffer<FloatBuffer> clBuffer_rayDir = context.createFloatBuffer(3, CLMemory.Mem.READ_ONLY);
                CLBuffer<FloatBuffer> clBuffer_t = context.createFloatBuffer(1, CLMemory.Mem.WRITE_ONLY);

arg assignment (note that I use putNullArg() for the __local arg; is that correct?):

                kernel.putArgs(clBuffer_xs,clBuffer_ys,clBuffer_zs,clBuffer_rayOrigin,clBuffer_rayDir,clBuffer_t)
                .putNullArg(localWorkSize);
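
On the putNullArg() question: as far as I know, CLKernel.putNullArg(int size) ends up in clSetKernelArg with a null pointer, and for a __local argument that size is a byte count, not an element count. A minimal sketch of the sizing arithmetic in plain Java follows. The class and method names are hypothetical, and it assumes t_loc holds one float per work-item, which the post does not state.

```java
public class LocalArgSize {
    /**
     * Bytes needed for a __local float array with one element per work-item.
     * Assumption: the kernel indexes t_loc by local id, so it needs
     * localWorkSize floats.
     */
    static int localFloatBytes(int localWorkSize) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(localWorkSize, Float.BYTES);
    }

    public static void main(String[] args) {
        int localWorkSize = 256; // example value, not taken from the post
        System.out.println(localFloatBytes(localWorkSize));
    }
}
```

Under that assumption the call would read .putNullArg(localWorkSize * Float.BYTES). Passing the element count reserves only a quarter of the space, so the kernel could write past the end of t_loc, and some drivers report that kind of out-of-bounds access as CL_OUT_OF_RESOURCES on the next blocking call.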

and finally trying to read back the result:

                queue.putWriteBuffer(clBuffer_xs, false)
                        .putWriteBuffer(clBuffer_ys, false)
                        .putWriteBuffer(clBuffer_zs, false)
                        .putWriteBuffer(clBuffer_rayOrigin, false)
                        .putWriteBuffer(clBuffer_rayDir, false)
                        .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
                        .putReadBuffer(clBuffer_t, true);

It throws the exception at the last putReadBuffer() call:

Exception in thread "main" com.jogamp.opencl.CLException$CLOutOfResourcesException: can not enqueue read-buffer: CLBuffer [id: 422414000 buffer: java.nio.DirectFloatBufferU[pos=0 lim=1 cap=1]] with
cond.: null events: null [error: CL_OUT_OF_RESOURCES]
        at com.jogamp.opencl.CLException.newException(CLException.java:79)
        at com.jogamp.opencl.CLCommandQueue.putReadBuffer(CLCommandQueue.java:185)
        at com.jogamp.opencl.CLCommandQueue.putReadBuffer(CLCommandQueue.java:155)
        at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.nearestIntersection_OpenCL(SimpleTriangleMeshRayIntersection3D.java:149)
        at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.nearestIntersection(SimpleTriangleMeshRayIntersection3D.java:51)
        at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.main(SimpleTriangleMeshRayIntersection3D.java:253)

Thanks

Graham

Re: using __local

Wade Walker
Administrator
CL_OUT_OF_RESOURCES can be hard to debug. Have you checked the amount of local memory available on your device to make sure you're not exceeding it? For full-profile devices the spec only guarantees 32 KB (16 KB in OpenCL 1.0). You might try reducing the amount of work you're enqueuing; if that works, you're just not doing the right checks to stay within your hardware constraints.
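
A minimal sketch of that kind of check, in plain Java so it runs without a device. The limit is passed in as a parameter; in JOCL it would normally come from the device (CLDevice.getLocalMemSize(), if I recall the name correctly). The class and method names are hypothetical, and the sketch ignores any local memory the kernel or the driver uses on its own.

```java
public class LocalMemBudget {
    /** True if a __local float array of the given length fits within the device limit. */
    static boolean fits(int floatCount, long deviceLocalMemBytes) {
        // Widen to long before multiplying so large counts cannot overflow.
        return (long) floatCount * Float.BYTES <= deviceLocalMemBytes;
    }

    /**
     * Largest power-of-two work-group size that does not exceed the requested
     * size and whose one-float-per-work-item array fits; 0 if none fits.
     */
    static int largestFittingGroupSize(int requested, long deviceLocalMemBytes) {
        int size = Integer.highestOneBit(Math.max(requested, 0));
        while (size > 0 && !fits(size, deviceLocalMemBytes)) {
            size >>= 1; // halve the work-group size and try again
        }
        return size;
    }

    public static void main(String[] args) {
        long limit = 32 * 1024; // example limit in bytes; query the real device instead
        System.out.println(fits(256, limit));
        System.out.println(largestFittingGroupSize(16384, limit));
    }
}
```

The chosen size would still have to respect the device's maximum work-group size and divide the global work size evenly, which this sketch does not check.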

Re: using __local

adem9438
In reply to this post by gmseed
It's just that the jogamp-jocl tutorial page is citing a paper comparing itself against jocl and javacl, in which the basis and results of the test are questionable.
adem

Re: using __local

Xerxes Rånby
adem9438 wrote
It's just that the jogamp-jocl tutorial page is citing a paper comparing itself against jocl and javacl, in which the basis and results of the test are questionable.
The linked paper is good at describing JogAmp JOGL configuration and setup; that's why I added it to the tutorial section.

If you want a more reliable OpenCL benchmark then please port http://www.luxrender.net/wiki/LuxMark to all three Java OpenCL bindings and compare results. LuxMark will give a better benchmark setup.

Re: using __local

gouessej
Administrator
In reply to this post by adem9438
He already said that here and meanwhile I haven't changed my mind.
Julien Gouesse

Re: using __local

darla
So are the multiple GPUs cooperating somehow in this algorithm, or are they supposed to be operating separately on separate buffers? If they're cooperating, I can see how there might be a problem, since you have to use clFlush()/clFinish() to synchronize between different command queues (I assume you have both devices in one context, but a separate command queue for each one). There's also the issue of copying a single buffer to two devices and then getting separate sets of results back to the host without clobbering one.

If your devices are supposed to be operating separately, and all the buffers and host memory are separate, then I'm not sure what's going on.