Hi
I have a triangle-ray intersection kernel that was working fine using global memory, but when I add a local kernel argument it's throwing an exception when trying to read the returned intersection point, which is returned in a buffer of 1 float.

Kernel:

    __kernel void triangles_ray_intersection(__global float* vxs,
                                             __global float* vys,
                                             __global float* vzs,
                                             __global float* ray_origin,
                                             __global float* ray_direction,
                                             __global float* t_nearest_output,
                                             __local float* t_loc)

Buffers:

    CLBuffer<FloatBuffer> clBuffer_xs = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
    CLBuffer<FloatBuffer> clBuffer_ys = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
    CLBuffer<FloatBuffer> clBuffer_zs = context.createFloatBuffer(bufferSize, CLMemory.Mem.READ_ONLY);
    CLBuffer<FloatBuffer> clBuffer_rayOrigin = context.createFloatBuffer(3, CLMemory.Mem.READ_ONLY);
    CLBuffer<FloatBuffer> clBuffer_rayDir = context.createFloatBuffer(3, CLMemory.Mem.READ_ONLY);
    CLBuffer<FloatBuffer> clBuffer_t = context.createFloatBuffer(1, CLMemory.Mem.WRITE_ONLY);

Arg assignment, noting that I use putNullArg() for the __local arg. Is that correct?

    kernel.putArgs(clBuffer_xs, clBuffer_ys, clBuffer_zs, clBuffer_rayOrigin, clBuffer_rayDir, clBuffer_t)
          .putNullArg(localWorkSize);

And finally trying to read back the result:

    queue.putWriteBuffer(clBuffer_xs, false)
         .putWriteBuffer(clBuffer_ys, false)
         .putWriteBuffer(clBuffer_zs, false)
         .putWriteBuffer(clBuffer_rayOrigin, false)
         .putWriteBuffer(clBuffer_rayDir, false)
         .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
         .putReadBuffer(clBuffer_t, true);

It throws the exception at the last putReadBuffer() call:

    Exception in thread "main" com.jogamp.opencl.CLException$CLOutOfResourcesException: can not enqueue read-buffer: CLBuffer [id: 422414000 buffer: java.nio.DirectFloatBufferU[pos=0 lim=1 cap=1]] with cond.: null events: null [error: CL_OUT_OF_RESOURCES]
        at com.jogamp.opencl.CLException.newException(CLException.java:79)
        at com.jogamp.opencl.CLCommandQueue.putReadBuffer(CLCommandQueue.java:185)
        at com.jogamp.opencl.CLCommandQueue.putReadBuffer(CLCommandQueue.java:155)
        at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.nearestIntersection_OpenCL(SimpleTriangleMeshRayIntersection3D.java:149)
        at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.nearestIntersection(SimpleTriangleMeshRayIntersection3D.java:51)
        at com.isl.opencl.jogamp_jocl.geo.threed.intersection.SimpleTriangleMeshRayIntersection3D.main(SimpleTriangleMeshRayIntersection3D.java:253)

Thanks
Graham
|
Administrator
|
CL_OUT_OF_RESOURCES can be hard to debug. Have you checked the amount of local memory available on your device to make sure you're not exceeding it? The spec only guarantees 32KB (16KB in OpenCL 1.0). You might try reducing the amount of work you're enqueuing; if that works, you're just not doing the right checks to stay within your hardware constraints.
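As a rough illustration of that kind of check, here is a minimal Java sketch. It assumes the __local array holds one float per work-item and that the size given for a __local argument is a byte count, as it is for clSetKernelArg. The class and method names (LocalMemBudget, localFloatBytes, fits) are made up for this example, and the device limit is a placeholder; the real value should be queried from the device (CL_DEVICE_LOCAL_MEM_SIZE).

```java
// Hypothetical helper, not part of JOCL: estimates the local memory a kernel
// argument needs and compares it against a device limit supplied by the caller.
final class LocalMemBudget {

    /** Bytes needed for a __local float array with one element per work-item. */
    static long localFloatBytes(int localWorkSize) {
        return (long) localWorkSize * Float.BYTES;
    }

    /** True if the request is positive and does not exceed the device limit. */
    static boolean fits(long requestedBytes, long deviceLocalMemBytes) {
        return requestedBytes > 0 && requestedBytes <= deviceLocalMemBytes;
    }

    public static void main(String[] args) {
        int localWorkSize = 256;               // example work-group size (assumption)
        long deviceLocalMemBytes = 32L * 1024; // placeholder; query the device instead

        long needed = localFloatBytes(localWorkSize);
        System.out.println("local bytes needed: " + needed
                + ", fits: " + fits(needed, deviceLocalMemBytes));
    }
}
```

If the kernel indexes t_loc by local id, the byte count from localFloatBytes(localWorkSize) would be the size to request for the __local argument, rather than localWorkSize itself.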
|
In reply to this post by gmseed
It's just that the jogamp-jocl tutorial page cites a paper that compares it against jocl and javacl, and the basis and results of that test are questionable.
adem
|
The linked paper is good at describing JogAmp JOGL configuration and setup; that's why I added it to the tutorial section. If you want a more reliable OpenCL benchmark, then please port http://www.luxrender.net/wiki/LuxMark to all three Java OpenCL bindings and compare the results. LuxMark will give a better benchmark setup. |
Administrator
|
In reply to this post by adem9438
He already said that here and meanwhile I haven't changed my mind.
Julien Gouesse
|
So are the multiple GPUs cooperating somehow in this algorithm, or are they supposed to be operating separately on separate buffers? If they're cooperating, I can see how there might be a problem, since you have to use clFlush()/clFinish() to synchronize between different command queues (I assume you have both devices in one context, but a separate command queue for each one). There's also the issue of copying a single buffer to two devices and then getting separate sets of results back to the host without clobbering one.
If your devices are supposed to be operating separately, and all the buffers and host memory are separate, then I'm not sure what's going on. |