I tried the HelloJOCL example on Mac OS X 10.6.6 on a MacBook Pro laptop.

On running I got:

  Exception in thread "main" java.lang.OutOfMemoryError: Direct buffer memory
      at java.nio.Bits.reserveMemory(Bits.java:633)
      at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
      at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
      at com.jogamp.common.nio.Buffers.newDirectByteBuffer(Buffers.java:67)
      at com.jogamp.common.nio.Buffers.newDirectFloatBuffer(Buffers.java:109)
      at com.jogamp.opencl.CLContext.createFloatBuffer(CLContext.java:316)
      at HelloJOCL.main(HelloJOCL.java:30)

so I then changed line 20 to make elementCount smaller:

  int elementCount = 1444777;

But now I get a new error:

  Exception in thread "main" com.jogamp.opencl.CLException$CLInvalidWorkGroupSizeException: can not enqueue 1DRange CLKernel [id: 4296105472 name: VectorAdd]
   with gwo: null gws: {1444864} lws: {256}
   cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]
      at com.jogamp.opencl.CLException.newException(CLException.java:78)
      at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1547)
      at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1455)
      at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1425)
      at HelloJOCL.main(HelloJOCL.java:47)

Any suggestions how I can overcome this problem? |
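The first stack trace shows the JVM refusing a direct buffer allocation, so it may help to estimate how much direct memory a given elementCount asks for. The following is only a rough sketch in plain Java with no JOCL calls. It assumes, purely for illustration, that the demo allocates three float buffers of elementCount elements each (two inputs and one output for a vector add); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, not part of JOCL or the HelloJOCL demo.
final class DirectMemoryEstimate {

    /** Bytes needed for bufferCount direct float buffers of elementCount elements each. */
    static long requiredBytes(long elementCount, int bufferCount) {
        return elementCount * Float.BYTES * bufferCount;
    }

    /** Allocates a direct float buffer in native byte order using only java.nio. */
    static FloatBuffer newDirectFloatBuffer(int elementCount) {
        int bytes = Math.multiplyExact(elementCount, Float.BYTES); // fails loudly on int overflow
        return ByteBuffer.allocateDirect(bytes)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    public static void main(String[] args) {
        int elementCount = 1444777; // value from the post above
        int bufferCount = 3;        // assumption: two inputs plus one output
        long needed = requiredBytes(elementCount, bufferCount);
        System.out.println("direct memory needed: " + needed + " bytes");

        // This allocation throws OutOfMemoryError("Direct buffer memory")
        // if the JVM's direct memory limit is too small for it.
        FloatBuffer a = newDirectFloatBuffer(elementCount);
        System.out.println("allocated " + a.capacity() + " floats, direct=" + a.isDirect());
    }
}
```

The estimate can then be compared with the JVM's direct memory limit, whose default is JVM specific; on HotSpot it can be raised with a setting such as -XX:MaxDirectMemorySize=256m.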
Hi,
regarding the out of direct memory error: you could either increase the max direct memory size (-XX:MaxDirectMemorySize=) or decrease the element count, and therefore the memory used by the demo.

The work group size is hardcoded in line 27 and is hardware dependent:

  int localWorkSize = 256;

256 should work on most hardware... you could use the CLInfo utility to find out what your HW's limit is: http://jogamp.org/jocl-demos/www/

regards,
michael

On 02/09/2011 09:13 AM, ralphrmartin [via jogamp] wrote:
> Any suggestions how I can overcome this problem?
> > > > _______________________________________________ > If you reply to this email, your message will be added to the discussion below: > http://jogamp.762907.n3.nabble.com/HelloJOCL-Problems-on-MacOSX-tp2456996p2456996.html > To start a new topic under jogamp, email [hidden email] > To unsubscribe from jogamp, visit http://michael-bien.com/ |
Michael, thanks for taking the time to reply.
Decreasing the element count seemed to overcome the out of memory problem.

However, the work size is still a problem. The MacBook Pro has both an Nvidia GeForce GT 330M and Intel integrated graphics; I am forcing it to use the former. I get two devices reported using CLInfo:

  CL_DEVICE_NAME                 GeForce GT 330M   Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz
  CL_DEVICE_MAX_WORK_GROUP_SIZE  512               1

So I don't really understand why that value of 256 is failing. In fact even a work size as small as 2 fails. Maybe it is using the CPU instead of the GPU?

If I set it to 1, I then get a different error:

  Exception in thread "main" java.lang.NullPointerException
      at com.jogamp.opencl.CLContext.release(CLContext.java:486)
      at com.jogamp.opencl.CLContext.release(CLContext.java:504)
      at HelloJOCL.main(HelloJOCL.java:54) |
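The CLInfo figures above already hint at the cause. In OpenCL 1.x, enqueueing a kernel fails with CL_INVALID_WORK_GROUP_SIZE when the local work size exceeds the device's maximum work group size or does not evenly divide the global work size. The plain-Java sketch below applies just those two rules to the numbers from this thread. It makes no JOCL calls, the class and method names are hypothetical, and a real driver may impose further kernel-specific limits.

```java
// Hypothetical helper that mirrors two of the OpenCL 1.x work size rules.
final class WorkSizeCheck {

    /**
     * True if localSize is positive, does not exceed the device limit,
     * and evenly divides globalSize.
     */
    static boolean isValid(long globalSize, long localSize, long maxWorkGroupSize) {
        return localSize > 0
            && localSize <= maxWorkGroupSize
            && globalSize % localSize == 0;
    }

    public static void main(String[] args) {
        long gws = 1444864; // global work size from the exception message

        // Limits reported by CLInfo in this thread: 512 (GeForce GT 330M), 1 (CPU).
        System.out.println("GPU, lws 256: " + isValid(gws, 256, 512));
        System.out.println("CPU, lws 256: " + isValid(gws, 256, 1));
        System.out.println("CPU, lws 2:   " + isValid(gws, 2, 1));
        System.out.println("CPU, lws 1:   " + isValid(gws, 1, 1));
    }
}
```

Under these rules a local work size of 256 is acceptable for the GPU's limit of 512, while anything above 1 is rejected for the CPU's limit of 1, which is consistent with the queue having been created on the CPU.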
I introduced the NPE regression last night. Please take build 276 or later: https://jogamp.org/chuck/job/jocl/276/

Line 52 creates the CLCommandQueue on the fastest device. You could choose a different one.

We saw in our test cluster that taking MAX_WORK_GROUP_SIZE as default does not work with some drivers. That's why most demos/tests hardcode that value.

best regards,
michael

On 02/09/2011 06:49 PM, ralphrmartin [via jogamp] wrote:
> Maybe it is using the CPU instead of the GPU? |
Michael, thanks again.
  CLCommandQueue queue = context.getDevices()[0].createCommandQueue();

let me use the GPU and a work size of 256. The later build 276 fixed the NPE bug. All is now looking good! |
that sounds great Martin,
if you already know that you have to use the GPU you could do something like this:

  // search a platform supporting GPU devices
  CLPlatform platform = CLPlatform.getDefault(CLPlatformFilters.type(Type.GPU));

  // create a context on that platform using all devices
  CLContext context = CLContext.create(platform);
  try {
      // queue on fastest GPU
      CLCommandQueue queue = context.getMaxFlopsDevice(Type.GPU).createCommandQueue();

      // start here ...

  } finally {
      context.release();
  }

it's basically always a two step process:
 - pick a platform
 - create a context on all devices and pick the device for the CommandQueue later
   or
 - pick devices of that platform and create the context only on the devices you selected

best regards,
michael

On 02/09/2011 07:39 PM, ralphrmartin [via jogamp] wrote:
> All is now looking good!

-- http://michael-bien.com/ |
Michael,
Excellent! Thanks for the further tip. Ralph |
Just a quick remark for anyone else who is following this: I needed to write the below to get it to compile:
CLPlatform.getDefault(CLPlatformFilters.type(CLDevice.Type.GPU)) |
right. The enums usually reside in the same namespace as the class they are associated with. This makes them look a little bit too long from time to time. You can easily get rid of that by using static imports (Java 5 and later), for example:

  import static com.jogamp.opencl.CLDevice.Type.*;
  import static com.jogamp.opencl.util.CLPlatformFilters.*;
  ...
  CLPlatform.getDefault(type(GPU));

That's what I did in the code snippet of the last mail, just a little bit less radical than above.

-michael

On 02/10/2011 12:26 AM, ralphrmartin [via jogamp] wrote:
> CLPlatform.getDefault(CLPlatformFilters.type(CLDevice.Type.GPU)) |
I ran into the same problems as described here. For some reason context.getMaxFlopsDevice() returned the CPU device and the localWorkSize of 256 caused problems.
After switching to the GPU (and reducing the elementCount to avoid out of memory errors), HelloJOCL worked. Changing localWorkSize to "device.getMaxWorkGroupSize()" (1024) caused problems for me (CL_INVALID_WORK_GROUP_SIZE).

One important thing seems to be that the localWorkSize depends on the specific kernel. Using "kernel.getWorkGroupSize(device)" (256) solved the problem, as described here: http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2210

I suggest that the HelloJOCL sample should use "kernel.getWorkGroupSize()" and a lower elementCount to make using JOCL a nicer experience for new developers :)

Felix |
I experienced this as well when moving from Snow Leopard to Lion (and the problem was actually the CPU, not the GPU, because its localWorkSize limit was smaller than 256).

I can confirm changing the code to the following works OK:

  int elementCount = ELEM_COUNT;                                        // Length of arrays to process
  int localWorkSize = min((int) kernel.getWorkGroupSize(device), 256);  // Local work size dimensions
  int globalWorkSize = roundUp(localWorkSize, elementCount);            // rounded up to the nearest multiple of the localWorkSize

I agree this should probably become the default code in the examples :) |
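The arithmetic behind those three lines can be sketched in plain Java. The roundUp helper is not shown in this thread, so the version below is only one plausible implementation matching the call roundUp(localWorkSize, elementCount); the kernel's work group size is passed in as a plain number rather than queried through JOCL, and the class name is hypothetical.

```java
// Hypothetical stand-in for the work size computation; no JOCL calls.
final class WorkSizes {

    /** Smallest multiple of groupSize that is greater than or equal to globalSize. */
    static int roundUp(int groupSize, int globalSize) {
        int remainder = globalSize % groupSize;
        return remainder == 0 ? globalSize : globalSize + groupSize - remainder;
    }

    /** Caps the preferred local work size at what the kernel supports on the device. */
    static int localWorkSize(long kernelWorkGroupSize, int preferred) {
        return (int) Math.min(kernelWorkGroupSize, preferred);
    }

    public static void main(String[] args) {
        int elementCount = 1444777; // value used earlier in this thread

        // Assumed kernel limits: 256 (as Felix reported for his GPU) and 1 (the CPU above).
        for (long kernelLimit : new long[] {256, 1}) {
            int local = localWorkSize(kernelLimit, 256);
            int global = roundUp(local, elementCount);
            System.out.println("limit " + kernelLimit + ": lws=" + local + " gws=" + global);
        }
    }
}
```

With a limit of 256 this reproduces the global work size of 1444864 seen in the original exception message. Because the rounded-up global size can exceed elementCount, the kernel is expected to skip work-items whose index is beyond the array length.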
Hi,
just a note that today (20.10.2013), on my MacBook Pro Retina (Mac OS X 10.8.5, NVIDIA GeForce GT 650M 1024 MB), the code as it is/was today on the wiki failed with this error, and this thread helped me a lot.

The problem is that on the Mac, getMaxFlopsDevice returns the CPU, not the GPU. I trivially bypassed that by replacing:

  CLDevice device = context.getMaxFlopsDevice();

with

  CLDevice device = context.getDevices()[0]; // 0 happens to be the GPU in my system, YMMV

br Kusti |
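Since index 0 only happens to be the GPU on that machine, selecting by device type is likely to be more portable. The sketch below shows the selection logic in plain Java: the Device and Kind types are hypothetical stand-ins and not the JOCL API. In JOCL itself, the call context.getMaxFlopsDevice(Type.GPU) suggested earlier in this thread serves a similar purpose.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "prefer a GPU, otherwise take what is there".
final class DevicePick {

    enum Kind { CPU, GPU }

    static final class Device {
        final String name;
        final Kind kind;

        Device(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** First device of the wanted kind; else the first device; else null for an empty list. */
    static Device prefer(List<Device> devices, Kind wanted) {
        for (Device d : devices) {
            if (d.kind == wanted) {
                return d;
            }
        }
        return devices.isEmpty() ? null : devices.get(0);
    }

    public static void main(String[] args) {
        // Illustrative device list; the order is deliberately CPU first.
        List<Device> devices = Arrays.asList(
                new Device("Intel Core i7", Kind.CPU),
                new Device("GeForce GT 650M", Kind.GPU));

        Device chosen = prefer(devices, Kind.GPU);
        System.out.println("chosen: " + chosen.name + " (" + chosen.kind + ")");
    }
}
```

Falling back to the first available device keeps the program usable on machines without a GPU, at the cost of needing a smaller local work size there.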