HelloJOCL Problems on MacOSX

HelloJOCL Problems on MacOSX

ralphrmartin
I tried the HelloJOCL example on Mac OS X 10.6.6 on a MacBook Pro laptop.

On running I got:
Exception in thread "main" java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:633)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
        at com.jogamp.common.nio.Buffers.newDirectByteBuffer(Buffers.java:67)
        at com.jogamp.common.nio.Buffers.newDirectFloatBuffer(Buffers.java:109)
        at com.jogamp.opencl.CLContext.createFloatBuffer(CLContext.java:316)
        at HelloJOCL.main(HelloJOCL.java:30)

So I then changed line 20 to make elementCount smaller:
       int elementCount = 1444777;

But now I get a new error:
Exception in thread "main" com.jogamp.opencl.CLException$CLInvalidWorkGroupSizeException: can not enqueue 1DRange CLKernel [id: 4296105472 name: VectorAdd]
 with gwo: null gws: {1444864} lws: {256}
cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]
        at com.jogamp.opencl.CLException.newException(CLException.java:78)
        at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1547)
        at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1455)
        at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1425)
        at HelloJOCL.main(HelloJOCL.java:47)

Any suggestions how I can overcome this problem?

Re: HelloJOCL Problems on MacOSX

Michael Bien
  Hi,

Regarding the out-of-direct-memory error: you could either increase the
maximum direct memory size (-XX:MaxDirectMemorySize=) or decrease the
element count, and therefore the memory used by the demo.
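
To judge how far the element count has to drop, it can help to estimate how much direct memory the demo asks for. The following is a minimal sketch in plain Java (no JOCL calls). It assumes the demo allocates three float buffers (two inputs and one output for the VectorAdd kernel), each sized to the global work size; the class and method names are made up for illustration.

```java
final class DirectMemoryEstimate {

    /**
     * Bytes of direct memory needed for `buffers` float buffers
     * holding `elements` floats each (4 bytes per float).
     */
    static long floatBufferBytes(long elements, int buffers) {
        return elements * Float.BYTES * buffers;
    }

    public static void main(String[] args) {
        // Assumption: three buffers, each as large as the global work
        // size (gws) shown in the error message of the first post.
        long needed = floatBufferBytes(1444864L, 3);
        System.out.println("direct memory needed: " + needed + " bytes");
    }
}
```

If the estimate exceeds what the JVM allows, the limit can be raised on the command line, for example `java -XX:MaxDirectMemorySize=512m HelloJOCL` (512m is an arbitrary value, and the classpath is omitted here).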

The work group size is hardcoded in line 27 and is hardware dependent:
         int localWorkSize = 256;

256 should work on most hardware...


You could use the CLInfo utility to find out what your hardware's limit is:
http://jogamp.org/jocl-demos/www/


regards,
michael



Re: HelloJOCL Problems on MacOSX

ralphrmartin
Michael, thanks for taking the time to reply.

Decreasing the element count seemed to overcome the out of memory problem.

However, the work size is still a problem. The MacBook Pro has both an Nvidia GeForce GT 330M and Intel integrated graphics; I am forcing it to use the former. CLInfo reports two devices:
CL_DEVICE_NAME                 GeForce GT 330M   Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz
CL_DEVICE_MAX_WORK_GROUP_SIZE  512               1

So I don't really understand why that value of 256 is failing. In fact even a worksize as small as 2 fails. Maybe it is using the CPU instead of the GPU?

If I set it to 1, I then get a different error:
Exception in thread "main" java.lang.NullPointerException
       at com.jogamp.opencl.CLContext.release(CLContext.java:486)
       at com.jogamp.opencl.CLContext.release(CLContext.java:504)
       at HelloJOCL.main(HelloJOCL.java:54)

Re: HelloJOCL Problems on MacOSX

Michael Bien
I introduced the NPE regression last night. Please take build 276 or
later:
https://jogamp.org/chuck/job/jocl/276/

Line 52 creates the CLCommandQueue on the fastest device. You could
choose a different one.

We saw in our test cluster that taking MAX_WORK_GROUP_SIZE as the default
does not work with some drivers. That's why most demos/tests hardcode
that value.
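
The CLInfo values reported earlier in this thread already account for the symptoms: a local work size must not exceed the work group limit of the device the queue was created on. Below is a minimal sketch of that check in plain Java (no JOCL calls). The class and method names are hypothetical; the two limits are the values quoted from CLInfo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WorkGroupCheck {

    /** True if the local work size is within the device's work group limit. */
    static boolean fits(int localWorkSize, int maxWorkGroupSize) {
        return localWorkSize >= 1 && localWorkSize <= maxWorkGroupSize;
    }

    public static void main(String[] args) {
        // CL_DEVICE_MAX_WORK_GROUP_SIZE per device, as reported by CLInfo above.
        Map<String, Integer> limits = new LinkedHashMap<>();
        limits.put("GeForce GT 330M", 512);
        limits.put("Intel(R) Core(TM) i7 CPU M 620", 1);

        // The three local work sizes tried in this thread.
        for (int localWorkSize : new int[] {256, 2, 1}) {
            for (Map.Entry<String, Integer> e : limits.entrySet()) {
                System.out.println("lws " + localWorkSize + " on " + e.getKey()
                        + ": " + (fits(localWorkSize, e.getValue()) ? "ok" : "too large"));
            }
        }
    }
}
```

With these limits, 256 and 2 fit only the GPU, while 1 fits both devices, which matches a queue that was created on the CPU. Staying under the device limit is a necessary condition only; a kernel can impose a lower limit of its own.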

best regards,
michael

Re: HelloJOCL Problems on MacOSX

ralphrmartin
Michael, thanks again.

CLCommandQueue queue = context.getDevices()[0].createCommandQueue();
let me use the GPU and a worksize of 256.

The later build 276 fixed the NPE bug.

All is now looking good!

Re: HelloJOCL Problems on MacOSX

Michael Bien
That sounds great, Martin.

If you already know that you have to use the GPU, you could do something
like this:

        // Type is com.jogamp.opencl.CLDevice.Type;
        // CLPlatformFilters is in com.jogamp.opencl.util

        // search for a platform supporting GPU devices
        CLPlatform platform = CLPlatform.getDefault(CLPlatformFilters.type(Type.GPU));

        // create a context on that platform using all devices
        CLContext context = CLContext.create(platform);
        try {
            // queue on the fastest GPU
            CLCommandQueue queue = context.getMaxFlopsDevice(Type.GPU).createCommandQueue();

            // start here ...

        } finally {
            context.release();
        }

It's basically always a two-step process:
  - pick a platform
  - create a context on all devices and pick the device for the CommandQueue later
or
  - pick devices of that platform and create the context only on the devices you selected

best regards,

michael


Re: HelloJOCL Problems on MacOSX

ralphrmartin
Michael,
Excellent! Thanks for the further tip.
Ralph

Re: HelloJOCL Problems on MacOSX

ralphrmartin
Just a quick remark for anyone else who is following this: I needed to write the following to get it to compile:

CLPlatform.getDefault(CLPlatformFilters.type(CLDevice.Type.GPU))

Re: HelloJOCL Problems on MacOSX

Michael Bien
Right. The enums usually reside in the same namespace as the class
they are associated with. This makes them look a little too long
from time to time.

You can easily get rid of that by using static imports (Java 5 and
later), for example:

import static com.jogamp.opencl.CLDevice.Type.*;
import static com.jogamp.opencl.util.CLPlatformFilters.*;
...

CLPlatform.getDefault(type(GPU));

That's what I did in the code snippet in my last mail, just a little
less radically than above.
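
The same mechanism can be tried without JOCL on the classpath. Here is a small sketch that uses a nested enum from the JDK, java.lang.Thread.State, as a stand-in for CLDevice.Type; the class and method names are made up for illustration.

```java
import static java.lang.Thread.State.*;

final class StaticImportDemo {

    /** True while the thread has been created but not yet started. */
    static boolean isNotStarted(Thread thread) {
        // Without the static import this would read Thread.State.NEW.
        return thread.getState() == NEW;
    }

    public static void main(String[] args) {
        Thread thread = new Thread(() -> { });
        System.out.println(isNotStarted(thread));
    }
}
```

The wildcard form keeps call sites short, at the cost of making it less obvious where a constant such as NEW or GPU comes from; importing only the enum type is the less radical middle ground.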

-michael

Re: HelloJOCL Problems on MacOSX

felix
I ran into the same problems as described here. For some reason, context.getMaxFlopsDevice() returned the CPU device, and the localWorkSize of 256 caused problems.
After switching to the GPU (and reducing the elementCount to avoid out-of-memory errors), HelloJOCL worked.

Changing localWorkSize to "device.getMaxWorkGroupSize()" (1024) caused problems for me (CL_INVALID_WORK_GROUP_SIZE). One important point seems to be that the usable localWorkSize depends on the specific kernel. Using "kernel.getWorkGroupSize(device)" (256) solved the problem.

This is described here: http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2210

I suggest that the HelloJOCL sample should use "kernel.getWorkGroupSize()" and a lower elementCount to make using JOCL a nicer experience for new developers :)

Felix

Re: HelloJOCL Problems on MacOSX

Giovanni Idili
I experienced this as well when moving from Snow Leopard to Lion (the problem was actually the CPU, not the GPU, because its maximum localWorkSize was smaller than 256).

I can confirm changing the code to the following works OK:

int elementCount = ELEM_COUNT;                                  // Length of arrays to process
int localWorkSize = min((int)kernel.getWorkGroupSize(device), 256);  // Local work size dimensions
int globalWorkSize = roundUp(localWorkSize, elementCount);   // rounded up to the nearest multiple of the localWorkSize
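
The snippet relies on two helpers that are not shown: min (presumably a static import of Math.min) and roundUp. Below is a minimal sketch of both in plain Java, assuming roundUp(groupSize, globalSize) returns the smallest multiple of groupSize that is not below globalSize. The class name WorkSizes and the method name localWorkSize are made up for illustration.

```java
final class WorkSizes {

    /** Caps the preferred local work size at the limit the kernel reports. */
    static int localWorkSize(long kernelWorkGroupSize, int preferred) {
        return (int) Math.min(kernelWorkGroupSize, (long) preferred);
    }

    /** Smallest multiple of groupSize that is greater than or equal to globalSize. */
    static int roundUp(int groupSize, int globalSize) {
        int remainder = globalSize % groupSize;
        return remainder == 0 ? globalSize : globalSize + groupSize - remainder;
    }

    public static void main(String[] args) {
        int elementCount = 1444777;              // value from the first post
        int local = localWorkSize(512L, 256);    // 512: GPU limit quoted earlier
        int global = roundUp(local, elementCount);
        System.out.println("lws=" + local + " gws=" + global);
    }
}
```

With a local work size of 256 and 1444777 elements this yields a global work size of 1444864, the gws value in the error message of the first post. On a device whose limit is 1, the local work size drops to 1 and no rounding is needed.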


I agree this should probably become the default code in the examples :)


Re: HelloJOCL Problems on MacOSX

nyholku
Hi,

Just a note that today (20.10.2013), on my MacBook Pro Retina (Mac OS X 10.8.5, NVIDIA GeForce GT 650M 1024 MB), the code as it currently stands on the wiki failed with this error, and this thread helped me a lot. The problem is that on the Mac getMaxFlopsDevice returns the CPU, not the GPU. I trivially bypassed that by replacing:

            CLDevice device = context.getMaxFlopsDevice();
with
            CLDevice device = context.getDevices()[0]; // 0 happens to be the GPU on my system, YMMV

br Kusti