Reply – Re: JOCL code working under Snow Leopard but not on Lion
by Michael Bien
On 08/31/2011 12:36 AM, Giovanni Idili [via jogamp] wrote:

The exact same JOCL code runs OK on Snow Leopard but throws the following error on Lion:

com.jogamp.opencl.CLException$CLInvalidWorkGroupSizeException: can not enqueue 1DRange CLKernel [id: 140699706121896 name: IntegrateHHStep]
with gwo: null gws: {256} lws: {256}
cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]

I am using the following code to define the local workgroup size and global worksize for the I/O buffers:

// Length of arrays to process
int elementCount = models.size();
// Local work size dimensions for the selected device
int localWorkSize = min(device.getMaxWorkGroupSize(), 256);
// rounded up to the nearest multiple of the localWorkSize
int globalWorkSize = roundUp(localWorkSize, elementCount);
// results buffers are bigger as we are capturing every value for every item for every time-step
int globalWorkSize_Results = roundUp(localWorkSize, elementCount*timeConfiguration.getTimeSteps());

In a Twitter conversation, @mbien suggested I set localWorkSize to 0 so that the driver picks a work size automatically, but how can I declare the buffers of size globalWorkSize and globalWorkSize_Results without knowing what to round up to (do I just not round up)?


Perfect, it's much easier to answer here than via Twitter :)

First, let's list a few rules about the sizes:
LWS (local work size) is limited by the device/driver-specific maximum and can even depend on N of your NDRange.
Furthermore, GWS (global work size) must be a multiple of LWS.
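
As a rough illustration of these two rules, here is a minimal plain-Java sketch. The body of `roundUp` was not posted, so the version below is only an assumption about what it does; `WorkSizeRules`, `isValid` and the device limits used in `main` are hypothetical names and values, not part of JOCL.

```java
final class WorkSizeRules {

    /** Rounds globalSize up to the nearest multiple of groupSize (assumed behavior of the posted helper). */
    static int roundUp(int groupSize, int globalSize) {
        int remainder = globalSize % groupSize;
        return remainder == 0 ? globalSize : globalSize + groupSize - remainder;
    }

    /** True if an explicit LWS is within the device limit and divides GWS evenly. */
    static boolean isValid(int globalWorkSize, int localWorkSize, int deviceMaxLocal) {
        return localWorkSize > 0
                && localWorkSize <= deviceMaxLocal
                && globalWorkSize % localWorkSize == 0;
    }

    public static void main(String[] args) {
        int gws = roundUp(256, 1000);                  // 1000 elements padded to 1024
        System.out.println(isValid(gws, 256, 1024));   // hypothetical device limit of 1024
        System.out.println(isValid(256, 256, 1));      // hypothetical device limit of 1
    }
}
```

The second call in `main` shows how gws = lws = 256 could be rejected if the limit that applies on a given system happens to be smaller than 256.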

However, in many cases you don't care about LWS; all you want is that all elements of your GWS are processed, and it doesn't matter how it's subdivided. In this case
you can simply set LWS=0 and GWS="problem size" (so you are right, you don't have to round up in this case). The driver will then have to figure out the values by itself; however, it might not end up with the optimal config.
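
A small sketch of what the LWS=0 variant could mean for the sizes in the quoted code, assuming the buffers only need one slot per element (and one per element per time-step for the results). `DriverChosenLws` and its methods are hypothetical helpers; the JOCL call appears only in a comment and assumes the `put1DRangeKernel(kernel, globalWorkOffset, globalWorkSize, localWorkSize)` overload.

```java
final class DriverChosenLws {

    /** Returns {globalWorkSize, localWorkSize}; a local size of 0 leaves the choice to the driver. */
    static long[] plan(int elementCount) {
        return new long[] { elementCount, 0 };
    }

    /** Results buffer length without any rounding: one value per element per time-step. */
    static long resultsLength(int elementCount, int timeSteps) {
        return (long) elementCount * timeSteps;
    }

    public static void main(String[] args) {
        int elementCount = 1000;  // stand-in for models.size()
        int timeSteps = 50;       // stand-in for timeConfiguration.getTimeSteps()
        long[] range = plan(elementCount);
        // With JOCL this would be enqueued roughly as:
        //   queue.put1DRangeKernel(kernel, 0, range[0], range[1]);
        System.out.println("gws=" + range[0] + " lws=" + range[1]
                + " results=" + resultsLength(elementCount, timeSteps));
    }
}
```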

The code you posted tries to be smart and will only work if your kernel does an overflow check (if (workItemID >= size) return; ), since rounding up creates more work items than there are elements. It first tries to set a supported LWS and then rounds GWS up to a multiple of LWS.
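
To make the overflow check concrete, here is a plain-Java simulation of a padded 1D range. It is only a sketch: the loop stands in for the work items the driver would launch, and the "kernel body" (doubling each element) is a hypothetical placeholder, not the IntegrateHHStep kernel.

```java
final class GuardedRangeSketch {

    /** Simulates a launch over a GWS rounded up to a multiple of localWorkSize; returns items that did real work. */
    static int runPadded(float[] data, int localWorkSize) {
        int size = data.length;
        int remainder = size % localWorkSize;
        int globalWorkSize = remainder == 0 ? size : size + localWorkSize - remainder;

        int processed = 0;
        for (int workItemID = 0; workItemID < globalWorkSize; workItemID++) {
            // Equivalent of the kernel's "if (workItemID >= size) return;"
            if (workItemID >= size) {
                continue;
            }
            data[workItemID] *= 2.0f;  // placeholder kernel body
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        float[] data = new float[1000];
        java.util.Arrays.fill(data, 1.0f);
        System.out.println(runPadded(data, 256));  // prints 1000
    }
}
```

Without the guard, the 24 padding work items (1024 - 1000) would index past the end of the array; in an OpenCL kernel that is an out-of-bounds buffer access rather than a Java exception.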

In your case GWS = LWS, which is a bit unusual, but it should work.

Also check out getMaxWorkItemSizes()[0]; maybe it's smaller than getMaxWorkGroupSize() on your system.
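
If that turns out to be the case, one possible way to fold both limits into the LWS choice is sketched below. `LocalSizeSketch` and `pickLocalWorkSize` are hypothetical; the limits are passed in as plain numbers so the snippet runs without a device, and with JOCL they would presumably come from `device.getMaxWorkGroupSize()` and `device.getMaxWorkItemSizes()`.

```java
final class LocalSizeSketch {

    /** Picks the largest 1D local size that respects the preferred value and both device limits. */
    static int pickLocalWorkSize(int preferred, int maxWorkGroupSize, int[] maxWorkItemSizes) {
        // maxWorkItemSizes[0] is the per-dimension limit for dimension 0 of the NDRange
        return Math.min(preferred, Math.min(maxWorkGroupSize, maxWorkItemSizes[0]));
    }

    public static void main(String[] args) {
        // Hypothetical limits, for illustration only
        System.out.println(pickLocalWorkSize(256, 1024, new int[] { 1024, 1, 1 }));
        System.out.println(pickLocalWorkSize(256, 1024, new int[] { 1, 1, 1 }));
    }
}
```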
