Re: JOCL code working under Snow Leopard but not on Lion
Posted by
Michael Bien on
Aug 30, 2011; 11:29pm
URL: https://forum.jogamp.org/Code-working-under-Snow-Leopard-but-not-on-Lion-tp3296875p3296937.html
On 08/31/2011 12:36 AM, Giovanni Idili [via jogamp] wrote:
All,
The exact same JOCL code runs fine on Snow Leopard but throws the
following error on Lion:
com.jogamp.opencl.CLException$CLInvalidWorkGroupSizeException:
can not enqueue 1DRange CLKernel [id: 140699706121896 name:
IntegrateHHStep]
with gwo: null gws: {256} lws: {256}
cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]
I am using the following code to define the local workgroup size
and global worksize for the I/O buffers:
// Length of arrays to process
int elementCount = models.size();
// Local work size dimensions for the selected device
int localWorkSize = min(device.getMaxWorkGroupSize(), 256);
// rounded up to the nearest multiple of the localWorkSize
int globalWorkSize = roundUp(localWorkSize, elementCount);
// results buffers are bigger as we are capturing every value
// for every item for every time-step
int globalWorkSize_Results = roundUp(localWorkSize,
        elementCount * timeConfiguration.getTimeSteps());
In a Twitter conversation, @mbien suggested I set localWorkSize to
0 so that the driver picks a work size automatically, but
how can I declare the buffers of size globalWorkSize and
globalWorkSize_Results without knowing what to round up to (do I
just not round up)?
Thanks!
Perfect, it's much easier to answer here than via Twitter :)
First, let's list a few rules about the sizes:
LWS (local work size) is limited by a device/driver-specific maximum
and can even depend on N (the number of dimensions) of your NDRange.
Furthermore, GWS (global work size) must be a multiple of LWS.
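The two rules above can be sketched as a small check. This is only an illustration: RangeRules and isValid are hypothetical names, not part of JOCL, and maxLws stands for whatever effective limit the driver reports for the dimension in question.

```java
// Hypothetical helper, not part of JOCL: checks the two rules for a 1D range.
final class RangeRules {

    /**
     * @param gws    global work size
     * @param lws    explicitly requested local work size (must be positive here)
     * @param maxLws largest local work size the device/driver accepts (assumed known)
     * @return true if lws does not exceed the limit and gws is a multiple of lws
     */
    static boolean isValid(long gws, long lws, long maxLws) {
        if (gws <= 0 || lws <= 0) {
            return false;
        }
        return lws <= maxLws && gws % lws == 0;
    }
}
```

In the quoted error both sizes are 256, so the multiple rule holds. That suggests the limit rule is the one being violated, i.e. the effective maximum on Lion may be below 256.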
However, in many cases you don't care about LWS; all you want is that
all elements of your GWS are processed, and it doesn't matter how the
range is subdivided. In this case
you can simply set LWS=0 and GWS="problem size" (so you are right,
you don't have to round up in this case). The driver then has to
figure out the values by itself, and the result may not be
the optimal configuration.
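For the buffer question, a minimal sketch of the unrounded sizes could look like this. UnroundedSizes and its members are hypothetical names chosen for illustration; it assumes, as described above, that passing a local work size of 0 when enqueueing the kernel leaves the choice to the driver.

```java
// Hypothetical helper, not part of JOCL: sizes when the driver picks LWS.
final class UnroundedSizes {

    /** Assumption: 0 is passed as the local work size so the driver chooses one. */
    static final long DRIVER_CHOOSES_LWS = 0;

    /** GWS is simply the problem size, no rounding needed. */
    static long globalWorkSize(int elementCount) {
        return elementCount;
    }

    /** One result per element per time step, again without padding. */
    static long resultsBufferSize(int elementCount, int timeSteps) {
        return (long) elementCount * timeSteps;
    }
}
```

With this approach the buffers hold exactly as many elements as there is data, so there is no padding to account for.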
The code you posted tries to be smart and will only work if your
kernel does an overflow check (if (workItemID >= size) return; ).
It first picks a supported LWS and then rounds GWS up to a
multiple of LWS, so GWS can be larger than the number of elements
and the surplus work items must exit early.
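To illustrate why the overflow check matters, here is a plain-Java simulation of a padded range. PaddedRange and run are hypothetical, and the loop only stands in for the work items of a real kernel; roundUp uses the same argument order as the quoted call (group size first), which is an assumption about the helper in the original code.

```java
// Plain-Java simulation, not actual OpenCL: shows padding plus the guard.
final class PaddedRange {

    /** Rounds value up to the nearest multiple of groupSize. */
    static int roundUp(int groupSize, int value) {
        int remainder = value % groupSize;
        return remainder == 0 ? value : value + groupSize - remainder;
    }

    /**
     * Simulates work items 0..gws-1 operating on data.
     * @return the number of work items that did real work
     */
    static int run(int gws, float[] data) {
        int processed = 0;
        for (int workItemID = 0; workItemID < gws; workItemID++) {
            // Equivalent of "if (workItemID >= size) return;" in the kernel:
            // padded work items have no element to process.
            if (workItemID >= data.length) {
                continue;
            }
            data[workItemID] *= 2f;
            processed++;
        }
        return processed;
    }
}
```

For example, 300 elements with a local size of 256 give a GWS of 512, so 212 work items have nothing to do. Without the guard they would read and write past the end of the data.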
In your case GWS = LWS, which is a bit unusual, but it should work.
Also check out getMaxWorkItemSizes()[0]; maybe it's smaller than
getMaxWorkGroupSize() on your system.
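A hedged sketch of taking both limits into account when picking LWS: LocalSizePicker and pick are hypothetical names, and the limits are passed in as plain ints so the snippet runs without a device. In real code they would presumably come from getMaxWorkGroupSize() and getMaxWorkItemSizes()[0].

```java
// Hypothetical helper, not part of JOCL: clamps a preferred LWS to both limits.
final class LocalSizePicker {

    /**
     * @param preferred           the local work size you would like (e.g. 256)
     * @param maxWorkGroupSize    device-wide work group limit
     * @param maxWorkItemSizeDim0 per-dimension limit for dimension 0
     * @return the largest local work size not exceeding any of the three values
     */
    static int pick(int preferred, int maxWorkGroupSize, int maxWorkItemSizeDim0) {
        return Math.min(preferred, Math.min(maxWorkGroupSize, maxWorkItemSizeDim0));
    }
}
```

If the per-dimension limit turns out to be the smallest of the three, the quoted min(device.getMaxWorkGroupSize(), 256) would request more than the driver accepts.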
--
- - - -
http://michael-bien.com