jogamp - Re: Looking over JOCL

Re: Looking over JOCL

Posted by jcpalmer on May 22, 2010; 6:54pm
URL: https://forum.jogamp.org/Looking-over-JOCL-tp835533p836722.html

Michael,
A couple of refinements based on your reply:

>Just look at an OpenCL example and count the function calls for a simple
>thing like sorting a buffer. It's like glBegin/glEnd 18 years ago.

Let me refine my overhead tolerances. Calls outside what I call the "Kernel Loop" should be written for ease of use, maintenance, and upgrading to any newer version of the spec. Enqueuing and arg setting are show time, where overhead counts.
I think JavaCL has a little unnecessary Java overhead when enqueuing kernels. Nothing a code optimizer like ProGuard cannot get rid of for production, though.

> (no constructors ...)
Same for JavaCL. I understand why. People like me just have to run from source code instead of overriding.

>> JavaCL has methods to wait for events both in its CLEvent & CLQueue.
>I thought about that, but it was too dangerous from a concurrency
>perspective. CLCommandQueues are not thread-safe in JOCL's concurrency
>model. This is by design since I expect that in most situations they
>will be only used from one producer thread.

>JOCL forces you to use a queue to do any CLEvent work which forces you
>to think about it... But maybe I will weaken this in future. But that
>was the idea behind that.

FYI, research into events combined with release during garbage collection is currently my highest priority. I had a little accident while converting my code base to look like JavaCL: I called clWaitForEvents but did not call clReleaseEvent, and I got a 60% time reduction. The same thing happened with enqueueWaitForEvents. If this is for real, I could just let the garbage collector call clReleaseEvent in its own thread (I am running the concurrent one). This was on Win7. Before getting too excited, I wanted to see whether this is also the case on Linux (thanks for the help). Now that Linux is up, I can get to the bottom of this. The few nVidia samples that use events do not even bother to release them.
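
The idea of leaving event release to the garbage collector can be sketched in plain Java. This is only an illustration under assumptions, not JOCL or JavaCL code: `EventHandle`, `NativeEvents.release` and its counter are hypothetical stand-ins for an event wrapper and a clReleaseEvent call, and `java.lang.ref.Cleaner` (Java 9+) is swapped in for whatever finalization mechanism a binding would really use.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the native side; a real binding would call clReleaseEvent here.
final class NativeEvents {
    static final AtomicInteger released = new AtomicInteger();

    static void release(long handle) {
        released.incrementAndGet();
    }
}

// Hypothetical event wrapper: released explicitly via close(), or by the
// Cleaner thread once the wrapper becomes unreachable.
final class EventHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must not reference the EventHandle itself,
    // otherwise the wrapper could never become unreachable.
    private static final class Release implements Runnable {
        private final long handle;

        Release(long handle) {
            this.handle = handle;
        }

        @Override
        public void run() {
            NativeEvents.release(handle);
        }
    }

    private final Cleaner.Cleanable cleanable;

    EventHandle(long handle) {
        this.cleanable = CLEANER.register(this, new Release(handle));
    }

    // Explicit release; the cleaning action runs at most once.
    @Override
    public void close() {
        cleanable.clean();
    }
}

class EventReleaseDemo {
    public static void main(String[] args) {
        EventHandle e = new EventHandle(42L);
        e.close();
        e.close(); // second call is a no-op
        System.out.println("released = " + NativeEvents.released.get());
    }
}
```

With this shape the hot path never has to call release itself; whether that actually saves time on a given OpenCL driver is exactly what still needs measuring.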

>one context per device?
>I tried to prevent that... It's basically a hack since OpenCL 1.0
>implementations are not ready at this point. N Queues per device is much
>cleaner.
>Just think about memory sharing...

The deciding factor against the multi-device context was kernel arg setting. You cannot specify a command queue when setting kernel args. The upshot is that you need a separate kernel for each device, whether they are all in one context or each in their own. I want each context to have its own unshared host memory so that it can process asynchronously, with Java's ThreadPoolExecutor running the show and each context completely unaware that multiple devices exist. If the arg setting thing gets fixed, I'll consider pooling command queues instead of contexts. See nVidia's simpleMultiGPU sample.
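
The one-context-per-device layout driven by a thread pool can be sketched roughly as below. This is a sketch under assumptions rather than real OpenCL code: `DeviceWorker` is a hypothetical stand-in for a context with its own queue, kernel and unshared host buffer, and the "kernel" is just a Java loop that sums squares.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for one device: its own context, queue, kernel and host memory.
final class DeviceWorker implements Callable<Long> {
    private final int[] hostBuffer; // private to this worker, never shared

    DeviceWorker(int from, int to) {
        this.hostBuffer = new int[to - from];
        for (int i = 0; i < hostBuffer.length; i++) {
            hostBuffer[i] = from + i;
        }
    }

    // Placeholder for: set kernel args, enqueue, read back. Here it sums squares.
    @Override
    public Long call() {
        long sum = 0;
        for (int v : hostBuffer) {
            sum += (long) v * v;
        }
        return sum;
    }
}

class MultiDeviceDemo {
    // Splits [0, n) into one slice per device and runs each slice on its own worker.
    static long sumOfSquares(int n, int deviceCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                deviceCount, deviceCount, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        try {
            List<Future<Long>> results = new ArrayList<>();
            int chunk = (n + deviceCount - 1) / deviceCount;
            for (int d = 0; d < deviceCount; d++) {
                int from = Math.min(n, d * chunk);
                int to = Math.min(n, from + chunk);
                results.add(pool.submit(new DeviceWorker(from, to)));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // each worker finishes independently
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, 2));
    }
}
```

Only the code that splits the work and combines the results knows how many workers exist; each worker sees just its own buffer, which mirrors the "completely unaware" property described above.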

Jeff