Posted by Michael Bien on May 22, 2010; 9:48pm
URL: https://forum.jogamp.org/Looking-over-JOCL-tp835533p837034.html
On 05/22/2010 08:54 PM, jcpalmer [via jogamp] wrote:
Michael,
Couple of refinements based on your reply:
> Just look at an OpenCL example and count the function calls for a
> simple thing like sorting a buffer. It's like glBegin/glEnd 18 years ago.
Let me refine my overhead tolerances. Calls that are outside what I
call the "Kernel Loop" should be written for ease of use, maintenance,
and upgrading to any newer version of the spec. Enqueuing and arg
setting are show time.
I think JavaCL has a little unnecessary Java overhead enqueuing
kernels. Nothing a code optimizer like ProGuard cannot get rid of,
though, for production.
sure. What I actually tried to say was that OpenCL will not stay as it
is. Just wait until the first wave of extensions is in core. We will
subdivide devices at runtime, and more... Concurrent kernel execution
is very young, technically available for only a few weeks on high-end
hardware.
setArgs + enqueueKernel + waitForEvents are like an async function
call, which should be as fast as possible. No unnecessary overhead in
the binding code. That's the goal.
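As a concrete illustration (my sketch, not JOCL documentation; the
kernel, queue, buffers and work sizes are assumed to be created once
during setup), the per-iteration hot path stays close to the raw CL
calls:

    // Hot path only: arg setting + enqueue + sync, no per-iteration object churn.
    // Assumes one-time setup elsewhere: CLCommandQueue queue, CLKernel kernel,
    // CLBuffer<FloatBuffer> input/output, globalWorkSize, localWorkSize.
    for (int pass = 0; pass < passes; pass++) {
        kernel.setArg(0, input)   // forwards more or less directly to clSetKernelArg
              .setArg(1, pass);
        queue.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
             .putReadBuffer(output, true); // blocking read doubles as the sync point
    }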
Again, I don't make any assumptions about how the binding will be used
by client code, or where the bottleneck may be today or in the future,
since it's not possible to look into the future.
Since there is, as of today, no faster way to call a function from Java
than through a thin JNI layer... we are using a thin JNI layer :). For
consistency we use it for all projects: JOGL, JOCL, JOAL and OpenMAX.
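To illustrate what "thin" means here (illustrative only; JOCL's real
low-level binding is generated by GlueGen, and the names below are
hypothetical): each CL function maps 1:1 onto a static native method,
with no logic in between:

    // Hypothetical 1:1 mapping of one CL function onto a native method.
    // The JNI side does nothing but forward the call to clSetKernelArg.
    public final class CL {
        static { System.loadLibrary("jocl"); } // hypothetical library name
        // int clSetKernelArg(cl_kernel kernel, cl_uint arg_index,
        //                    size_t arg_size, const void *arg_value)
        public static native int clSetKernelArg(long kernel, int index,
                                                long size, java.nio.Buffer value);
    }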
> (no constructors ...)
Same for JavaCL. I understand why. People like me just have to run from
the source code instead of overriding.
>> JavaCL has methods to wait for events both in its CLEvent & CLQueue.
> I thought about that, but it was too dangerous from a concurrency
> perspective. CLCommandQueues are not thread-safe in JOCL's concurrency
> model. This is by design, since I expect that in most situations they
> will only be used from one producer thread.
> JOCL forces you to use a queue to do any CLEvent work, which forces you
> to think about it... But maybe I will weaken this in the future. That
> was the idea behind it.
FYI, research around events combined with release during garbage
collection is currently my highest priority. I had a little accident
converting my code base to look like JavaCL: I called clWaitForEvents,
but did not call clReleaseEvent. I got a 60% time reduction. It also
happened with enqueueWaitForEvents. If this is for real, I could just
let the garbage collector call it in its own thread (I am running the
concurrent one). This was on Win7.
hehe, so you are relying on finalizers :)
please don't do that... finalizers are like Thread.stop()... even worse
-> completely unspecified and therefore implementation dependent.
The only reason they are still available is backwards compatibility
with Java 1.1.
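For comparison, the deterministic alternative in JOCL-style code (a
sketch; queue and kernel setup are assumed, and CLEventList is JOCL's
container for CL events): release the events explicitly as soon as the
wait returns, instead of leaving it to the GC:

    // Explicit, deterministic release instead of relying on finalizers.
    CLEventList events = new CLEventList(1);
    queue.put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize, events);
    queue.putWaitForEvents(events, true); // blocks until the kernel has finished
    events.release(); // clReleaseEvent happens now, not at some unspecified GC time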
Before I got too excited, I wanted to see if this was also the case on
Linux (thanks for the help). Now that Linux is up, I can get to the
bottom of this. The few NVIDIA samples that use events do not even
bother to release them.
no problem. I was wondering why nobody else noticed this. We have been
using LD_PRELOAD in production since around December.
> one context per device?
> I tried to prevent that... It's basically a hack, since OpenCL 1.0
> implementations are not ready at this point. N queues per device is
> much cleaner.
> Just think about memory sharing...
The thing that was the decider against the multi-device context was
kernel arg setting. You cannot specify a command queue when setting
them. The upshot is that you need a separate kernel for each device,
whether they are all in one context or each in their own. I want each
context to have its own, unshared host memory so it can process
asynchronously, with Java's ThreadPoolExecutor running the show and
each context completely unaware that multiple devices exist. If the
arg setting thing gets fixed, I'll consider pooling command queues
instead of contexts. See NVIDIA's simpleMultiGPU sample.
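A sketch of that layout under the stated assumptions (JOCL-style names;
program building and work decomposition are left as placeholders): one
unshared context per device, each driven by exactly one pool thread:

    // Assumes: import com.jogamp.opencl.*; import java.util.concurrent.*;
    // One context per device, each touched by exactly one executor thread.
    CLDevice[] devices = CLPlatform.getDefault().listCLDevices();
    ExecutorService pool = Executors.newFixedThreadPool(devices.length);
    for (final CLDevice device : devices) {
        pool.submit(new Runnable() {
            public void run() {
                CLContext context = CLContext.create(device); // unshared context
                try {
                    CLCommandQueue queue = device.createCommandQueue();
                    // ... build program, create kernel, allocate this context's
                    // own host buffers, enqueue work; no other device is visible ...
                } finally {
                    context.release(); // releases queue/program/buffers with it
                }
            }
        });
    }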
right. I'll wait until the API and implementations stabilize before I
go down this road. I don't want to add public APIs which deal with
multiple contexts now just to work around implementation issues.
For production systems I usually go this road, trying to solve the
issue by using:
1.) multiple queues + multiple kernel instances (sketched below)
2.) multiple program instances
3.) multiple contexts (luckily it never went this far)
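For reference, a sketch of 1.) under assumed JOCL names: one shared
context, but each producer thread gets its own queue and its own kernel
instance, so no kernel arg state is ever shared:

    // Step 1: one shared context, one (queue, kernel) pair per producer thread.
    CLContext context = CLContext.create();
    CLProgram program = context.createProgram(source).build(); // 'source' assumed
    for (CLDevice device : context.getDevices()) {
        CLCommandQueue queue = device.createCommandQueue();
        CLKernel kernel = program.createCLKernel("process"); // hypothetical name
        // hand this (queue, kernel) pair to exactly one producer thread
    }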
thanks again,
great discussion
-michael
Jeff