Re: Parallel computation on CPU faster than GPU!
by Michael Bien

On 05/22/2011 08:41 PM, John_Idol [via jogamp] wrote:

> I moved the loop to the kernel [http://goo.gl/ga6pt] - and getting much
> better results (only blocking one of the buffers or all of them to get the
> stuff out at the end with final values does not seem to make a difference):
>
> with 302 items -->  GPU: 276ms / CPU: 228ms
>
> Here's the code I am using to invoke the kernel: http://goo.gl/297a3
>
> One weird thing I've noticed, if I don't block any buffer the computation
> only takes 1ms ... which makes me think something is horribly wrong. Trying
> to find a way to verify.
If you don't block, your Java program will not wait for any results. It
just sends the commands to the device and exits. The command queue is
asynchronous by default: you only enqueue the commands, and you still have
to wait for the results somehow. So the 1ms you measured is most likely
just the time it takes to enqueue the commands, not the computation itself.
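
To see why the unblocked run looks so fast, here is a minimal sketch in plain Java. It is only an analogy and not JOCL code: a single-threaded ExecutorService stands in for the command queue, and the hypothetical slowWork method stands in for the kernel. Timing only the submit call measures the enqueue cost, while timing submit plus get() measures the actual work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncQueueTiming {

    // Hypothetical stand-in for a kernel: sleeps to simulate work on the device.
    static int slowWork(long millis) throws InterruptedException {
        Thread.sleep(millis);
        return 42;
    }

    // Times only the submission; nothing waits for the result
    // (comparable to enqueueing without blocking).
    static long timeWithoutWaiting(ExecutorService queue, long workMillis) {
        long start = System.nanoTime();
        queue.submit(() -> slowWork(workMillis));
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Times submission plus waiting for the result
    // (comparable to a blocking read or finish()).
    static long timeWithWaiting(ExecutorService queue, long workMillis) throws Exception {
        long start = System.nanoTime();
        Future<Integer> result = queue.submit(() -> slowWork(workMillis));
        result.get(); // blocks until the work has completed
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService queue = Executors.newSingleThreadExecutor();
        try {
            System.out.println("with waiting:    " + timeWithWaiting(queue, 200) + " ms");
            System.out.println("without waiting: " + timeWithoutWaiting(queue, 200) + " ms");
        } finally {
            queue.shutdown(); // already submitted work still runs to completion
        }
    }
}
```

The second number only tells you how long it took to hand the work over, which is the same situation as a kernel launch that nobody waits for.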

OpenCL provides several ways to wait for results, or for certain
events in general:

  - finish() waits for everything (all previously enqueued commands) to
complete
  - a blocking command (the boolean on most putFoo methods) waits for
this command to complete (in an in-order queue it's very similar to finish())
  - events
  - (+ barriers, but I'll leave them out for now)
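
The remark that a blocking command on an in-order queue behaves much like finish() can be illustrated with another plain-Java analogy (again not JOCL code, and the class and method names are made up for this sketch). A single-threaded executor runs its tasks strictly in submission order, so blocking on the last task implies that all earlier ones are done as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InOrderBlocking {

    // Submits `count` slow commands to an in-order queue, blocks only on the
    // last one, and returns how many commands are finished at that point.
    static int completedAfterBlockingOnLast(int count) throws Exception {
        ExecutorService queue = Executors.newSingleThreadExecutor();
        try {
            List<Future<Integer>> commands = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                commands.add(queue.submit(() -> {
                    Thread.sleep(20); // simulate some work per command
                    return id;
                }));
            }
            // Block on the last command only.
            commands.get(count - 1).get();

            int done = 0;
            for (Future<Integer> command : commands) {
                if (command.isDone()) {
                    done++;
                }
            }
            return done;
        } finally {
            queue.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed: " + completedAfterBlockingOnLast(5));
    }
}
```

With an out-of-order queue this guarantee would not hold, which is where finish() or explicit events come in.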

Events give you a lot of control over the execution. Most commands allow
passing a condition list and an event list as method parameters. Every
command generates a CLEvent and can wait for other CLEvents (the condition
list).

The host can wait for events too: queue.putWaitForEvents(events, true);

In JOCL, events are only used as lists, since there are often many of them.

CLEventList readEvents = new CLEventList(2);

...
// third argument: condition list (null = nothing to wait for)
// fourth argument: list that collects the event of this command
queue.putReadBuffer(a, false, null, readEvents);
queue.putReadBuffer(b, false, null, readEvents);

// block the host until both reads have completed
queue.putWaitForEvents(readEvents, true);

After you release the list with readEvents.release(), you can reuse it.

(But events would be unnecessary for your use case right now, IMO.)

-michael

> As mentioned in the previous post, ideally I would like at this stage to get
> a 2-dimensional array out (at least for one of the buffers) at the end with
> values for each step of the loop I moved into the kernel, so that I can do
> some plotting and check that the computation is actually happening.
>
> Any help on that appreciated!
http://michael-bien.com/