Re: Multi-GPU processing inconsistent.

Posted by The.Scotsman on Jan 26, 2014; 12:47am
URL: https://forum.jogamp.org/Multi-GPU-processing-inconsistent-tp4031306p4031353.html

Thanks for the quick response!

Each comparison computation is completely independent.
I'm not currently calling clFlush/clFinish after each individual CLTask, as that doesn't seem appropriate.
However, I do call CLCommandQueuePool.flushQueues() and finishQueues() once the tasks are submitted, and it doesn't appear to make a difference.

The CLCommandQueuePool class acts like a threaded job queue, where each job (CLTask) is scheduled as soon as a device is available.
(Benchmarks that I've done have shown the implementation to be very efficient!)
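
To make the scheduling model concrete, here is a minimal plain-Java sketch of a job queue in which each job runs as soon as a device is free. It is only a toy model, not the JOCL implementation: DeviceJobPool, DeviceContext and squares() are hypothetical names invented for illustration, and no OpenCL calls are made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Toy model of a per-device job queue. NOT the JOCL implementation. */
final class DeviceJobPool {

    /** Hypothetical stand-in for a per-device context and command queue. */
    static final class DeviceContext {
        final int deviceId;
        DeviceContext(int deviceId) { this.deviceId = deviceId; }
    }

    private final ExecutorService workers;
    private final BlockingQueue<DeviceContext> idle;

    DeviceJobPool(int deviceCount) {
        workers = Executors.newFixedThreadPool(deviceCount);
        idle = new ArrayBlockingQueue<>(deviceCount);
        for (int i = 0; i < deviceCount; i++) {
            idle.add(new DeviceContext(i));
        }
    }

    /** Schedules a job; it runs as soon as a worker thread and a device are free. */
    <R> Future<R> submit(Function<DeviceContext, R> job) {
        Callable<R> wrapped = () -> {
            DeviceContext ctx = idle.take();   // claim an idle device
            try {
                return job.apply(ctx);
            } finally {
                idle.add(ctx);                 // hand the device back
            }
        };
        return workers.submit(wrapped);
    }

    void shutdown() { workers.shutdown(); }

    /** Example run: squares 0..n-1 across the pool, results in submission order. */
    static List<Integer> squares(int deviceCount, int n) {
        DeviceJobPool pool = new DeviceJobPool(deviceCount);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int x = i;
                pending.add(pool.submit(ctx -> x * x));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pending) {
                results.add(f.get());          // blocks until that job is done
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this model a device context is held by exactly one job at a time, so two jobs can never overlap on the same device. Whether the real pool gives the same guarantee is something to confirm against its source.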

The CLTask.execute() method receives a CLSimpleQueueContext argument, which differs for each device.
So each CLTask has an associated CLContext & CLCommandQueue.

I see two potential sources of error:
1) A CLTask is performed on a given device before the previous CLTask for that device is complete.
2) There is some issue with copying the same memory to multiple contexts simultaneously.
 
However, since the computations are correct when using a single device, the first seems unlikely.
The second scenario would occur somewhat randomly, which matches the observed behavior.
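
One way the second scenario could be probed, assuming the shared input is held in a java.nio buffer (an assumption for this sketch), is to hand every task its own view of the data. ByteBuffer.duplicate() shares the content but gives each view an independent position and limit, so concurrent readers cannot disturb each other's cursor. SharedInputProbe and its methods are hypothetical names, and the byte sum merely stands in for the real host-to-device copy.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: concurrent readers over one buffer, each with a private view. */
final class SharedInputProbe {

    /** Sums the remaining bytes using relative gets, which advance the position. */
    static long sum(ByteBuffer view) {
        long total = 0;
        while (view.hasRemaining()) {
            total += view.get();
        }
        return total;
    }

    /** Runs `tasks` readers on `threads` workers; every reader gets its own duplicate. */
    static List<Long> sumConcurrently(ByteBuffer shared, int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                // Same content, but independent position/limit per task.
                final ByteBuffer view = shared.duplicate();
                Callable<Long> job = () -> sum(view);
                pending.add(pool.submit(job));
            }
            List<Long> sums = new ArrayList<>();
            for (Future<Long> f : pending) {
                sums.add(f.get());
            }
            return sums;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

If the random failures disappear once each task works on a private view (or a private copy), that would point toward shared host-side state rather than the devices themselves.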

Since no exception is thrown, further troubleshooting will require a substantial amount of debugging...
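
Because the failures are silent, one cheap starting point is to compare each multi-device result against the single-device result, which is known to be correct. The sketch below assumes the results can be read back as float arrays (the actual result type may differ); ResultDiff and mismatches() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: locate the elements where a run diverges from a trusted reference. */
final class ResultDiff {

    /** Returns the indices where candidate differs from reference by more than tolerance. */
    static List<Integer> mismatches(float[] reference, float[] candidate, float tolerance) {
        if (reference.length != candidate.length) {
            throw new IllegalArgumentException("length mismatch: "
                    + reference.length + " vs " + candidate.length);
        }
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < reference.length; i++) {
            float diff = Math.abs(reference[i] - candidate[i]);
            // Written as a negated <= so that NaN values are flagged as well.
            if (!(diff <= tolerance)) {
                bad.add(i);
            }
        }
        return bad;
    }
}
```

Logging the mismatching indices together with the device that produced them may show whether the bad results cluster on one device or on particular regions of the input.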