
Re: Multi-GPU processing inconsistent.

Posted by The.Scotsman on Oct 07, 2014; 4:39pm
URL: https://forum.jogamp.org/Multi-GPU-processing-inconsistent-tp4031306p4033283.html

After a long hiatus, I'm back on this project.
If anyone is still interested, here's an update.

The first thing I did was update to the latest JogAmp release, but the problem still occurred.

Then I wired up some detailed debug output, which can be sorted and compared over multiple runs.
This didn't localize the problem, but it did show that the error rate was higher than I had previously reported: there were a number of less serious errors that hadn't been apparent before.
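One way to make such debug logs comparable across runs is to write each comparison result as one self-describing line, then compare the sorted sets from two runs. The sketch below is hypothetical: the class RunDiff and the line format "A|B|kernel1|0.8731" (object, object, kernel, result) are assumptions for illustration, not the original debug code or anything in JogAmp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: compares the debug output of two runs.
// Each line is assumed to hold one comparison result, e.g. "A|B|kernel1|0.8731".
final class RunDiff {
    private RunDiff() {}

    /** Returns the lines that appear in only one of the two runs, in sorted order. */
    static List<String> differingLines(List<String> run1, List<String> run2) {
        // Sorting (via TreeSet) removes the dependence on the order in which
        // worker threads happened to log. Note: duplicate lines are collapsed.
        TreeSet<String> a = new TreeSet<>(run1);
        TreeSet<String> b = new TreeSet<>(run2);
        List<String> diff = new ArrayList<>();
        for (String line : a) {
            if (!b.contains(line)) {
                diff.add("run1 only: " + line);
            }
        }
        for (String line : b) {
            if (!a.contains(line)) {
                diff.add("run2 only: " + line);
            }
        }
        return diff;
    }
}
```

With a deterministic workload, any line reported here points at a comparison whose result changed between runs.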

Next, I shuffled the input list: instead of comparing in order (A to B, A to C, A to D, etc.), the object comparisons are now random, so there are far fewer instances where multiple GPUs read data from the same objects simultaneously.
This simple change alone cut the error rate to less than 1%, and performance actually improved a bit.
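A minimal sketch of such a shuffled schedule, assuming objects are identified by index and each comparison is an unordered pair (i, j) with i &lt; j. PairSchedule and shuffledPairs are illustrative names, not the original code, and the fixed seed is an assumption that keeps the order reproducible when comparing debug output between runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: builds every pairwise comparison of n objects and shuffles
// them, so consecutive work items (and therefore GPUs working concurrently)
// rarely read the same object at the same time.
final class PairSchedule {
    private PairSchedule() {}

    /** Returns all pairs {i, j} with i < j, in a seeded random order. */
    static List<int[]> shuffledPairs(int n, long seed) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                pairs.add(new int[] {i, j}); // ordered: A-B, A-C, A-D, ...
            }
        }
        // Same seed -> same order, which keeps debug runs comparable.
        Collections.shuffle(pairs, new Random(seed));
        return pairs;
    }
}
```

The pairs would then be handed out to the per-GPU workers in list order; shuffling only reduces the chance of overlapping reads, it does not rule it out.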

Finally, I manually synchronized on both objects at the CLTask.execute() level:

synchronized (object1) {
  synchronized (object2) {
    // ...execute 3 comparison kernels...
  }
}

With this, the error rate dropped to zero, although performance dropped about 35%.
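If this per-pair locking is kept, the nesting order matters: if one thread ever locks A then B while another locks B then A, both can block forever. The sketch below assumes each compared object carries a unique numeric id (a hypothetical field) that fixes the acquisition order; ComparedObject and PairLock are illustrative names, not JogAmp classes. Like the original fix, it only serializes host-side access to the two objects and does not identify which call is not thread-safe.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the objects being compared. "id" is assumed to be
// unique and stable; it is used only to order lock acquisition.
final class ComparedObject {
    final int id;

    ComparedObject(int id) {
        this.id = id;
    }
}

final class PairLock {
    private PairLock() {}

    /**
     * Runs the task while holding the monitors of both objects. The lower id is
     * always locked first, so threads that receive the same pair in opposite
     * order, (A,B) and (B,A), cannot deadlock on each other.
     */
    static <T> T withBoth(ComparedObject a, ComparedObject b, Supplier<T> task) {
        ComparedObject first = a.id <= b.id ? a : b;
        ComparedObject second = (first == a) ? b : a;
        synchronized (first) {
            synchronized (second) { // reentrant if a == b
                return task.get(); // e.g. run the comparison kernels for this pair
            }
        }
    }
}
```

The task passed to withBoth would wrap whatever CLTask.execute() currently does for the pair.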

So I'm still fairly confident the problem is caused by some non-thread-safe code somewhere within JogAmp/OpenCL (perhaps CLCommandQueue.putWriteBuffer?).

The effort required to create an independent test case is substantial (many days), and it doesn't sound like anyone there has a rig to test it with anyway.
So I don't know whether I'll have the opportunity to pursue this further.

Many thanks for supporting JogAmp.