jogamp - Re: How to queue in a multidevice environment

Posted by Arnold on Feb 26, 2017; 9:33pm
URL: https://forum.jogamp.org/How-to-queue-in-a-multidevice-environment-tp4037674p4037676.html

Thanks for the example links, that worked! Well, partially. Using finish () is what I was looking for, and the subbuffers seem to work. The last lines of the code are now:

for (int i = 0; i < nDevices; i++)
{
    // One command queue per device
    CLDevice device = devices [i];
    q [i] = device.createCommandQueue ();

    // Enqueue everything non-blocking (false) so the loop can move on to the next device
    q [i]
        .putWriteBuffer (CLSubArrayA [i], false)
        .putWriteBuffer (CLSubArrayB [i], false)
        .put1DRangeKernel (kernel, 0, globalWorkSize, localWorkSize)
        .putReadBuffer (CLSubArrayC [i], false);

} // for

// Wait until every queue has completed all of its commands
for (int i = 0; i < nDevices; i++)
{
    q [i].finish ();
} // for
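
The snippet above does not show how the vectors are cut into per-device pieces. Below is a minimal sketch of one way to compute those pieces, assuming contiguous slices of near-equal size. `DevicePartition` and `split` are hypothetical names for illustration, not part of JOCL, and nothing here is taken from the actual benchmark code.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of JOCL): splits n elements into one contiguous slice per device. */
class DevicePartition
{
    /**
     * Returns one {offset, length} pair per device.
     * The first (n % nDevices) slices receive one extra element so the lengths sum to n.
     */
    static int[][] split(int n, int nDevices)
    {
        if (n < 0 || nDevices <= 0)
        {
            throw new IllegalArgumentException("n must be >= 0 and nDevices must be > 0");
        }
        int[][] slices = new int[nDevices][2];
        int base = n / nDevices;
        int extra = n % nDevices;
        int offset = 0;
        for (int i = 0; i < nDevices; i++)
        {
            int length = base + (i < extra ? 1 : 0);
            slices[i][0] = offset;
            slices[i][1] = length;
            offset += length;
        }
        return slices;
    }

    public static void main(String[] args)
    {
        // Vector length taken from the benchmark summary, split over two devices
        for (int[] slice : split(20_480_000, 2))
        {
            System.out.println(Arrays.toString(slice));
        }
    }
}
```

Each pair would describe the sub-buffer for one device, and the global work size for that device would follow from the slice length rather than from the full vector length. OpenCL also requires sub-buffer origins to respect the device's base address alignment, which this sketch does not check.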

There is just a small detail: the devices lumped together take twice as much time instead of half the time (I list the benchmark results below). I'll try to figure out tomorrow where the problem is. I saw a reference to CLCommandThreadPool by Michael Bien. Well, that's a nice exercise for tomorrow :-D
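
One way to picture the thread-pool idea is a worker thread per device, each driving its own queue. The sketch below uses only `java.util.concurrent` and does not reproduce the CLCommandThreadPool API. `PerDeviceWorkers` and `runOnAllDevices` are hypothetical names, and the per-device work is a placeholder function standing in for the enqueue-and-finish sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

/** Hypothetical sketch: one worker thread per device index, results collected in device order. */
class PerDeviceWorkers
{
    static <T> List<T> runOnAllDevices(int nDevices, IntFunction<T> work)
    {
        ExecutorService pool = Executors.newFixedThreadPool(nDevices);
        try
        {
            // Submit one task per device; each task would enqueue and finish its own queue
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < nDevices; i++)
            {
                final int device = i;
                Callable<T> task = () -> work.apply(device);
                futures.add(pool.submit(task));
            }
            // Wait for every device, in order
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures)
            {
                results.add(future.get());
            }
            return results;
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a device", e);
        }
        catch (ExecutionException e)
        {
            throw new IllegalStateException("Device task failed", e.getCause());
        }
        finally
        {
            pool.shutdown();
        }
    }

    public static void main(String[] args)
    {
        // Placeholder work: report which device index ran on which thread
        List<String> report = runOnAllDevices(2,
            i -> "device " + i + " ran on " + Thread.currentThread().getName());
        report.forEach(System.out::println);
    }
}
```

In a real OpenCL program each worker would probably need its own kernel instance, since setting arguments on a shared kernel object from several threads is not safe.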

I did consider switching to the low-level bindings (llb's), but the high-level bindings (hlb's) are really well coded and I am rather particular about how code should look. You guys really did a nice job with the hlb's. The books I am reading now are indeed C(++), but until now I have been able to translate this to hlb one way or another.

1 OpenCL GPU platform(s) present
1 OpenCL CPU platform(s) present
Platform AMD Accelerated Parallel Processing contains 2 devices
Device: CLDevice [id: 139956276461280 name: Ellesmere type: GPU profile: FULL_PROFILE]
Device: CLDevice [id: 139956278518384 name: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz type: CPU profile: FULL_PROFILE]
Running benchmark for 2 devices.
Benchmarking Ellesmere : 0 1 2 3 4 4983 ms
Benchmarking Intel(R) Xeon(R) CPU: 0 1 2 3 4 7765 ms
Benchmarking all devices together: 0 1 2 3 4 23149 ms
Benchmarking plain old Java: 12101 ms

Summary of computing vectors with 20,480,000 elements (of type double), times in ms
Device                 VectorAdd  VectorMul  VectorDiv  VectorTri
Ellesmere                    227        280        242        245
Intel(R) Xeon(R) CPU         355        365        356        475
All devices together        1120       1124       1142       1242
Plain old Java                45         43         79      11934
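
The summary rows can be cross-checked against the totals in the benchmark log. The sketch below assumes that the "0 1 2 3 4" in the log means five runs per device and that the summary shows a per-run figure for each operation. `BenchmarkCheck` is a hypothetical name, and the numbers are copied from the output above.

```java
/** Hypothetical cross-check of the summary table against the logged totals. */
class BenchmarkCheck
{
    /** Adds up the per-operation times of one summary row. */
    static long sum(long... perOperationMs)
    {
        long total = 0;
        for (long ms : perOperationMs)
        {
            total += ms;
        }
        return total;
    }

    public static void main(String[] args)
    {
        final int runs = 5; // assumption: "0 1 2 3 4" in the log means five runs

        long gpu      = sum(227, 280, 242, 245);     // Ellesmere row
        long cpu      = sum(355, 365, 356, 475);     // Xeon row
        long combined = sum(1120, 1124, 1142, 1242); // all devices together
        long java     = sum(45, 43, 79, 11934);      // plain old Java

        System.out.println("Ellesmere: " + runs * gpu      + " ms estimated, 4983 ms logged");
        System.out.println("Xeon:      " + runs * cpu      + " ms estimated, 7765 ms logged");
        System.out.println("Combined:  " + runs * combined + " ms estimated, 23149 ms logged");
        System.out.println("Java:      " + java            + " ms summed, 12101 ms logged");

        // How the combined run compares with running both devices one after another
        System.out.println("Combined / (GPU + CPU): " + 23149.0 / (4983 + 7765));
    }
}
```

Under those assumptions the row sums land within a few milliseconds of the logged totals, and the combined run comes out at roughly 1.8 times the two single-device runs added together.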

Anyhow, it's time to sleep. And thanks again for your great help and patience!