Posted by
gmseed on
Oct 30, 2013; 12:44pm
URL: https://forum.jogamp.org/MulJogamp-Timings-tp4030420.html
Hi
On the Tutorial page is a link to a paper that compares jogamp-jocl, jocl and javacl:
http://jogamp.org/wiki/index.php/JOCL_Tutorial
When comparing against "normal" Java, the author includes the filling of the arrays in the CPU timings. I took the time to implement this test case (as I'm new to JOCL) and factored out the array filling and the computation:
private void fillJavaArrays(float[] matA, float[] matB, int seedA, int seedB)
{
    Random randA = new Random(seedA);
    Random randB = new Random(seedB);
    final int n = matA.length;
    for (int i=0; i<n; i++)
    {
        matA[i] = randA.nextFloat();
        matB[i] = randB.nextFloat();
    }
}

public void normalMatMulCalc(float[] matA, float[] matB, float[] C)
{
    final int n = matA.length;
    for (int i=0; i<n; i++)
    {
        C[i] = matA[i] * matB[i];
    }
}
and now compare apples with apples:
...
// normal Java calculation
float[] matA = new float[n];
float[] matB = new float[n];
float[] C = new float[n];
fillJavaArrays(matA,matB,seedA,seedB);
time = nanoTime();
normalMatMulCalc(matA,matB,C);
time = nanoTime() - time;
...
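A single timed call on the JVM can be skewed by JIT warm-up, so here is a minimal, self-contained sketch of how the CPU side could be timed more robustly. The class name, the seeds, the warm-up count and the "best of 5" policy are my own assumptions for illustration, not part of the paper or the tutorial:

```java
import java.util.Random;

public class CpuTimingSketch {
    // Fill both arrays from seeded generators, outside the timed region.
    static void fill(float[] a, float[] b, long seedA, long seedB) {
        Random randA = new Random(seedA);
        Random randB = new Random(seedB);
        for (int i = 0; i < a.length; i++) {
            a[i] = randA.nextFloat();
            b[i] = randB.nextFloat();
        }
    }

    // Element-wise product, the same operation as normalMatMulCalc.
    static void multiply(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] * b[i];
        }
    }

    // Time one call of multiply, in nanoseconds.
    static long timeOnce(float[] a, float[] b, float[] c) {
        long start = System.nanoTime();
        multiply(a, b, c);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // n defaults to the larger size from the post; pass a smaller value if heap is tight.
        final int n = args.length > 0 ? Integer.parseInt(args[0]) : 14447777;
        float[] a = new float[n];
        float[] b = new float[n];
        float[] c = new float[n];
        fill(a, b, 12345L, 67890L); // hypothetical seeds

        // Warm-up runs (count is an assumption) so the JIT can compile the loop first.
        for (int i = 0; i < 5; i++) {
            timeOnce(a, b, c);
        }

        // Report the best of several timed runs to reduce noise.
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++) {
            best = Math.min(best, timeOnce(a, b, c));
        }
        System.out.println("computation on CPU took (best of 5): " + best / 1_000_000 + " ms");
    }
}
```

The same idea (warm up, then repeat the measurement) would apply to the GPU path as well, so that both sides are measured under comparable conditions.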
From the PDF I'm not sure whether n is 1444777 or 14447777; using the larger value, 14447777, my timing results are:
created: CLContext [id: 375806496, platform: NVIDIA CUDA, profile: FULL_PROFILE, devices: 1]
using CLDevice [id: 375806416 name: Quadro K1000M type: GPU profile: FULL_PROFILE]
local: 256
global: 14447872
used device memory: 173MB
A*B=C results snapshot:
0.29194298, 0.23210067, 0.6739147, 0.5184218, 0.53693414, 0.0102392025, 0.2038985, 0.10943726, 0.16293794, 0.018490046, ...; 14447862 more
computation on GPU took: 52 ms
0.29194298, 0.23210067, 0.6739147, 0.5184218, 0.53693414, 0.0102392025, 0.2038985, 0.10943726, 0.16293794, 0.018490046, ...; 14447862 more
computation on CPU took: 16 ms
illustrating that, in this run, the "normal" Java computation is 52/16 = 3.25 times faster than the GPU version.
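As a sanity check on the numbers in the log, the short sketch below redoes the arithmetic. It assumes that the global size is n rounded up to the next multiple of the local size, that three float buffers (A, B and C) of that length are allocated, and that "MB" means 10^6 bytes; the class and method names are hypothetical:

```java
public class LogCheck {
    // Round globalSize up to the next multiple of groupSize.
    static int roundUp(int groupSize, int globalSize) {
        int r = globalSize % groupSize;
        return r == 0 ? globalSize : globalSize + groupSize - r;
    }

    public static void main(String[] args) {
        int n = 14447777;   // element count used in the post
        int local = 256;    // "local" from the log

        int global = roundUp(local, n);
        System.out.println("global: " + global);            // global: 14447872

        // Three float buffers (A, B, C), 4 bytes per element (assumption).
        long bytes = 3L * global * Float.BYTES;
        System.out.println("used device memory: " + bytes / 1_000_000 + "MB"); // 173MB

        // Ratio of the two reported timings.
        System.out.println("GPU/CPU time ratio: " + 52.0 / 16.0);  // 3.25
    }
}
```

Under those assumptions the logged global size and memory figure are both consistent with n = 14447777.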
I'm interested to hear if other people have run this test.
Thanks
Graham