Re: MulJogamp Timings
Posted by Wade Walker on Oct 31, 2013; 1:08am
URL: https://forum.jogamp.org/MulJogamp-Timings-tp4030420p4030427.html
Hi Graham,
I haven't run this particular test, but the result is not surprising.

For a simple test like this, running on a desktop GPU with a separate memory system, I would expect the overhead of copying the arrays out to GPU memory to dominate the timing (so the CPU should be faster overall, as you observed).
Usually, if you're going to offload computation to a GPU, there must be enough FP operations per byte of input to justify the copying overhead (definitely more than the 1 multiply per 8 bytes of float data in this test). The kernels that really shine on the GPU are those with hundreds or thousands of FP operations per copied 4-byte operand.
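
To put rough numbers on that, here is a back-of-the-envelope sketch in Java. It is only a simple model (copy time plus compute time, with no overlap and no kernel launch latency), and the class name, method names, and all throughput figures are made up for illustration. Substitute values measured on your own machine.

```java
// Rough cost model for deciding whether a GPU offload can pay off.
// Every rate used in main() is a hypothetical placeholder, not a measurement.
class OffloadEstimate {

    // Time to do the work on the CPU, where the data is already in host memory.
    static double cpuSeconds(double flops, double cpuFlopsPerSec) {
        return flops / cpuFlopsPerSec;
    }

    // Time to copy the inputs across the bus and then compute on the GPU.
    static double gpuSeconds(double flops, double bytesCopied,
                             double gpuFlopsPerSec, double busBytesPerSec) {
        return bytesCopied / busBytesPerSec + flops / gpuFlopsPerSec;
    }

    // FP operations per copied byte at which both paths take the same time.
    // Derived from flops/cpu = bytes/bus + flops/gpu.
    static double breakEvenFlopsPerByte(double cpuFlopsPerSec,
                                        double gpuFlopsPerSec,
                                        double busBytesPerSec) {
        if (gpuFlopsPerSec <= cpuFlopsPerSec) {
            return Double.POSITIVE_INFINITY; // the GPU can never catch up
        }
        return 1.0 / (busBytesPerSec
                * (1.0 / cpuFlopsPerSec - 1.0 / gpuFlopsPerSec));
    }

    public static void main(String[] args) {
        double cpu = 10e9;   // assumed CPU throughput, FLOP/s
        double gpu = 1000e9; // assumed GPU throughput, FLOP/s
        double bus = 5e9;    // assumed host-to-device bandwidth, bytes/s

        // Element-wise multiply of two float arrays:
        // 1 multiply per 8 bytes of input, as in the test being discussed.
        double n = 1_000_000;
        double flops = n;
        double bytes = 8 * n;

        System.out.printf("CPU estimate: %.6f s%n", cpuSeconds(flops, cpu));
        System.out.printf("GPU estimate: %.6f s%n",
                gpuSeconds(flops, bytes, gpu, bus));
        System.out.printf("Break-even:   %.3f FLOP per copied byte%n",
                breakEvenFlopsPerByte(cpu, gpu, bus));
    }
}
```

With these assumed rates the copy term dominates the GPU estimate for the multiply test, which matches the behaviour described above. The model ignores copying results back and any fixed launch cost, so real kernels probably need to sit well above the break-even figure before the offload is worthwhile.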