When I first ran the HelloJOCL sample [http://goo.gl/iI8t8] and timed the computation, I noticed that it took around 12ms on my CPU and around 18ms on the GPU, but since I was just trying to get the thing to work I did not ask myself too many questions.
Now that I have put together a sample to run neuronal simulations (Hodgkin-Huxley model), I am seeing disturbing performance differences: the CPU takes 4 seconds while the GPU takes 13 seconds on average.
Here is the sample code: https://gist.github.com/981935
and here is the kernel: https://gist.github.com/981938
My code is built on top of the HelloJOCL example (I have around 300 elements and I am just populating queues and sending them down for processing), with the significant difference that I loop over a number of time steps and run my kernel in parallel at each time step (not just once).
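To make that structure concrete, here is a plain-Java sketch of the loop shape. It is not the code from the gist: the class `TimeStepLoopSketch`, the method names, and the leaky-integrator update are hypothetical stand-ins for the real Hodgkin-Huxley kernel, and I am assuming one buffer write, one kernel launch and one read-back per time step. The comments name the OpenCL calls each stage would roughly correspond to.

```java
import java.util.Arrays;

public class TimeStepLoopSketch {

    // Stand-in for the kernel: one explicit Euler step of a leaky integrator,
    // dv/dt = (input - v) / tau. This is NOT the Hodgkin-Huxley kernel.
    static void stepKernel(float[] v, float[] input, float dt, float tau) {
        // On a device, each gid would be handled by one work-item in parallel.
        for (int gid = 0; gid < v.length; gid++) {
            v[gid] += dt * (input[gid] - v[gid]) / tau;
        }
    }

    // Host-side loop. Returns how many host/device round trips were made,
    // assuming (hypothetically) one write and one read-back per time step.
    static int run(float[] v, float[] input, int steps, float dt, float tau) {
        int roundTrips = 0;
        for (int t = 0; t < steps; t++) {
            // Copy state "to the device" (would be clEnqueueWriteBuffer).
            float[] deviceV = Arrays.copyOf(v, v.length);
            // Launch the kernel over all elements (would be clEnqueueNDRangeKernel).
            stepKernel(deviceV, input, dt, tau);
            // Copy the result back to the host (would be clEnqueueReadBuffer).
            System.arraycopy(deviceV, 0, v, 0, v.length);
            roundTrips++;
        }
        return roundTrips;
    }

    public static void main(String[] args) {
        float[] v = new float[300];      // around 300 elements, as in my case
        float[] input = new float[300];
        Arrays.fill(input, 1.0f);
        int trips = run(v, input, 1000, 0.01f, 1.0f);
        System.out.println("round trips: " + trips + ", v[0] = " + v[0]);
    }
}
```

With only around 300 elements, each step does very little arithmetic, so if the real code does follow this shape, the per-step transfers and launches are the first place I would suspect.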
My CPU is a quad-core i7, while the GPU is an ATI HD 4XXX card (512MB RAM).
I am thinking either
my CPU is exceptionally fast and my GPU is crap, or
I am doing something very wrong in my code (such as repeating operations that I could do only once in setting up the kernel).
Any help appreciated!
Hi everyone, I am back with another typical n00b question.