
Re: Where does my code spend time and how to improve this

Posted by Wade Walker on Jan 01, 2019; 4:26pm
URL: https://forum.jogamp.org/Where-does-my-code-spend-time-and-how-to-improve-this-tp4039358p4039359.html

It's hard to be certain without profiling, but for higher performance I'd advise putting more of the work into the kernel instead of the host code. Every kernel launch from the host carries some fixed overhead, so if you're looping over tool_pos_x|y|z on the host, for example, moving that loop inside the kernel might give better performance.
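To make the loop-in-kernel idea concrete, here is a rough sketch in plain C so it compiles without an OpenCL runtime. `cut_one_cell` stands in for the body of one work-item, and the loop in `run_all` stands in for the launch over all cells. The names `cut_one_cell`, `run_all` and `n_pos` are made up for illustration, and the cutting rule (every tool position lowers every cell to the tool's z) is a deliberate simplification, not your actual geometry.

```c
#include <stddef.h>

/* Hypothetical per-work-item body. In OpenCL C this would be a __kernel
   function and `cell` would come from get_global_id(0). */
static void cut_one_cell(float *stock, size_t cell,
                         const float *tool_pos_z, size_t n_pos)
{
    float h = stock[cell];  /* read the cell once into a private variable */

    /* The loop over tool positions runs inside the kernel, so the host
       issues one launch instead of n_pos launches. */
    for (size_t p = 0; p < n_pos; ++p) {
        if (tool_pos_z[p] < h)
            h = tool_pos_z[p];
    }

    stock[cell] = h;  /* write the result back once */
}

/* Stand-in for the launch: each cell is an independent work-item. */
static void run_all(float *stock, size_t n_cells,
                    const float *tool_pos_z, size_t n_pos)
{
    for (size_t cell = 0; cell < n_cells; ++cell)
        cut_one_cell(stock, cell, tool_pos_z, n_pos);
}
```

With this shape the tool positions are uploaded once as a buffer, and each work-item reads and writes its stock cell only once, however many positions there are.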

Another possibility is that since your kernel code has a data-dependent branch (if (stock[si] > h)), it may not parallelize onto a GPU very well. For GPU efficiency, a whole "warp" of threads (a "wavefront" on AMD hardware) needs to branch the same direction most of the time. The more divergence there is within the warp, the lower the performance.
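One way to take the data-dependent branch out of the picture is to always store, and pick the value to store with a select. The sketch below is plain C rather than OpenCL C, and the function names are made up for illustration. Only `stock`, `si` and `h` come from your code, and I'm assuming `stock` holds floats.

```c
/* Branching form: work-items in the same warp may take different paths. */
static void cut_branch(float *stock, int si, float h)
{
    if (stock[si] > h)
        stock[si] = h;
}

/* Select form: every work-item runs the same instructions and always
   stores. In OpenCL C the built-in fmin() expresses the same thing. */
static void cut_select(float *stock, int si, float h)
{
    float cur = stock[si];
    stock[si] = (cur > h) ? h : cur;
}
```

The trade-off is that the select form writes to memory even when nothing changes. Compilers often turn a branch this short into a select on their own, so it's worth measuring both before committing to either.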

Yet another possibility is that you're doing integer math here, which many GPUs handle more slowly than floating-point. Your types are also a little strange: you're reading a float from "tool", but then converting to int before you write to "stock". You should clearly separate index calculations (int) from data copying (float).
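Roughly what I mean by separating the two, again as a plain C sketch. It assumes "stock" is stored as a float buffer in row-major order and that the cutting height is the tool profile value plus the tool's z. The helper names and parameters (`cell_index`, `apply_tool`, `width`, `ti`, `tool_z`) are all hypothetical.

```c
#include <stddef.h>

/* Index arithmetic stays in integer types... */
static size_t cell_index(int x, int y, int width)
{
    return (size_t)y * (size_t)width + (size_t)x;
}

/* ...and heights stay in float from read to write, with no detour
   through int along the way. */
static void apply_tool(float *stock, const float *tool,
                       int x, int y, int width, int ti, float tool_z)
{
    size_t si = cell_index(x, y, width);  /* int path: where to write */
    float h = tool[ti] + tool_z;          /* float path: what to write */

    if (stock[si] > h)
        stock[si] = h;
}
```

Keeping the height in float also avoids the truncation you get when a value like 1.75 is converted to int before it is stored.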