Where does my code spend time and how to improve this
Posted by nyholku on Jan 01, 2019; 12:11pm
URL: https://forum.jogamp.org/Where-does-my-code-spend-time-and-how-to-improve-this-tp4039358.html
I have a simple kernel (see below) that basically performs tool_size**2 operations of the form b[] = min(a[], b[]) between two float arrays (height maps).
When I queue 100,000 invocations of this kernel,
for tool_size = 8 I get about 100,000 completed invocations/sec and
for tool_size = 128 I get about 50,000.
So my question is: given that I get similar results even if I reduce my kernel
to almost nothing, what is the limiting factor here and what can I do about it?
wbr Kusti
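The figures quoted above can be turned into a rough time-per-launch and element throughput. The following is only a back-of-the-envelope sketch in Java: it assumes that the quoted "ops/sec" are kernel launches per second, and the class and method names are made up for illustration. Under that assumption, going from tool_size = 8 to tool_size = 128 means 256 times more elements per launch (64 vs 16,384) for roughly twice the time per launch (about 10 µs vs 20 µs), which would be consistent with a fixed per-launch cost dominating, though it does not prove it.

```java
// Hypothetical helper: converts the quoted launch rates into per-launch figures.
final class LaunchOverheadEstimate {

    // Elements touched by one launch: a tool_size x tool_size grid.
    static long elementsPerLaunch(int toolSize) {
        return (long) toolSize * toolSize;
    }

    // Average wall-clock time per launch, in microseconds.
    static double microsPerLaunch(double launchesPerSecond) {
        return 1_000_000.0 / launchesPerSecond;
    }

    // Element updates per second implied by a given launch rate.
    static double elementsPerSecond(int toolSize, double launchesPerSecond) {
        return elementsPerLaunch(toolSize) * launchesPerSecond;
    }

    public static void main(String[] args) {
        int[] toolSizes = {8, 128};
        // Rates quoted in the post, assumed to be launches per second.
        double[] rates = {100_000.0, 50_000.0};
        for (int i = 0; i < toolSizes.length; i++) {
            System.out.printf("tool_size=%d: %d elements/launch, %.1f us/launch, %.3e elements/s%n",
                    toolSizes[i],
                    elementsPerLaunch(toolSizes[i]),
                    microsPerLaunch(rates[i]),
                    elementsPerSecond(toolSizes[i], rates[i]));
        }
    }
}
```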
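// Host side (Java, JOCL): one kernel launch per tool position.
for (int k = 0; k < 100000; k++) {
    kernel.setArg(2, tool_pos_x);
    kernel.setArg(3, tool_pos_y);
    kernel.setArg(4, tool_pos_z);
    // Arguments: offset (0, 0), global size tool_size x tool_size, local size (0, 0).
    queue.put2DRangeKernel(kernel, 0, 0, tool_size, tool_size, 0, 0);
}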
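// Device side (OpenCL C). stock_size and tool_size are not declared in this
// snippet; they are presumably defined elsewhere as compile-time constants.
kernel void millcut(
    global const float* tool,
    global float* stock,
    int tool_pos_x,
    int tool_pos_y,
    int tool_pos_z
) {
    int x = get_global_id(0);
    int y = get_global_id(1);
    int si = (x + tool_pos_x) + (y + tool_pos_y) * stock_size; // index into stock
    int ti = x + y * tool_size;                                // index into tool
    float h = tool[ti] + tool_pos_z; // tool height at this cell, offset by the z position
    if (stock[si] > h)
        stock[si] = h;
}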
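
For checking results, the kernel above could be mirrored by a plain CPU loop. This is only a sketch in Java, not part of the original code: the class name MillCutReference is hypothetical, toolSize and stockSize are passed as parameters because the kernel snippet does not show where they come from, the height is kept as a float on the assumption that the height maps hold fractional values, and there are no bounds checks, so the tool footprint is assumed to lie entirely inside the stock.

```java
// Hypothetical CPU reference for the millcut kernel (illustration only).
final class MillCutReference {

    // For every tool cell, lower the stock height to the tool height
    // (offset by toolPosZ) wherever the stock is currently higher.
    static void millcut(float[] tool, int toolSize,
                        float[] stock, int stockSize,
                        int toolPosX, int toolPosY, int toolPosZ) {
        for (int y = 0; y < toolSize; y++) {
            for (int x = 0; x < toolSize; x++) {
                int si = (x + toolPosX) + (y + toolPosY) * stockSize;
                int ti = x + y * toolSize;
                float h = tool[ti] + toolPosZ;
                if (stock[si] > h) {
                    stock[si] = h;
                }
            }
        }
    }

    public static void main(String[] args) {
        int stockSize = 4;
        int toolSize = 2;
        float[] stock = new float[stockSize * stockSize];
        java.util.Arrays.fill(stock, 10f);
        float[] tool = {1f, 2f, 3f, 4f};

        // Cut with the tool's corner at (1, 1) and no z offset.
        millcut(tool, toolSize, stock, stockSize, 1, 1, 0);

        // Print the stock height map row by row.
        for (int y = 0; y < stockSize; y++) {
            StringBuilder row = new StringBuilder();
            for (int x = 0; x < stockSize; x++) {
                row.append(stock[x + y * stockSize]).append(' ');
            }
            System.out.println(row.toString().trim());
        }
    }
}
```

Comparing the stock array read back from the device against such a loop for a handful of tool positions is one way to confirm that a faster variant still computes the same height map.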