Where does my code spend time and how to improve this

Posted by nyholku on Jan 01, 2019; 12:11pm
URL: https://forum.jogamp.org/Where-does-my-code-spend-time-and-how-to-improve-this-tp4039358.html


I have a simple kernel (see below) that essentially performs tool_size**2 operations of the form b[] = min(a[], b[]) between two float arrays (height maps).

When I queue 100,000 ops of this kernel,
for tool_size = 8 I get about 100,000 completed ops/sec and
for tool_size = 128 I get about 50,000.
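
To put those two figures side by side, here is a rough back-of-the-envelope sketch in plain Java. It only does arithmetic on the numbers quoted above; the class and method names are made up for illustration, and the straight-line model (time per launch = fixed cost + cost per work item) is an assumption, not something measured.

```java
public class LaunchOverheadEstimate {
    /** Average wall-clock time per queued kernel, in microseconds. */
    public static double microsPerLaunch(double launchesPerSecond) {
        return 1e6 / launchesPerSecond;
    }

    /** Work items completed per second for a tool_size x tool_size range. */
    public static double workItemsPerSecond(int toolSize, double launchesPerSecond) {
        return (double) toolSize * toolSize * launchesPerSecond;
    }

    /**
     * Fits time = fixed + perItem * items through two data points and
     * returns {fixedMicros, perItemMicros}. The linear model is an assumption.
     */
    public static double[] fitLinear(int itemsA, double microsA, int itemsB, double microsB) {
        double perItem = (microsB - microsA) / (itemsB - itemsA);
        double fixed = microsA - perItem * itemsA;
        return new double[] { fixed, perItem };
    }

    public static void main(String[] args) {
        double tSmall = microsPerLaunch(100_000); // tool_size = 8
        double tLarge = microsPerLaunch(50_000);  // tool_size = 128
        double[] fit = fitLinear(8 * 8, tSmall, 128 * 128, tLarge);
        System.out.println("us per launch: " + tSmall + " vs " + tLarge);
        System.out.println("work items/s:  " + workItemsPerSecond(8, 100_000)
                + " vs " + workItemsPerSecond(128, 50_000));
        System.out.println("fixed us ~ " + fit[0] + ", per-item us ~ " + fit[1]);
    }
}
```

With these inputs each launch takes about 10 us at tool_size = 8 and about 20 us at tool_size = 128, although the larger range has 256 times as many work items. Under the assumed linear model almost all of the 10 us would be a fixed per-launch cost.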

So my question is: given that I get similar results even if I reduce my kernel
to almost nothing, what is the limiting factor here, and what can I do about it?

wbr Kusti


for (int k = 0; k < 100000; k++) {
    kernel.setArg(2, tool_pos_x);
    kernel.setArg(3, tool_pos_y);
    kernel.setArg(4, tool_pos_z);
    // One launch per iteration: global size tool_size x tool_size, offsets and local sizes 0
    queue.put2DRangeKernel(kernel, 0, 0, tool_size, tool_size, 0, 0);
}

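// stock_size and tool_size are not defined in this snippet; they are assumed
// to be defined elsewhere (e.g. as compile-time constants).
kernel void millcut(
    global const float* tool,
    global float* stock,
    int tool_pos_x,
    int tool_pos_y,
    int tool_pos_z
) {
    int x = get_global_id(0);
    int y = get_global_id(1);
    int si = (x + tool_pos_x) + (y + tool_pos_y) * stock_size; // index into stock
    int ti = x + y * tool_size;                                // index into tool
    float h = tool[ti] + tool_pos_z; // float, so the height is not truncated to int
    if (stock[si] > h)
        stock[si] = h;
}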
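
For comparing GPU results against something simple, a plain-Java version of the same cut could look like the sketch below. It is only a reference loop, not part of the code above: the class name is made up, toolSize and stockSize are passed in as hypothetical parameters (the kernel takes them from elsewhere), and the height is kept as a float on the assumption that both height maps are float arrays.

```java
import java.util.Arrays;

public class MillCutReference {
    /**
     * CPU counterpart of the millcut kernel: for every tool cell, lower the
     * stock height to (tool height + toolPosZ) where the stock is higher.
     * toolSize and stockSize are the row lengths of the two height maps
     * (hypothetical parameters, assumed here for illustration).
     */
    public static void millcut(float[] tool, float[] stock, int toolSize, int stockSize,
                               int toolPosX, int toolPosY, int toolPosZ) {
        for (int y = 0; y < toolSize; y++) {
            for (int x = 0; x < toolSize; x++) {
                int si = (x + toolPosX) + (y + toolPosY) * stockSize; // index into stock
                int ti = x + y * toolSize;                            // index into tool
                float h = tool[ti] + toolPosZ;
                if (stock[si] > h) {
                    stock[si] = h;
                }
            }
        }
    }

    public static void main(String[] args) {
        float[] tool = { 0f, 1f, 1f, 0f }; // 2x2 tool
        float[] stock = new float[16];     // 4x4 stock, initially all at height 10
        Arrays.fill(stock, 10f);
        millcut(tool, stock, 2, 4, 1, 1, 5);
        System.out.println(Arrays.toString(stock));
    }
}
```

In the small example only the four stock cells under the tool change; everything else stays at 10.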