		Hi,
  I have a general question about OpenCL memory management. I want to perform Gaussian downsampling of an image in two passes (vertical and horizontal). Before you ask: much more, such as derivatives and other tasks, will follow, so OpenCL seems to be the right tool for the job. 
 I want to store the result of step 1 in a temporary CLBuffer. So far so good: it works as long as I transfer the result back into main memory using commandQueue.putReadBuffer(). However, since I don't need the data in my application (only for the second step), I replaced commandQueue.putReadBuffer() with commandQueue.finish(). Unfortunately, the code then stops working: the final read returns only zeros. The following code shows the problem:
  // STEP 1
 vPassKernel.rewind(); // the kernel might be reused
 vPassKernel.putArg(inBuffer) // input image
     .putArg(tmpBuffer); // result of the first pass is stored here (read_write, no host ptr)				
 commandQueue.putWriteBuffer(inBuffer, false)
     .put1DRangeKernel(vPassKernel, 0, vPassGlobalWorkSize, vPassLocalWorkSize)
     //.putReadBuffer(tmpBuffer, true); <-- slow, but works 
     .finish(); // <-- problem! When tmpBuffer is reused, it contains only zeros during the next step :(
  // STEP 2		
 //tmpBuffer.getBuffer().rewind(); // only needed when using "putReadBuffer"
 hPassKernel.rewind();
 hPassKernel.putArg(tmpBuffer) // use as input
     .putArg(clOutBuffer) // final output that is transferred into main memory
     .putArg(hPassElementCount);		
 commandQueue.putWriteBuffer(tmpBuffer, false)
     .put1DRangeKernel(hPassKernel, 0, hPassGlobalWorkSize, hPassLocalWorkSize)
     .putReadBuffer(clOutBuffer, true);
  Why is the call to "putReadBuffer" needed? Is that how the OpenCL memory model works? Am I missing something? 
 BTW: Switching between CPU and GPU does not change anything, so I guess it must be my fault. The kernels aren't special, they just compute a weighted sum. I know that I could call the second kernel from the first kernel, but in the future I will need the temporary buffer in other ("non-sequential") situations. (I'm running OS X 10.6 with a slow ATI 6490M.)
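 One possible explanation, offered as a guess rather than a confirmed diagnosis: the putWriteBuffer(tmpBuffer, false) call in step 2 may copy the host-side contents of tmpBuffer to the device and thereby overwrite the step 1 result. That would fit both observations, because after putReadBuffer the host copy holds the step 1 result, while after finish() it is still empty. The toy model below illustrates the idea in plain Java with no OpenCL involved. MirroredBuffer, write(), read() and tmpSeenByStep2() are made-up names for illustration only, not JOCL API, and the value 1.5f is an arbitrary stand-in for the kernel output.

```java
import java.util.Arrays;

// Toy model only: plain Java arrays stand in for the host-side buffer and
// the device-side memory object. None of these names are JOCL API.
class MirroredBufferDemo {

    /** Hypothetical stand-in for a CLBuffer: one host copy, one device copy. */
    static final class MirroredBuffer {
        final float[] host;   // zero-initialised, like a freshly allocated buffer
        final float[] device; // what a kernel would read and write

        MirroredBuffer(int size) {
            host = new float[size];
            device = new float[size];
        }

        // Models a write-buffer command: host -> device.
        void write() {
            System.arraycopy(host, 0, device, 0, host.length);
        }

        // Models a read-buffer command: device -> host.
        void read() {
            System.arraycopy(device, 0, host, 0, host.length);
        }
    }

    /**
     * Simulates step 1 filling tmp on the device, an optional read-back,
     * and an optional write before step 2. Returns the device contents
     * that the step 2 kernel would see.
     */
    static float[] tmpSeenByStep2(boolean readBackAfterStep1, boolean writeBeforeStep2) {
        MirroredBuffer tmp = new MirroredBuffer(4);
        Arrays.fill(tmp.device, 1.5f); // stand-in for the vertical-pass result
        if (readBackAfterStep1) {
            tmp.read();
        }
        if (writeBeforeStep2) {
            tmp.write();
        }
        return tmp.device.clone();
    }

    public static void main(String[] args) {
        System.out.println("finish, then write:    " + Arrays.toString(tmpSeenByStep2(false, true)));
        System.out.println("read back, then write: " + Arrays.toString(tmpSeenByStep2(true, true)));
        System.out.println("finish, no write:      " + Arrays.toString(tmpSeenByStep2(false, false)));
    }
}
```

 In this model only the "finish, then write" case hands zeros to step 2. If the real buffers behave the same way, leaving out the write of tmpBuffer in step 2 would be the first thing to try.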
  Thanks, Felix