In particular, I am not sure whether I have to change the way I create the buffers. To figure out the buffer size I currently do all sorts of rounding up, and it is my understanding that the CLWork utility will get rid of that kind of prestidigitation :)
Also, you mention in the blog post that you took that code from the unit tests. It would be helpful if you could point me to those too!