Would anyone be interested in apple's fft code ported to jocl? Or has someone already done it?
http://developer.apple.com/mac/library/samplecode/OpenCL_FFT/Introduction/Intro.html I ported most of it for work in a couple of hours, but if the boss won't let me use that version I can always do it again (I want to use it for my own code at some point anyway). It's pretty gnarly code, but I tried and failed to write my own, and there's not much else out there that I've been able to find. Then again, Google has become next to useless at finding anything technical lately, even more so with Java, so it could just be hidden away somewhere.
fyi: The AMD SDK also ships with an FFT example.

regards, michael

On 08/28/2010 06:53 AM, notzed [via jogamp] wrote:
> Would anyone be interested in apple's fft code ported to jocl? Or has someone already done it?

-- http://michael-bien.com
Hi Michael,

Yep, I had a good look at the AMD one too, but it's only fixed function/1D; the Apple one is somewhat more complete and could form the basis of a decent implementation. I'm happy for it to end up in the demos - will a patch on the bug tracker suffice? (I just don't want to deal with a git fork yet.) It's not terribly tidy. I'll also try to convert some test code I have into a demo as well (just simple image convolution stuff).

BTW should put1DRangeKernel(kernel, 0, G, L); be the same as put2DRangeKernel(kernel, 0, 0, G, 1, L, 1);? I spent an inordinate amount of time working out why the 2D FFT only ran over a tiny part of the data, and it ended up being the put-kernel call: the former didn't work, the latter did.

Cheers, Michael
On 08/30/2010 05:31 AM, notzed [via jogamp] wrote:

ah ok. patch is ok. I will try to make sure that you are listed as author in the commit message... should work somehow. That's the main reason why we usually prefer to just pull from other git forks.

> I'll also try to convert some test code I have into a demo as well (just simple image convolution stuff).

this would be great. We have only one junit test testing the image APIs. I'll try to convert the demo later to a test if possible.

> BTW should put1DRangeKernel(kernel, 0, G, L); be the same as put2DRangeKernel(kernel, 0, 0, G, 1, L, 1);?

i don't think so. 2d range methods pass 2 to the dimension param of 'clEnqueueNDRangeKernel' and the 1d counterpart passes 1. So both can not give the same result. basically all calls end up in this method:
/**(manpage: http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clEnqueueNDRangeKernel.html)

maybe i misunderstood something but:
2D Range -> put2DRangeKernel(...)
or: putNDRangeKernel(kernel, 2, 0, G, L)

best regards, michael

-- http://michael-bien.com/
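For readers following the thread, here is a rough sketch of the mapping Michael describes: a 1D convenience call ends up with a dimension count of 1 and a 2D call with 2, each carrying one offset, global size and local size per dimension. The `RangeArgs` class and its `oneD`/`twoD` helpers are hypothetical names made up for illustration; this is not JOCL's actual source, only a plain-Java model of the values that would reach `clEnqueueNDRangeKernel`.

```java
import java.util.Arrays;

/** Hypothetical holder for the values a range call hands to clEnqueueNDRangeKernel. */
final class RangeArgs {
    final int workDim;          // work_dim: number of dimensions in the index space
    final long[] globalOffset;  // one entry per dimension
    final long[] globalSize;    // one entry per dimension
    final long[] localSize;     // one entry per dimension

    private RangeArgs(long[] globalOffset, long[] globalSize, long[] localSize) {
        if (globalOffset.length != globalSize.length || globalSize.length != localSize.length) {
            throw new IllegalArgumentException("each array needs one entry per dimension");
        }
        this.workDim = globalSize.length;
        this.globalOffset = globalOffset;
        this.globalSize = globalSize;
        this.localSize = localSize;
    }

    /** Assumed shape of a 1D convenience call: work_dim = 1. */
    static RangeArgs oneD(long offset, long global, long local) {
        return new RangeArgs(new long[] {offset}, new long[] {global}, new long[] {local});
    }

    /** Assumed shape of a 2D convenience call: work_dim = 2. */
    static RangeArgs twoD(long offX, long offY, long gX, long gY, long lX, long lY) {
        return new RangeArgs(new long[] {offX, offY}, new long[] {gX, gY}, new long[] {lX, lY});
    }

    /** Total number of work-items: the product of the global sizes. */
    long totalWorkItems() {
        long total = 1;
        for (long s : globalSize) {
            total *= s;
        }
        return total;
    }

    @Override
    public String toString() {
        return "work_dim=" + workDim
                + " offset=" + Arrays.toString(globalOffset)
                + " global=" + Arrays.toString(globalSize)
                + " local=" + Arrays.toString(localSize);
    }

    public static void main(String[] args) {
        // The two calls from the question, with G = 1024 and L = 64 as example values.
        System.out.println(oneD(0, 1024, 64));
        System.out.println(twoD(0, 0, 1024, 1, 64, 1));
    }
}
```

In this model the two calls differ in `workDim` and array length, but both describe the same number of work-items.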
On 30 August 2010 22:03, Michael Bien [via jogamp] wrote:
> On 08/30/2010 05:31 AM, notzed [via jogamp] wrote:
> patch is ok. I will try to make sure that you are listed as author in the
> commit message... should work somehow. That's the main reason why we usually
> prefer to just pull from other git forks.

Well if it becomes a regular thing I'll look into it - if it's just for kudos I don't care, but if you want it for blame I guess that's another matter.

FWIW regarding the demo code, it is just using arrays atm since that's what the fft uses and I haven't had time to play with images yet (already midnight ... again).

> > BTW should put1DRangeKernel(kernel, 0, G, L);
> > be the same as
> > put2DRangeKernel(kernel, 0, 0, G, 1, L, 1);?
>
> i don't think so. 2d range methods pass 2 to the dimension param of
> 'clEnqueueNDRangeKernel' and the 1d counterpart passes 1. So both can not
> give the same result.

Hmm, I still don't see why not - if the dimension size is 1, won't it just iterate over that dimension "from 0 to 0", which will just look the same as a 1d call to the gpu code? But regardless - it's the original C that does a 1d call:

    err |= clEnqueueNDRangeKernel(queue, kernelInfo->kernel, 1, NULL, &gWorkItems, &lWorkItems, 0, NULL, NULL);

Just to re-check that I didn't just make it up or make a mistake at some ungodly hour of the morning I just checked again ... this works no worries:

    queue.put2DRangeKernel(kernelInfo.kernel, 0, 0, wd.gWorkItems, 1, wd.lWorkItems, 1);

this definitely doesn't:

    queue.put1DRangeKernel(kernelInfo.kernel, 0, wd.gWorkItems, wd.lWorkItems);

But I can't say why - the jocl source I have looks correct (although the different copy2NIO() implementations look strange). Unless it was something fixed a month or two ago (the version I'm actually using isn't minty-fresh for various reasons, and my checkout is a few weeks old at least). (I've only been running it on 64 bit linux fwiw)

Anyway, since just talking about weirdness is a bit of a pain, I got off my lazy bum and just put the code into jocl-demos. So I'll upload a patch shortly once I reset my bugzilla password and can work out how to work that offensively named and natured version control system.

Not sure what to do about apple's huge blurb - it says you only need to keep it if you distribute it unmodified, which is clearly not the case here. I left it in whole. I've only tested it on nvidia.

http://jogamp.org/bugzilla/show_bug.cgi?id=408

(fwiw the single put1DRangeKernel call commented out in CLFFTPlan.java is the case it hits with this demo)

Michael
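The "from 0 to 0" argument above can be checked on the CPU with a small sketch. It enumerates the global IDs of an index space the way the OpenCL spec defines `get_global_id` (which returns 0 for dimensions beyond `work_dim`), and compares a 1D range of size G with a 2D range of size G x 1. The `NdRangeWalk` class and its method names are invented for illustration, global offsets are assumed to be 0, and real devices do not run work-items in any fixed order. It models only the index space, so it says nothing about why `put1DRangeKernel` misbehaved in the build described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical CPU-side model of an OpenCL global index space (offsets assumed 0). */
final class NdRangeWalk {

    /**
     * Lists the global ID triple of every work-item, dimension 0 varying fastest.
     * Unused dimensions stay 0, as get_global_id does for dimindx >= work_dim.
     */
    static List<long[]> workItems(long... globalSize) {
        if (globalSize.length < 1 || globalSize.length > 3) {
            throw new IllegalArgumentException("this sketch supports 1 to 3 dimensions");
        }
        long total = 1;
        for (long s : globalSize) {
            total *= s;
        }
        List<long[]> items = new ArrayList<>();
        for (long linear = 0; linear < total; linear++) {
            long[] id = new long[3];
            long rest = linear;
            for (int d = 0; d < globalSize.length; d++) {
                id[d] = rest % globalSize[d];  // position within dimension d
                rest /= globalSize[d];         // carry into the next dimension
            }
            items.add(id);
        }
        return items;
    }

    /** True when both ranges yield the same ID triples in the same enumeration order. */
    static boolean sameIndexSpace(List<long[]> a, List<long[]> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!Arrays.equals(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long g = 8; // example global size
        System.out.println(sameIndexSpace(workItems(g), workItems(g, 1))); // prints true
        System.out.println(sameIndexSpace(workItems(g), workItems(4, 2))); // prints false
    }
}
```

Under these assumptions a kernel that reads only `get_global_id(0)` would see the same IDs from either range, which is consistent with the 2D call working as a stand-in for the original 1D C call.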
i committed the example with small modifications:
http://github.com/mbien/jocl-demos/commit/4455385dd411375345688685dd561652708a7024

---
initial import of Michael Zucchi's port of Apples FFT example (RFE 408).
small modifications:
- pick CLPlatform with a GPU
- replaced tabs with spaces :)
- removed apple license since it does not apply

I'll take a look at the NDRange issue tomorrow, it's already late here.

regards, michael

On 08/30/2010 04:44 PM, notzed [via jogamp] wrote:
> On 30 August 2010 22:03, Michael Bien [via jogamp]

-- http://michael-bien.com/