port of apple's fft

port of apple's fft

notzed
Would anyone be interested in apple's fft code ported to jocl?  Or has someone already done it?

http://developer.apple.com/mac/library/samplecode/OpenCL_FFT/Introduction/Intro.html

I ported most of it for work in a couple of hours, but if the boss won't let me use that version I can always do it again (I want to use it for my own code at some point anyway).

It's pretty gnarly code, but I tried and failed to write my own and there's not much else out there that I've been able to find.  Although Google has become next to useless at finding anything technical lately, even more so with Java, so it could just be hidden away somewhere.


Re: port of apple's fft

Michael Bien
Sure this would be great. If you agree we could add it to the jocl-demos project.

FYI: the AMD SDK also ships with an FFT example.

regards,
michael

On 08/28/2010 06:53 AM, notzed [via jogamp] wrote:
Would anyone be interested in apple's fft code ported to jocl?  Or has someone already done it?


-- 
- - - -
http://michael-bien.com

Re: port of apple's fft

notzed

Hi Michael,

Yep, I had a good look at the AMD one too, but it's only fixed-function/1D; the Apple one is somewhat more complete and could form the basis of a decent implementation.

I'm happy for it to end up in the demos - will a patch on the bug tracker suffice?  (I just don't want to deal with a git fork yet.)  It's not terribly tidy.  I'll also try to convert some test code I have into a demo as well (just simple image convolution stuff).

BTW should put1DRangeKernel(kernel, 0, G, L);
be the same as
put2DRangeKernel(kernel, 0, 0, G, 1, L, 1);?

I spent an inordinate amount of time working out why the 2D FFT only ran over a tiny part of the data, and it ended up being the put...RangeKernel call: the former didn't work, the latter did.

Cheers,
 Michael



Re: port of apple's fft

Michael Bien

On 08/30/2010 05:31 AM, notzed [via jogamp] wrote:

> Hi Michael,
>
> Yep I had a good look at the AMD one too - but it's only fixed function/1d, the Apple one is somewhat more complete and could form the basis of a decent implementation.

Ah, OK.

> I'm happy for it to end up in the demos - will a patch on the bug tracker suffice?  (i just don't want to deal with a git fork yet).  It's not terribly tidy.

A patch is OK. I will try to make sure that you are listed as author in the commit message... should work somehow. That's the main reason why we usually prefer to just pull from other git forks.

> I'll also try to convert some test code I have into a demo as well (just simple image convolution stuff).

This would be great. We have only one JUnit test testing the image APIs. I'll try to convert the demo to a test later if possible.

> BTW should put1DRangeKernel(kernel, 0, G, L);
> be the same as
> put2DRangeKernel(kernel, 0, 0, G, 1, L, 1);?

I don't think so. The 2D range methods pass 2 to the dimension parameter of 'clEnqueueNDRangeKernel' and the 1D counterpart passes 1. So both cannot give the same result.

basically all calls end up in this method:
    /**
     * Calls {@native clEnqueueNDRangeKernel}.
     */
    public CLCommandQueue putNDRangeKernel(CLKernel kernel, int workDimension, PointerBuffer globalWorkOffset,
            PointerBuffer globalWorkSize, PointerBuffer localWorkSize, CLEventList condition, CLEventList events) {

        PointerBuffer conditionIDs = null;
        int conditions = 0;
        if(condition != null) {
            conditionIDs = condition.IDs;
            conditions   = condition.size;
        }

        int ret = cl.clEnqueueNDRangeKernel(
                ID, kernel.ID, workDimension,
                globalWorkOffset,
                globalWorkSize,
                localWorkSize,
                conditions, conditionIDs,
                events==null ? null : events.IDs);

        if(ret != CL_SUCCESS) {
            throw newException(ret, "can not enqueue "+workDimension+"DRangeKernel: " + kernel+ "\n"
                    + " with gwo: " + toStr(globalWorkOffset, workDimension)
                    + " gws: " + toStr(globalWorkSize, workDimension)
                    + " lws: " + toStr(localWorkSize, workDimension)
                    + toStr(condition, events));
        }

        if(events != null) {
            events.createEvent(context);
        }

        return this;
    }
(manpage: http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clEnqueueNDRangeKernel.html)
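
To make the delegation described above concrete, here is a minimal sketch in plain Java. It is not the JOCL source: the class `RangeDelegationSketch`, the `Launch` holder and the `put1D`/`put2D`/`putND` helpers are hypothetical names invented for illustration, and plain `long[]` arrays stand in for the `PointerBuffer` arguments. It only shows the idea that the 1D and 2D convenience calls differ in the work dimension and in the length of the arrays they hand on.

```java
import java.util.Arrays;

public class RangeDelegationSketch {

    /** Hypothetical record of what would be handed to clEnqueueNDRangeKernel. */
    static final class Launch {
        final int workDimension;
        final long[] globalOffset;
        final long[] globalSize;
        final long[] localSize;

        Launch(int workDimension, long[] globalOffset, long[] globalSize, long[] localSize) {
            this.workDimension = workDimension;
            this.globalOffset = globalOffset;
            this.globalSize = globalSize;
            this.localSize = localSize;
        }
    }

    /** Stand-in for the general N-D entry point: one array entry per dimension. */
    static Launch putND(int workDimension, long[] gwo, long[] gws, long[] lws) {
        if (gwo.length != workDimension || gws.length != workDimension || lws.length != workDimension) {
            throw new IllegalArgumentException("every array needs " + workDimension + " entries");
        }
        return new Launch(workDimension, gwo, gws, lws);
    }

    /** 1D wrapper: passes work dimension 1 and single-element arrays. */
    static Launch put1D(long offset, long global, long local) {
        return putND(1, new long[] {offset}, new long[] {global}, new long[] {local});
    }

    /** 2D wrapper: same argument order as put2DRangeKernel(kernel, 0, 0, G, 1, L, 1) in the thread. */
    static Launch put2D(long offX, long offY, long globalX, long globalY, long localX, long localY) {
        return putND(2, new long[] {offX, offY}, new long[] {globalX, globalY}, new long[] {localX, localY});
    }

    public static void main(String[] args) {
        Launch a = put1D(0, 1024, 64);
        Launch b = put2D(0, 0, 1024, 1, 64, 1);
        System.out.println(a.workDimension + "D global=" + Arrays.toString(a.globalSize));
        System.out.println(b.workDimension + "D global=" + Arrays.toString(b.globalSize));
    }
}
```

In this sketch the two launches describe the same 1024 work-items along dimension 0; only the reported dimensionality and the array lengths differ.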



> I spent an inordinate amount of time working out why the 2d fft only ran over a tiny part of the data and it ended up being the putkernel call, the former didn't work the latter did.

Maybe I misunderstood something, but:
2D range -> put2DRangeKernel(...)
or:
putNDRangeKernel(kernel, 2, 0, G, L)

best regards,
michael




Re: port of apple's fft

notzed
On 30 August 2010 22:03, Michael Bien [via jogamp] wrote:

> patch is ok. I will try to make sure that you are listed as author in the
> commit message... should work somehow. That's the main reason why we usually
> prefer to just pull from other git forks.

Well, if it becomes a regular thing I'll look into it - if it's just
for kudos I don't care, but if you want it for blame I guess that's
another matter.   FWIW regarding the demo code, it is just using
arrays atm since that's what the FFT uses and I haven't had time to
play with images yet (already midnight ... again).

> BTW should put1DRangeKernel(kernel, 0, G, L);
> be the same as
> put2DRangeKernel(kernel, 0, 0, G, 1, L, 1);?
>
> i don't think so. 2d range methods pass 2 to the dimension param of
> 'clEnqueueNDRangeKernel' and the 1d counterpart passes 1. So both cannot
> give the same result.

Hmm, I still don't see why not - if the dimension size is 1, won't it
just iterate over that dimension "from 0 to 0", which will just look
the same as a 1D call to the GPU code?
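
The "from 0 to 0" argument can be checked with a small CPU-side sketch that needs no OpenCL at all. The class `NDRangeEnumeration` and its two helpers are hypothetical, written only for this illustration; they list the values a kernel would see from `get_global_id(0)` for a 1D range and for a 2D range. Real devices run work-items in no particular order, so the loops model the set of IDs, not the scheduling.

```java
import java.util.ArrayList;
import java.util.List;

public class NDRangeEnumeration {

    /** Values of get_global_id(0) over a 1D range of globalSize items. */
    static List<Long> ids1D(long offset, long globalSize) {
        List<Long> ids = new ArrayList<>();
        for (long x = offset; x < offset + globalSize; x++) {
            ids.add(x);
        }
        return ids;
    }

    /** Values of get_global_id(0) over a 2D range of globalX * globalY items. */
    static List<Long> ids2D(long offX, long offY, long globalX, long globalY) {
        List<Long> ids = new ArrayList<>();
        for (long y = offY; y < offY + globalY; y++) {      // dimension 1
            for (long x = offX; x < offX + globalX; x++) {  // dimension 0
                ids.add(x);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // A 2D range whose second dimension has size 1 visits the same
        // dimension-0 IDs as the 1D range.
        System.out.println(ids1D(0, 8).equals(ids2D(0, 0, 8, 1))); // prints "true"
    }
}
```

Under this model a kernel that only reads `get_global_id(0)` cannot tell the two launches apart. The visible differences are that `get_work_dim()` reports 2 instead of 1, and that the host side has to pass two-element arrays for the offset and sizes.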

But regardless - it's the original C that does a 1D call:

 err |= clEnqueueNDRangeKernel(queue, kernelInfo->kernel, 1, NULL,
                               &gWorkItems, &lWorkItems, 0, NULL, NULL);

Just to re-check that I didn't just make it up or make a mistake at
some ungodly hour of the morning, I just checked again ...

This works, no worries:
  queue.put2DRangeKernel(kernelInfo.kernel, 0, 0, wd.gWorkItems, 1,
                         wd.lWorkItems, 1);
This definitely doesn't:
  queue.put1DRangeKernel(kernelInfo.kernel, 0, wd.gWorkItems, wd.lWorkItems);

But I can't say why - the JOCL source I have looks correct (although
the different copy2NIO() implementations look strange).  Unless it was
something fixed a month or two ago (the version I'm actually using
isn't minty-fresh for various reasons, and my checkout is a few weeks
old at least).

(I've only been running it on 64-bit Linux, FWIW.)

Anyway, since just talking about weirdness is a bit of a pain, I got off
my lazy bum and just put the code into jocl-demos.  So I'll upload a
patch shortly, once I reset my Bugzilla password and can work out how
to work that offensively named and natured version control system.
Not sure what to do about Apple's huge blurb - it says you only need
to keep it if you distribute it unmodified, which is clearly not the
case here.  I left it in whole.  I've only tested it on NVIDIA.

http://jogamp.org/bugzilla/show_bug.cgi?id=408

(FWIW, the single put1DRangeKernel call commented out in CLFFTPlan.java is
the case it hits with this demo.)

 Michael

Re: port of apple's fft

Michael Bien
Works fine on my machine, thanks again.

I committed the example with small modifications:
http://github.com/mbien/jocl-demos/commit/4455385dd411375345688685dd561652708a7024
---
initial import of Michael Zucchi's port of Apples FFT example (RFE 408).
small modifications:
 - pick CLPlatform with a GPU
 - replaced tabs with spaces :)
 - removed apple license since it does not apply
---
I'll take a look at the NDRange issue tomorrow; it's already late here.

regards,
michael


On 08/30/2010 04:44 PM, notzed [via jogamp] wrote:
http://jogamp.org/bugzilla/show_bug.cgi?id=408
