Re: Revisiting the "Manilla Benchmark" using Bridj
Posted by
Michael Bien on
Jul 21, 2010; 5:25am
URL: https://forum.jogamp.org/Revisiting-the-Manilla-Benchmark-using-Bridj-tp983376p983390.html
Olivier,
I left out that I took my new JavaCL version of the program with beta 4 and compared it to the OpenCL4Java version, and got virtually the same result. All my own OpenCL4Java code is now retired, so I have no problem if you wish to use the conversion to cease to expose it. Maybe some kind of table in the Assembler Optimizations section of the Design Wiki would help; I was confused by all the faster-slower verbiage.
Michael,
I have since remembered that I must be missing the DLLs that would have been made with a build. They are not in the automated build directories. Guess I am kind of spoiled now, not having used JOGL for almost a year, and even then I used the NB plug-in. I would put everything needed for each platform in its own directory, along with directions, soon. Maybe it is just me: I like to have the source of libraries, but have no interest in having to build them from source. Having to build JOGL just so you can build JOCL makes this pretty unattractive.
Yes, you are right. Thanks for reminding me of this point. It's a trivial compile-time dependency which is only there for public-API convenience reasons. I'll update the script to allow compiling against the JOGL jars (automatic download etc.) soon.
Have fun in L.A., but stay away from the Grand Canyon unless you have “your papers” ;-)
haha, thanks!
-regards
michael
Both,
My window on this diversion has closed. I have moved on. One of my final tasks in the OpenCL part of my product is to test out my final production, asynchronous HTTP interface. 257-258.xxx screwed this up. I went back to 197 today to complete my testing. I re-ran the benchmark code, same result. See
http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2451&sid=2aebfa793b81897c44f98f7258b2f27c if interested.
Either way, I think I have been pretty consistent in saying bindings overhead was not important. In a way, I still believe that, but in my 5th generation kernels (GLSL for Gen 1 & 2) I have actually moved the kernel loop right into the GPU. I needed to thread the needle in kernel design to get the equivalent of the Global Work Set sync that external kernel calls achieve. I also need to put all the possible combinations of arguments up in a global beforehand.
Actually, this is to avoid much more than bindings overhead. The kernel and argument-setting overhead plus the bindings overhead just vanishes. I could do everything with just one enqueueKernel(). In order to do multi-GPU, I only do 400 kernels per external queuing.
Jeff
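
The trade-off Jeff describes (one host-side enqueue per iteration versus a loop inside the kernel, split into batches of 400) can be sketched roughly as follows. This is a plain-Java simulation, not his code and not real OpenCL: the class KernelLoopSketch, the step() computation, and the enqueue counter are all hypothetical stand-ins, and the counter merely models the per-call binding and argument-setting cost.

```java
// Plain-Java simulation; no OpenCL, JavaCL, or JOCL calls are made here.
class KernelLoopSketch {
    // Counts simulated host-side enqueues (stand-in for per-call overhead).
    static int enqueueCount = 0;

    // Hypothetical per-iteration work applied to every element.
    static void step(float[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] = data[i] * 0.5f + 1.0f;
        }
    }

    // Host-driven style: each iteration costs one enqueue.
    static void enqueueSingleStep(float[] data) {
        enqueueCount++;
        step(data);
    }

    // In-kernel loop style: one enqueue runs many iterations.
    static void enqueueLoopedKernel(float[] data, int iterations) {
        enqueueCount++;
        for (int k = 0; k < iterations; k++) {
            step(data);
        }
    }

    // Returns the number of enqueues needed when the host drives the loop.
    public static int runHostDriven(float[] data, int total) {
        enqueueCount = 0;
        for (int k = 0; k < total; k++) {
            enqueueSingleStep(data);
        }
        return enqueueCount;
    }

    // Returns the number of enqueues needed when the loop lives in the
    // kernel and the host only issues one call per batch.
    public static int runBatched(float[] data, int total, int batch) {
        enqueueCount = 0;
        for (int done = 0; done < total; done += batch) {
            enqueueLoopedKernel(data, Math.min(batch, total - done));
        }
        return enqueueCount;
    }

    public static void main(String[] args) {
        float[] a = new float[8];
        float[] b = new float[8];
        int hostCalls = runHostDriven(a, 1000);
        int batchedCalls = runBatched(b, 1000, 400); // batch size from the post
        System.out.println(hostCalls + " enqueues vs " + batchedCalls);
        System.out.println("same result: " + java.util.Arrays.equals(a, b));
    }
}
```

In a real kernel the hard part is the one Jeff mentions: iterations inside a single launch have no implicit global synchronization between them, which this single-threaded sketch does not model.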