Context:
My application renders a 720x576 video stream at 25 fps on a GLJPanel. Each time a new frame is decoded, the decoder thread calls the panel's repaint() method; inside the panel's display(GLAutoDrawable) method I use glTexImage2D to upload the frame to a texture (the video frame buffer is stored in a direct ByteBuffer). My GPU is an ATI Radeon HD 5750 with the latest drivers; the OS is Windows XP 32-bit.
I don't experience any performance problems as long as the GLJPanel is small, but when I enlarge the panel to almost full screen, the frame rate drops to 15-17 fps. The strange thing is that decoding the frames without rendering them takes only 5-10% CPU, but with rendering the CPU load jumps to ~55%, even when the panel is small. I measured the display method with System.nanoTime() and it takes only 2-3 milliseconds, so it is really fast.
Questions:
-> What could cause the high CPU load while rendering? Could it be the transfer of the video frame buffer from RAM to the GPU?
-> Is it safe to overwrite the content of the buffer passed to glTexImage2D right after that call?
-> When the panel is big, the frame rate drops to 15-17 fps, but display() takes less than 3 ms. Where is the bottleneck? I also tried to measure the time between repaint() and display(), but it is constantly around 20 ms... |
Administrator
|
A common reason for slowdowns is mismatched glTexImage2D parameters, which force format conversions on the CPU side. Most video streams are in GL_RGB format rather than GL_RGBA, which in my experience is the most widely accelerated texture format.
To test whether the problem lies in the texture format, simply change the glTexImage2D call to use GL_RGBA. The video will look garbled, but performance should increase greatly. If it does, you know that's most likely the problem. |
Administrator
|
On Wednesday, July 14, 2010 14:44:33 Demoscene Passivist [via jogamp] wrote:
> A common slowdown reason is mismatched glTexImage2D parameters that would
> force format conversions to occur on the CPU side. Most video streams are in
> GL_RGB format and not GL_RGBA, which in my experience is the most widely
> accelerated texture format.
>
> To test if the problem lies in a wrong texture format simply change the
> glTexImage2D to use GL_RGBA. The video will look garbled but performance
> should greatly increase. If so, you know that's most likely the problem.

BGRA can be assumed to be the hardware representation, at least for all the Windows cards (NV, ATI, ...):
textureInternalFormat=GL.GL_RGBA8;
textureDataFormat=GL2.GL_BGRA;
textureDataType=GL2.GL_UNSIGNED_INT_8_8_8_8_REV;
~Sven |
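A decoder that emits packed 24-bit RGB would need its output repacked before it matches the BGRA layout suggested above. The following is a minimal sketch of that repacking in plain Java, assuming a little-endian host (such as x86), where GL_BGRA with GL_UNSIGNED_INT_8_8_8_8_REV corresponds to the byte order B, G, R, A in memory. The class and method names are hypothetical, and doing the swizzle in the decoder itself would avoid this extra pass over the frame.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of JOGL.
final class RgbToBgra {
    /**
     * Repacks tightly packed 24-bit RGB pixels into 32-bit B,G,R,A byte order.
     * Uses absolute get/put, so the positions of both buffers are left untouched.
     */
    static void convert(ByteBuffer rgb, ByteBuffer bgra, int pixelCount) {
        for (int i = 0; i < pixelCount; i++) {
            int s = i * 3; // source offset of pixel i
            int d = i * 4; // destination offset of pixel i
            bgra.put(d,     rgb.get(s + 2)); // blue
            bgra.put(d + 1, rgb.get(s + 1)); // green
            bgra.put(d + 2, rgb.get(s));     // red
            bgra.put(d + 3, (byte) 0xFF);    // opaque alpha
        }
    }

    public static void main(String[] args) {
        int width = 720, height = 576;
        ByteBuffer rgb = ByteBuffer.allocateDirect(width * height * 3);
        ByteBuffer bgra = ByteBuffer.allocateDirect(width * height * 4);
        // First pixel: R=10, G=20, B=30
        rgb.put(0, (byte) 10).put(1, (byte) 20).put(2, (byte) 30);
        convert(rgb, bgra, width * height);
        // Bytes are signed in Java, so the alpha value 0xFF prints as -1.
        System.out.println(bgra.get(0) + " " + bgra.get(1) + " " + bgra.get(2) + " " + bgra.get(3));
    }
}
```

The destination buffer could then be handed to glTexImage2D with the three parameters listed above.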
In reply to this post by Matteo Battaglio
Thanks for the quick responses!
Okay, I made some tests and discovered that the problem is much simpler but maybe more worrying. I wrote a simple test app whose main method creates a JFrame, adds a GLJPanel to it, and then runs the following loop:
    while (true) {
        try {
            Thread.sleep(40); // 25 times/second
            glPanel.repaint();
        } catch (InterruptedException ex) {}
    }
The concrete GLJPanel class implements the GLEventListener interface and leaves all the implemented methods empty, except for display(), which calculates the average frequency of calls to it; so I don't call any OpenGL command at all.
To my surprise, the test showed that the CPU load is directly proportional to the size of the panel until it reaches 50%. Once at 50%, the CPU load remains constant, but as I increase the size of the panel, the rate of calls to the display method continues to decrease (while the rate of repaint() calls is still 25)!
I suspect the CPU load can't rise above 50% on my PC because my CPU is a dual core and AWT/Swing uses a single thread to render the scene, so it can't take advantage of both cores. But my big question is: why does it use so much CPU although it isn't doing anything? Can anyone reproduce this issue? (I can also send my source code if anyone needs it.)
Many thanks in advance! |
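The average call frequency described above can be measured with a small counter like the one sketched here. The class name CallRateMeter is hypothetical; one instance would be ticked next to each repaint() call and another inside display(). Swing coalesces repaint requests that arrive before the previous one has been painted, so a display() rate below the repaint() rate is what one would expect when the event dispatch thread cannot keep up.

```java
// Hypothetical helper for comparing the repaint() rate with the display() rate.
final class CallRateMeter {
    private long firstNanos = -1;
    private long lastNanos = -1;
    private long calls = 0;

    /** Records one call at the given timestamp (normally System.nanoTime()). */
    void tick(long nowNanos) {
        if (calls == 0) {
            firstNanos = nowNanos;
        }
        lastNanos = nowNanos;
        calls++;
    }

    /** Average calls per second between the first and last tick; 0 with fewer than two ticks. */
    double averageHz() {
        if (calls < 2 || lastNanos == firstNanos) {
            return 0.0;
        }
        return (calls - 1) * 1e9 / (lastNanos - firstNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        CallRateMeter requested = new CallRateMeter();
        for (int i = 0; i < 50; i++) {
            Thread.sleep(40); // same pacing as the test loop
            requested.tick(System.nanoTime());
        }
        System.out.printf("average rate: %.1f Hz%n", requested.averageHz());
    }
}
```

Passing the timestamp in, rather than reading the clock inside tick(), keeps the meter easy to check with fixed values.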
Administrator
|
I guess your GLJPanel is NOT using hardware acceleration and has probably fallen back to the software implementation of OpenGL under Windows. I had a similar issue under Vista when I started with JOGL.
Try using GLCanvas instead of GLJPanel and see if the problem remains. Also check whether you have hardware-accelerated OpenGL in general, using e.g. "GPU Caps Viewer". |
I tried with GLCanvas and... it works!
The CPU load is 0% and the frame rate doesn't drop even at full screen. So the problem seems to be related to GLJPanel. I hope this is a bug, because in my opinion the performance difference is too large to be justified by the differences between AWT and Swing... |
Administrator
|
OK, if the GLCanvas works perfectly, then your problem is lost or missing hardware acceleration, which is of course slow. What you can try is to use these two VM switches:
-Dsun.java2d.opengl=true -Dsun.java2d.noddraw=true
The first forces Java2D to use OpenGL under Windows instead of DirectDraw/Direct3D. The second works around a low-level driver incompatibility (needed for correct fullscreen support anyway). At least that worked for me when I had the same problem under Vista.
The other option is not to use GLJPanel and to use GLCanvas instead. If you only want good Java2D support and do not need any Swing components inside or overlaid on your rendered graphics, a combination of GLCanvas and TextureRenderer is the way to go. Take a look here: http://jogamp.org/deployment/jogl-next/javadoc_public/com/jogamp/opengl/util/awt/TextureRenderer.html
The TextureRenderer also relieves you of manually uploading your textures and of the texture format performance problems discussed before. Performance-wise, you can easily upload a 1280x1024 texture at 60 frames per second that way. So I guess for your video application GLCanvas+TextureRenderer is the easiest, most compatible and highest-performance way. |
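The two switches can also be set programmatically instead of on the command line. This is a minimal sketch with a hypothetical class name; it assumes, as far as I know, that Java2D reads these properties only once during its own initialization, so the call has to happen before any AWT or Swing class is touched.

```java
// Hypothetical launcher helper; equivalent to passing the two -D switches to the JVM.
final class Java2DFlags {
    /** Call this first thing in main(), before any AWT or Swing class is loaded. */
    static void enableOpenGLPipeline() {
        System.setProperty("sun.java2d.opengl", "true");
        System.setProperty("sun.java2d.noddraw", "true");
    }

    public static void main(String[] args) {
        enableOpenGLPipeline();
        // Only after this point create the JFrame and the GLJPanel.
        System.out.println("sun.java2d.opengl=" + System.getProperty("sun.java2d.opengl"));
        System.out.println("sun.java2d.noddraw=" + System.getProperty("sun.java2d.noddraw"));
    }
}
```

Using the command-line form is the safer choice when it is not obvious which class triggers AWT initialization first.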
I tested my machine's OpenGL configuration with GPU Caps Viewer and it seems to be all right; OpenGL is configured correctly.
If I didn't have hardware acceleration enabled, could the GLCanvas perform well anyway while the GLJPanel does not? (That is the situation I am experiencing.)
I'm also wondering whether the options you suggested could influence JOGL performance even if I don't make use of Java2D: I only make some direct OpenGL calls via the GL object...
I'd prefer GLJPanel over GLCanvas if possible, since my application has to position some buttons on top of the video panel. |
Administrator
|
>If I didn't have hardware acceleration enabled, could the GLCanvas
>perform well anyway, while the GLJPanel not? (which is the situation I am experiencing)
Yes. If you didn't have hardware acceleration enabled in general, both GLCanvas and GLJPanel would perform badly. But if the OpenGL driver cannot use the graphics hardware for GLJPanel only, then the software fallback is used there, which is of course slow. This seems to be the case when you use GLJPanel but NOT when you use GLCanvas.
>I'm also wondering if the options you suggested me could influence the
>JOGL performance even if I don't make use of Java2D:
>I only make some direct OpenGL calls via the GL object...
But you use the GLJPanel, which is a Swing component, which uses Java2D for rendering. So you are indirectly using Java2D. |
Administrator
|
In reply to this post by Matteo Battaglio
On Thursday, July 15, 2010 17:30:32 Matteo Battaglio [via jogamp] wrote:
> I tested the my machine's OpenGL configuration with GPU Caps Viewer and it
> seems to be all right, the OpenGL are well configured.
>
> If I didn't have hardware acceleration enabled, could the GLCanvas perform
> well anyway, while the GLJPanel not? (which is the situation I am
> experiencing)
>
> I'm also wondering if the options you suggested me could influence the JOGL
> performance even if I don't make use of Java2D: I only make some direct
> OpenGL calls via the GL object...
>
> I'd prefer GLJPanel over GLCanvas since my application has to position some
> buttons on top of the video panel, if possible.

~Sven |
In reply to this post by Demoscene Passivist
Hi again!
After some days of tests on various machines with various options, I can confirm that, on Windows XP, GLJPanel isn't hardware accelerated unless you force the OpenGL pipeline with -Dsun.java2d.opengl=true, while GLCanvas is hardware accelerated.
Now there is one more question: even if I'm using a hardware-accelerated GLJPanel, or GLCanvas, when I render a 720x576 image to a texture, rescaled to 1280x1024 to fit the panel size, the CPU load is still very high. If I render a 360x288 image (which is 1/4 the resolution of the previous one), the CPU load is cut in half. So I guess that the data transfer between RAM and the GPU isn't done by DMA but involves the CPU. Am I right? Is there a way to achieve DMA transfer? Maybe using PBOs? |
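The two measurements above can be read through a simple linear model, load = fixed + k * pixels. The sketch below fits that model to two data points; the model itself and the class name are assumptions for illustration, and the loads are relative values (full-size frame = 1.0, quarter-size frame = 0.5), not measured percentages. If the per-pixel upload were the only cost, a quarter of the pixels would give a quarter of the load, so a halving hints that part of the load does not depend on the frame size.

```java
// Hypothetical two-point fit of load = fixed + k * pixels.
final class LoadModel {
    /**
     * Fits the linear model to two measurements and returns the fraction of
     * measurement A's load that is independent of the pixel count.
     */
    static double fixedShare(double pixelsA, double loadA, double pixelsB, double loadB) {
        double k = (loadA - loadB) / (pixelsA - pixelsB); // load per pixel
        double fixed = loadA - k * pixelsA;               // size-independent part
        return fixed / loadA;
    }

    public static void main(String[] args) {
        double share = fixedShare(720 * 576, 1.0, 360 * 288, 0.5);
        System.out.printf("size-independent share of the load: %.3f%n", share);
    }
}
```

Under this model roughly one third of the full-size load would be independent of the frame size. That does not settle whether the transfer uses DMA; it only suggests the upload is not the sole contributor.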
Administrator
|
Generally speaking, there are a couple of things you can do to improve the performance of mixing Java2D and JOGL:
- Use -Dsun.java2d.opengl=true AND -Dsun.java2d.noddraw=true to avoid hardware acceleration trouble
- Use GLCanvas and do your Java2D rendering with com.jogamp.opengl.util.awt.TextureRenderer
- Use VolatileImages in conjunction with -Dsun.java2d.opengl.fbobject=true to render your Java2D content. See here for further info: http://weblogs.java.net/blog/campbell/archive/2005/09/java2djogl_inte_1.html
- Try using PBOs for CPU-based rendering if none of the above works. See here for some nice code and benchmarks: http://today.java.net/article/2008/10/28/integrating-glpbuffer-and-java-graphics2d
In addition to that, I can't really understand why the performance of your texture transfer is so bad. Take a look at this small routine I wrote a couple of months ago: http://www.youtube.com/watch?v=zPX8z0du5I4
It renders a fullscreen 1280x1024 texture using a GLCanvas and TextureRenderer. In front of the Java2D content there's some simple translucent JOGL rendering going on. The Java2D 'wave' is a simple BufferedImage rendered in pure software, setting every single pixel by hand (1280x1024 using the CPU). The whole BufferedImage is then transferred to the GPU every frame (using TextureRenderer) and rendered as a simple textured plane in the background. The whole routine runs at 100+ frames per second on my 3+ year old Core2Duo notebook. And this is all without any DMA/PBuffer/FBO trickery...
Maybe you could provide some code showing what you are actually doing, so that we can get a better impression. It seems to me the problem lies somewhere else... |
Using these options causes artifacts in my application. Here's an example:
In my code I don't use Buffered/VolatileImage, only native ByteBuffers, which are passed as a parameter to glTexImage2D in the display() method of the GLCanvas/GLJPanel.
As far as I know (after reading this thread), PBuffers and PBOs are two completely different things. Am I wrong? Why are you associating PBOs with CPU-based rendering? My intention is to achieve asynchronous, DMA-based texture transfers as described in this tutorial I've just found.
The real problem is that my application is a video-surveillance application which needs to display up to 16 video panels, each of them rendering a 720x576 @ 25 fps video stream. Since the decoding process is CPU-hungry, I need to optimize the rendering process as much as I can.
So far I can say that:
- GLJPanel shouldn't be used, since without the opengl and noddraw options it doesn't enable hardware acceleration
- The CPU load in my tests (which DON'T involve the decoding phase) depends directly on the video frame resolution (or, better, the video frame size in bytes), suggesting that the problem lies in the textures being transferred by the CPU rather than by DMA. |
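For a sense of scale, the raw upload volume for the setup described above can be worked out directly. This is plain arithmetic on the figures given in the post (720x576, 25 fps, up to 16 panels); the class name is hypothetical, and the RGBA rows assume the decoder is switched to 4 bytes per pixel.

```java
// Hypothetical calculator for the uncompressed texture upload volume.
final class StreamBandwidth {
    /** Bytes per second for the given frame geometry, frame rate and stream count. */
    static long bytesPerSecond(int width, int height, int bytesPerPixel, int fps, int streams) {
        return (long) width * height * bytesPerPixel * fps * streams;
    }

    public static void main(String[] args) {
        System.out.println("RGB,   1 stream : " + bytesPerSecond(720, 576, 3, 25, 1) + " bytes/s");
        System.out.println("RGB,  16 streams: " + bytesPerSecond(720, 576, 3, 25, 16) + " bytes/s");
        System.out.println("RGBA,  1 stream : " + bytesPerSecond(720, 576, 4, 25, 1) + " bytes/s");
        System.out.println("RGBA, 16 streams: " + bytesPerSecond(720, 576, 4, 25, 16) + " bytes/s");
    }
}
```

A single RGB stream comes to about 31 MB/s and sixteen RGB streams to about 498 MB/s; with RGBA the sixteen-stream total rises to about 664 MB/s. Whether a given system sustains that without CPU involvement depends on the driver and the upload path, which these numbers alone cannot show.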
Administrator
|
>Using these options causes artifacts on my applications.. Here's an example:
I thought you had dropped GLJPanel in favor of a GLCanvas-based solution, but your screenshot shows me you're using Swing/GLJPanel?!
>In my code I don't use Buffered/VolatileImage, but only native ByteBuffers which are passed
>as parameter to glTexImage2D in the display() method of the GLCanvas/JPanel
Maybe the bottleneck lies in the composition of these ByteBuffers? Have you ever tried "Visual VM" (https://visualvm.dev.java.net/) to profile your application? If you don't profile, you'll never really know where the bottleneck lies...
>As far as I know (after reading this thread) PBuffer and PBOs are two completely different things.. Am I wrong?
Yes, they are two different things. But as you aren't using Java2D, my link doesn't matter for what you want to do. If you go 100% OpenGL you should definitely check out this paper from NVidia: http://developer.nvidia.com/object/fast_texture_transfers.html
>The real problem is that my application is a video-surveillance
>application which needs to display up to 16 video panels, each
>of those rendering a 720x576 @ 25 fps video stream.
As much as I love OpenGL and JOGL, it seems to me that you are only doing 2D work? So the question arises: why are you using JOGL at all? If I were writing a pure 2D video surveillance application I would use Java2D. It would save you a lot of trouble... just my 2 cents :)
>suggesting that the problem lies in the textures not being transferred by the DMA but by the CPU.
I can only repeat myself: profile your application! Guessing where the bottleneck lies often leads you in the wrong direction... |
I developed this test to see if using a GLJPanel with the options you suggested could be a possible solution, but those options give me problems like the one in the screenshot, and they also cause the application to crash when the panel/canvas is resized (even after commenting out the code in the reshape() callback).
The ByteBuffers are allocated once, passed to the decoder via JNI and then returned already filled by the decoder.
Of course I profiled my app with VisualVM and also by putting lots of System.nanoTime() calls in the display method and between the call to repaint() and the subsequent display(). It seems that while the display() code is really fast, a lot of time elapses between repaint and display; anyway, none of the profiled methods seems to be the guilty one, hence my assumption that the problem doesn't lie on the Java side.
I read that paper and followed this suggestion you gave me:
>To test if the problem lies in a wrong texture format simply change the glTexImage2D to use GL_RGBA.
>The video will look garbled but performance should greatly increase. If so, you know that's most likely the problem.
using the parameters suggested by Sven Gothel in every call to glTexImage2D, but the CPU load was still very high. Surely using RGBA will improve performance, and I'll be changing the output of the decoder from RGB to RGBA, but I'm quite sure that the DMA problem is still the main cause of the high CPU usage.
Before trying JOGL I tried a lot of other ways: I printed and read the tutorial Painting in AWT and Swing, learned about Buffered/VolatileImages, double buffering, drawing to offscreen images, etc. I tried to build a VolatileImage using a custom Raster/ColorModel/DataBuffer backed by my native ByteBuffer (there seems to be no simpler way to create an Image backed by a ByteBuffer), but I wasn't able to achieve decent performance, even compared with a non-hardware-accelerated JOGL panel.
Maybe there's a simpler solution than using JOGL+PBOs, which is quite possible since I consider myself a novice in AWT/Swing/Java2D programming, but I wasn't able to find it :-/ |
Administrator
|
Maybe you could take a look here: http://blog.pirelenito.org/2008/08/java-movie-playback-jogl-fobs4jmf/
Then download the MovieGL pack from his quite extensive code library (full source code): http://code.google.com/p/victamin/downloads/detail?name=movieGL_1_2.zip&can=2&q=
There's an example class 'RGBGLTextureRenderer' which does the movie->texture conversion. I can't really say anything about performance, but I tried it nearly a year ago and it was quite fast, at least for a single movie file. |
Thanks, they look interesting! If I succeed I'll let you know. Thank you very much for all your support! |