JOGL Updates ...

Sven Gothel
Administrator
JOGL Updates

A brain dump of recent JOGL updates for everyone not reading the git logs.

- Maven 2.0-rc11post08
- Latest aggregated build

- GLEventListenerState / GLStateKeeper (Bug 665 GLContext/GLDrawable re-association)
    - Preserve the GLEventListenerState at destruction and
      restore it at the next creation, using the GLStateKeeper interface.

    - GLStateKeeper interface is implemented and fully functional
      w/ GLWindow.

- Exclusive Context Thread (ECT) via AnimatorBase and GLAutoDrawable:
  - [get|set]ExclusiveContextThread(..)
  - See unit tests TestGearsES2NEWT, TestExclusiveContext*
  - On certain GL implementations a context switch is still expensive;
    ECT allows you to keep a single context current.
  - git sha1 224fab1b2c71464826594740022fdcbe278867dc
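
Why ECT helps can be shown with a toy model in plain Java, no JOGL required. It assumes the default path makes the context current and releases it once per frame, while an exclusive thread makes it current once and keeps it. `ToyContext`, `EctDemo` and their methods are made up for this sketch; for the real API see [get|set]ExclusiveContextThread(..) and the unit tests named above.

```java
// Toy model only: counts how often a "context" is made current or released.
// ToyContext and both render loops are hypothetical, not JOGL classes.
final class ToyContext {
    private Thread owner = null;
    int switches = 0;              // number of makeCurrent + release operations

    void makeCurrent() {
        Thread t = Thread.currentThread();
        if (owner != null && owner != t) {
            throw new IllegalStateException("context is current on another thread");
        }
        if (owner == null) {
            owner = t;
            switches++;
        }
    }

    void release() {
        if (owner == Thread.currentThread()) {
            owner = null;
            switches++;
        }
    }
}

final class EctDemo {
    // Assumed default behaviour: make current and release once per frame.
    static int renderPerFrameSwitch(int frames) {
        ToyContext ctx = new ToyContext();
        for (int i = 0; i < frames; i++) {
            ctx.makeCurrent();
            /* draw frame */
            ctx.release();
        }
        return ctx.switches;
    }

    // ECT-like behaviour: one thread keeps the context current across frames.
    static int renderExclusive(int frames) {
        ToyContext ctx = new ToyContext();
        ctx.makeCurrent();
        for (int i = 0; i < frames; i++) {
            /* draw frame */
        }
        ctx.release();
        return ctx.switches;
    }

    public static void main(String[] args) {
        System.out.println(renderPerFrameSwitch(1000));
        System.out.println(renderExclusive(1000));
    }
}
```

With 1000 frames the first loop performs 2000 make-current/release operations, the second only 2.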

- GLJPanel
  - Uses FBO for offscreen rendering
  - Uses GLSL texture vertical flip if available
  - git sha1 e92823cddc54b0f4fa71e234061a21de6ee5248c
             59a1ab0312492a251a0efc700d040a5f71e88611
             d143475e995e473c142fd34be2af6521246f014a

- OSX Enhancements
  - Java7 build incl. removal of Java6 dependencies
  - CALayer self-contained layout fix
    - fixes [most of] the misplaced CALayer bugs
    - HELP: Need support detecting remaining bugs!
  - Perform all main-thread tasks (CALayer and NEWT)
    w/o infinite blocking.
    The impl. 'streams' commands to the main-thread
    while attempting to determine the desired states asynchronously.

- NEWT MouseEvent
  - enhancing rotation API / semantics

- NEWT KeyEvent (Bug 678, 641 and 688)
  - enhancing keyCode, keyChar - adding keySymbol semantics
  - deprecated: KEY_TYPED

- NEWT/Android Enhancements
  - more reliable rotation/scroll gesture detection,
    i.e. 2-finger scroll -> NEWT's rotation event.

  - demonstrating w/ GearsES2
    - 2 finger pinch zoom, fast zoom w/ a 3rd finger
    - 2 finger (close to each other) rotation/scroll
    - 1 finger drag rotation
    - keyboard visible now via 4 finger pressure > 0.7f

  - Pause w/o finish(), i.e. Home or Menu
    - Using GLEventListenerState / GLStateKeeper, see above

  - Map KEYCODE_BACK semantics, either (Bug 677):
    - keyboard invisible, or
    - send to KeyListener:
      - consumed by NEWT KeyListener, or
      - activity.finish()

  - Proper pixel format selection
    - git sha1 85d70b7d38885fa8ba6374aa790d5a296acc8ec1
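
The KEYCODE_BACK mapping listed above (Bug 677) reads as a small decision order. A minimal sketch, assuming 'keyboard invisible' means that BACK first hides a visible soft keyboard; `BackAction` and `decide(..)` are hypothetical names and not part of NEWT or Android:

```java
// Sketch of the KEYCODE_BACK decision order from the list above.
// BackAction and decide(..) are hypothetical, not NEWT or Android API.
enum BackAction { HIDE_KEYBOARD, CONSUMED_BY_LISTENER, FINISH_ACTIVITY }

final class BackKeyDemo {
    static BackAction decide(boolean keyboardVisible, boolean listenerConsumes) {
        if (keyboardVisible) {
            // Assumption: a visible keyboard is hidden first, nothing else happens.
            return BackAction.HIDE_KEYBOARD;
        }
        // Otherwise the event goes to the NEWT KeyListener;
        // if it is not consumed there, the activity is finished.
        return listenerConsumes ? BackAction.CONSUMED_BY_LISTENER
                                : BackAction.FINISH_ACTIVITY;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false));
        System.out.println(decide(false, true));
        System.out.println(decide(false, false));
    }
}
```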

- PNGJ Updates
  - interlace support
  - palette/indexed support
 
- GlueGen RecursiveLock
  - fix deadlock in the corner case where the timeout (TO) is reached but the lock is not acquired
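
For readers unfamiliar with this kind of corner case: a timed lock attempt can return without the lock, and the caller must then neither enter the critical section nor unlock. The sketch below shows that pattern with the JDK's ReentrantLock; it is not GlueGen's RecursiveLock and does not claim to reproduce the actual fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustration with java.util.concurrent, not GlueGen's RecursiveLock.
final class TimedLockDemo {
    // Runs 'action' under the lock. Returns false if the timeout was reached
    // (or the wait was interrupted) without acquiring the lock; in that case
    // the action is not run and unlock() is not called.
    static boolean runLocked(ReentrantLock lock, long timeoutMs, Runnable action) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            return false;            // timeout reached, lock NOT held
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();           // only reached when the lock was acquired
        }
    }

    // results[0]: attempt while another thread holds the lock,
    // results[1]: attempt after that thread released it.
    static boolean[] demo() {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        boolean[] results = new boolean[2];
        try {
            held.await();
            results[0] = runLocked(lock, 50, () -> {});
            release.countDown();
            holder.join();
            results[1] = runLocked(lock, 50, () -> {});
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]);
    }
}
```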

.. and more detailed changes, since ..

~Sven


Re: JOGL Updates ...

hharrison
Sven,

Any chance you can expand on the following:

> - Exclusive Context Thread (ECT) via AnimatorBase and GLAutoDrawable:
>   - [get|set]ExclusiveContextThread(..)
>   - See unit tests TestGearsES2NEWT, TestExclusiveContext*
>   - On certain GL impl, context switch is still expensive,
>     ECT allows you to keep a single context current.
>   - git sha1 224fab1b2c71464826594740022fdcbe278867dc

This sounds ideal for our use as we are already pushing all GL ops from a single thread
and if this offers a safety valve to keep us from accidentally making changes that break
that assumption, we'd like to catch it early.

Harvey

Re: JOGL Updates ...

Sven Gothel
Administrator
On 03/25/2013 04:25 AM, hharrison [via jogamp] wrote:

> Sven,
>
> Any chance you can expand on the following:
>
>> - Exclusive Context Thread (ECT) via AnimatorBase and GLAutoDrawable:
>>   - [get|set]ExclusiveContextThread(..)
>>   - See unit tests TestGearsES2NEWT, TestExclusiveContext*
>>   - On certain GL impl, context switch is still expensive,
>>     ECT allows you to keep a single context current.
>>   - git sha1 224fab1b2c71464826594740022fdcbe278867dc
Check the referenced unit tests; together w/ the API doc they should be self-explanatory.
Sure, if you have questions .. please shoot!
Otherwise, I don't know how to 'expand' (-> elaborate ?).

>
> This sounds ideal for our use as we are already pushing all GL ops from a
> single thread
> and if this offers a safety valve to keep us from accidentally making changes
> that break
> that assumption, we'd like to catch it early.

Great!

~Sven

>
> Harvey



Re: JOGL Updates ...

robbiezl
In reply to this post by Sven Gothel
 GLJPanel
  - Uses FBO for offscreen rendering
  - Uses GLSL texture vertical flip if available
--------------------------------------------------------------

Does this mean GLJPanel can now reach as many fps as GLCanvas?

Re: JOGL Updates ...

gouessej
Administrator
It doesn't concern the AWT GLCanvas, which already works reliably and is sometimes (often?) faster than GLJPanel.
Julien Gouesse | Personal blog | Website

Re: JOGL Updates ...

robbiezl
But GLJPanel always runs much slower than GLCanvas

Re: JOGL Updates ...

gouessej
Administrator
Yes, that's what I meant, but it can't be solved. GLJPanel uses offscreen rendering under the hood; there is no way to change that. GLJPanel is not mandatory in most cases, except when using JInternalFrame instances. Maybe you can switch to NEWT and use the AWT/NEWT bridge in your case.
Julien Gouesse | Personal blog | Website

Re: JOGL Updates ...

Sven Gothel
Administrator
On 03/26/2013 10:05 AM, gouessej [via jogamp] wrote:
> Yes, that's what I meant but it can't be solved. GLJPanel uses offscreen
> rendering under the hood, there is no way to change that.

The GL offscreen rendering path uses an FBO[1], followed by a GLSL shader flipping
the texture vertically[2]. The GL operation is concluded via read-pixels
of the flipped FBO into the Java2D image buffer[3].
Java2D composition is then performed[4] on the image buffer,
which is hence the most expensive step after [3].

http://jogamp.org/git/?p=jogl.git;a=blob;f=src/jogl/classes/javax/media/opengl/awt/GLJPanel.java;h=f1a2ccc7eb20dcf3b4e6d0266ec852324486ffab;hb=d90855bfc457a703d7a8fe14598f4d5e8de7e73e#l107
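
Why step [2] is needed: glReadPixels returns rows bottom-up (GL's origin is the lower-left corner), while Java2D image buffers are stored top-down. A CPU-side sketch of the same flip, for illustration only; per the above, GLJPanel does this in a GLSL shader when available, and `FlipDemo` is a hypothetical helper:

```java
import java.util.Arrays;

// CPU-side illustration of a vertical flip. Not GLJPanel code.
final class FlipDemo {
    // Returns a copy of 'pixels' (row-major, width * height entries)
    // with the order of the rows reversed.
    static int[] flipVertically(int[] pixels, int width, int height) {
        if (pixels.length != width * height) {
            throw new IllegalArgumentException("pixels.length != width * height");
        }
        int[] out = new int[pixels.length];
        for (int y = 0; y < height; y++) {
            // Source row y becomes destination row (height - 1 - y).
            System.arraycopy(pixels, y * width, out, (height - 1 - y) * width, width);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] bottomUp = { 1, 2,  3, 4,  5, 6 };   // 2 x 3 image, bottom row first
        System.out.println(Arrays.toString(flipVertically(bottomUp, 2, 3)));
    }
}
```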

~Sven



Re: JOGL Updates ...

gouessej
Administrator
FBOs can be very slow on lots of machines, glReadPixels and their equivalents too.

Swing components can be drawn with NEWT and GLG2D :)
Julien Gouesse | Personal blog | Website

Re: JOGL Updates ...

Sven Gothel
Administrator
On 03/26/2013 01:08 PM, gouessej [via jogamp] wrote:
> FBOs can be very slow on lots of machines, glReadPixels and their equivalents
> too.
>
> Swing components can be drawn with NEWT and GLG2D :)

Of course - I just wanted to explain the details ..

However, we can assume that FBO operations (alone) on modern GPUs are as fast
as onscreen rendering.

~Sven




Re: JOGL Updates ...

gouessej
Administrator
Sven Gothel wrote:
> However, we can assume that FBO operations (alone) on modern GPUs is as fast
> as onscreen rendering.

Actually, no. You can assume you're right only with a very recent driver with a decent and recent Nvidia graphics card under Windows... and there are still some exceptions even with some Nvidia Quadro FX cards with validated drivers. On Intel "modern" GPUs, it is never as fast as onscreen rendering.
Julien Gouesse | Personal blog | Website

Re: JOGL Updates ...

Sven Gothel
Administrator
On 03/26/2013 01:25 PM, gouessej [via jogamp] wrote:
>     Sven Gothel wrote
>     However, we can assume that FBO operations (alone) on modern GPUs is as fast
>     as onscreen rendering.
>
> Actually, no. You can assume you're right only with a very recent driver with
> a decent and recent Nvidia graphics card under Windows... and there are still
> some exceptions even with some Nvidia Quadro FX cards with validated drivers.
> On Intel "modern" GPUs, it is never as fast as onscreen rendering.

This is an interesting statement / issue.

Of course, comparing FBO and onscreen rendering performance
shall not include a final FBO to onscreen composition step,
but the FBO rendering alone.

This is not related to GLJPanel, since its FBO texture read
via the GLSL shader and the glReadPixels(..) operation
are of course very expensive.

Allow me to elaborate a bit on my experience w/ FBO
while noting the wording 'shall' and 'in theory' :)

While implementing our FBObject I also considered performance remarks
in the spec, which were mostly regarding FBO reconfiguration.
Meaning reconfiguration (size, depth, ..) of an FBO is expensive,
while attaching / detaching and switching an FBO _shall_ be fast in theory.

FB - Framebuffer
FBO - Framebuffer Object

CPU-Mem  - Shared memory accessible by CPU/GPU, may require DMA
GPU-Mem1 - Memory accessible by GPU and able to be shown onscreen
GPU-Mem2 - Memory accessible by GPU only

Knowing at least one implementation in detail,
the differences between rendering into an FBO and onscreen are:
  a - Onscreen FB memory: GPU-Mem1
  b - FBO's FB memory: GPU-Mem2
  c - Switching FBO's FB and onscreen FB shall be similar, since
      they are simple memory references onscreen.
  d - FBO's FB memory may require a texture format conversion,
      if its render attachment is a texture.
      This step would be required, if the texture's data format & type
      is different from the 'internal' GL impl. used format.
     
So technically speaking, there should be no performance impact,
if respecting above details.

IMHO especially the FBO reconfiguration and remark [d] are of interest here
and could be avoided.

To satisfy [d], on desktop GL we use:
            textureDataFormat = alpha ? GL.GL_BGRA : GL.GL_RGB;
            textureDataType = alpha ? GL2GL3.GL_UNSIGNED_INT_8_8_8_8_REV : GL.GL_UNSIGNED_BYTE;
Maybe we could do better ..

However, it would be interesting to add an FBO performance test to our unit tests
allowing us to collect more data in this regard.
Our current extensive FBO unit tests cover functionality, but it should be easy to add
some performance tests here.

It might be also interesting whether FBO usage has an additional impact
on GL context switching, see also [c].

This also reminds me of our little performance framework in jogl-demos,
which I added a few years ago; maybe we should pick that up and enhance it.

All in all .. very good point!

Cheers, Sven




Re: JOGL Updates ...

Sven Gothel
Administrator
In reply to this post by gouessej
On 03/26/2013 01:55 PM, Sven Gothel wrote:

> To satisfy [d], on desktop GL we use:
>             textureDataFormat = alpha ? GL.GL_BGRA : GL.GL_RGB;
>             textureDataType = alpha ? GL2GL3.GL_UNSIGNED_INT_8_8_8_8_REV : GL.GL_UNSIGNED_BYTE;
http://www.opengl.org/discussion_boards/showthread.php/166635-FBO-switching-overhead?p=1177270&viewfull=1#post1177270

> Maybe we could do better ..
>

http://stackoverflow.com/questions/2198541/what-is-the-best-way-to-handle-fbos-in-opengl

I concur with the logic of answer 2:

"As a matter of philosophy, modifying an object state requires that it be
re-validated. Instead, simply changing the object binding (that's already
valid from the previous frame) should be faster for the driver [1]."

But [re-]attaching another 'same size/config' rendertarget 'should be'
ok as well, i.e. it should avoid deep FBO validation.
Then .. the mentioned, probably driver-dependent, GL stream flush/sync could
harm performance. This of course also depends on _when_ you switch FBOs, see:

http://www.opengl.org/discussion_boards/showthread.php/166635-FBO-switching-overhead

.. a long discussion for sure.
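
The quoted reasoning can be put into a toy model: modifying an FBO marks it for re-validation at the next bind, while rebinding an unchanged FBO does not. This is plain Java with made-up names (`ToyFbo`, `FboSwitchDemo`) and says nothing about what a particular driver really does:

```java
// Toy model only: an object that must be re-validated after its state changed.
// ToyFbo and FboSwitchDemo are hypothetical, not JOGL's FBObject.
final class ToyFbo {
    private int width;
    private int height;
    private boolean dirty = true;   // new or modified state needs validation
    int validations = 0;            // how often the (expensive) validation ran

    ToyFbo(int width, int height) {
        this.width = width;
        this.height = height;
    }

    void reconfigure(int w, int h) {
        if (w != width || h != height) {
            width = w;
            height = h;
            dirty = true;
        }
    }

    void bind() {
        if (dirty) {                // assumed to be the expensive step
            validations++;
            dirty = false;
        }
    }
}

final class FboSwitchDemo {
    // One FBO, resized back and forth every frame.
    static int reconfigureEachFrame(int frames) {
        ToyFbo fbo = new ToyFbo(256, 256);
        for (int i = 0; i < frames; i++) {
            fbo.reconfigure(i % 2 == 0 ? 512 : 256, 256);
            fbo.bind();
        }
        return fbo.validations;
    }

    // Two FBOs configured once; each frame only changes the binding.
    static int rebindEachFrame(int frames) {
        ToyFbo a = new ToyFbo(512, 256);
        ToyFbo b = new ToyFbo(256, 256);
        for (int i = 0; i < frames; i++) {
            (i % 2 == 0 ? a : b).bind();
        }
        return a.validations + b.validations;
    }

    public static void main(String[] args) {
        System.out.println(reconfigureEachFrame(100));
        System.out.println(rebindEachFrame(100));
    }
}
```

In this model 100 frames cost 100 validations when reconfiguring and 2 when only rebinding; whether a real GL implementation behaves like that is exactly what a performance test would have to show.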



