JOGL Updates ...

Sven Gothel

JOGL Updates ...

Administrator

JOGL Updates

Brainstorming some JOGL updates for all not reading the git logs.

- Maven 2.0-rc11post08
- Latest aggregated build

- GLEventListenerState / GLStateKeeper (Bug 665 GLContext/GLDrawable re-association)
  - Preservation of GLEventListenerState at followup destruction,
    restore it at next creation - using GLStateKeeper interface.

  - GLStateKeeper interface is implemented and fully functional
    w/ GLWindow.

- Exclusive Context Thread (ECT) via AnimatorBase and GLAutoDrawable:
  - [get|set]ExclusiveContextThread(..)
  - See unit tests TestGearsES2NEWT, TestExclusiveContext*
  - On certain GL impl, context switch is still expensive,
    ECT allows you to keep a single context current.
  - git sha1 224fab1b2c71464826594740022fdcbe278867dc
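
To illustrate the idea behind ECT only: a minimal plain-Java sketch of one thread claiming exclusive ownership and every other thread being rejected. This is not JOGL code; the class ExclusiveContextSketch and its methods claim, release and checkCurrentThread are hypothetical names made up for this example. The real API is the [get|set]ExclusiveContextThread(..) pair listed above.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of a context that one thread may claim exclusively (not JOGL API). */
final class ExclusiveContextSketch {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Claims the context for t; returns false if a different thread already owns it. */
    boolean claim(Thread t) {
        return owner.compareAndSet(null, t) || owner.get() == t;
    }

    /** Releases the claim; succeeds only if t is the current owner. */
    boolean release(Thread t) {
        return owner.compareAndSet(t, null);
    }

    Thread getOwner() {
        return owner.get();
    }

    /** Guard to call before any GL work: fails fast when called off the owner thread. */
    void checkCurrentThread() {
        Thread o = owner.get();
        if (o != null && o != Thread.currentThread()) {
            throw new IllegalStateException("context owned by " + o.getName()
                    + ", called from " + Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExclusiveContextSketch ctx = new ExclusiveContextSketch();
        ctx.claim(Thread.currentThread());
        ctx.checkCurrentThread(); // passes: the calling thread is the owner

        Thread stray = new Thread(() -> {
            try {
                ctx.checkCurrentThread(); // rejected: wrong thread
            } catch (IllegalStateException e) {
                System.out.println("caught: " + e.getMessage());
            }
        }, "stray-thread");
        stray.start();
        stray.join();

        ctx.release(Thread.currentThread());
    }
}
```

Whether the real ECT rejects foreign threads in exactly this way is an assumption of the sketch; the referenced unit tests are the authoritative description.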

- GLJPanel
  - Uses FBO for offscreen rendering
  - Uses GLSL texture vertical flip if available
  - git sha1 e92823cddc54b0f4fa71e234061a21de6ee5248c
             59a1ab0312492a251a0efc700d040a5f71e88611
             d143475e995e473c142fd34be2af6521246f014a

- OSX Enhancements
  - Java7 build incl. removal of Java6 dependencies
  - CALayer self-contained layout fix
    - fixes [most of] the misplaced CALayer bugs
    - HELP: Need support detecting remaining bugs!
  - Perform all main-thread tasks (CALayer and NEWT)
    w/o infinite blocking.
    Impl. 'streams' commands to main-thread
    while attempting to determine desired states in an async fashion.

- NEWT MouseEvent
  - enhancing rotation API / semantics

- NEWT KeyEvent (Bug 678, 641 and 688)
  - enhancing keyCode, keyChar - adding keySymbol semantics
  - deprecated: KEY_TYPED

- NEWT/Android Enhancements
  - more reliable rotation/scroll gesture detection,
    i.e. 2-finger scroll -> NEWT's rotation event.

  - demonstrating w/ GearsES2
    - 2 finger pinch zoom, fast zoom w/ a 3rd finger
    - 2 finger (close to each other) rotation/scroll
    - 1 finger drag rotation
    - keyboard visible now via 4 finger pressure > 0.7f

  - Pause w/o finish(), i.e. Home or Menu
    - Using GLEventListenerState / GLStateKeeper, see above

  - Map KEYCODE_BACK semantics, either (Bug 677):
    - keyboard invisible, or
    - send to KeyListener:
      - consumed by NEWT KeyListener, or
      - activity.finish()

  - Proper pixel format selection
  - git sha1 85d70b7d38885fa8ba6374aa790d5a296acc8ec1
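
The KEYCODE_BACK mapping listed above can be read as a small decision chain. The following is a sketch of that reading only, not NEWT's code: the class BackKeySketch, the enum Action and the method onBack are hypothetical, and the order of the checks (keyboard first, then listener, then finish) is an assumption taken from the order of the bullets.

```java
/** Hypothetical model of the KEYCODE_BACK mapping; names are made up, not NEWT API. */
final class BackKeySketch {
    enum Action { HIDE_KEYBOARD, CONSUMED_BY_LISTENER, FINISH_ACTIVITY }

    /**
     * @param keyboardVisible  whether the soft keyboard is currently shown
     * @param listenerConsumes whether a KeyListener would consume the event
     */
    static Action onBack(boolean keyboardVisible, boolean listenerConsumes) {
        if (keyboardVisible) {
            return Action.HIDE_KEYBOARD;        // "keyboard invisible"
        }
        if (listenerConsumes) {
            return Action.CONSUMED_BY_LISTENER; // "consumed by NEWT KeyListener"
        }
        return Action.FINISH_ACTIVITY;          // "activity.finish()"
    }
}
```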

- PNGJ Updates
  - interlace support
  - palette/indexed support

- GlueGen RecursiveLock
  - fix deadlock, corner case of TO reached but lock not acquired
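
For readers unfamiliar with the "timeout reached but lock not acquired" corner case: a minimal sketch of the general pattern, using the JDK's java.util.concurrent.locks.ReentrantLock rather than GlueGen's RecursiveLock. The class TimedLockSketch and its methods are hypothetical and say nothing about how the GlueGen fix itself is implemented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of handling a timed lock attempt that may fail. */
final class TimedLockSketch {

    /** Runs task under the lock if acquired within timeoutMs; returns whether it ran. */
    static boolean runLocked(ReentrantLock lock, long timeoutMs, Runnable task) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        if (!acquired) {
            // Timeout reached and the lock is NOT held: do not run, do not unlock.
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            lock.unlock(); // only reached when the lock was really acquired
        }
    }

    /** Lets a helper thread hold the lock, then tries to acquire it with a timeout. */
    static boolean tryWhileContended(long timeoutMs) {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown(); // signal that the lock is taken
                done.await();     // keep holding until the attempt has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        try {
            held.await();
            return runLocked(lock, timeoutMs, () -> { });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            done.countDown(); // let the holder thread release and exit
        }
    }
}
```

The point of the sketch is the early return: a caller that ignores the failed attempt and later unlocks or waits as if it owned the lock is one way such a corner case can turn into a hang.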

.. and more detailed changes, since ..

~Sven

hharrison

Re: JOGL Updates ...

Sven,

Any chance you can expand on the following:

> - Exclusive Context Thread (ECT) via AnimatorBase and GLAutoDrawable:
> - [get|set]ExclusiveContextThread(..)
> - See unit tests TestGearsES2NEWT, TestExclusiveContext*
> - On certain GL impl, context switch is still expensive,
> ECT allows you to keep a single context current.
> - git sha1 224fab1b2c71464826594740022fdcbe278867dc

This sounds ideal for our use as we are already pushing all GL ops from a single thread
and if this offers a safety valve to keep us from accidentally making changes that break
that assumption, we'd like to catch it early.

Harvey

Sven Gothel

Re: JOGL Updates ...

Administrator

On 03/25/2013 04:25 AM, hharrison [via jogamp] wrote:

> Sven,
>
> Any chance you can expand on the following:
>
>> - Exclusive Context Thread (ECT) via AnimatorBase and GLAutoDrawable:
>> - [get|set]ExclusiveContextThread(..)
>> - See unit tests TestGearsES2NEWT, TestExclusiveContext*
>> - On certain GL impl, context switch is still expensive,
>> ECT allows you to keep a single context current.
>> - git sha1 224fab1b2c71464826594740022fdcbe278867dc

Check the referenced unit tests; together w/ the API doc they should be self-explanatory.
Sure, if you have questions, please shoot!
Otherwise, I don't know how to 'expand' (-> elaborate?).

>
> This sounds ideal for our use as we are already pushing all GL ops from a single thread
> and if this offers a safety valve to keep us from accidentally making changes that break
> that assumption, we'd like to catch it early.

Great!

~Sven

>
> Harvey

robbiezl

Re: JOGL Updates ...

In reply to this post by Sven Gothel

GLJPanel
- Uses FBO for offscreen rendering
- Uses GLSL texture vertical flip if available
--------------------------------------------------------------

Does this bring the fps up to the level you get with GLCanvas?

gouessej

Re: JOGL Updates ...

Administrator

It doesn't concern AWT GLCanvas which already works reliably and sometimes (often?) faster than GLJPanel.

Julien Gouesse | Personal blog | Website

Administrator

Sven Gothel wrote

However, we can assume that FBO operations (alone) on modern GPUs are as fast
as onscreen rendering.

Actually, no. You can assume you're right only with a very recent driver with a decent and recent Nvidia graphics card under Windows... and there are still some exceptions even with some Nvidia Quadro FX cards with validated drivers. On Intel "modern" GPUs, it is never as fast as onscreen rendering.

Julien Gouesse | Personal blog | Website

Sven Gothel

Re: JOGL Updates ...

Administrator

On 03/26/2013 01:25 PM, gouessej [via jogamp] wrote:
> Sven Gothel wrote
> However, we can assume that FBO operations (alone) on modern GPUs are as fast
> as onscreen rendering.
>
> Actually, no. You can assume you're right only with a very recent driver with
> a decent and recent Nvidia graphics card under Windows... and there are still
> some exceptions even with some Nvidia Quadro FX cards with validated drivers.
> On Intel "modern" GPUs, it is never as fast as onscreen rendering.

This is an interesting statement / issue.

Of course, comparing FBO and onscreen rendering performance
shall not include a final FBO to onscreen composition step,
but the FBO rendering alone.

This is not related to GLJPanel, since its FBO texture reading
via the GLSL shader and the glReadPixels(..) operation
are of course very expensive.

Allow me to elaborate a bit on my experience w/ FBO
while noting the wording 'shall' and 'in theory' :)

While implementing our FBObject I also considered the performance remarks
in the spec, which were mostly regarding FBO reconfiguration.
Meaning reconfiguration (size, depth, ..) of an FBO is expensive,
while attaching / detaching and switching an FBO _shall_ be fast in theory.

FB - Framebuffer
FBO - Framebuffer Object

CPU-Mem - Shared memory accessible by CPU/GPU, may require DMA
GPU-Mem1 - Memory accessible by GPU and able to be shown onscreen
GPU-Mem2 - Memory accessible by GPU only

Knowing at least one implementation in detail,
the differences between rendering into an FBO and onscreen are:
  a - Onscreen FB memory: GPU-Mem1
  b - FBO's FB memory: GPU-Mem2
  c - Switching FBO's FB and onscreen FB shall be similar, since
      they are simple memory references onscreen.
  d - FBO's FB memory may require a texture format conversion,
      if its render attachment is a texture.
      This step would be required, if the texture's data format & type
      is different from the 'internal' GL impl. used format.

So technically speaking, there should be no performance impact,
if the above details are respected.

IMHO especially the FBO reconfiguration and remark [d] are of interest here
and could be avoided.

To satisfy [d], on desktop GL we use:
    textureDataFormat = alpha ? GL.GL_BGRA : GL.GL_RGB;
    textureDataType = alpha ? GL2GL3.GL_UNSIGNED_INT_8_8_8_8_REV : GL.GL_UNSIGNED_BYTE;
Maybe we could do better ..
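
The selection above, written out as a standalone sketch so it can be tried without JOGL on the classpath. The class FboTextureFormatSketch and its method names are hypothetical; the constants are the standard OpenGL enum values, hard-coded here only to avoid the JOGL dependency.

```java
/** Hypothetical standalone version of the format/type selection quoted above. */
final class FboTextureFormatSketch {
    // Standard OpenGL enum values, hard-coded so the sketch has no JOGL dependency.
    static final int GL_RGB = 0x1907;
    static final int GL_BGRA = 0x80E1;
    static final int GL_UNSIGNED_BYTE = 0x1401;
    static final int GL_UNSIGNED_INT_8_8_8_8_REV = 0x8367;

    /** Data format: BGRA when an alpha channel is requested, RGB otherwise. */
    static int textureDataFormat(boolean alpha) {
        return alpha ? GL_BGRA : GL_RGB;
    }

    /** Data type: packed 8_8_8_8_REV for BGRA, plain unsigned bytes for RGB. */
    static int textureDataType(boolean alpha) {
        return alpha ? GL_UNSIGNED_INT_8_8_8_8_REV : GL_UNSIGNED_BYTE;
    }
}
```

Whether this pair really matches a given driver's internal format, and so avoids the conversion from [d], is implementation dependent and would have to be measured.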

However, it would be interesting to add an FBO performance test to our unit tests,
allowing us to collect more data in this regard.
Our current extensive FBO unit tests cover functionality, but it should be easy to add
some performance tests here.
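
As a rough idea of what the timing core of such a performance test could look like: a minimal sketch that averages the wall-clock time of a repeated task after a warm-up phase. The class FrameTimerSketch and the method averageMillis are hypothetical and not part of the existing unit tests. In a real test the Runnable would render one frame into the FBO or onscreen and would presumably need to end with glFinish() so that GPU work is included in the measurement.

```java
/** Hypothetical timing helper for comparing two render paths. */
final class FrameTimerSketch {

    /**
     * Runs frame 'warmup' times untimed, then 'frames' times timed.
     * @return average wall-clock time per timed run, in milliseconds
     */
    static double averageMillis(Runnable frame, int warmup, int frames) {
        if (frames <= 0) {
            throw new IllegalArgumentException("frames must be > 0");
        }
        for (int i = 0; i < warmup; i++) {
            frame.run(); // warm-up: JIT compilation, driver caches, first-use allocations
        }
        long t0 = System.nanoTime();
        for (int i = 0; i < frames; i++) {
            frame.run();
        }
        long t1 = System.nanoTime();
        return (t1 - t0) / 1e6 / frames;
    }
}
```

Calling averageMillis once with an FBO render path and once with an onscreen path, on the same scene and drawable size, would give the two numbers to compare.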

It might also be interesting whether FBO usage has an additional impact
on GL context switching, see also [c].

This also reminds me of our little performance framework in jogl-demos
that I added a few years ago; maybe we should pick that up and enhance it.

All in all .. very good point!

Cheers, Sven

Sven Gothel

Re: JOGL Updates ...

Administrator

In reply to this post by gouessej

On 03/26/2013 01:55 PM, Sven Gothel wrote:

> To satisfy [d], on desktop GL we use:
> textureDataFormat = alpha ? GL.GL_BGRA : GL.GL_RGB;
> textureDataType = alpha ? GL2GL3.GL_UNSIGNED_INT_8_8_8_8_REV : GL.GL_UNSIGNED_BYTE;

http://www.opengl.org/discussion_boards/showthread.php/166635-FBO-switching-overhead?p=1177270&viewfull=1#post1177270

> Maybe we could do better ..
>

http://stackoverflow.com/questions/2198541/what-is-the-best-way-to-handle-fbos-in-opengl

Where I concur with the logic of answer 2:

"As a matter of philosophy, modifying an object state requires that it be
re-validated. Instead, simply changing the object binding (that's already
valid from the previous frame) should be faster for the driver [1]."

But [re-]attaching another 'same size/config' render target 'should be'
ok as well, i.e. it should avoid deep FBO validation.
Then .. the mentioned, probably driver dependent, GL stream flush/sync could
harm performance. This of course also depends on _when_ you switch FBOs:

http://www.opengl.org/discussion_boards/showthread.php/166635-FBO-switching-overhead

.. a long discussion for sure.
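
To make the "bind, don't modify" argument concrete: a plain-Java model that keeps one already-validated FBO per configuration and only switches the binding afterwards. No GL calls are made. The class FboCacheSketch is hypothetical, and the validation counter merely stands in for whatever revalidation cost a driver may have.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: one pre-validated FBO per configuration, switched by binding only. */
final class FboCacheSketch {
    private final Map<List<Integer>, Integer> byConfig = new HashMap<>();
    private int nextId = 1;
    private int validations = 0;
    private int bound = 0;

    /** Binds the FBO for this configuration, creating and 'validating' it on first use only. */
    int bind(int width, int height, int depthBits) {
        List<Integer> key = Arrays.asList(width, height, depthBits);
        Integer id = byConfig.get(key);
        if (id == null) {
            id = nextId++;
            validations++; // stands in for the costly validation of a new or modified FBO
            byConfig.put(key, id);
        }
        bound = id; // switching the binding is modelled as a cheap reference change
        return id;
    }

    int validations() {
        return validations;
    }

    int bound() {
        return bound;
    }
}
```

Alternating between two sizes with this model validates twice in total, while a single FBO that is reconfigured on every switch would be revalidated each time. How large that difference is on real drivers is exactly what the linked discussion argues about.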
