AWT/JOGL Deadlock, Revisited

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

AWT/JOGL Deadlock, Revisited

rhatcher
Fedora 12/i386
Sun/Oracle 7u5
JogAmp RC9

This past week we switched our code base from JOGL 1.1.1a to JogAmp RC9.  With a little help from perl the conversion process was painless, and things are working very well in general.

However...

After the switch we are sporadically seeing a deadlock similar to the one described in this topic from a few weeks ago:  http://forum.jogamp.org/Deadlock-during-JOGL-initialization-tc4025625.html.

I wasn't sure what the protocol was on the forum, i.e. should I start a new topic or squat on the one mentioned above.  That topic ended without apparent resolution so I thought it might be better to start over.

The presentation is that perhaps a third of the time our viewer desktop application will deadlock at startup.  Drawing never starts, menus don't work, and the event queue seems to be hung up across the board.

At the end of the topic mentioned above Sven had asked for a jstack output from the locked process.  Ours looks like this:

stack.txt

The backtraces in the relevant areas are similar to those discussed in the other topic.  The deadlock says:

Found one Java-level deadlock:
=============================
"Timer-5":
  waiting for ownable synchronizer 0xac94ad18, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "AWT-EventQueue-0"
"AWT-EventQueue-0":
  waiting to lock monitor 0x09e2423c (object 0xac8d1da0, a sun.awt.PostEventQueue),
  which is held by "Timer-5"

In the other topic there is no obvious association with JogAmp in the classes apparently involved in the deadlock, and that's the case here too.  The only association is indirect in that the problem started immediately after we switched, and JogAmp JOGL just happens to get involved with AWT event stuff.

This could easily be something we're doing that JOGL was tolerating better but at the moment I don't have any guesses.  I'm going to try a new hotness dev build and see if the problem remains.

Any other ideas?
Reply | Threaded
Open this post in threaded view
|

Re: AWT/JOGL Deadlock, Revisited

gouessej
Administrator
Hi

Either it is fixed in the RC 10, or Sven already talked to me about this problem. Anyway, you should give us a small test case to help us to reproduce your bug if any.
Julien Gouesse | Personal blog | Website
Reply | Threaded
Open this post in threaded view
|

Re: AWT/JOGL Deadlock, Revisited

rhatcher
Hi Julien,

I wish I could whittle it down to something small, and if I manage to do that I will be happy to provide a test case.  Unfortunately the situation is complex and I suspect the complexity is a big part of the problem.

Since my original post the problem has changed a little bit.  I built GlueGen and JOGL from repositories I cloned last Friday 08/10/2012 (I guess this would be something slightly after RC10 by JogAmp's reckoning) to see if the deadlock issue went away.  So far I haven't seen any more deadlocks, or at least nothing that's being reported as a deadlock.

However...

I am still frequently seeing situations where no drawing takes place and the AWT event queue seems to be locked up across the board (for example menus don't work either) even though no deadlock is reported by jstack.

Every time this happens I can get in the debugger and find a timer thread that appears to be stuck waiting on a java.awt.EventQueue$1InvocationLock with this backtrace:

"Timer-6" prio=10 tid=0xa5319c00 nid=0x5fce in Object.wait() [0xa5d69000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xad29c688> (a java.awt.EventQueue$1AWTInvocationLock)
    at java.lang.Object.wait(Object.java:503)
    at java.awt.EventQueue.invokeAndWait(EventQueue.java:1232)
    - locked <0xad29c688> (a java.awt.EventQueue$1AWTInvocationLock)
    at jogamp.common.awt.AWTEDTExecutor.invoke(AWTEDTExecutor.java:53)
    at jogamp.opengl.awt.AWTThreadingPlugin.invokeOnOpenGLThread(AWTThreadingPlugin.java:103)
    at jogamp.opengl.ThreadingImpl.invokeOnOpenGLThread(ThreadingImpl.java:206)
    at javax.media.opengl.Threading.invokeOnOpenGLThread(Threading.java:172)
    at javax.media.opengl.Threading.invoke(Threading.java:191)
    at javax.media.opengl.awt.GLCanvas.display(GLCanvas.java:428)
    at com.jogamp.opengl.util.AWTAnimatorImpl.display(AWTAnimatorImpl.java:74)
    at com.jogamp.opengl.util.AnimatorBase.display(AnimatorBase.java:142)
    at com.zedasoft.cba.view.DisplaySetAnimator.display(DisplaySetAnimator.java:105)
    at com.jogamp.opengl.util.FPSAnimator$1.run(FPSAnimator.java:128)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)

Here's the full jstack output from one failed run: jstack2.out

My observation is that once it's in this condition it never recovers.  Just for good measure I set a breakpoint higher up in the backtrace to verify that it's never returning from the display call.

I've been poking around in the lower levels of all this stuff for most of the day trying to make sense of where it's getting hung up.  The problem happens relatively frequently, so if you have any suggestions about ways I might trap the root cause I will be glad to try them.  Maybe that will help me learn enough to produce a smaller example.

Reply | Threaded
Open this post in threaded view
|

Re: AWT/JOGL Deadlock, Revisited

Sven Gothel
Administrator
On 08/13/2012 10:28 PM, rhatcher [via jogamp] wrote:

> Hi Julien,
>
> I wish I could whittle it down to something small, and if I manage to do that
> I will be happy to provide a test case.  Unfortunately the situation is
> complex and I suspect the complexity is a big part of the problem.
>
> Since my original post the problem has changed a little bit.  I built GlueGen
> and JOGL from repositories I cloned last Friday 08/10/2012 (I guess this would
> be something slightly after RC10 by JogAmp's reckoning) to see if the deadlock
> issue went away.  So far I haven't seen any more deadlocks, or at least
> nothing that's being reported as a deadlock.
Yes, RC10 (latest sources) includes the deadlock fix.

>
> However...
>
> I am still frequently seeing situations where no drawing takes place and the
> AWT event queue seems to be locked up across the board (for example menus
> don't work either) even though no deadlock is reported by jstack.
>
> Every time this happens I can get in the debugger and find a timer thread that
> appears to be stuck waiting on a java.awt.EventQueue$1InvocationLock with this
> backtrace:
>
> "Timer-6" prio=10 tid=0xa5319c00 nid=0x5fce in Object.wait() [0xa5d69000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0xad29c688> (a java.awt.EventQueue$1AWTInvocationLock)
Your jstack shows:

"AWT-EventQueue-0" prio=10 tid=0xa668e800 nid=0x5f7e runnable [0xa635c000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.StringBuilder.append(StringBuilder.java:132)
        at java.lang.Object.toString(Object.java:237)
        at java.lang.String.valueOf(String.java:2902)

.. so for some reason it produces a String for-ever it seems like ?

>     at java.lang.Object.wait(Object.java:503)
>     at java.awt.EventQueue.invokeAndWait(EventQueue.java:1232)
>     - locked <0xad29c688> (a java.awt.EventQueue$1AWTInvocationLock)
>     at jogamp.common.awt.AWTEDTExecutor.invoke(AWTEDTExecutor.java:53)
>     at
> jogamp.opengl.awt.AWTThreadingPlugin.invokeOnOpenGLThread(AWTThreadingPlugin.java:103)
>
>     at jogamp.opengl.ThreadingImpl.invokeOnOpenGLThread(ThreadingImpl.java:206)
>     at javax.media.opengl.Threading.invokeOnOpenGLThread(Threading.java:172)
>     at javax.media.opengl.Threading.invoke(Threading.java:191)
>     at javax.media.opengl.awt.GLCanvas.display(GLCanvas.java:428)
>     at com.jogamp.opengl.util.AWTAnimatorImpl.display(AWTAnimatorImpl.java:74)
>     at com.jogamp.opengl.util.AnimatorBase.display(AnimatorBase.java:142)
>     at
> com.zedasoft.cba.view.DisplaySetAnimator.display(DisplaySetAnimator.java:105)
Minor note: Looks like you override AnimatorBase.display() w/ your
DisplaySetAnimator.display().

>     at com.jogamp.opengl.util.FPSAnimator$1.run(FPSAnimator.java:128)
>     at java.util.TimerThread.mainLoop(Timer.java:555)
>     at java.util.TimerThread.run(Timer.java:505)
>
> Here's the full jstack output from one failed run: jstack2.out
> <http://forum.jogamp.org/file/n4025798/jstack2.out>
>
> My observation is that once it's in this condition it never recovers.  Just
> for good measure I set a breakpoint higher up in the backtrace to verify that
> it's never returning from the display call.
>
> I've been poking around in the lower levels of all this stuff for most of the
> day trying to make sense of where it's getting hung up.  The problem happens
> relatively frequently, so if you have any suggestions about ways I might trap
> the root cause I will be glad to try them.  Maybe that will help me learn
> enough to produce a smaller example.
See above .. AWT EDT seems to be busy w/ producing a string ?

We would need a small unit test to reproduce the issue of yours
and to see whether it's JOGL who causes this 'overload'.
IMHO .. it's not.

~Sven


signature.asc (910 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: AWT/JOGL Deadlock, Revisited

rhatcher
> Your jstack shows:
>
> "AWT-EventQueue-0" prio=10 tid=0xa668e800 nid=0x5f7e runnable [0xa635c000]
>   java.lang.Thread.State: RUNNABLE
>        at java.lang.StringBuilder.append(StringBuilder.java:132)
>        at java.lang.Object.toString(Object.java:237)
>        at java.lang.String.valueOf(String.java:2902)
>
> .. so for some reason it produces a String for-ever it seems like ?

Even though everything in the UI has stalled, thread AWT-EventQueue-0 does not appear to be stuck - or at least not totally stuck.  It just happened to be at that spot at the time I ran jstack.  I can resume the JVM in the debugger then suspend it again and AWT-EventQueue-0 will sometimes stop with a different stack trace.  The stack trace in the previous jstack output is rooted in a logging statement in java.awt.EventDispatchThread.addEventFilter:
        eventLog.finest("adding the event filter: " + filter);
The String creation is a side effect of "filter" being converted to String form, and ultimately leads to the default Object toString() method.  It looks harmless on the surface.


> Minor note: Looks like you override AnimatorBase.display() w/ your
> DisplaySetAnimator.display().

This is a remnant from our previous code base that was based on 1.1.1a.  There are some things that happen before and after super.display().  I commented out this overridden display() method and got the same behavior.


> We would need a small unit test to reproduce the issue of yours
> and to see whether it's JOGL who causes this 'overload'.
> IMHO .. it's not.

I would want the same thing.  Not sure if I'll be able to reproduce it on a small scale, but I'll see what I can do.

I will be the first to admit that this problem almost certainly is something in our code base that 1.1.1a was more forgiving of.  The main reason I'm consulting the forum is to get some thoughts from people who know a lot more about the internals of AWT event dispatching than I do (and to hopefully increase the chances that I won't have to discard a week of work and go back to using 1.1.1a, which would get us nowhere :-/ ).

Any time this hang happens I can always find a timer thread stuck at the lock.wait() call in java.awt.EventQueue's invokeAndWait method:

   synchronized (lock) {
            Toolkit.getEventQueue().postEvent(event);
            while (!event.isDispatched()) {
                lock.wait();
            }
        }

That spot - and the entire stack trace for that particular timer thread - is not unusual per se.  On any random suspension of an app that's working ok this thread is likely to stop at this spot.  However, when these hangs happen the debugger shows that the "while" loop is never exiting, which obviously suggests that wait() never returns and/or isDispatched() never returns true.

The event is an InvocationEvent.  Looking at InvocationEvent's dispatch() method it looks like the most likely causes of being stuck in the "while" loop above would be that either the event's dispatch() method is never called, or dispatch() is called but the runnable's run() method is never completing.  In either case the lock/notifier's notifyAll() would not be called and the InvocationEvent would never be marked as dispatched.

In this case the runnable appears to be "displayOnEDTAction" from GLCanvas.  The first few lines of this runnable look like this.

  private final Runnable displayOnEDTAction = new Runnable() {
    @Override
    public void run() {
        final RecursiveLock _lock = lock;
        _lock.lock();
        try {            
            if( validateGLDrawable() ) {
                helper.invokeGL(drawable, context, displayAction, initAction);

Maybe Eclipse's debugger is misleading me, but I can't find this runnable on any backtrace in the total set of threads, so I'm not sure if it's still waiting to run, is running and stuck, or has already completed.  The lock appears to have a relatively short timeout on it so I would think if it ran at all it would complete at some point.

I added -Djogl.debug=true to the JVM args and re-ran this a few times.  I've uploaded that output here in case it might reveal something to a more informed eye.  Note that I added an annotation in there to show where the output stops in the cases where the app hangs.

I'm continuing to pound of this of course, but if anybody has any further suggestions I'd like to hear them.