
Linux ARM freezes (Java, EGL/ES, JOGL)

Posted by Sven Gothel on Mar 05, 2012; 5:20pm
URL: https://forum.jogamp.org/Linux-ARM-freezes-Java-EGL-ES-JOGL-tp3801301.html

Dear Xerxes, dear All,

allow me to summarize my findings about the Linux ARM freezes.

+++

Phenomenon:

The freeze I am reporting is characterized by

  - hanging java process

  - the command 'ps ax' hangs before the line
    where it would presumably report the java process

  - syslog message:

  - 'kill -9 <PID>' doesn't work

  - reboot freezes as well, the reset button needs to be pressed
 
This is different from an implementation error, e.g. a 'software deadlock',
since such an error would not affect the overall system
and the user process would remain interruptible.
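These symptoms (an unkillable process, with 'ps ax' stalling once it reaches it) resemble a task stuck in uninterruptible sleep, though that is an assumption and not something confirmed here. A minimal sketch for checking a task's scheduler state by reading /proc/<PID>/status directly instead of going through 'ps'; proc_state is a hypothetical helper name, and whether this read still succeeds during the freeze is untested:

```shell
# Hypothetical helper: print the one-letter scheduler state of a PID,
# e.g. R (running), S (sleeping), D (uninterruptible sleep), Z (zombie).
# Reads only /proc/<PID>/status.
# Usage: proc_state PID
proc_state() {
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}
```

A 'D' that persists across repeated calls would point at the kernel or driver side, not at a user-space deadlock.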
+++

The native es2redsquare hasn't frozen the machine so far,
even after 800 loops run from the shell:
  cd ./jogl/src/test/native/mesa-demos-patched
  bash make.sh es2redsquare.c
  bash shell_loop.sh
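The contents of shell_loop.sh are not reproduced in this post. A minimal sketch of such a shell-level loop, assuming a hypothetical helper run_loop that takes an iteration count and a command, might look like this. Each iteration starts a fresh process, and the loop stops at the first failure so the iteration number is visible:

```shell
# Hypothetical helper, not the actual shell_loop.sh from the JOGL tree.
# Usage: run_loop COUNT COMMAND [ARGS...]
# Runs COMMAND in a fresh process COUNT times, stopping at the first failure.
run_loop() {
    local count="$1"
    shift
    local i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            echo "iteration $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count iterations"
}

# Example (binary name assumed to match the compiled source file):
#   run_loop 800 ./es2redsquare
```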

+++

TestRedSquareES2NEWT or TestGearsES2NEWT
with '-loops 1000 -loop-shutdown 1 -time 100' doesn't freeze either.

Note: '-loop-shutdown 2' triggers a bug in EGL: eglGetDisplay(..) sometimes
fails, probably due to some EGL race condition.

+++

A recent test of 'shell' loops w/ TestRedSquareES2NEWT or TestGearsES2NEWT
and the args '-loops 1 -time 100' didn't freeze the machines either,
tested a few times with up to ~250 iterations.

+++

Platform-1a + Platform-2:

The remote NEWT-only unit tests pass properly on the 1st run.
For this, you have to remove the AWT*NEWT* test collection manually
from the junit.run.remote.ssh target in build-test.xml.

However, a 2nd run freezes the machines (pandaboard/ac100)
at an arbitrary test.

Running all remote unit tests (default) freezes both machines
within the 'AWT/NEWT tests', which come after the NEWT-only tests.

+++

Platform-1b:

Running the NEWT unit tests, occasional 'hangs' occur in:
  'jogamp.opengl.x11.glx.GLX.dispatch_glXMakeContextCurrent1'

'ps ax' works and discloses the PID,
which can be killed via 'kill -9 <PID>'.

The unit tests then continue properly.
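Because this kind of hang can still be killed, an unattended run could recover from it with a timeout. A minimal sketch using the coreutils 'timeout' command; run_with_watchdog is a hypothetical wrapper and the 300 second limit in the example is an arbitrary assumption. It would not help with the hard freeze reported for the other platforms, where 'kill -9' has no effect.

```shell
# Hypothetical wrapper: run COMMAND, sending SIGKILL if it exceeds SECS seconds.
# Usage: run_with_watchdog SECS COMMAND [ARGS...]
# Returns the command's own exit status, or a non-zero status if it was killed.
run_with_watchdog() {
    local secs="$1"
    shift
    timeout -s KILL "$secs" "$@"
}

# Example:
#   run_with_watchdog 300 <unit test command>
```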

+++

This has been reproduced w/ OpenJDK
  - IcedTea6 1.11pre (6b23~pre11-0ubuntu1.11.10.2) +
    JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)

  - Oracle J2SE/JRE build 1.6.0_30-b12 +
    Java HotSpot(TM) Client VM (build 20.5-b03, mixed mode)


Platform-1a:
  - Pandaboard ES (Omap4)

  - Ubuntu 11.10

  - GLX and Mesa3D Software 'enabled'

  - EGL/ES: pvr-omap4 1.7.10.0.1.9-1

  - Linux panda01 3.1.0-1282-omap4 #11-Ubuntu SMP PREEMPT Mon Feb 13 15:38:55
    UTC 2012 armv7l armv7l armv7l GNU/Linux


Platform-1b:
  - Pandaboard ES (Omap4)

  - Ubuntu 11.10

  - GLX and Mesa3D Software 'enabled'

  - EGL/ES: disabled (moved libEGL* libGLESv* away)

  - Linux panda01 3.1.0-1282-omap4 #11-Ubuntu SMP PREEMPT Mon Feb 13 15:38:55
    UTC 2012 armv7l armv7l armv7l GNU/Linux
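
Moving the EGL/ES libraries away, as done for Platform-1b, can be sketched as below. Both directories are hypothetical parameters, since the actual library location is not given here; on a real system this would need root privileges, and BACKUP_DIR is assumed to lie outside the library search path.

```shell
# Hypothetical helper: move libEGL* and libGLESv* from SRC_DIR to BACKUP_DIR.
# Moving the files back re-enables EGL/ES.
# Usage: stash_egl_libs SRC_DIR BACKUP_DIR
stash_egl_libs() {
    local src="$1"
    local backup="$2"
    local f
    mkdir -p "$backup" || return 1
    for f in "$src"/libEGL* "$src"/libGLESv*; do
        # An unmatched glob stays literal; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        mv "$f" "$backup"/ || return 1
    done
}
```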



Platform-2:
  - Toshiba AC100 (Tegra2)

  - Ubuntu 11.10

  - GLX and Mesa3D Software 'enabled'

  - EGL/ES: nvidia-tegra 12~beta1-0ubuntu1

  - Linux jautab02 2.6.38-1001-ac100 #2-Ubuntu SMP PREEMPT Tue Dec 20 08:05:25
    UTC 2011 armv7l armv7l armv7l GNU/Linux

+++

The freeze occurs at arbitrary points.
It rarely happens within the demo code's call of EGLContextImpl.makeCurrent(),
but more often before test setup or finish w/o any EGL/ES calls involved.

+++

Both platforms have a similar, if not identical, package setup.

They differ in their:
  - Linux kernel
  - EGL/ES driver.

Since neither the internal loop nor the native test
could reproduce this freeze,
one could assume that the EGL/ES drivers are not the culprit.

This assumption may also be deduced from the fact that platform-1a
and platform-2 use different EGL/ES drivers.

However, platform-1b (software OpenGL) does not freeze,
hence some interaction between the hardware path and Java might
cause the problem.

The common ground on all freezing platforms is the
Xorg server/client, besides the other generic dependencies.

The Xorg server/client is treated differently
when using software OpenGL or proprietary EGL/ES.

+++

Cause: TBD

~Sven

