Linux ARM freezes (Java, EGL/ES, JOGL)

Sven Gothel
Administrator
Dear Xerxes, dear All,

allow me to summarize my findings about the Linux ARM freezes.

+++

Phenomenon:

The freeze I am reporting is characterized by

  - hanging java process

  - the command 'ps ax' hangs just before the line
    where it would presumably report the java process

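  - syslog message: 'INFO: task java:<PID> blocked for more than 120 seconds.'
    (full kernel trace in the follow-up post)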

  - 'kill -9 <PID>' doesn't work

  - reboot freezes as well, the reset button needs to be pressed
 
This is different from an implementation error, e.g. a 'software deadlock',
since such a deadlock would not affect the overall system
and the user process would remain interruptible.
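
The symptoms above ('kill -9' has no effect, 'ps ax' hangs) are consistent
with a task stuck in uninterruptible sleep (state D). A minimal sketch for
listing such tasks without 'ps' follows. It assumes /proc/<pid>/stat stays
readable while the task is blocked, which has not been verified on the
affected machines; stat_state and list_blocked are hypothetical helper names.

```bash
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of JOGL or its tests.

# stat_state LINE: print the state letter of a /proc/<pid>/stat line.
# The comm field may contain spaces and parentheses, so everything up to
# the last ") " is dropped before the state is read.
stat_state() {
    local rest=${1##*) }
    echo "${rest%% *}"
}

# list_blocked: print the PID of every task currently in state D.
list_blocked() {
    local f line
    for f in /proc/[0-9]*/stat; do
        # A task may exit between the glob and the read; skip it then.
        { IFS= read -r line < "$f"; } 2>/dev/null || continue
        if [ "$(stat_state "$line")" = "D" ]; then
            echo "${line%% *}"
        fi
    done
    return 0
}

# Usage: list_blocked
```

A task that shows up briefly in state D is normal (e.g. during disk I/O);
only one that stays there indefinitely matches the freeze described here.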
+++

The native es2redsquare has not frozen the machine so far,
e.g. over 800 loops run from the shell:
  cd ./jogl/src/test/native/mesa-demos-patched
  bash make.sh es2redsquare.c
  bash shell_loop.sh
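
For illustration, a loop of this kind could be sketched as follows. The
contents of shell_loop.sh are not shown in this post, so run_loop and the
./es2redsquare binary name are assumptions, not the actual script.

```bash
#!/bin/bash
# Hypothetical sketch; this is NOT the real shell_loop.sh.

# run_loop COUNT CMD [ARGS...]: run CMD up to COUNT times, stop at the
# first non-zero exit status, and print the number of completed runs.
run_loop() {
    local count=$1; shift
    local i
    for ((i = 1; i <= count; i++)); do
        if ! "$@" >/dev/null 2>&1; then
            echo "failed at iteration $i" >&2
            echo $((i - 1))
            return 1
        fi
    done
    echo "$count"
}

# Usage (binary name assumed): run_loop 800 ./es2redsquare
```

Note that a real freeze would not show up as a failed iteration: the loop
itself would simply stop making progress.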

+++

TestRedSquareES2NEWT or TestGearsES2NEWT
with '-loops 1000 -loop-shutdown 1 -time 100' don't freeze either.

Note: '-loop-shutdown 2' triggers a bug in EGL: eglGetDisplay(..) sometimes
fails, probably due to some EGL race condition?

+++

Recent tests of 'shell' loops w/ TestRedSquareES2NEWT or TestGearsES2NEWT
and the args '-loops 1 -time 100' didn't freeze the machines either,
tested a few times up to ~250 iterations.

+++

Platform-1a + Platform-2:

The remote NEWT unit tests pass properly the 1st time.
To run them alone, you have to remove the AWT*NEWT* test collection
manually from the junit.run.remote.ssh target in build-test.xml.

However, a 2nd run freezes the machines (pandaboard/ac100)
within an arbitrary test.

Running all remote unit tests (default) freezes both machines
within the 'AWT/NEWT tests', which come after the NEWT-only tests.

+++

Platform-1b:

Running the NEWT unit tests, occasional 'hangs' occur in:
  'jogamp.opengl.x11.glx.GLX.dispatch_glXMakeContextCurrent1'

'ps ax' works and discloses the PID,
which can be killed via 'kill -9 <PID>'.

The unit tests then continue properly.

+++

This has been reproduced w/

  - OpenJDK (IcedTea6 1.11pre) (6b23~pre11-0ubuntu1.11.10.2) +
    JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)

  - Oracle J2SE/JRE build 1.6.0_30-b12 +
    Java HotSpot(TM) Client VM (build 20.5-b03, mixed mode)


Platform-1a:
  - Pandaboard ES (Omap4)

  - Ubuntu 11.10

  - GLX and Mesa3D Software 'enabled'

  - EGL/ES: pvr-omap4 1.7.10.0.1.9-1

  - Linux panda01 3.1.0-1282-omap4 #11-Ubuntu SMP PREEMPT Mon Feb 13 15:38:55
    UTC 2012 armv7l armv7l armv7l GNU/Linux


Platform-1b:
  - Pandaboard ES (Omap4)

  - Ubuntu 11.10

  - GLX and Mesa3D Software 'enabled'

  - EGL/ES: disabled (moved libEGL* libGLESv* away)

  - Linux panda01 3.1.0-1282-omap4 #11-Ubuntu SMP PREEMPT Mon Feb 13 15:38:55
    UTC 2012 armv7l armv7l armv7l GNU/Linux



Platform-2:
  - Toshiba AC100 (Tegra2)

  - Ubuntu 11.10

  - GLX and Mesa3D Software 'enabled'

  - EGL/ES: nvidia-tegra 12~beta1-0ubuntu1

  - Linux jautab02 2.6.38-1001-ac100 #2-Ubuntu SMP PREEMPT Tue Dec 20 08:05:25
    UTC 2011 armv7l armv7l armv7l GNU/Linux

+++

The freeze occurs at arbitrary points:
rarely within the demo code's call of EGLContextImpl.makeCurrent(),
more often before test setup or at test finish, w/o any EGL/ES calls involved.

+++

Both platforms have a similar, if not identical, package setup.

They differ in their:
  - Linux kernel
  - EGL/ES driver.

Since neither the internal loop nor the native test
could reproduce this freeze,
one could assume that the EGL/ES drivers are not the culprit.

This assumption may also be deduced from the fact that platform-1a
and platform-2 use different EGL/ES drivers.

However, platform-1b does not freeze (software OpenGL),
hence some interaction between hardware and Java might
cause the problem.

The common ground of all freezing platforms is the
Xorg server/client, besides the other generic dependencies.

The Xorg server/client is being treated differently
when using software OpenGL or proprietary EGL/ES.

+++

Cause: TBD

~Sven


Re: Linux ARM freezes (Java, EGL/ES, JOGL)

Sven Gothel
Administrator
Added bug entry:
  https://jogamp.org/bugzilla/show_bug.cgi?id=559

Please discuss it there .. thx.

The bugzilla entry also includes the syslog message, which I omitted from
the original post:

Mar  5 17:27:34 panda01 kernel: [  372.084716] INFO: task java:1503 blocked for more than 120 seconds.
Mar  5 17:27:34 panda01 kernel: [  372.084716] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  5 17:27:34 panda01 kernel: [  372.084716] java            D c0576f58     0  1503   1167 0x00000000
Mar  5 17:27:34 panda01 kernel: [  372.084716] [<c0576f58>] (__schedule+0x4f0/0x5cc) from [<c0578d1c>] (__down_read+0xc0/0xd8)
Mar  5 17:27:34 panda01 kernel: [  372.084716] [<c0578d1c>] (__down_read+0xc0/0xd8) from [<c057b0e8>] (do_page_fault.part.2+0x90/0x1f8)
Mar  5 17:27:34 panda01 kernel: [  372.085205] [<c057b0e8>] (do_page_fault.part.2+0x90/0x1f8) from [<c057b2ec>] (do_page_fault+0x9c/0xac)
Mar  5 17:27:34 panda01 kernel: [  372.085266] [<c057b2ec>] (do_page_fault+0x9c/0xac) from [<c0008674>] (do_DataAbort+0x34/0x98)
Mar  5 17:27:34 panda01 kernel: [  372.085266] [<c0008674>] (do_DataAbort+0x34/0x98) from [<c05797d8>] (__dabt_svc+0x38/0x60)
Mar  5 17:27:34 panda01 kernel: [  372.085266] Exception stack(0xeb9a5ec0 to 0xeb9a5f08)
Mar  5 17:27:34 panda01 kernel: [  372.085357] 5ec0: 595ac000 595ae000 00000020 0000001f ee5a555c 595ad9f4 595ac680 ee5a5520
Mar  5 17:27:34 panda01 kernel: [  372.085357] 5ee0: 00000000 eb9a4000 00000000 000002ff 595ad000 eb9a5f08 c0011758 c0019958
Mar  5 17:27:34 panda01 kernel: [  372.085357] 5f00: 800f0113 ffffffff
Mar  5 17:27:34 panda01 kernel: [  372.085510] [<c05797d8>] (__dabt_svc+0x38/0x60) from [<c0019958>] (v7_coherent_kern_range+0x1c/0x7c)
Mar  5 17:27:34 panda01 kernel: [  372.085510] [<c0019958>] (v7_coherent_kern_range+0x1c/0x7c) from [<c0011758>] (arm_syscall+0x140/0x294)
Mar  5 17:27:34 panda01 kernel: [  372.085632] [<c0011758>] (arm_syscall+0x140/0x294) from [<c000d500>] (ret_fast_syscall+0x0/0x30)
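
To condense a trace like the one above, the hung-task report and the chain
of kernel functions can be pulled out of the syslog text with sed. This is
only a sketch: hung_tasks and call_chain are hypothetical helper names, and
the patterns assume the exact line format shown in this log.

```bash
#!/bin/bash
# Sketch only: helper names are hypothetical; patterns match the log format above.

# hung_tasks: read syslog text on stdin and print "comm pid seconds"
# for every "blocked for more than N seconds" report.
hung_tasks() {
    sed -n 's|.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*|\1 \2 \3|p'
}

# call_chain: read syslog text on stdin and print the first function name
# of each "[<addr>] (func+0x../0x..) from [<addr>] (...)" backtrace line,
# i.e. the chain from the innermost frame outwards.
call_chain() {
    sed -n 's|.*] \[<[0-9a-f]*>] (\([A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*) from .*|\1|p'
}

# Usage (log path assumed): call_chain < /var/log/syslog
```

Applied to the trace above, call_chain should start with __schedule and
__down_read, followed by the page-fault handlers: the java task is waiting
in __down_read, which it reached from do_page_fault.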