Administrator
|
Dear Xerxes, dear All,
allow me to summarize my findings about the Linux ARM freezes.

+++

Phenomenon:

The freeze I am reporting is characterized by
- a hanging java process
- the command 'ps ax' hangs before the line where it would presumably report the java process
- syslog message: (see the follow-up post)
- 'kill -9 <PID>' doesn't work
- reboot freezes as well, the reset button needs to be pressed

This is different from an implementation error, e.g. a 'software deadlock', since such a freeze should not affect the overall system and the user process should remain interruptible.

+++

The native es2redsquare hasn't frozen the machine so far, 800 loops from the shell etc.

  cd ./jogl/src/test/native/mesa-demos-patched
  bash make.sh es2redsquare.c
  bash shell_loop.sh

+++

TestRedSquareES2NEWT or TestGearsES2NEWT with '-loops 1000 -loop-shutdown 1 -time 100' doesn't freeze either.

Note: '-loop-shutdown 2' triggers a bug in EGL, eglGetDisplay(..) fails sometimes, probably some EGL race condition?

+++

A recent test of 'shell' loops w/ TestRedSquareES2NEWT or TestGearsES2NEWT and the args '-loops 1 -time 100' didn't freeze the machines, tested a few times until ~250.

+++

Platform-1a + Platform-2:

The remote NEWT unit tests pass properly the 1st time. You have to remove the AWT*NEWT* test collection manually from the junit.run.remote.ssh target in build-test.xml.

However, a 2nd run freezes the machines (pandaboard/ac100) within an arbitrary test.

Running all remote unit tests (default) freezes both machines within the 'AWT/NEWT tests', which come after the NEWT-only tests.

+++

Platform-1b:

Running the NEWT unit tests, occasional 'hangs' occur in:
  'jogamp.opengl.x11.glx.GLX.dispatch_glXMakeContextCurrent1'

'ps ax' works and discloses the PID, which can be killed via 'kill -9 <PID>'. The unit tests then continue properly.
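The contrast between the Platform-1b hangs (the process can still be killed) and the freeze described under Phenomenon ('kill -9' has no effect) may correspond to the difference between an interruptible and an uninterruptible ('D') process state. As a rough sketch, not part of the original tests, the state can be read from the third field of /proc/<PID>/stat. The helper names and the sample line are made up for illustration, and on an already frozen machine reading /proc may itself block.

```python
def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    Hypothetical helper for illustration only.
    """
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split at the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def is_uninterruptible(stat_line):
    """True if the task is in uninterruptible sleep (state 'D')."""
    return parse_stat_state(stat_line)[2] == "D"


if __name__ == "__main__":
    # Made-up, shortened sample line; a real /proc/<pid>/stat has many more fields.
    sample = "1503 (java) D 1167 1503 1503 0"
    print(parse_stat_state(sample), is_uninterruptible(sample))
```

For a live process the input would be the contents of /proc/<PID>/stat, read as a single string.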
+++

This has been reproduced w/
- OpenJDK (IcedTea6 1.11pre) (6b23~pre11-0ubuntu1.11.10.2)
  + JamVM (build 1.6.0-devel, inline-threaded interpreter with stack-caching)
- Oracle J2SE/JRE build 1.6.0_30-b12
  + Java HotSpot(TM) Client VM (build 20.5-b03, mixed mode)

Platform-1a:
- Pandaboard ES (Omap4)
- Ubuntu 11.10
- GLX and Mesa3D Software 'enabled'
- EGL/ES: pvr-omap4 1.7.10.0.1.9-1
- Linux panda01 3.1.0-1282-omap4 #11-Ubuntu SMP PREEMPT Mon Feb 13 15:38:55 UTC 2012 armv7l armv7l armv7l GNU/Linux

Platform-1b:
- Pandaboard ES (Omap4)
- Ubuntu 11.10
- GLX and Mesa3D Software 'enabled'
- EGL/ES: disabled (moved libEGL* libGLESv* away)
- Linux panda01 3.1.0-1282-omap4 #11-Ubuntu SMP PREEMPT Mon Feb 13 15:38:55 UTC 2012 armv7l armv7l armv7l GNU/Linux

Platform-2:
- Toshiba AC100 (Tegra2)
- Ubuntu 11.10
- GLX and Mesa3D Software 'enabled'
- EGL/ES: nvidia-tegra 12~beta1-0ubuntu1
- Linux jautab02 2.6.38-1001-ac100 #2-Ubuntu SMP PREEMPT Tue Dec 20 08:05:25 UTC 2011 armv7l armv7l armv7l GNU/Linux

+++

The freeze is completely arbitrary. Rarely it happens within the demo code's call of EGLContextImpl.makeCurrent(), but more often before test setup or finish w/o any EGL/ES calls involved.

+++

Both platforms have a similar if not equal package setup. They differ in their:
- Linux kernel
- EGL/ES driver

Since neither the internal loop nor the native test could reproduce this freeze, one could assume that the EGL/ES drivers are not the culprit. This assumption may also be deduced from the fact that platform-1a and platform-2 use different EGL/ES drivers.

However, platform-1b does not freeze (software OpenGL), hence some correlation between hardware and Java might cause the problem.

The common ground on all freezing platforms is the Xorg server/client, besides the other generic dependencies. The Xorg server/client is treated differently when using software OpenGL or proprietary EGL/ES.

+++

Cause: TBD

~Sven
Administrator
|
Added bug entry:
https://jogamp.org/bugzilla/show_bug.cgi?id=559

Please discuss it there .. thx.

The bugzilla entry also includes the syslog message, which I have missed in the original post:

Mar 5 17:27:34 panda01 kernel: [ 372.084716] INFO: task java:1503 blocked for more than 120 seconds.
Mar 5 17:27:34 panda01 kernel: [ 372.084716] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 5 17:27:34 panda01 kernel: [ 372.084716] java D c0576f58 0 1503 1167 0x00000000
Mar 5 17:27:34 panda01 kernel: [ 372.084716] [<c0576f58>] (__schedule+0x4f0/0x5cc) from [<c0578d1c>] (__down_read+0xc0/0xd8)
Mar 5 17:27:34 panda01 kernel: [ 372.084716] [<c0578d1c>] (__down_read+0xc0/0xd8) from [<c057b0e8>] (do_page_fault.part.2+0x90/0x1f8)
Mar 5 17:27:34 panda01 kernel: [ 372.085205] [<c057b0e8>] (do_page_fault.part.2+0x90/0x1f8) from [<c057b2ec>] (do_page_fault+0x9c/0xac)
Mar 5 17:27:34 panda01 kernel: [ 372.085266] [<c057b2ec>] (do_page_fault+0x9c/0xac) from [<c0008674>] (do_DataAbort+0x34/0x98)
Mar 5 17:27:34 panda01 kernel: [ 372.085266] [<c0008674>] (do_DataAbort+0x34/0x98) from [<c05797d8>] (__dabt_svc+0x38/0x60)
Mar 5 17:27:34 panda01 kernel: [ 372.085266] Exception stack(0xeb9a5ec0 to 0xeb9a5f08)
Mar 5 17:27:34 panda01 kernel: [ 372.085357] 5ec0: 595ac000 595ae000 00000020 0000001f ee5a555c 595ad9f4 595ac680 ee5a5520
Mar 5 17:27:34 panda01 kernel: [ 372.085357] 5ee0: 00000000 eb9a4000 00000000 000002ff 595ad000 eb9a5f08 c0011758 c0019958
Mar 5 17:27:34 panda01 kernel: [ 372.085357] 5f00: 800f0113 ffffffff
Mar 5 17:27:34 panda01 kernel: [ 372.085510] [<c05797d8>] (__dabt_svc+0x38/0x60) from [<c0019958>] (v7_coherent_kern_range+0x1c/0x7c)
Mar 5 17:27:34 panda01 kernel: [ 372.085510] [<c0019958>] (v7_coherent_kern_range+0x1c/0x7c) from [<c0011758>] (arm_syscall+0x140/0x294)
Mar 5 17:27:34 panda01 kernel: [ 372.085632] [<c0011758>] (arm_syscall+0x140/0x294) from [<c000d500>] (ret_fast_syscall+0x0/0x30)
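To compare such reports across the different platforms, the hung-task message can be reduced to the task name, PID and call chain. The following is only a sketch: the function name and the regular expressions are assumptions fitted to the ARM backtrace format quoted above, and other kernels or architectures print frames differently. It merely restates the trace (innermost function first) and does not identify a cause.

```python
import re

# Patterns fitted to the syslog lines quoted above (assumption: other kernel
# versions or architectures may use a different backtrace layout).
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \((?P<callee>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
    r" from \[<[0-9a-f]+>\] \((?P<caller>[\w.]+)\+"
)


def summarize_hung_task(lines):
    """Return a dict with comm, pid, secs and the call chain, or None."""
    report = None
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            report = {
                "comm": hung["comm"],
                "pid": int(hung["pid"]),
                "secs": int(hung["secs"]),
                "chain": [],
            }
            continue
        frame = FRAME_RE.search(line)
        if frame and report is not None:
            # Each line names "callee from caller"; record the callee once,
            # then append the caller of every following frame.
            if not report["chain"]:
                report["chain"].append(frame["callee"])
            report["chain"].append(frame["caller"])
    return report


if __name__ == "__main__":
    sample = [
        "Mar 5 17:27:34 panda01 kernel: [ 372.084716] INFO: task java:1503 blocked for more than 120 seconds.",
        "Mar 5 17:27:34 panda01 kernel: [ 372.084716] [<c0576f58>] (__schedule+0x4f0/0x5cc) from [<c0578d1c>] (__down_read+0xc0/0xd8)",
        "Mar 5 17:27:34 panda01 kernel: [ 372.084716] [<c0578d1c>] (__down_read+0xc0/0xd8) from [<c057b0e8>] (do_page_fault.part.2+0x90/0x1f8)",
    ]
    print(summarize_hung_task(sample))
```

Fed with the full message above, the chain would run from __schedule up to ret_fast_syscall, passing through do_page_fault and v7_coherent_kern_range.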