Ticket #1841 (closed defect: fixed)

Opened 6 years ago

Last modified 4 years ago

white screen of death (WSOD) after resume

Reported by: Rorschach Owned by: openmoko-devel
Priority: highest Milestone:
Component: unknown Version: GTA02v5
Severity: critical Keywords: wsod,resume
Cc: r0rschach@…, john_lee@… Blocked By:
Blocking: Estimated Completion (week):
HasPatchForReview: no PatchReviewResult:
Reproducible: always

Description

Because there's no bug-report describing exactly this problem I'm opening a new one:

After suspending and resuming, the screen is all white. The phone is still working: you can ssh into it and everything else runs. Only the screen stays white, and it stays white no matter how long you wait; the correct screen never comes back. You have to shut down by holding the power button for a while, or reboot (via ssh), to use your phone again.

This happens with every OS I tested!

This is, in my opinion, a highly critical bug that prevents daily use of the Neo Freerunner: without a reliable resume you cannot suspend, so battery life is much shorter. This bug has been widely known on the mailing list and IRC for several weeks, but it seems no progress has been made in fixing it.

While other bugs, like the 3G SIM bug, make the phone unusable for a certain group of users, this bug makes it unusable as a daily phone for everyone.

Tested OSs by me:

  • 2008.8
  • FSO2
  • Qtopia
  • Debian

How to reproduce:

  • suspend Neo Freerunner for more than 10-20min, try to resume after that

Interesting behavior:

  • The screen still reacts to some things, e.g. under Debian the screenlocker darkens it after a few seconds of inactivity, and it gets bright (white) again when you touch it.

Attachments

wsod-gta02.dmesg.txt (15.6 KB) - added by nelg 6 years ago.
added WSOD dmesg
dmesg.txt (15.2 KB) - added by GiggleLoop 6 years ago.
wsod-gta02.dmesg1.txt (15.2 KB) - added by nelg 6 years ago.
dmesg using updated u-boot + FDOM
wsod-gta02-wogsm.dmesg.txt (15.2 KB) - added by kapal 6 years ago.
wsod_logs.zip (22.0 KB) - added by VRGhost 6 years ago.
Several log file/program output
dmesg_2008.9.txt (15.2 KB) - added by nelg 6 years ago.
dmesg from 2008.9 install, when showing wsod after resume.
regs.txt (1.3 KB) - added by mrmoku 6 years ago.
jbt6k74_no_deep_sleep.patch (7.3 KB) - added by nicolas.dufresne 6 years ago.
Patch to disable deep sleep and get rid of the WSOD. Applies to origin/stable git.
jbt6k74_cleanup_no_deep_sleep.patch (4.0 KB) - added by nicolas.dufresne 6 years ago.
This is an enhancement on top of previous patch. This implementation does not break switching from VGA to QVGA.

Change History

comment:1 Changed 6 years ago by mwester@…

Let's begin by determining what kernel you have running, as this is probably more likely to be a kernel issue than a userspace issue if it happens on all those different rootfs' -- can you provide information regarding the kernel version and package version for each of the environments you've tested? The output from "uname -a" will be somewhat helpful; even better would be the full filename of the kernel image when it was downloaded to be flashed (i.e. "uImage-2.6.24+git1+cb3cc53a76c7f1f7c827d048db7a849e77071515-r1.01-om-gta02.bin")
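
For anyone collecting this information from several images, here is a minimal sketch of how a "uname -a" line could be split into comparable fields. It assumes the field layout seen in the lines quoted in this ticket (release, build number, optional flags such as PREEMPT, build date, machine); the function name parse_uname is only illustrative, and other kernels may print a different layout.

```python
import re

# Layout assumed from the `uname -a` lines quoted in this ticket:
#   sysname nodename release #build [FLAGS] build-date machine ...
UNAME_RE = re.compile(
    r"^(?P<sysname>\S+) (?P<nodename>\S+) (?P<release>\S+) "
    r"#(?P<build>\d+) (?P<flags>(?:[A-Z_]+ )*?)"
    r"(?P<built>[A-Z][a-z]{2} [A-Z][a-z]{2} +\d+ [\d:]+ \S+ \d{4}) "
    r"(?P<machine>\S+)"
)


def parse_uname(line):
    """Split one `uname -a` line into a dict of fields (hypothetical helper)."""
    match = UNAME_RE.match(line.strip())
    if match is None:
        raise ValueError("unexpected uname format: %r" % line)
    info = match.groupdict()
    info["build"] = int(info["build"])      # "#4" -> 4
    info["flags"] = info["flags"].split()   # "PREEMPT " -> ["PREEMPT"]
    return info


if __name__ == "__main__":
    sample = ("Linux om-gta02 2.6.24 #1 PREEMPT "
              "Thu Aug 7 15:57:11 CST 2008 armv4tl unknown")
    print(parse_uname(sample))
```

Two reports with the same release string (2.6.24) can still come from different builds, which is why the build number and build date are kept as separate fields.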

comment:2 follow-up: ↓ 4 Changed 6 years ago by zecke

Small question. How is this different from #1621?

comment:3 in reply to: ↑ description Changed 6 years ago by wiml

FWIW, I'm seeing a similar problem. I saw it twice with the original firmware that was on my GTA02 when I bought it; this prompted me to upgrade to 2008.8 (uImage and rootfs via dfu-util), and it's still happening pretty regularly.

root@om-gta02:~# uname -a
Linux om-gta02 2.6.24 #1 PREEMPT Thu Aug 7 15:57:11 CST 2008 armv4tl unknown

and the images I installed (should be the "standard" 2008.8 ?):

9d11744284f180b28423d6bfcd46bff2  Om2008.8-gta02-20080808.rootfs.jffs2
f5a0e045ec9a2f0f08513fffc14b362e  Om2008.8-gta02-20080808.uImage.bin
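
To check whether a local copy of an image is the same file as the one listed above, its MD5 can be recomputed and compared with the quoted line. This is only a sketch: the helper names are made up for illustration, the "digest, whitespace, filename" layout is assumed from the two lines above, and a match only shows that two files are identical, not that either one is the official release.

```python
import hashlib


def parse_md5sum_line(line):
    """Split a 'digest  filename' line as printed by md5sum."""
    digest, name = line.split(None, 1)
    # md5sum marks binary mode with a leading '*' on the filename
    return digest.lower(), name.strip().lstrip("*")


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def image_matches(md5sum_line, path):
    """True if the file at `path` has the digest given in `md5sum_line`."""
    expected, _name = parse_md5sum_line(md5sum_line)
    return file_md5(path) == expected


if __name__ == "__main__":
    line = "f5a0e045ec9a2f0f08513fffc14b362e  Om2008.8-gta02-20080808.uImage.bin"
    digest, name = parse_md5sum_line(line)
    print(name, "should hash to", digest)
```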

comment:4 in reply to: ↑ 2 ; follow-up: ↓ 6 Changed 6 years ago by Rorschach

Replying to zecke:

Small question. How is this different from #1621?

In #1621 there are two different symptoms: for some the screen becomes functional again some time after the WSOD, for others it doesn't. Also, andy says in the last comment on that bug that there are two different issues. So if I understood you right in the last bug report (about the PIN-dialog problem), this should result in two different bug reports? Or am I wrong?

comment:5 follow-up: ↓ 19 Changed 6 years ago by andy

It's hard to know what goes on for you at the moment. Here is some more information about the issue.

The "white screen" thing is what we see when no video data is coming OR the LCM ASIC is unconfigured.

Glamo stops sending video data at suspend because we put it in a very low power state with no PLLs running. So in early resume it is in fact normal for the LCM to show a white screen, but the backlight is down at that time. It is also normal that, after a short time, the Glamo LCD controller is resumed first and then we reinitialize the LCM ASIC. Only then do we bring up the backlight, so you should never see any white display.

I have seen two behaviours that give WSOD despite the happy story above. The first is that during resume the Glamo's Reset pin is falsely activated by a spike. Because we assume most Glamo registers survive suspend, this kills us dead: the partial resume action we do on the registers fails to produce any video. So with that problem you get a sticky WSOD until you reboot.

The second behaviour is that resume is sometimes delayed, apparently by another driver (?). The normal sequencing still happens, but late: during the delay the backlight has come up normally but the LCM ASIC reinit has not yet happened, so you see the WSOD until that completes.
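
The ordering described in this comment can be sketched as a toy model. This is not driver code: the step names, the Panel class and the "registers intact" flag are all invented for illustration, and the only rule encoded is the one stated above, that white is visible when the backlight is lit while either no video data is arriving or the LCM ASIC is unconfigured.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    """Toy state of the display path during resume (illustrative only)."""
    glamo_registers_intact: bool = True
    glamo_video: bool = False
    lcm_configured: bool = False
    backlight_on: bool = False

    def looks_white(self):
        # White is only visible with the backlight lit and no valid picture.
        return self.backlight_on and not (self.glamo_video and self.lcm_configured)


def resume(steps):
    """Apply resume steps in order; return whether the screen looks white after each."""
    panel = Panel()
    trace = []
    for step in steps:
        if step == "spurious_glamo_reset":
            panel.glamo_registers_intact = False
            panel.glamo_video = False
        elif step == "glamo_partial_resume":
            # The partial resume only works if the registers survived suspend.
            panel.glamo_video = panel.glamo_registers_intact
        elif step == "lcm_reinit":
            panel.lcm_configured = True
        elif step == "backlight_up":
            panel.backlight_on = True
        else:
            raise ValueError("unknown step: %s" % step)
        trace.append(panel.looks_white())
    return trace


NORMAL = ["glamo_partial_resume", "lcm_reinit", "backlight_up"]
DELAYED = ["glamo_partial_resume", "backlight_up", "lcm_reinit"]
SPURIOUS_RESET = ["spurious_glamo_reset", "glamo_partial_resume",
                  "lcm_reinit", "backlight_up"]

if __name__ == "__main__":
    for name, steps in [("normal", NORMAL), ("delayed", DELAYED),
                        ("spurious reset", SPURIOUS_RESET)]:
        print(name, resume(steps))
```

In this model the delayed ordering is white only until the LCM reinit runs (the temporary WSOD), while the spurious reset ends white with no later step to clear it (the sticky WSOD).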

comment:6 in reply to: ↑ 4 Changed 6 years ago by zecke

Replying to Rorschach:

Replying to zecke:

Small question. How is this different from #1621?

In #1621 there are two different symptoms: for some the screen becomes functional again some time after the WSOD, for others it doesn't. Also, andy says in the last comment on that bug that there are two different issues. So if I understood you right in the last bug report (about the PIN-dialog problem), this should result in two different bug reports? Or am I wrong?

No, you are not wrong at all. I just wanted to figure out the relationship between these two. It is quite hard to read every comment and I was not sure about the relationship.

comment:7 Changed 6 years ago by pbondo

Just a confirmation that I also see this issue quite often. Leaving the phone in suspend for 2 minutes is enough for me to provoke it.

comment:8 Changed 6 years ago by pbondo

Hello again

Should I start to feel worried that I bought a rather expensive brick ?

If Andy is correct in his assumption that the sticky WSOD is caused by a reset of the glamo chip, then the obvious thing would be to look into the specs of this chip.

However according to the hardware page the specs are only available under NDA: http://wiki.openmoko.org/wiki/Neo_FreeRunner_GTA02_Hardware

This will in effect reduce the resource pool to internal OpenMoko developers.

According to the svn log: svn log https://svn.openmoko.org/trunk/src/target/kernel/patches/smedia-glamo.patch

this was handled by Harald Welte until he left last year and since then it appears that no one has touched it.

How do we proceed from here? Will someone from OpenMoko step up and look into it, or should I simply return the phone as DOA hardware? It obviously has a hardware error, but it should be solvable in software.

As it stands the device is useless as a phone.

comment:9 Changed 6 years ago by andy

There are some wrong assumptions there: we binned svn for kernel work and have been using git for many months. Tons of stuff has happened to glamo in the kernel since then.

http://git.openmoko.org/?p=kernel.git;a=tree;f=drivers/mfd/glamo;h=307b9483be4b681140b0ba96edf094de02cc0339;hb=stable

Also the triggering of hard reset on Glamo is queer. I have never ever seen it occur randomly during use, only during either suspend or resume action. It's part of a larger syndrome of races and crap in kernel suspend / resume that we very slowly clamp down on.

comment:10 Changed 6 years ago by pbondo

Well thanks Andy, I am very glad to stand corrected :-)

As you state, I have also only seen it during suspend/resume. But I see it on every 2nd or 3rd resume.

Regarding the svn => git conversion I should have checked it better before posting. I simply jumped into the reference from http://wiki.openmoko.org/wiki/Neo_FreeRunner_GTA02_Hardware. Sorry about that, and I will keep waiting for it to be "debricked".

Also everyone I show it to is thrilled by the display.

comment:11 Changed 6 years ago by theseer105

Just a confirmation that I also see this issue if the phone is suspended for 50 and more minutes.

comment:12 Changed 6 years ago by nelg

any update on this defect?

comment:13 Changed 6 years ago by mwester@…

If you are experiencing this problem, can you please attach the output from "dmesg"?

comment:14 Changed 6 years ago by nelg

Below is a dmesg taken during a WSOD. I suspended, waited, and tried to resume, and got the WSOD. I then plugged in the Neo Freerunner, logged in, and captured the dmesg.

This is running: qtopia-4.3.3-snapshot-09012242-gta02-flash.tgz
I have had the WSOD with om2008.8-update as well.

root@om-gta02:~# cat /etc/om-version
Tag Name:
VERSION: 20be0ea74a43d5a66f9506940e359986e6a1924b
Branch: org.openmoko.dev
Build Host: buildhost.openmoko.org
Time Stamp: Sun, 10 Aug 2008 02:43:00 +0200
root@om-gta02:~# uname -a
Linux om-gta02 2.6.24 #4 PREEMPT Sun Aug 3 16:16:27 CDT 2008 armv4tl unknown
root@om-gta02:~# dmesg
c>] (dump_stack+0x0/0x14) from [<c016655c>] (kobject_add+0x184/0x1d0)
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c887088c r7:c8870a6c r6:bf047cc0 r5:00000000 r4:bf0450cc
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
snd: exports duplicate symbol snd_add_device_sysfs_file (owned by kernel)
snd_page_alloc: exports duplicate symbol snd_free_pages (owned by kernel)
snd_timer: exports duplicate symbol snd_timer_interrupt (owned by kernel)
snd_pcm: exports duplicate symbol snd_pcm_notify (owned by kernel)
snd_soc_core: exports duplicate symbol snd_soc_put_volsw_2r (owned by kernel)
snd_soc_wm8753: exports duplicate symbol soc_codec_dev_wm8753 (owned by kernel)
snd_soc_s3c24xx: exports duplicate symbol s3c24xx_soc_platform (owned by kernel)
snd_soc_s3c24xx_i2s: exports duplicate symbol s3c24xx_i2s_dai (owned by kernel)
sysfs: duplicate filename 'soc-audio' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
[<c002ebec>] (dump_stack+0x0/0x14) from [<c00d8444>] (sysfs_add_one+0x50/0xfc)
[<c00d83f4>] (sysfs_add_one+0x0/0xfc) from [<c00d8b40>] (create_dir+0x58/0xa8)
 r6:fffffff4 r5:c7481df0 r4:c754cb0c
[<c00d8ae8>] (create_dir+0x0/0xa8) from [<c00d8bd0>] (sysfs_create_dir+0x40/0x60)
 r8:c886b45c r7:c03db7f8 r6:00000000 r5:c7632600 r4:c7632670
[<c00d8b90>] (sysfs_create_dir+0x0/0x60) from [<c01664cc>] (kobject_add+0xf4/0x1d0)
 r4:c7632670
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c01a54e8>] (device_add+0x88/0x48c)
[<c01a5460>] (device_add+0x0/0x48c) from [<c01a9400>] (platform_device_add+0x100/0x154)
[<c01a9300>] (platform_device_add+0x0/0x154) from [<bf042064>] (neo1973_gta02_init+0x64/0xc8 [snd_soc_neo1973_gta02_wm8753])
 r7:00000001 r6:bf040da0 r5:bf040f04 r4:fffffff4
[<bf042000>] (neo1973_gta02_init+0x0/0xc8 [snd_soc_neo1973_gta02_wm8753]) from [<c006b7d4>] (sys_init_module+0x13d8/0x14bc)
 r5:c7e5f660 r4:c7e5f688
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
kobject_add failed for soc-audio with -EEXIST, don't try to register things with the same name in the same directory.
[<c002ebec>] (dump_stack+0x0/0x14) from [<c016655c>] (kobject_add+0x184/0x1d0)
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c01a54e8>] (device_add+0x88/0x48c)
[<c01a5460>] (device_add+0x0/0x48c) from [<c01a9400>] (platform_device_add+0x100/0x154)
[<c01a9300>] (platform_device_add+0x0/0x154) from [<bf042064>] (neo1973_gta02_init+0x64/0xc8 [snd_soc_neo1973_gta02_wm8753])
 r7:00000001 r6:bf040da0 r5:bf040f04 r4:fffffff4
[<bf042000>] (neo1973_gta02_init+0x0/0xc8 [snd_soc_neo1973_gta02_wm8753]) from [<c006b7d4>] (sys_init_module+0x13d8/0x14bc)
 r5:c7e5f660 r4:c7e5f688
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
snd: exports duplicate symbol snd_add_device_sysfs_file (owned by kernel)
snd_page_alloc: exports duplicate symbol snd_free_pages (owned by kernel)
snd_timer: exports duplicate symbol snd_timer_interrupt (owned by kernel)
snd_pcm: exports duplicate symbol snd_pcm_notify (owned by kernel)
snd_soc_core: exports duplicate symbol snd_soc_put_volsw_2r (owned by kernel)
snd_soc_wm8753: exports duplicate symbol soc_codec_dev_wm8753 (owned by kernel)
snd_soc_s3c24xx: exports duplicate symbol s3c24xx_soc_platform (owned by kernel)
snd_soc_s3c24xx_i2s: exports duplicate symbol s3c24xx_i2s_dai (owned by kernel)
Only GTA01 hardware supported by ASoc driver
ADDRCONF(NETDEV_UP): usb0: link is not ready
Alignment trap: hald-probe-volu (1299) PC=0x0000ce78 Instr=0xe59b300c Address=0xbeb9e5aa FSR 0x013
Alignment trap: hald-probe-volu (1299) PC=0x0000ce94 Instr=0xe59b1008 Address=0xbeb9e5a6 FSR 0x013
Alignment trap: hald-probe-volu (1299) PC=0x0000ce78 Instr=0xe59b300c Address=0xbeb9e5ba FSR 0x013
Alignment trap: hald-probe-volu (1299) PC=0x0000ce94 Instr=0xe59b1008 Address=0xbeb9e5b6 FSR 0x013
Alignment trap: hald-probe-volu (1299) PC=0x0000ce78 Instr=0xe59b300c Address=0xbeb9e5ca FSR 0x013
Alignment trap: hald-probe-volu (1299) PC=0x0000ce94 Instr=0xe59b1008 Address=0xbeb9e5c6 FSR 0x013
Alignment trap: hald-probe-volu (1299) PC=0x0000ce78 Instr=0xe59b300c Address=0xbeb9e5da FSR 0x013
Alignment trap: hald-probe-volu (1299) PC=0x0000ce94 Instr=0xe59b1008 Address=0xbeb9e5d6 FSR 0x013
neo1973-pm-bt neo1973-pm-bt.0: GTA02 Set PCF50633 LDO4 = 3200
usb 1-1: new full speed USB device using s3c2410-ohci and address 2
usb 1-1: configuration #1 chosen from 1 choice
bluetooth: exports duplicate symbol bt_sock_wait_state (owned by kernel)
bluetooth: exports duplicate symbol bt_sock_wait_state (owned by kernel)
sysfs: duplicate filename 'hci_usb' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
[<c002ebec>] (dump_stack+0x0/0x14) from [<c00d8444>] (sysfs_add_one+0x50/0xfc)
[<c00d83f4>] (sysfs_add_one+0x0/0xfc) from [<c00d8b40>] (create_dir+0x58/0xa8)
 r6:fffffff4 r5:c6c03e48 r4:c744b954
[<c00d8ae8>] (create_dir+0x0/0xa8) from [<c00d8bd0>] (sysfs_create_dir+0x40/0x60)
 r8:00000008 r7:bf040e6c r6:bf042568 r5:bf042520 r4:bf042568
[<c00d8b90>] (sysfs_create_dir+0x0/0x60) from [<c01664cc>] (kobject_add+0xf4/0x1d0)
 r4:bf042568
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
kobject_add failed for hci_usb with -EEXIST, don't try to register things with the same name in the same directory.
[<c002ebec>] (dump_stack+0x0/0x14) from [<c016655c>] (kobject_add+0x184/0x1d0)
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
sysfs: duplicate filename 'hci_usb' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
[<c002ebec>] (dump_stack+0x0/0x14) from [<c00d8444>] (sysfs_add_one+0x50/0xfc)
[<c00d83f4>] (sysfs_add_one+0x0/0xfc) from [<c00d8b40>] (create_dir+0x58/0xa8)
 r6:fffffff4 r5:c6c4fe48 r4:c744b954
[<c00d8ae8>] (create_dir+0x0/0xa8) from [<c00d8bd0>] (sysfs_create_dir+0x40/0x60)
 r8:00000008 r7:bf040e6c r6:bf042568 r5:bf042520 r4:bf042568
[<c00d8b90>] (sysfs_create_dir+0x0/0x60) from [<c01664cc>] (kobject_add+0xf4/0x1d0)
 r4:bf042568
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
kobject_add failed for hci_usb with -EEXIST, don't try to register things with the same name in the same directory.
[<c002ebec>] (dump_stack+0x0/0x14) from [<c016655c>] (kobject_add+0x184/0x1d0)
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
mapped channel 10 to 2
kernel BUG at net/bluetooth/rfcomm/tty.c:313!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c76ac000
[00000000] *pgd=37602031, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] PREEMPT
Modules linked in: ipv6
CPU: 0    Not tainted  (2.6.24 #4)
PC is at __bug+0x20/0x2c
LR is at preempt_schedule+0x48/0x58
pc : [<c002e5fc>]    lr : [<c0300d50>]    psr: 60000013
sp : c76b1e88  ip : c76b1dd8  fp : c76b1e94
r10: c74c9e00  r9 : c76b0000  r8 : c002a0e8
r7 : c0426c50  r6 : bee9d5c0  r5 : 400452c9  r4 : c75e73e0
r3 : 00000000  r2 : c76b0000  r1 : c75ef9e0  r0 : 00000031
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: c000717f  Table: 376ac000  DAC: 00000015
Process qpe (pid: 1363, stack limit = 0xc76b0268)
Stack: (0xc76b1e88 to 0xc76b2000)
1e80:                   c76b1eac c76b1e98 c02dd4e4 c002e5ec 400452c9 c75e73e0
1ea0: c76b1eec c76b1eb0 c02ddfb4 c02dd4cc 00000000 00000004 00000000 00000000
1ec0: 00000000 00000000 c74c9e00 400452c9 bee9d5c0 c0426c50 c002a0e8 00000000
1ee0: c76b1f0c c76b1ef0 c02dbfec c02ddb18 c76b1f24 c77e8360 400452c9 bee9d5c0
1f00: c76b1f2c c76b1f10 c0256c1c c02dbfc4 c77e8360 bee9d5c0 400452c9 00000036
1f20: c76b1f4c c76b1f30 c00a42ac c0256a30 c00980c0 c77e8360 c798be48 bee9d5c0
1f40: c76b1f7c c76b1f50 c00a45c0 c00a4280 c76b1f7c c76b1f60 c0098418 c77e8360
1f60: fffffff7 400452c9 00000036 c002a0e8 c76b1fa4 c76b1f80 c00a4630 c00a431c
1f80: c0257ed0 00000001 0050ba50 00000000 0000003d 0050ba64 00000000 c76b1fa8
1fa0: c0029f40 c00a4600 00000000 0000003d 0000003d 400452c9 bee9d5c0 00000004
1fc0: 00000000 0000003d 0050ba64 00000036 0028abd8 bee9d694 403346f0 00449dc0
1fe0: 002788c0 bee9d5c0 402a38b0 4ac0dd4c 20000010 0000003d 00000000 00000000
Backtrace:
[<c002e5dc>] (__bug+0x0/0x2c) from [<c02dd4e4>] (rfcomm_dev_del+0x28/0xac)
[<c02dd4bc>] (rfcomm_dev_del+0x0/0xac) from [<c02ddfb4>] (rfcomm_dev_ioctl+0x4ac/0x7f8)
 r4:c75e73e0
[<c02ddb08>] (rfcomm_dev_ioctl+0x0/0x7f8) from [<c02dbfec>] (rfcomm_sock_ioctl+0x38/0x4c)
[<c02dbfb4>] (rfcomm_sock_ioctl+0x0/0x4c) from [<c0256c1c>] (sock_ioctl+0x1fc/0x258)
 r6:bee9d5c0 r5:400452c9 r4:c77e8360
[<c0256a20>] (sock_ioctl+0x0/0x258) from [<c00a42ac>] (do_ioctl+0x3c/0x9c)
 r7:00000036 r6:400452c9 r5:bee9d5c0 r4:c77e8360
[<c00a4270>] (do_ioctl+0x0/0x9c) from [<c00a45c0>] (vfs_ioctl+0x2b4/0x2e4)
 r6:bee9d5c0 r5:c798be48 r4:c77e8360
[<c00a430c>] (vfs_ioctl+0x0/0x2e4) from [<c00a4630>] (sys_ioctl+0x40/0x60)
 r8:c002a0e8 r7:00000036 r6:400452c9 r5:fffffff7 r4:c77e8360
[<c00a45f0>] (sys_ioctl+0x0/0x60) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
 r6:0050ba64 r5:0000003d r4:00000000
Code: e1a01000 e59f000c eb006922 e3a03000 (e5833000)
---[ end trace bf02e3b86bfa0554 ]---
usb0: full speed config #1: 500 mA, Ethernet Gadget, using CDC Ethernet
ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
usb0: no IPv6 routers present
usb 1-1: USB disconnect, address 2
neo1973-pm-bt neo1973-pm-bt.0: GTA02 Set PCF50633 LDO4 = 3200
usb 1-1: new full speed USB device using s3c2410-ohci and address 3
usb 1-1: configuration #1 chosen from 1 choice
bluetooth: exports duplicate symbol bt_sock_wait_state (owned by kernel)
bluetooth: exports duplicate symbol bt_sock_wait_state (owned by kernel)
sysfs: duplicate filename 'hci_usb' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
[<c002ebec>] (dump_stack+0x0/0x14) from [<c00d8444>] (sysfs_add_one+0x50/0xfc)
[<c00d83f4>] (sysfs_add_one+0x0/0xfc) from [<c00d8b40>] (create_dir+0x58/0xa8)
 r6:fffffff4 r5:c6ca5e48 r4:c744bda0
[<c00d8ae8>] (create_dir+0x0/0xa8) from [<c00d8bd0>] (sysfs_create_dir+0x40/0x60)
 r8:00000008 r7:bf040e6c r6:bf042568 r5:bf042520 r4:bf042568
[<c00d8b90>] (sysfs_create_dir+0x0/0x60) from [<c01664cc>] (kobject_add+0xf4/0x1d0)
 r4:bf042568
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
kobject_add failed for hci_usb with -EEXIST, don't try to register things with the same name in the same directory.
[<c002ebec>] (dump_stack+0x0/0x14) from [<c016655c>] (kobject_add+0x184/0x1d0)
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
sysfs: duplicate filename 'hci_usb' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
[<c002ebec>] (dump_stack+0x0/0x14) from [<c00d8444>] (sysfs_add_one+0x50/0xfc)
[<c00d83f4>] (sysfs_add_one+0x0/0xfc) from [<c00d8b40>] (create_dir+0x58/0xa8)
 r6:fffffff4 r5:c7407e48 r4:c744bda0
[<c00d8ae8>] (create_dir+0x0/0xa8) from [<c00d8bd0>] (sysfs_create_dir+0x40/0x60)
 r8:00000008 r7:bf040e6c r6:bf042568 r5:bf042520 r4:bf042568
[<c00d8b90>] (sysfs_create_dir+0x0/0x60) from [<c01664cc>] (kobject_add+0xf4/0x1d0)
 r4:bf042568
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
kobject_add failed for hci_usb with -EEXIST, don't try to register things with the same name in the same directory.
[<c002ebec>] (dump_stack+0x0/0x14) from [<c016655c>] (kobject_add+0x184/0x1d0)
[<c01663d8>] (kobject_add+0x0/0x1d0) from [<c006a16c>] (mod_sysfs_setup+0x28/0xb4)
[<c006a144>] (mod_sysfs_setup+0x0/0xb4) from [<c006b38c>] (sys_init_module+0xf90/0x14bc)
 r8:c886ca04 r7:c886cbbc r6:bf042520 r5:00000000 r4:bf040e6c
[<c006a3fc>] (sys_init_module+0x0/0x14bc) from [<c0029f40>] (ret_fast_syscall+0x0/0x2c)
mapped channel 10 to 2
mapped channel 10 to 2
mapped channel 10 to 2
neo1973-pm-bt neo1973-pm-bt.0: GTA02 Set PCF50633 LDO4 = 3300
usb 1-1: USB disconnect, address 3
mapped channel 10 to 2
Syncing filesystems ... done.
PM: Preparing system for mem sleep
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
PM: Entering mem sleep
Suspending console(s)
pcf50633 0-0073: pcf50633_suspend
glamo-mci glamo-mci.0: faking cmd 7 during suspend
mmc_set_power(power_mode=0, vdd=0
glamo-mci glamo-mci.0: glamo_mci_set_ios: power down.
gta02_udc_command(2)
suspending dma channel 0
suspending dma channel 1
suspending dma channel 2
suspending dma channel 3
GSTATUS3 0x303cb70c
GSTATUS4 0x00000000
s3c2440-i2c s3c2440-i2c: slave address 0x10
s3c2440-i2c s3c2440-i2c: bus frequency set to 390 KHz
gta02_udc_command(1)
s3c2440-nand s3c2440-nand: Tacls=3, 30ns Twrph0=7 70ns, Twrph1=3 30ns
not changing prescaler of PWM 3, since it's shared with timer4 (clock tick)
timer_usec_ticks = 7864
timer tcon=00599109, tcnt a2c1, tcfg 00000200,00002000, usec 00001eb8
mmc_set_power(power_mode=1, vdd=20
SD power -> 3200mV
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 0kHz div=255 (req: 0kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 (req: 16666kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 16666kHz div=2 (req: 16666kHz). Bus width=2
soc-audio soc-audio: scheduling resume work
PM: Finishing wakeup.
Restarting tasks ... <6>soc-audio soc-audio: starting resume work
done.
soc-audio soc-audio: resume work completed
usb0: full speed config #1: 500 mA, Ethernet Gadget, using CDC Ethernet
root@om-gta02:~#
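
When comparing dumps like the one above, it can help to cut out just the suspend/resume window. The sketch below uses the two marker strings that appear in this log ("PM: Preparing system for mem sleep" and "PM: Finishing wakeup."); the function names are illustrative, and an "error" line inside the window, such as the glamo-mci one here, is only a candidate to look at and is not known to be related to the WSOD.

```python
SUSPEND_START = "PM: Preparing system for mem sleep"
RESUME_END = "PM: Finishing wakeup."


def suspend_cycles(dmesg_text):
    """Return one list of lines per suspend/resume cycle found in a dmesg dump."""
    cycles = []
    current = None
    for line in dmesg_text.splitlines():
        if SUSPEND_START in line:
            current = []                 # a new cycle begins here
        if current is not None:
            current.append(line)
            if RESUME_END in line:
                cycles.append(current)   # cycle completed
                current = None
    if current is not None:
        cycles.append(current)           # resume marker never appeared
    return cycles


def error_lines(cycle):
    """Lines in a cycle that mention an error (a crude filter, not a diagnosis)."""
    return [line for line in cycle if "error" in line.lower()]


if __name__ == "__main__":
    sample = "\n".join([
        "Syncing filesystems ... done.",
        "PM: Preparing system for mem sleep",
        "PM: Entering mem sleep",
        "glamo-mci glamo-mci.0: Error after cmd: 0x120",
        "PM: Finishing wakeup.",
    ])
    for cycle in suspend_cycles(sample):
        print(len(cycle), "lines;", error_lines(cycle))
```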

Changed 6 years ago by nelg

added WSOD dmesg

comment:15 Changed 6 years ago by andy

There are a lot of oopses and other crap in there that shouldn't be. It's because your kernel has the sound and bluetooth drivers built in, but your rootfs has out-of-date modules for them lying around, which it tries to load as well.

We should have moved to kernel build-specific module dir in the packaged kernel by now to stop this kind of thing from happening, but evidently we didn't.
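
One way to spot this situation in a dmesg dump is to look for the "exports duplicate symbol ... (owned by kernel)" lines, which show up in the log above when a module is loaded for a driver that is already built in. The following is a rough sketch that assumes exactly that message format; the function name is made up, and it only lists the modules involved without fixing anything.

```python
import re

# Message format assumed from the dmesg attached to this ticket.
DUPLICATE_RE = re.compile(
    r"^(?P<module>\S+): exports duplicate symbol (?P<symbol>\S+) "
    r"\(owned by (?P<owner>[^)]+)\)"
)


def stale_modules(dmesg_text):
    """Map each module that clashed with a built-in driver to the clashing symbols."""
    clashes = {}
    for line in dmesg_text.splitlines():
        match = DUPLICATE_RE.match(line.strip())
        if match and match.group("owner") == "kernel":
            symbols = clashes.setdefault(match.group("module"), [])
            if match.group("symbol") not in symbols:
                symbols.append(match.group("symbol"))
    return clashes


if __name__ == "__main__":
    sample = "\n".join([
        "snd: exports duplicate symbol snd_add_device_sysfs_file (owned by kernel)",
        "snd_page_alloc: exports duplicate symbol snd_free_pages (owned by kernel)",
        "bluetooth: exports duplicate symbol bt_sock_wait_state (owned by kernel)",
        "bluetooth: exports duplicate symbol bt_sock_wait_state (owned by kernel)",
    ])
    for module, symbols in stale_modules(sample).items():
        print(module, symbols)
```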

comment:16 Changed 6 years ago by GiggleLoop

I have this problem too, I'm adding my info in case it helps track down the culprit.
The WSOD seems only to occur after more than 1-2 hours standby. Standby overnight (appx. 8 hours) triggers it all the time.

My versions:

# cat /etc/om-version
Tag Name:
VERSION: c4a208ae4114f14224a5fa37e55a8b2a51fbd5ba
Branch: org.openmoko.asu.stable
Build Host: barbie
Time Stamp: Wed, 10 Sep 2008 08:22:26 +0800

# uname -a
Linux om-gta02 2.6.24 #1 PREEMPT Wed Sep 3 19:01:18 CST 2008 armv4tl unknown
baa7e75288d30d67e263d4486d977cfa  Om2008.8-gta02-20080903.uImage.bin
92e0360acd0af200b519b4a33bd99ed5  Om2008.8-gta02-20080910.rootfs.jffs2
3c1ee3fb8a03dfa5b3b44491d33517a6  gta02v5_and_up-u-boot.bin

I'm attaching the dmesg output.

Is there anything else I can run/try to help debug?

Changed 6 years ago by GiggleLoop

Changed 6 years ago by nelg

dmesg using updated u-boot + FDOM

Changed 6 years ago by kapal

comment:17 Changed 6 years ago by nelg

Tried this with testing-om-gta02-20080917

Still get WSOD after suspend for more than a few minutes.

Looking at the output of logread from a successful resume (after just a couple of minutes) and a WSOD resume, the output seems to be much the same.

Any update on patches or things I can do to help?
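
For comparing a good resume log with a WSOD one, a small script can strip the timestamps and list only the lines that differ. This is a sketch under assumptions: the "Mon DD HH:MM:SS " prefix is a guess at the logread format and may need adjusting, and the helper names are illustrative.

```python
import difflib
import re

# Assumed syslog-style prefix, e.g. "Sep 17 10:00:01 "; adjust to the real logread output.
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2} ")


def normalize(log_text):
    """Drop the timestamp prefix so that identical messages compare equal."""
    return [TIMESTAMP_RE.sub("", line) for line in log_text.splitlines()]


def log_differences(good_log, bad_log):
    """Return (lines only in the good log, lines only in the WSOD log)."""
    good, bad = normalize(good_log), normalize(bad_log)
    matcher = difflib.SequenceMatcher(a=good, b=bad, autojunk=False)
    only_good, only_bad = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            only_good.extend(good[i1:i2])
            only_bad.extend(bad[j1:j2])
    return only_good, only_bad


if __name__ == "__main__":
    good = "Sep 17 10:00:01 step one\nSep 17 10:00:02 step two"
    bad = "Sep 17 22:30:11 step one"
    print(log_differences(good, bad))
```

If both lists come back empty, the two logs contain the same messages in the same order, which would match the observation that the output is "much the same".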

Changed 6 years ago by VRGhost

Several log file/program output

comment:18 Changed 6 years ago by VRGhost

Got it with 2008.08-update (17-sep build)

Attached archive contains:
/tmp/x.log
dmesg
logread

attachment:wsod_logs.zip

Changed 6 years ago by nelg

dmesg from 2008.9 install, when showing wsod after resume.

comment:19 in reply to: ↑ 5 Changed 6 years ago by nicolas.dufresne

Hi,

I'm having this issue too. I'm not adding a dmesg, since mine is the same as the others. So far the only cause proposed for this is Andy's, and it would be nice to have a bit more status on it. Here's the original comment:

I have seen two behaviours that give WSOD despite the happy story above. The first is that during resume the Glamo's Reset pin is falsely activated by a spike. Because we assume most Glamo registers survive suspend, this kills us dead: the partial resume action we do on the registers fails to produce any video. So with that problem you get a sticky WSOD until you reboot.

Has this spike theory been proven? What would cause the spike? If we know a spike might reset the Glamo, can we detect that in software and maybe do a complete reset of the Glamo? What would be the result of always doing a complete reset of the Glamo? I think we are all ready to experiment on this; we just need an OpenMoko god to guide our ignorance when hacking the Glamo driver.

Thanks,
Nicolas

comment:20 follow-up: ↓ 21 Changed 6 years ago by Yohann


Since my purchase (a week ago) my Neo has been unusable: in any distro I see only a white screen. Sometimes it starts fine, but after a suspend, an incoming call, or an incoming SMS the screen is white again.

Video here: http://www.youtube.com/watch?v=WrLLYmKVuLc

comment:21 in reply to: ↑ 20 Changed 6 years ago by Lukyk

Replying to Yohann:


Since my purchase (a week ago) my Neo has been unusable: in any distro I see only a white screen. Sometimes it starts fine, but after a suspend, an incoming call, or an incoming SMS the screen is white again.

Video here: http://www.youtube.com/watch?v=WrLLYmKVuLc

From what I've seen in this video, your problem seems to be slightly different from the one discussed in this thread. We are all experiencing the WSOD only after resume from suspend.
Anyway, I've never read about such behaviour. Try booting without an SD card and with the latest u-boot and kernel. If that doesn't help, maybe it's a hardware problem...

comment:22 follow-up: ↓ 23 Changed 6 years ago by vanous

uff, my comment just got lost...

ok, second time:

The same story: WSOD after some time in suspend. I tried newer and newer u-boot, kernel, and rootfs, SIM in/out, and booting from NAND or from SD, but nothing changed. Tested with FDOM, 2008.9, 2008.8, and the Qtopia 4.3.3 snapshot. I can provide any logs at any time, can reproduce within several minutes, and have time to troubleshoot or run programs or test sequences.

This bug makes suspend unusable, so I always carry two spare batteries, but swapping them is very annoying.

comment:23 in reply to: ↑ 22 Changed 6 years ago by vanous

OK, i think here is more info:

If I turn off the GSM module, take the SIM card out, suspend, and wait a bit (over two minutes), resume ends not with a white screen but with a completely black one. The FR is alive and responds to ping, but ssh is not possible. I have tried three times now and the result was the same every time. I will do more tests.

comment:24 Changed 6 years ago by pbondo

Well it doesn't look like it will be easy to get some attention on this subject. So we may need to consider alternative ways to draw some attention. Blackmail may well be such an option.

On Saturday I will give a public talk on a completely unrelated Linux project. However, I intend to mention OpenMoko and show the device as a reference.

So now I have a question for the affected users here: Should I put the device in suspend before I show it to the audience with a resume...., or is that unfair ?

Any OpenMoko developer can override here by promising to take a serious look into fixing this issue.

comment:25 follow-up: ↓ 28 Changed 6 years ago by andy

Gah :-/ it's not as if I read each of these reports in real time. All I have to say, though, is that the best chance for an actual fix is in the 2.6.26 branch. Although the intended changes to improve suspend / resume are working fine there, the branch is currently held up by two nasty problems that are being worked on, and its performance today is actually worse. So apologies for not having anything better to say about it.

comment:26 Changed 6 years ago by pbondo

Hello again Andy

Ok it sounds like you are on top of it. So I will give it positive comments on Saturday.

comment:27 Changed 6 years ago by andy

It's more like it's on top of me... but anyway you should definitely just tell these folks the truth, suspend / resume is shaky at the moment but is getting attention.

comment:28 in reply to: ↑ 25 Changed 6 years ago by nelg

Replying to andy:

Hi Andy,

Great to know that this issue is being worked on. Guess it was fairly invisible to me as to what was being done.

For others wanting to know what's being done, have a look at.

http://docs.openmoko.org/trac/gitweb?p=kernel.git;a=shortlog;h=andy-tracking

Questions that have been on my mind:

Is the current opinion of the openmoko developers that this problem is fixable in software? It sounds like a yes from your last comments, but I am not sure.

Is a fix likely within a month or two?

comment:29 Changed 6 years ago by nelg

tested with: uImage-gta02-g291a9d50_mwester-stable.bin, 2.6.24mw-g291a9d50
with qtextended-4.4.1-gta02-rootfs-release-10022309.jffs2

Still get wsod on resume, but this is still a 2.6.24 kernel, so no surprise.

comment:30 follow-up: ↓ 31 Changed 6 years ago by leachim

Since i changed the display of my freerunner the white screen of death has gone.
Before every resume longer than 10 minutes resulted in a white screen issue.

I hope that this information helps to fix the problem.

Greetings, Michael

comment:31 in reply to: ↑ 30 ; follow-up: ↓ 32 Changed 6 years ago by nelg

Replying to leachim:

Since i changed the display of my freerunner the white screen of death has gone.
Before every resume longer than 10 minutes resulted in a white screen issue.

Did you replace the physical display, or are you talking about something in software?

comment:32 in reply to: ↑ 31 Changed 6 years ago by Einstein

Replying to nelg:

Replying to leachim:

Since i changed the display of my freerunner the white screen of death has gone.
Before every resume longer than 10 minutes resulted in a white screen issue.

Did you replace the physical display, or are you talking about something in software?

I know leachim. As far as I know, he changed the physical display.

comment:33 Changed 6 years ago by andy

It's an interesting clue... I guess it means that the settings in the display asic can be lost during suspend.

comment:34 Changed 6 years ago by mrmoku

I'm getting WSoD now _without_ suspending ;)

... with a current kernel (f5b973489) and current frameworkd (0f631a3df), if I leave it for some time with the screen blanked. Before I had it after suspending for more than 1 min, but not on screen blanking.

comment:35 Changed 6 years ago by andy

Is it possible you can ssh in and attach here output of

cat /sys/devices/platform/glamo3362.0/regs

when this happens? WSOD can be caused by glamo getting reset or LCM asic getting reset, or even problems with Harald's recent stuff about turning off video on glamo when we blank the framebuffer.

cat /proc/version

or the git hash on the kernel package might also help.

Changed 6 years ago by mrmoku

comment:36 Changed 6 years ago by mrmoku

0022 SU:root@om-gta02[pts/3]:~-> cat /proc/version
Linux version 2.6.24 (slug@builder) (gcc version 4.1.2) #1 PREEMPT Sat Oct 11 09:39:39 UTC 2008

which is f5b973489beb1a1239dfad53e3ad6e36ff7ee958

comment:37 Changed 6 years ago by andy

Thanks; it has Harald's patches in then. We never had that framebuffer blanking symptom reported before and those patches are to do with sleeping the LCM ASIC JBT6K74 on framebuffer blanking.

http://git.openmoko.org/?p=kernel.git;a=commitdiff;h=9aa87d67df2e59eea15f70c4e58f8bde10e5953b

I don't think his patch is at all to blame though, instead it seems to be showing us the underlying reason for the suspend WSOD too is slightly shaky comms to JBT6K74. With his patch we just talk to it more often than before so now we have new opportunities to see the trouble.

comment:38 Changed 6 years ago by jurg

i am an 'in production' neo freerunner user. i have no other phone anymore. my girlfriend told me i dropped one or two spots on the ladder of evolution. but i am sticking to it...

but, as a regular user i am very familiar with the WSOD. i have regular WSODs and lately i have begun to recognize the pattern.

i got my freerunner in the summer. even though in holland, it was relatively warm. during that period, while not having a truly stable phone, i did have a phone without WSODs. slowly the year progressed, flashing my freerunner once every week, at least. and with the change of seasons i began to see more and more WSODs.

i had to wear my phone in the pocket of my pants for a day or two and i had no WSODs on those days. i thought about it a lot (it IS very irritating) but after these couple of days it struck me. i figured it might have to do with the temperature of the expensive 'brick'. a simple test did the trick: when going to bed i left the freerunner sleeping on my router. and, contrary to most other cold mornings, my freerunner woke without the by then familiar WSOD.

now, with some behaviour changes, i managed to climb back another step up the ladder again (so i hope.)

(i hope it helps in finding a solution...)

comment:39 Changed 6 years ago by andy

  • HasPatchForReview unset

It's an interesting clue, but usually heat makes trouble since the silicon slows down, not the other way around?

comment:40 follow-up: ↓ 41 Changed 6 years ago by m0nt0

Yesterday I left my Freerunner on the desk, powered on, screen locked but without suspend; this morning I touched the screen and the WSoD came up. Yes, I'm sure the phone was not suspended, only the screen was off. Actual setup: latest FDOM, kernel daily build 2008-11-03, uboot daily build 2008-11-03, a personalized splash, but I hope that at least the splash is not related.
So can this be related to the screen and not to the glamo?

comment:41 in reply to: ↑ 40 Changed 6 years ago by vanous

I have the same setup/configuration except the custom splash and i have to say that WSOD happens now very often. Of course no suspend possible but now to make sure the FR stays usable i had to turn off the dimming function... so things actually got worse. Somebody was supposedly going to look at the glamo code... any news?

thank you
Petr

comment:42 follow-up: ↓ 43 Changed 6 years ago by jensbruhn

Same here, WSOD with my Neo1973 and FSO4. Occurs just after resume.

comment:43 in reply to: ↑ 42 Changed 6 years ago by Raphexion

Replying to jensbruhn:

Same here, WSOD with my Neo1973 and FSO4. Occurs just after resume.

However, if this problem also exists on the Neo1973 (something I didn't know before), that should mean it is not (strictly) a hardware problem. When Leachim (above) got rid of his problem by changing hardware, I sort of gave up on my brick.

Something that I have been trying to figure out: is there any kernel with which this doesn't happen? Does anyone know more precisely when this (WSoD) started?

comment:44 Changed 6 years ago by GiggleLoop

Please let me confirm the temperature-research jurg has done before:

Freerunner lying on my office-desk for hours in suspend, around 20°C: no WSOD (tried a couple of days)
Freerunner inside the inside-pocket of my jacket for some hours: no WSOD (tried a couple of days)
Freerunner inside the jacket hanging in the lobby for 2 hours, around 12°C: WSOD (tried twice)
Freerunner inside the fridge for 10 minutes, around 9°C: WSOD (tried twice)
Freerunner on top of the (slightly warm) radiator for 15 minutes: no WSOD.

So I would say it really is temperature-dependent, and the difference between 20° and 12° already seems to be enough to trigger it.

comment:45 follow-ups: ↓ 47 ↓ 48 ↓ 53 ↓ 61 Changed 6 years ago by joerg

Please can we collect exact hw-revision (A5, A6...), serial# (!), and datecode for every device showing WSOD.
Please attach to this ticket!
thanks,
jOERG

comment:46 Changed 6 years ago by jurg

System Info tells me:
Revision: "HW: GTA02BV5, GSM: gsm_ac_gp_fd_pu_em_cph_ds_vc_cal35_ri_36_amd8_ts0-Moko8"
Serial Number: 354651011619287

On the inside:
S/N: 8A8703469
DATE CODE: 20080717
P/N: 56-21147-00

comment:47 in reply to: ↑ 45 Changed 6 years ago by nelg

Replying to joerg:

/proc/cpuinfo shows
Hardware : GTA02
Revision : 0360
which I think means GTA02v6

From the modem tab in system information (qtopia)
Neo1973 GTA02 Embedded GSM Modem
Revision: "HW: GTA02BV5, GSM: gsm_ac_gp_fd_pu_em_cph_ds_vc_cal35_ri_36_amd8_ts0-Moko8"
Serial Number: 354651011638220

On the inside:
S/N: 8A8710675
DATE CODE: 20080725
P/N: 56-21147-00
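
For anyone who wants to collect the Hardware and Revision values from several devices without copying them by hand, here is a minimal sketch in C. It only extracts one field from text in the `key : value` layout that /proc/cpuinfo prints above; the function name `cpuinfo_field` is made up for this example and is not part of any Openmoko tool.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the value of a "key : value" line from /proc/cpuinfo-style text
 * into out. Returns 0 on success, -1 if the key is missing or the value
 * does not fit. cpuinfo_field is a hypothetical helper name.
 */
int cpuinfo_field(const char *text, const char *key, char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *line = text;

    assert(out != NULL && outlen > 0);
    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t linelen = eol ? (size_t)(eol - line) : strlen(line);

        if (linelen > keylen && strncmp(line, key, keylen) == 0) {
            const char *p = line + keylen;
            const char *end = line + linelen;

            /* skip the padding between the key and ':' */
            while (p < end && (*p == ' ' || *p == '\t'))
                p++;
            if (p < end && *p == ':') {
                p++;
                /* skip the padding after ':' */
                while (p < end && (*p == ' ' || *p == '\t'))
                    p++;
                /* trim trailing whitespace */
                while (end > p && (end[-1] == ' ' || end[-1] == '\t' ||
                                   end[-1] == '\r'))
                    end--;
                if ((size_t)(end - p) >= outlen)
                    return -1;
                memcpy(out, p, (size_t)(end - p));
                out[end - p] = '\0';
                return 0;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

To use it on a device, read /proc/cpuinfo into a buffer first and then ask for "Hardware" and "Revision". The match is case-sensitive, so "Revision" does not pick up the "CPU revision" line.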

comment:48 in reply to: ↑ 45 ; follow-up: ↓ 49 Changed 6 years ago by GiggleLoop

/proc/cpuinfo shows:
Hardware : GTA02
Revision : 0360

org.freesmartphone.GSM.Device.GetInfo shows:
'revision': '"HW: GTA02BV5, GSM: gsm_ac_gp_fd_pu_em_cph_ds_vc_cal35_ri_36_amd8_ts0-Moko8"'

Inside:
S/N: 8A8703718
DATE CODE: 20080721
P/N: 56-21147-00

comment:49 in reply to: ↑ 48 ; follow-up: ↓ 50 Changed 6 years ago by jensbruhn

my NEO1973:

Processor : ARM920T rev 0 (v4l)
BogoMIPS : 132.30
Features : swp half thumb
CPU implementer : 0x41
CPU architecture: 4T
CPU variant : 0x1
CPU part : 0x920
CPU revision : 0
Cache type : write-back
Cache clean : cp15 c7 ops
Cache lockdown : format A
Cache format : Harvard
I size : 16384
I assoc : 64
I line length : 32
I sets : 8
D size : 16384
D assoc : 64
D line length : 32
D sets : 8

Hardware : GTA01
Revision : 0240
Serial : 0000000000000000

P/N: 70-7GTA01B-00001
S/N: 8A7507390
Date Code: 20070712

comment:50 in reply to: ↑ 49 Changed 6 years ago by andy

Replying to jensbruhn:

my NEO1973:

Hardware : GTA01

Wow... what the hell is this issue then? I just assumed it was the bad boy Glamo but that makes it sound like it is jbt6k74 issue.

comment:51 Changed 6 years ago by mmezo

/proc/cpuinfo shows:
Hardware : GTA02
Revision : 0350

back:
Model: GTA02
S/N: 8A8602752
Date Code: 20080618
P/N: 56-21147-00

comment:52 Changed 6 years ago by mrmoku

/proc/cpuinfo:
Hardware : GTA02
Revision : 0360

gsmdevice.GetInfo (from within cli-framework)

'model': '"Neo1973 GTA02 Embedded GSM Modem"',
'revision': '"HW: GTA02BV5, GSM: gsm_ac_gp_fd_pu_em_cph_ds_vc_cal35_ri_36_amd8_ts0-Moko8"'}

inside:
Model: GTA02
S/N: 8A8710361
Date Code: 20080725
P/N: 56-21147-00

comment:53 in reply to: ↑ 45 Changed 6 years ago by vanous

Qtopia reports:

GTA02BV5

Backside:

S/N 8A8710381
DATE 20080728
GSM MOKO8

comment:54 Changed 6 years ago by RuiSeabra

In my case (WSoD after resume / and more recently after dimming as well: #2115)
/proc/cpuinfo shows:
Hardware : GTA02
Revision : 0350

back:
Model: GTA02
S/N: 8A8603495
Date Code: 20080621
P/N: 56-21147-00

comment:55 Changed 6 years ago by pbondo

I was under the impression that we were not supposed to see this with the 2.6.26++ kernels ?

Still get the WSOD with a current stable-tracking kernel (using QI to boot):

http://people.openmoko.org/andy/uImage-moredrivers-GTA02_stable-tracking_d500794e07e71cc6.bin

uname -r
2.6.28-GTA02_stable-tracking_d500794e07e71cc6-mokodev

cat /proc/cpuinfo
Hardware : GTA02
Revision : 0360
Serial : 0000000000000000

comment:56 Changed 6 years ago by andy

It is "fixed here" for some days on stable-tracking (actually, andy-tracking)... it seems that what is left is the device- or temperature- specific WSOD that is seen by some folks after blanking.

Today I will look further at it.

comment:57 Changed 6 years ago by theseer105

I flashed the kernel and get "can't get kernel image".

Is there a howto for the installation of the kernel?

comment:58 Changed 6 years ago by nicolas.dufresne

I was sceptical about the temperature issue, but as I live in Québec (Canada) I thought I had the perfect conditions outside to test it (today it's -5 C). It would also mean that this is a serious defect for me when using the device. I repeated the same test 3 times, and it turned out I could reproduce it every time.

  1. Keep the device in a warm place for 5 minutes
  2. Set the phone to sleep mode
  3. Put it in a cold place for 5 minutes
  4. Wake your phone

I own a GTA02v5, date code 2008-06-20, GSM 850/... Later I'm going to test if it resumes correctly if I do warm it before step 3. I know nothing about hardware, but this would bring more input I guess.

comment:59 follow-up: ↓ 60 Changed 6 years ago by nicolas.dufresne

Warming the device before resume worked. No WSOD. I hope this information will help.

While I was waiting, I read the specification of the TD028TTEC1 LCD. There is a contradiction in the operating temperature: in one place they say -20 to 60 and in another place it's -10 to 60 (which is the more common value). In both cases my test is valid since it's within the operating temperature range (actually the phone will not reach -5 in only 5 minutes). I've also noticed that according to the spec it takes a minimum of 250 ms for the LCD to wake from suspend. I thought this was a lot of time, since return from sleep on the Freerunner does not take more than a second. Anyway, that might not be relevant, but maybe it would be nice to keep it in mind while looking into a software solution.

comment:60 in reply to: ↑ 59 Changed 6 years ago by joerg

Replying to nicolas.dufresne:

Warming the device before resume worked. No WSOD. I hope this information will help.

[...]

I've also noticed that according to the spec it takes a minimum of 250 ms for the LCD to wake from suspend.

Actually this test might be very helpful on locating the problem.
The startup behaviour could differ depending on temperature.

We should check in the code whether the situation you mention really applies to LCD resume, and whether there is a sufficient delay after the LCD is resumed.

Yesterday Tony put a FR into the fridge, but wasn't able to reproduce the WSOD. Seems we aren't able to create WSOD situation here in TPE, at least up to now.
I will extend the search next days, as soon as I find a fridge in the office where we could freeze all the phones I can "borrow" for a few minutes.

@Andy: what would you suggest we probe/measure/test if we can find a FR with WSOD? I'd like to help on this issue.

/jOERG

comment:61 in reply to: ↑ 45 Changed 6 years ago by lunved

Replying to joerg:

Please can we collect exact hw-revision (A5, A6...), serial# (!), and datecode for every device showing WSOD.

Model: GTA02
S/N: 8A8603571
Date Code: 20080621

comment:62 Changed 6 years ago by ArteK

Hardware : GTA02
Revision : 0350

S/N:8A8603193
DATE CODE: 20080621

comment:63 Changed 6 years ago by theseer105

Still get the WSOD with a current stable-tracking kernel (QI: http://people.openmoko.org/andy/qi-s3c2442-andy_8b773038524299aa.udfu, kernel: http://people.openmoko.org/andy/uImage-moredrivers-GTA02_andy-tracking_de248ab69418d52b.bin)

Hardware : GTA02v5
S/N:8A8602329
DATE CODE: 20080618
P/N: 56-21147-00

comment:64 Changed 6 years ago by andy

You always get the WSOD effect after resume (only ever after resume?) with recent andy-tracking? Any relationship to the temperature thing or just permanent? Second resume never recovers it?

comment:65 Changed 6 years ago by nicolas.dufresne

I've seen the hard reset that has been added to the jbt driver in stable-tracking, so I've tested it. I've got the same problem too, but I've tested only once and did not try to sleep it again to see if the reset helps it recover (I will try to find time later to do this test).

I've been reading the jbt driver source code today. I have a feeling that the locking is too weak. I think it only protects the register cache. This makes me think there is a larger problem. As the resume process involves multiple reg_write calls (and the locking is done inside that function), other register writes could be interleaved with a suspend or resume sequence, which I don't think is a good idea. But I might also be totally wrong.

comment:66 Changed 6 years ago by theseer105

I tried it today. I always get WSOD with a cold freerunner. If the Freerunner is about body temperature it resumes fine.

comment:67 Changed 6 years ago by andy

Is it ever the case that once you had a WSOD, by making it right temperature or anything else, that we ever recover from it during the same session?

On current andy-tracking there is a /sys node

/sys/class/i2c-adapter/i2c-0/0-0073/pcf50633-regltr.9/glamo3362.0/glamo-spi-gpio.0/spi2.0/reset

if you echo 1 in there (take care about spaces, echo 1 > /sys...) it should reset the jbt6k74 asic in the LCM.
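
For anyone who would rather trigger the reset from a small test program than from the shell, here is a minimal sketch in C. It does nothing more than write a single `1` to a file. The name `write_one` is made up for this example, and whether the driver accepts `1` without the trailing newline that `echo` adds is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a single '1' (no spaces, no newline) to an existing sysfs-style
 * attribute file. Returns 0 on success, -1 on any error.
 * write_one is a hypothetical helper name.
 */
int write_one(const char *path)
{
    int fd;
    ssize_t n;

    assert(path != NULL);
    fd = open(path, O_WRONLY);   /* the node must already exist */
    if (fd < 0)
        return -1;
    n = write(fd, "1", 1);
    if (close(fd) < 0)
        return -1;
    return n == 1 ? 0 : -1;
}
```

Call it with the full /sys path given above. A return value of -1 most likely means the node is missing on that kernel or the program is not running as root.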

Nicolas: About the locking, yes it can make trouble because the framebuffer blanking action that calls through to jbt6k74 code now is async to resume action in jbt6k74. But if this was the problem, a subsequent use of that reset /sys node should clear it, since that reset code does a physical hard reset of the LCM similar to boot init, and we NEVER see WSOD on boot. But I get the feeling this WSOD thing is super sticky and the only thing that impresses it is a reboot, which it knows about somehow.

comment:68 Changed 6 years ago by nicolas.dufresne

I've added tracing to the jbt driver and can confirm that the locking was not affecting the resume process. All calls are exclusive and happen in the same order in both cases (WSOD or not).

I wanted to test using andy-tracking and doing the reset, unfortunately the branch won't compile. I've also tried the kernel on http://people.openmoko.org/andy, but I always get a Bad CRC error from uboot. I'll try later ... I think it's important to confirm if the wsod can be recovered or if it sticks until reboot.

comment:69 Changed 6 years ago by theseer105

I think it's the temperature. The Freerunner was on the heating overnight and woke up without WSOD. With the same test on the table there is a WSOD.
After 5 minutes in the fridge the Freerunner resumes with WSOD, even if I run "echo 1 > /sys ..." before I touch the Freerunner.

comment:70 Changed 6 years ago by jave

I've recently started experiencing WSOD. My device details:

hw:GTA02
s/n:8a8704369
date:20080722

Everything was fine until something unknown happened to the device recently, and now I get wsod quite often. I run Debian+FSO+Illume. Things that happened recently:

  • some debian updates last week, to fso4
  • a sudden temperature drop in Sweden

comment:71 Changed 6 years ago by jave

I would like to add that I've tried to disable suspend, and wsod seems to happen anyway.
I put the freerunner on a table, wait a while and touch the screen. The screen turns on and is white.

comment:72 Changed 6 years ago by gomez

Hi,
same phenomenon here! It seems that the device's own heat while the display was just blanked was enough at normal temperature (around 20°C), but as it got colder I got this problem even when it was just blanked.

cat /proc/cpuinfo
Hardware	: GTA02
Revision	: 0350
Serial		: 0000000000000000
>>> gsmdevice.GetInfo()
{   'imei': '354651011613199',
    'manufacturer': 'FIC/OpenMoko',
    'model': '"Neo1973 GTA02 Embedded GSM Modem"',
    'revision': '"HW: GTA02BV5, GSM: gsm_ac_gp_fd_pu_em_cph_ds_vc_cal35_ri_36_amd8_ts0-Moko8"'}

will do more tests if needed

comment:73 follow-up: ↓ 86 Changed 6 years ago by nicolas.dufresne

Ok, I think we are very close to a solution. I've disabled deep sleep in the JBT driver, checked that suspend still works for the rest of the driver (around 20 sleep/unsleep cycles), put it to sleep, put it in the freezer for 20 minutes, took it out and woke it up: no WSOD. I put back the unpatched kernel, did the same thing, and got a WSOD. There is definitely something wrong with this driver.

From this point, without the datasheet (under NDA), it will be hard to continue and to make sure we are doing things right. However bad the datasheets are (as mentioned on the dev mailing list), those who have them will figure out the bug quicker than me. I'm going to post my patch so other people can reproduce this. Note that I have no idea what effect this patch may have on power consumption or anything else you could think of.

Changed 6 years ago by nicolas.dufresne

Patch to disable deep sleep and get rid of WSOD. Applicable to origin/stable git.

comment:74 Changed 6 years ago by joerg

Given we see a dependency of failure to resume from suspend and temperature, we may assume this is no misconfiguration of any register, or a missing command etc, but instead most probably a voltage or timing issue.
From jbt6k74.c and varaha:'docs/by_function/lcm/td028ttec1/JBT6K74A-22AS TSB_JBT6K74A-22AS_rev231s(D)_eg_20060329' I found the following suspects:

*-----
[p27]: e) Note for "Deep standby mode" release
■ Release method
[ [a diagram showing transfer of three words all-zeroes, with a pause between them marked "WAIT=1ms"] ]
1) Release method

When transfer “D/C=”0”、Command ="00h" continuously three times. Release “Deep standby mode”

(This rule is All=”0”x9bitx3 times.)

[ [corresponding code] ]
static int standby_to_sleep(struct jbt_info *jbt)
{
    int rc;

    /* three times command zero */
    rc = jbt_reg_write_nodata(jbt, 0x00);
    mdelay(1);
    rc |= jbt_reg_write_nodata(jbt, 0x00);
    mdelay(1);
    rc |= jbt_reg_write_nodata(jbt, 0x00);
    mdelay(1);

    /* deep standby out */
    rc |= jbt_reg_write(jbt, JBT_REG_POWER_ON_OFF, 0x17);

    return rc ? -EIO : 0;
}

[ [comment] ]
The 1ms WAIT might be critical, and the language in datasheet makes me think this is a minimum value. We should increase to 2ms, just to make sure.

(00h is being described as "No Operation" and "always valid". Found no further info on this)
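
To make the suggestion concrete, here is a user-space sketch in C of the three-times-00h release with the wait as a parameter, so 2 ms can be tried instead of 1 ms. This is not driver code: `fake_write_nodata` and `fake_wait_ms` are hypothetical stand-ins for `jbt_reg_write_nodata()` and `mdelay()` that only record what would be sent, and the final JBT_REG_POWER_ON_OFF write is left out.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_MAX 16

/* One recorded step: either a command byte or a wait in milliseconds. */
struct step { int is_wait; unsigned value; };

struct step trace[TRACE_MAX];
size_t trace_len;

/* Hypothetical stand-in for the driver's SPI command write. */
int fake_write_nodata(unsigned char reg)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = (struct step){ 0, reg };
    return 0;
}

/* Hypothetical stand-in for mdelay(): records the wait, does not sleep. */
void fake_wait_ms(unsigned ms)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = (struct step){ 1, ms };
}

/*
 * Send command 00h three times with wait_ms after each transfer, as in
 * the release method quoted above. wait_ms = 2 doubles the 1 ms WAIT.
 */
int release_deep_standby(unsigned wait_ms)
{
    int rc = 0;
    int i;

    trace_len = 0;
    for (i = 0; i < 3; i++) {
        rc |= fake_write_nodata(0x00);
        fake_wait_ms(wait_ms);
    }
    return rc ? -1 : 0;
}
```

Keeping the wait as a parameter means a test kernel could try 2 ms, 5 ms and so on without touching the sequence itself.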

*-----
[p33:]
Software reset (01h)

D/XC D7 D6 D5 D4 D3 D2 D1 D0

0 0 0 0 0 0 0 0 1

The software reset command initializes the registers for which software reset is enabled. A wait of at least 5 ms is necessary after the software reset command is issued. Be sure to insert a wait of 5 ms or longer before entering the next command. If this command is issued in the normal display state, the JBT6K74A-22AS reverts to the initial state and requires 120 ms to initialize the internal circuits of the LCD driver. Wait at least 120 ms before entering normal display mode (Sleep-out).

[ [corresponding code] ]

/* hard reset the jbt6k74 */
(jbt6k74_pdata->reset)(0, 0);
mdelay(1);
(jbt6k74_pdata->reset)(0, 1);
mdelay(120);

rc = jbt_reg_write_nodata(jbt, 0x01);
if (rc < 0)
    dev_err(dev, "cannot soft reset\n");

mdelay(120);

jbt->state = JBT_STATE_DEEP_STANDBY;

[ [comment] ]
Same considerations apply - we should increase time to avoid being "on the edge". Here the manual is even specific in specifying "at least 120ms". A few dozen more, just for test, won't hurt.
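
As a worked example of "a few dozen more", here is a small C helper that pads a datasheet minimum by a percentage and rounds up to whole milliseconds. The name `padded_delay_ms` and the 25 percent figure used below are assumptions for illustration only, not values from the datasheet or the driver.

```c
#include <assert.h>

/*
 * Return min_ms increased by margin_pct percent, rounded up to a whole
 * millisecond. min_ms is the datasheet minimum; margin_pct is a guess
 * at a safety margin. padded_delay_ms is a hypothetical helper name.
 */
unsigned padded_delay_ms(unsigned min_ms, unsigned margin_pct)
{
    /* keep the multiplication far away from overflow */
    assert(min_ms <= 100000 && margin_pct <= 1000);
    return min_ms + (min_ms * margin_pct + 99) / 100;
}
```

With a 25 percent margin the 120 ms wait becomes 150 ms and the 5 ms wait becomes 7 ms; a 100 percent margin turns the 1 ms WAIT into 2 ms.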

*-----
[p44:]
Sleep-in (10h)

D/XC D7 D6 D5 D4 D3 D2 D1 D0

0 0 0 0 1 0 0 0 0

This command causes the JBT6K74A-22AS to enter sleep mode. In sleep mode, the booster and display controller are stopped. MCU command access and graphic controller data access are, however, permitted. After executing the Sleep-in command, the PCLK, VSYNC, and HSYNC synchronization signals from the graphics controller must be supplied for a period of at least two frames.

The display controller is stopped in sleep mode. To stop it with internal operation stabilized, however, write a single frame of all-white (or all-black) display data. This write clears the display screen in sleep mode to maintain the LCD quality.

The Sleep-in command must be followed by a wait time of 5 ms to allow the internal circuits to stop automatically. Be sure to wait for at least 5 ms before entering the command next to the Sleep-in command. Note that the command cannot be executed until a wait time of 120 ms elapses after the Sleep-out command is executed.

[ [AND] ]
[p45:]
Sleep-out (11h)

D/XC D7 D6 D5 D4 D3 D2 D1 D0

0 0 0 0 1 0 0 0 1

This command releases the JBT6K74A-22AS from sleep mode. The booster and display controller restart upon the termination of sleep mode. Display synchronization signals can be stopped in sleep mode. Ensure that display synchronization signals are supplied for a period of at least two frames before executing the Sleep-out command. Those signals are necessary to generate an idling state so that the internal circuits can start normally.

The Sleep-out command is the only method of terminating sleep mode. The Sleep-out command must be followed by a wait time of 5 ms to allow the booster and display controller to restart stably. Be sure to wait for at least 5 ms before executing the next command. Note that executing the Sleep-out command after the Sleep-in command requires a wait time of 120 ms between them.

[ [corresponding code] ]
???
[ [comment] ]
The 5ms seem to be an issue. I don't assume we are entering suspend and resuming from suspend in less than 120ms (well maybe even this should be checked ;-)
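
One way to check this without guessing would be to log the time of every command sent to the LCM and run the log through a checker. Below is a minimal sketch in C of such a checker for the two rules quoted above (5 ms after Sleep-in or Sleep-out, 120 ms between Sleep-in and Sleep-out). The struct, the function name and the idea of a millisecond command log are all assumptions; nothing like this exists in jbt6k74.c as far as I know.

```c
#include <assert.h>
#include <stddef.h>

#define CMD_SLEEP_IN  0x10
#define CMD_SLEEP_OUT 0x11

/* One logged command: time in ms and the command byte (hypothetical). */
struct cmd_event { unsigned t_ms; unsigned char cmd; };

/*
 * Check a time-ordered command log against the quoted rules:
 *  - at least 5 ms after Sleep-in or Sleep-out before the next command
 *  - at least 120 ms between a Sleep-in and a Sleep-out, in either order
 * Returns the index of the first offending command, or -1 if none.
 */
int first_timing_violation(const struct cmd_event *ev, size_t n)
{
    size_t i;
    int have_sleep = 0;            /* seen Sleep-in or Sleep-out yet? */
    unsigned last_sleep_t = 0;     /* time of the most recent one */
    unsigned char last_sleep_cmd = 0;

    for (i = 0; i < n; i++) {
        if (i > 0) {
            unsigned char prev = ev[i - 1].cmd;

            assert(ev[i].t_ms >= ev[i - 1].t_ms);   /* log must be ordered */
            if ((prev == CMD_SLEEP_IN || prev == CMD_SLEEP_OUT) &&
                ev[i].t_ms - ev[i - 1].t_ms < 5)
                return (int)i;
        }
        if (ev[i].cmd == CMD_SLEEP_IN || ev[i].cmd == CMD_SLEEP_OUT) {
            if (have_sleep && ev[i].cmd != last_sleep_cmd &&
                ev[i].t_ms - last_sleep_t < 120)
                return (int)i;
            have_sleep = 1;
            last_sleep_t = ev[i].t_ms;
            last_sleep_cmd = ev[i].cmd;
        }
    }
    return -1;
}
```

If a log taken on a cold device that shows WSOD passes this check, the 5 ms and 120 ms waits are probably not the culprit and the voltage setup below becomes the more likely suspect.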

*-----
[p55ff:]
Detailed explanations on the detail setup commands
Power supply on/off control (B0h)
[ [AND] ]
Booster operation setup (B1h)
[ [AND] ]
Booster mode setup (B2h)
[ [comment] ]
voltage setup should be checked by OM EE, if we can't find an explanation for temperature dependency, which is based on timing issues.
see [p57:] (Supplementation 1) Please evaluate customer’s module to optimize booster command setting.

/jOERG

comment:75 Changed 6 years ago by joerg

Could this be related, via mdelay, to the fast-running system clock we have seen recently?
/j

comment:76 Changed 6 years ago by werner

Turning some of those mdelay() busy-loops into msleep()s wouldn't
be so bad either ...

comment:77 Changed 6 years ago by quatrox

This might be relevant.
I am using a current kernel and a current ver of SHR (just a one or two days old).
I had Numpty Physics running on my GTA02v5.
I suspended the phone.
When I woke it up again, everything was a little smaller - i.e. only
half the screen width was used. I could hear a high frequency sound from
the phone (from somewhere close to the handset speaker at the top).

Additional suspend/resume did not help.
After a full reboot, everything worked great again:)

Note that this does not happen all the time... This was the first time

comment:78 Changed 6 years ago by Vladimir.Koutny

I've just found another way to reproduce a WSOD on backlight restore after a blank (ie, no suspend involved).

All that is needed: enter QVGA (portrait) mode (I've opened #2136 for the color/banding issue, maybe it is related in the end?) and let the system (2008.9 in my case) blank the display (no suspend). On backlight resume there is a WSOD. This is under normal indoor conditions on a device that never had WSOD problems in VGA mode. Nothing in dmesg.

In this state, I can get rid of WSOD by switching to VGA (and back); on next blank/resume the WSOD is there once again.

One additional observation: in this WSOD, when I change the screen orientation a few times (keeping it in QVGA) I start seeing some of the graphics that should be shown (it is still more or less white, but with slightly darker/yellowish pieces; ie, you can see the rotating * when you start something by touching the panel).

This is with own build of mwester's stable kernel (833cda8ff04a0f45f02b70d0c3f61c0a4305e3c8) and own build of the rest via mokomakefile (I can get exact version if needed).

Device info:
hw-revision: GTA02A6 (EU version)
serial#: 8A8906520
datecode: 20080925

comment:79 Changed 6 years ago by C.M

Just want to add that I also get the WSOD on my GTA01v4 with the latest SHR-lite build from November 25th. It happens every time the phone suspends. Before the summer I had never had a WSOD, but all kernels after August seem to have the problem. The phone is connected via USB and in a room that's currently around 19 degrees C.

comment:80 Changed 6 years ago by m0nt0

WSOD with:
serial#: 8a8710237
datecode: 20080725

Just to understand, is this a hw bug? If it is, I can contact my shop and try to get a fix or something similar; if this is a pure sw bug I'll wait. I can't understand why not everyone is experiencing this: if all the FRs are produced in the same way and there aren't tons of versions out there, a lot of people should be having this problem, but it seems that only a few are (really, I don't want to start a flame war or anything like that, I just want to understand).

comment:81 follow-up: ↓ 82 Changed 6 years ago by nicolas.dufresne

Andy: I've tried the reset switch but it won't clear the white screen. I can see the kernel output telling me it did something. I've also tried to manually switch the driver state, but again it did not clear the WSOD. Maybe we have to reset everything on the SPI?

Note that I've produced the bug without suspend (just blanking) and I'm running the latest andy-tracking kernel (2.6.28-GTA02_andy-tracking_b6d34d41617a6315-mokodev)

comment:82 in reply to: ↑ 81 Changed 6 years ago by andy

Replying to nicolas.dufresne:

Andy: I've tried the reset switch but it won't clear the white screen. I can see the kernel output telling me it did something. I've also tried to manually switch the driver state, but again it did not clear the WSOD. Maybe we have to reset everything on the SPI?

What makes it puzzling is I can confirm the GPIO from Glamo that form the bitbang SPI bus are operational at this time, and the reset /sys thing performs a hard reset and the same steps used at boot to re-init the LCM asic... and we NEVER see WSOD at boot (the brief one until recently doesn't count, it's the backlight coming up before the LCM asic gets initialized).

Note that I've produced the bug without suspend (just blanking) and I'm running the latest andy-tracking kernel (2.6.28-GTA02_andy-tracking_b6d34d41617a6315-mokodev)

I'm currently imagining that the reports of WSOD on resume are actually some form of the blanking WSOD happening immediately afterwards, dreamlike as that may sound with no evidence.

comment:84 Changed 6 years ago by john_lee

  • Cc john_lee@… added

comment:85 in reply to: ↑ 83 ; follow-up: ↓ 87 Changed 6 years ago by jensbruhn

Great news :)
Is there any image with this patch included?

Replying to theseer105:

No WSOD with this patch:

http://docs.openmoko.org/trac/attachment/ticket/1841/jbt6k74_no_deep_sleep.patch

comment:86 in reply to: ↑ 73 Changed 6 years ago by ankostis

Replying to nicolas.dufresne:
(i know i might be totally ignorant...)
Concerning the jbt6k74_no_deep_sleep.patch you submitted, Nicolas:
Is there any reason to invoke lcd-display-off twice in lines 347-348 of a/drivers/video/display/jbt6k74.c?

comment:87 in reply to: ↑ 85 ; follow-up: ↓ 92 Changed 6 years ago by theseer105

You can download a Kernel at:

http://rapidshare.com/files/169488908/uImage_screenhack_own.bin.html

patched by Mokel23 a member of Freeyourphone.de

Replying to jensbruhn:

Great news :)
Is there any image with this patch included?

Replying to theseer105:

No WSOD with this patch:

http://docs.openmoko.org/trac/attachment/ticket/1841/jbt6k74_no_deep_sleep.patch

comment:88 Changed 6 years ago by andy

Any confirmation about Nicolas' patch is very welcome... there was concern about suspend current for LCM, but we have moved to powering the thing off in suspend a couple of days ago.

comment:89 Changed 6 years ago by gomez

I can really confirm that it's working. Now I should get sound working (other topic ;) ).
I don't know if it draws current, because I never suspended it long enough to see how much was gone; for quite a while I had to use it without suspend, and that wasted battery! I think it's about a day when sometimes powered on.

No WSOD!!!

comment:90 Changed 6 years ago by andy

OK. I stuck Nicolas' patch on stable and I'll try to adapt it to stable-tracking later... there's nothing else known to impact it and it's pretty hard to debug it here when I don't get WSOD on my devices.

Thanks for taking the time to do the patch Nicolas!

comment:91 follow-up: ↓ 93 Changed 6 years ago by gomez

Hi Andy,
if you like we could switch the phones...

no really... if i can get support in configuring and compiling a working kernel with audio i would like to test further patches and images

comment:92 in reply to: ↑ 87 Changed 6 years ago by RuiSeabra

Replying to theseer105:

You can download a Kernel at:

http://rapidshare.com/files/169488908/uImage_screenhack_own.bin.html

patched by Mokel23 a member of Freeyourphone.de

With my Freerunner and suspending for 15 minutes by pressing the button, I did not have a WSOD.

I've activated blanking and suspending again, for a longer trial.

comment:93 in reply to: ↑ 91 Changed 6 years ago by andy

Replying to gomez:

Hi Andy,
if you like we could switch the phones...

Hopefully I don't have to try to reproduce this any more :-)

No, really... if I can get support in configuring and compiling a working kernel with audio, I would like to test further patches and images.

What's up with your audio? Open a new trac or post on kernel list about it.

comment:94 Changed 6 years ago by nicolas.dufresne

Ok, as I said, the patch does not fix the issue, it hides it. Still, I can rework this patch to make it cleaner; maybe I could only include the patch and get rid of the display_onoff rework (which was the original thing I wanted to submit until I figured out that it was better to work on the latest kernel).

What the patch currently does:

  • Moves display_onoff into proper state transitions (aside from the double display-off bug, thanks to ankostis)
  • Disables going into deep_sleep (this is tricky since at boot we assume we are in deep_sleep)

Instead of this patch, we could just add a sysfs entry to disable deep sleep of this chip at runtime. This way we keep it for devices that are not affected, and with a small shell script we could disable it for affected devices. As boot is already done by the time it is set, we no longer need the obscure static variable.
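The runtime switch proposed here can be pictured with a small userspace model. This is only a sketch, not the driver code: JBT_STATE_DEEP_STANDBY is the only state name that appears in this ticket, while the other state names, the struct, the allow_deep_standby flag and the jbt_model_enter_state() helper are hypothetical. It also assumes that a display with deep standby disabled is parked in the sleep state instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the LCM power states. Only JBT_STATE_DEEP_STANDBY
 * is named in this ticket; the other names are assumptions. */
enum jbt_state {
    JBT_STATE_DEEP_STANDBY,
    JBT_STATE_SLEEP,
    JBT_STATE_NORMAL,
};

struct jbt_model {
    enum jbt_state state;
    bool allow_deep_standby; /* would be toggled through the proposed sysfs entry */
};

/* Request a state. If deep standby is disabled at runtime, the model parks
 * the display in SLEEP instead. Returns the state actually entered. */
enum jbt_state jbt_model_enter_state(struct jbt_model *jbt, enum jbt_state wanted)
{
    assert(jbt != NULL);
    if (wanted == JBT_STATE_DEEP_STANDBY && !jbt->allow_deep_standby)
        wanted = JBT_STATE_SLEEP;
    jbt->state = wanted;
    return jbt->state;
}
```

With a flag like this, unaffected devices could keep deep standby and affected ones could turn it off from userspace after boot, so no boot-time special case would be needed in the model.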

(For binary images, I would prefer if someone from OpenMoko decided whether it's worth it and verified that no damage will be caused to the devices.)

comment:95 Changed 6 years ago by andy

If you want to refine the patch it's very welcome. But I think differentiating in userspace between devices that have the WSOD issue sometimes and those that don't is probably going too far.

One thing, I rebased most of the patch on to stable-tracking, and it's quite badly broken. The LCM goes crazy with dynamic lines all over the display mostly obscuring it. Can you see what's broken about that rebased patch?

http://git.openmoko.org/?p=kernel.git;a=commitdiff;h=8e318ae8cab52b85323a36ac7808bf33e7a98797

Patched file

http://git.openmoko.org/?p=kernel.git;a=blob;f=drivers/video/display/jbt6k74.c;h=976d99230472855649921339a723bfd60cc66c2e;hb=8e318ae8cab52b85323a36ac7808bf33e7a98797

comment:96 Changed 6 years ago by nicolas.dufresne

Agreed. I'll prepare a tiny correction to the patch and try to fix the rebase.

comment:97 Changed 6 years ago by nicolas.dufresne

The merge does not work because the LCM is reset but not bootstrapped (it is the patch's job not to execute the standby_to_sleep() code). The patch will work again if you remove the reset at the beginning of jbt6k74_resume. Note that the reset attribute you added will most likely cause the same issue. This needs a bit of rework. If you are not in too much of a hurry, you can just remove the offending reset and I will create a proper patch this weekend.

Changed 6 years ago by nicolas.dufresne

This is an enhancement on top of previous patch. This implementation does not break switching from VGA to QVGA.

comment:98 Changed 6 years ago by nicolas.dufresne

Andy, I've attached a rework on top of the previous patch. I think you'll agree that this one has a cleaner implementation and is less likely to break switching between VGA and QVGA.

For the 2.6.28 version, I would suggest moving the reset routine that is in the suspend function to standby_to_sleep() before trying to rebase. This is more likely to give good results.

I hope we will have Joerg's input soon on this since he has reported having WSOD even with the patch.

comment:99 Changed 6 years ago by andy

OK. Last night I reverted the patch on andy-tracking and still had the magic lines, so I am wondering if this is due to something else. I will study that first this morning and then try your patch... thanks for looking at it.

I hope we will have Joerg's input soon on this since he has
reported having WSOD even with the patch

Huh... where has he reported this though? Neither on kernel list nor on this trac. I don't think we solve things quicker on average by hiding the information away.

comment:100 Changed 6 years ago by andy

I have added your second patch on stable, and put both on stable-tracking now. Thanks again for the work on it.

It's pretty interesting actually, this new sticky LCM behaviour of "jazzy lines" is provoked solely by taking down LDO6, the LCM power, during suspend. If I have LDO6 kept up during suspend, no jazzy lines.

However, while we don't take down LDO6 and don't enter deep sleep in the LCM ASIC, I guess we are burning a little power in suspend we don't need to... it's less of an issue than WSOD :-) but it still needs looking at.

comment:101 follow-up: ↓ 102 Changed 6 years ago by nicolas.dufresne

Those "jazzy lines" are probably a separate issue. I've seen those by switching to QVGA or sometimes when I rotate the screen.

Regarding Joerg's comment you are right, he hasn't posted on the dev list or trac, he posted on the owners list. I don't know why... Here's the comment I was referring to:

From Joerg on owners list:
|The "no deep_suspend" patch didn't work for us -> it's a problem often
|triggered by but not directly caused by deep_suspend.

But I don't think this was their final conclusion. Anyway, if that patch only fixes half of the affected devices, we still have half of the device owners who can now use and test the device for a longer period of time.

For power consumption, I've tested what percentage would go away overnight (~9h) and I ended the night at 87% (-13%). I guess this is pretty much the right number since another owner has sent me these exact same stats (the owner is Sebastien Hammerl, thanks to you).
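As a rough worked example of what those numbers imply, the sketch below turns the reported figures (100% down to 87% over about 9 hours) into an average drain rate and a suspend-time estimate. It assumes a linear discharge and an accurate gauge reading, neither of which is established in this ticket, and both function names are made up for illustration.

```c
#include <assert.h>

/* Average drain in percent of battery capacity per hour. */
double drain_percent_per_hour(double start_pct, double end_pct, double hours)
{
    assert(hours > 0.0);
    return (start_pct - end_pct) / hours;
}

/* Hours until empty when starting at start_pct and draining at a constant
 * rate. The constant rate is a simplifying assumption. */
double hours_to_empty(double start_pct, double rate_pct_per_hour)
{
    assert(rate_pct_per_hour > 0.0);
    return start_pct / rate_pct_per_hour;
}
```

For the reported values, drain_percent_per_hour(100.0, 87.0, 9.0) gives about 1.44% per hour, and hours_to_empty(100.0, ...) at that rate gives roughly 69 hours, so a little under three days of pure suspend under these assumptions.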

comment:102 in reply to: ↑ 101 Changed 6 years ago by andy

Replying to nicolas.dufresne:

Those "jazzy lines" are probably a separate issue. I've seen those by switching to QVGA or sometimes when I rotate the screen.

Yes they are a separate issue for sure. The stickiness of it, the same as WSOD, was interesting though, but now Balaji is checking whether it can be caused by something different on his side.

But I don't think this was their final conclusion. Anyway, if that patch only fixes half of the affected devices, we still have half of the device owners who can now use and test the device for a longer period of time.

Well, the patch works for me on an unaffected LCM; that's good enough for me when added to only positive feedback from customer tests of your patch on affected devices.

For power consumption, I've tested what percentage would go away overnight (~9h) and I ended the night at 87% (-13%). I guess this is pretty much the right number since another owner has sent me these exact same stats (the owner is Sebastien Hammerl, thanks to you).

Awesome, that's not a very noticeable change either. If Balaji finds the issue about LDO6 is on his side there is a chance we can even take the LCM completely down too, but this situation as it stands is a lot better than the WSOD we can't impact.

comment:103 follow-up: ↓ 104 Changed 6 years ago by TimoJyrinki

With the cleaner patch that appeared in stable-tracking, I can now play Pingus (which rotates the screen) without problems - earlier I got the whitish screen. Still though Doom does not work (which both changes to QVGA and rotates screen) - I got a corrupted screen still, though it's different than what was there before the patch. Now it's a darker screen with stripes etc, earlier it was close to pure white with various lines etc.

Please tell if there is a separate bug for these screen mode change problems I can follow.

comment:104 in reply to: ↑ 103 ; follow-up: ↓ 111 Changed 6 years ago by andy

Replying to TimoJyrinki:

With the cleaner patch that appeared in stable-tracking, I can now play Pingus (which rotates the screen) without problems - earlier I got the whitish screen.

Great.

Still though Doom does not work (which both changes to QVGA and rotates screen) - I got a corrupted screen still, though it's different than what was there before the patch. Now it's a darker screen with stripes etc, earlier it was close to pure white with various lines etc.

Please tell if there is a separate bug for these screen mode change problems I can follow.

Try #1812 if it is the same, otherwise create a new one.

comment:105 follow-up: ↓ 113 Changed 6 years ago by Seppi

Here is the kernel which I compiled for myself with the patch from 11/26/08.

Works very well for me, no WSOD since then. Nicolas is my hero :)

http://music-starvation.de/openmoko/uImage_no_deep_sleep_11262008.bin

comment:106 follow-up: ↓ 110 Changed 6 years ago by joerg

Could we please fix this ever-incrementing integer, e.g. by moving the "++" out of the "if" down to the "then" branch?
Didn't check new version, old one was hard enough for me to read.
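To illustrate the kind of change being asked for, here is a generic sketch of the two patterns. It is not taken from the jbt6k74 patch; the function names and the use of a call counter to detect the first call are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* "++" inside the condition: the counter grows on every call, and after
 * enough calls an unsigned counter wraps around to 0 again. */
int first_call_incrementing(unsigned int *calls)
{
    assert(calls != NULL);
    if ((*calls)++ == 0)
        return 1; /* treated as the first call */
    return 0;
}

/* "++" moved into the taken branch: the counter changes only once and then
 * stays at 1, so it can never wrap. */
int first_call_saturating(unsigned int *calls)
{
    assert(calls != NULL);
    if (*calls == 0) {
        (*calls)++;
        return 1; /* treated as the first call */
    }
    return 0;
}
```

Both variants report the first call identically; they differ only in what happens to the counter afterwards.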

comment:107 Changed 6 years ago by andy

If you had checked it, you'd've seen there was nothing to complain about.

comment:108 Changed 6 years ago by nelg

Hi Andy,

I would like to give your kernel a try to see if it fixed my WSOD. I've looked on http://people.openmoko.org/andy/, but am unsure if I should be using qi or u-boot, and which version, as you have a few on your page. I have a GTA02v6.

Which boot loader and kernel would you suggest?

comment:109 Changed 6 years ago by andy

Right this second this kernel

http://people.openmoko.org/andy/uImage-moredrivers-GTA02_andy-tracking_49cff03d0e867b06.bin

and you can either try U-Boot if you adjust the environment to pull > 2MByte kernel, or you can give Qi a try, it would be the one qi-s3c2442-master-*.udfu DFU'd into the U-Boot partition (you can always replace it with U-Boot image http://people.openmoko.org/andy/u-boot-gta02v5v6-stable_650149a53dbdd48b.udfu to go back).

If you go the Qi route, it tries first to boot from the first three partitions of SD Card, looking at any ext2 / 3 partition on there for /boot/uImage-GTA02.bin... if it finds it, it will try to boot it and use that partition as the rootfs.

If none of the first three SD partitions have the kernel file, it will boot from the NAND kernel partition as usual.
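The boot order described above can be sketched as a small selection function. This is a simplified model, not Qi source code: the sd_partition struct and choose_boot_source() are hypothetical, and the sketch does not cover what happens if booting the selected kernel fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QI_SD_PARTITIONS 3 /* only the first three SD partitions are examined */

/* Hypothetical summary of one SD card partition as seen by the bootloader. */
struct sd_partition {
    bool is_ext2_or_ext3;
    bool has_kernel_file; /* /boot/uImage-GTA02.bin is present */
};

/* Returns 1..3 for the SD partition to boot from (which is then also used
 * as the rootfs), or 0 to fall back to the NAND kernel partition. */
int choose_boot_source(const struct sd_partition *parts, size_t count)
{
    assert(parts != NULL || count == 0);
    for (size_t i = 0; i < count && i < QI_SD_PARTITIONS; i++) {
        if (parts[i].is_ext2_or_ext3 && parts[i].has_kernel_file)
            return (int)i + 1;
    }
    return 0;
}
```

In this model a kernel file on a fourth partition is ignored, and a card without a matching partition simply leads to the usual NAND boot.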

I added a README in the web dir to help future travellers as well.

comment:110 in reply to: ↑ 106 Changed 6 years ago by nicolas.dufresne

Replying to joerg:

Didn't check new version, old one was hard enough for me to read.

This is the exact reason there is a second patch ;) Actually I remember someone at the OLS last year saying that if you are not satisfied with a patch, you should not release it because it might go upstream "as-is". Well, sometimes you have to make this mistake to really learn.

Note also that the first version was breaking other things (like resolution switching including rotation).

comment:111 in reply to: ↑ 104 Changed 6 years ago by TimoJyrinki

Replying to andy:

Try #1812 if it is the same, otherwise create a new one.

Created a new one #2162, anyone with stable-tracking could test eg. xrandr -s 320x240 and report if having (or not having) the same problem.

comment:112 Changed 6 years ago by nelg

Testing uImage-moredrivers-GTA02_andy-tracking_49cff03d0e867b06.bin with both u-boot and qi, I no longer get a WSOD (great). However, on the 2008.9 (fdom) filesystem, after resume, I see a black screen with "apm[pid]: Suspending now". I think this is probably just an issue with the image I am using being fairly old and not getting X back up and working, so I am going to try it with openmoko-testing-om-gta02.rootfs.jffs2 from today's daily snapshots and see what happens.

I have also tried http://music-starvation.de/openmoko/uImage_no_deep_sleep_11262008.bin, which suspends and resumes pretty well (only had an issue once), but with this test image I don't have the appropriate kernel modules, so I get no sound, etc.

Thanks for all the good work. Looks like someday soon I'll be able to use my neo as my phone ;)

comment:113 in reply to: ↑ 105 Changed 6 years ago by Raphexion

Replying to Seppi:

Here is the kernel which I compiled for myself with the patch from 11/26/08.

Works very well for me, no WSOD since then. Nicolas is my hero :)

http://music-starvation.de/openmoko/uImage_no_deep_sleep_11262008.bin

I've been using the kernel now for almost two days, including overnight suspend, and my WSoD is gone! I think the battery time is "ok", especially compared to before when I was unable to put it in suspend at all.

However, sometimes I need to resume from suspend two times, because the first resume jumps back into suspend after maybe 0.5 seconds.

  1. Press power button: the phone wakes from suspend, I see a quick dmesg/boot (I don't know what it is called), then I get back my phone with the menu for about 0.5s, then the screen goes black (back into suspend).
  2. Press power button again: the phone wakes from suspend, I see a quick dmesg/boot (I don't know what it is called), then I get back my phone with the menu and I can start using my phone!

And I don't have any sound, but after reading another post I guess that is perfectly normal since I don't have the modules to match the kernel.

I also want to say thanks to all hard working people, great job and a thousand thanks!!

comment:114 follow-up: ↓ 116 Changed 6 years ago by RuiSeabra

I have sound. Try flashing with FSO M4.1 (the kernel seems to be compatible with the installed modules).

comment:115 Changed 6 years ago by andy

I believe John Lee has set the kernels (and in fact rootfs) in the Testing repo place to current stable kernel branch, which has these fixes already. So you should be OK just using that directly. He sent out this URL:

http://downloads.openmoko.org/daily/testing/

comment:116 in reply to: ↑ 114 Changed 6 years ago by Seppi

Replying to RuiSeabra:

I have sound. Try flashing with FSO M4.1 (the kernel seems to be compatible with the installed modules).

Sorry, forgot to say, it's the kernel of FSO 4.1 that I have patched.

comment:117 Changed 6 years ago by RuiSeabra

Then I thank you doubly :)

comment:118 Changed 6 years ago by nelg

I have been using the kernel from daily/testing for the last 2 days, with overnight suspends, for both the 2008.9 testing build and qt extended, the latter with kernel modules copied from the 2008.9 testing build.

Since using this, I have had no WSODs. Occasional resumes have failed, with just a black screen, but in general it's a lot better than before, and most resumes work fine, even after being suspended overnight.

comment:119 Changed 6 years ago by joerg

Results of our tests so far:
first we found two devices that show WSOD relatively frequently and reproducibly:
#51 from https://docs.openmoko.org/trac/ticket/1621
and an A7 PP model.

We verified temperature dependency, by warming up whole device (-> no WSOD),
then cooling down LCM while keeping rest of device in warmed up state -> WSOD
on first try.

We applied the no_deep_suspend patch to recent stable branch 2.6.24, and we
found (on #51) it reduces probability of WSOD but won't fix it. There are
other reports http://docs.openmoko.org/trac/ticket/2115 of WSOD not being
dependent on going to deep_suspend mode at all (and thus this patch shouldn't
be able to help there).
It seems deep_suspend can trigger WSOD very easily, but WSOD has some
mechanism other than simply something going wrong during deep_suspend or
the resume from it.

WSOD is dependent on the time the device is suspended, i.e. it seems like it
sometimes takes quite a few minutes until suspend triggers WSOD. This seems
somewhat paradoxical given the paragraph above.

We patched the JBT6K74.c driver to increase the existing mdelay() calls and
insert new ones at every reasonable point of the communication flow, and even
lowered the GLAMO-SPI clock frequency, to make the LCM quite comfortable with
any aspect of timing regarding the control communication. Result: none.
Randomness of WSOD seems unchanged.

We added printk() and created logs of a consecutive resume-ok, and a
resume-WSOD following immediately. On comparing both sequences we didn't
notice any significant difference, neither in sequence of function calls nor
in timing.

We had 2 or 3 times a complete refusal of #51 to produce WSOD. After taking
out battery for 10min it was back to normal (means 95% immediate WSOD after
20sec suspend)

We swapped LCM of #51 with the one of a known good device. Result: 40
suspend/resume, as well as placing #51 with new LCM to the fridge for 30min
and then resuming, didn't show any WSOD.
We attached #51-LCM to a known-good device, and it didn't show WSOD on 6
cycles. So obviously the issue isn't located on the LCM entirely.

We have never seen any WSOD recover on subsequent suspend/resume cycles. It
always needed a reboot to recover. *)

So far we didn't see a single WSOD on boot.

So we are wondering what's the difference between
a) switching LCM power down via LDO6, while keeping *all* lines to LCM at low
(to stop reverse powering by sneak currents, and not to violate JBT6K74
electrical specs), then power up and reset
and
b) a usual boot bringing up the LCM in a sane state
Maybe it's pure coincidence that we have never seen a WSOD on boot so far?

*) Further results:
we attached a debug board and reset the device to reboot without power-down:
WSOD recovered.

We probed for the signals on LCM-FPC by using a GTA03-debugboard (task not
completed yet): With an old image and kernel (2008.08) there was 3.2V for
powersupply and some of the datalines. We didn't find differences in probed
signals between WSOD and clear display.
We didn't see a LCM-RESET on resume though.
By messing around with probing the signals, we got a recovery from WSOD once,
but it wasn't reproducible and only *might* be connected with shorting reset
to GND.
Removing a WSODed LCM from the device during suspend, then reconnecting it, then
resuming: WSOD recovered at least on the second resume after that (the first one
was probably confused by the FPC reconnection, which is not a clean switch and
causes bounces on the lines and wrong power-up sequences).
On the first resume the LCM usually faded from white to black.

Conclusion: root cause of WSOD is some 'analog' thing depending on LCM and
device. We can not provide a good clue to nature of the issue.
By first(! Vio <= VDD) switching all glamo->lcm IO's to 0V/high-Z, then
disabling LDO6 for suspend, and on resume first powering up the LCM via LDO6
and then initializing it (incl. activating the glamo interface), we should
be able to achieve zero power consumption during suspend for the LCM, and be
able to recover/avoid WSOD.
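
The ordering proposed in this conclusion can be written down as a small userspace model. It is only a sketch of the sequence, not kernel code: the struct, its fields and the lcm_model_* functions are hypothetical, and the model assumes that the LCM loses its initialization whenever LDO6 is switched off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the LCM-related hardware; field names are illustrative. */
struct lcm_hw_model {
    bool glamo_lines_driven; /* Glamo -> LCM IOs actively driven (not 0V/high-Z) */
    bool ldo6_on;            /* LCM supply rail */
    bool lcm_initialized;    /* init sequence has run since the last power-up */
};

/* Rule from the report: IO lines must not be driven while the LCM is
 * unpowered (Vio <= VDD). */
bool lcm_model_is_safe(const struct lcm_hw_model *hw)
{
    return hw->ldo6_on || !hw->glamo_lines_driven;
}

void lcm_model_suspend(struct lcm_hw_model *hw)
{
    assert(hw != NULL);
    hw->glamo_lines_driven = false; /* step 1: all lines to 0V/high-Z */
    assert(lcm_model_is_safe(hw));
    hw->ldo6_on = false;            /* step 2: cut LCM power via LDO6 */
    hw->lcm_initialized = false;    /* model assumption: state is lost with power */
    assert(lcm_model_is_safe(hw));
}

void lcm_model_resume(struct lcm_hw_model *hw)
{
    assert(hw != NULL);
    hw->ldo6_on = true;             /* step 1: power up via LDO6 */
    assert(lcm_model_is_safe(hw));
    hw->glamo_lines_driven = true;  /* step 2: activate the Glamo interface */
    hw->lcm_initialized = true;     /*         and run the init sequence */
    assert(lcm_model_is_safe(hw));
}
```

Swapping the two steps in lcm_model_suspend(), or driving the lines before LDO6 is up in lcm_model_resume(), would trip the safety assertion in this model, which is the ordering constraint the conclusion emphasizes.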

As Andy is much more savvy in meddling with kernel space, and the LDO6
switch-off is announced by him anyway, we didn't try to implement this plus
the needed glamo-lines-pulldown here in TPE.

jOERG

comment:120 Changed 6 years ago by andy

There are other reports http://docs.openmoko.org/trac/ticket/2115 of
WSOD not being dependent on going to deep_suspend mode at all
(and thus this patch shouldn't be able to help there).

No, Harald's patches did in fact push the LCM into deep suspend on framebuffer blanking:

http://git.openmoko.org/?p=kernel.git;a=commitdiff;h=f9e5eb98527feda1937f87d93a5b6dd6322baf83;hp=0e7b63e010904140cc39fe98a5d8abd3de7e4f0f

+ case FB_BLANK_POWERDOWN:
+ jbt6k74_enter_state(jbt, JBT_STATE_DEEP_STANDBY);
+ break;

and the report is from before Nicolas' patch, but after Harald's stuff, so that bug is not evidence of WSOD without deep sleep on jbt6k74, I think you'll find. It is expected that Nicolas' patch solves that bug report too.

While there are no further reports of WSOD from several people running just Nicolas' fix, and they definitely had devices that were prone to it, I would suspect a WSOD you observed can be something else, maybe related to stable branch suspend / resume.

For example if stable branch resume fails with backlight up but Glamo / LCM resume not having happened for unrelated reasons (since stable suspend / resume is known to be busted by potential races). But I didn't read yet about "occasional" or "rare" WSOD from testers of Nicolas' patch, but that it was completely gone.

If you're interested to go further into that, an idea would be not to use suspend but to reduce the framebuffer blanking timeout to a few seconds and repeatedly tap the touchscreen to keep bringing it back and see if you ever get a WSOD (using Nicolas' patches).

On andy-tracking we already hard-reset the Glamo on resume, and I never ever saw a sticky WSOD by using reset button on debug board: in both cases there is no power cycling just hard reset. So I don't think zeroing the Glamo LCM bus is involved in WSOD avoidance, but it can make sense on leakage grounds during suspend.

comment:121 Changed 5 years ago by nicolas.dufresne

I'm surprised this one is still an open bug. I guess some people may be interested in the following patch: http://lists.openmoko.org/pipermail/openmoko-kernel/2009-February/009115.html

comment:122 Changed 4 years ago by joerg

  • Status changed from new to closed
  • Resolution set to fixed