Ticket #567 (closed defect: fixed)
I/O errors on flash after heavy flash access
| Reported by: | elrond+bugzilla.openmoko.org@… | Owned by: | michael@… |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | kernel | Version: | unspecified |
| Severity: | major | Keywords: | |
| Cc: | buglog@…, werner@… | Blocked By: | |
| Blocking: | | Estimated Completion (week): | |
| HasPatchForReview: | | PatchReviewResult: | |
| Reproducible: | | | |
Description
Hardware: GTA01Bv03 (phase 0)
Kernel: Linux version 2.6.20.2-moko8 (stefan@fairlight) (gcc version 4.1.1) #56
PREEMPT Sun Apr 8 13:58:03 CEST 2007
u-boot: I think 1.2.0-moko7 (getting to the u-boot prompt isn't strictly easy)
The whole thing started when I downloaded openmoko-theme-standard.ipk to tmpfs
and tried to install it using ipkg install *.ipk. I got a few Input/Output
errors on files and the ipkg command hung.
Logging in via ssh still worked, and all commands that were still in RAM
worked fine. But any command that accessed jffs2 locked up.
I could grab dmesg output, which contains a kernel oops (will be attached to
this bug as write-oops.dmesg).
/sbin/halt.sysvinit (for "halt -p -d -i -p") was not in RAM, so I copied a
fresh one via scp to tmpfs, but some other needed stuff was not available
either. I had to hard power off the machine by removing the battery (pressing
the power button for half a minute did not work).
The next day, the machine seemed to boot fine, until the fs gave me I/O
errors again (this time I did not try to write to it explicitly). This time I
have a bigger dmesg, which goes back to the boot (will be attached to this bug
as boot_and_jffs2notices.dmesg).
Again I had to remove the battery, as I still haven't found the node in /sys
that will shut down the phone IMMEDIATELY when echoing something into it.
Attachments
Change History
Changed 6 years ago by elrond+bugzilla.openmoko.org@…
- Attachment write-oops.dmesg added
Changed 6 years ago by elrond+bugzilla.openmoko.org@…
- Attachment boot_and_jffs2notices.dmesg added
comment:1 Changed 6 years ago by laforge@…
I think this might be similar to #245. We've had a memory initialization
bug that resulted in memory corruption as soon as the kernel used the upper 64MB
of RAM. That memory corruption can obviously result in filesystem corruption as
soon as pages get written out to disk.
I'd recommend installing a much more up-to-date u-boot, kernel and rootfs image
and testing again.
comment:2 Changed 6 years ago by elrond+bugzilla.openmoko.org@…
http://wiki.openmoko.org/wiki/ChangeLog#2007-03-11 says that 1.2.0-moko6
fixes this problem, and this phone already has moko7.
I don't have a debug board, so if you still recommend updating u-boot I'd kindly
ask you to name a known-to-be-working-on-P0 version of u-boot and its SHA1 (so I
can verify it after downloading).
comment:3 Changed 6 years ago by elrond+bugzilla.openmoko.org@…
- Priority changed from high to low
Okay, I have a new u-boot (rev2040) on it.
Still the old kernel, because the new prebuilt kernels have a backlight bug
(new bug report on its way).
I have reduced the Priority to P4, because this is now a "wait for it to happen
again".
I'll leave the bug open for a while (two weeks to a month), so people can look
at the Oops and maybe make the kernel more stable in this area.
comment:4 Changed 6 years ago by elrond+bugzilla.openmoko.org@…
I hit this issue again a few days ago.
I have now updated the kernel (2.6.21.3, rev 2xxx; I'll post the precise
revision if needed). I'll see if this is now better.
I have some ideas on how to reproduce it, so I hope to either give this bug more
info or close it on my own within a few weeks.
comment:5 Changed 6 years ago by elrond+bugzilla.openmoko.org@…
- Priority changed from low to normal
Okay, I was just able to reproduce it:
u-boot rev2040
kernel rev2118
phone$ dd if=/dev/mtdblock2 of=/tmp/kernel.bin
This command alone seems to be enough to trigger the problem for me.
(Just tried it four times in a row.)
comment:6 Changed 6 years ago by elrond+bugzilla.openmoko.org@…
- Priority changed from normal to high
"XorA on #openmoko" reproduced this just a few minutes ago on
Model: GTA01Bv04
Kernel: uImage-2.6.21.6-moko10-r1_0_0_2360
I'm raising Priority back to the default P2, because this problem is relevant.
comment:7 Changed 6 years ago by laforge@…
- Status changed from new to closed
- Resolution set to fixed
I think this might be related to the bug we had (#419) that didn't fully erase
the rootfs when flashing a new rootfs. This could introduce all kinds of
inconsistencies into the JFFS2 file system.
I therefore recommend trying this with a new u-boot version, and using that new
version to install a new rootfs image. I'm confident the problem will disappear
at that point.
Please re-open if it still occurs.
comment:8 Changed 6 years ago by elrond+bugzilla.openmoko.org@…
- Status changed from closed to reopened
- Resolution fixed deleted
- Summary changed from I/O errors on flash and frozen fs to I/O errors on flash after heavy flash access
Okay, next reproduction:
1) u-boot 2040
2) upload kernel via DFU
3) nand erase rootfs
4) Upload rootfs via DFU
boot
5) dd if=/dev/mtd2 of=/tmp/kernel.bin
6) ls -la /usr/bin
(gives I/O errors)
kernel and rootfs are from http://people.openmoko.org/roh/
- uImage-2.6.22.5-moko11+svnr2937-r2-fic-gta01.bin
- OpenMoko-openmoko-devel-image-glibc-ipk-P1-September-Snapshot-20070919-fic-gta01.rootfs.jffs2
I have retitled the Bug to better describe the problem.
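Steps 5 and 6 above (read a whole MTD device, then list a directory on the flash filesystem) can be wrapped in a small script so the check is easy to repeat. This is only a sketch under assumptions: the function name `repro_flash_io` and its arguments are made up for illustration and are not part of any OpenMoko tool, and it assumes the failure shows up either as a non-zero exit status from `ls` or as an "Input/output error" message.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): read a device with dd, then list
# a directory and look for I/O errors in the result.
# Usage: repro_flash_io DEVICE DIRECTORY [OUTPUT_FILE]
repro_flash_io() {
    dev=$1
    dir=$2
    # Default output path taken from step 5 of the report.
    out_file=${3:-/tmp/kernel.bin}

    # Step 5: read the whole device into a file.
    dd if="$dev" of="$out_file" 2>/dev/null || {
        echo "dd failed on $dev"
        return 2
    }

    # Step 6: touch the filesystem; capture stdout and stderr together.
    listing=$(ls -la "$dir" 2>&1)
    status=$?

    if [ "$status" -ne 0 ] ||
       printf '%s\n' "$listing" | grep -qi 'input/output error'; then
        echo "I/O errors while listing $dir"
        return 1
    fi
    echo "no I/O errors while listing $dir"
    return 0
}
```

On the phone this would be called as `repro_flash_io /dev/mtd2 /usr/bin`; a return value of 1 corresponds to the failure described in step 6.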
comment:9 Changed 6 years ago by willie_chen@…
- Status changed from reopened to new
- Owner changed from laforge@… to michael@…
comment:10 Changed 5 years ago by werner@…
- Status changed from new to closed
- Cc werner@… added
- Resolution set to fixed
If CONFIG_MTD_NAND_S3C2410_CLKSTOP is set, opening an MTD device with
open(2) will cause all hell to break loose. See also:
http://lists.infradead.org/pipermail/linux-mtd/2007-July/019010.html
This should be fixed by now in the OE kernel configuration (revision
0238eff8862126ac83c3f05d7a6fb094feff89e9, say my files).
This explains #7 and #10. Not sure if #1 was also caused by this or not,
but I give it the benefit of the doubt, and close the bug :-)
Please reopen if there are more gremlins lurking.
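To see whether a given kernel build is affected, one can check its configuration for the option named above. The following is a minimal sketch: the function name `clkstop_enabled` is hypothetical, and it assumes a plain-text kernel config file (for example the `.config` used for the build) is at hand.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): succeed if
# CONFIG_MTD_NAND_S3C2410_CLKSTOP is enabled in the given config file.
# Usage: clkstop_enabled CONFIG_FILE
clkstop_enabled() {
    # An enabled option appears as "CONFIG_...=y"; a disabled one appears
    # as "# CONFIG_... is not set" or is missing from the file entirely.
    grep -q '^CONFIG_MTD_NAND_S3C2410_CLKSTOP=y' "$1"
}
```

A call such as `clkstop_enabled .config && echo "CLKSTOP is enabled"` prints a warning for an affected configuration. If the running kernel was built with CONFIG_IKCONFIG_PROC, its configuration can be extracted with `zcat /proc/config.gz`; whether the kernels discussed in this ticket enable that is not stated here.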
