Ticket #1802 (closed defect: fixed)

Opened 6 years ago

Last modified 4 years ago

Suspend/resume corrupts SD card's partition table

Reported by: montgoss Owned by: openmoko-kernel
Priority: normal Milestone:
Component: kernel Version:
Severity: critical Keywords:
Cc: Blocked By:
Blocking: Estimated Completion (week):
HasPatchForReview: no PatchReviewResult:
Reproducible:

Description

As the title suggests, suspend/resume corrupts the partition table of my SD card every time. May or may not be related to ticket #1743, as changing the SD clock at least partially affects this behavior.

Here's the last part of the dmesg output after a resume that corrupted the partition table:

modem wakeup interrupt
s3c2440-i2c s3c2440-i2c: slave address 0x10
s3c2440-i2c s3c2440-i2c: bus frequency set to 390 KHz
gta02_udc_command(1)
s3c2440-nand s3c2440-nand: Tacls=3, 30ns Twrph0=7 70ns, Twrph1=3 30ns
not changing prescaler of PWM 3, since it's shared with timer4 (clock tick)
timer_usec_ticks = 7864
timer tcon=00599109, tcnt a2c1, tcfg 00000200,00002000, usec 00001eb8
mmc_set_power(power_mode=1, vdd=20
SD power -> 3200mV
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 0kHz div=255 (req: 0kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: Error after cmd: 0x8120
usb0: full speed config #1: 500 mA, Ethernet Gadget, using CDC Ethernet
mmc0: card d555 removed
MMC: killing requests for dead queue
mmc_set_power(power_mode=0, vdd=0
glamo-mci glamo-mci.0: glamo_mci_set_ios: power down.
mmc_set_power(power_mode=1, vdd=20
SD power -> 3200mV
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 0kHz div=255 (req: 0kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 20) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: Error after cmd: 0x8120
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: Error after cmd: 0x8120
glamo-mci glamo-mci.0: Error after cmd: 0x120
glamo-mci glamo-mci.0: Error after cmd: 0x8120
mmc_set_power(power_mode=2, vdd=15
SD power -> 2700mV
soc-audio soc-audio: scheduling resume work
PM: Finishing wakeup.
Restarting tasks ... <6>soc-audio soc-audio: starting resume work
done.
soc-audio soc-audio: resume work completed
glamo-mci glamo-mci.0: powered (vdd = 15) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 15) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 15) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 15) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 15) clk: 195kHz div=255 (req: 195kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 15) clk: 10000kHz div=4 (req: 10000kHz). Bus width=0
glamo-mci glamo-mci.0: powered (vdd = 15) clk: 10000kHz div=4 (req: 10000kHz). Bus width=2
mmc0: new high speed SDHC card at address d555
mmcblk0: mmc0:d555 SU08G 7977472KiB
 mmcblk0: unknown partition table
EXT3-fs error (device mmcblk0): ext3_check_descriptors: Block bitmap for group 0 not in group (block 3802999490)!
EXT3-fs: group descriptors corrupted!
mapped channel 10 to 2
Alignment trap: phone-kit (1962) PC=0x0001214c Instr=0xe5970008 Address=0xbefdf4bf FSR 0x013

Change History

comment:1 Changed 6 years ago by montgoss

I just noticed something. The reason for (or method of) the resume seems to affect this issue. While debugging, I added scripts to /etc/apm/suspend.d/ and /etc/apm/resume.d/ that unmount and remount my partitions. This didn't seem to make any difference. However, I'm currently in an area with no cell signal, so the phone isn't woken up by those pesky cell broadcast messages. My Neo now wakes up only when I press the power button. When it resumes that way, combined with the unmount/remount hack, the partition table is not affected.

comment:2 Changed 6 years ago by andy

Mike Westerhof sent in a patch earlier that impacts GSM suspend / resume behaviour and might be connected with this.

comment:3 Changed 6 years ago by polarbaer

I have the same problem with an 8GB SanDisk card, even when 2008.8 is installed on the micro-SD. Since I have Qtopia in the flash (to have a stable phone), I backed up the partition table via
dd if=/dev/mmcblk0 of=/home/root/backup.img bs=512 count=1
and put the restore command
dd of=/dev/mmcblk0 if=/home/root/backup.img bs=512 count=1
into a boot-up script in Qtopia, in my case at the top of:
/etc/init.d/mountall.sh
so it fixes itself once I boot my Qtopia.

Not a fix, but at least a very bloody workaround.
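
A restore like the one above writes whatever is in backup.img straight over sector 0, so it may be worth checking the image first. Here is a minimal sketch, assuming the backup is a classic MBR whose last two bytes are the boot signature 0x55 0xAA. The function names mbr_signature_ok and restore_partition_table are hypothetical, and od with the -An -tx1 options is assumed to be available on the device.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if bytes 510-511 of the image
# are the MBR boot signature 0x55 0xAA.
mbr_signature_ok() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Hypothetical wrapper around the restore command from this comment:
# copy the first sector of the image to the device only if the
# image looks like a valid MBR.
restore_partition_table() {
    img=$1
    dev=$2
    if mbr_signature_ok "$img"; then
        dd if="$img" of="$dev" bs=512 count=1 conv=notrunc 2>/dev/null
    else
        echo "refusing to restore: $img has no MBR signature" >&2
        return 1
    fi
}
```

In the boot script this would be called as: restore_partition_table /home/root/backup.img /dev/mmcblk0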

comment:4 Changed 6 years ago by simat

A workaround for this problem that seems to work is to turn on the SD clock before the suspend.

See the threads 'suspend/resume and Debian on SD card' and 'Workaround for suspend/resume SD card problems' on the Openmoko Support and Community lists.

I think the problem may be caused by the Glamo chip not generating enough clock cycles before or after certain commands are issued, so the SD card cannot execute them properly.

I wrote a driver for SD cards in Forth and assembler on 68S12 micros some time ago and found this to be important.

Simon

comment:5 Changed 6 years ago by Sander

Here's a solution that seems to work for everyone (?)

/etc/apm/suspend.d/00sd_idleclk

#!/bin/sh

echo 1 > /sys/module/glamo_mci/parameters/sd_idleclk

touch /home/root/.profile
sync;sync;sync

/etc/apm/resume.d/00sd_idleclk

#!/bin/sh

echo 0 > /sys/module/glamo_mci/parameters/sd_idleclk
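
Both scripts write to the same module parameter, so they could share one small function. This is only a sketch: set_sd_idleclk is a hypothetical name, and the optional second argument exists purely so the function can be tried against an ordinary file instead of sysfs. The default path is the one used in the scripts above.

```shell
#!/bin/sh
# Hypothetical helper: write 0 or 1 to the glamo_mci sd_idleclk parameter.
# $1 = value (0 or 1), $2 = optional path override (for trying it out).
set_sd_idleclk() {
    value=$1
    param=${2:-/sys/module/glamo_mci/parameters/sd_idleclk}
    case "$value" in
        0|1) ;;
        *) echo "sd_idleclk must be 0 or 1" >&2; return 2 ;;
    esac
    if [ -w "$param" ]; then
        echo "$value" > "$param"
    else
        # Parameter missing or not writable: do nothing rather than fail noisily.
        echo "cannot write $param, skipping" >&2
        return 1
    fi
}
```

The suspend script would then call set_sd_idleclk 1 and the resume script set_sd_idleclk 0.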

comment:6 Changed 6 years ago by Sander

Ah just patched in the stable kernel tree I see

http://git.openmoko.org/?p=kernel.git;a=commit;h=ca19d156400f817960efe0d14680324b2ea34171

thanks Andy

comment:7 Changed 6 years ago by Sander

I just tested Andy's patch and it's looking good.

Did a couple of suspend/resume cycles in FSO and the SD card still has its partitions.
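
For anyone repeating this kind of test, the kernel log line from the ticket description can serve as a quick check after each resume. A rough sketch, assuming the card is mmcblk0 and that the kernel prints the same "unknown partition table" message as in the original report; partition_table_lost is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: reads kernel log text on stdin and succeeds
# if it contains the message seen when the partition table was lost.
partition_table_lost() {
    grep -q 'mmcblk0: unknown partition table'
}
```

After a resume it could be run as: dmesg | partition_table_lost && echo "partition table is gone again"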

comment:8 Changed 6 years ago by simat

I had a quick look at Andy's fix, which from what I can see just turns the SD clock on at suspend and back off on resume. This may work, but I am not sure it is the proper fix. The Simplified SD Spec states that at power up:

"The host shall supply power to the card so that the voltage is reached to Vdd_min within 250ms and start to supply at least 74 SD clocks to the SD card with keeping CMD line to high. In case of SPI mode, CS shall be held to high during 74 clock cycles."

I just wonder if this is being done or not?

I don't have the SDHC spec, so I am not sure if it is the same for SDHC. If someone could point me to a copy of the SDHC spec, I will check it out.

Simon

comment:9 Changed 6 years ago by andy

The glamo driver is aware of this: in the I/O config callback it exports to the mmc layer, it figures out whether we are applying power and, if so, sets the clock going and waits "1ms", which typically means 50ms.

The mci layer should take care of powering us up and down logically, but in 2.6.24 the PMU in fact gets to suspend first and pulls the power to the SD card before the mci stack does the deed. Still, it should get called later in suspend and do the wait-with-clock-up thing.

The MCI stuff feels racy anyway: there is an async mmc_rescan() thread going on behind all this on resume. My guess is the trouble ultimately comes from a race in this area in the mainline mci stuff.

static void glamo_mci_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
{...

	host->real_rate = glamo_mci_set_card_clock(host, ios->clock, &div);
	host->clk_div = div;

	/* after power-up, we are meant to give it >= 74 clocks so it can
	 * initialize itself. Doubt any modern cards need it but anyway... */
	if (powering)
		msleep(1);

...
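
As a rough sanity check on the numbers in this thread: the dmesg output in the description shows the card clock at 195kHz during power-up, and the spec quote asks for at least 74 clocks. The sketch below just does that arithmetic. clocks_to_usec is a hypothetical helper, and it assumes the clock really is running for the whole wait.

```shell
#!/bin/sh
# Hypothetical helper: microseconds needed for $1 clock cycles
# at $2 kHz, rounded up to the next whole microsecond.
clocks_to_usec() {
    clocks=$1
    khz=$2
    echo $(( (clocks * 1000 + khz - 1) / khz ))
}

# 74 clocks at the 195kHz seen in the log, and at the later 10000kHz rate.
clocks_to_usec 74 195
clocks_to_usec 74 10000
```

At 195kHz this comes to 380 microseconds, so a 1ms sleep with the clock running would cover the 74 clocks with room to spare. That says nothing about whether the clock is actually on at that point, which is the open question here.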

comment:10 Changed 6 years ago by andy

  • Status changed from new to closed
  • HasPatchForReview unset
  • Resolution set to fixed

comment:11 Changed 4 years ago by raz71abb6

ya sem
