Ticket #2180 (new defect)
stable-tracking: 'rxserr' UART messages
| Reported by: | laforge | Owned by: | openmoko-kernel |
|---|---|---|---|
| Priority: | high | Milestone: | FSO |
| Component: | kernel | Version: | |
| Severity: | major | Keywords: | gps s3x24xx_serial rxerr |
| Cc: | testing@… | Blocked By: | |
| Blocking: | | Estimated Completion (week): | |
| HasPatchForReview: | no | PatchReviewResult: | |
| Reproducible: | | | |
Description
the stable-tracking kernel shows a number of receive error messages on the UART for GPS:
[ 4559.800000] rxerr: port ch=0x39, rxs=0x00000001
[ 4562.285000] rxerr: port ch=0x00, rxs=0x0000000c
Those perceived receive errors lead to missing characters in the u-blox binary protocol, which corrupts the checksum and causes all kinds of havoc in higher layers.
This bug did not occur with older kernel versions, though it is unclear which version was the last working one.
The same problem also exists with non-gta02 hardware (e.g. the E-TEN glofiish devices), so it is believed to be a bug in the s3c24xx serial driver.
It can be reproduced _always_ during power-up of the GPS device. However, it also occurs sporadically later during data transmission.
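To illustrate why a single missing or altered character is enough to make a u-blox binary (UBX) frame unusable, here is a minimal sketch of the UBX checksum check. It assumes the standard UBX framing (sync bytes 0xB5 0x62, class, ID, little-endian 16-bit length, payload, then the two checksum bytes CK_A and CK_B computed over class through payload). The function names and the sample payload are made up for illustration and are not taken from any driver or daemon mentioned in this ticket.

```python
def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, id, length and payload."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes([ck_a, ck_b])


def ubx_frame(msg_class: int, msg_id: int, payload: bytes) -> bytes:
    """Build a complete UBX frame (hypothetical helper for the example)."""
    body = bytes([msg_class, msg_id]) + len(payload).to_bytes(2, "little") + payload
    return b"\xb5\x62" + body + ubx_checksum(body)


def ubx_frame_ok(frame: bytes) -> bool:
    """Return True only if sync bytes, length field and checksum all agree."""
    if len(frame) < 8 or frame[:2] != b"\xb5\x62":
        return False
    length = int.from_bytes(frame[4:6], "little")
    if len(frame) != 6 + length + 2:
        return False
    return ubx_checksum(frame[2:-2]) == frame[-2:]


good = ubx_frame(0x01, 0x02, bytes(range(28)))   # arbitrary sample payload
dropped = good[:10] + good[11:]                  # one character lost on the UART
zeroed = good[:10] + b"\x00" + good[11:]         # one character replaced by 0x00

print(ubx_frame_ok(good), ubx_frame_ok(dropped), ubx_frame_ok(zeroed))
```

A receiver that validates frames this way has to discard the whole message after one lost character, which is consistent with the havoc described above.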
Attachments
Change History
comment:2 Changed 4 years ago by andy
Understood...
Jan 15 11:40:15 debian-gta02 kernel: rxerr: port ch=0x2c, rxs=0x00000001
is just before the blowup
got 552 bytes from: '061412",145,"00570061006C007400650072002000260020004D00610072006700720069007400680020002F0048"\r\n+CPBR: 82,\x00"41794336783",145,"00570061006C00740065007200200026002000400680020002F004D"\r\n+CPBR:
where it seems a \x00 byte appeared
When I look at what the rxs thing is printing, it is the error status register; the bit that is set (bit 0, value 0x1) suggests "overrun". What exactly "overrun" means on a FIFO UART as a character error indication I dunno yet.
comment:3 Changed 4 years ago by andy
It does seem to refer to FIFO rather than character status
Generated when it gets to the top of the receive FIFO without reading out data in it (overrun error).
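For reading the rxs values in these logs, a small decoder can help. This is only a sketch: the bit assignment (bit 0 overrun, bit 1 parity, bit 2 frame, bit 3 break) is an assumption based on the S3C24xx UART error status definitions in the Linux kernel headers, the parity bit may not exist on every SoC variant, and the function names are made up for the example.

```python
import re

# Assumed UERSTAT bit layout; treat as a reading aid, not a datasheet quote.
UERSTAT_BITS = {
    0x1: "overrun",
    0x2: "parity",
    0x4: "frame",
    0x8: "break",
}

# Matches both the original message ("port ch=..") and the variant with
# the port number added later in this ticket ("port=1 ch=..").
RXERR = re.compile(
    r"rxerr: port(?:=(?P<port>\d+))? ch=0x(?P<ch>[0-9a-f]{2}), "
    r"rxs=0x(?P<rxs>[0-9a-f]+)"
)


def decode_rxs(rxs: int) -> list:
    """Return the names of all error bits set in an rxs value."""
    return [name for mask, name in UERSTAT_BITS.items() if rxs & mask]


def parse_rxerr(line: str):
    """Return (port or None, received character, error names) or None."""
    match = RXERR.search(line)
    if match is None:
        return None
    port = int(match["port"]) if match["port"] is not None else None
    return port, int(match["ch"], 16), decode_rxs(int(match["rxs"], 16))


print(parse_rxerr("[ 4559.800000] rxerr: port ch=0x39, rxs=0x00000001"))
print(parse_rxerr("[ 4562.285000] rxerr: port ch=0x00, rxs=0x0000000c"))
```

Under that assumed layout, the two lines from the description decode to an overrun and to a frame error combined with a break condition, respectively.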
So this would seem to boil down to a problem of long interrupt latency caused completely elsewhere.
comment:5 Changed 4 years ago by andy
I was thinking about this on and off the last days.
Harald, what drivers do you have up on your non-GTAxx device? I guess we can rule out Glamo for a start, so Glamo MMC. What else can we know that it isn't then? We can also rule out SDIO and AR6000 WLAN? You probably don't have FIQ / HDQ up? USB / Ethernet over USB is up?
I had two ideas about mapping latency, first was to set all IRQs as FIQ, log the request time immediately, and if necessary do some magic about also having the IRQ dealt with as normal IRQ. In the normal IRQ, we add code to also log the time it was finally serviced. This would give 100% view of genuine latencies incorporating priority blocking relationship.
Since that's a whole project in itself, I had another idea to make some macros and apply them to ISR entry and exit, maybe at platform, to simply log IRQ duration without capturing the relationship between that and other interrupts.
But neither is simple as the ARM920T does not have a CPU cycle counter register, so we need to study timers and debug that, etc.
comment:6 Changed 4 years ago by werner
Here's a patch for measuring the time during which interrupts are
globally disabled:
http://svn.openmoko.org/developers/werner/wlan-spi/patches-tracking/find-irq-blockers.patch
Another source of interrupt latency can be the time during which a
specific interrupt is disabled. The execution time of an interrupt
also (indirectly) shows up in this, but it's better to measure it
directly.
- Werner
comment:7 Changed 4 years ago by laforge
andy: with regard to non-GTA devices: yes, there is no glamo, there is no SDIO or AR6000. There also is no FIQ/HDQ. And yes, usb/ethernet is up and running.
Apart from that, the s3c2410fb is used, like in GTA01. The NAND flash is used. The SD-card is used (s3cmci, like GTA01). I have a couple of my glofiish specific drivers, but none of them
are actually loaded when this bug occurs. There is no particular system activity, no daemons
running, almost zero CPU consumption.
comment:8 Changed 4 years ago by andy
I looked at /dev/ttySAC0 with stty on GTA02 / FSO and the crtscts flag is not set on it.
Quite aside from interrupt latencies, it should actually be impossible to get RX FIFO overrun as seen by the RX UART anyway, when the UART is dealing with the handshakes in hardware and the other side is honouring them. It won't help with GPS case but that's of a different level of symptom if we trash the occasional RX sentence compared to destroying the GSM mux.
I confirmed that if we set it, with stty -F /dev/ttyUSB0 crtscts, then the appropriate bit in the UART regs gets set to enable hardware management of RTS.
What's the crtscts state on the glofiish ttySAC?
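The same crtscts check can be done programmatically instead of reading stty output. A minimal sketch using Python's termios module; the device path in the usage note is the one discussed above, and the helper names are made up for the example.

```python
import os
import termios


def has_hw_flow_control(cflag: int) -> bool:
    """True if the CRTSCTS bit is set in a termios c_cflag value."""
    return bool(cflag & termios.CRTSCTS)


def port_uses_crtscts(path: str) -> bool:
    """Open a serial device read-only and report its CRTSCTS setting."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        # tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
        cflag = termios.tcgetattr(fd)[2]
    finally:
        os.close(fd)
    return has_hw_flow_control(cflag)
```

On the device, calling port_uses_crtscts("/dev/ttySAC0") should agree with whether stty lists crtscts or -crtscts for that port. Note that this only reports the tty setting; it says nothing about whether the other end honours the handshake.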
comment:9 Changed 4 years ago by Sascha
I'm running debian with kernel 2.6.29 stable 821e9fa664049b1e5e97a00f6eeed3b72b67c1ba on a GTA02v5 and I get lots of these error messages for GPS and GSM.
For the attached syslog I've removed the trace in iblock.c (it doesn't work for me) and
I've added the port number to the rxerr messages:
Feb 8 00:51:44 gta02 kernel: [ 293.080000] rxerr: port=1 ch=0x45, rxs=0x00000001
...
Feb 8 00:52:02 gta02 kernel: [ 310.250000] rxerr: port=0 ch=0x7e, rxs=0x00000001
Feb 8 00:52:02 gta02 /usr/sbin/gsm0710muxd[1598]: gsm0710muxd.c:1168:gsm0710_advanced_buffer_get_frame(): Dropping frame: FCS doesn't match
Oh, and hardware handshake is enabled:
$ stty -F /dev/ttySAC0 -a
speed 115200 baud; rows 0; columns 0; line = 0;
intr = <undef>; quit = <undef>; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = <undef>; stop = <undef>; susp = <undef>; rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O; min = 0; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread clocal crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff -iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke
comment:10 Changed 4 years ago by andy
Thanks for the report and dumps Sascha.
Feb 8 00:53:16 gta02 kernel: [ 384.800000] rxerr: port=1 ch=0xb5, rxs=0x00000001
Feb 8 00:53:16 gta02 kernel: [ 384.845000] interrupts were disabled for 596 us !
Feb 8 00:53:17 gta02 kernel: [ 385.300000] rxerr: port=1 ch=0xb5, rxs=0x00000001
Feb 8 00:53:17 gta02 kernel: [ 385.345000] rxerr: port=1 ch=0x00, rxs=0x00000001
Feb 8 00:53:21 gta02 kernel: [ 389.825000] interrupts were disabled for 599 us !
Feb 8 00:53:26 gta02 kernel: [ 394.925000] rxerr: port=1 ch=0x7a, rxs=0x00000001
Feb 8 00:53:26 gta02 kernel: [ 394.935000] interrupts were disabled for 593 us !
Feb 8 00:53:31 gta02 kernel: [ 399.970000] rxerr: port=1 ch=0x0d, rxs=0x00000001
Feb 8 00:53:31 gta02 kernel: [ 399.970000] interrupts were disabled for 596 us !
Feb 8 00:53:36 gta02 kernel: [ 404.980000] rxerr: port=0 ch=0x7e, rxs=0x00000001
Feb 8 00:53:36 gta02 /usr/sbin/gsm0710muxd[1598]: gsm0710muxd.c:1168:gsm0710_advanced_buffer_get_frame(): Dropping frame: FCS doesn't match
Well whatever else, error with b0 set (overrun) on ch0 with crtscts on is actually illegal, unless I miss the point somewhere. There shouldn't be a way to get an overrun seen by the RX FIFO under those circumstances.
Either the other end (GSM) is not set to use handshakes, the timing of using them is wrong, or the detail of the error report from the serial code is bogus somehow.
The interrupt lockout period doesn't exceed 8ms anyway and doesn't correlate with the error presence. There can still be (and probably is) trouble somewhere in terms of locking out serial interrupts by priority, so some other interrupt like USB is blocking lower priority serial service.
I think maybe we learn something if we study the damage done to the received serial stream sequence by one of these events... if we can figure out how many chars are dropped or what corruption is happening.
comment:11 Changed 4 years ago by Sascha
I think hardware handshaking works fine for GSM (no bytes get lost), but the overrun flag is set anyway. This results in one additional 0x00 byte for each overrun.
With the attached patch the serial driver doesn't forward the overruns; it has been working without problems for an hour now (of course I still get the rxerr messages).
So I guess the uart is not properly initialized?
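The "Dropping frame: FCS doesn't match" messages from gsm0710muxd quoted earlier are the multiplexer's frame check failing. A minimal sketch of that kind of check, assuming the FCS defined for the GSM 07.10 / 3GPP TS 27.010 multiplexer (8-bit CRC with polynomial x^8 + x^2 + x + 1, bit-reflected, initial value 0xFF, transmitted as the ones' complement). The helper names and the two example header bytes are illustrative and are not taken from gsm0710muxd.

```python
def _build_table():
    """CRC-8 lookup table for the reflected polynomial 0xE0."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ 0xE0 if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = _build_table()
GOOD_RESIDUE = 0xCF  # register value after running over covered bytes plus FCS


def _crc(data: bytes, crc: int = 0xFF) -> int:
    for byte in data:
        crc = CRC_TABLE[crc ^ byte]
    return crc


def fcs(covered: bytes) -> int:
    """FCS the sender appends for the bytes covered by the check."""
    return 0xFF - _crc(covered)


def fcs_ok(covered: bytes, received_fcs: int) -> bool:
    """Receiver-side check: covered bytes followed by the FCS."""
    return _crc(bytes(covered) + bytes([received_fcs])) == GOOD_RESIDUE


header = bytes([0x07, 0xEF])            # illustrative address and control bytes
sent_fcs = fcs(header)
with_stray_zero = header[:1] + b"\x00" + header[1:]

print(fcs_ok(header, sent_fcs), fcs_ok(with_stray_zero, sent_fcs))
```

A stray 0x00 that lands in the bytes covered by the FCS makes the check fail, so the frame is dropped. Which bytes are covered depends on the frame type and mux mode, so an extra byte elsewhere in the frame may instead show up as a framing or length problem.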
comment:12 Changed 4 years ago by Sascha
Please ignore the previously posted log files. There was a bug in the iblock code.
I've attached a new log file...
comment:13 follow-up: ↓ 15 Changed 4 years ago by andy
With the new fixes and debug information you added Sascha, it seems that there is some correlation between (long) activity on IRQ 21 (SDI) and what's now a spew of the errors.
For GTA02 I guess that's WLAN driver, what's the status of your WLAN device when you are running these tests?
Does ignore_s3c2410_serial_overruns.patch mean that we are able to survive all of these claimed errors OK now?
comment:14 Changed 4 years ago by andy
Thanks a lot for the patches. I sent them on to andy-tracking, together with a little patch on top to make the new handling time detector stuff sensitive to whether the blocker config is enabled.
comment:15 in reply to: ↑ 13 Changed 4 years ago by Sascha
> For GTA02 I guess that's WLAN driver, what's the status of your WLAN device when you are running these tests?
Module not loaded or loaded, but not used:
$ cat /proc/interrupts
CPU0
16: 1 s3c-ext0 lis302dl
17: 4 s3c-ext0 modem
30: 404740 s3c S3C2410 Timer Tick
33: 0 s3c s3c-mci
37: 400955 s3c s3c-mci
41: 97214 s3c s3c2410_udc
42: 36 s3c ohci_hcd:usb1
43: 1834 s3c s3c2440-i2c
48: 1 s3c-ext Neo1973 Headphone jack
49: 0 s3c-ext ar6000
50: 1 s3c-ext Neo1973 AUX button
51: 1 s3c-ext Neo1973 HOLD button
53: 19 s3c-ext pcf50633
60: 1 s3c-ext lis302dl
70: 540463 s3c-uart0 s3c2440-uart
71: 12968 s3c-uart0 s3c2440-uart
79: 6 s3c-adc s3c2410_action
80: 408 s3c-adc s3c2410_action
Err: 0
> Does ignore_s3c2410_serial_overruns.patch mean that we are able to survive all of these claimed errors OK now?
for gsm: yes so far
for gps: no
Changed 4 years ago by andy
- Attachment fix-gta02-irq-arbiter-priority.patch added
Patch to change interrupt priorities
comment:16 Changed 4 years ago by andy
21 + 16 --> 37, which is under "s3cmci" and has 400955 interrupts to its name.
Apparently because "real" hard SDIO interrupt detection is broken in 2442 or at least the driver, we sit there polling the device 100 times a second or some such over SDIO interface to see if it has a pending interrupt.
So I guess this goes on whether the ar6000 part above it is active or not and makes the latencies.
For GPS, unlike GSM, the overruns are real I think. Maybe you can try the attached patch, which changes the interrupt priority to put UART1 before SDI and removes rotation. I still got GPS overruns here, but I never saw a spew as in your log.
comment:17 Changed 4 years ago by Sascha
Hmm, I've removed ar6000.ko and rebooted:
# cat /proc/interrupts
CPU0
16: 1 s3c-ext0 lis302dl
17: 14 s3c-ext0 modem
30: 495931 s3c S3C2410 Timer Tick
33: 0 s3c s3c-mci
37: 349 s3c s3c-mci
41: 49772 s3c s3c2410_udc
42: 38 s3c ohci_hcd:usb1
43: 1274 s3c s3c2440-i2c
48: 1 s3c-ext Neo1973 Headphone jack
49: 0 s3c-ext ar6000
50: 1 s3c-ext Neo1973 AUX button
51: 1 s3c-ext Neo1973 HOLD button
53: 15 s3c-ext pcf50633
60: 1 s3c-ext lis302dl
70: 128818 s3c-uart0 s3c2440-uart
71: 1844 s3c-uart0 s3c2440-uart
73: 92360 s3c-uart1 s3c2440-uart
74: 61 s3c-uart1 s3c2440-uart
79: 14 s3c-adc s3c2410_action
80: 1032 s3c-adc s3c2410_action
Err: 0
And I still get:
Feb 11 15:00:24 gta02 kernel: [ 2240.430000] asm_do_IRQ(21): 8784 us
Feb 11 15:00:26 gta02 kernel: [ 2241.650000] rxerr: port=1 ch=0xb5, rxs=0x00000001
Feb 11 15:00:26 gta02 kernel: [ 2241.680000] asm_do_IRQ(21): 5751 us
Feb 11 15:00:26 gta02 kernel: [ 2241.685000] rxerr: port=1 ch=0x70, rxs=0x00000001
Feb 11 15:00:31 gta02 kernel: [ 2247.030000] asm_do_IRQ(21): 6536 us
Feb 11 15:00:31 gta02 kernel: [ 2247.045000] asm_do_IRQ(21): 6129 us
Feb 11 15:00:39 gta02 kernel: [ 2255.380000] rxerr: port=1 ch=0x03, rxs=0x00000001
Feb 11 15:00:42 gta02 kernel: [ 2258.060000] rxerr: port=1 ch=0x76, rxs=0x00000001
Feb 11 15:01:04 gta02 kernel: [ 2280.400000] rxerr: port=1 ch=0x6a, rxs=0x00000001
Feb 11 15:01:04 gta02 kernel: [ 2280.460000] asm_do_IRQ(21): 7199 us
Feb 11 15:01:04 gta02 kernel: [ 2280.480000] asm_do_IRQ(21): 6596 us
Feb 11 15:01:04 gta02 kernel: [ 2280.495000] asm_do_IRQ(21): 7503 us
Feb 11 15:01:09 gta02 kernel: [ 2284.990000] asm_do_IRQ(21): 7268 us
Feb 11 15:01:39 gta02 kernel: [ 2315.420000] rxerr: port=1 ch=0x9a, rxs=0x00000001
Feb 11 15:01:41 gta02 kernel: [ 2317.045000] asm_do_IRQ(21): 6133 us
Feb 11 15:01:41 gta02 kernel: [ 2317.050000] rxerr: port=1 ch=0xb5, rxs=0x00000001
while "349 s3c s3c-mci" doesn't change...
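To judge how well the rxerr messages line up with the long asm_do_IRQ(21) entries in logs like the one above, a small script can do the matching. This is a rough sketch: the 50 ms window and the 5000 us threshold are arbitrary choices for the example, the function names are made up, and it relies only on the kernel timestamp in square brackets.

```python
import re

# Kernel timestamp followed by either an IRQ duration line or an rxerr line.
EVENT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?:asm_do_IRQ\((?P<irq>\d+)\): (?P<us>\d+) us|(?P<rxerr>rxerr:))"
)


def parse(lines):
    """Split log lines into IRQ duration records and rxerr timestamps."""
    irqs, errors = [], []
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue
        ts = float(match.group("ts"))
        if match.group("rxerr"):
            errors.append(ts)
        else:
            irqs.append((ts, int(match.group("irq")), int(match.group("us"))))
    return irqs, errors


def errors_near_slow_irq(lines, window_s=0.05, min_us=5000):
    """For each rxerr, report whether a slow IRQ was logged within window_s."""
    irqs, errors = parse(lines)
    slow = [ts for ts, _irq, us in irqs if us >= min_us]
    return [(err, any(abs(err - ts) <= window_s for ts in slow)) for err in errors]


sample = [
    "Feb 11 15:00:24 gta02 kernel: [ 2240.430000] asm_do_IRQ(21): 8784 us",
    "Feb 11 15:00:26 gta02 kernel: [ 2241.650000] rxerr: port=1 ch=0xb5, rxs=0x00000001",
    "Feb 11 15:00:26 gta02 kernel: [ 2241.680000] asm_do_IRQ(21): 5751 us",
    "Feb 11 15:00:26 gta02 kernel: [ 2241.685000] rxerr: port=1 ch=0x70, rxs=0x00000001",
    "Feb 11 15:00:39 gta02 kernel: [ 2255.380000] rxerr: port=1 ch=0x03, rxs=0x00000001",
]

for timestamp, near in errors_near_slow_irq(sample):
    print(timestamp, "near slow IRQ" if near else "no slow IRQ nearby")
```

Errors that have no slow IRQ nearby, such as the one at 2255.38 in the sample, suggest that the logged IRQ 21 durations alone may not account for every overrun.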
comment:18 Changed 4 years ago by werner
While the AR6k driver is providing the eth0 interface, the SDIO device is active as far as the SDIO stack is concerned, and thus you get the interrupt polling.
If the AR6k driver is absent or disabled (rfkill), you have no SDIO device but still the SDIO controller. So the interrupt is shown but shouldn't move. If unbinding the driver of the SDIO controller, the interrupt vanishes completely.
- Werner
comment:19 Changed 4 years ago by Sascha
Ok, then do you already have enough information to fix this bug?
Btw, fix-gta02-irq-arbiter-priority.patch doesn't change anything here.
comment:20 Changed 4 years ago by Sascha
I've done some more tests... and I must say that there are some bytes lost in the gsm uart (64 bytes after the overrun flag has been set).
received: \r\n2,2,2,2,2,2\r\n2,2,2 ,0,0,0,0,0\r\n
expected: \r\n2,2,2,2,2,2\r\n2,2,2,2,2,2\r\n0,0,0,0,0,0\r\n
comment:21 Changed 4 years ago by andy
It seems there is a new update of the GSM firmware that is worth trying:
http://people.openmoko.org/joerg/calypso_moko_FW/moko11/
I just read on kernel list mwester found this solved handshake issues he also had around suspend time.
comment:22 Changed 4 years ago by Sascha
I flashed moko11 beta 1 and the errors in my test case are gone :)
comment:23 Changed 4 years ago by Sascha
I found that the delay is introduced by a busy wait in glamo_mci_send_request. The call graph looks like this:
arch/arm/kernel/irq.c asm_do_IRQ(21)
include/linux/irq.h generic_handle_irq(21)
arch/arm/plat-s3c24xx/irq.c s3c_irq_demux_extint8(21, desc)
include/linux/irq.h generic_handle_irq(56)
drivers/mfd/glamo/glamo-core.c glamo_irq_demux_handler(56, desc)
arch/arm/include/asm/hw_irq.h desc_handle_irq(92, desc)
drivers/mfd/glamo/glamo-mci.c glamo_mci_irq(92, desc)
drivers/mfd/glamo/glamo-mci.c glamo_mci_irq_host(host)
drivers/mfd/glamo/glamo-mci.c glamo_mci_send_request(mmc)
Feb 15 18:20:28 gta02 kernel: [21474811.755000] glamo_mci_send_request: 5874 us - 22770 polls
Feb 15 18:20:28 gta02 kernel: [21474811.755000] asm_do_IRQ(21): 6170 us
Feb 15 18:20:34 gta02 kernel: [21474817.000000] glamo_mci_send_request: 6195 us - 24007 polls
Feb 15 18:20:34 gta02 kernel: [21474817.000000] asm_do_IRQ(21): 6509 us
Feb 15 18:20:34 gta02 kernel: [21474817.015000] glamo_mci_send_request: 6354 us - 24627 polls
Feb 15 18:20:34 gta02 kernel: [21474817.015000] asm_do_IRQ(21): 6646 us
Feb 15 18:22:33 gta02 kernel: [ 99.735000] glamo_mci_send_request: 5109 us - 58541 polls
Feb 15 18:22:33 gta02 kernel: [ 99.735000] asm_do_IRQ(21): 5433 us
Feb 15 18:22:49 gta02 kernel: [ 115.380000] glamo_mci_send_request: 7170 us - 376570 polls
Feb 15 18:22:49 gta02 kernel: [ 115.380000] asm_do_IRQ(21): 7492 us
Feb 15 18:22:49 gta02 kernel: [ 115.385000] rxerr: port=1 ch=0x00, rxs=0x00000001
Feb 15 18:28:24 gta02 kernel: [ 450.320000] glamo_mci_send_request: 6284 us - 276284 polls
Feb 15 18:28:24 gta02 kernel: [ 450.320000] asm_do_IRQ(21): 6557 us
Feb 15 18:28:24 gta02 kernel: [ 450.335000] glamo_mci_send_request: 6225 us - 334202 polls
Feb 15 18:28:24 gta02 kernel: [ 450.335000] asm_do_IRQ(21): 6507 us
Feb 15 18:28:24 gta02 kernel: [ 450.355000] rxerr: port=1 ch=0x04, rxs=0x00000001
Feb 15 18:28:29 gta02 kernel: [ 455.275000] glamo_mci_send_request: 8298 us - 303475 polls
Feb 15 18:28:29 gta02 kernel: [ 455.275000] asm_do_IRQ(21): 8624 us
The number of polls suggests that we lose timer interrupts (at 200 Hz we can only measure 2 * 5 ms)...
Can we really have a delay of ~ 1 second here? This would explain why a 64 byte fifo can overrun at 9600 baud.
comment:24 Changed 4 years ago by Sascha
argh, 100 ms delay...
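Whether a stall of a given length can overrun the FIFO is simple arithmetic. A small sketch, assuming 8N1 framing (10 bits on the wire per character), a continuous incoming stream and a FIFO that is empty when the stall begins. The 64-byte depth and 9600 baud are the figures mentioned above; the function name is made up for the example.

```python
def fifo_fill_time_ms(fifo_bytes: int, baud: int, bits_per_char: int = 10) -> float:
    """Time in milliseconds for a continuous stream to fill an empty FIFO."""
    return fifo_bytes * bits_per_char * 1000.0 / baud


gps_ms = fifo_fill_time_ms(64, 9600)
print("64-byte FIFO at 9600 baud fills in %.1f ms" % gps_ms)
print("100 ms stall overruns it:", 100 > gps_ms)
```

Under these assumptions the FIFO fills in about 66.7 ms at 9600 baud, so a 100 ms stall is enough to lose characters. At 115200 baud (the rate shown in the stty output in comment 9) the same FIFO fills in roughly 5.6 ms.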
comment:25 follow-up: ↓ 26 Changed 4 years ago by andy
100ms would still be death at 9600.
Wow I see what this one is anyway, when we do a multi read or write sequence, we have to send a STOP command when we've had enough. I had it perform the send action of the STOP command inside the glamo-mci interrupt service :-[ Thanks a lot for the great call trace.
This has to be split out into a workqueue, I will sort it out today hopefully.
There was a generic MCI CONFIG option I noticed and enabled in 2.6.28 period that more aggressively chains up sequential read and write actions into one longer bulk read or write on the bus... this can have increased probability of STOP issue because single page reads don't need STOP action.
But, Harald won't have this issue since he doesn't have Glamo. I don't think it's Calypso in his other platform either.
comment:26 in reply to: ↑ 25 Changed 4 years ago by Sascha
Replying to andy:
> This has to be split out into a workqueue, I will sort it out today hopefully.
Any news on this yet?
comment:27 Changed 4 years ago by andy
Yes, it's written but I haven't tested it yet. I wonder what the impact will be on throughput. I'll test it this afternoon.
comment:28 Changed 4 years ago by andy
I just posted it to kernel list
http://lists.openmoko.org/pipermail/openmoko-kernel/2009-February/008782.html
it doesn't seem to impact throughput insanely bad just from eyeballing the boot anyway. But I doubt it improved matters there... maybe it can do something about the latencies though.
comment:29 Changed 4 years ago by Sascha
works fine here. no overruns within the last 24 hours... :)
comment:30 Changed 4 years ago by andy
That's great, I'll backport it to stable shortly.
But we didn't throw much light on Harald's issue since unless he soldered in a Glamo for old times' sake :-) it won't be that code hurting it.
comment:31 Changed 4 years ago by sushama
- Cc testing@… added
Andy, the Testing team would like to verify this bug with the current kernel-uImage-2.6.28-stable+gitr0+f19f259d3c1afde8eae53983fd19f61831927413-r2-om-gta02.bin
Can you guide us on how we could test this particular bug?
comment:32 Changed 3 years ago by lindi
I think I just hit this bug twice on my gta02V5 with andy-tracking a3587e4ed77974ad:
I was running gpsd on FR and streaming gps data over usb network to laptop. GSM, bluetooth, wifi and backlight were all off. Twice during the openstreetmap mapping trip the phone just froze and watchdog kicked in after a while. ramconsole logs ended with
<7>[45124.025000] rxerr: port=1 ch=0xb1, rxs=0x00000007
<7>[45124.030000] rxerr: port=1 ch=0x2c, rxs=0x00000007
<7>[45124.035000] rxerr: port=1 ch=0x4e, rxs=0x00000004
<7>[45124.040000] rxerr: port=1 ch=0x2e, rxs=0x00000004
<7>[45124.045000] rxerr: port=1 ch=0x24, rxs=0x00000004
<7>[45124.050000] rxerr: port=1 ch=0x14, rxs=0x00000006
<7>[45124.055000] rxerr: port=1 ch=0x25, rxs=0x00000006
<7>[45124.060000] rxerr: port=1 ch=0x35, rxs=0x00000004
<7>[45124.065000] rxerr: port=1 ch=0x43, rxs=0x00000006
<7>[45124.070000] rxerr: port=1 ch=0x89, rxs=0x00000006
<7>[45124.075000] rxerr: port=1 ch=0xb1, rxs=0x00000006
<7>[45124.080000] rxerr: port=1 ch=0x2c, rxs=0x00000006
<7>[45124.085000] rxerr: port=1 ch=0x89, rxs=0x00000006
<7>[45124.090000] rxerr: port=1 ch=0x62, rxs=0x00000006
<7>[45124.095000] rxerr: port=1 ch=0x2c, rxs=0x00000006
<7>[45124.100000] rxerr: port=1 ch=0x89, rxs=0x00000006
<7>[45124.105000] rxerr: port=1 ch=0x62, rxs=0x00000006
<7>[45124.110000] rxerr: port=1 ch=0x2c, rxs=0x00000006
<7>[45124.115000] rxerr: port=1 ch=0x4a, rxs=0x00000006
<7>[45124.120000] rxerr: port=1 ch=0xaa, rxs=0x00000004
<7>[45124.125000] rxerr: port=1 ch=0x33, rxs=0x00000004
<7>[45124.130000] rxerr: port=1 ch=0xd2, rxs=0x00000004
<7>[45124.135000] rxerr: port=1 ch=0x41, rxs=0x00000005
<7>[45124.140000] rxerr: port=1 ch=0x56, rxs=0x00000005
<7>[45124.145000] rxerr: port=1 ch=0xd5, rxs=0x00000007
<7>[45124.150000] rxerr: port=1 ch=0xb1, rxs=0x00000007
<7>[45124.155000] rxerr: port=1 ch=0x2c, rxs=0x00000007
<7>[45124.160000] rxerr: port=1 ch=0x39, rxs=0x00000004
and
<7>[21474658.310000] rxerr: port=1 ch=0x50, rxs=0x00000004
<7>[21474658.310000] rxerr: port=1 ch=0x54, rxs=0x00000004
<7>[21474658.315000] rxerr: port=1 ch=0xb1, rxs=0x00000006
<7>[21474658.320000] rxerr: port=1 ch=0x2c, rxs=0x00000006
<7>[21474658.325000] rxerr: port=1 ch=0x89, rxs=0x00000006
<7>[21474658.330000] rxerr: port=1 ch=0x62, rxs=0x00000006
<7>[21474658.335000] rxerr: port=1 ch=0x30, rxs=0x00000004
<7>[21474658.340000] rxerr: port=1 ch=0x62, rxs=0x00000006
<7>[21474658.345000] rxerr: port=1 ch=0x30, rxs=0x00000004
<7>[21474658.350000] rxerr: port=1 ch=0x52, rxs=0x00000006
<7>[21474658.355000] rxerr: port=1 ch=0x34, rxs=0x00000006
<7>[21474658.360000] rxerr: port=1 ch=0x6a, rxs=0x00000006
I have not tried any of the patches yet since I need to figure out how to make this bug occur again first.

It looks like it's even worse than that. This problem seems to also affect gsm communication, see the log at http://trac.freesmartphone.org/attachment/ticket/316/syslog_090115_1141.txt#L827
This leads to all sorts of funnies and lock-ups all over the framework, especially for the GSM communication, which has no way of checking the integrity of the message. We need to fix this.