Ticket #2264 (new defect)

Opened 6 years ago

Last modified 4 years ago

Heavy GPRS traffic causes a Calypso crash

Reported by: budfive
Owned by:
Priority: normal
Milestone:
Component: GSM Modem
Version: unspecified
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Estimated Completion (week):
HasPatchForReview: no
PatchReviewResult:
Reproducible:

Description

I am running SHR-unstable with moko11beta on my Calypso. If I download a large file with GPRS, I am seeing the Calypso stop responding to AT commands, and generally go dead. The time or amount of data it takes for the Calypso to die seems to vary (2-20 minutes usually). It also doesn't matter what the downloaded data is. I usually download a file of zeros to trigger the bug (http://secretsauce.net:5050/zeros). I'm attaching a Calypso debug log collected from the headphone jack. I don't have the documentation to interpret the log, but the stuff at the end doesn't look good.

Attachments

calypsolog2 (1.6 MB) - added by budfive 6 years ago.
Calypso debug log throughout a gprs-downloading session. Crash appears at the end
gsmcrash.with.nspy.tgz (177.9 KB) - added by budfive 6 years ago.
Shorter pair of crash logs (nspy + headphone jack)
calypso-crash1.txt.bz2 (34.9 KB) - added by lindi 6 years ago.
calypso debug output when I try to make a SIP call over gprs and calypso crashes. some characters are lost since my AVR is not fast enough…

Change History

Changed 6 years ago by budfive

Calypso debug log throughout a gprs-downloading session. Crash appears at the end

comment:1 Changed 6 years ago by alphaone

This could be an issue with the muxer. Could you check which version of gsm0710muxd you have?

See also http://git.freesmartphone.org/?p=gsm0710muxd.git;a=commit;h=2edaa892361dab3001e9f2485bdd478298abbed3 for the fix

comment:2 Changed 6 years ago by budfive

That would be nice, but no; I'm up to date:
I'm at 0.9.3.1+gitrabcbcd7cc532a8834906de3fc24c8f8fe7643cd4-r0

comment:3 Changed 6 years ago by budfive

I straced gsm0710muxd from start until the Calypso crash. The log appears at
http://secretsauce.net:5050/calypsobug.mux.log

This log file is ~4MB, so I'm not uploading it here.

comment:4 Changed 6 years ago by alphaone

One more thing, what is the firmware version on the TI Calypso? Maybe mickey knows more, he's been using GPRS on his FR without problems so far, I believe.

comment:5 Changed 6 years ago by alphaone

Okay, sorry I'm blind.
I don't really have an idea except maybe test with fso-image.

comment:6 Changed 6 years ago by shoragan

The strace log only shows that the modem suddenly goes silent. I don't have an idea how to debug this... :(

comment:7 Changed 6 years ago by shoragan

What's the charge level of your battery? The gsm modem is using the battery directly and may shut down on low power.

comment:8 Changed 6 years ago by mickey

The battery might be the culprit. If not, then: you can easily overload the modem in GPRS. It will then flag that it can no longer accept any data; however, gsm0710muxd does not honor this flag. It keeps overloading the modem, and this might lead to a crash. fso-abyss honors the flow control flag, so once we switch over to it completely, this bug should be fixed (_if_ that is the cause; I'm not sure yet). Which line in your strace log shows the crash?

comment:9 Changed 6 years ago by budfive

As far as I can tell from the logs, the muxer is fine and doesn't crash itself. I've done some other experiments, such as power-cycling the Calypso via the sysfs node, which causes the Calypso to come back with "interpreter ready" or whatever it says. The Calypso log, though, clearly shows that the modem has crashed. This is fairly repeatable for me (boot, start gprs, wget, wait 10 minutes or so). The phone is plugged in the whole time, so the battery isn't the problem; also, I'm up to date with everything.

About 62% into the Calypso log, the modem starts issuing warnings "SYSTEM WARNING: Bigger partition allocated than requested" for entities SND, LLC, UART, PPP, IRQ. It seems to keep working fine for a while despite this, only crashing further down, 78% into the log with "SYSTEM ERROR: No Partition available". Everything past this is the crash dump. Does this speak to anybody?
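One way to locate these markers in a captured log is sketched below. This is a minimal sketch, assuming the log is mostly line-oriented text and that a grep supporting the -a and -m options (GNU or BSD) is available. The function name first_match_percent is hypothetical, and it measures position by line count rather than byte offset, so its numbers may differ slightly from the percentages quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print how far into FILE (as a percentage of its
# line count) the first line containing PATTERN appears.
# Prints nothing and returns 1 if PATTERN is not found or FILE is empty.
first_match_percent() {
    pattern=$1
    file=$2
    # -a: treat binary-looking data as text, -n: prefix line numbers,
    # -m 1: stop after the first match, -F: match a fixed string.
    line=$(grep -a -n -m 1 -F -- "$pattern" "$file" | cut -d: -f1)
    [ -n "$line" ] || return 1
    # Arithmetic expansion also strips any padding wc adds to its output.
    total=$(( $(wc -l < "$file") ))
    [ "$total" -gt 0 ] || return 1
    echo $(( line * 100 / total ))
}
```

For example, first_match_percent "SYSTEM ERROR: No Partition available" calypsolog2 would print the approximate position of the first fatal error in the attached log.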

comment:10 Changed 6 years ago by dieter

This bug occurs because the GSM firmware runs out of memory and asserts (stops). There is most likely a memory leak somewhere. The problem: Openmoko does not have the source code of the GSM stack, and that is most likely where the memory leak occurs. The entities of the GSM stack communicate by sending messages; the memory for those messages is dynamically allocated and (hopefully) freed again. I will see what can be done, but I can't yet estimate how long it will take to fix (if it is possible at all).

comment:11 Changed 6 years ago by budfive

I just found a much faster and more consistent method of triggering this bug. I installed tinyproxy onto the GTA02 (http://secretsauce.net:5050/tinyproxy_1.6.4-r0_armv4t.ipk), and used it as a GPRS web proxy for my laptop, over usb0. Tinyproxy needs to be configured by adding "Allow 192.168.0.0/24" to /etc/tinyproxy/tinyproxy.conf. If I start the proxy, and access http://www.latimes.com from my laptop (Opera browser, Javascript on, images on, plugins off) via the proxy, the calypso crashes before the page finishes loading. I've tried this about 8 times and it crashed 7 of the 8. The one time it didn't crash, I reloaded the page and it crashed that time. Each time, the log from the headphone jack showed a crash very similar to the log that is attached to this bug.

I compared this to bug 2257 (https://docs.openmoko.org/trac/ticket/2257), and the symptoms are identical, except that lindi's "good" kernel does not fix this calypso crashing for me. I'm starting to seriously suspect that this and 2257 are the same problem.

I also tried mwester's nspy, as described in a comment for 2257. I'm attaching a pair of logs, one from nspy, and the other from the headphone jack, collected together during a load of the latimes website. This was a case where it didn't crash the first time, but did crash on reload.

comment:12 Changed 6 years ago by budfive

I just tried to reproduce the crash running GPRS directly through pppd with no FSO and no muxer. The network connection was stable, and I could not make it crash under those conditions, which strongly points to the muxer as the source of the problem. Going back to the framework+muxer, the crash happened immediately. I'm updating the log pair (headphone jack+nspy) to a shorter set of logs that I obtained with a single loading of the website.

Changed 6 years ago by budfive

Shorter pair of crash logs (nspy + headphone jack)

comment:13 Changed 6 years ago by mickey

I have just released fso-abyss 0.3.3, which contains support for handling flow control on virtual channels. This could be a workaround for the firmware crash you are seeing. I have only tested with ping -s 10000 so far; I will test with your website on the weekend.

comment:14 Changed 6 years ago by mickey

Looks good here. I don't see it crashing with the latimes site.

comment:15 Changed 6 years ago by alphaone

Did you suffer from the crash before?

comment:16 Changed 6 years ago by lindi

After a week of studying AVR programming I can now reproduce this crash and capture the error message "SYSTEM ERROR: No Partition available".

Changed 6 years ago by lindi

calypso debug output when I try to make a SIP call over gprs and calypso crashes. some characters are lost since my AVR is not fast enough...

comment:17 Changed 6 years ago by budfive

I updated to the latest unstable SHR image (as of late May 2009), and for whatever reason no longer hit FSO bug 409. I can thus try this with fso-abyss (0.3.3). The gprs seems stable with the new muxer, so maybe the flow control was indeed the issue.

comment:18 Changed 5 years ago by GNUtoo

Hi, I have the same problem. I tried fso-abyss and it didn't solve it (I verified that fso-abyss was in use).
I trigger it this way:
gprs->openvpn<--internet-->openvpn<-asterisk-server
Not all of the traffic is routed through the VPN, only the SIP signaling.
The VPN uses UDP.
This lets the SIP signaling get past the provider's port blocking.
Using SIP then makes the Calypso crash, and I have to reboot the phone.
I even tried changing the niceness of fso-abyss.
I can reproduce it whenever I want.

comment:19 Changed 4 years ago by lindi

I can still reproduce the crash with fso-abyss 0.9.0+git20100310-1, linphone-nox 3.3.2-1, fso-frameworkd e6c36e917cc75809f60fa587b68bbf6be0c5bf58, andy-tracking a3587e4ed77974ad

comment:20 Changed 4 years ago by lindi

Uploading data with netcat caused me to hit this again and again. Using

tc qdisc add dev ppp0 root tbf rate 2kbit latency 50ms burst 1600

seems to have so far helped. Can everyone who is suffering from this bug try this please? (It does limit your outgoing traffic considerably but that might just be the price for stability)

comment:21 Changed 4 years ago by lindi

I sent UDP packets at regular intervals and wrote down how it affects the calypso system:

size(bytes)  interval(msec)  result
1000         1000            ok
1000         500             crash
1000         750             crash
1000         900             crash
500          500             ok
250          250             ok
130          130             crash

It seems that we can sustain 1000 bytes per second, but trying to go faster causes the Calypso to crash. This probably means that setting "rate" in the above tc command to anything higher than, say, "7kbit" allows normal users to crash the Calypso.
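The rates behind the table can be worked out directly. This is a minimal sketch using only the sizes and intervals listed above. The function names are hypothetical, and the 28-byte per-packet overhead is an assumption (20-byte IPv4 header plus 8-byte UDP header, ignoring PPP framing and any header compression).

```shell
#!/bin/sh
# Payload rate in bytes per second for SIZE-byte packets sent every INTERVAL ms.
payload_rate() {   # args: size_bytes interval_ms
    echo $(( $1 * 1000 / $2 ))
}

# Same rate, but counting an ASSUMED 28 bytes of IPv4+UDP header per packet.
wire_rate() {      # args: size_bytes interval_ms
    echo $(( ($1 + 28) * 1000 / $2 ))
}

# Rows copied from the table above: size(bytes) interval(msec) result.
# Output columns: size interval result payload_B/s wire_B/s
while read -r size interval result; do
    echo "$size $interval $result $(payload_rate "$size" "$interval") $(wire_rate "$size" "$interval")"
done <<EOF
1000 1000 ok
1000 500 crash
1000 750 crash
1000 900 crash
500 500 ok
250 250 ok
130 130 crash
EOF
```

The 130-byte row has the same payload rate (1000 bytes per second) as the three "ok" rows, so payload rate alone does not separate the two outcomes. Under the header assumption, the "ok" rows come out at 1112 bytes per second or less and the "crash" rows at 1142 or more, but seven data points are too few to treat that as more than a hint.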

I'm currently testing with

$ cat /etc/ppp/ip-up.d/09lindi-tc
#!/bin/sh
/sbin/tc qdisc add dev ppp0 root tbf rate 7kbit latency 4500ms burst 3200
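
For reference, the tbf parameters in the two tc commands (comment:20 and this one) translate into byte rates and queue sizes as follows. This is a minimal sketch, assuming tc's convention that kbit means 1000 bits per second and that tbf derives its queue limit as rate times latency plus burst when no peakrate is given. The function names are hypothetical.

```shell
#!/bin/sh
# Convert a tc rate given in kbit (1 kbit = 1000 bit/s) to bytes per second.
kbit_to_bytes_per_sec() {   # args: rate_kbit
    echo $(( $1 * 1000 / 8 ))
}

# Approximate tbf queue limit in bytes: rate * latency + burst
# (integer arithmetic, so fractions of a byte are truncated).
tbf_limit_bytes() {         # args: rate_kbit latency_ms burst_bytes
    rate=$(kbit_to_bytes_per_sec "$1")
    echo $(( rate * $2 / 1000 + $3 ))
}

echo "rate 2kbit latency 50ms burst 1600: $(kbit_to_bytes_per_sec 2) B/s, queue about $(tbf_limit_bytes 2 50 1600) bytes"
echo "rate 7kbit latency 4500ms burst 3200: $(kbit_to_bytes_per_sec 7) B/s, queue about $(tbf_limit_bytes 7 4500 3200) bytes"
```

On these assumptions, 7kbit corresponds to 875 bytes per second, a little under the 1000 bytes per second that held up in the table, and the longer latency lets roughly 7 kB wait in the kernel queue instead of being dropped.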

comment:22 Changed 4 years ago by TimoJyrinki

With quick testing I can confirm that lindi's tc command improves GPRS usability tremendously. I'm now browsing google maps with full javascript and images and SSHing to a server running irssi at the same time, something which is on the heavy side and has tended to always crash the GSM before.

comment:23 Changed 4 years ago by lindi

If I send 1000-byte packets every 500 ms then 132 of those reach the destination before calypso crashes. Since the outgoing bandwidth is 1000 bytes per second, wouldn't that imply that calypso can buffer 130000 bytes before it crashes? Does it even have that much memory?
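
The estimate can be reproduced as follows. This is a minimal sketch under loud assumptions: the link drains at exactly 1000 bytes per second, nothing is dropped before the crash, and the crash happens just as the last delivered packet leaves the modem. The function name is hypothetical.

```shell
#!/bin/sh
# Estimated bytes still queued inside the modem at crash time.
# args: delivered_packets packet_size_bytes send_interval_ms drain_bytes_per_sec
backlog_bytes() {
    delivered_bytes=$(( $1 * $2 ))
    # Time needed to drain the delivered bytes at the ASSUMED drain rate.
    elapsed_ms=$(( delivered_bytes * 1000 / $4 ))
    # Bytes the sender pushed into the modem during that time.
    sent_bytes=$(( elapsed_ms / $3 * $2 ))
    echo $(( sent_bytes - delivered_bytes ))
}

# 132 packets of 1000 bytes delivered, one sent every 500 ms, 1000 B/s drain.
backlog_bytes 132 1000 500 1000
```

With these inputs the sketch gives 132000 bytes, in line with the figure above. The result depends entirely on the assumed drain rate: if the link actually drained faster than 1000 bytes per second, the implied backlog would be smaller, and the arithmetic cannot tell queued payload apart from memory lost some other way, such as the leak suggested in comment:10.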
