Ticket #1597 (closed defect: fixed)

Opened 6 years ago

Last modified 5 years ago

after a long suspend time, kernel thread events/0 sits eating 30% cpu

Reported by: raster
Owned by: zecke
Priority: normal
Milestone:
Component: kernel
Version:
Severity: major
Keywords:
Cc: werner@…, zecke@…, john_lee@…
Blocked By:
Blocking:
Estimated Completion (week):
HasPatchForReview: no
PatchReviewResult:
Reproducible:

Description

if the freerunner is suspended for a long time (eg many hours) sometimes it wakes up with events/0 process (kernel thread) eating 30% cpu and the system being very slow (especially xglamo). and also i've seen avahi-daemon eating up 30% of cpu too at the same time...

unsure what is really up at this stage but it smells of the wifi driver to me. seen in current stable kernel:

uImage-2.6.24+git42+0f565eebf6f9a52a66053348aa710e05732f934e-r1-om-gta02.bin

Attachments

top_output_for_holger (5.0 KB) - added by wendy_hung 6 years ago.
dmesg.txt (15.2 KB) - added by odlg 6 years ago.
dmesg_output.txt (15.0 KB) - added by koe 6 years ago.
ar6k-giwscan-no-eagain.patch (608 bytes) - added by werner 6 years ago.
dmesg-removing-from-wall-charger-pluging-into-desktop-usb (15.3 KB) - added by lala 6 years ago.

Change History

comment:1 Changed 6 years ago by raster

added comment - killing avahi-daemon still has events/0 eating cpu at the same rate... as best i can tell its doing it all on its own.

comment:2 follow-up: ↓ 5 Changed 6 years ago by andy

  • Cc zecke@… added

Holger, is this coming from recent uevent / PMU stuff? No basis for suspecting it except that was recent event-related code.

comment:3 Changed 6 years ago by roh

  • Owner changed from andy@… to andy

comment:4 Changed 6 years ago by zecke

strace of the user space processes misbehaving would be handy.

comment:5 in reply to: ↑ 2 Changed 6 years ago by zecke

Replying to andy:

Holger, is this coming from recent uevent / PMU stuff? No basis for suspecting it except that was recent event-related code.

I think that is unlikely.

comment:6 Changed 6 years ago by andy

How about neod / eating 2 x 100Hz motion sensor traffic?

comment:7 Changed 6 years ago by zecke

More likely (besides we don't have a neod running). I would have to check if the legacy tty handler is playing an input_handler for the motion sensors as well. The only issue is that neod doesn't run with asu.

I have a constant load of 2.x on my neo and wonder where it comes from. I think iotop will help me finding out (but that needs a special kernel config).

comment:8 Changed 6 years ago by andy

About the load, update your kernel, a Jason Uhlenkott fixed it for us some days ago

http://git.openmoko.org/?p=kernel.git;a=commitdiff;h=287b292cf95edbd82dc63085ae5f0167a6e8141f

comment:9 Changed 6 years ago by raster

load calculation isn't the problem.. the cpu they consume is real. everything slows down. xglamo is painfully slow, as is enlightenment... sorry no straces.. i'll try next time i see it...

comment:10 Changed 6 years ago by andy

Sure it's not this bug. Holger mentioned it as an aside and I saw my 2.0 idle load go to 0.0 after the patch, so I think that issue is fixed by that patch.

comment:11 Changed 6 years ago by zecke

  • Owner changed from andy to zecke
  • Status changed from new to accepted

my bug. I just got annoyed by the constant output of:

Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.920000] power_supply bat: power_supply_changed
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.920000] power_supply bat: power_supply_changed_work
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.920000] power_supply bat: power_supply_update_bat_leds 1
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.925000] power_supply bat: uevent
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.925000] power_supply bat: POWER_SUPPLY_NAME=bat
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.930000] power_supply bat: Static prop TYPE=Battery
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.935000] power_supply bat: 11 dynamic props
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6263.950000] power_supply bat: prop STATUS=Charging
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6264.120000] power_supply bat: prop VOLTAGE_NOW=3763000
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6264.180000] power_supply bat: prop CURRENT_NOW=-117187
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6264.225000] power_supply bat: prop CHARGE_FULL=1177029
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6264.270000] power_supply bat: prop TEMP=353
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6264.270000] power_supply bat: prop TECHNOLOGY=Li-ion
Jul 23 10:06:03 om-gta02 user.debug kernel: [ 6264.290000] power_supply bat: prop PRESENT=1
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.335000] power_supply bat: prop TIME_TO_EMPTY_NOW=3932100
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.380000] power_supply bat: prop TIME_TO_FULL_NOW=45600
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.395000] power_supply bat: prop CAPACITY=20
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.395000] power_supply bat: prop ONLINE=1
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.430000] power_supply bat: power_supply_changed
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.430000] power_supply bat: power_supply_changed
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.445000] power_supply bat: power_supply_changed
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.445000] power_supply bat: power_supply_changed_work
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.445000] power_supply bat: power_supply_update_bat_leds 1
Jul 23 10:06:04 om-gta02 user.debug kernel: [ 6264.450000] power_supply bat: uevent

comment:12 Changed 6 years ago by andy

Wow that's the problem that it spews events?

comment:13 Changed 6 years ago by zecke

I think this sysfs code should spew out stuff on the console but the question is why is power_supply_changed called that often, from where and what happens if we stop that? trying to do this now.

Changed 6 years ago by wendy_hung: attachment top_output_for_holger added

comment:14 Changed 6 years ago by wendy_hung

It looks like the motion sensors start up and produce plenty of interrupts after a while. Please see the attachment. Nothing in the system has event files open.

comment:15 Changed 6 years ago by zecke

Okay, the previous entry was a red herring. I attached the debug board and waited for it to happen. It is the wlan driver. The internal debug buffer of the a6k runs over and tells the host about it... I try to understand how to disable that.

Breakpoint 19, wireless_send_event (dev=0xc7430000, cmd=35842, wrqu=0xc7c25d50, extra=0xc7d594c0 "\b0l�,D\b0") at net/wireless/wext.c:1229
1229 if (cmd <= SIOCIWLAST) {
(gdb) bt
#0 wireless_send_event (dev=0xc7430000, cmd=35842, wrqu=0xc7c25d50, extra=0xc7d594c0 "\b0l�,D\b0") at net/wireless/wext.c:1229
#1 0xc01e534c in ar6000_send_event_to_app (ar=0xc7430360, eventId=12296, datap=0xc7fe0094 "l�,D\b0", len=244) at drivers/sdio/function/wlan/ar6000/ar6000/ar6000_drv.c:2894
#2 0xc01e5464 in ar6000_dbglog_event (ar=0xc7430000, dropped=<value optimized out>, buffer=0xc7fe0094 "l�,D\b0", length=1492) at drivers/sdio/function/wlan/ar6000/ar6000/ar6000_drv.c:482

#3 0xc01f63e0 in wmi_control_rx_xtnd (wmip=0xc7d59ec0, osbuf=0xc7c32380) at drivers/sdio/function/wlan/ar6000/wmi/wmi.c:1359
#4 0xc01f7308 in wmi_control_rx (wmip=0xc7d59ec0, osbuf=0xc7c32380) at drivers/sdio/function/wlan/ar6000/wmi/wmi.c:630
#5 0xc01e7f78 in ar6000_rx (Context=0xc7430360, pPacket=0xc7fe0000) at drivers/sdio/function/wlan/ar6000/ar6000/ar6000_drv.c:1913
#6 0xc01e0fd0 in HTCRecvCompleteHandler (Context=0xc7e9a000, pPacket=0xc7fe0000) at drivers/sdio/function/wlan/ar6000/htc/htc_recv.c:324
#7 0xc01de7b0 in DevRWCompletionHandler (context=0xc7d594c0, status=<value optimized out>) at drivers/sdio/function/wlan/ar6000/htc/ar6k_events.c:42
#8 0xc01e2d10 in hifRWCompletionHandler (request=<value optimized out>) at drivers/sdio/function/wlan/ar6000/hif/hif.c:420
#9 0xc01da0f4 in _SDIO_HandleHcdEvent (pHcd=0xc039b884, Event=<value optimized out>) at drivers/sdio/stack/busdriver/_busdriver.h:347
#10 0xc01dacc8 in SDIO_HandleHcdEvent (pHcd=0xc7430000, Event=2 '\002') at drivers/sdio/stack/busdriver/sdio_bus_os.c:199
#11 0xc01ddb88 in s3c24xx_hcd_io_work (work=<value optimized out>) at drivers/sdio/hcd/s3c24xx/s3c24xx_hcd.c:645
#12 0xc005773c in run_workqueue (cwq=0xc7c01a00) at kernel/workqueue.c:277
#13 0xc0058374 in worker_thread (cwq=<value optimized out>) at kernel/workqueue.c:322
#14 0xc005bfec in kthread (_create=<value optimized out>) at kernel/kthread.c:78
#15 0xc0048db4 in sys_waitid (which=-942304064, pid=35842, infop=0xc7c25d50, options=-943563448, ru=0x0) at kernel/exit.c:1727

comment:16 Changed 6 years ago by zecke

  • Status changed from accepted to in_testing

The latest stable kernel does not compile the a6k driver in debug mode, this might have toggled a switch to not use the diagnostic buffer at all. I have not seen this issue since then but I'm not confident to say that it is fixed. Please keep an eye on it but I put it into testing for now.

Automatically turning on accelerometers actually belongs to another bug.

comment:17 Changed 6 years ago by koe

It seems that this did not fix the problem, my events/0 process still uses 22% of the cpu time after an uptime of about a day. I am using kernel-image-2.6.24_2.6.24+git35+2d61a7406ec89893cdb4246d3f0144818278a5d8-r2_om-gta02.ipk.

comment:18 Changed 6 years ago by zecke

As usual please provide log messages. E.g. the output of dmesg would be interesting. I assume we see something like EP full messages...

comment:19 Changed 6 years ago by odlg

I see this as well, here is dmesg output.

Changed 6 years ago by odlg: attachment dmesg.txt added

Changed 6 years ago by koe: attachment dmesg_output.txt added

comment:21 follow-up: ↓ 22 Changed 6 years ago by koe

I've also added my dmesg output, it seems that there were no unexpected messages after booting.

comment:22 in reply to: ↑ 21 Changed 6 years ago by h.koenig

Replying to koe:

I've also added my dmesg output, it seems that there were no unexpected messages after booting.

right now I see the same problem (looping events/0) for the first time running this kernel:

root@om-gta02:~# opkg list_installed kernel
kernel - 2:2.6.24+git75965+cb3cc53a76c7f1f7c827d048db7a849e77071515-r1.01 -

root@om-gta02:~# cat /proc/version
Linux version 2.6.24 (build@barbie) (gcc version 4.1.2) #1 PREEMPT Tue Aug 26 08:33:29 CST 2008

the FR was running all night (suspend disabled) waiting for the Xglamo looping problem (on devel list see subject: Xglamo loops and hangs), but this morning both Xglamo and events/0 loop:

Cpu(s): 22.0%us, 75.2%sy, 0.0%ni, 0.3%id, 0.0%wa, 0.0%hi, 2.5%si, 0.0%st
Mem: 123856k total, 121796k used, 2060k free, 24k buffers
Swap: 0k total, 0k used, 0k free, 12240k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1513 root 19 -1 12784 4020 824 R 49.1 3.2 166:19.26 Xglamo
5 root 15 -5 0 0 0 S 30.5 0.0 507:27.27 events/0
1468 root 9 -11 18704 3012 420 S 11.0 2.4 383:22.12 pulseaudio

if I SIGSTOP Xglamo, top output looks like this:

Cpu(s): 6.9%us, 41.0%sy, 0.0%ni, 50.0%id, 0.0%wa, 0.0%hi, 2.1%si, 0.0%st
Mem: 123856k total, 121860k used, 1996k free, 24k buffers
Swap: 0k total, 0k used, 0k free, 5500k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5 root 15 -5 0 0 0 R 30.5 0.0 507:54.49 events/0
1468 root 9 -11 18704 3012 420 S 10.9 2.4 383:30.47 pulseaudio
6618 root 20 0 2396 1148 904 R 4.7 0.9 0:01.71 top

and here is the Xglamo traceback:

(gdb) where
#0 0x00014f10 in GLAMOEngineWaitReal ()
#1 0x000152e4 in GLAMOFlushCMDQCache ()
#2 0x00015ee4 in GLAMOBlockHandler ()
#3 0x0001ee8c in BlockHandler ()
#4 0x000edaa4 in WaitForSomething ()
#5 0x000381b4 in Dispatch ()
#6 0x000223e8 in main ()
(gdb)

in dmesg I get lots of these messages:

[220492.240000] ar6000_ioctl_giwscan(): data length 0
[220507.270000] ar6000_ioctl_giwscan(): data length 0
[220523.220000] ar6000_ioctl_giwscan(): data length 0
[220539.200000] ar6000_ioctl_giwscan(): data length 0
[220555.200000] ar6000_ioctl_giwscan(): data length 0

any more data I can provide ?

is it possible to reset or stop that events/0 thread without rebooting ?

just in case it might be helpful to further debug the Xglamo process which is stuck too....
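A rough way to capture numbers like the top output above from a script, so the moment events/0 starts looping can be logged: sample utime and stime from /proc/<pid>/stat twice and convert the difference to a percentage. This is only a sketch. The function names are made up for illustration, and it assumes a Linux /proc layout with utime and stime in fields 14 and 15.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain slashes or
    # spaces (e.g. "(events/0)"), so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU share in percent consumed between two samples."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample(pid, interval_s=5.0):
    """Measure the CPU share of `pid` over `interval_s` seconds (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling sample(5) would watch PID 5, which is events/0 in the top output above. The PID may differ on other builds.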

comment:23 Changed 6 years ago by zecke

jtag is the interesting thing. I assume the wifi chip is still crazy.

comment:24 Changed 6 years ago by regina_kim

Raster ~ please check this ticket.

comment:25 Changed 6 years ago by raster

check? why? what? eh?

comment:26 Changed 6 years ago by regina_kim

testing status now. check then close ticket.

comment:27 Changed 6 years ago by Treviño

I've the same here... After some days of uptime (not suspending) I get events/0 eating my phone.

comment:28 Changed 6 years ago by lala

Using Om2008.8-update I can reproduce this in 8-12 hours, without suspending, wifi or accelerometers. Only with 1-2 gps activations and 1-2 phone calls. Most of the time the phone is connected to my desktop/usb.

When this "plague" happens, even if I kill all the processes (xserver, hald, ompower, gsm0710muxd, dbus-daemon, ... all), events/0 still eats around 30% of the CPU.

kernel: Om2008.8-gta02-20080903.uImage.bin

comment:29 Changed 6 years ago by werner

From reading the code, I'd say that the ar6000_ioctl_giwscan message means
that no access points were found and SIOCGIWSCAN returned EAGAIN, which is
arguably the wrong return code.

I don't know if not finding any access points is cause or effect, though.
Does my ar6k-giwscan-no-eagain.patch ease the pain ?
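To illustrate why EAGAIN is arguably the wrong answer for an empty scan: with wireless extensions, clients commonly treat EAGAIN from SIOCGIWSCAN as "results not ready yet, ask again". The sketch below simulates that in plain Python. Both driver functions and the polling helper are hypothetical stand-ins, not the ar6000 code and not necessarily what the attached patch does.

```python
import errno


def fetch_scan_results(get_scan, max_polls=5):
    """Poll a SIOCGIWSCAN-like call the way a typical client might.

    `get_scan` is a hypothetical stand-in for the ioctl: it returns a list
    of access points or raises OSError(errno.EAGAIN) for "not ready yet".
    Returns (results, polls_used); results is None if we gave up.
    """
    for poll in range(1, max_polls + 1):
        try:
            return get_scan(), poll
        except OSError as exc:
            if exc.errno != errno.EAGAIN:
                raise
            # EAGAIN: assume a scan is still in progress and ask again.
    return None, max_polls


def driver_eagain_on_empty():
    # Behaviour described above: no access points found, so EAGAIN.
    raise OSError(errno.EAGAIN, "no scan data")


def driver_empty_list():
    # Alternative: an empty scan is a valid, final result.
    return []
```

With driver_eagain_on_empty the caller burns every poll and still gets nothing. With driver_empty_list it gets an empty result on the first call and can stop.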

Changed 6 years ago by werner: attachment ar6k-giwscan-no-eagain.patch added

comment:30 Changed 6 years ago by lala

I left my phone connected to the wall charger and it happened again overnight. In the morning the battery was 95% full (batmon). I've seen that "SDIO Helper" also eats around 5% of the CPU. In /tmp/x.log I've seen "CHANGE PW SAVE MODE TO 3 / 4" between a lot of "...caches flushed.".

So, by doing nothing I can reproduce this in less than 23 hours. I mean, no gps activated, no gpsd started, no wifi, no bt, no phone calls received or initiated, not playing with accelerometers. Just launching clock (qtopia), Settings-exposure (non-stop), batmon, dmesg, htop (non-stop), xterm (non-stop) and plugging/unplugging into/from usb.

rootfs: Om2008.8-update (updated about 3 weeks ago)
kernel: Om2008.9-gta02-20080916.uImage.bin (has the same md5 like Om2008.8-gta02-20080903.uImage.bin)
uboot: gta02v5_and_up-u-boot.bin from daily builds (2008.09.30)

comment:31 Changed 6 years ago by lala

Is it possible to be from battery kernel-module?

One time I've seen a lot of errors in dmesg with the words "current", "property", "power", "supply", "bat", from what I remember.

comment:32 Changed 6 years ago by lala

I forgot to say: suspend was disabled with the wrench and screensaver was disabled with "xset s off". I've also changed brightness and profile from time to time.

comment:33 Changed 6 years ago by werner

Just had a device running FSO MS3 do something that looked similar: about
75% Xglamo, 15% events/0, 5% SDIO Helper. I don't think FSO MS3 ever
suspends, so this doesn't seem to be related to suspend/resume problems.

In terms of events actually reported through /dev/input/event*, I only saw
the accelerometers, pumping out data at about 400kHz. Brutally forcing their
interrupts off (gpio f0=1 g8=1 - warning, that's CPU GPIO output driving
against interrupt output of the acceleration sensors) made them stop but
didn't slow down events/0.

Killing Xglamo also had no effect on events/0. However it's interesting to
note that the CPU load of events/0 did not increase.

Note that, after stopping the acceleration sensors from reporting data,
there was no other significant amount of data coming out of
/dev/input/event*.

Killing virtually all other processes on the system had no effect on events/0.

Throwing a crowbar into SDIO (by disabling the SD clock, gpio e5=0) did stop
SDIO Helper and events/0 from running wild. Unfortunately, the Atheros SDIO
stack does not recover when I re-enable the clock.

comment:34 Changed 6 years ago by werner

A small update: I booted FSO MS3 and let it just sit there completely idle.
No user input, no SIM card, and no suspend. After about 15-19h, events/0
ran at ~20%.

This time, Xglamo remained silent. So it seems that busy Xglamo is indeed
unrelated.

This might be a continuous interrupt from the S3C MMC hardware. A similar
problem has been observed a while ago with the mainline driver.

Never having observed this problem when using a stack that had SD-SPI at
its bottom also resonates with my hypothesis.

comment:35 Changed 6 years ago by andy

Shouldn't this show up on cat /proc/interrupts?

comment:36 Changed 6 years ago by werner

Good that you mention it. Indeed, I also saw that S3C MMC got quite a lot
of interrupts, about as many as the timer. Could be cause or effect, though.

comment:37 Changed 6 years ago by lala

With events/0 at ~27% CPU, I unplugged the neo from the wall charger and plugged it into the desktop usb, and I've seen some errors in dmesg. See attached.

comment:38 Changed 6 years ago by zecke

Debugging hint:

  • Compile a6k as module make it autoload
  • If CPU is eaten unload the module

Hypothesis:

  • events/0 is executing the work queue of the hcd s3c implementation of "Openmoko" (glue)
  • interrupts will continue but the cpu will not be busy anymore

What would be interesting:

  • loading the module again (assuming the hcd s3c implementation is doing a reset)
  • What happens with the interrupts and the load
  • Maybe even restart connman

Possible outcomes:

  • With reload/reset interrupts come in and events/0 is busy => firmware
  • With reload/reset interrupts are normal => our hcd glue

who is willing to do what is necessary?

comment:39 Changed 6 years ago by werner

Unload/reset would most likely stop the trouble, but that may just be
because the driver would of course reset the WLAN module. My current
plan is to just observe but otherwise ignore that bug for now and put
everything on the Linux SDIO stack instead. Then we'll see if the
problem persists.

Regarding the USB crash: seems that this is caused by the accelerometer
driver trying to get a mutex while interrupts are off and isn't related
to the WLAN problem. I think Andy has a patch that gets rid of all those
mutex uses in the driver, so it should solve that specific issue.

comment:40 Changed 6 years ago by andy

ar6k doesn't work as it is when compiled as a module. The stack doesn't pick up that it exists on module insertion.

Having WLAN stuff as a module gives some nice opportunity to recover from this and firmware / stack problems without reboot, it also enables power saving stuff too.

If Werner can put ar6k on normal stack, maybe it can detect "insertion" properly and we can use the whole thing as a module, that will be another advantage of the change.

Attempts to use Linux SPI stack with interrupt lockout are doomed, these are the source of the dmesg traffic.

comment:41 Changed 6 years ago by zecke

"Insertion" should be fairly easy to add to the a6k stack... our hcd glue is sending such an event... it is just a nightmare to track it.

Disclaimer: This does not mean I volunteer. :)

comment:42 Changed 6 years ago by lala

And another thing, maybe will be helpful: the power consumption is ~100mA with brightness set to low and gsm activated. When events/0 starts to eat ~27% CPU, the consumption goes to 200-240mA, and after 2-3 hours the front of the neo is unusual hot.

comment:43 Changed 6 years ago by andy

Wow that is good to know, folks have reported that hotness symptom before, at least we got a clue now it is associated with the events thing and seems to be to do with WLAN.

When WLAN is in RX or TX in RF, it can pull these kinds of currents like extra 400mW. Normally it uses 80211 powersave and times its RF activity around the beacon of the AP, so its average power is much lower. But if for some reason its RF stage was stuck on, it will heat up itself and maybe also raise the temperature of the battery.

comment:44 Changed 6 years ago by lala

And it is also good to know that I don't touch wifi, which is disabled by default :)

comment:45 Changed 6 years ago by lala

I'm afraid that the front of the neo being hot it's not related with this issue. It all depends on which is the pocket that holds my neo and if the screen is near the body.

I'm sorry.

comment:46 Changed 6 years ago by Zogg

After some time, events/0 stops eating CPU in the same mystical fashion it started. Timespan was too wide to notice when did that happen, or what made it stop (2008.9 stable, with latest updates).

comment:47 Changed 6 years ago by john_lee

  • Cc john_lee@… added

comment:48 follow-up: ↓ 49 Changed 6 years ago by lala

Yes, after a while events/0 stops eating CPU.

Somewhere between 11 and 19 hours after boot, events/0 rose to the top of (h)top. Between 30 and 35 hours of uptime it dropped to 0-0.6% CPU. Somewhere between 48 and 53 hours of uptime it rose again to "pole position". It seems to change state every ~17 hours.

/proc/interrupts shows

16: 1 s3c-ext0 lis302dl
17: 6 s3c-ext0 modem
30: 38755766 s3c S3C2410 Timer Tick
33: 5718424 s3c s3c24xx_hcd
37: 57640576 s3c S3c24xx SDIO host controller
41: 520580 s3c s3c2410_udc
42: 0 s3c ohci_hcd:usb1
43: 110879 s3c s3c2440-i2c
48: 1 s3c-ext Neo1973 Headphone Jack
49: 0 s3c-ext ar6000
50: 1 s3c-ext Neo1973 AUX button
51: 13 s3c-ext Neo1973 HOLD button
53: 241 s3c-ext pcf50633
60: 123083 s3c-ext lis302dl
70: 16880 s3c-uart0 s3c2440-uart
71: 13386 s3c-uart0 s3c2440-uart
76: 0 s3c-uart2 s3c2440-uart
77: 25 s3c-uart2 s3c2440-uart
79: 3418 s3c-adc s3c2410_action
80: 661875 s3c-adc s3c2410_action

comment:49 in reply to: ↑ 48 Changed 6 years ago by lala

Replying to lala:

When events/0 goes crazy,

33: 5718424 s3c s3c24xx_hcd

grows with ~100 per second

37: 57640576 s3c S3c24xx SDIO host controller

grows with ~1000 per second
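Growth rates like these can be measured by diffing two snapshots of /proc/interrupts. The following is a minimal sketch with made-up helper names. It reads only the first count column, which is enough on a single-CPU device like this one.

```python
def parse_interrupts(text):
    """Map IRQ label (e.g. '37') to its count from /proc/interrupts text."""
    counts = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the "CPU0" header and anything without a leading count.
        if sep and fields and fields[0].isdigit():
            counts[irq.strip()] = int(fields[0])
    return counts


def rates(before, after, interval_s):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq] - before[irq]) / interval_s
        for irq in after
        if irq in before
    }
```

Taking one snapshot, sleeping for ten seconds, taking another and passing both to rates() gives per-IRQ figures comparable to the ~100/s and ~1000/s quoted above.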

comment:50 Changed 5 years ago by xbaldauf

Is this bug fixed in Om2008.12 (kernel)? I'm running Om2008.12, and as soon as I connect using WLAN, "events/0" is taking up about 23% of CPU time. There does not seem to be any improvement over Om2008.09 or Om2008.08.

comment:51 Changed 5 years ago by werner

  • HasPatchForReview unset

I believe this bug has vanished as part of the SDIO stack change in 2.6.28. Om2008.12 still uses the old kernel. I don't know what the migration plan to 2.6.28 looks like for Om2008.12+.

comment:52 Changed 5 years ago by xbaldauf

I can confirm that the problem goes away when using kernel 2.6.28 (from http://git.openmoko.org/?p=kernel.git;a=tree;h=79f8516ca4a534e42d97fe727c63eb7281be48ed;hb=79f8516ca4a534e42d97fe727c63eb7281be48ed ). This kernel seems to work for me (with Om2008.12), I cannot see any problems.

comment:53 follow-up: ↓ 54 Changed 5 years ago by Matthias

I did a lot of tests with monitor-scripts to catch the situation and can confirm:

  • the runaway (consuming CPU) of events/0 occurs also without using suspend;
  • it occurs without using GPRS, GPS and Wifi, just acting the FR as a GSM phone
  • it occurs exactly after 18h+16...17min uptime (which is in seconds a bit more than the unsigned short value of 65536)

Matthias (guru@…)
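The arithmetic behind that last observation, as a quick check: 65536 is 2^16, one more than the largest value a 16-bit unsigned counter can hold, and 65536 seconds is 18 h 12 min 16 s. The reported onset of 18 h 16 to 17 min is a few minutes later. Whether a 16-bit seconds counter is actually involved is only a guess based on the timing. The helper name below is made up.

```python
def hms(seconds):
    """Split a number of seconds into (hours, minutes, seconds)."""
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    return h, m, s


wrap = 2 ** 16                           # 65536, wrap point of a 16-bit counter
observed_low = 18 * 3600 + 16 * 60       # 18h16min in seconds
observed_high = 18 * 3600 + 17 * 60      # 18h17min in seconds

print(hms(wrap))                         # (18, 12, 16)
print(observed_low - wrap, observed_high - wrap)  # 224 284
```

So the runaway was seen roughly four to five minutes after the 65536-second mark.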

comment:54 in reply to: ↑ 53 Changed 5 years ago by Matthias

Replying to Matthias:

I did a lot of tests with monitor-scripts to catch the situation and can confirm:

  • the runaway (consuming CPU) of events/0 occurs also without using suspend;
  • it occurs without using GPRS, GPS and Wifi, just acting the FR as a GSM phone
  • it occurs exactly after 18h+16...17min uptime (which is in seconds a bit more than the unsigned short value of 65536)

Matthias (guru@…)

sorry, forgot to say this is with Om2008.9

comment:55 Changed 5 years ago by andy

As it says above the issue can't be reproduced (allegedly) on 2.6.28 kernels, we won't be backporting the change that we think solved it to 2.6.24.

comment:56 follow-up: ↓ 57 Changed 5 years ago by werner

  • Status changed from in_testing to closed
  • Resolution set to fixed

Yes, time to move on :-)
I'm closing it as "fixed" (in 2.6.28).

comment:57 in reply to: ↑ 56 Changed 5 years ago by Matthias

Replying to werner:

Yes, time to move on :-)
I'm closing it as "fixed" (in 2.6.28).

Werner, don't get me wrong, but I always have a bad feeling if a bug gets closed because "I believe this bug has vanished" or when a bug can not be reproduced any more; a bug is only really fixed when someone says, for example, "yes there was a small int used where it should be integer, I've changed and committed it"; believe me that I know what I'm talking about, I'm working as the tech head in a software company;

concerning the bug itself I said, that it also occurs without 'long suspend time'; may be we are talking about two different bugs or usages of the FR; has someone with a 2.6.28 kernel tested it 7x24 without suspending the FR, at least?

anyway; I will update asap to 2.6.28 (once I get an idea about an upgrade path in Om2008.12++) and will come back to this (or hopefully not :-))

just my concerns, Matthias

comment:58 Changed 5 years ago by andy

It'll get reopened for sure if there is any more sign of it on 2.6.28.

As for the upgrade path, I read that testing or unstable repo for FSO has migrated to 2.6.28 already, it can be worth checking out on SD Card maybe.

comment:59 Changed 5 years ago by werner

Matthias, I agree with the sentiment. That's also why I left the bug open
so long, on the off chance that the bug would somehow show up again. But it
didn't.

The change that made it "disappear" is the complete replacement of the SDIO
stack. That's some 10'000 lines of Atheros SDIO code that were removed from
our kernel and now we're using something like the same amount of different
code (the Linux SDIO stack) in its place.

And yes, I had 2.6.28 running for weeks without seeing this problem, while
it would quite reliably show up on 2.6.24.
