Ticket #2327 (new defect)

Opened 5 years ago

Last modified 5 years ago

wifi sometimes fails to start with "ar6000_activate: Failed to activate -4" when DEBUG_KERNEL/PREEMPT are disabled

Reported by: lindi Owned by: openmoko-kernel
Priority: normal Milestone:
Component: kernel Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Estimated Completion (week):
HasPatchForReview: no PatchReviewResult:
Reproducible: sometimes

Description

Steps to reproduce:
1) echo s3c2440-sdi > /sys/bus/platform/drivers/s3c2440-sdi/bind
2) ip link show

Expected results:
2) the wifi interface appears in the list

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

2: usb: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

link/ether 00:1f:11:01:1f:6b brd ff:ff:ff:ff:ff:ff

33: wlan: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000

link/ether 00:12:cf:8e:ee:09 brd ff:ff:ff:ff:ff:ff

Actual results:
2) the wifi interface does not appear in the list

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

2: usb: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

link/ether 00:1f:11:01:1f:6b brd ff:ff:ff:ff:ff:ff

More info:
1) This started to happen when I upgraded from andy-tracking a3587e4ed77974ad to andy-tracking a15608f241a40b41 and disabled CONFIG_DEBUG_KERNEL and CONFIG_PREEMPT.
2) I have hit the bug now three times and used wifi succesfully at least 20 times so there is probably some race somewhere that causes it to only happen sometimes.
3) dmesg from working case:

72651.735000] s3c2440-sdi s3c2440-sdi: powered down.
72660.080000] s3c2440-sdi s3c2440-sdi: host detect has no irq available
72660.080000] mapped channel 0 to 0
72660.085000] s3c2440-sdi s3c2440-sdi: powered down.
72660.090000] s3c2440-sdi s3c2440-sdi: initialisation done.
72660.100000] s3c2440-sdi s3c2440-sdi: running at 0kHz (requested: 0kHz).
72660.110000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72660.110000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72660.125000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72660.130000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72660.150000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72660.160000] mmc1: queuing CIS tuple 0x01 length 3
72660.175000] mmc1: queuing CIS tuple 0x1a length 5
72660.180000] mmc1: queuing CIS tuple 0x1b length 8
72660.185000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested: 25000kHz).
72660.190000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested: 25000kHz).
72660.200000] mmc1: queuing CIS tuple 0x80 length 1
72660.200000] mmc1: queuing CIS tuple 0x81 length 1
72660.205000] mmc1: queuing CIS tuple 0x82 length 1
72660.210000] mmc1: new SDIO card at address 0001
72661.275000] BMI Get Target Info: Exit (ver: 0x20000059 type: 0x1)
72661.330000] AR6000 Reg Code = 0x40000060
72661.355000] udev: renamed network interface eth0 to wlan

4) dmesg from non-working case:

72138.450000] s3c2440-sdi s3c2440-sdi: host detect has no irq available
72138.450000] mapped channel 0 to 0
72138.450000] s3c2440-sdi s3c2440-sdi: powered down.
72138.455000] s3c2440-sdi s3c2440-sdi: initialisation done.
72138.465000] s3c2440-sdi s3c2440-sdi: running at 0kHz (requested: 0kHz).
72138.475000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72138.480000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72138.485000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72138.510000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72138.530000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
72138.540000] mmc1: queuing CIS tuple 0x01 length 3
72138.555000] mmc1: queuing CIS tuple 0x1a length 5
72138.560000] mmc1: queuing CIS tuple 0x1b length 8
72138.560000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested: 25000kHz).
72138.565000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested: 25000kHz).
72138.570000] mmc1: queuing CIS tuple 0x80 length 1
72138.575000] mmc1: queuing CIS tuple 0x81 length 1
72138.580000] mmc1: queuing CIS tuple 0x82 length 1
72138.585000] mmc1: new SDIO card at address 0001
72138.600000] sdio_ar6000 mmc1:0001:1: deviceInsertedHandler: -1
72138.600000] sdio_ar6000 mmc1:0001:1: kthread_stop (ar6000_io): -4
72138.605000] ar6000_activate: Failed to activate -4
72138.610000] sdio_ar6000: probe of mmc1:0001:1 failed with error -4

Attachments

testcase1.config (49.8 KB) - added by lindi 5 years ago.
kernel configuration to trigger the issue
delay_patch.diff (363 bytes) - added by gena2x 5 years ago.

Change History

Changed 5 years ago by lindi

kernel configuration to trigger the issue

Changed 5 years ago by gena2x

comment:1 in reply to: ↑ description Changed 5 years ago by gena2x

I reviewed this problem, first i got as close to cause of problem as following:

The s2c sdi driver is getting syncronous (not from any thread) write request from ar6000 driver, but get "error: data timeout" from host sdio. Request seem to disable interrupts on device ( DevDisableInterrupts function). As this request is part of initializations, whole initialization fails.

This happens in process of initialization. enabling any debug messages eliminates this error, as they seem add required delay.

Real case of error is not available on level of as some different errors are hidden under 'EILSEQ' Illegal byte sequence. This cause is available on lowest level of ar6000 - process_request function which calls sdio level functions.

I saw two ways to fix problem:

  1. for the first request, retry the request if is fails.
  2. add some delay to initalization sequence.

way (1) look better, in fact, first i added code to process_request to retry attempt, and that seem worked, but it turned out that this require relatively large patch to find place where to store the state that this request is first. Other variant was to retry any write request, which fails with same error and print some kernel message to check that we have no problems with such requests, and keeping in mind that for some kinds of requests return value is not even checked. But is bad solution, as where is some other kind of errors where retry is not need.

way (2) this way was considered as solution at all because of following comment in code:

  • - Atheros call deviceInsertedHandler from a thread spawned off the probe or
  • device insertion function. The original explanation for the Atheros SDIO
  • stack said that this is done because a delay is needed to let the chip
  • complete initialization. There is indeed a one second delay in the thread. *
  • The Atheros Linux SDIO HIF removes the delay and only retains the thread.
  • Experimentally removing the thread didn't show any conflicts, so let's get
  • rid of it for good.

So, as we are not willing one second as delay - I added just 10millisecond delay, and this works fine. Why 10milliseconds? it seems large enough for device (according to tests) and negligible for suspend/resume and normal boot. Review and comments are welcome.

So, I propose simple patch attached to fix problem. It have advantage that it's simple and will not harm anything in any case.

Until any other solution will be found or review done, i'll include it to my kernel build, which is available at: http://www.bsdmn.com/openmoko/kernel/optim-stage2/ . This kernel include nodebug/nopreempt options.

comment:2 Changed 5 years ago by lindi

I tested delay_patch.diff. After a while I got again to the buggy state where wifi is powered on but no interface exists. This time the log does not show error messages from kernel but only has dhcp client complaining (I did verify later that "ip link" indeed does not show the interface):

kernel: [ 2137.870000] s3c2440-sdi s3c2440-sdi: host detect has no irq available
kernel: [ 2137.875000] mapped channel 0 to 0
kernel: [ 2137.875000] s3c2440-sdi s3c2440-sdi: powered down.
kernel: [ 2137.880000] s3c2440-sdi s3c2440-sdi: initialisation done.
kernel: [ 2137.895000] s3c2440-sdi s3c2440-sdi: running at 0kHz (requested: 0kHz).
kernel: [ 2137.895000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
kernel: [ 2137.905000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
kernel: [ 2137.915000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
kernel: [ 2137.920000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
kernel: [ 2137.945000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested: 195kHz).
kernel: [ 2137.955000] mmc1: queuing CIS tuple 0x01 length 3
kernel: [ 2137.975000] mmc1: queuing CIS tuple 0x1a length 5
kernel: [ 2137.985000] mmc1: queuing CIS tuple 0x1b length 8
kernel: [ 2137.985000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested: 25000kHz).
kernel: [ 2137.990000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested: 25000kHz).
kernel: [ 2138.005000] mmc1: queuing CIS tuple 0x80 length 1
kernel: [ 2138.005000] mmc1: queuing CIS tuple 0x81 length 1
kernel: [ 2138.010000] mmc1: queuing CIS tuple 0x82 length 1
kernel: [ 2138.015000] mmc1: new SDIO card at address 0001
dhclient: Internet Systems Consortium DHCP Client V3.1.3
dhclient: Copyright 2004-2009 Internet Systems Consortium.
dhclient: All rights reserved.
dhclient: For info, please visit https://www.isc.org/software/dhcp/
dhclient: 
dhclient: Bind socket to interface: No such device

Also I am using andy-tracking a3587e4ed77974ad this time since I'm wondering if newer andy-tracking version introduced a TS bug or not.

comment:3 Changed 5 years ago by lindi

gena, can you make the sleep a module option so that people can experiment with longer delays and see if it helps?

Note: See TracTickets for help on using tickets.