Ticket #2232 (in_testing defect)

Opened 4 years ago

Last modified 4 years ago

unplugging with gadgetfs causes panic: "softlockup: blocked tasks"

Reported by: lindi Owned by: openmoko-kernel
Priority: normal Milestone:
Component: kernel Version: unspecified
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Estimated Completion (week):
HasPatchForReview: no PatchReviewResult:
Reproducible:

Description

Steps to reproduce:
1) tar xzf testcase1.tar.gz
2) configure gadgetfs on freerunner:
2.1) sudo ifconfig usb0 down
2.2) sudo rmmod g_ether
2.3) sudo modprobe gadgetfs
2.4) sudo mkdir /dev/gadget
2.5) sudo mount -t gadgetfs gadgetfs /dev/gadget
3) run gadgetfs example on freerunner:
3.1) cd sink
3.2) gcc -Wall -g -o usb usb.c usbstring.c -lpthread
3.3) sudo ./usb -r 1.3 -p 1
4) verify that lsusb on PC side shows the same contents as sink-source-reader/sink.lsusb
5) run driver example on PC:
5.1) cd sink-source-reader
5.2) make
5.3) sudo ./reader1 0x0525 0xa4a4
6) unplug freerunner from PC and add a hub between freerunner and PC, do not kill "sink" process while doing this.
7) run lsusb again

Expected results:
7) gadgetfs survives unplugging and replugging usb cable and lsusb again shows our device

Actual results:
7) kernel on freerunner crashes with

INFO: task usb:20196 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
usb           D c03a6cbc     0 20196  20111
[<c03a6900>] (schedule+0x0/0x430) from [<bf01c504>] (ep_io+0x1fc/0x24c [gadgetfs])
[<bf01c308>] (ep_io+0x0/0x24c [gadgetfs]) from [<bf01cf1c>] (ep_write+0x104/0x120 [gadgetfs])
[<bf01ce18>] (ep_write+0x0/0x120 [gadgetfs]) from [<c00b21d4>] (vfs_write+0xb8/0x148)
 r8:c0034048 r7:00001c00 r6:c6e8df78 r5:40972200 r4:c6c590c0
[<c00b211c>] (vfs_write+0x0/0x148) from [<c00b232c>] (sys_write+0x4c/0x7c)
 r7:00000004 r6:c6c590c0 r5:00000000 r4:00000000
[<c00b22e0>] (sys_write+0x0/0x7c) from [<c0033ea0>] (ret_fast_syscall+0x0/0x2c)
 r6:4004b000 r5:40974490 r4:409744d8
no locks held by usb/20196.
Kernel panic - not syncing: softlockup: blocked tasks
fbcon_event_notify action=12, data=c782de04
jbt6k74 spi2.0: **** jbt6k74 unblank
BUG: scheduling while atomic: watchdog/0/4/0x00000003
6 locks held by watchdog/0/4:
 #0:  (tasklist_lock){....}, at: [<c00828c4>] check_hung_uninterruptible_tasks+0x4c/0x17c
 #1:  ((fb_notifier_list).rwsem){....}, at: [<c006f814>] __blocking_notifier_call_chain+0x38/0x6c
 #2:  (&new_bd->ops_lock){....}, at: [<c01c5f30>] fb_notifier_callback+0x38/0xb8
 #3:  (&new_bd->update_lock){....}, at: [<c01c5f78>] fb_notifier_callback+0x80/0xb8
 #4:  (&pcf->lock){....}, at: [<c01f7c9c>] pcf50633_reg_read+0x28/0x4c
 #5:  (&adap->bus_lock){....}, at: [<c027795c>] i2c_transfer+0x68/0x12c
Modules linked in: gadgetfs s3c2410_wdt [last unloaded: g_ether]

(please see attached kernel.log for complete output)

More info:
1) I am using andy-tracking b8b36e5ec3db71d5 on freerunner and regular linux 2.6.27.6 on a 32-bit x86 system.

Attachments

kernel.log (16.7 KB) - added by lindi 4 years ago.
testcase1.tar.gz (14.6 KB) - added by lindi 4 years ago.
this replaces the original testcase1, now it should have freerunner specific modifications
kernel2.log (4.6 KB) - added by lindi 4 years ago.
log of what happens if wait_event_interruptible is used in ep_io and ctrl-c is hit

Change History

Changed 4 years ago by lindi

comment:1 Changed 4 years ago by balajirrao

  • Status changed from new to in_testing

Looks like a problem I've seen before. Can you please try if this works ?

http://kerneltrap.org/index.php?q=mailarchive/linux-usb/2008/12/24/4485954

I wonder how you got ./usb working, with no modification to usb.c to add support for s3c2410_udc usb controller. This is what I had to do atleast to get the example gadgetfs program running. It needs to find the right endpoint files in the gadgetfs mountpoint.

comment:2 Changed 4 years ago by lindi

I did make modifications, search for "s3c2410_udc" in usb.c.

comment:3 Changed 4 years ago by balajirrao

Hum, I'm not able to see it - it seems that you've attached the one without modifications.

And, what about the problem ? Does it still show up with the patch applied ?

Changed 4 years ago by lindi

this replaces the original testcase1, now it should have freerunner specific modifications

comment:4 Changed 4 years ago by lindi

On my desktop computer all bulk reads in step 5.3 seem to fail with "-2". Therefore I need to test this patch on my laptop where sink-source-reader works.

comment:5 Changed 4 years ago by lindi

This time "lsusb -vvv" on laptop just took a really long time and eventually showed only the hub. Hitting ctrl-c on usb.c does not seem to kill it now:

lindi@ginger:~/scm/usb-root/sink$ sudo ./usb -r 1.3 -p 1
/dev/gadget/s3c2410_udc ep0 configured
serial="1.3"
^Creset source fd: Interrupted system call
^Creset sink fd: Interrupted system call
ep0 stall: Identifier removed
^Cep0 read after poll: Interrupted system call

Shouldn't there be a kernel space patch to fix the issue? Bad userland software should not be able to crash the kernel.

comment:6 Changed 4 years ago by lindi

After a while I got another crash with same backtrace:

4007.055000] gta02_udc_command S3C2410_UDC_P_DISABLE
4167.195000] INFO: task usb:3713 blocked for more than 120 seconds.
4167.200000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
4167.205000] usb           D c03a6cbc     0  3713      1
4167.215000] [<c03a6900>] (schedule+0x0/0x430) from [<bf01c504>] (ep_io+0x1fc/0x24c [gadgetfs])
4167.220000] [<bf01c308>] (ep_io+0x0/0x24c [gadgetfs]) from [<bf01cf1c>] (ep_write+0x104/0x120 [gadgetfs])
4167.230000] [<bf01ce18>] (ep_write+0x0/0x120 [gadgetfs]) from [<c00b21d4>] (vfs_write+0xb8/0x148)
4167.245000]  r8:c0034048 r7:00001c00 r6:c6303f78 r5:40972200 r4:c4e416e0
4167.250000] [<c00b211c>] (vfs_write+0x0/0x148) from [<c00b232c>] (sys_write+0x4c/0x7c)
4167.255000]  r7:00000004 r6:c4e416e0 r5:00000000 r4:00000000
4167.260000] [<c00b22e0>] (sys_write+0x0/0x7c) from [<c0033ea0>] (ret_fast_syscall+0x0/0x2c)
4167.265000]  r6:4004b000 r5:40974490 r4:409744d8
4167.270000] no locks held by usb/3713.
4167.275000] Kernel panic - not syncing: softlockup: blocked tasks
4167.280000] Rebooting in 10 seconds..<3>RCU detected CPU 0 stall (t=834039/600 jiffies)
4170.195000] [<c03a6390>] (dump_stack+0x0/0x14) from [<c0085bd8>] (check_cpu_stall+0x80/0x100)
4170.195000] [<c0085b58>] (check_cpu_stall+0x0/0x100) from [<c0085c70>] (__rcu_pending+0x18/0xa8)
4170.195000]  r6:c782a080 r5:c04cb700 r4:c09a0924
4170.195000] [<c0085c58>] (__rcu_pending+0x0/0xa8) from [<c0085d18>] (rcu_pending+0x18/0x4c)
4170.195000]  r5:00000000 r4:00000000
4170.195000] [<c0085d00>] (rcu_pending+0x0/0x4c) from [<c005f944>] (update_process_times+0x40/0x64)
4170.195000] [<c005f904>] (update_process_times+0x0/0x64) from [<c0037e38>] (timer_tick+0x100/0x128)
4170.195000]  r6:00000000 r5:00000000 r4:c04c8d48
4170.195000] [<c0037d38>] (timer_tick+0x0/0x128) from [<c0044d90>] (s3c2410_timer_interrupt+0x10/0x18)
4170.195000]  r4:c04c6524
4170.195000] [<c0044d80>] (s3c2410_timer_interrupt+0x0/0x18) from [<c00830c8>] (handle_IRQ_event+0x2c/0x68)
4170.195000] [<c008309c>] (handle_IRQ_event+0x0/0x68) from [<c00846d8>] (handle_edge_irq+0x12c/0x178)
4170.195000]  r7:c04c9eac r6:c04c6524 r5:0000001e r4:c04c9e78
4170.195000] [<c00845ac>] (handle_edge_irq+0x0/0x178) from [<c0033064>] (__exception_text_start+0x64/0x84)
4170.195000]  r8:00000002 r7:00000003 r6:00004000 r5:00000000 r4:0000001e
4170.195000] [<c0033000>] (__exception_text_start+0x0/0x84) from [<c0033a38>] (__irq_svc+0x38/0xc8)
4170.195000] Exception stack(0xc782df10 to 0xc782df58)
4170.195000] df00:                                     00017a8f 00000000 000001e5 c04c9378 
4170.195000] df20: 00000000 00000ab5 c04f5040 000003e8 c04f5044 c04c936c c04c9378 c782df64 
4170.195000] df40: c782df58 c782df58 c03a6460 c01a4b0c 20000013 ffffffff                   
4170.195000]  r5:f4000000 r4:ffffffff
4170.195000] [<c03a63a4>] (panic+0x0/0x134) from [<c0082994>] (check_hung_uninterruptible_tasks+0x11c/0x17c)
4170.195000]  r3:00000001 r2:c09a08fc r1:00000001 r0:c04562ec
4170.195000] [<c0082878>] (check_hung_uninterruptible_tasks+0x0/0x17c) from [<c0082a9c>] (watchdog+0xa8/0xe4)
4170.195000] [<c00829f4>] (watchdog+0x0/0xe4) from [<c006ae68>] (kthread+0x60/0x94)
4170.195000] [<c006ae08>] (kthread+0x0/0x94) from [<c0058d80>] (do_exit+0x0/0x2a4)
4170.195000]  r6:00000000 r5:00000000 r4:00000000

comment:7 Changed 4 years ago by balajirrao

The thing is, usb.c goes to an uninterruptible sleep and there's no way for it to get out of it. That is what the log says - 'INFO: task usb:3713 blocked for more than 120 seconds.'

To fix this non-killability with c is what I sent the patch for! You're right, it looks like there are races somewhere in the code s3c2410-udc/gadgetfs code that prevents the task from getting out of sleep. It might also be preferable to use killable sleep here. I'll try to reproduce the problem and see what can be done.

comment:8 Changed 4 years ago by lindi

$ arm-linux-gnueabi-addr2line -e ./drivers/usb/gadget/gadgetfs.o 0x1504
/home/lindi/l/neolinux/drivers/usb/gadget/inode.c:359

shows the wait_event call indeed. I changed that to wait_event_interruptible but now I get a spinlock lockup (details in kernel2.log).

Changed 4 years ago by lindi

log of what happens if wait_event_interruptible is used in ep_io and ctrl-c is hit

comment:9 Changed 4 years ago by lindi

My guess here is: when epio_complete is called it tries to use req->context which points to completion struct that was allocated on stack in ep_io() and that has already returned? Any idea why the req is still in the queue even though ep_io calls usb_ep_dequeue?

comment:10 Changed 4 years ago by frankmpunkt

I have a working example for an USB Gadget using GadgetFS on Openmoko. Unfortunately it's not yet ready to be published. Did you check out the patch from here https://docs.openmoko.org/trac/ticket/2206 ? It's the only needed modification to the kernel source. I have modified the configuration in usb.c for s3c2410_udc the following:

        /* Samsung S3C24xx series, full speed only */                            
        if (stat (DEVNAME = "s3c2410_udc", &statb) == 0) {
                HIGHSPEED = 0;
                device_desc.bcdDevice = __constant_cpu_to_le16 (0x0104),

                fs_source_desc.bEndpointAddress = USB_DIR_IN | 2;
                EP_IN_NAME = "ep2-bulk";
                fs_sink_desc.bEndpointAddress = USB_DIR_OUT | 1;
                EP_OUT_NAME = "ep1-bulk";

                source_sink_intf.bNumEndpoints = 3;
                fs_status_desc.bEndpointAddress = USB_DIR_IN | 3;
                EP_STATUS_NAME = "ep3-bulk";
        }

This works, although i am not quite sure if device_desc.bcdDevice is correctly set. Hope this helps.

comment:11 Changed 4 years ago by frankmpunkt

You can find a working example of GadgetFS here http://sourceforge.net/projects/vsmartcard/. The Program ccid implements a smartcard reader and works fine on OpenMoko?? (OM1/OM2). Unplugging and replugging the usb cable works as expected. By the way, the patch suggested above is still required.

Note: See TracTickets for help on using tickets.