tack: <0>c2a17580 c18d41a0 c39bbd16 00000038 c18d41e0 00000000 d147c640 c29e0b68 c29e0b90 00000212 c29e0b68 c39932b2 c29e0bb0 c10736a0 c0119ef0 c399326c c10736a8 c10736a0 c10736b0 c0119f93 c011a06e 00000001 00000000 00000000 Call Trace: [] handle_minor_send+0x1af/0x241 [capi] [] recv_handler+0x46/0x5f [kernelcapi] [] run_workqueue+0x5e/0x8d [] recv_handler+0x0/0x5f [kernelcapi] [] worker_thread+0x0/0x10b [] worker_thread+0xdb/0x10b [] default_wake_function+0x0/0xc [] kthread+0x90/0xbc [] kthread+0x0/0xbc [] kernel_thread_helper+0x5/0xb Code: 7e 02 89 ee 89 f0 5a f7 d0 c1 f8 1f 5b 21 f0 5e 5f 5d c3 56 53 8b 48 50 89 d6 89 c3 8b 11 eb 2f 66 39 71 08 75 25 8b 41 04 8b 11 <89> 10 89 42 04 c7 01 00 01 10 00 89 c8 c7 41 04 00 02 20 00 e8 The interresting part of it is the "virtual address 00200200", which is LIST_POISON2. I thought about some race condition, but as this is an UP system, it leads to questions on how it can happen. If we look at EFLAGS: 00010202, we see that interrupts are enabled at the time of the crash (eflags & 0x200). Finally, I don't understand all the capi code, but I think that handle_minor_send() is racing somehow against capi_recv_message(), which call both capiminor_del_ack(). So if an IRQ occurs in the middle of capiminor_del_ack() and another instance of it is invoked, it leads to linked list corruption. I came up with the following patch. With this, I could not reproduce the crash anymore. Clearly, this is not the correct fix for the issue. As this seems to be some locking issue, there might be more locking issues in that code. For example, doesn't the whole struct capiminor have to be locked somehow? Cc: Carsten Paeth Cc: Kai Germaschewski Cc: Karsten Keil Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds <Ą