[Intel-wired-lan] BUG: KCSAN: data-race in e1000_clean_rx_irq+0x330/0x870
Jesse Brandeburg
jesse.brandeburg at intel.com
Wed Feb 9 00:33:40 UTC 2022
On 2/7/2022 8:08 AM, Paul Menzel wrote:
> Dear Linux folks,
>
>
> Running Linux 5.17-rc2+ with KCSAN in QEMU, it reports the race below:
>
> ```
> [ 0.000000] Linux version 5.17.0-rc2-00353-g90c9e950c0de
> (pmenzel at invidia.molgen.mpg.de) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils)
> 2.37) #34 SMP PREEMPT Sun Feb 6 13:11:13 CET 2022
> [ 0.000000] Command line: root=/dev/vda1 rw quiet
> […]
> [ 410.295890]
> ==================================================================
> [ 410.297475] BUG: KCSAN: data-race in e1000_clean_rx_irq+0x330/0x870
>
> [ 410.299722] race at unknown origin, with read to 0xffff8a554584d3ec
> of 1 bytes by interrupt on cpu 0:
> [ 410.301524] e1000_clean_rx_irq+0x330/0x870
> [ 410.301534] e1000_clean+0x4a5/0xc40
> [ 410.301541] __napi_poll+0x5c/0x280
> [ 410.301550] net_rx_action+0x4ff/0x5b0
> [ 410.301559] __do_softirq+0xe4/0x2d9
> [ 410.301567] run_ksoftirqd+0x21/0x30
> [ 410.301577] smpboot_thread_fn+0x26b/0x360
> [ 410.301595] kthread+0x16d/0x1a0
> [ 410.301604] ret_from_fork+0x22/0x30
>
> [ 410.302478] value changed: 0x00 -> 0x07
>
> [ 410.304564] Reported by Kernel Concurrency Sanitizer on:
> [ 410.305757] CPU: 0 PID: 12 Comm: ksoftirqd/0 Not tainted
> 5.17.0-rc2-00353-g90c9e950c0de #34
> [ 410.305776] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.15.0-1 04/01/2014
> [ 410.305788]
> ==================================================================
> ```
>
> Please find the output of `dmesg` attached.
>
>
> Kind regards,
>
> Paul

Thanks for the bug report. I don't even have any e1000 hardware these
days to test on, so I had to set up a virtual machine.

This is probably because we access rx_desc->status in a while loop and
then access it again after dma_rmb(), and it has changed in between.
This is kind of expected to happen, but the clean_rx routine can be
updated to be more like our newer drivers, which should hopefully avoid
the data dependency.
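
To illustrate the idea (this is only a rough userspace sketch, not the
actual patch and not the real driver code): take a single snapshot of the
status byte per descriptor, check the done bit on that snapshot, and only
then read the rest of the descriptor. The struct layout, the RXD_STAT_DD
name, READ_ONCE_U8(), rmb_standin() and clean_rx_sketch() are all
assumptions made for illustration; in the kernel the corresponding
primitives would be READ_ONCE() and dma_rmb().

```c
/* Userspace sketch only: models the status-snapshot pattern, not e1000. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative "descriptor done" bit; name and value are assumptions. */
#define RXD_STAT_DD 0x01u

/* Hypothetical, simplified RX descriptor; the real layout differs. */
struct rx_desc {
	uint16_t length;
	uint8_t status;
};

/* Stand-in for the kernel's READ_ONCE(): force exactly one load. */
#define READ_ONCE_U8(x) (*(const volatile uint8_t *)&(x))

/* Stand-in for dma_rmb(): keep later descriptor loads after the status load. */
static inline void rmb_standin(void)
{
	atomic_thread_fence(memory_order_acquire);
}

/*
 * Walk the ring from 'next' and process descriptors whose done bit is set.
 * The status byte is loaded once per descriptor and only that snapshot is
 * tested, so a concurrent writer cannot change the value between the
 * check and its later use.
 * Returns the number of descriptors cleaned; *bytes gets the total length.
 */
static size_t clean_rx_sketch(struct rx_desc *ring, size_t count,
			      size_t next, size_t *bytes)
{
	size_t cleaned = 0;

	assert(ring != NULL && bytes != NULL && count > 0);
	*bytes = 0;

	while (cleaned < count) {
		struct rx_desc *desc = &ring[(next + cleaned) % count];
		uint8_t status = READ_ONCE_U8(desc->status); /* single snapshot */

		if (!(status & RXD_STAT_DD))
			break; /* not written back yet, stop here */

		rmb_standin(); /* read the other fields only after the status */
		*bytes += desc->length;

		desc->status = 0; /* hand the descriptor back for reuse */
		cleaned++;
	}

	return cleaned;
}
```

Calling clean_rx_sketch(ring, count, 0, &bytes) on a ring where only the
first few descriptors have the done bit set processes those and stops at
the first one that is not done. Whether restructuring the real loop this
way is enough to keep KCSAN quiet would still need to be tested.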

I have a patch to try that out; I'll see if I can get it to run in my
VM. If it gets too messy, I may just send the patch to you and this list
to see if others can give it a go and tell me if I broke something.

The code is a bit messy on purpose, but it has shown itself to be
resilient on most platforms we've tried it on all these years. However,
I'd rather we not be discussing this issue for years going forward, so
I'll spend a little time on it.

Jesse