[Intel-wired-lan] [PATCH] igb: Fix igb_down hung on surprise removal

Stefan Schaeckeler schaecsn at gmx.net
Thu Jun 6 15:46:58 UTC 2024


Hello Ying,

On 6/6/24 01:03, Ying Hsu wrote:
> On the CalDigit Thunderbolt Station 3 Plus, we've encountered an issue
> when the USB downstream display connection state changes. The
> problematic sequence observed is:
> ```
> igb_io_error_detected
> igb_down
> igb_io_error_detected
> igb_down
> ```
>
> The second igb_down call blocks at napi_synchronize.

From the backtrace in your commit message, I get the impression that you receive a hotplug event for the removal of the ethernet device. I also get the impression that you receive an AER, which is handled in igb_io_error_detected()/igb_io_resume(). IMHO the problem lies in the interaction of the two.
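
For reference, here is a minimal userspace sketch of why a second igb_down() would block at napi_synchronize(). It assumes, as far as I understand the NAPI code, that napi_disable() leaves the SCHED state bit set until napi_enable() runs, and that napi_synchronize() waits for that bit to clear. All model_* names are hypothetical stand-ins, not kernel APIs, and the "would block" return value replaces the real sleep loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_NAPI_STATE_SCHED (1UL << 0)

struct model_napi {
	unsigned long state;
};

/*
 * Stand-in for napi_synchronize(): the real function sleeps until the
 * SCHED bit is cleared. The model reports whether the caller would block.
 */
static bool model_napi_synchronize_would_block(const struct model_napi *n)
{
	return (n->state & MODEL_NAPI_STATE_SCHED) != 0;
}

/* Stand-in for napi_disable(): takes SCHED and leaves it set (assumption). */
static void model_napi_disable(struct model_napi *n)
{
	n->state |= MODEL_NAPI_STATE_SCHED;
}

/* Stand-in for napi_enable(): releases SCHED again. */
static void model_napi_enable(struct model_napi *n)
{
	n->state &= ~MODEL_NAPI_STATE_SCHED;
}

/*
 * Stand-in for the NAPI part of igb_down(): synchronize, then disable.
 * Returns false where the real code would hang in napi_synchronize().
 */
static bool model_igb_down(struct model_napi *n)
{
	if (model_napi_synchronize_would_block(n))
		return false;
	model_napi_disable(n);
	return true;
}
```

In this model, the first model_igb_down() on a zeroed struct model_napi completes, a second call without an intervening model_napi_enable() reports that it would block, and a call after model_napi_enable() completes again. That matches the sequence quoted above, where no igb_up happens between the two igb_down calls.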


> Simply avoiding redundant igb_down calls makes the Ethernet of the thunderbolt dock unusable.

I'm not sure the current code is correct even in your use case. What happens when you get an AER on the ethernet device without unplugging it at the same time?

Can you try to inject a completion timeout into your ethernet device via AER injection, similar to how I showed in my previous message? Just replace the BDF 09:00.0 with the BDF of your ethernet device. I expect a kernel crash like the one we see on our embedded system.


> If Intel can identify when an Ethernet device is within a Thunderbolt
> tunnel, the patch can be more specific.

Stefan

