[Intel-wired-lan] intermittent ixgbe transmit queue timeouts in v5.18 kernels

Switzer, David david.switzer at intel.com
Tue Jun 7 21:22:38 UTC 2022


>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces at osuosl.org> On Behalf Of
>Jeff Layton
>Sent: Thursday, June 2, 2022 2:38 PM
>To: intel-wired-lan at lists.osuosl.org; Nguyen, Anthony L
><anthony.l.nguyen at intel.com>; Brandeburg, Jesse
><jesse.brandeburg at intel.com>
>Cc: Ilya Dryomov <idryomov at gmail.com>; Xiubo Li <xiubli at redhat.com>;
>Venky Shankar <vshankar at redhat.com>
>Subject: [Intel-wired-lan] intermittent ixgbe transmit queue timeouts in v5.18
>kernels
>
>The Ceph project test lab has a fairly large cluster of machines with ixgbe
>adapters:
>
>    03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>
We are attempting to reproduce your issue. The output of lspci -s 03:00.0 -vv
would help us confirm that we are looking at the exact adapter the issue is
being seen on.

>Recently, we've started getting intermittent tx queue timeouts with these
>machines. One of them is reported here:
>
>    https://tracker.ceph.com/issues/55823
>
>Usually this happens when we're trying to do a sync, and there is a flurry of
>transmission activity. Afterward we see a lot of fallout in ceph culminating in
>softlockups.
>
>The kernels we're testing have some patches that are not yet in mainline, but
>mostly they are confined to net/ceph and fs/ceph, and shouldn't really affect
>hw drivers.
>
>The problem manifested pretty regularly during v5.18 and then I didn't see it
>for a while. I had figured it was something that had been fixed, but I think it
>was just "luck".
>
>I attempted a bisect a while back, and ruled out recent ceph changes as the
>issue. Unfortunately, I wasn't able to get to a conclusive patch that broke it,
>but I think it likely crept in during the initial merge window for v5.18 (pre-rc1).
>
>One other oddity: the test lab often installs bleeding-edge kernels on old
>distros (RHEL8 and Ubuntu from similar era). Is it possible that the firmware
>that ships with these older distros is not suitable for the more recent driver in
>v5.18?
>
Thank you for this information. We'll look into the firmware angle if we have
trouble reproducing the issue.


>Any thoughts or suggestions on things we can do to fix this?
>
Nothing yet, but we'll be sure to let you know when we find something.

Have a great day!
Dave Switzer <david.switzer at intel.com>

>Thanks,
>--
>Jeff Layton <jlayton at kernel.org>
>_______________________________________________
>Intel-wired-lan mailing list
>Intel-wired-lan at osuosl.org
>https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

