[Intel-wired-lan] [PATCH net v2] ice: avoid bonding causing auxiliary plug/unplug under RTNL lock
Petr Oros
poros at redhat.com
Wed Mar 1 13:12:54 UTC 2023
On Fri, 2023-02-24 at 18:33 +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Linus Heckemann <git at sphalerite.org>
> > Sent: Thursday, February 16, 2023 9:24 AM
> > To: Ertman, David M <david.m.ertman at intel.com>;
> >     intel-wired-lan at lists.osuosl.org
> > Cc: Jaroslav Pulchart <jaroslav.pulchart at gooddata.com>
> > Subject: Re: [Intel-wired-lan] [PATCH net v2] ice: avoid bonding causing
> >     auxiliary plug/unplug under RTNL lock
> >
> > Dave Ertman <david.m.ertman at intel.com> writes:
> > > RDMA is not supported in ice on a PF that has been added to a bonded
> > > interface. To enforce this, when an interface enters a bond, we unplug
> > > the auxiliary device that supports RDMA functionality. This unplug
> > > currently happens in the context of handling the netdev bonding event.
> > > This event is sent to the ice driver under RTNL context. This is causing
> > > a deadlock where the RDMA driver is waiting for the RTNL lock to complete
> > > the removal.
> > >
> > > Defer the unplugging/re-plugging of the auxiliary device to the service
> > > task so that it is not performed under the RTNL lock context.
> > >
> > > Reported-by: Jaroslav Pulchart <jaroslav.pulchart at gooddata.com>
> > > Link: https://lore.kernel.org/linux-rdma/68b14b11-d0c7-65c9-4eeb-0487c95e395d@leemhuis.info/
> > > Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
> > > Fixes: 425c9bd06b7a ("RDMA/irdma: Report the correct link speed")
> > > Signed-off-by: Dave Ertman <david.m.ertman at intel.com>
> > > ---
> > > Changes since v1:
> > > Reversed order of bit processing in ice_service_task for PLUG/UNPLUG
> >
> > Hi Dave,
> >
> > Thanks for your continued work on this! We've tested this on a system
> > affected by the original issue (with 8086:1593 cards) and, unlike v1 of
> > the patch, it appears not to resolve it:
>
> Hi Linus,
>
> This error confuses me. The only difference between v1 and v2 of this patch
> is the order in which we process state bits in the service task thread. They
> are still being processed outside of RTNL context.
>
> Can you provide the steps you used to reproduce this issue?
Hi all,
I have tested this fix and can confirm that the issue is resolved
with v2.
With patch (v1 or v2)
$ modprobe -v bonding mode=1 miimon=100 max_bonds=1
insmod /lib/modules/6.2.0+/kernel/net/tls/tls.ko
insmod /lib/modules/6.2.0+/kernel/drivers/net/bonding/bonding.ko max_bonds=0 mode=1 miimon=100 max_bonds=1
$ ip link set up bond0
$ ifenslave bond0 enp65s0f0np0 enp65s0f1np1
$
Without patch
$ modprobe -v bonding mode=1 miimon=100 max_bonds=1
insmod /lib/modules/6.2.0+/kernel/net/tls/tls.ko
insmod /lib/modules/6.2.0+/kernel/drivers/net/bonding/bonding.ko max_bonds=0 mode=1 miimon=100 max_bonds=1
$ ip link set up bond0
$ ifenslave bond0 enp65s0f0np0 enp65s0f1np1
^^^^^^ HANG
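
For anyone following the thread, the difference between the two runs above
can be illustrated with a small userspace sketch of the deferral idea from
the patch description. This is not the ice driver code. Every name in it
(fake_rtnl, REQ_UNPLUG_AUX, on_bond_join, service_task, simulate_enslave) is
hypothetical, a pthread mutex stands in for the RTNL lock, and the real
deadlock involves a second thread (the RDMA driver) rather than a same-thread
relock. It only shows the shape of the fix: the event handler records a
request while the lock is held, and the unplug itself runs later, once the
lock has been released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* All names here are hypothetical; none are taken from the ice driver. */
#define REQ_UNPLUG_AUX (1u << 0)

static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER; /* stand-in for RTNL */
static atomic_uint pending_requests;  /* set under the lock, drained later */
static int aux_plugged = 1;           /* 1 while the auxiliary device exists */

/* Stand-in for removing the auxiliary device. It takes the lock itself,
 * so it must never be called by a thread that already holds fake_rtnl. */
static void unplug_aux(void)
{
    pthread_mutex_lock(&fake_rtnl);
    aux_plugged = 0;
    pthread_mutex_unlock(&fake_rtnl);
}

/* Bonding event handler: the caller holds fake_rtnl, so only record
 * the request instead of unplugging directly. */
static void on_bond_join(void)
{
    atomic_fetch_or(&pending_requests, REQ_UNPLUG_AUX);
}

/* Service task: runs without fake_rtnl held and does the deferred work. */
static void service_task(void)
{
    unsigned int reqs = atomic_exchange(&pending_requests, 0);

    if (reqs & REQ_UNPLUG_AUX)
        unplug_aux();
}

/* Drives one enslave event and returns the final aux_plugged value. */
int simulate_enslave(void)
{
    aux_plugged = 1;

    pthread_mutex_lock(&fake_rtnl);
    on_bond_join();                 /* under the lock: flag only */
    pthread_mutex_unlock(&fake_rtnl);

    service_task();                 /* lock released: safe to unplug */
    assert(atomic_load(&pending_requests) == 0);
    return aux_plugged;
}
```

Calling unplug_aux() directly from on_bond_join() in this sketch would try
to take fake_rtnl while it is already held, which is the simplified analogue
of the hang in the "Without patch" run.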
Regards,
Petr
>
> Thanks,
> DaveE