[Intel-wired-lan] [PATCH iwl-net v2 2/6] ice: protect XDP configuration with a mutex
Rout, ChandanX
chandanx.rout at intel.com
Thu Aug 8 02:16:35 UTC 2024
>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces at osuosl.org> On Behalf Of
>Zaremba, Larysa
>Sent: Wednesday, July 24, 2024 10:19 PM
>To: intel-wired-lan at lists.osuosl.org
>Cc: Drewek, Wojciech <wojciech.drewek at intel.com>; Fijalkowski, Maciej
><maciej.fijalkowski at intel.com>; Jesper Dangaard Brouer <hawk at kernel.org>;
>Daniel Borkmann <daniel at iogearbox.net>; Zaremba, Larysa
><larysa.zaremba at intel.com>; netdev at vger.kernel.org; John Fastabend
><john.fastabend at gmail.com>; Alexei Starovoitov <ast at kernel.org>; linux-
>kernel at vger.kernel.org; Eric Dumazet <edumazet at google.com>; Kubiak,
>Michal <michal.kubiak at intel.com>; Nguyen, Anthony L
><anthony.l.nguyen at intel.com>; Nambiar, Amritha
><amritha.nambiar at intel.com>; Keller, Jacob E <jacob.e.keller at intel.com>;
>Jakub Kicinski <kuba at kernel.org>; bpf at vger.kernel.org; Paolo Abeni
><pabeni at redhat.com>; David S. Miller <davem at davemloft.net>; Karlsson,
>Magnus <magnus.karlsson at intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-net v2 2/6] ice: protect XDP configuration
>with a mutex
>
>The main threat to data consistency in ice_xdp() is a possible asynchronous PF
>reset. It can be triggered by a user or by TX timeout handler.
>
>XDP setup and PF reset code access the same resources in the following
>sections:
>* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
>* ice_vsi_rebuild() for the PF VSI - not protected
>* ice_vsi_open() - already rtnl-locked
>
>With an unfortunate timing, such accesses can result in a crash such as the one
>below:
>
>[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model
>MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0:
>Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
>[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38:
>transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0
>ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU:
>0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout
>recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset
>successful [ +0.006184] BUG: kernel NULL pointer dereference, address:
>0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [
>+0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0
>P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU:
>38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031]
>Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS
>SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036]
>Workqueue: ice ice_service_task [ice] [ +0.000183] RIP:
>0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace:
>[ +0.000016] <TASK>
>[ +0.000014] ? __die+0x1f/0x70
>[ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ?
>schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022]
>? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0
>[ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186]
>ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151]
>ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
>[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145]
>ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [
>+0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180
>[ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824]
>process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [
>+0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ?
>_raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ?
>__pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [
>+0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616]
>ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [
>+0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK>
>
>The previous way of handling this through returning -EBUSY is not viable,
>particularly when destroying AF_XDP socket, because the kernel proceeds with
>removal anyway.
>
>There is plenty of code between those calls and there is no need to create a
>large critical section that covers all of them, same as there is no need to protect
>ice_vsi_rebuild() with rtnl_lock().
>
>Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
>
>Leaving unprotected sections in between would result in two states that have
>to be considered:
>1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but
>not yet open
>
>The latter case is actually already handled through !netif_running() case, we
>just need to adjust flag checking a little. The former one is not as trivial, because
>between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction
>happens, this can make adding/deleting rings exit with an error. Luckily, VSI
>rebuild is pending and can apply new configuration for us in a managed fashion.
>
>Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
>indicate that ice_xdp() can just hot-swap the program.
>
>Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
>Fixes: efc2214b6047 ("ice: Add support for XDP")
>Reviewed-by: Wojciech Drewek <wojciech.drewek at intel.com>
>Signed-off-by: Larysa Zaremba <larysa.zaremba at intel.com>
>---
> drivers/net/ethernet/intel/ice/ice.h | 2 ++
> drivers/net/ethernet/intel/ice/ice_lib.c | 26 +++++++++++++++--------
>drivers/net/ethernet/intel/ice/ice_main.c | 19 ++++++++++++-----
>drivers/net/ethernet/intel/ice/ice_xsk.c | 3 ++-
> 4 files changed, 35 insertions(+), 15 deletions(-)
>
Tested-by: Chandan Kumar Rout <chandanx.rout at intel.com> (A Contingent Worker at Intel)
More information about the Intel-wired-lan
mailing list