[Intel-wired-lan] [PATCH iwl-net v5 0/4] iavf: fix reset task deadlock

Ahmed Zaki ahmed.zaki at intel.com
Mon May 8 13:17:45 UTC 2023


On 2023-05-05 03:37, Kamil Maziarz wrote:
> Change the way resets are handled: callbacks operating under the RTNL lock will wait for the reset to
> finish, and the rtnl_lock-sensitive functions in the reset flow will schedule the netdev update for later.
> This eliminates the circular dependency with the critical lock.
>
> Marcin Szycik (4):
>    iavf: Wait for reset in callbacks which trigger it
>    iavf: Don't lock rtnl_lock twice in reset
>    Revert "iavf: Detach device during reset task"
>    Revert "iavf: Do not restart Tx queues after reset task failure"
>
>   drivers/net/ethernet/intel/iavf/iavf.h        |   3 +
>   .../net/ethernet/intel/iavf/iavf_ethtool.c    |  31 +++--
>   drivers/net/ethernet/intel/iavf/iavf_main.c   | 112 +++++++++++++-----
>   .../net/ethernet/intel/iavf/iavf_virtchnl.c   |   1 +
>   4 files changed, 100 insertions(+), 47 deletions(-)
>
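A minimal userspace model of the locking scheme described above. It assumes a plain pthread mutex as a stand-in for rtnl_lock and a hypothetical netdev_update_pending flag in place of whatever mechanism the series actually uses; all model_* names are illustrative only. The callback holds the lock and runs the reset inline (standing in for "wait for the reset to finish"). The reset only records the new queue count, and a separate deferred step takes the lock and applies it. The trylock in model_callback() shows that a reset which tried to take the lock itself could never acquire it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for rtnl_lock: a plain, non-recursive mutex. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct model_adapter {
    unsigned int hw_queues;      /* count configured by the (modelled) reset */
    unsigned int netdev_queues;  /* netdev-visible count, changed only under model_rtnl */
    bool netdev_update_pending;  /* hypothetical "update the netdev later" flag */
};

/* Reset work: may run while a callback already holds model_rtnl and is
 * waiting for it, so it only records the change instead of locking. */
static void model_reset_task(struct model_adapter *a, unsigned int queues)
{
    a->hw_queues = queues;
    a->netdev_update_pending = true;
}

/* Deferred work: runs after the reset is done and takes the lock itself. */
static void model_netdev_update(struct model_adapter *a)
{
    pthread_mutex_lock(&model_rtnl);
    if (a->netdev_update_pending) {
        a->netdev_queues = a->hw_queues;
        a->netdev_update_pending = false;
    }
    pthread_mutex_unlock(&model_rtnl);
}

/* Callback that holds the lock and waits for the reset (run inline here).
 * Returns true if the reset could have acquired the lock at that point. */
static bool model_callback(struct model_adapter *a, unsigned int queues)
{
    bool reset_could_lock;

    pthread_mutex_lock(&model_rtnl);
    model_reset_task(a, queues);
    /* A blocking lock here would never return; trylock reports EBUSY. */
    reset_could_lock = pthread_mutex_trylock(&model_rtnl) == 0;
    if (reset_could_lock)
        pthread_mutex_unlock(&model_rtnl);
    pthread_mutex_unlock(&model_rtnl);
    return reset_could_lock;
}
```

For example, calling model_callback(&a, 8) on an adapter with 4 queues leaves netdev_queues at 4 with the update pending; a later model_netdev_update(&a) applies the 8.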

With this series applied, the following errors occur when testing with the
script (repro.sh) from:

https://lore.kernel.org/netdev/20230503031541.27855-1-dinghui@sangfor.com.cn/


[325739.871905] ------------[ cut here ]------------
[325739.871911] New queues can't be registered after device unregistration.
[325739.871960] WARNING: CPU: 62 PID: 36764 at net/core/net-sysfs.c:1714 netdev_queue_update_kobjects+0x15d/0x170
[325739.871981] Modules linked in: iavf(OE) tls 8021q garp mrp stp llc vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd qrtr rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm raid1 irqbypass rapl intel_cstate ipmi_ssif iTCO_wdt intel_pmc_bxt iTCO_vendor_support intel_uncore ib_uverbs mei_me acpi_ipmi ses i2c_i801 ib_core enclosure ioatdma ipmi_si mei joydev intel_pch_thermal i2c_smbus lpc_ich dca ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad fuse zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni ice mpt3sas polyval_generic nvme ghash_clmulni_intel raid_class sha512_ssse3 nvme_core scsi_transport_sas nvme_common ast i40e(OE) wmi
[325739.872143] Unloaded tainted modules: iavf(OE):1 [last unloaded: iavf(OE)]
[325739.872155] CPU: 62 PID: 36764 Comm: kworker/62:0 Tainted: G           OE      6.2.8-100.fc36.x86_64 #1
[325739.872162] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[325739.872167] Workqueue: events iavf_delayed_set_interrupt_capability [iavf]
[325739.872203] RIP: 0010:netdev_queue_update_kobjects+0x15d/0x170
[325739.872214] Code: 89 74 1f 00 8d 45 01 39 44 24 04 74 07 89 c5 e9 0c ff ff ff 44 8b 74 24 04 e9 70 ff ff ff 48 c7 c7 e0 59 97 88 e8 c3 14 3d ff <0f> 0b e9 e1 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90
[325739.872219] RSP: 0018:ffffaaba65a63e10 EFLAGS: 00010286
[325739.872225] RAX: 0000000000000000 RBX: ffff88dbbdfec000 RCX: 0000000000000000
[325739.872230] RDX: 0000000000000002 RSI: ffffffff888c1386 RDI: 00000000ffffffff
[325739.872235] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffaaba65a63ca0
[325739.872238] R10: 0000000000000003 R11: ffff893abff214a8 R12: ffff8939ffdb7900
[325739.872242] R13: ffff88dbbdfec000 R14: 0000000000000000 R15: ffff88dbbdfeca10
[325739.872246] FS:  0000000000000000(0000) GS:ffff8939ffd80000(0000) knlGS:0000000000000000
[325739.872251] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[325739.872257] CR2: 000055b6e64100c8 CR3: 00000021d9010003 CR4: 00000000007706e0
[325739.872262] PKRU: 55555554
[325739.872265] Call Trace:
[325739.872269]  <TASK>
[325739.872276]  netif_set_real_num_tx_queues+0x6d/0x1f0
[325739.872289]  iavf_delayed_set_interrupt_capability+0x31/0x40 [iavf]
[325739.872319]  process_one_work+0x1c5/0x3c0
[325739.872331]  worker_thread+0x4d/0x380
[325739.872336]  ? _raw_spin_lock_irqsave+0x23/0x50
[325739.872347]  ? __pfx_worker_thread+0x10/0x10
[325739.872352]  kthread+0xe6/0x110
[325739.872360]  ? __pfx_kthread+0x10/0x10
[325739.872369]  ret_from_fork+0x29/0x50
[325739.872387]  </TASK>
[325739.872389] ---[ end trace 0000000000000000 ]---
[325739.872397] ------------[ cut here ]------------


There are other warnings and errors, but they seem to be consequences of
the above. I think you need to add some state checks or guards in
iavf_delayed_set_interrupt_capability() to make sure that setting the real
number of queues is still valid at the time the work runs.
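The warning text appears to come from a check in netdev_queue_update_kobjects() that rejects growing the queue count while the device is in NETREG_UNREGISTERING. Below is a small userspace model of the kind of guard suggested above. The enum, struct fields, remove_in_progress flag, and model_delayed_set_queues() are all hypothetical stand-ins, not iavf code. In the real driver the check would presumably have to be made under rtnl_lock (netif_set_real_num_tx_queues() expects it for a registered device), so that the registration state cannot change between the check and the update.

```c
#include <stdbool.h>

/* Hypothetical mirror of the kernel's NETREG_* registration states. */
enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERING, MODEL_UNREGISTERED };

struct model_netdev {
    enum model_reg_state reg_state;
    unsigned int real_num_tx_queues;
};

struct model_vf {
    struct model_netdev *netdev;
    bool remove_in_progress;       /* hypothetical "driver is being removed" flag */
    unsigned int num_active_queues;
};

/* Deferred work with a guard: apply the new queue count only while the
 * netdev is still registered and no removal is in progress.
 * Returns true if the count was applied, false if the update was skipped. */
static bool model_delayed_set_queues(struct model_vf *vf)
{
    if (vf->remove_in_progress || vf->netdev->reg_state != MODEL_REGISTERED)
        return false;
    vf->netdev->real_num_tx_queues = vf->num_active_queues;
    return true;
}
```

With the netdev registered, the work applies the new count; once the state moves to MODEL_UNREGISTERING, a later run leaves real_num_tx_queues untouched instead of trying to grow it.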


Thanks,

Ahmed


