[Intel-wired-lan] [PATCH iwl-net v5 0/4] iavf: fix reset task deadlock
Ahmed Zaki
ahmed.zaki at intel.com
Mon May 8 13:17:45 UTC 2023
On 2023-05-05 03:37, Kamil Maziarz wrote:
> This series changes the way resets are handled: callbacks operating under the RTNL lock now wait for the reset to
> finish, and the rtnl_lock-sensitive functions in the reset flow schedule the netdev update for later.
> This eliminates the circular dependency with the critical lock.
>
> Marcin Szycik (4):
> iavf: Wait for reset in callbacks which trigger it
> iavf: Don't lock rtnl_lock twice in reset
> Revert "iavf: Detach device during reset task"
> Revert "iavf: Do not restart Tx queues after reset task failure"
>
> drivers/net/ethernet/intel/iavf/iavf.h | 3 +
> .../net/ethernet/intel/iavf/iavf_ethtool.c | 31 +++--
> drivers/net/ethernet/intel/iavf/iavf_main.c | 112 +++++++++++++-----
> .../net/ethernet/intel/iavf/iavf_virtchnl.c | 1 +
> 4 files changed, 100 insertions(+), 47 deletions(-)
>
This series is generating the following errors when tested with the
script (repro.sh) from:
https://lore.kernel.org/netdev/20230503031541.27855-1-dinghui@sangfor.com.cn/
[325739.871905] ------------[ cut here ]------------
[325739.871911] New queues can't be registered after device unregistration.
[325739.871960] WARNING: CPU: 62 PID: 36764 at net/core/net-sysfs.c:1714 netdev_queue_update_kobjects+0x15d/0x170
[325739.871981] Modules linked in: iavf(OE) tls 8021q garp mrp stp llc vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd qrtr rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm raid1 irqbypass rapl intel_cstate ipmi_ssif iTCO_wdt intel_pmc_bxt iTCO_vendor_support intel_uncore ib_uverbs mei_me acpi_ipmi ses i2c_i801 ib_core enclosure ioatdma ipmi_si mei joydev intel_pch_thermal i2c_smbus lpc_ich dca ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad fuse zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni ice mpt3sas polyval_generic nvme ghash_clmulni_intel raid_class sha512_ssse3 nvme_core scsi_transport_sas nvme_common ast i40e(OE) wmi
[325739.872143] Unloaded tainted modules: iavf(OE):1 [last unloaded: iavf(OE)]
[325739.872155] CPU: 62 PID: 36764 Comm: kworker/62:0 Tainted: G OE 6.2.8-100.fc36.x86_64 #1
[325739.872162] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[325739.872167] Workqueue: events iavf_delayed_set_interrupt_capability [iavf]
[325739.872203] RIP: 0010:netdev_queue_update_kobjects+0x15d/0x170
[325739.872214] Code: 89 74 1f 00 8d 45 01 39 44 24 04 74 07 89 c5 e9 0c ff ff ff 44 8b 74 24 04 e9 70 ff ff ff 48 c7 c7 e0 59 97 88 e8 c3 14 3d ff <0f> 0b e9 e1 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90
[325739.872219] RSP: 0018:ffffaaba65a63e10 EFLAGS: 00010286
[325739.872225] RAX: 0000000000000000 RBX: ffff88dbbdfec000 RCX: 0000000000000000
[325739.872230] RDX: 0000000000000002 RSI: ffffffff888c1386 RDI: 00000000ffffffff
[325739.872235] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffaaba65a63ca0
[325739.872238] R10: 0000000000000003 R11: ffff893abff214a8 R12: ffff8939ffdb7900
[325739.872242] R13: ffff88dbbdfec000 R14: 0000000000000000 R15: ffff88dbbdfeca10
[325739.872246] FS: 0000000000000000(0000) GS:ffff8939ffd80000(0000) knlGS:0000000000000000
[325739.872251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[325739.872257] CR2: 000055b6e64100c8 CR3: 00000021d9010003 CR4: 00000000007706e0
[325739.872262] PKRU: 55555554
[325739.872265] Call Trace:
[325739.872269] <TASK>
[325739.872276] netif_set_real_num_tx_queues+0x6d/0x1f0
[325739.872289] iavf_delayed_set_interrupt_capability+0x31/0x40 [iavf]
[325739.872319] process_one_work+0x1c5/0x3c0
[325739.872331] worker_thread+0x4d/0x380
[325739.872336] ? _raw_spin_lock_irqsave+0x23/0x50
[325739.872347] ? __pfx_worker_thread+0x10/0x10
[325739.872352] kthread+0xe6/0x110
[325739.872360] ? __pfx_kthread+0x10/0x10
[325739.872369] ret_from_fork+0x29/0x50
[325739.872387] </TASK>
[325739.872389] ---[ end trace 0000000000000000 ]---
[325739.872397] ------------[ cut here ]------------
There are other warnings and errors, but they seem to be consequences of
the above. I think you need to add some state checks or guards in
iavf_delayed_set_interrupt_capability() to make sure that setting the
real number of queues is still valid when the work item runs.
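
To illustrate the kind of guard meant here, below is a minimal sketch
written as a plain userspace C model, not as driver code. Every name in
it (struct model_adapter, the two enums, model_delayed_set_queues()) is a
hypothetical stand-in and not a definition from iavf or the net core. It
assumes the delayed work can still run after teardown has started, and it
ignores locking entirely; a real fix would also have to settle which lock
protects the state being checked.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the netdev registration state. */
enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERING };

/* Hypothetical stand-in for the driver's own adapter state. */
enum model_drv_state { MODEL_RUNNING, MODEL_REMOVE };

/* Hypothetical, heavily reduced model of an adapter plus its netdev. */
struct model_adapter {
	enum model_reg_state reg_state;   /* is the netdev still registered? */
	enum model_drv_state drv_state;   /* is the driver being removed? */
	unsigned int real_num_tx_queues;  /* value currently in effect */
	unsigned int pending_tx_queues;   /* value the delayed work wants to set */
};

/*
 * Model of a delayed work handler that updates the queue count.
 * It bails out when the device is going away, so a late work item
 * never tries to change queues on an unregistering device.
 * Returns true if the update was applied, false if it was skipped.
 */
bool model_delayed_set_queues(struct model_adapter *a)
{
	if (a->drv_state == MODEL_REMOVE)
		return false;
	if (a->reg_state != MODEL_REGISTERED)
		return false;

	a->real_num_tx_queues = a->pending_tx_queues;
	return true;
}
```

In this model, a work item that fires after unregistration has begun
returns false and leaves real_num_tx_queues untouched, which is the
behaviour the warning above suggests is currently missing.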
Thanks,
Ahmed