[Intel-wired-lan] [PATCH v2] ice: wait for reset completion in ice_resume()
Paul Menzel
pmenzel at molgen.mpg.de
Tue Apr 28 09:17:59 UTC 2026
Dear Aaron,
Thank you for your reply.
On 28.04.26 at 09:53, Aaron Ma wrote:
> On Mon, Apr 27, 2026 at 6:13 PM Paul Menzel wrote:
>> On 24.04.26 at 05:03, Aaron Ma via Intel-wired-lan wrote:
>>> ice_resume() schedules an asynchronous PF reset and returns
>>> immediately. The reset runs later in ice_service_task(). If
>>> userspace tries to bring up the net device before the reset
>>> finishes, ice_open() fails with -EBUSY:
>>>
>>>   ice_resume()
>>>     ice_schedule_reset()          # sets ICE_PFR_REQ, returns
>>>   ...
>>>   ice_open()
>>>     ice_is_reset_in_progress()    # ICE_PFR_REQ still set, -EBUSY
>>>   ...
>>>   ice_service_task()
>>>     ice_do_reset()
>>>     ice_rebuild()                 # clears ICE_PFR_REQ, too late
>>>
>>> Reproduced on E800 series NICs during suspend/resume with irdma
>>> enabled, where the aux device probe widens the race window.
>>
>> Please document how you reproduced it, and also paste any messages logged
>> by Linux or NetworkManager, so that people can easily find the commit.
>
> The error message is "can't open net device while reset is in progress".
> I can add it in v3 if you like.
Yes, that’d be great.
>>> Wait for the reset to complete before returning from ice_resume().
>>
>> Please mention the delay length in the commit message.
>
> The timeout is 10 * HZ (10 seconds), matching the existing usage in
> ice_devlink_info_get() for the same ice_wait_for_reset() call. In
> practice the wait completes in ~300ms.
I often wonder where the delay values come from. Maybe mention that
you copied it from ice_devlink_info_get().
>>> Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
>>> Cc: stable at vger.kernel.org
>>> Signed-off-by: Aaron Ma <aaron.ma at canonical.com>
>>> ---
>>> v2: reword comment to clarify best-effort semantics (Kohei Enju)
>>>
>>> drivers/net/ethernet/intel/ice/ice_main.c | 9 +++++++++
>>> 1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
>>> index 5f92377d4dfc2..a81eb21ea87c1 100644
>>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>>> @@ -5635,6 +5635,15 @@ static int ice_resume(struct device *dev)
>>> /* Restart the service task */
>>> mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
>>>
>>> + /* Best-effort wait for the scheduled reset to finish so that the
>>> + * device is operational before returning. Without this, userspace
>>> + * (e.g. NetworkManager) may try to open the net device while the
>>> + * asynchronous reset is still in progress, hitting -EBUSY.
>>> + */
>>> + ret = ice_wait_for_reset(pf, 10 * HZ);
>>
>> Why not pass a delay in micro/milliseconds?
>
> ice_wait_for_reset() takes jiffies — that's the existing API.
It’s recommended to use `msecs_to_jiffies()`, so that the timeout is stated in explicit time units rather than in multiples of HZ.
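To illustrate, here is a minimal userspace sketch, not kernel code. `model_msecs_to_jiffies()` is a hypothetical stand-in for the kernel helper from `<linux/jiffies.h>`, and it takes HZ as a parameter only so that several tick rates can be compared in one program. It assumes the kernel helper rounds up to whole ticks.

```c
/*
 * Userspace model only. In the kernel, HZ is a build-time constant and
 * msecs_to_jiffies() comes from <linux/jiffies.h>; the name and the extra
 * hz parameter below are assumptions made for illustration.
 */
static unsigned long long model_msecs_to_jiffies(unsigned int ms,
						 unsigned long hz)
{
	/* Round up so that a short, non-zero timeout never becomes 0 ticks. */
	return ((unsigned long long)ms * hz + 999ULL) / 1000ULL;
}

/* The form used in the patch: a number of seconds multiplied by HZ. */
static unsigned long long model_seconds_times_hz(unsigned int seconds,
						 unsigned long hz)
{
	return (unsigned long long)seconds * hz;
}
```

For a 10 second timeout both forms yield the same tick count at any HZ, so the benefit is mainly that the unit is visible at the call site, for example `ice_wait_for_reset(pf, msecs_to_jiffies(10000))`, and that sub-second timeouts can be expressed without integer division on HZ.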
>>> + if (ret)
>>> + dev_err(dev, "Wait for reset failed during resume: %d\n", ret);
>>
>> Mention the delay?
>
> Good point. I'll include the timeout in the error message in v3.
Awesome.
[…]
Thanks,
Paul