[Intel-wired-lan] [PATCH net v4 2/4] iavf: Don't lock rtnl_lock twice in reset
Jacob Keller
jacob.e.keller at intel.com
Mon Apr 24 17:26:52 UTC 2023
On 4/24/2023 5:27 AM, Wesierski, DawidX wrote:
> This function schedules itself in case it is impossible to update the
> netdev at a particular time.
> This scenario could occur when multiple operations require the rtnl_lock
> simultaneously.
> By implementing it this way, the function seems to introduce unnecessary
> operations during
> testing with operations requiring reset in quick sucession. However, my
> thought process was to
> make it more error-proof in normal use cases.
>
>
> Do you think this information should be included inside the comment or
> commit message?
>
Just take rtnl_lock. The only reason we *can't* take rtnl_lock in the
reset flow is because we're holding another lock in that context which
leads to a hard lock where:
application thread A:
<netdev call>
rtnl_lock
<driver handler>
driver_lock
...
reset thread B:
<reset request starts>
driver_lock
<perform reset>
rtnl_lock
...
The reason this is a problem is that the driver is already in a critical
section with driver_lock, but if it tries to enter critical section for
RTNL, it hard lock waiting for thread A to release RTNL, but thread A is
waiting for reset thread B to release driver lock, so we get deadlock.
In the new case, reset thread B looks like this:
reset thread B:
<reset request starts>
driver_lock
<perform reset>
schedule thread C
release_driver_lock
thread C:
rtnl_lock
finish netdev task
rtnl_unlock
In this new world , thread C doesn't hold any driver critical sections,
and it is safe to just take RTNL lock.
It is also potentially more fair and less CPU waste than try_lock +
reschedule. The scheduler subsystem can sleep the thread until the lock
becomes available.
The complexity of the reschedule is only necessary within the critical
section of the driver lock used for reset handling.
Thanks,
Jake
More information about the Intel-wired-lan
mailing list