[Intel-wired-lan] [next PATCH S75-V2 04/12] i40e: synchronize nvmupdate command and adminq subtask

Shannon Nelson shannon.nelson at oracle.com
Wed Jul 12 15:28:23 UTC 2017



On 7/11/2017 5:01 AM, Alice Michael wrote:
> From: Sudheer Mogilappagari <sudheer.mogilappagari at intel.com>
> 
> During NVM update, state machine gets into unrecoverable state because
> i40e_clean_adminq_subtask can get scheduled after the admin queue
> command but before other state variables are updated. This causes
> incorrect input to i40e_nvmupd_check_wait_event and state transitions
> don't happen.
> 
> This issue existed before but surfaced after commit 373149fc99a0
> ("i40e: Decrease the scope of rtnl lock")

I had a feeling that patch might bite you.  I suspect there may still be 
some other occasional timing issues cropping up.

> 
> This fix adds locking around admin queue command and update of
> state variables so that adminq_subtask will have accurate information
> whenever it gets scheduled.
> 
> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari at intel.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_nvm.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
> index 17607a2..04f2192 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
> @@ -753,6 +753,11 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
>   		hw->nvmupd_state = I40E_NVMUPD_STATE_INIT;
>   	}
>   
> +	/* Acquire lock to prevent race condition where adminq_task
> +	 * can execute after i40e_nvmupd_nvm_read/write but before state
> +	 * variables (nvm_wait_opcode, nvm_release_on_done) are updated
> +	 */
> +	mutex_lock(&hw->aq.arq_mutex);

Have you done any testing to see how long you might end up holding this
lock?  I suppose it is bounded by the synchronous AQ polling timeout.
You might mention that maximum hold time here or in the commit message,
since this mutex is held across a possibly long I/O operation.

>   	switch (hw->nvmupd_state) {
>   	case I40E_NVMUPD_STATE_INIT:
>   		status = i40e_nvmupd_state_init(hw, cmd, bytes, perrno);
> @@ -788,6 +793,7 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,


There's a return statement in the *_WAIT cases that needs either a
mutex_unlock() before it or a goto to the unlock at the end of the
function.  Otherwise arq_mutex stays held after that return and you'll
never again receive AdminQ receive (ARQ) events.
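
To illustrate the goto option, here is a minimal userspace sketch of the
pattern, not the driver code.  The toy_* names, the bool-based stand-in
for the mutex, the -1 status, and the -EBUSY value in the wait cases are
all assumptions made for illustration; only -ESRCH comes from the quoted
patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel mutex: it only tracks held/not held. */
struct toy_mutex {
	bool held;
};

static void toy_lock(struct toy_mutex *m)   { m->held = true; }
static void toy_unlock(struct toy_mutex *m) { m->held = false; }

/* Hypothetical, trimmed-down version of the NVM update states. */
enum toy_state {
	TOY_STATE_INIT,
	TOY_STATE_READING,
	TOY_STATE_INIT_WAIT,
	TOY_STATE_WRITE_WAIT,
	TOY_STATE_BOGUS,
};

struct toy_hw {
	struct toy_mutex arq_mutex;
	enum toy_state state;
};

/* Every path out of the switch passes through the single unlock below. */
static int toy_nvmupd_command(struct toy_hw *hw, int *perrno)
{
	int status = 0;

	toy_lock(&hw->arq_mutex);

	switch (hw->state) {
	case TOY_STATE_INIT:
	case TOY_STATE_READING:
		/* normal command processing would go here */
		break;
	case TOY_STATE_INIT_WAIT:
	case TOY_STATE_WRITE_WAIT:
		/* Early exit: jump to the unlock instead of returning here. */
		status = -1;
		*perrno = -EBUSY;	/* assumed value for illustration */
		goto exit;
	default:
		status = -1;
		*perrno = -ESRCH;
		break;
	}

exit:
	toy_unlock(&hw->arq_mutex);
	return status;
}
```

With a bare return in the wait cases, the lock would still be held when
the function exits; with the goto, the caller always sees it released.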

sln

>   		*perrno = -ESRCH;
>   		break;
>   	}
> +	mutex_unlock(&hw->aq.arq_mutex);
>   	return status;
>   }
>   
> 

