[Intel-wired-lan] [PATCH net] i40e: Enforce software interrupt during busy-poll exit

Ivan Vecera ivecera at redhat.com
Fri Mar 15 09:19:55 UTC 2024



On 15. 03. 24 1:53, Jesse Brandeburg wrote:
> On 3/13/2024 5:54 AM, Ivan Vecera wrote:
>> As with the ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
>> during poll exit"), I'm seeing a similar issue with the i40e driver.
>>
>> In certain situations, when busy-loop is enabled together with adaptive
>> coalescing, the driver occasionally misses that there are outstanding
>> descriptors to clean when exiting busy poll.
>>
>> Try to catch the remaining work by triggering a software interrupt
>> when exiting busy poll. No extra interrupts will be generated when
>> busy polling is not used.
>>
>> The issue was found when running the sockperf ping-pong tcp test with
>> adaptive coalescing and busy poll enabled (50 as the value of the
>> busy_poll and busy_read sysctl knobs); it results in huge latency
>> spikes of more than 100000us.
> 
> I like the results of this fix! Thanks for working on it.
> 
>>
>> The fix is inspired by the ice driver and does the following:
>> 1) During napi poll exit in case of busy-poll (napi_complete_done()
>>     returns false), records in q_vector that we were in busy
>>     loop.
>> 2) In i40e_update_enable_itr()
>>     - updates refreshed ITR intervals directly using PFINT_ITRN register
>>     - if we are exiting ordinary poll then just enables the interrupt
>>       using PFINT_DYN_CTLN
>>     - if we are exiting busy poll then enables the interrupt and
>>       additionally triggers an immediate software interrupt to catch any
>>       pending clean-ups
>> 3) Reuses the unused 3rd ITR (interrupt throttle) index and sets it to
>>     20K interrupts per second to limit the number of these sw interrupts.
> 
> This is a good idea.
> 
>>
>> @@ -2702,8 +2716,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>>   	 */
>>   	if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
>>   		/* Rx ITR needs to be reduced, this is highest priority */
>> -		intval = i40e_buildreg_itr(I40E_RX_ITR,
>> -					   q_vector->rx.target_itr);
>> +		wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
>> +		     q_vector->rx.target_itr >> 1);
> 
> so here you write (this is a new write)
> 
>>   		q_vector->rx.current_itr = q_vector->rx.target_itr;
>>   		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>>   	} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
>> @@ -2712,25 +2726,33 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>>   		/* Tx ITR needs to be reduced, this is second priority
>>   		 * Tx ITR needs to be increased more than Rx, fourth priority
>>   		 */
>> -		intval = i40e_buildreg_itr(I40E_TX_ITR,
>> -					   q_vector->tx.target_itr);
>> +		wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, q_vector->reg_idx),
>> +		     q_vector->tx.target_itr >> 1);
>>   		q_vector->tx.current_itr = q_vector->tx.target_itr;
>>   		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>>   	} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
>>   		/* Rx ITR needs to be increased, third priority */
>> -		intval = i40e_buildreg_itr(I40E_RX_ITR,
>> -					   q_vector->rx.target_itr);
>> +		wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
>> +		     q_vector->rx.target_itr >> 1);
> 
> or here (new write)
> 
>>   		q_vector->rx.current_itr = q_vector->rx.target_itr;
>>   		q_vector->itr_countdown = ITR_COUNTDOWN_START;
>>   	} else {
>>   		/* No ITR update, lowest priority */
>> -		intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>>   		if (q_vector->itr_countdown)
>>   			q_vector->itr_countdown--;
>>   	}
>>   
>> -	if (!test_bit(__I40E_VSI_DOWN, vsi->state))
>> -		wr32(hw, INTREG(q_vector->reg_idx), intval);
> 
> The above used to be the *only* write.
> 
>> +	/* Do not enable interrupt if VSI is down */
>> +	if (test_bit(__I40E_VSI_DOWN, vsi->state))
>> +		return;
>> +
>> +	if (!q_vector->in_busy_poll) {
>> +		intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>> +	} else {
>> +		q_vector->in_busy_poll = false;
>> +		intval = i40e_buildreg_swint(I40E_SW_ITR);
>> +	}
>> +	wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), intval);
> 
> and then you write again here.
> 
> So this function will now regularly have two writes in the hot path.
> Before, it was very carefully crafted to reduce the number of writes to 1.
> 
> This is made possible because the PFINT_DYN_CTLN register can do
> multiple tasks at once with a single write.
> 
> Can you just modify intval to *both* trigger a software interrupt, and
> update the ITR simultaneously? I'm really not sure that's even possible.
> 
> It may make more sense to only do the second write when exiting busy
> poll, what do you think?

Yeah, you are right, we can replace these two writes with a single one, 
including for busy-poll exit. I'm setting up the ITR2_IDX rate during MSI-X 
initialization, and as this is fixed we do not need to update it 
every time in i40e_update_enable_itr().

Per the datasheet, the PFINT_DYN_CTLN value can be encoded to do the 
following at once:
- enable interrupt
- update interval for particular ITR index
- software interrupt trigger limited by interval of different ITR index
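
A rough sketch of how such a combined value could be encoded is below. It is 
not the v3 patch. The macro names, the enum and the helper functions are 
hypothetical, and the bit positions are my assumption of the PFINT_DYN_CTLN 
layout, so they should be checked against the datasheet and i40e_register.h. 
The 2 usec interval granularity is assumed from the ">> 1" in the patch above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PFINT_DYN_CTLN field layout; verify against the datasheet. */
#define DYN_CTLN_INTENA            (1u << 0)   /* enable interrupt */
#define DYN_CTLN_CLEARPBA          (1u << 1)   /* clear pending bit */
#define DYN_CTLN_SWINT_TRIG        (1u << 2)   /* trigger software interrupt */
#define DYN_CTLN_ITR_INDX_SHIFT    3           /* ITR index to update */
#define DYN_CTLN_INTERVAL_SHIFT    5           /* new interval, 2 usec units */
#define DYN_CTLN_INTERVAL_MAX      0xFFFu      /* assumed 12-bit field */
#define DYN_CTLN_SW_ITR_INDX_ENA   (1u << 24)  /* honour SW_ITR_INDX below */
#define DYN_CTLN_SW_ITR_INDX_SHIFT 25          /* ITR index limiting the swint */

/* Hypothetical names: index 2 is the otherwise unused ITR, 3 means no update. */
enum { ITR_RX = 0, ITR_TX = 1, ITR_SW = 2, ITR_NONE = 3 };

/* Convert an interrupt rate into an interval, e.g. 20K ints/s is 50 usecs. */
static uint32_t rate_to_usecs(uint32_t ints_per_sec)
{
	return 1000000u / ints_per_sec;
}

/*
 * Build one register value that enables the interrupt, optionally updates
 * the interval of itr_idx, and optionally triggers a software interrupt
 * that is rate limited by the fixed ITR_SW index.
 */
static uint32_t dyn_ctln_encode(uint32_t itr_idx, uint32_t interval_usecs,
				bool force_swint)
{
	uint32_t val = DYN_CTLN_INTENA | DYN_CTLN_CLEARPBA;

	val |= itr_idx << DYN_CTLN_ITR_INDX_SHIFT;
	val |= ((interval_usecs >> 1) & DYN_CTLN_INTERVAL_MAX)
	       << DYN_CTLN_INTERVAL_SHIFT;

	if (force_swint)
		val |= DYN_CTLN_SWINT_TRIG | DYN_CTLN_SW_ITR_INDX_ENA |
		       ((uint32_t)ITR_SW << DYN_CTLN_SW_ITR_INDX_SHIFT);

	return val;
}
```

With something like this, each branch of i40e_update_enable_itr() would only 
pick the ITR index and interval, and the single wr32() at the end would pass 
in_busy_poll as force_swint.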

Will prepare, test and submit v3 with this change.

Thanks,
Ivan


