[Intel-wired-lan] [PATCH net] i40e: Enforce software interrupt during busy-poll exit
Ivan Vecera
ivecera at redhat.com
Fri Mar 15 09:19:55 UTC 2024
On 15. 03. 24 1:53, Jesse Brandeburg wrote:
> On 3/13/2024 5:54 AM, Ivan Vecera wrote:
>> As with the ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
>> during poll exit"), I'm seeing a similar issue with the i40e driver.
>>
>> In certain situations, when busy polling is enabled together with adaptive
>> coalescing, the driver occasionally misses that there are outstanding
>> descriptors to clean when exiting busy poll.
>>
>> Try to catch the remaining work by triggering a software interrupt
>> when exiting busy poll. No extra interrupts will be generated when
>> busy polling is not used.
>>
>> The issue was found when running the sockperf ping-pong TCP test with
>> adaptive coalescing and busy poll enabled (busy_poll and busy_read
>> sysctl knobs both set to 50); it results in huge latency spikes
>> of more than 100000us.
>
> I like the results of this fix! Thanks for working on it.
>
>>
>> The fix is inspired by the ice driver and does the following:
>> 1) During napi poll exit in case of busy-poll (napi_complete_done()
>> returns false), the q_vector records that we were in a busy
>> loop.
>> 2) In i40e_update_enable_itr()
>> - updates refreshed ITR intervals directly using PFINT_ITRN register
>> - if we are exiting ordinary poll then just enables the interrupt
>> using PFINT_DYN_CTLN
>> - if we are exiting busy poll then enables the interrupt and
>> additionally triggers an immediate software interrupt to catch any
>> pending clean-ups
>> 3) Reuses the unused 3rd ITR (interrupt throttle) index and sets it to
>> 20K interrupts per second to limit the number of these sw interrupts.
>
> This is a good idea.
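
For reference, the 20K interrupts per second figure can be turned into a
register value with a little arithmetic. The sketch below is illustrative
only: the helper names are hypothetical, and the 2 us register granularity
is an assumption inferred from the ">> 1" shifts in the patch, so check it
against the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert an interrupt rate cap into the minimum
 * spacing between interrupts, in microseconds. */
static uint32_t sketch_rate_to_usecs(uint32_t ints_per_sec)
{
	assert(ints_per_sec != 0);	/* avoid division by zero */
	return 1000000u / ints_per_sec;
}

/* Hypothetical helper: convert microseconds to ITR register units.
 * Assumption: the register counts in 2 us steps, which is what the
 * "target_itr >> 1" in the patch suggests. */
static uint32_t sketch_usecs_to_reg(uint32_t usecs)
{
	return usecs >> 1;
}
```

Under these assumptions, a 20K cap gives a 50 us interval, which would be
written to the ITR register as 25.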
>
>>
>> @@ -2702,8 +2716,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>> */
>> if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
>> /* Rx ITR needs to be reduced, this is highest priority */
>> - intval = i40e_buildreg_itr(I40E_RX_ITR,
>> - q_vector->rx.target_itr);
>> + wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
>> + q_vector->rx.target_itr >> 1);
>
> so here you write (this is a new write)
>
>> q_vector->rx.current_itr = q_vector->rx.target_itr;
>> q_vector->itr_countdown = ITR_COUNTDOWN_START;
>> } else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
>> @@ -2712,25 +2726,33 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
>> /* Tx ITR needs to be reduced, this is second priority
>> * Tx ITR needs to be increased more than Rx, fourth priority
>> */
>> - intval = i40e_buildreg_itr(I40E_TX_ITR,
>> - q_vector->tx.target_itr);
>> + wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, q_vector->reg_idx),
>> + q_vector->tx.target_itr >> 1);
>> q_vector->tx.current_itr = q_vector->tx.target_itr;
>> q_vector->itr_countdown = ITR_COUNTDOWN_START;
>> } else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
>> /* Rx ITR needs to be increased, third priority */
>> - intval = i40e_buildreg_itr(I40E_RX_ITR,
>> - q_vector->rx.target_itr);
>> + wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, q_vector->reg_idx),
>> + q_vector->rx.target_itr >> 1);
>
> or here (new write)
>
>> q_vector->rx.current_itr = q_vector->rx.target_itr;
>> q_vector->itr_countdown = ITR_COUNTDOWN_START;
>> } else {
>> /* No ITR update, lowest priority */
>> - intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>> if (q_vector->itr_countdown)
>> q_vector->itr_countdown--;
>> }
>>
>> - if (!test_bit(__I40E_VSI_DOWN, vsi->state))
>> - wr32(hw, INTREG(q_vector->reg_idx), intval);
>
> The above used to be the *only* write.
>
>> + /* Do not enable interrupt if VSI is down */
>> + if (test_bit(__I40E_VSI_DOWN, vsi->state))
>> + return;
>> +
>> + if (!q_vector->in_busy_poll) {
>> + intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
>> + } else {
>> + q_vector->in_busy_poll = false;
>> + intval = i40e_buildreg_swint(I40E_SW_ITR);
>> + }
>> + wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), intval);
>
> and then you write again here.
>
> So this function will now regularly have two writes in the hot path.
> Before, it was very carefully crafted to reduce the number of writes to 1.
>
> This is made possible because the PFINT_DYN_CTLN register can do
> multiple tasks at once with a single write.
>
> Can you just modify intval to *both* trigger a software interrupt, and
> update the ITR simultaneously? I'm really not sure that's even possible.
>
> It may make more sense to only do the second write when exiting busy
> poll, what do you think?
Yeah, you are right, we can replace these two writes with a single one,
including on busy-poll exit. I'm setting up the ITR2_IDX rate during MSI-X
initialization, and since it is fixed we do not need to update it
every time in i40e_update_enable_itr().
Per the datasheet, the PFINT_DYN_CTLN value can be encoded to do the
following at once:
- enable the interrupt
- update the interval for a particular ITR index
- trigger a software interrupt, rate-limited by the interval of a
  different ITR index
I will prepare, test and submit v3 with this change.
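
A minimal sketch of what such a combined encoding could look like is below.
It is not the v3 patch. The function name, the macro names, and the bit
positions are assumptions written from memory of the PFINT_DYN_CTLN layout
and must be verified against the datasheet; the index values (3 for "no ITR
update", 2 for the software interrupt ITR) follow the "3rd ITR index"
mentioned in the patch description.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PFINT_DYN_CTLN bit layout; verify against the datasheet. */
#define SK_DYN_CTLN_INTENA		(1u << 0)	/* enable interrupt */
#define SK_DYN_CTLN_SWINT_TRIG		(1u << 2)	/* fire software interrupt */
#define SK_DYN_CTLN_ITR_INDX_SHIFT	3		/* which ITR to update */
#define SK_DYN_CTLN_INTERVAL_SHIFT	5		/* new interval, 2 us units */
#define SK_DYN_CTLN_INTERVAL_MAX	0xFFFu		/* assumed 12-bit field */
#define SK_DYN_CTLN_SW_ITR_INDX_ENA	(1u << 24)	/* honor SW ITR index */
#define SK_DYN_CTLN_SW_ITR_INDX_SHIFT	25		/* ITR limiting the swint */

#define SK_ITR_NONE	3u	/* assumed "no ITR update" index */
#define SK_SW_ITR	2u	/* assumed 3rd ITR index, used for swint */

/* Hypothetical builder: one register value that enables the interrupt,
 * optionally updates one ITR interval, and optionally triggers a
 * software interrupt throttled by the SW ITR index. */
static uint32_t sketch_build_dyn_ctln(uint32_t itr_idx, uint32_t usecs,
				      int force_swint)
{
	uint32_t val;

	assert(itr_idx <= SK_ITR_NONE);
	assert((usecs >> 1) <= SK_DYN_CTLN_INTERVAL_MAX);

	val = SK_DYN_CTLN_INTENA |
	      (itr_idx << SK_DYN_CTLN_ITR_INDX_SHIFT) |
	      ((usecs >> 1) << SK_DYN_CTLN_INTERVAL_SHIFT);

	if (force_swint)
		val |= SK_DYN_CTLN_SWINT_TRIG |
		       SK_DYN_CTLN_SW_ITR_INDX_ENA |
		       (SK_SW_ITR << SK_DYN_CTLN_SW_ITR_INDX_SHIFT);

	return val;
}
```

With a builder like this, i40e_update_enable_itr() would go back to a single
wr32() per call, passing force_swint only when exiting busy poll.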
Thanks,
Ivan