[Intel-wired-lan] i40e performance regression after 1.3.9 ?

Alexander Duyck alexander.duyck at gmail.com
Mon Nov 7 20:44:16 UTC 2016


On Mon, Nov 7, 2016 at 12:16 PM, Ray Bellis <ray at isc.org> wrote:
> On 07/11/2016 18:42, Alexander Duyck wrote:
>
>> Do you know if the packets are being dropped in the Rx path or is the
>> bottleneck on the Tx side?
>
> None were being dropped AFAICR (from looking at the interface stats) and
> I can't tell which side is the bottleneck.
>
>> It looks like this could be one of a few
>> things.  Doing a quick git log and git diff between v4.3 and v4.4, the
>> two things that jump out at me are changes to the Tx tail bumping code
>> and changes to the interrupt moderation code.
>>
>> To narrow this down you might try manually configuring both the 1.3.9
>> and 1.3.38 drivers to the same interrupt moderation values using a
>> command like:
>> ethtool -C <iface> adaptive-rx off adaptive-tx off rx-usecs 25 tx-usecs 25
>>
>> That would fix the interrupt moderation at roughly 40K interrupts per
>> second.  If you run this command on both drivers and they give you
>> the same performance, then we would know that the issue is likely due
>> to changes in the dynamic interrupt moderation.
>
> I had noticed last week that the rx-usecs and tx-usecs values were
> different between 4.3 and 4.4, but when I changed the 25/25 that 4.4
> uses to the 62/122 that 4.3 uses it made no difference.
>
> However that's not quite what you've asked, so I shall repeat the test
> tomorrow, and also see whether your suggestion of changing 4.3 to the
> 4.4 values causes the same drop.

I would suggest also running the same command on 4.4.  ethtool will
say that rx-usecs and tx-usecs didn't change, but the simple fact that
adaptive moderation is now disabled can have a huge impact.
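
For reference, the "around 40K interrupts per second" figure for
rx-usecs 25 follows from dividing one second by the moderation
interval.  A minimal sketch of that arithmetic, assuming the usecs value
acts as a fixed minimum gap between interrupts (which only holds with
adaptive-rx/adaptive-tx off); the function name is hypothetical:

```python
def max_interrupt_rate(usecs: int) -> float:
    """Upper bound on interrupts/second for a fixed moderation interval.

    Assumes `usecs` is the minimum gap between two interrupts on a queue,
    i.e. adaptive moderation is disabled.
    """
    if usecs <= 0:
        raise ValueError("usecs must be positive for a moderated rate")
    return 1_000_000 / usecs


# 25 is the value from the ethtool command; 62 and 122 are the 4.3
# values mentioned in this thread.
for usecs in (25, 62, 122):
    print(f"{usecs:>3} usecs -> ~{max_interrupt_rate(usecs):,.0f} interrupts/s")
```

This is a ceiling rather than a measured rate: a lightly loaded queue
will interrupt less often than the bound allows.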

> I also didn't turn off adaptive rx or tx, so I shall try that too.

The adaptive Rx and Tx being disabled is the important part.  If you
didn't do that, then changing the other values really had no effect.

> Also if it helps, I've got flame graphs of "before" and "after":
>
> http://users.isc.org/~ray/graph-fast.svg   (1.3.9)
> http://users.isc.org/~ray/graph-slow.svg   (1.3.38+)
>
> These both represent 30 seconds of dnsperf hammering my UDP echo server,
> although I suspect that compiler inlining may have interfered with the
> stack traces.

I'm used to seeing the effects of compiler inlining on these sorts of
things.  Just looking at them, I suspect the problem is that the new
driver isn't firing interrupts often enough.  You are spending half as
much time in the driver as you were before when handling the Rx
cleanup.
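
One way to check the "not firing interrupts often enough" suspicion
directly is to compare per-queue interrupt rates under each driver by
sampling /proc/interrupts twice.  A rough sketch follows; it assumes
the queue vectors show up with "i40e" in their name (for example
"i40e-eth0-TxRx-0", which may differ on a given system), and all
function names are hypothetical:

```python
import time


def irq_totals(text: str, match: str) -> dict[str, int]:
    """Sum the per-CPU counters of every /proc/interrupts row whose
    trailing name contains `match`."""
    totals: dict[str, int] = {}
    for line in text.splitlines()[1:]:  # first line is the CPU header
        fields = line.split()
        if not fields or match not in fields[-1]:
            continue
        count = 0
        # After the "NN:" label come one counter per CPU; stop at the
        # first non-numeric token (the interrupt chip name).
        for tok in fields[1:]:
            if not tok.isdigit():
                break
            count += int(tok)
        totals[fields[-1]] = count
    return totals


def irq_rates(before: dict[str, int], after: dict[str, int],
              seconds: float) -> dict[str, float]:
    """Interrupts per second for each vector present in both samples."""
    return {name: (after[name] - before[name]) / seconds
            for name in after if name in before}


def sample(match: str = "i40e", seconds: float = 5.0,
           path: str = "/proc/interrupts") -> dict[str, float]:
    """Read the file twice, `seconds` apart, and return per-vector rates."""
    with open(path) as f:
        before = irq_totals(f.read(), match)
    time.sleep(seconds)
    with open(path) as f:
        after = irq_totals(f.read(), match)
    return irq_rates(before, after, seconds)
```

Calling sample("i40e") while dnsperf is running, once per driver
version, gives numbers that can be compared against each other and
against the rate implied by the rx-usecs setting.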

Thanks.

- Alex

