[Intel-wired-lan] Regression: Approximate 34% performance hit in receive throughput over ixgbe seen due to build_skb patch

Alexander Duyck alexander.duyck at gmail.com
Tue May 22 18:23:58 UTC 2018


On Tue, May 22, 2018 at 11:00 AM, William Kucharski
<william.kucharski at oracle.com> wrote:
> A performance hit of approximately 34% in receive numbers for some packet sizes is
> seen when testing traffic over ixgbe links using the network test netperf.
>
> Starting with the top of tree commit 7addb3e4ad3db6a95a953c59884921b5883dcdec,
> a git bisect narrowed the issue down to:
>
> commit 6f429223b31c550b835b4f066ac034d0cf0cc71e
>
>     ixgbe: Add support for build_skb
>
>     This patch adds build_skb support to the Rx path.  There are several
>     advantages to this change.
>
>     1.  It avoids the memcpy and skb->head allocation for small packets which
>         improves performance by about 5% in my tests.
>     2.  It avoids the memcpy, skb->head allocation, and eth_get_headlen
>         for larger packets improving performance by about 10% in my tests.
>     3.  For VXLAN packets it allows the full header to be in skb->data which
>         improves the performance by as much as 30% in some of my tests.
>
> Netperf was sourced from:
>
>     https://hewlettpackard.github.io/netperf/
>
> Two machines were directly connected via ixgbe links.
>
> The process "netserver" was started on 10.196.11.8, and running this test:
>
> # netperf -l 60 -H 10.196.11.8 -i 10,2 -I 99,10 -t UDP_STREAM -- -m 64 -s 32768 -S 32768

Okay, so I can already see what the most likely issue is. The
build_skb code is more CPU efficient, but it consumes more memory in
the process: it avoids the memcpy and instead uses a full 2K block of
memory for each small frame. I suspect the performance issue you are
seeing may be due to a low interrupt rate causing us to either exhaust
the available Tx memory or overrun the available Rx memory.
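
To put rough numbers on the memory point, here is a back-of-the-envelope
sketch. It assumes the receive budget is simply the 32768 bytes passed
via "-S" and that each small frame is charged a full 2048-byte block on
the build_skb path. The 768-byte charge for the copy-break path and the
1,000,000 packets/s arrival rate are illustrative guesses, not measured
values, and the kernel's real accounting (per-skb overhead, its own
adjustment of the requested buffer size) is ignored.

```python
# Rough model: how many small frames can sit in a receive socket buffer
# before it overruns, and how long the stack can go without draining it.

def frames_that_fit(budget_bytes: int, charge_per_frame: int) -> int:
    """Whole frames that fit in the budget at a given per-frame charge."""
    return budget_bytes // charge_per_frame

def max_gap_us(frames: int, packets_per_sec: int) -> float:
    """Longest gap between drains (microseconds) before the buffer overruns."""
    return frames * 1_000_000 / packets_per_sec

RCVBUF = 32768            # from "-S 32768" in the test command
BUILD_SKB_CHARGE = 2048   # full 2K block per frame, as described above
COPYBREAK_CHARGE = 768    # ASSUMPTION: illustrative figure only
ARRIVAL_PPS = 1_000_000   # ASSUMPTION: hypothetical arrival rate

for name, charge in [("build_skb", BUILD_SKB_CHARGE),
                     ("copy-break", COPYBREAK_CHARGE)]:
    frames = frames_that_fit(RCVBUF, charge)
    gap = max_gap_us(frames, ARRIVAL_PPS)
    print(f"{name:10s}: {frames} frames, overrun after ~{gap:.0f} us without draining")
```

Under these assumptions the build_skb path holds 16 frames against 42
for copy-break, so the same interrupt delay leaves much less headroom.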

There are multiple ways to address this:
1. Use larger "-s/-S" values to allow more memory to be held in the
queues.
2. Update the interrupt moderation code for the driver. You can either
manually decrease the per-interrupt delay via "ethtool -C" or update
the adaptive ITR code; see commit b4ded8327fea ("ixgbe: Update
adaptive ITR algorithm").
3. There should be a private flag called "legacy-rx" that can be set
via "ethtool --set-priv-flags". Enabling it rolls back to the original
Rx path, which used a copy-break type approach for small packets and
for the headers of the frame.
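
As a sketch of what the three options could look like on the command
line, the script below only assembles and prints the commands; it does
not run them. The interface name "eth0", the 262144-byte socket buffer
size, and the rx-usecs value of 10 are placeholders chosen for
illustration, not recommended settings.

```python
import shlex

IFACE = "eth0"      # ASSUMPTION: placeholder interface name
SOCK_BUF = 262144   # ASSUMPTION: example larger value for -s/-S
RX_USECS = 10       # ASSUMPTION: example per-interrupt delay in microseconds

def mitigation_commands(iface=IFACE, sock_buf=SOCK_BUF, rx_usecs=RX_USECS):
    """Return one argv list per option, keyed by a short description."""
    return {
        # Option 1: the original netperf test with larger socket buffers
        "larger socket buffers": [
            "netperf", "-l", "60", "-H", "10.196.11.8",
            "-i", "10,2", "-I", "99,10", "-t", "UDP_STREAM", "--",
            "-m", "64", "-s", str(sock_buf), "-S", str(sock_buf),
        ],
        # Option 2: manually decrease the Rx interrupt delay
        "lower interrupt delay": [
            "ethtool", "-C", iface, "rx-usecs", str(rx_usecs),
        ],
        # Option 3: switch back to the copy-break Rx path
        "legacy Rx path": [
            "ethtool", "--set-priv-flags", iface, "legacy-rx", "on",
        ],
    }

for label, argv in mitigation_commands().items():
    print(f"# {label}")
    print(shlex.join(argv))
```

"ethtool --show-priv-flags" on the interface should confirm whether
the legacy-rx flag is available and what it is currently set to.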

Thanks.

- Alex

