[Intel-wired-lan] [REGRESSION] Intel ICE Ethernet driver in linux >= 6.6.9 triggers extra memory consumption and causes continuous kswapd* usage and continuous swapping

Jesse Brandeburg jesse.brandeburg at intel.com
Wed Jan 10 18:07:59 UTC 2024


On 1/8/2024 2:49 AM, Jaroslav Pulchart wrote:
> Hello

First, thank you for your work trying to chase this!

> 
> I would like to report a regression triggered by a recent change in
> the Intel ICE Ethernet driver in the 6.6.9 Linux kernel. The problem
> was bisected to commit fc4d6d136d42fab207b3ce20a8ebfd61a13f931f
> ("ice: alter feature support check for SRIOV and LAG") and was
> originally reported as part of the
> https://lore.kernel.org/linux-mm/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/T/#m5217c62beb03b3bc75d7dd4b1d9bab64a3e68826
> thread.

I think that's a bad bisect. There is no reason I can see for that
change to cause a continuous or large leak; it really doesn't make any
sense. Reverting it consistently helps? You're not just rewinding the
tree back to that point, right? Just running 6.6.9 without that one
patch? (sorry for being pedantic, just trying to be certain)
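
In other words, I'd expect the test kernel to be built roughly like
this (illustrative commands, your actual build flow may differ):

  $ git checkout v6.6.9
  $ git revert fc4d6d136d42fab207b3ce20a8ebfd61a13f931f
  # then build, install, and boot that kernel as usual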


>> However, after the following patch we see that more NUMA nodes have
>> such a low amount of memory, and that is causing constant reclaiming
>> of memory because it looks like something inside the kernel ate all
>> the memory. This happens right after the start of the system as well.
> 
>  I'm reporting it here as it is a different problem than the original
> thread. The commit introduces a low-memory condition on each NUMA
> node of the first socket (node0 .. node3 in our case) and causes
> constant kswapd* 100% CPU usage. See the attached
> 6.6.9-kswapd_usage.png. The low-memory issue is nicely visible in
> "numastat -m", see the attached files:
> * numastat_m-6.6.10_28GB_HP_ice_revert.txt   >= 6.6.9 with the ice commit reverted
> * numastat_m-6.6.10_28GB_HP_no_revert.txt    >= 6.6.9 vanilla
> The server "is fresh" (just after reboot), without any application load running.

OK, so the initial allocations are running your system out of memory.

Are you running jumbo frames on your ethernet interfaces?
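
For reference, something like this would confirm the MTU in use (eth0
is just a stand-in for your actual interface names):

  $ ip -d link show eth0 | grep -o 'mtu [0-9]*'
  # or simply: cat /sys/class/net/eth0/mtu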

Do you have /proc/slabinfo output from working/non-working boot?
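
If not, a one-shot capture from each boot would be enough, for
example (the file names are just suggestions):

  $ sudo cat /proc/slabinfo > slabinfo-$(uname -r).txt
  $ sudo slabtop -o -s c | head -30 > slabtop-$(uname -r).txt
  # slabtop -o prints once and exits; -s c sorts by cache size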

> 
> $ grep MemFree numastat_m-6.6.10_28GB_HP_ice_revert.txt \
>        numastat_m-6.6.10_28GB_HP_no_revert.txt
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree
> 2756.89         2754.86          100.39         2278.43   < ice fix
> is reverted, we have ~2GB free per NUMA node, except one, like
> before == no issue
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree
> 3551.29         1530.52         2212.04         3488.09
> ...
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree
> 127.52           66.49          120.23          263.47    < ice fix
> is present, we see only a few hundred MB free on each node; this is
> what causes the kswapd utilization!
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree
> 3322.18         3134.47          195.55          879.17
> ...
> 
> If you have some hints on how to debug what is actually occupying
> all that memory, and some fix for the problem, that would be nice.
> We can provide testing and more reports if needed to analyze the
> issue. We reverted the commit fc4d6d136d42fab207b3ce20a8ebfd61a13f931f
> as a workaround until we know a proper fix.

My first suspicion is that we're contributing to the problem via the
memory consumed by the receive descriptors and their buffers.
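
As a purely illustrative back-of-envelope (not your real numbers): 64
queues x 2048 descriptors x 4KB rx buffers is already ~512MB of
receive memory per port, before jumbo frames inflate the buffer size.
You can read the configured ring sizes with:

  $ ethtool -g eth0   # current/max rx and tx ring sizes; eth0 is a placeholder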

Can we see the ethtool -S stats from the freshly booted system that's
running out of memory or hitting OOM? Also, all the standard debugging
info (at least once, please): devlink dev info output, and any other
configuration specifics. What is the networking config (bonding?
anything else?)
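
Roughly something like this, per port (the file names again are just
suggestions):

  $ ethtool -S eth0 > ethtool_S-eth0.txt
  $ devlink dev info > devlink_dev_info.txt
  $ ip -d link show > ip_link.txt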

Do you have a bugzilla.kernel.org bug yet where you can upload larger
files like dmesg and others?

Also, I'm curious whether your problem goes away if you reduce the
number of queues per port, e.g. with ethtool -L eth0 combined 4.
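
For example (eth0 again a placeholder):

  $ ethtool -l eth0                    # show current/max channel counts
  $ sudo ethtool -L eth0 combined 4    # drop to 4 combined queues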

You also said something about reproducing when launching / destroying
virtual machines with VF passthrough?

Can you reproduce the issue without starting qemu, just doing
bare-metal SR-IOV instance creation/destruction via
/sys/class/net/eth0/device/sriov_numvfs?
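
Something like this loop, while watching numastat -m between
iterations, would tell us whether qemu is involved at all (the
interface name and VF count are placeholders):

  for i in $(seq 1 20); do
      echo 8 | sudo tee /sys/class/net/eth0/device/sriov_numvfs
      echo 0 | sudo tee /sys/class/net/eth0/device/sriov_numvfs
  done
  numastat -m | grep MemFree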

Thanks

