[Intel-wired-lan] Linux 4.12+ memory leak on router with i40e NICs

Paweł Staszewski pstaszewski at itcare.pl
Wed Oct 18 22:20:33 UTC 2017



On 2017-10-18 at 17:44, Paweł Staszewski wrote:
>
>
> On 2017-10-17 at 16:08, Paweł Staszewski wrote:
>>
>>
>> On 2017-10-17 at 13:52, Paweł Staszewski wrote:
>>>
>>>
>>> On 2017-10-17 at 13:05, Paweł Staszewski wrote:
>>>>
>>>>
>>>> On 2017-10-17 at 12:59, Paweł Staszewski wrote:
>>>>>
>>>>>
>>>>> On 2017-10-17 at 12:51, Paweł Staszewski wrote:
>>>>>>
>>>>>>
>>>>>> On 2017-10-17 at 12:20, Paweł Staszewski wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017-10-17 at 11:48, Paweł Staszewski wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2017-10-17 at 02:44, Paweł Staszewski wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2017-10-17 at 01:56, Alexander Duyck wrote:
>>>>>>>>>> On Mon, Oct 16, 2017 at 4:34 PM, Paweł Staszewski 
>>>>>>>>>> <pstaszewski at itcare.pl> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 2017-10-16 at 18:26, Paweł Staszewski wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2017-10-16 at 13:20, Pavlos Parissis wrote:
>>>>>>>>>>>>> On 15/10/2017 02:58 AM, Alexander Duyck wrote:
>>>>>>>>>>>>>> Hi Pawel,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To clarify is that Dave Miller's tree or Linus's that you 
>>>>>>>>>>>>>> are talking
>>>>>>>>>>>>>> about? If it is Dave's tree how long ago was it you 
>>>>>>>>>>>>>> pulled it since I
>>>>>>>>>>>>>> think the fix was just pushed by Jeff Kirsher a few days 
>>>>>>>>>>>>>> ago.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The issue should be fixed in the following commit:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you know when it is going to be available on net-next 
>>>>>>>>>>>>> and linux-stable
>>>>>>>>>>>>> repos?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Pavlos
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> I will run some tests tonight with the "net" git tree, 
>>>>>>>>>>>> where this patch is included,
>>>>>>>>>>>> starting from 0:00 CET
>>>>>>>>>>>> :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Upgraded, and it looks like the problem is not solved by that patch.
>>>>>>>>>>> Currently running the system with the
>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/
>>>>>>>>>>> kernel.
>>>>>>>>>>>
>>>>>>>>>>> About 0.5GB of memory is still leaking somewhere.
>>>>>>>>>>>
>>>>>>>>>>> I can also confirm that the latest kernel where memory is not 
>>>>>>>>>>> leaking (using the i40e driver with Intel 710 cards) is 4.11.12.
>>>>>>>>>>> With kernel 4.11.12, after an hour there is no change in memory usage.
>>>>>>>>>>>
>>>>>>>>>>> I also checked that with ixgbe instead of i40e, on the same 
>>>>>>>>>>> net.git kernel, there is no memleak: after an hour the memory 
>>>>>>>>>>> usage is the same. So this is 100% an i40e driver problem.
>>>>>>>>>> So how long was the run to get the .5GB of memory leaking?
>>>>>>>>> 1 hour
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also is there any chance of you being able to bisect to 
>>>>>>>>>> determine
>>>>>>>>>> where the memory leak was introduced since as you pointed out it
>>>>>>>>>> didn't exist in 4.11.12 so odds are it was introduced somewhere
>>>>>>>>>> between 4.11 and the latest kernel release.
>>>>>>>>> That could be hard, because for now I need to go back to 
>>>>>>>>> 4.11.12: this is a production host/router.
>>>>>>>>> I will try to find a free/test router for tests/bisects with 
>>>>>>>>> the i40e driver (Intel 710 cards).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> - Alex
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I also forgot to add the errors logged when the i40e driver initializes:
>>>>>>>> [   15.760569] i40e 0000:02:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>> [   16.365587] i40e 0000:03:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>> [   16.367686] i40e 0000:02:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>> [   16.368816] i40e 0000:03:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>> [   16.369877] i40e 0000:03:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>> [   16.370941] i40e 0000:02:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>> [   16.372005] i40e 0000:02:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>> [   16.373029] i40e 0000:03:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>
>>>>>>>> Some parameters that are set for these NICs:
>>>>>>>>         ip link set up dev $i
>>>>>>>>         ethtool -A $i autoneg off rx off tx off
>>>>>>>>         ethtool -G $i rx 1024 tx 2048
>>>>>>>>         ip link set $i txqueuelen 1000
>>>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>>>         ethtool -L $i combined 6
>>>>>>>>         #ethtool -N $i rx-flow-hash udp4 sdfn
>>>>>>>>         ethtool -K $i ntuple on
>>>>>>>>         ethtool -K $i gro off
>>>>>>>>         ethtool -K $i tso off
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Also, after turning TSO/GRO on, memory usage changes and the 
>>>>>>> leak is faster.
>>>>>>> The image below shows memory usage before the change (TSO/GRO 
>>>>>>> off) and after enabling TSO/GRO:
>>>>>>>
>>>>>>> https://ibb.co/dTqBY6
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Pawel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> With settings like this:
>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>> for i in $ifc
>>>>>>         do
>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>         ethtool -K $i gro on
>>>>>>         ethtool -K $i tso on
>>>>>>
>>>>>>         done
>>>>>>
>>>>>> The server is leaking about 4-6 MB every 10 seconds
>>>>>> MEMLEAK:
>>>>>> 5  MB/10sec
>>>>>> 6  MB/10sec
>>>>>> 4  MB/10sec
>>>>>> 4  MB/10sec
>>>>>>
>>>>>>
>>>>>> Other settings, TSO/GRO off:
>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>> for i in $ifc
>>>>>>         do
>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>         ethtool -K $i gro off
>>>>>>         ethtool -K $i tso off
>>>>>>
>>>>>>         done
>>>>>>
>>>>>> Same leak, about 5 MB per 10 seconds
>>>>>> MEMLEAK:
>>>>>> 5  MB/10sec
>>>>>> 5  MB/10sec
>>>>>> 5  MB/10sec
>>>>>>
>>>>>>
>>>>>> Other settings, rx-usecs changed from 512 to 1024:
>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>> for i in $ifc
>>>>>>         do
>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 1024 tx-usecs 128
>>>>>>         ethtool -K $i gro off
>>>>>>         ethtool -K $i tso off
>>>>>>
>>>>>>         done
>>>>>>
>>>>>> MEMLEAK:
>>>>>> 4  MB/10sec
>>>>>> 3  MB/10sec
>>>>>> 4  MB/10sec
>>>>>> 4  MB/10sec
>>>>>>
>>>>>>
>>>>>> So the memleak has something to do with rx-usecs (fewer 
>>>>>> interrupts but higher latency for traffic)
>>>>>>
>>>>>>
>>>>>> But enabling TSO/GRO also makes the leak about 1 MB bigger per 
>>>>>> 10 seconds
>>>>>>
>>>>>>
>>>>>>
>>>>> So far the best config is:
>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>> for i in $ifc
>>>>>         do
>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 64 tx-usecs 512
>>>>>         ethtool -K $i gro off
>>>>>         ethtool -K $i tso on
>>>>>
>>>>>         done
>>>>>
>>>>> MEMLEAK - about 2MB/10secs
>>>>> 2  MB/10sec
>>>>> 2  MB/10sec
>>>>> 2  MB/10sec
>>>>>
>>>>>
>>>>> With rx-usecs set to 256 (about 7-9 MB/10 secs memleak):
>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>> for i in $ifc
>>>>>         do
>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 256 tx-usecs 512
>>>>>         ethtool -K $i gro off
>>>>>         ethtool -K $i tso on
>>>>>
>>>>>         done
>>>>>
>>>>> MEMLEAK:
>>>>> 7  MB/10sec
>>>>> 7  MB/10sec
>>>>> 8  MB/10sec
>>>>> 9  MB/10sec
>>>>>
>>>>>
>>>>
>>>> And even less memleak with rx-usecs set to 32:
>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>> for i in $ifc
>>>>         do
>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 32 tx-usecs 512
>>>>         ethtool -K $i gro off
>>>>         ethtool -K $i tso on
>>>>
>>>>         done
>>>>
>>>>
>>>> MEMLEAK - about 0-2MB for each 10 seconds
>>>> 0  MB/10sec
>>>> 1  MB/10sec
>>>> 0  MB/10sec
>>>> 2  MB/10sec
>>>> 1  MB/10sec
>>>>
>>>>
>>>>
>>>
>>>
>>> So the best settings for now, to leak as little as possible 
>>> (rx-usecs set to 16):
>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>> for i in $ifc
>>>         do
>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 16 tx-usecs 768
>>>         ethtool -K $i gro on
>>>         ethtool -K $i tso on
>>>
>>>         done
>>>
>>>
>>> MEMLEAK: (0-1MB/10seconds)
>>> 0  MB/10sec
>>> 0  MB/10sec
>>> 0  MB/10sec
>>> 1  MB/10sec
>>> 1  MB/10sec
>>> -1  MB/10sec
>>> 1  MB/10sec
>>> 1  MB/10sec
>>> 0  MB/10sec
>>>
>>> (some memory is being recycled, so this is good :) )
>>>
>>>
>>>
>>> Compared to (rx-usecs 512):
>>>
>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>> for i in $ifc
>>>         do
>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>         ethtool -K $i gro on
>>>         ethtool -K $i tso on
>>>
>>>         done
>>>
>>> The server is leaking about 4-6 MB every 10 seconds
>>> MEMLEAK:
>>> 5  MB/10sec
>>> 6  MB/10sec
>>> 4  MB/10sec
>>> 4  MB/10sec
>>>
>>>
>>
>> And a graph covering the period over which all the rx-usecs changes were made:
>> https://ibb.co/nrRfbR
>>
>>
>>
>>
>>
> I can't eliminate the problem with settings. The memleak gets bigger 
> or smaller, and is less visible with rx-usecs set to low values, but 
> then I have 100% CPU load, so I can't keep rx-usecs set to 16.
>
> I also can't find another host with the same cards, or one using the 
> i40e driver, for bisecting tests.
> So I will just replace them with Mellanox :)
>
>
Also, after a fresh reboot with i40e and these
startup settings:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
         do
         ip link set up dev $i
         ethtool -A $i autoneg off rx off tx off
         ethtool -G $i rx 2048 tx 2048
         ip link set $i txqueuelen 1000
         #ethtool -C $i rx-usecs 256
         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 17 tx-usecs 125
         ethtool -L $i combined 6
         #ethtool -N $i rx-flow-hash udp4 sdfn
         #ethtool -K $i ntuple on
         #ethtool -K $i gro off
         #ethtool -K $i tso off
         done


After issuing:

  ethtool -K enp2s0f0 gro on tso on

dmesg shows
[35764.338259] i40e 0000:02:00.0: PF reset failed, -15


and no traffic on the card :)
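
For anyone trying to reproduce the "MB/10sec" figures quoted above, here is a minimal sketch of one way to sample them. It is not the script used for the numbers in this thread. The function names (mem_used_kb, delta_mb, watch_leak) are made up for illustration, and it assumes that "used" memory can be approximated as MemTotal minus MemAvailable from /proc/meminfo, which may not match how the original measurements were taken.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the script used in this thread.

# Print used memory in kB (MemTotal - MemAvailable) from a meminfo-style
# file. Defaults to /proc/meminfo; a path can be passed for testing.
mem_used_kb() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {print t-a}' \
        "${1:-/proc/meminfo}"
}

# Print the change between two kB samples (previous, current) in whole MB.
delta_mb() {
    echo $(( ($2 - $1) / 1024 ))
}

# Sample every <interval> seconds, <count> times, printing one line per
# sample in the form "N  MB/<interval>sec".
watch_leak() {
    wl_interval=${1:-10}
    wl_count=${2:-6}
    wl_prev=$(mem_used_kb)
    wl_i=0
    while [ "$wl_i" -lt "$wl_count" ]; do
        sleep "$wl_interval"
        wl_cur=$(mem_used_kb)
        echo "$(delta_mb "$wl_prev" "$wl_cur")  MB/${wl_interval}sec"
        wl_prev=$wl_cur
        wl_i=$((wl_i + 1))
    done
}
```

After sourcing the file, running "watch_leak 10 6" prints six samples, one every 10 seconds. A negative value means memory was given back during that interval.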


