[Intel-wired-lan] Linux 4.12+ memory leak on router with i40e NICs

Paweł Staszewski pstaszewski at itcare.pl
Tue Oct 17 10:20:18 UTC 2017



On 2017-10-17 at 11:48, Paweł Staszewski wrote:
>
>
> On 2017-10-17 at 02:44, Paweł Staszewski wrote:
>>
>>
>> On 2017-10-17 at 01:56, Alexander Duyck wrote:
>>> On Mon, Oct 16, 2017 at 4:34 PM, Paweł Staszewski 
>>> <pstaszewski at itcare.pl> wrote:
>>>>
>>>> On 2017-10-16 at 18:26, Paweł Staszewski wrote:
>>>>
>>>>>
>>>>> On 2017-10-16 at 13:20, Pavlos Parissis wrote:
>>>>>> On 15/10/2017 02:58 AM, Alexander Duyck wrote:
>>>>>>> Hi Pawel,
>>>>>>>
>>>>>>> To clarify, is that Dave Miller's tree or Linus's that you are
>>>>>>> talking about? If it is Dave's tree, how long ago did you pull it?
>>>>>>> I think the fix was just pushed by Jeff Kirsher a few days ago.
>>>>>>>
>>>>>>> The issue should be fixed in the following commit:
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972 
>>>>>>>
>>>>>>>
>>>>>> Do you know when it is going to be available in the net-next and
>>>>>> linux-stable repos?
>>>>>>
>>>>>> Cheers,
>>>>>> Pavlos
>>>>>>
>>>>>>
>>>>> I will run some tests tonight with the "net" git tree, where this
>>>>> patch is included, starting from 0:00 CET :)
>>>>>
>>>>>
>>>> Upgraded, and it looks like the problem is not solved by that patch.
>>>> Currently running the system with the
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/
>>>> kernel.
>>>>
>>>> About 0.5 GB of memory is still leaking somewhere.
>>>>
>>>> I can also confirm that the last kernel where memory does not leak
>>>> (using the i40e driver with Intel 710 cards) is 4.11.12.
>>>> With kernel 4.11.12 there is no change in memory usage after an hour.
>>>>
>>>> I also checked that with ixgbe instead of i40e, on the same net.git
>>>> kernel, there is no memory leak: memory usage is the same after an
>>>> hour. So this is definitely an i40e driver problem.
>>> So how long was the run to get the .5GB of memory leaking?
>> 1 hour
>>
>>>
>>> Also is there any chance of you being able to bisect to determine
>>> where the memory leak was introduced since as you pointed out it
>>> didn't exist in 4.11.12 so odds are it was introduced somewhere
>>> between 4.11 and the latest kernel release.
>> That may be hard, because I currently need to go back to 4.11.12: this
>> is a production host/router.
>> I will try to find a free test router for tests/bisects with the i40e
>> driver (Intel 710 cards).
>>
>>>
>>> Thanks.
>>>
>>> - Alex
>>>
>>
>>
> Also, I forgot to add the errors that i40e logs when the driver initializes:
> [   15.760569] i40e 0000:02:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
> [   16.365587] i40e 0000:03:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
> [   16.367686] i40e 0000:02:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
> [   16.368816] i40e 0000:03:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
> [   16.369877] i40e 0000:03:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
> [   16.370941] i40e 0000:02:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
> [   16.372005] i40e 0000:02:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
> [   16.373029] i40e 0000:03:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>
> Some parameters that are set for these NICs:
>         ip link set up dev $i
>         ethtool -A $i autoneg off rx off tx off
>         ethtool -G $i rx 1024 tx 2048
>         ip link set $i txqueuelen 1000
>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>         ethtool -L $i combined 6
>         #ethtool -N $i rx-flow-hash udp4 sdfn
>         ethtool -K $i ntuple on
>         ethtool -K $i gro off
>         ethtool -K $i tso off
>
>
>
>
Also, after turning TSO/GRO on, memory usage changes and the leak gets faster.
The image below shows memory usage before the change, with TSO/GRO off, and
after enabling TSO/GRO:

https://ibb.co/dTqBY6
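
To put a number on the leak rate instead of reading it off a graph, something
like the sketch below could be used. It is only an illustration: the function
names meminfo_kb and leak_rate_mb_per_hour are made up here and are not part of
any existing tool, and it assumes the kernel exposes MemAvailable in
/proc/meminfo. A drop in MemAvailable is only a rough proxy for a leak, so
compare runs under similar traffic.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report) for estimating how
# fast available memory shrinks, based on two /proc/meminfo samples.

# Print the value in kB of one meminfo field, e.g. MemAvailable.
# The optional second argument is the file to read (default /proc/meminfo).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Arguments: first sample (kB), second sample (kB), seconds between them.
# Prints the loss rate in MB per hour; positive means memory is shrinking.
leak_rate_mb_per_hour() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (a - b) / 1024 * 3600 / s }'
}
```

For example, take before=$(meminfo_kb MemAvailable), wait 600 seconds, take
after=$(meminfo_kb MemAvailable), and then run
leak_rate_mb_per_hour "$before" "$after" 600. Running it once with TSO/GRO off
and once with TSO/GRO on would give two rates that can be compared directly.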


Thanks
Pawel



