[Intel-wired-lan] Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
Thorsten Leemhuis
regressions at leemhuis.info
Sun Apr 10 08:21:22 UTC 2022
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.
Hey Sasha and e1000e developers, what's up there? Two and a half weeks
ago it seemed the root cause for this regression was found and a
proposed patch to fix it was added to the bugzilla ticket and even
tested by the reporter. But since then nothing happened afaics. What's
up here? Or did I miss something?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.
#regzbot poke
On 24.03.22 20:36, Neftin, Sasha wrote:
> On 3/24/2022 17:09, Neftin, Sasha wrote:
>> On 3/24/2022 12:37, Thorsten Leemhuis wrote:
>>> Hi, this is your Linux kernel regression tracker.
>>>
>>> I noticed a regression report in bugzilla.kernel.org that afaics nobody
>>> acted upon since it was reported about a week ago, that's why I decided
>>> to forward it to the lists and a few relevant people to the CC. To quote
>>> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
>>>
>>>> [reply] [−] Description James 2022-03-15 13:45:38 UTC
>>>>
>>>> I run Arch linux on an Intel NUC 8i3BEH which has the following
>>>> network card:
>>>>
>>>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection
>>>> (6) I219-V (rev 30)
>>>> DeviceName: LAN
>>>> Subsystem: Intel Corporation Device 2074
>>>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
>>>> >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>> Latency: 0
>>>> Interrupt: pin A routed to IRQ 135
>>>> Region 0: Memory at c0b00000 (32-bit, non-prefetchable)
>>>> [size=128K]
>>>> Capabilities: [c8] Power Management version 3
>>>> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>>> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>> Address: 00000000fee003d8 Data: 0000
>>>> Kernel driver in use: e1000e
>>>> Kernel modules: e1000e
>>>>
>>>> I found a major regression since the previous few kernel versions
>>>> which causes several odd issues, most noteably I use the machine to
>>>> stream live tv via TVheadend and was finding this to be unusable
>>>> (picture freezes and sound breaks up very badly with continuity
>>>> errors in the TVheadend logfile).
>>>>
>>>> I found the issue was introduced since the 5.14 kernel, and have
>>>> eventually got round to performing a git bisect, which landed upon
>>>> the following commit:
>>>>
>>>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M
>>>>
>>>> Indeed, if I revert this single commit then the problem is resolved.
>>>>
>>>> To help diagnose the issue I applied the following patch to capture
>>>> the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d
>>>> variables:
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> index d60e2016d..f4e5ffbcd 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>> e1000_hw *hw, bool link)
>>>> u16 max_ltr_enc_d = 0; /* maximum LTR decoded by platform */
>>>> u16 lat_enc_d = 0; /* latency decoded */
>>>> u16 lat_enc = 0; /* latency encoded */
>>>> + struct e1000_adapter *adapter = hw->adapter;
>>>>
>>>> if (link) {
>>>> u16 speed, duplex, scale = 0;
>>>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>> e1000_hw *hw, bool link)
>>>> ((max_ltr_enc &
>>>> E1000_LTRV_SCALE_MASK)
>>>> >> E1000_LTRV_SCALE_SHIFT)));
>>>>
>>>> + e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc,
>>>> max_ltr_enc);
>>>> + e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u",
>>>> lat_enc_d, max_ltr_enc_d);
>>>> +
>>>> if (lat_enc_d > max_ltr_enc_d)
>>>> lat_enc = max_ltr_enc;
>>>> }
>>>>
>>>> With this in place I see the following in dmesg:
>>>>
>>>> [ 3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>>>> [ 3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>>> [ 3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate
>>>> (ints/sec) set to dynamic conservative mode
>>>> [ 3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
>>>> registered PHC clock
>>>> [ 3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
>>>> x1) 94:c6:91:ae:b3:7b
>>>> [ 3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
>>>> Connection
>>>> [ 3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No:
>>>> FFFFFF-0FF
>>>> [ 6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233
>>>> max_ltr_enc=4099
>>>> [ 6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368
>>>> max_ltr_enc_d=0
>>>> [ 6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps
>>>> Full Duplex, Flow Control: Rx/Tx
>>>>
>>>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>>>
>>>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop
>>>> latency to max_ltr_enc (i.e. 4099) where it would have previously
>>>> been set to 2233 in this particular example. This seems to be where
>>>> the problem lies.
>>>>
>>>> Prior to commit 44a13a5:
>>>>
>>>> if (lat_enc > max_ltr_enc)
>>>> lat_enc = max_ltr_enc;
>>>>
>>>> After commit 44a13a5:
>>>>
>>>> if (lat_enc_d > max_ltr_enc_d)
>>>> lat_enc = max_ltr_enc;
>>>>
>>>>
>>>> I'm not sure whether it was intended for this new code to take
>>>> effect for an I219 since the commit message on 44a13a5 indicates it
>>>> was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting
>>>> set to 0?
>>>>
>>>
>>> BTW, that commit is from Sasha Neftin.
>> Hello Thorsten,
>> I've expected follow decoded values (link 1G)
>> lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185)
>> max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3)
>>
>> scale 0 - 1
>> scale 1 - 32
>> scale 2 - 1024
>> scale 3 - 32768
>> scale 4 - 1048576 (nano s)
>>
>> I've separated calculate:
>> e_info("e1000e: 1* max_ltr_enc_d: %d\n",
>> max_ltr_enc & E1000_LTRV_VALUE_MASK);
>> e_info("e1000e: 2* max_ltr_enc_d: %d\n",
>> (1U << (E1000_LTRV_SCALE_FACTOR *
>> ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>> >> E1000_LTRV_SCALE_SHIFT))));
>> I would expect:
>> 1* max_ltr_enc_d (value): 3
>> 2* max_ltr_enc_d (scale): 1048576
>> and so: value * scale
>> 1048576*3 = 3145728ns
>>
>> Please, let's check it. (I am wondering if over-calculate it)
>> Thanks,
>> Sasha
> ok. Overflow... Instead of
> + u16 max_ltr_enc_d = 0; /* maximum LTR decoded by platform */
> + u16 lat_enc_d = 0; /* latency decoded */
>
> Should be:
> + u32 max_ltr_enc_d = 0; /* maximum LTR decoded by platform */
> + u32 lat_enc_d = 0; /* latency decoded */
> I will process the patch address this overflow and some e_dbg to
> eliminate calculation.
>
> sudo cat /sys/kernel/debug/pmc_core/ltr_show
> SOUTHPORT_A LTR: RAW: 0x0 Non-Snoop(ns):
> 0 Snoop(ns): 0
> SOUTHPORT_B LTR: RAW: 0x0 Non-Snoop(ns):
> 0 Snoop(ns): 0
> SATA LTR: RAW: 0x900f Non-Snoop(ns):
> 0 Snoop(ns): 15728640
> GIGABIT_ETHERNET LTR: RAW: 0x88b988b9 Non-Snoop(ns):
> 189440 Snoop(ns): 189440
> XHCI LTR: RAW: 0x891a Non-Snoop(ns):
> 0 Snoop(ns): 288768
>
>>>
>>> Could somebody take a look into this? Or was this discussed somewhere
>>> else already? Or even fixed?
>>>
>>> Anyway, to get this tracked:
>>>
>>> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
>>> #regzbot from: James <jahutchinson99 at googlemail.com>
>>> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14
>>> onwards
>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>
>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>>> reports on my table. I can only look briefly into most of them and lack
>>> knowledge about most of the areas they concern. I thus unfortunately
>>> will sometimes get things wrong or miss something important. I hope
>>> that's not the case here; if you think it is, don't hesitate to tell me
>>> in a public reply, it's in everyone's interest to set the public record
>>> straight.
>>>
>>
>
>
>
More information about the Intel-wired-lan
mailing list