[Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver

Wed Sep 4 10:06:27 UTC 2019

> 
> On 2019-09-03 21:39, Paul Menzel wrote:
> > Dear Tomas,
> >
> > On 2019-09-03 11:28, Winkler, Tomas wrote:
> >
> >>> On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote:
> >
> >>>> On 03.09.19 09:56, Gavin Lambert wrote:
> >>>>> On 2019-08-20 14:15, I wrote:
> >>>>>> Does anyone have any ideas about this?  Either towards further
> >>>>>> investigation or to a possible resolution?
> >>>>>>
> >>>>>> This is at the point of hardware internals now, so I have no idea
> >>>>>> how to proceed in either area.
> >>>>>
> >>>>> To recap (plus some new info):
> >>>>>
> >>>>> 1. I am using a kernel module which uses the code from the e1000e
> >>>>> driver to communicate with the hardware without actually
> >>>>> registering it as a Linux netdev.  (This is partly because it can
> >>>>> get used in a Xenomai context outside of Linux itself, although
> >>>>> I'm not doing that myself.)  This historically works fine.
> >>>>>
> >>>>> 2. On certain Linux versions, I encountered an issue where
> >>>>> disconnecting the network cable and reconnecting it almost always
> >>>>> results in not being able to send any packets.  (I cannot
> >>>>> determine if receiving packets works in this case, as the network
> >>>>> design will not receive packets unless some are sent first.)
> >>>>> Restarting the driver (rmmod+modprobe) does recover from this case
> >>>>> (until the next link loss), but simply replugging the cable never does.
> >>>>>
> >>>>> 3. The problem was observed with both I219-V and I219-LM (on
> >>>>> motherboard), but was *not* observed with 82571EB (PCIE).  The
> >>>>> problem was not observed with a motherboard igb-based I211.  I
> >>>>> suspect the issue is limited to motherboard-based e1000e adapters.
> >>>>> (Or perhaps there's something different about how the IGBs are
> >>>>> internally connected.)
> >>>>>
> >>>>> 4. The problem does not occur when the e1000e driver is registered
> >>>>> "normally" as a Linux netdev.
> >>>>>
> >>>>> 5. The problem was introduced by "mei: me: allow runtime pm for
> >>>>> platform with D0i3" (which has been backported to 4.4+, as far as
> >>>>> I can tell).
> >>>>> Excluding this commit reliably resolves the issue and including it
> >>>>> reliably breaks it.
> >>>>
> >>>> The commit hash in the master branch is
> >>>> cc365dcf0e56271bedf3de95f88922abe248e951 and is there since
> >>>> v4.16-rc1.
> >>>>
> >>>> Strange, that it is in 4.4 and 4.9, as it was only tagged for
> >>>> v4.13+.
> >>>>
> >>>>> 6. Applying the previously suggested patch
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b62a05712f35a7fa5f5e56
> >>>>> has no effect; the E1000_STATUS_PCIM_STATE bit is not set when
> >>>>> the issue occurs.
> >>>>>
> >>>>> 7. Given the content of the change in #5, I assumed that the
> >>>>> problem was power-management related, perhaps a side effect of the
> >>>>> e1000e driver not being registered as a netdev.  (So perhaps
> >>>>> something thinks that no devices are in use and turns something
> >>>>> off?)
> >>>>>
> >>>>> 8. I've previously posted register dumps from an e1000e in both
> >>>>> the "normal" and "link up but not transmitting" states.  They
> >>>>> seemed very similar, but as I'm not familiar with the register
> >>>>> meanings I may have overlooked something significant.  (Note that
> >>>>> the dumps were captured inside the watchdog task, when it detects
> >>>>> link up but before it sets E1000_TCTL_EN.)
> >>>>>
> >>>>> 9. I enabled debug logging in the mei driver; it logs a couple of
> >>>>> runtime_idles and then a runtime_suspend during system startup.
> >>>>> (I added a log to runtime_resume that is missing in the driver
> >>>>> source, but it appears this does not get called in my scenario.)
> >>>>> Note that the e1000e driver is still working ok after this.. at
> >>>>> least at first.
> >>>>>
> >>>>> 10. "cat /sys/bus/devices/pci0000:00/0000:00:16.0/power/runtime_status"
> >>>>> => "suspended"
> >>>>>      "cat /sys/bus/devices/pci0000:00/0000:00:16.0/mei/mei0/power/runtime_status"
> >>>>> => "unsupported"
> >>>>>      "cat /sys/bus/devices/pci0000:00/0000:00:1f.0/power/runtime_status"
> >>>>> => "active"
> >>>>>      "cat /sys/bus/devices/pci0000:00/0000:00:1f.6/power/runtime_status"
> >>>>> => "active" (this is the actual NIC)
> >>>>>      These don't change between the working and non-working states.
> >>>>> (It's possible that some other device does, but I haven't found it
> >>>>> yet.)
> >>>>>
> >>>>> 11. I did try forcing the above to unsuspend, but this did not
> >>>>> recover from the e1000e issue.
> >>>>>
> >>>>> 12. I also tried calling e1000e_reset on link-down.  This produces
> >>>>> different register output on link-up, but doesn't recover from the
> >>>>> issue.
> >>>>>
> >>>>> 13. I also tried recompiling the kernel with CONFIG_PM disabled
> >>>>> (no power management).  This *does* resolve the problem (but is a
> >>>>> very big hammer).
> >>>>>
> >>>>> 14. Possibly also of interest is that if I do *both* #12 and #13,
> >>>>> the problem remains (suggesting #12 was counter-productive).
> >>>>>
> >>>>> FYI the hardware on one of the test machines is as follows:
> >>>>>      00:00.0 Host bridge: Intel Corporation Device 591f (rev 05)
> >>>>>      00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05)
> >>>>>      00:02.0 VGA compatible controller: Intel Corporation Device 5912 (rev 04)
> >>>>>      00:08.0 System peripheral: Intel Corporation Skylake Gaussian Mixture Model
> >>>>>      00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
> >>>>>      00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
> >>>>>      00:15.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #0 (rev 31)
> >>>>>      00:15.1 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #1 (rev 31)
> >>>>>      00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
> >>>>>      00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
> >>>>>      00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #19 (rev f1)
> >>>>>      00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #20 (rev f1)
> >>>>>      00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
> >>>>>      00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #11 (rev f1)
> >>>>>      00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO UART #0 (rev 31)
> >>>>>      00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
> >>>>>      00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
> >>>>>      00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
> >>>>>      00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
> >>>>>      02:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
> >>>>>      03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
> >>>>>      05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
> >
> > (Tomas, your MUA wrapped the lines messing up the formatting.)

Sorry, it's Outlook.

> >
> >>>>> I'm happy to add any code instrumentation or make any other
> >>>>> changes needed to locate and resolve the problem, and I can
> >>>>> readily reproduce it -- I'm just at a complete loss as to where
> >>>>> to start looking, and am still hoping for some suggestions in
> >>>>> that regard.
> >>>>>
> >>>>> If there's anywhere (or anyone) else better for me to talk to
> >>>>> about this issue, please let me know that too.
> >>>>
> >>>> It is not clear to me if this is still reproducible on Linux
> >>>> 5.3-rc7 (or Linus’ master branch).
> >>>>
> >>>> If it is, this is definitely a regression, and the commits need to
> >>>> be reverted due to Linux’ no regression policy.
> >>>
> >>> So I should revert this from 4.4.y and 4.9.y?
> >>
> >> The issue is not in the mei driver, it is in the e1000e driver. To
> >> the best of my knowledge there should be a fix; Vitaly, can it be
> >> backported to older kernels?
> >
> > Tomas, backporting the commit supposedly fixing this does *not* help.

I hope that Vitaly can address that.

> > Also, it does not matter for the no regression policy.

Reverting this commit for everyone has power consumption implications, while the issue is present only on some platforms.
On your particular system you can still disable runtime power management via sysfs, or permanently with a udev rule,
e.g. ATTR{../../power/control}="on"
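A minimal sketch of that workaround in Python, in case it helps. It assumes the mei device is the PCI function 0000:00:16.0 from the lspci listing quoted above, and that the standard /sys/bus/pci/devices/<address>/power/control attribute is used ("on" keeps the device powered, "auto" allows runtime suspend). The function names are only illustrative, and the SUBSYSTEM/KERNEL match keys in the generated rule are my guess at a complete rule around the ATTR fragment above; please verify them with "udevadm info" on the affected machine.

```python
from pathlib import Path

# Assumption: PCI address of the mei (HECI) function, taken from the
# lspci listing quoted earlier in this thread.
MEI_PCI_ADDR = "0000:00:16.0"


def control_path(pci_addr: str, sysfs_root: str = "/sys") -> Path:
    """Return the runtime PM control attribute for a PCI device."""
    return Path(sysfs_root, "bus", "pci", "devices", pci_addr, "power", "control")


def forbid_runtime_pm(pci_addr: str, sysfs_root: str = "/sys") -> str:
    """Write "on" so the device is kept powered; return the previous setting.

    Needs root on a real system and does not persist across reboots.
    """
    path = control_path(pci_addr, sysfs_root)
    previous = path.read_text().strip()
    path.write_text("on\n")
    return previous


def udev_rule(kernel_name: str = "mei0") -> str:
    """Build a persistent rule around the ATTR fragment suggested above.

    The SUBSYSTEM/KERNEL match keys are assumptions, not a tested rule.
    """
    return (
        f'SUBSYSTEM=="mei", KERNEL=="{kernel_name}", '
        'ATTR{../../power/control}="on"'
    )
```

Calling forbid_runtime_pm(MEI_PCI_ADDR) as root applies the setting until the next reboot; the line returned by udev_rule() would go into a file under /etc/udev/rules.d/ to make it permanent.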

> >
> > Let’s wait until Gavin can confirm if it is happening with Linux
> > 5.3-rc7.
> 
> As noted above (and in a prior email), the problem doesn't occur when using
> the driver "normally" within Linux.  The triggering environment is where the
> driver init/send/receive code is being executed directly
> *without* being registered as a Linux netdev.
> 
> It is likely that the "real problem" is some side effect of this, such as
> something checking if a child device is in use or powered down but it's not
> registered.
> 
> My environment is currently based on this tree:
> 
> > Using this kernel tree:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/log/?h=v4.9-rt&ofs=3120
> >
> > I've identified that the code at tag v4.9.126 is "good" and the code
> > at tag v4.9.127 is "bad".
> (I then narrowed it down to that specific commit.)
> 
> To reiterate, there is probably no problem with standard usage of the
> drivers as part of Linux.
> 
> But in this particular non-standard-edge-case-usage, there seems to be some
> unfortunate interaction between the mei driver power management change
> and link-loss in onboard e1000e, and I'm trying to figure out the cause and
> hopefully a fix/workaround (or at least one less serious than disabling power
> management entirely).
This is an underlying issue; I don't think you will be able to resolve it yourself. The e1000e maintainers should provide the fix.
Unfortunately I cannot really fix this issue from the mei side.

> 
> Some more context from my original email:
> > I'm using a system with an e1000e network driver which has been
> > patched to bypass the regular Linux network stack (because it can get
> > called from a Xenomai RT context, among other reasons -- although in
> > my case I'm not doing that).  The complete source for the patched
> > version of the code can be found here:
> >
> > https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c
> > (There are some minor changes to other files, but the
> > majority of changes are only to this file.  You can see just the
> > changes at
> > https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions
> > .)
> >
> > It was originally based on the in-kernel e1000e driver as of Linux
> > 4.9.65.  (I'm not the person who originally made the patches, but I am
> > the person who rebased them to kernel 4.9 and I'm the one trying to
> > maintain them for newer kernel versions.  Though I'm also not the
> > person who made that github repo.)

You will eventually need to incorporate the e1000e fix into your code base as well, once it is available.
For now the easiest workaround is to disable power management on mei from outside on affected platforms.

Tomas