[Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
Paul Menzel
pmenzel at molgen.mpg.de
Thu Jul 18 08:22:04 UTC 2019
[private answer]
Dear Gavin,
Your messages were delivered to the list subscribers.
On 18.07.19 10:06, Gavin Lambert wrote:
> On 2019-07-12 15:23, I wrote:
>> On 2019-07-11 18:50, I wrote:
>>> On a Debian system with kernel linux-image-4.9.0-4-rt-amd64 (4.9.65)
>>> installed, this works perfectly. It also works perfectly with
>>> linux-image-4.9.0-8-rt-amd64 (4.9.110).
>>>
>>> However, with kernel linux-image-4.9.0-9-rt-amd64 (4.9.168) installed
>>> (and no other changes to the system other than building the patched
>>> e1000e module against this kernel's headers), something weird happens
>>> when the driver is running in its alternate "ecdev" mode.
> [...]
>> Since this was mostly just a rebase error (you can see a similar
>> change in the old location of this code), I'm not sure if this helps
>> narrow down the source of the problem between 4.9.110 and 4.9.168 or
>> not. I'm still looking for ideas for that.
>
> Using this kernel tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/log/?h=v4.9-rt&ofs=3120
>
> I've identified that the code at tag v4.9.126 is "good" and the code at
> tag v4.9.127 is "bad".
>
> I've done a bisect (twice, from different starting points) and both
> times settled on this commit as the one which introduced the problem I'm
> experiencing:
>
> commit c0b809985a7a418fcc3361c239ae79250245282d (refs/bisect/bad)
> Author: Tomas Winkler <tomas.winkler at intel.com>
> Date: Tue Jan 2 12:01:41 2018 +0200
>
> mei: me: allow runtime pm for platform with D0i3
>
> commit cc365dcf0e56271bedf3de95f88922abe248e951 upstream.
>
> From the pci power documentation:
> "The driver itself should not call pm_runtime_allow(), though. Instead,
> it should let user space or some platform-specific code do that (user space
> can do it via sysfs as stated above)..."
>
> However, the S0ix residency cannot be reached without MEI device getting
> into low power state. Hence, for mei devices that support D0i3, it's better
> to make runtime power management mandatory and not rely on the system
> integration such as udev rules.
> This policy cannot be applied globally as some older platforms
> were found to have broken power management.
>
> Cc: <stable at vger.kernel.org> v4.13+
> Cc: Rafael J. Wysocki <rafael.j.wysocki at intel.com>
> Signed-off-by: Tomas Winkler <tomas.winkler at intel.com>
> Reviewed-by: Alexander Usyskin <alexander.usyskin at intel.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
This commit was added upstream in v4.16-rc1 (as cc365dcf0e56).
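One way to test the runtime-PM theory without rebuilding the kernel: the commit calls pm_runtime_allow() on the MEI PCI device, and user space can undo that by writing "on" to that device's power/control attribute in sysfs ("auto" is the state pm_runtime_allow() selects). A minimal Python sketch, assuming root access; set_runtime_pm() is a hypothetical helper name, and the caller has to supply the device directory:

```python
import os


def set_runtime_pm(device_dir: str, allow: bool) -> str:
    """Write <device_dir>/power/control and return its previous value.

    "auto" lets the PM core runtime-suspend the device (the state
    pm_runtime_allow() sets in the kernel); "on" keeps the device powered
    (the state pm_runtime_forbid() sets).
    """
    control = os.path.join(device_dir, "power", "control")
    with open(control) as f:
        previous = f.read().strip()
    with open(control, "w") as f:
        f.write("auto" if allow else "on")
    return previous
```

For example, set_runtime_pm(mei_dir, allow=False), where mei_dir (hypothetical) is the MEI controller's directory under /sys/bus/pci/devices/, as found with lspci. If the e1000e problem went away after that, it would point at the MEI device's runtime suspend rather than at the network driver itself.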
> It is reproducible every time; if I build at the parent commit
> (3d3432580911) then the driver works, and if I add the commit above then
> it fails.
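For reference, the bisection step is a binary search over the commit range, which is why a test that reproduces every time matters: a single wrong good/bad answer sends the search into the wrong half. A simplified Python sketch, assuming a linear history with exactly one good-to-bad transition (git bisect itself also handles merges and skipped commits); first_bad() and is_bad() are hypothetical names:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an ordered good..bad range.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    result flips from good to bad exactly once along the sequence.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # build + test this commit
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With roughly 2^n commits between v4.9.126 and v4.9.127, this needs only about n build-and-test rounds.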
>
> However it's unclear to me how this is affecting my modified e1000e
> driver in this way, except that it is perhaps power management related?
>
> Since it appears to be a pm_runtime-related thing, just as an experiment
> I did try commenting out every single call to pm_runtime* functions in
> netdev.c, but this did not resolve the problem. Ditto for anything with
> the word "suspend" in it. I also tried adding e_info() logging calls to
> most places that used pm_ calls other than pm_runtime_get/put (and in
> particular, in all of the pm_ops callbacks), and none of them were hit
> during the problem events.
>
> And even when it's not working, if I `cat` various things in
> `/sys/bus/pci/.../power/` on the adapter device, it appears to all be
> non-suspended, which makes me doubt that it really is a PM issue, unless
> I'm just looking in the wrong places.
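Note that the commit changes the runtime PM policy of the MEI device, which is a separate PCI function from the network adapter, so the adapter's own power/ directory would not necessarily show anything. A small Python sketch that collects the runtime-PM attributes control, runtime_status and runtime_suspended_time for every device under a directory such as /sys/bus/pci/devices, so the MEI controller can be checked while the problem occurs; power_summary() is a hypothetical helper, and attributes that cannot be read are reported as "?":

```python
import os
from typing import Dict

ATTRS = ("control", "runtime_status", "runtime_suspended_time")


def power_summary(devices_root: str) -> Dict[str, Dict[str, str]]:
    """Map each device under devices_root to its runtime-PM attributes."""
    summary: Dict[str, Dict[str, str]] = {}
    for name in sorted(os.listdir(devices_root)):
        power_dir = os.path.join(devices_root, name, "power")
        values: Dict[str, str] = {}
        for attr in ATTRS:
            try:
                with open(os.path.join(power_dir, attr)) as f:
                    values[attr] = f.read().strip()
            except OSError:
                # Missing (e.g. runtime PM not built in) or unreadable.
                values[attr] = "?"
        summary[name] = values
    return summary
```

Calling power_summary("/sys/bus/pci/devices") before and after the link loss and comparing the MEI entry's runtime_status ("active" vs. "suspended") would show whether it changes at the moment the driver stops sending.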
Now that you have found the faulty commit, please CC the commit author, the
reviewers, and the subsystem maintainers, and maybe even the regression
address.
If you have time, please check the Linux master tree to see whether a fix
for this has been added since, or whether you still need to revert the commit.
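To check master for a fix, one approach is to look for a commit whose Fixes: tag names the upstream commit cc365dcf0e56 (a fix on master would reference the upstream hash, not the stable backport c0b809985a7a). A Python sketch that filters git log output, assuming it was produced with `git log --format='%H%n%B%x00'` (full hash, raw body, NUL terminator); commits_fixing() is a hypothetical helper, and a fix without a Fixes: tag would not be found this way:

```python
import re
from typing import List

# Matches lines such as: Fixes: cc365dcf0e56 ("mei: me: ...")
FIXES_RE = re.compile(r"^\s*Fixes:\s*([0-9a-f]{7,40})\b", re.IGNORECASE | re.MULTILINE)


def commits_fixing(log_text: str, sha: str) -> List[str]:
    """Return hashes of commits whose Fixes: tag refers to sha."""
    target = sha.lower()
    found: List[str] = []
    for record in log_text.split("\x00"):
        record = record.strip()
        if not record:
            continue
        commit, _, body = record.partition("\n")
        for ref in FIXES_RE.findall(body):
            ref = ref.lower()
            # Fixes: tags use abbreviated hashes, so compare by prefix.
            if target.startswith(ref) or ref.startswith(target):
                found.append(commit)
                break
    return found
```

For example, feed it the output of `git log --format='%H%n%B%x00' v4.16-rc1..master -- drivers/misc/mei` run in a mainline checkout.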
Kind regards,
Paul