[Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon probe

Keller, Jacob E jacob.e.keller at intel.com
Thu Jun 9 18:30:15 UTC 2016


> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Thursday, June 09, 2016 10:55 AM
> To: Keller, Jacob E <jacob.e.keller at intel.com>
> Cc: Intel Wired LAN <intel-wired-lan at lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state upon
> probe
> 
> On Thu, Jun 9, 2016 at 10:38 AM, Keller, Jacob E
> <jacob.e.keller at intel.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> Sent: Wednesday, June 08, 2016 2:34 PM
> >> To: Keller, Jacob E <jacob.e.keller at intel.com>
> >> Cc: Intel Wired LAN <intel-wired-lan at lists.osuosl.org>
> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> upon
> >> probe
> >>
> >> On Wed, Jun 8, 2016 at 2:23 PM, Keller, Jacob E
> >> <jacob.e.keller at intel.com> wrote:
> >> >> -----Original Message-----
> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> Sent: Wednesday, June 08, 2016 2:23 PM
> >> >> To: Keller, Jacob E <jacob.e.keller at intel.com>
> >> >> Cc: Intel Wired LAN <intel-wired-lan at lists.osuosl.org>
> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power state
> >> upon
> >> >> probe
> >> >>
> >> >> On Wed, Jun 8, 2016 at 2:18 PM, Keller, Jacob E
> >> >> <jacob.e.keller at intel.com> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> Sent: Wednesday, June 08, 2016 2:02 PM
> >> >> >> To: Keller, Jacob E <jacob.e.keller at intel.com>
> >> >> >> Cc: Intel Wired LAN <intel-wired-lan at lists.osuosl.org>
> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> state
> >> >> upon
> >> >> >> probe
> >> >> >>
> >> >> >> On Wed, Jun 8, 2016 at 12:11 PM, Keller, Jacob E
> >> >> >> <jacob.e.keller at intel.com> wrote:
> >> >> >> >> -----Original Message-----
> >> >> >> >> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> >> >> >> >> Sent: Tuesday, June 07, 2016 8:54 PM
> >> >> >> >> To: Keller, Jacob E <jacob.e.keller at intel.com>
> >> >> >> >> Cc: Intel Wired LAN <intel-wired-lan at lists.osuosl.org>
> >> >> >> >> Subject: Re: [Intel-wired-lan] [PATCH] fm10k: restore D0 power
> >> state
> >> >> >> upon
> >> >> >> >> probe
> >> >> >> >>
> >> >> >> >> On Tue, Jun 7, 2016 at 5:20 PM, Jacob Keller
> >> >> <jacob.e.keller at intel.com>
> >> >> >> >> wrote:
> >> >> >> >> > It seems that under some circumstances, such as after
> >> performing
> >> >> an
> >> >> >> >> > unbind following an AER injection recovery, the power state
> of a
> >> VF
> >> >> >> >> > isn't set to PCI_D0. Following the unbind if we attempt to bind
> >> the
> >> >> >> >> > device, the MSI-X initialization fails due to incorrect power
> state.
> >> >> >> >>
> >> >> >> >> That doesn't make much sense.  I didn't think the VFs had their
> own
> >> >> >> >> power state control.  After all it isn't as if they could switch off
> >> >> >> >> while the PF is active on the host.
> >> >> >> >>
> >> >> >> >> > Fix this by always calling pci_set_power_state(pdev, PCI_D0)
> on
> >> >> driver
> >> >> >> >> > device probe.
> >> >> >> >>
> >> >> >> >> That should be redundant.  There is already code that is called
> to
> >> >> >> >> transition the power state to D0 in pci_enable_device_mem.  As
> >> long
> >> >> as
> >> >> >> >> nobody else has enabled the device it will be ran.
> >> >> >> >>
> >> >> >> >
> >> >> >> > That's weird. In this case, after doing aer-inject on the PF, if we
> >> unbind
> >> >> >> then bind the VF device, it fails in pci_enable_msix_range which I
> >> traced
> >> >> into
> >> >> >> eventually a call that fails because the state isn't PCI_D0. If I add a
> >> >> >> "pci_set_power_state(pdev, PCI_D0)" call in the probe routine it
> >> works
> >> >> >> fine...?
> >> >> >> >
> >> >> >> > I don't really know why that'd be the case... thoughts?
> >> >> >>
> >> >> >> Does the VF even advertise support for PCI power management
> >> >> >> capability?  If I recall a VF shouldn't.  As such I think a call to
> >> >> >> pci_set_power_state should be returning an error and not doing
> >> >> >> anything so I would think this code only has any effect on the PF.
> So
> >> >> >> I am not certain what this patch is even doing if you aren't
> reloading
> >> >> >> the PF driver to trigger it.
> >> >> >>
> >> >> >> - Alex
> >> >> >
> >> >> > I'm not sure either but without it we're ending up with the failure in
> >> >> pci_msi_supported failing with -EINVAL because dev->current_state
> !=
> >> >> PCI_D0...
> >> >> >
> >> >> > It only happens after an unbind of the VF device then a bind. I think
> >> >> somehow the dev->current_state gets messed up but I don't know
> how
> >> this
> >> >> happens.
> >> >> >
> >> >> > Thanks,
> >> >> > Jake
> >> >>
> >> >> When you say unbind/bind are you talking about using the sysfs or is
> >> >> this via some other mechanism?
> >> >>
> >> >> - Alex
> >> >
> >> > Using sysfs. We echo the PCI address to fm10k's unbind, then echo it to
> >> bind.
> >> >
> >> > Thanks,
> >> > Jake
> >>
> >> I'm wondering if we have a reference count bug being created
> >> somewhere.  One thing you might try tracking is the dev->enable_cnt
> >> value for the VF through probe and after a remove to see if we are
> >> leaking that somewhere.  The other thing to look for is to figure out
> >> where we are setting dev->current_state to something other than
> >> PCI_D0.  I'd be interested in knowing where we are messing on this
> >> with a VF and what we are setting it to.
> >>
> >> - Alex
> >
> > Yep, when we call pci_enable_device_mem() in fm10k_io_slot_reset() we
> leak an enable count and I think that’s at least partially responsible for the
> failure since I bet VFs don't actually expose power state, so once enabled it
> won't change back from unknown?
> 
> No, it will but you have to actually be re-enabling the device to trigger it.
> 
> You might try replacing the call to pci_enable_device_mem with
> pci_reenable_device and see if that works for you.  I suspect this is
> probably a bug that is common in many of the drivers out there.
> 
> - Alex

The other possible fix is to put pci_device_disable into the .io_error_detected handler, which used to be there in the past. The downside to this call is that we have to check when we remove if we've already disabled to avoid disabling the device twice.

I'll look into using reenable instead and see how that goes.

Thanks,
Jake


More information about the Intel-wired-lan mailing list