[Intel-wired-lan] [PATCH net] i40e: avoid NULL pointer dereference and recursive errors on early PCI error
Bowers, AndrewX
andrewx.bowers at intel.com
Fri Sep 30 22:55:28 UTC 2016
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Guilherme G. Piccoli
> Sent: Tuesday, September 27, 2016 2:15 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher at intel.com>;
> intel-wired-lan at lists.osuosl.org
> Cc: netdev at vger.kernel.org; gpiccoli at linux.vnet.ibm.com
> Subject: [Intel-wired-lan] [PATCH net] i40e: avoid NULL pointer dereference
> and recursive errors on early PCI error
>
> Although rare, it's possible to hit a PCI error early in device probe, when
> some structs are not yet fully initialized and others may not be
> initialized at all, leading to a NULL pointer dereference.
>
> The i40e driver currently misbehaves if the device hits such an early PCI
> error: first, the struct i40e_pf might not be attached to the pci_dev yet,
> leading to a NULL pointer dereference on access to pf->state.
>
> Even checking whether the struct is NULL and avoiding the access in that
> case isn't enough, since the driver cannot recover from a PCI error that
> early; in our experiments we saw multiple failures in the kernel log, like:
>
> [549.664] i40e 0007:01:00.1: Initial pf_reset failed: -15
> [549.664] i40e: probe of 0007:01:00.1 failed with error -15
> [...]
> [871.644] i40e 0007:01:00.1: The driver for the device stopped because the
> device firmware failed to init. Try updating your NVM image.
> [871.644] i40e: probe of 0007:01:00.1 failed with error -32
> [...]
> [872.516] i40e 0007:01:00.0: ARQ: Unknown event 0x0000 ignored
>
> Between the first probe failure (error -15) and the second (error -32),
> another PCI error happened due to the first bad probe. Also, the driver
> started to flood the console with those ARQ event messages.
>
> This patch prevents these issues by allowing the error recovery mechanism
> to remove the failed device from the system instead of trying to recover
> from early PCI errors during device probe.
>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
> 1 file changed, 6 insertions(+)
Tested-by: Andrew Bowers <andrewx.bowers at intel.com>
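
For readers without the diff at hand, the idea in the commit message can be
sketched roughly as follows. This is a userspace mock, not the patch itself:
the enum, the two structs, pci_get_drvdata() and the handler name
error_detected_sketch() are stand-ins defined here so the example compiles
on its own, and the "suspended" flag value is hypothetical. The assumption
is that the driver's PCI error handler finds no i40e_pf attached to the
pci_dev when the error arrives during probe, and in that case asks the
recovery core to disconnect the device rather than attempt a reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for kernel definitions; the numeric values are arbitrary here. */
typedef enum {
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
} pci_ers_result_t;

struct i40e_pf {
    unsigned long state;        /* simplified: real driver uses state bits */
};

struct pci_dev {
    void *driver_data;          /* NULL until probe attaches the pf */
};

/* Mock of the kernel helper of the same name. */
static void *pci_get_drvdata(struct pci_dev *pdev)
{
    return pdev->driver_data;
}

/* Hypothetical flag marking the pf as quiesced after an error. */
#define SKETCH_SUSPENDED 1UL

/*
 * Sketch of an error_detected-style callback. If the error hits before
 * probe has attached the pf, there is nothing to shut down and recovery
 * cannot succeed, so request removal of the device.
 */
static pci_ers_result_t error_detected_sketch(struct pci_dev *pdev)
{
    struct i40e_pf *pf;

    assert(pdev != NULL);
    pf = pci_get_drvdata(pdev);

    if (!pf) {
        fprintf(stderr,
                "Cannot recover - error happened during device probe\n");
        return PCI_ERS_RESULT_DISCONNECT;
    }

    /* Normal path: pf exists, so touching pf->state is safe. */
    pf->state |= SKETCH_SUSPENDED;
    return PCI_ERS_RESULT_NEED_RESET;
}
```

The early return is what avoids both problems described above: pf->state is
never touched through a NULL pointer, and no reset/re-probe cycle is started
for a device that never finished probing.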