[Intel-wired-lan] [PATCH net-next v5 1/4] igb: add support of RX network flow classification

'Matt Porter' mporter at konsulko.com
Thu Jun 30 15:28:16 UTC 2016


On Thu, Jun 30, 2016 at 01:20:44AM +0000, Brown, Aaron F wrote:
> > From: Brown, Aaron F
> > Sent: Wednesday, June 29, 2016 1:06 PM
> > To: Matt Porter <mporter at konsulko.com>
> > Cc: Gangfeng <gangfeng.huang at ni.com>; intel-wired-lan at lists.osuosl.org;
> > Ruhao Gao <ruhao.gao at ni.com>
> > Subject: RE: [Intel-wired-lan] [PATCH net-next v5 1/4] igb: add support of RX
> > network flow classification
> > 
> > > From: Matt Porter [mailto:mporter at konsulko.com]
> > > Sent: Wednesday, June 29, 2016 12:13 PM
> > > To: Brown, Aaron F <aaron.f.brown at intel.com>
> > > Cc: Gangfeng <gangfeng.huang at ni.com>; intel-wired-lan at lists.osuosl.org;
> > > Ruhao Gao <ruhao.gao at ni.com>
> > > Subject: Re: [Intel-wired-lan] [PATCH net-next v5 1/4] igb: add support of RX network flow classification
> > >
> > > On Mon, May 16, 2016 at 10:09:04PM +0000, Brown, Aaron F wrote:
> > > > > From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On Behalf Of Gangfeng
> > > > > Sent: Monday, May 9, 2016 2:28 AM
> > > > > To: intel-wired-lan at lists.osuosl.org
> > > > > Cc: Gangfeng Huang <gangfeng.huang at ni.com>; Ruhao Gao
> > > > > <ruhao.gao at ni.com>
> > > > > Subject: [Intel-wired-lan] [PATCH net-next v5 1/4] igb: add support of RX
> > > > > network flow classification
> > > > >
> > > > > From: Gangfeng Huang <gangfeng.huang at ni.com>
> > > > >
> > > > > This patch is meant to allow RX network flow classification rules (Rx
> > > > > filters) to be inserted and removed via ethtool. The ethtool interface
> > > > > has its own rules manager.
> > > > >
> > > > > Show all filters:
> > > > > $ ethtool -n eth0
> > > > > 4 RX rings available
> > > > > Total 2 rules
> > > > >
> > > > > Signed-off-by: Ruhao Gao <ruhao.gao at ni.com>
> > > > > Signed-off-by: Gangfeng Huang <gangfeng.huang at ni.com>
> > > > > ---
> > > > >  drivers/net/ethernet/intel/igb/igb.h         |  32 +++++
> > > > >  drivers/net/ethernet/intel/igb/igb_ethtool.c | 193 +++++++++++++++++++++++++++
> > > > >  drivers/net/ethernet/intel/igb/igb_main.c    |  44 ++++++
> > > > >  3 files changed, 269 insertions(+)
> > > >
> > > > This patch is causing 3/4 of my regression systems to fail.  Driver load
> > > > seems normal, but applying an IP address via ifconfig causes the following
> > > > splat in dmesg and /var/log/messages:
> > >
> > > Hi Aaron,
> > >
> > > I'm looking at this series on current net-next and am wondering if you
> > > saw this issue with just patch 1 applied or you meant the entire series?
> > 
> > Hi Matt,
> > 
> > My recollection is that I saw it with just patch 1 applied.  And my procedure
> > when I see an issue with a series is to try and isolate it to the individual patch
> > and reply to the one in the series that triggers the issue, so I am pretty sure it
> > was with this patch applied and the rest of the series not applied.
> 
> I was able to apply this (v5 1/4) patch to a recent version of Jeff's next-queue / dev-branch (with a little fuzz) and reproduce the problem without any difficulty on one of the systems that was previously triggering it.  It does occur with just this one patch applied.  This was with the system that has an i211 and a pair of 82580 ports.  I will still try to sort out whether it happens with just the i211 or with the pair of 82580s (I'm leaning towards the i211, as a system with a pair of 82580s and an i210 worked fine), but I have been searching for another i211, as the system in question is at the bottom of a rack with a bunch of systems stacked on top of it, making the card cage rather difficult to get to.

Ok, thanks for trying this again. I'm switching to that branch and
trying to get a multi-port system together to reproduce this here,
though it sounds like I may need to pick up more than the i210s I have
handy.

-Matt

> > > I've been working with this on an i210 and haven't reproduced your
> > > results yet either with just the (non-functional) first patch applied or
> > > the entire series. However, I noticed you had no problems on your system
> > > with an i210.
> > 
> > Correct, the system with an i210 included was one of the ones not affected
> > by this.  I'm not sure if that is because it isn't a problem with the i210 or
> > something more elusive, like the system's chipset or a variation in the .config.
> > 
> > >
> > > > ----------------------------------------------
> > > > May 16 14:37:50 u1486 kernel: Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.0b 11/06/2013
> > > > May 16 14:37:50 u1486 kernel: 0000000000000000 ffff880849ad3938 ffffffff813373d7 0000000000000007
> > > > May 16 14:37:50 u1486 kernel: 0000000000000006 0000000000000000 ffff88085c2f6770 ffff880849ad3a58
> > > > May 16 14:37:50 u1486 kernel: ffffffff810c4e13 ffff880849ad39f8 0000000000000005 0000000000000000
> > > > May 16 14:37:50 u1486 kernel: Call Trace:
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff813373d7>] dump_stack+0x6b/0xa4
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff810c4e13>] register_lock_class+0x523/0x5c0
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff8136644b>] ? check_preemption_disabled+0x1b/0x110
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff811f5655>] ? kfree+0x1a5/0x3a0
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff81366553>] ? __this_cpu_preempt_check+0x13/0x20
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff810c7ae0>] __lock_acquire+0x80/0x5d0
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff811f83f5>] ? __kmalloc+0x265/0x3a0
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa051864f>] ? kzalloc+0xf/0x20 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff810c80fa>] lock_acquire+0xca/0x240
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa0520c3f>] ? igb_configure+0xaf/0x1d0 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff815b958b>] ? netdev_rss_key_fill+0x5b/0xa0
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa052dfb9>] ? igb_vfta_set+0x189/0x1f0 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff816a8930>] _raw_spin_lock+0x40/0x80
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa0520c3f>] ? igb_configure+0xaf/0x1d0 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa051bf62>] ? igb_setup_rctl+0x22/0xb0 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa0520c3f>] igb_configure+0xaf/0x1d0 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa052408d>] __igb_open+0xfd/0x300 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff815ab260>] ? call_netdevice_notifiers_info+0x40/0x70
> > > > May 16 14:37:50 u1486 kernel: [<ffffffffa0524420>] igb_open+0x10/0x20 [igb]
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff815ac7f8>] __dev_open+0xb8/0x110
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff815ac5fc>] __dev_change_flags+0xac/0x180
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff815ac700>] dev_change_flags+0x30/0x70
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff815c6685>] ? lockdep_rtnl_is_held+0x15/0x20
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff816403a5>] devinet_ioctl+0x5b5/0x620
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff81156660>] ? trace_buffer_unlock_commit+0x60/0x80
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff81643033>] inet_ioctl+0x63/0x80
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff8158fd60>] sock_do_ioctl+0x30/0x70
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff815901b3>] sock_ioctl+0x73/0x280
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff8121f678>] vfs_ioctl+0x18/0x30
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff81220057>] do_vfs_ioctl+0x87/0x430
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff8100297e>] ? syscall_trace_enter_phase2+0x6e/0x280
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff81220492>] SyS_ioctl+0x92/0xa0
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff81002fd3>] do_syscall_64+0x63/0x130
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
> > > > May 16 14:37:50 u1486 kernel: [<ffffffff816a981a>] entry_SYSCALL64_slow_path+0x25/0x25
> > > > ----------------------------------------------
> > >
> > > Since it's doing dump_stack in register_lock_class, it appears some of
> > > the error output has been truncated before this stack trace. Can you confirm
> > > that this is the complete output logged? By inspection, I would expect
> > > to see one of the contextual messages from register_lock_class when it
> > > calls dump_stack.
> > 
> > I will see if I still have a copy of, or can reproduce the trace along with more
> > of the log messages leading up to it.
> 
> Yes, there is a tiny bit more info before the trace.  I cleared the dmesg queue before running ifconfig, and here is what I got.  I also checked /var/log/messages and did not see any extra information there:
> ------------------------------------------------------------------------------------------------------------------------

Aha, yes, that helps isolate it a bit further.

> pps_core: LinuxPPS API ver. 1 registered
> pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti at linux.it>
> PTP clock support registered
> dca service started, version 1.12.1
> igb: Intel(R) Gigabit Ethernet Network Driver - version 5.3.0-k
> igb: Copyright (c) 2007-2014 Intel Corporation.
> pps pps0: new PPS source ptp0
> igb 0000:08:00.0: added PHC on eth1
> igb 0000:08:00.0: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:08:00.0: eth1: (PCIe:2.5Gb/s:Width x1) a0:36:9f:0c:76:de
> igb 0000:08:00.0: eth1: PBA No: FFFFFF-0FF
> igb 0000:08:00.0: Using MSI-X interrupts. 2 rx queue(s), 2 tx queue(s)
> igb 0000:09:00.0: added PHC on eth2
> igb 0000:09:00.0: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:09:00.0: eth2: (PCIe:2.5Gb/s:Width x4) 00:1b:21:56:25:64
> igb 0000:09:00.0: eth2: PBA No: Unknown
> igb 0000:09:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
> igb 0000:09:00.1: added PHC on eth3
> igb 0000:09:00.1: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:09:00.1: eth3: (PCIe:2.5Gb/s:Width x4) 00:1b:21:56:25:65
> igb 0000:09:00.1: eth3: PBA No: Unknown
> igb 0000:09:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
> BUG: spinlock bad magic on CPU#6, ifconfig/5569
>  lock: 0xffff88007cbca080, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> CPU: 6 PID: 5569 Comm: ifconfig Not tainted 4.7.0-rc2_next_dev_rx_flow-00941-gb9e6e3f-dirty #2
> Hardware name: Supermicro X7DBX/X7DBX, BIOS 2.1 06/23/2008
>  0000000000000000 ffff880074393b08 ffffffff811e2937 0000000000000000
>  ffffffff81717dbc ffff88007cbca080 0000000000000000 ffff880074393b28
>  ffffffff81070724 ffff88007cbca080 ffff88007cbc8db0 ffff880074393b48
> Call Trace:
>  [<ffffffff811e2937>] dump_stack+0x53/0x74
>  [<ffffffff81070724>] spin_dump+0x86/0x8b
>  [<ffffffff8107074f>] spin_bug+0x26/0x28
>  [<ffffffff810708d5>] do_raw_spin_lock+0x29/0x12c
>  [<ffffffff811f55ad>] ? __do_once_done+0x71/0x78
>  [<ffffffff814b488b>] _raw_spin_lock+0x1e/0x22
>  [<ffffffffa00bd0d7>] igb_configure+0x2c8/0x3b1 [igb]
>  [<ffffffffa00c0295>] __igb_open+0xb3/0x509 [igb]
>  [<ffffffff81404799>] ? call_netdevice_notifiers_info+0x51/0x5a
>  [<ffffffffa00c083c>] igb_open+0xb/0xd [igb]
>  [<ffffffff81406542>] __dev_open+0xa7/0xf5
>  [<ffffffff814063aa>] __dev_change_flags+0xb5/0x14d
>  [<ffffffff81406465>] dev_change_flags+0x23/0x59
>  [<ffffffff814b2b4f>] ? mutex_lock+0x27/0x38
>  [<ffffffff8145cc49>] devinet_ioctl+0x296/0x576
>  [<ffffffff8145ecba>] inet_ioctl+0x92/0xaa
>  [<ffffffff813f1566>] sock_ioctl+0x204/0x229
>  [<ffffffff811092cf>] vfs_ioctl+0x13/0x23
>  [<ffffffff811099e2>] do_vfs_ioctl+0x5e8/0x621
>  [<ffffffff811fe00c>] ? __percpu_counter_add+0x8c/0xa8
>  [<ffffffff810dfca7>] ? do_munmap+0x2e7/0x301
>  [<ffffffff81109a5e>] SyS_ioctl+0x43/0x62
>  [<ffffffff814b499b>] entry_SYSCALL_64_fastpath+0x13/0x8f
> IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
> igb 0000:08:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> ------------------------------------------------------------------------------------------------------------------------
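The `.magic: 00000000` in the `BUG: spinlock bad magic` report above is what the CONFIG_DEBUG_SPINLOCK check prints when a lock is taken whose debug magic was never set; together with `.owner_cpu: 0` (an initialized, unowned lock would show -1), it is consistent with lock memory that was zero-filled but never passed through spin_lock_init(). The userspace mock below is a minimal sketch of that check, not kernel code. `mock_spinlock`, `mock_adapter`, the `nfc_lock` field and `mock_filter_restore()` are hypothetical names, and the restore step assumes that, like the `*filter_restore()` path discussed in this thread, it takes the lock even when there are no filters to restore. Unlike the kernel, which reports the bug and still takes the lock, the mock returns -1 so the condition can be tested.

```c
#include <stdint.h>
#include <stdio.h>

/* Value spin_lock_init() stores when CONFIG_DEBUG_SPINLOCK is enabled
 * (SPINLOCK_MAGIC in the kernel headers). */
#define MOCK_SPINLOCK_MAGIC 0xdead4eadu

/* Userspace stand-in for a debug spinlock: only the fields the check uses. */
struct mock_spinlock {
	int locked;
	uint32_t magic;
	int owner_cpu;
};

static void mock_spin_lock_init(struct mock_spinlock *l)
{
	l->locked = 0;
	l->magic = MOCK_SPINLOCK_MAGIC;
	l->owner_cpu = -1;	/* "no owner", as after a real init */
}

/* Returns 0 if taken; -1 where the debug check would report "bad magic". */
static int mock_spin_lock(struct mock_spinlock *l, int cpu)
{
	if (l->magic != MOCK_SPINLOCK_MAGIC) {
		fprintf(stderr,
			"BUG: spinlock bad magic on CPU#%d\n"
			" lock: %p, .magic: %08x, .owner_cpu: %d\n",
			cpu, (void *)l, (unsigned int)l->magic, l->owner_cpu);
		return -1;
	}
	l->locked = 1;
	l->owner_cpu = cpu;
	return 0;
}

static void mock_spin_unlock(struct mock_spinlock *l)
{
	l->locked = 0;
	l->owner_cpu = -1;
}

/* Hypothetical adapter; callers zero-fill it, like kzalloc() memory. */
struct mock_adapter {
	struct mock_spinlock nfc_lock;	/* hypothetical field name */
	int filter_count;		/* 0 when no rules were ever added */
};

/* Hypothetical restore-on-open step: the loop is a NOP when
 * filter_count == 0, but the lock is still taken, so an
 * uninitialized lock is caught even with no filters. */
static int mock_filter_restore(struct mock_adapter *a, int cpu)
{
	if (mock_spin_lock(&a->nfc_lock, cpu) < 0)
		return -1;
	for (int i = 0; i < a->filter_count; i++) {
		/* re-program filter i into hardware (omitted in this sketch) */
	}
	mock_spin_unlock(&a->nfc_lock);
	return 0;
}
```

Calling `mock_filter_restore()` on a zero-filled `mock_adapter` trips the check even though there are no filters; calling `mock_spin_lock_init()` first (the analogue of initializing the lock during driver setup, before the interface can be opened) makes it succeed.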
> > 
> > >
> > > Also, any chance of seeing a .config for this run or a freshly
> > > reproduced run?
> 
> Sure, I've attached the current .config from the system that I just reproduced it on.  I know we should not use attachments on the lists, but a .config file is rather long for pasting into a message.  If it gets stripped let me know and I'll figure out how to get it to you.

Got it. Thanks for everything so far, this gives me quite a bit more to
go on for debugging.

-Matt

> 
> > > By inspection, at least, there are no obvious locking or other issues
> > > in the open path (only *filter_restore() is executed on open, and it's
> > > mostly a NOP if just patch 1 is applied), so I think we need some more
> > > detailed output, since you have the only system that seems to produce
> > > this issue.
> > 
> > I can certainly get you a copy of the .config file on the affected systems;
> > however, they will have changed somewhat, as the kernel gets updated
> > frequently for tests, with a make oldconfig pushing occasional changes in.
> > Assuming I can re-apply the patch to the current tree, I'll try to reproduce
> > the issue and get you copies of a .config known to be current when the issue
> > strikes.
> > 
> > >
> > > Any other details you can provide would be appreciated. I'm happy to dig
> > > into the root cause.
> > 
> > The only thing that immediately comes to mind is that my test systems all have
> > multiple ports to minimize the lab space needed to get a sampling of the
> > different parts.  I will see if I can reproduce the issue in a system with a single
> > port.  If I remember correctly, it was a consistent issue, always appearing
> > relatively quickly on the affected systems.
> > 
> > >
> > > Thanks,
> > > Matt



