[Intel-wired-lan] igb firmware 1.63 broken / flapping on switch reboot - update or downgrade possible?
Christian Ruppert
idl0r at qasl.de
Wed May 19 13:09:21 UTC 2021
On 2021-05-19 13:57, Christian Ruppert wrote:
> Hi List,
>
> Problem: If we reboot a switch that is connected to igb interfaces (we
> use bonding), the interface will flap several times during the reboot
> of the switch.
> Setup: 2x 1GE I350 (igb) connected to 2x Juniper EX3330, for example.
> It's an active/backup bond with MIIMON disabled and ARP monitoring
> configured.
>
> What we have figured out so far: it seems to be a bug in firmware 1.63,
> while a system with 1.61 seems to work just fine:
>
> We have a bunch of systems with:
> 02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
> Connection (rev 01)
> Subsystem: Super Micro Computer Inc Device 1521
> Kernel driver in use: igb
> Kernel modules: igb
> 02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
> Connection (rev 01)
> Subsystem: Super Micro Computer Inc Device 1521
> Kernel driver in use: igb
> Kernel modules: igb
>
> Let's pick 2 of those systems, first the good one:
> # ethtool -i net0
> driver: igb
> version: 5.6.0-k
> firmware-version: 1.61, 0x8000090e
> expansion-rom-version:
> bus-info: 0000:02:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> # uname -r
> 3.10.0-1160.25.1.el7.x86_64
>
> CentOS 7.9
>
> # dmesg
> [627590.997603] igb 0000:02:00.0 net0: igb: net0 NIC Link is Down
> [627598.277441] bond0: link status definitely down for interface net0,
> disabling it
> [627598.278062] bond0: making interface net1 the new active one
> [627598.278536] device net0 left promiscuous mode
> [627598.279109] device net1 entered promiscuous mode
> [627856.894229] igb 0000:02:00.0 net0: igb: net0 NIC Link is Up 1000
> Mbps Full Duplex, Flow Control: RX/TX
> [627859.970951] bond0: link status definitely up for interface net0
> [627859.971577] bond0: making interface net0 the new active one
> [627859.972127] device net1 left promiscuous mode
> [627859.972801] device net0 entered promiscuous mode
>
>
> That's the complete switch reboot and that is how it's supposed to be.
>
> Now the broken one (we have multiple broken ones, all the same
> firmware version):
> # ethtool -i net0
> driver: igb
> version: 5.6.0-k
> firmware-version: 1.63, 0x80000a05
> expansion-rom-version:
> bus-info: 0000:01:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> # uname -r
> 3.10.0-1160.25.1.el7.x86_64
>
> CentOS 7.9
>
> # dmesg
> [451689.477836] igb 0000:01:00.0 net0: igb: net0 NIC Link is Down
> [451697.112000] bond0: link status definitely down for interface net0,
> disabling it
> [451697.113060] bond0: making interface net1 the new active one
> [451697.113906] device net0 left promiscuous mode
> [451697.114840] device net1 entered promiscuous mode
> [451742.241325] bond0: link status definitely up for interface net0
> [451742.242276] bond0: making interface net0 the new active one
> [451742.243065] device net1 left promiscuous mode
> [451742.243976] device net0 entered promiscuous mode
> [451751.265579] bond0: link status definitely down for interface net0,
> disabling it
> [451751.266503] bond0: making interface net1 the new active one
> [451751.267300] device net0 left promiscuous mode
> [451751.268166] device net1 entered promiscuous mode
> [451817.443511] bond0: link status definitely up for interface net0
> [451817.444428] bond0: making interface net0 the new active one
> [451817.445216] device net1 left promiscuous mode
> [451817.446100] device net0 entered promiscuous mode
> [451826.467777] bond0: link status definitely down for interface net0,
> disabling it
> [451826.468836] bond0: making interface net1 the new active one
> [451826.469702] device net0 left promiscuous mode
> [451826.470534] device net1 entered promiscuous mode
> [451856.548666] bond0: link status definitely up for interface net0
> [451856.549534] bond0: making interface net0 the new active one
> [451856.550283] device net1 left promiscuous mode
> [451856.551142] device net0 entered promiscuous mode
> [451865.572959] bond0: link status definitely down for interface net0,
> disabling it
> [451865.573892] bond0: making interface net1 the new active one
> [451865.574671] device net0 left promiscuous mode
> [451865.575504] device net1 entered promiscuous mode
> [451874.597227] bond0: link status definitely up for interface net0
> [451874.598273] bond0: making interface net0 the new active one
> [451874.599057] device net1 left promiscuous mode
> [451874.599901] device net0 entered promiscuous mode
> [451883.621550] bond0: link status definitely down for interface net0,
> disabling it
> [451883.622382] bond0: making interface net1 the new active one
> [451883.623136] device net0 left promiscuous mode
> [451883.623898] device net1 entered promiscuous mode
> [451886.629557] bond0: link status definitely up for interface net0
> [451886.630416] bond0: making interface net0 the new active one
> [451886.631178] device net1 left promiscuous mode
> [451886.632051] device net0 entered promiscuous mode
> [451895.653860] bond0: link status definitely down for interface net0,
> disabling it
> [451895.654792] bond0: making interface net1 the new active one
> [451895.655548] device net0 left promiscuous mode
> [451895.656372] device net1 entered promiscuous mode
> [451898.661903] bond0: link status definitely up for interface net0
> [451898.662789] bond0: making interface net0 the new active one
> [451898.663582] device net1 left promiscuous mode
> [451898.664464] device net0 entered promiscuous mode
> [451907.686173] bond0: link status definitely down for interface net0,
> disabling it
> [451907.687090] bond0: making interface net1 the new active one
> [451907.687864] device net0 left promiscuous mode
> [451907.688700] device net1 entered promiscuous mode
> [451919.718549] bond0: link status definitely up for interface net0
> [451919.719403] bond0: making interface net0 the new active one
> [451919.720165] device net1 left promiscuous mode
> [451919.721040] device net0 entered promiscuous mode
> [451928.742836] bond0: link status definitely down for interface net0,
> disabling it
> [451928.743834] bond0: making interface net1 the new active one
> [451928.744601] device net0 left promiscuous mode
> [451928.745452] device net1 entered promiscuous mode
> [451949.799426] bond0: link status definitely up for interface net0
> [451949.800297] bond0: making interface net0 the new active one
> [451949.801080] device net1 left promiscuous mode
> [451949.801978] device net0 entered promiscuous mode
> [451954.463872] igb 0000:01:00.0 net0: igb: net0 NIC Link is Up 1000
> Mbps Full Duplex, Flow Control: RX/TX
>
> This is the same reboot as on the good one. It's the same switch
> they're connected to. The same bonding config etc. So it doesn't seem
> to be related to the bonding.
> # cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>
> Bonding Mode: fault-tolerance (active-backup)
> Primary Slave: net0 (primary_reselect always)
> Currently Active Slave: net0
> MII Status: up
> MII Polling Interval (ms): 0
> Up Delay (ms): 0
> Down Delay (ms): 0
> ARP Polling Interval (ms): 3000
> ARP IP target/s (n.n.n.n form): 192.168.99.105
>
> Slave Interface: net0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 9
> Permanent HW addr: 0c:c4:7a:ab:f2:30
> Slave queue ID: 0
>
> Slave Interface: net1
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 1
> Permanent HW addr: 0c:c4:7a:ab:f2:31
> Slave queue ID: 0
>
>
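The per-slave "Link Failure Count" in the output above is the quickest number to compare between a working and a broken host. A minimal sketch for pulling it out of the /proc/net/bonding/bond0 text follows; the function name link_failure_counts is made up for illustration, and the parser only assumes the "Key: value" layout shown above.

```python
def link_failure_counts(bonding_text):
    """Map slave name -> Link Failure Count from /proc/net/bonding/<bond> text."""
    counts = {}
    slave = None
    for line in bonding_text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            slave = value
        elif key == "Link Failure Count" and slave is not None:
            counts[slave] = int(value)
    return counts


# Excerpt of the bond0 status quoted above.
SAMPLE_BONDING = """\
Slave Interface: net0
MII Status: up
Link Failure Count: 9
Permanent HW addr: 0c:c4:7a:ab:f2:30

Slave Interface: net1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 0c:c4:7a:ab:f2:31
"""

print(link_failure_counts(SAMPLE_BONDING))
```

For the status quoted above this gives 9 failures for net0 and 1 for net1.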
> Is it possible to upgrade the firmware? Is there a more recent one at
> all? I couldn't find any info about that, nor a changelog or anything
> else so far. We'd even do a downgrade to get that fixed.
> The firmware doesn't seem to be included in the driver, so I would
> assume there's an external package for it?
Ok, it's probably not the firmware :(
We also have systems with the same version that work, while others
don't. Something else must differ.
I just found two systems, otherwise identical to the ones above, that
both have 1.63: one works, the other one doesn't.
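To compare hosts without reading the logs by eye, the bonding messages in dmesg can be summarised with a short script. The following is only a sketch: the names bond_events and up_durations are hypothetical, and the regular expression assumes nothing beyond the "link status definitely up/down for interface ..." format seen in the logs quoted above.

```python
import re

# Matches bonding driver lines such as:
#   [451742.241325] bond0: link status definitely up for interface net0
EVENT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bond>\S+): link status definitely "
    r"(?P<state>up|down) for interface (?P<iface>\w+)"
)


def bond_events(dmesg_text, iface):
    """Return [(timestamp, state), ...] for one slave interface."""
    events = []
    for m in EVENT.finditer(dmesg_text):
        if m.group("iface") == iface:
            events.append((float(m.group("ts")), m.group("state")))
    return events


def up_durations(events):
    """Seconds each 'up' state lasted before the next 'down'."""
    durations = []
    last_up = None
    for ts, state in events:
        if state == "up":
            last_up = ts
        elif last_up is not None:
            durations.append(round(ts - last_up, 3))
            last_up = None
    return durations


# First few bonding events from the broken host's log quoted above.
SAMPLE_DMESG = """\
[451697.112000] bond0: link status definitely down for interface net0, disabling it
[451742.241325] bond0: link status definitely up for interface net0
[451751.265579] bond0: link status definitely down for interface net0, disabling it
[451817.443511] bond0: link status definitely up for interface net0
[451826.467777] bond0: link status definitely down for interface net0, disabling it
"""

events = bond_events(SAMPLE_DMESG, "net0")
print("down events:", sum(1 for _, state in events if state == "down"))
print("up durations (s):", up_durations(events))
```

In the broken host's log every "up" lasts roughly 9 seconds before the next "down", which is about three times the 3000 ms ARP polling interval. Whether that points at the ARP monitor rather than the NIC is only a guess at this stage.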
--
Regards,
Christian Ruppert