[Intel-wired-lan] Counter spikes in /proc/net/dev for E810-CQDA2 interfaces (ice driver) on kernel >=6.2
Przemek Kitszel
przemyslaw.kitszel at intel.com
Fri Nov 17 10:13:34 UTC 2023
On 11/16/23 17:24, Christian Rohmann wrote:
> Dear sir or madam,
>
> we run multiple Intel E810-CQDA2 100G adapters (2x QSFP28) in our fleet
> of servers. The machines are running Ubuntu 22.04 LTS (Jammy), with
> Linux kernel 6.2.0-36-generic (Ubuntu HWE Kernel).
>
> This is the output from ethtool:
>
> ---cut ---
> # ethtool -i eth2
> driver: ice
> version: 6.2.0-36-generic
> firmware-version: 4.30 0x8001af29 1.3429.0
> expansion-rom-version:
> bus-info: 0000:a1:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> --- cut ---
>
> We observe strange, totally unrealistic traffic spikes (Multiple
> Terabits/s) in our monitoring. We use the Prometheus Node Exporter and
> the netdev collector
> (https://github.com/prometheus/node_exporter/blob/ed1b8e3d88851806627e4f8262ee26232ca56c2c/collector/netdev_common.go#L39).
> I found issue https://github.com/prometheus/node_exporter/issues/1849
> and it appears that others have noticed similar issues with the counters.
>
> I have now dumped "/proc/net/dev" of one of the machines once per second
> to a logfile per interface to show the issue actually originates from
> the "ice" kernel driver
> and not from any of our other tooling.
Good move!
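
For anyone who wants to reproduce this kind of capture, here is a minimal sketch of a per-interface logger. It is not the script Christian used; the function names `read_counters` and `log_interface`, the log file name, and the timestamp format are assumptions chosen to match the excerpt quoted further down.

```python
import time
from datetime import datetime


def read_counters(iface, path="/proc/net/dev"):
    """Return the integer counters for iface, or None if it is not listed."""
    with open(path) as f:
        for line in f:
            # Interface rows look like "  eth2: 123 456 ..."; the two
            # header rows contain no colon and are skipped.
            name, sep, rest = line.partition(":")
            if sep and name.strip() == iface:
                return [int(x) for x in rest.split()]
    return None


def log_interface(iface, logfile, samples, interval=1.0, path="/proc/net/dev"):
    """Append `samples` timestamped counter rows for iface to logfile."""
    with open(logfile, "a") as out:
        for _ in range(samples):
            counters = read_counters(iface, path)
            if counters is not None:
                stamp = datetime.now().strftime("%b %d %H:%M:%S")
                row = " ".join(str(c) for c in counters)
                out.write(f"{stamp} {iface}: {row}\n")
                out.flush()  # keep the log usable if the run is interrupted
            time.sleep(interval)


# Hypothetical usage: one hour of one-second samples for eth2.
# log_interface("eth2", "eth2.log", samples=3600)
```

Reading the file directly, rather than going through a monitoring agent, is what lets the log rule out the rest of the tooling as the source of the jumps.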
>
> I can provide the whole files, but if you just look at two timestamps in
> particular, you can actually see two jumps in the counters:
>
> --- cut ---
> Inter-| Receive | Transmit
> face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
> [...]
> Nov 16 14:44:17 eth2: 322480275246795 161202637791 12245 2396226 0 0 0 71204126 497958797609464 188500340907 0 0 0 0 0 0
> Nov 16 14:44:18 eth2: 386617853382565 193953665830 12245 2396226 0 0 0 71204282 593586606935949 223802656120 0 0 0 0 0 0
> [...]
> Nov 16 14:49:10 eth2: 386662845936810 193977501895 12247 2396226 0 0 0 71230993 593637495306092 223827197609 0 0 0 0 0 0
> Nov 16 14:49:11 eth2: 450845520538932 226752438356 12247 2396226 0 0 0 71230993 689316465134429 259154140003 0 0 0 0 0 0
> [...]
> --- cut ---
>
>
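
To put numbers on the two jumps, the sketch below parses the four quoted samples and flags any interval whose implied rate exceeds what a single 100G port can carry. The function names, the 100 Gbit/s threshold, and the year passed to the timestamp parser (taken from the date of this thread) are assumptions made for illustration. The counters are taken as the first (rx bytes) and ninth (tx bytes) values after the interface name, following the /proc/net/dev header.

```python
from datetime import datetime

LINE_RATE_BPS = 100e9  # assumption: one 100G port as the plausibility limit

# The four samples from the report, with wrapped lines joined.
SAMPLES = [
    "Nov 16 14:44:17 eth2: 322480275246795 161202637791 12245 2396226 0 0 0 71204126 497958797609464 188500340907 0 0 0 0 0 0",
    "Nov 16 14:44:18 eth2: 386617853382565 193953665830 12245 2396226 0 0 0 71204282 593586606935949 223802656120 0 0 0 0 0 0",
    "Nov 16 14:49:10 eth2: 386662845936810 193977501895 12247 2396226 0 0 0 71230993 593637495306092 223827197609 0 0 0 0 0 0",
    "Nov 16 14:49:11 eth2: 450845520538932 226752438356 12247 2396226 0 0 0 71230993 689316465134429 259154140003 0 0 0 0 0 0",
]


def parse_sample(line, year=2023):
    """Return (timestamp, rx_bytes, tx_bytes) for one logged row."""
    fields = line.split()
    stamp = " ".join(fields[:3])  # e.g. "Nov 16 14:44:17"
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    counters = [int(x) for x in fields[4:]]  # fields[3] is "eth2:"
    return when, counters[0], counters[8]


def find_spikes(lines, limit_bps=LINE_RATE_BPS):
    """Return (timestamp, rx_bps, tx_bps) for intervals above limit_bps."""
    parsed = [parse_sample(line) for line in lines]
    spikes = []
    for (t0, rx0, tx0), (t1, rx1, tx1) in zip(parsed, parsed[1:]):
        secs = (t1 - t0).total_seconds()
        rx_bps = (rx1 - rx0) * 8 / secs
        tx_bps = (tx1 - tx0) * 8 / secs
        if rx_bps > limit_bps or tx_bps > limit_bps:
            spikes.append((t1, rx_bps, tx_bps))
    return spikes


for when, rx_bps, tx_bps in find_spikes(SAMPLES):
    print(f"{when:%H:%M:%S} rx {rx_bps / 1e12:.0f} Tbit/s, tx {tx_bps / 1e12:.0f} Tbit/s")
```

On these samples it flags the intervals ending at 14:44:18 and 14:49:11. In both, rx bytes advance by roughly 64e12 and tx bytes by roughly 96e12 within one second, which is far beyond line rate, while the 292 seconds in between show ordinary traffic.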
> If you require any more information to narrow down the issue, please
> don't hesitate to contact me.
Was there anything logged in dmesg or other system logs at that time?
>
>
>
> Regards
>
>
> Christian Rohmann
>
>
Thank you for the report, I will take a look.
We have already received a similar report from Nebojsa Stevanovic, CCed.
Sorry that the issue is not resolved yet. I will review what we have
changed in the driver between 6.1 and 6.2, where the bug was introduced.
Best regards,
Przemek Kitszel
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan