[Intel-wired-lan] Counter spikes in /proc/net/dev for E810-CQDA2 interfaces (ice driver) on kernel >=6.2
Christian Rohmann
christian.rohmann at inovex.de
Thu Nov 16 16:24:43 UTC 2023
Dear sir or madam,
we run multiple Intel E810-CQDA2 100G adapters (2x QSFP28) in our fleet
of servers. The machines are running Ubuntu 22.04 LTS (Jammy) with
Linux kernel 6.2.0-36-generic (Ubuntu HWE kernel).
This is the output from ethtool:
--- cut ---
# ethtool -i eth2
driver: ice
version: 6.2.0-36-generic
firmware-version: 4.30 0x8001af29 1.3429.0
expansion-rom-version:
bus-info: 0000:a1:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
--- cut ---
We observe strange, totally unrealistic traffic spikes (multiple
terabits per second) in our monitoring. We use the Prometheus Node
Exporter and its netdev collector
(https://github.com/prometheus/node_exporter/blob/ed1b8e3d88851806627e4f8262ee26232ca56c2c/collector/netdev_common.go#L39).
I found issue https://github.com/prometheus/node_exporter/issues/1849
and it appears that others have noticed similar issues with the counters.
I have now dumped "/proc/net/dev" on one of the machines once per second
into a log file per interface, to show that the issue originates in the
"ice" kernel driver and not in any of our other tooling.
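For reference, a once-per-second sampler of this kind can be sketched in Python 3 as below. This is an illustrative sketch, not the exact script used for the logs; it assumes the standard /proc/net/dev layout (two header lines, then one "iface: 16 counters" line per interface), and the names FIELDS, parse_proc_net_dev and sample are hypothetical.

```python
import time
from datetime import datetime

# Hypothetical names; illustrative sketch only.
# The 16 counters per interface in /proc/net/dev, in column order
# (receive columns first, then transmit columns).
FIELDS = [
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop", "rx_fifo",
    "rx_frame", "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop", "tx_fifo",
    "tx_colls", "tx_carrier", "tx_compressed",
]


def parse_proc_net_dev(text):
    """Return {iface: {field: int}} parsed from /proc/net/dev contents."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the first colon only; the counters may follow it
        # without a space when the numbers get large.
        name, counters = line.split(":", 1)
        values = [int(v) for v in counters.split()]
        stats[name.strip()] = dict(zip(FIELDS, values))
    return stats


def sample(iface="eth2", interval=1.0, path="/proc/net/dev"):
    """Print one timestamped line of raw counters for iface per interval."""
    while True:
        with open(path) as f:
            counters = parse_proc_net_dev(f.read())[iface]
        stamp = datetime.now().strftime("%b %d %H:%M:%S")
        print(stamp, iface + ":", " ".join(str(counters[k]) for k in FIELDS),
              flush=True)
        time.sleep(interval)
```

Calling sample("eth2") with stdout redirected to one file per interface yields lines in the same "timestamp iface: counters" shape as the excerpt in this mail.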
I can provide the complete files, but looking at just two pairs of
timestamps you can already see two jumps in the counters:
--- cut ---
Inter-| Receive | Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
[...]
Nov 16 14:44:17 eth2: 322480275246795 161202637791 12245 2396226 0 0 0 71204126 497958797609464 188500340907 0 0 0 0 0 0
Nov 16 14:44:18 eth2: 386617853382565 193953665830 12245 2396226 0 0 0 71204282 593586606935949 223802656120 0 0 0 0 0 0
[...]
Nov 16 14:49:10 eth2: 386662845936810 193977501895 12247 2396226 0 0 0 71230993 593637495306092 223827197609 0 0 0 0 0 0
Nov 16 14:49:11 eth2: 450845520538932 226752438356 12247 2396226 0 0 0 71230993 689316465134429 259154140003 0 0 0 0 0 0
[...]
--- cut ---
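To put the jumps into perspective, the following Python 3 sketch computes the bit rate implied by each one-second step, using the rx/tx byte values copied from the excerpt above, and compares it with the 100 Gbit/s line rate of a single QSFP28 port. The helper names are hypothetical, and the 100 Gbit/s threshold is an assumption about the port speed.

```python
# rx_bytes / tx_bytes for eth2, copied from the /proc/net/dev excerpt.
# (timestamp, rx_bytes, tx_bytes)
SAMPLES = [
    ("Nov 16 14:44:17", 322480275246795, 497958797609464),
    ("Nov 16 14:44:18", 386617853382565, 593586606935949),
    ("Nov 16 14:49:10", 386662845936810, 593637495306092),
    ("Nov 16 14:49:11", 450845520538932, 689316465134429),
]

# Assumed per-port line rate of a QSFP28 port, in bits per second.
LINE_RATE_BPS = 100e9


def rate_bps(prev_bytes, cur_bytes, seconds=1.0):
    """Bit rate implied by two byte-counter samples taken `seconds` apart."""
    return (cur_bytes - prev_bytes) * 8 / seconds


def check_pair(earlier, later, seconds=1.0):
    """Return (rx_bps, tx_bps, exceeds_line_rate) for two samples."""
    rx = rate_bps(earlier[1], later[1], seconds)
    tx = rate_bps(earlier[2], later[2], seconds)
    return rx, tx, rx > LINE_RATE_BPS or tx > LINE_RATE_BPS


# Only the two one-second steps that contain a jump are checked here.
for earlier, later in [(SAMPLES[0], SAMPLES[1]), (SAMPLES[2], SAMPLES[3])]:
    rx, tx, bad = check_pair(earlier, later)
    print(f"{earlier[0]} -> {later[0]}: rx {rx / 1e12:.1f} Tbit/s, "
          f"tx {tx / 1e12:.1f} Tbit/s, exceeds line rate: {bad}")
```

For both pairs this comes out at roughly 513 Tbit/s received and 765 Tbit/s transmitted, several thousand times the line rate of the port. Both steps are also almost the same size (about 64 TB received and 96 TB transmitted), while the 292 seconds in between average out to only about 1.2 Gbit/s received.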
If you require any more information to narrow down the issue, please
don't hesitate to contact me.
Regards
Christian Rohmann