[Intel-wired-lan] Counter spikes in /proc/net/dev for E810-CQDA2 interfaces (ice driver) on kernel >=6.2

Christian Rohmann christian.rohmann at inovex.de
Thu Nov 16 16:24:43 UTC 2023


Dear sir or madam,

we run multiple Intel E810-CQDA2 100G adapters (2x QSFP28) in our fleet
of servers. The machines are running Ubuntu 22.04 LTS (Jammy) with
Linux kernel 6.2.0-36-generic (Ubuntu HWE kernel).

This is the output from ethtool:

--- cut ---
# ethtool -i eth2
driver: ice
version: 6.2.0-36-generic
firmware-version: 4.30 0x8001af29 1.3429.0
expansion-rom-version:
bus-info: 0000:a1:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

--- cut ---

We observe strange, totally unrealistic traffic spikes (multiple
Terabit/s) in our monitoring. We use the Prometheus Node Exporter and
its netdev collector
(https://github.com/prometheus/node_exporter/blob/ed1b8e3d88851806627e4f8262ee26232ca56c2c/collector/netdev_common.go#L39).
I found issue https://github.com/prometheus/node_exporter/issues/1849,
which suggests that others have noticed similar problems with these counters.
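The Node Exporter exports the raw cumulative byte counters from /proc/net/dev; the bit rate in a dashboard is derived from the difference between samples. A simplified sketch of that derivation follows, assuming a plain two-sample difference (PromQL's `rate()` additionally extrapolates over the query window). The function name `bits_per_second` is hypothetical. The point it illustrates: a counter that goes backwards can be recognised as a reset, but a counter that jumps forward is indistinguishable from real traffic.

```python
def bits_per_second(bytes_t1: int, bytes_t2: int, seconds: float) -> float:
    """Derive a bit rate from two samples of a cumulative byte counter.

    Simplified two-sample difference (an assumption, not the exact
    Prometheus algorithm). A decrease is treated as a counter reset,
    loosely mirroring Prometheus; an increase is trusted as-is, which is
    why a bogus forward jump shows up as a huge spike.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = bytes_t2 - bytes_t1
    if delta < 0:
        # Counter went backwards: assume it restarted from zero in between.
        delta = bytes_t2
    return delta * 8 / seconds
```

For scale: one 100 Gbit/s port moves at most 12.5 GB per second in each direction, so `bits_per_second(0, 12_500_000_000, 1.0)` (100 Gbit/s) is already the ceiling for a single port.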

I have now dumped "/proc/net/dev" on one of the machines once per second
to a log file per interface, to show that the issue actually originates
in the "ice" kernel driver and not in any of our other tooling.
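A minimal sketch of parsing such a dump in Python is shown below. The names `parse_netdev_line` and `read_netdev` are hypothetical helpers, not part of any existing tool, and the column order is taken from the /proc/net/dev header in the excerpt below. Splitting at the last colon lets the same function accept both raw /proc/net/dev lines and the timestamped lines from our logs.

```python
from __future__ import annotations

# Column names in the order of the /proc/net/dev header.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed"]


def parse_netdev_line(line: str) -> tuple[str, dict[str, int], dict[str, int]]:
    """Parse one interface line into (interface name, rx counters, tx counters)."""
    # Last colon separates "<optional timestamp> <iface>" from the counters.
    prefix, _, counters = line.rpartition(":")
    values = [int(v) for v in counters.split()]
    if len(values) != len(RX_FIELDS) + len(TX_FIELDS):
        raise ValueError(f"unexpected number of columns: {len(values)}")
    rx = dict(zip(RX_FIELDS, values[:len(RX_FIELDS)]))
    tx = dict(zip(TX_FIELDS, values[len(RX_FIELDS):]))
    # The interface name is the last word before the colon.
    return prefix.split()[-1], rx, tx


def read_netdev(path: str = "/proc/net/dev") -> dict[str, tuple[dict, dict]]:
    """Return {interface: (rx, tx)} for every interface listed in the file."""
    with open(path) as f:
        lines = f.readlines()[2:]  # skip the two header lines
    result = {}
    for line in lines:
        if line.strip():
            name, rx, tx = parse_netdev_line(line)
            result[name] = (rx, tx)
    return result
```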

I can provide the complete files, but looking at just two pairs of
consecutive samples, you can already see the counters jump twice:

--- cut ---
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
[...]
Nov 16 14:44:17   eth2: 322480275246795 161202637791 12245 2396226    0     0          0  71204126 497958797609464 188500340907    0    0    0     0       0          0
Nov 16 14:44:18   eth2: 386617853382565 193953665830 12245 2396226    0     0          0  71204282 593586606935949 223802656120    0    0    0     0       0          0
[...]
Nov 16 14:49:10   eth2: 386662845936810 193977501895 12247 2396226    0     0          0  71230993 593637495306092 223827197609    0    0    0     0       0          0
Nov 16 14:49:11   eth2: 450845520538932 226752438356 12247 2396226    0     0          0  71230993 689316465134429 259154140003    0    0    0     0       0          0
[...]
--- cut ---
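To put the jumps into numbers, the sketch below recomputes the implied throughput between consecutive samples of the excerpt and flags anything above the 100 Gbit/s line rate of a single QSFP28 port. The sample values are copied from the excerpt; the helper names and the threshold constant are illustrative assumptions.

```python
# Assumed ceiling: one QSFP28 port of the E810-CQDA2, 100 Gbit/s per direction.
LINK_BITS_PER_SECOND = 100e9

# (timestamp, rx_bytes, tx_bytes) for eth2, copied from the excerpt above.
SAMPLES = [
    ("14:44:17", 322480275246795, 497958797609464),
    ("14:44:18", 386617853382565, 593586606935949),
    ("14:49:10", 386662845936810, 593637495306092),
    ("14:49:11", 450845520538932, 689316465134429),
]


def seconds(ts: str) -> int:
    """Convert an HH:MM:SS timestamp to seconds since midnight."""
    h, m, s = (int(x) for x in ts.split(":"))
    return h * 3600 + m * 60 + s


def implied_rates(samples):
    """Yield (t1, t2, rx_bps, tx_bps) for each pair of consecutive samples."""
    for (t1, rx1, tx1), (t2, rx2, tx2) in zip(samples, samples[1:]):
        dt = seconds(t2) - seconds(t1)
        yield t1, t2, (rx2 - rx1) * 8 / dt, (tx2 - tx1) * 8 / dt


for t1, t2, rx_bps, tx_bps in implied_rates(SAMPLES):
    flag = "IMPOSSIBLE" if max(rx_bps, tx_bps) > LINK_BITS_PER_SECOND else "ok"
    print(f"{t1} -> {t2}: rx {rx_bps / 1e9:,.1f} Gbit/s, "
          f"tx {tx_bps / 1e9:,.1f} Gbit/s [{flag}]")
```

Between 14:44:18 and 14:49:10 the receive counter grows by about 45 GB over 292 seconds, roughly 1.2 Gbit/s, which is plausible. Each of the two one-second steps, however, adds about 64.1 to 64.2 TB to the receive counter (about 513 Tbit/s) and about 95.6 to 95.7 TB to the transmit counter, far beyond what the port can carry.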


If you require any more information to narrow down the issue, please 
don't hesitate to contact me.



Regards


Christian Rohmann
