[Intel-wired-lan] Counter spikes in /proc/net/dev for E810-CQDA2 interfaces (ice driver) on kernel >=6.2

Przemek Kitszel przemyslaw.kitszel at intel.com
Fri Nov 17 10:13:34 UTC 2023


On 11/16/23 17:24, Christian Rohmann wrote:
> Dear sir or madam,
> 
> we run multiple Intel E810-CQDA2 100G adapters (2x QSFP28) in our fleet 
> of servers. The machines are running Ubuntu 22.04 LTS (Jammy) with 
> Linux kernel 6.2.0-36-generic (Ubuntu HWE kernel).
> 
> This is the output from ethtool:
> 
> --- cut ---
> # ethtool -i eth2
> driver: ice
> version: 6.2.0-36-generic
> firmware-version: 4.30 0x8001af29 1.3429.0
> expansion-rom-version:
> bus-info: 0000:a1:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> 
> --- cut ---
> 
> We observe strange, totally unrealistic traffic spikes (Multiple 
> Terabits/s) in our monitoring. We use the Prometheus Node Exporter and 
> the netdev collector 
> (https://github.com/prometheus/node_exporter/blob/ed1b8e3d88851806627e4f8262ee26232ca56c2c/collector/netdev_common.go#L39).
> I found issue https://github.com/prometheus/node_exporter/issues/1849 
> and it appears that others have noticed similar issues with the counters.
> 
> I have now dumped "/proc/net/dev" of one of the machines once per second 
> to a logfile per interface to show the issue actually originates from 
> the "ice" kernel driver and not from any of our other tooling.

Good move!
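
For anyone who wants to collect the same kind of per-second log, here is a minimal sketch. It is not Christian's actual script, which was not posted; the function names, the "netdev-" file prefix and the timestamp format are my assumptions, chosen to resemble the lines quoted below.

```python
import time
from datetime import datetime


def format_records(proc_net_dev_text, now):
    """Return {iface: timestamped counter line} from /proc/net/dev content."""
    stamp = now.strftime("%b %d %H:%M:%S")
    records = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        iface, _, counters = line.partition(":")
        iface = iface.strip()
        records[iface] = f"{stamp} {iface}: {' '.join(counters.split())}"
    return records


def log_forever(prefix="netdev-", interval=1.0):
    """Append each interface's counters to '<prefix><iface>.log' (hypothetical naming)."""
    while True:
        with open("/proc/net/dev") as source:
            records = format_records(source.read(), datetime.now())
        for iface, record in records.items():
            with open(f"{prefix}{iface}.log", "a") as log:
                log.write(record + "\n")
        time.sleep(interval)
```

Calling log_forever() writes one line per interface per second until interrupted. The sleep-based loop drifts slightly over time, which should not matter for spotting jumps of this size.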

> 
> I can provide the whole files, but if you just look at two timestamps in 
> particular, you can actually see two jumps in the counters:
> 
> --- cut ---
> Inter-|   Receive                                                |  Transmit
>  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
> [...]
> Nov 16 14:44:17   eth2: 322480275246795 161202637791 12245 2396226 0 0 0 71204126 497958797609464 188500340907 0 0 0 0 0 0
> Nov 16 14:44:18   eth2: 386617853382565 193953665830 12245 2396226 0 0 0 71204282 593586606935949 223802656120 0 0 0 0 0 0
> [...]
> Nov 16 14:49:10   eth2: 386662845936810 193977501895 12247 2396226 0 0 0 71230993 593637495306092 223827197609 0 0 0 0 0 0
> Nov 16 14:49:11   eth2: 450845520538932 226752438356 12247 2396226 0 0 0 71230993 689316465134429 259154140003 0 0 0 0 0 0
> [...]
> --- cut ---
> 
> 
> If you require any more information to narrow down the issue, please 
> don't hesitate to contact me.

Was there anything logged in dmesg or other system logs at that time?
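
To find all such jumps in the full log files (and line them up with dmesg timestamps), something along these lines might help. It is only a sketch under several assumptions: one record per line in the format quoted above, a log year of 2023, and a 100 Gbit/s line rate as the plausibility threshold. The names parse_record and find_spikes are made up for illustration.

```python
from datetime import datetime

# Assumption: 100 Gbit/s line rate per port, expressed in bytes per second.
LINE_RATE_BYTES_PER_S = 100e9 / 8


def parse_record(line, year=2023):
    """Parse 'Nov 16 14:44:17 eth2: <16 counters>' into (time, iface, rx_bytes, tx_bytes)."""
    fields = line.split()
    # The log lines carry no year, so an assumed one is prepended.
    stamp = datetime.strptime(f"{year} {' '.join(fields[:3])}", "%Y %b %d %H:%M:%S")
    iface = fields[3].rstrip(":")
    counters = [int(value) for value in fields[4:]]
    # /proc/net/dev has 8 receive columns followed by 8 transmit columns;
    # the byte counter is the first column of each group.
    return stamp, iface, counters[0], counters[8]


def find_spikes(lines, limit=LINE_RATE_BYTES_PER_S):
    """Yield (time, iface, direction, bytes_per_second) for increments above limit."""
    previous = {}
    for line in lines:
        if not line.strip():
            continue
        stamp, iface, rx, tx = parse_record(line)
        if iface in previous:
            last_stamp, last_rx, last_tx = previous[iface]
            seconds = (stamp - last_stamp).total_seconds()
            if seconds > 0:
                for direction, delta in (("rx", rx - last_rx), ("tx", tx - last_tx)):
                    rate = delta / seconds
                    if rate > limit:
                        yield stamp, iface, direction, rate
        previous[iface] = (stamp, rx, tx)
```

Fed the two 14:44 records quoted above, this flags an RX increment of 64137578135770 bytes and a TX increment of 95627809326485 bytes within one second, far more than a 100G port can carry. Counter resets (negative deltas) are not flagged by this sketch.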

> 
> 
> 
> Regards
> 
> 
> Christian Rohmann
> 
> 

Thank you for the report, I will take a look.

We have already received a similar report from Nebojsa Stevanovic, CCed.

Sorry that the issue is not resolved yet. I will review what we have
changed in the drivers between 6.1 and 6.2, where the bug was introduced.

Best regards,
Przemek Kitszel

> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


