[Intel-wired-lan] [PATCH v2 2/2] e1000e: Fix ptp time reset on network interruption

Thu Apr 14 18:21:09 UTC 2016

On Thu, 2016-04-14 at 11:08 -0400, Brian Walsh wrote:
> On Thu, Apr 14, 2016 at 03:11:45AM +0000, Brown, Aaron F wrote:
> > 
> > > 
> > > From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On Behalf Of Brian Walsh
> > > Sent: Tuesday, April 12, 2016 8:23 PM
> > > To: intel-wired-lan at lists.osuosl.org
> > > Subject: [Intel-wired-lan] [PATCH v2 2/2] e1000e: Fix ptp time reset on network interruption
> > > 
> > > Time is resetting on any interruption of network connectivity. This
> > > causes the clock to jump around by the leapsecond offset. It should
> > > only reset when the device is initialized.
> > > 
> > > Signed-off-by: Brian Walsh <brian at walsh.ws>
> > > ---
> > >  drivers/net/ethernet/intel/e1000e/netdev.c | 22 +++++++++++-----------
> > >  1 file changed, 11 insertions(+), 11 deletions(-)
> > > 
> > This patch introduces a Call Trace and panic for me on a handful of
> > regression systems.  I am usually seeing this on the e1000e driver
> > load, but on one system when just under traffic stress.  It seems
> > to show up mostly on older hardware; the trace has been spotted on
> > a system with an 82573 LOM, another system with a pair of
> > 80003ES2LAN controllers and an add-in 82572.  The following trace
> > is taken via a serial console from a system with an 82574L and
> > 82579L LOM on the board after the system had been running randomish
> > netperf traffic for an hour or so.  The trace on driver load is
> > similar to the first call trace of this series, but generally did
> > not recover enough to get the follow along messages:
> > 
> This patch seems to be causing issues on other systems. I am running it
> on about 30 units, all with the same card. I also have linuxptp running
> at the same time.
> 
> Would there be some other way to address the problem that I am trying
> to fix with this patch?
> 
> Basically, if the network connection between the device and the 1588
> clock was interrupted for a period of time, the hardware clock would
> switch from being on TAI time to thinking that the time is now UTC
> time. This caused the system time to fluctuate by the leapsecond
> offset.
> 
> I was able to reproduce this problem with a 1588 clock source using
> IPv4 UDP by temporarily dropping UDP traffic on ports 319 and 320
> through iptables.
> 
> Moving the clock reset so that it happens only at initialization
> fixed the problem for me.
> 
> Brian

Moving the clock reset to initialization seems like the correct
behavior to me.
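
To make the failure mode concrete, here is a rough toy model of what I
understand is being described. It is not the driver code: the class and
function names are made up for illustration, and it assumes the reset
reseeds the hardware clock from system (UTC) time while the PTP daemon
steers it to TAI. The 36 second constant is the TAI-UTC offset in effect
in April 2016.

```python
# Toy model only, NOT the e1000e driver. All names here are hypothetical.
TAI_MINUS_UTC_S = 36  # TAI-UTC offset in effect in April 2016


class ToyPhc:
    """A hardware clock that a PTP daemon steers to TAI."""

    def __init__(self, reset_on_link_event):
        self.reset_on_link_event = reset_on_link_event
        self.time_s = None

    def _load_system_time(self, utc_now_s):
        # Assumption: a reset seeds the clock from system (UTC) time.
        self.time_s = utc_now_s

    def init(self, utc_now_s):
        # Device initialization always seeds the clock.
        self._load_system_time(utc_now_s)

    def ptp_sync(self, utc_now_s):
        # While the 1588 clock is reachable, the daemon steers us to TAI.
        self.time_s = utc_now_s + TAI_MINUS_UTC_S

    def link_event(self, utc_now_s):
        # Current behavior reseeds here; the proposed behavior does not.
        if self.reset_on_link_event:
            self._load_system_time(utc_now_s)


def offset_after_interruption(reset_on_link_event, utc_now_s=1_460_000_000):
    """Return how far the toy clock is ahead of UTC after an interruption."""
    phc = ToyPhc(reset_on_link_event)
    phc.init(utc_now_s)
    phc.ptp_sync(utc_now_s)
    phc.link_event(utc_now_s)
    return phc.time_s - utc_now_s


if __name__ == "__main__":
    print("reset on every link event:", offset_after_interruption(True))
    print("reset only at init:", offset_after_interruption(False))
```

In this model the clock falls back from TAI to UTC when it is reseeded
on the interruption, and keeps its TAI offset when the reseed happens
only at initialization.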

Thanks,
Jake