[Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
Conor Dooley
conor at kernel.org
Wed Dec 21 17:30:54 UTC 2022
On Sun, Nov 20, 2022 at 07:55:09PM +0000, Conor Dooley wrote:
> On Sat, Nov 19, 2022 at 08:06:05PM +0200, Neftin, Sasha wrote:
> > On 11/19/2022 01:21, Conor Dooley wrote:
> > > On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
> > > > On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> > > > > > Is there any update for the community? More and more folks are asking. We
> > > > > > are all techies and happy to help debug.
> > > > >
> > > > > Vested interest since I am suffering from the same issue (X670E-F
> > > > > Gaming), but is it okay to add this to regzbot? Not sure whether it
> > > > > counts as a regression or not since it's new hw with the existing driver,
> > > > > but this seems to be falling through the cracks without a response for
> > > > > several weeks.
> > > >
> > > > Dunno, Thorsten will decide. The line has to be drawn somewhere
> > > > on "vendor doesn't care about Linux support" vs "we broke uAPI".
> > > > This is the kind of situation I was alluding to in my line of
> > > > questioning at the maintainer summit: https://lwn.net/Articles/908324/
> > >
> > > Yeah & it is /regression/ tracking which I don't (or rather didn't)
> > > consider this situation to be. I'm generally a little unsure as to when
> > > I should trigger regzbot in general:
> > > - immediately when I find something?
> > > - only if it goes a while with nothing constructive?
> > > - is it okay to use it outside of "this used to work and now doesn't"?
> > >
> > > Either way, I did some more googling and found this reddit thread:
> > > https://www.reddit.com/r/intel/comments/lqb4km/for_people_having_i225v_connection_issues/
> > >
> > > That's being reported against windows & I dunno if the dude is using
> > > firmware and driver interchangeably etc. But the disabling power saving
> > > etc sounds oddly like the issue we have here, since that was a proposed
> > > workaround in Ivan's 2022 reddit thread.
> > >
> > > Supposedly I am on firmware-version 1082:8770, but /I/ have no idea
> > > how that corresponds to windows versioning. That may lend some credence
> > > to your assertion about firmware being the source of many issues.
> > >
> > > > Finding a kernel release which does not suffer from the problem
> > > > would certainly strengthen your case.
> > >
> > > Aye, likely to be a little difficult to do a meaningful bisection for
> > > me at least, since the motherboard I have with the problem is an AM5
> > > one for the new Zen4 stuff. I'm not an x86 person, so not entirely
> > > sure when that support landed. I may do some poking tomorrow..
> > >
> > I do not think we can resolve this problem on this forum.
> > Ivan's earlier report to netdev included the error "PCIe link lost, device
> > now detached". Since the PCIe link drops unexpectedly, it could lead to many
> > problems (not only crashes).
>
> Hmm, I'll take a look at what mine spits out next time it dies, but I
> would imagine that you're correct and I see it too.
It does in fact say that, but interestingly only this peripheral has any
issues. My GPUs etc have no problem at all.
> > Before you start SW/FW bisection (changing the FW (NVM), going back to an
> > older kernel version), please contact your board vendor (ASUS). Why does
> > the PCIe link drop?
>
> I dunno, I suppose it just entered a lower power state!
>
> > Possibly a circuit problem on the board, or the system performs power
> > management flows and does not stop the driver.
>
> My GPU and other PCI devices are returning from lower power modes properly.
> I wonder what's different about this specific device. As I said, not too
> familiar with x86 stuff - is there someone from AMD worth poking as the
> output from lspci is a wall of AMD bridges w/ endpoints mixed in.
>
> Doing a cursory look at other x670 stuff - the non-asus ones that I
> looked at are not using Intel ethernet.
>
> > "failed to read reg 0xc030" (just a symptom) happens after the PCIe link is lost.
>
> Per 47e16692b26b ("igb/igc: warn when fatal read failure happens"), it
> looks as though this is not a *new* problem though as you guys have seen
> this while testing.
>
> I've got a 1 G NIC, I like my dev machine to "just work" so I'll probably
> throw that in and see how far that gets me. IIRC it's an igb one so will
> at least make for a datapoint.
FWIW I gave up on the igc driver and am using my 1 G NIC instead, couldn't be
bothered with the disruption. I'll give the bios stuff mentioned
elsewhere a go over Christmas now that v6.1.1 exists and see if that
helps. Hopefully it does!
Conor.
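
For anyone else wanting to check the ordering discussed above (link loss
first, register read failure as a follow-on symptom), here is a minimal
sketch in Python. It only looks for the two message fragments quoted in this
thread; the function names are made up for illustration, and matching is
case-insensitive because the exact capitalisation in the kernel log is an
assumption here.

```python
import re

# Message fragments quoted in this thread. Case-insensitive on purpose:
# the exact spelling printed by the driver is not confirmed here.
LINK_LOST = re.compile(r"PCIe link lost, device now detached", re.IGNORECASE)
READ_FAIL = re.compile(r"failed to read reg (0x[0-9a-f]+)", re.IGNORECASE)


def scan_kernel_log(text):
    """Return (kind, line) tuples, in log order, for the two symptoms."""
    events = []
    for line in text.splitlines():
        if LINK_LOST.search(line):
            events.append(("link_lost", line.strip()))
        elif READ_FAIL.search(line):
            events.append(("read_fail", line.strip()))
    return events


def read_fail_follows_link_loss(events):
    """True if no register read failure appears before a link-loss message."""
    seen_link_lost = False
    for kind, _ in events:
        if kind == "link_lost":
            seen_link_lost = True
        elif kind == "read_fail" and not seen_link_lost:
            return False
    return True
```

Feeding it the text of the kernel log (for example the captured output of
dmesg) and printing the returned events should show whether the read failure
ever turns up without a preceding link-loss message.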