[Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation

Florian Fainelli f.fainelli at gmail.com
Fri May 17 16:47:21 UTC 2019


On 5/17/19 9:36 AM, Daniel Walker wrote:
> On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote:
>> On Thu, May 16, 2019 at 6:48 PM Florian Fainelli <f.fainelli at gmail.com> wrote:
>>>
>>>
>>>
>>> On 5/16/2019 6:03 PM, Daniel Walker wrote:
>>>> On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
>>>>> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
>>>>>>
>>>>>>
>>>>>> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher at intel.com> wrote:
>>>>>>
>>>>>>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
>>>>>>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
>>>>>>    >> results
>>>>>>    >> in probe to fail. This change adds a module parameter that can be
>>>>>>    >> used to
>>>>>>    >> ignore nvm checksum validation.
>>>>>>    >>
>>>>>>    >> Cc: xe-linux-external at cisco.com
>>>>>>    >> Signed-off-by: Nikunj Kela <nkela at cisco.com>
>>>>>>    >> ---
>>>>>>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
>>>>>>    >> ++++++++++++++++++++++------
>>>>>>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
>>>>>>
>>>>>>     >NAK for two reasons.  First, module parameters are not desirable
>>>>>>     >because their individual to one driver and a global solution should be
>>>>>>     >found so that all networking device drivers can use the solution.  This
>>>>>>     >will keep the interface to change/setup/modify networking drivers
>>>>>>     >consistent for all drivers.
>>>>>>
>>>>>>
>>>>>>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
>>>>>>     >and create a software workaround so that you can continue to use a
>>>>>>     >broken NIC.  There are methods/tools available to properly reprogram
>>>>>>     >the EEPROM on a NIC, which is the right solution for your issue.
>>>>>>
>>>>>> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
>>>>>
>>>>> Then why even bother with sending this upstream?
>>>>
>>>> It seems rather drastic to disable the entire driver because the checksum
>>>> doesn't match. It really should be a warning, even a big warning, to let people
>>>> know something is wrong, but disabling the whole driver doesn't make sense.
>>>
>>> You could generate a random Ethernet MAC address if you don't have a
>>> valid one, a lot of drivers do that, and that's a fairly reasonable
>>> behavior. At some point in your product development someone will
>>> certainly verify that the provisioned MAC address matches the network
>>> interface's MAC address.
>>> --
>>> Florian
>>
>> The thing is the EEPROM contains much more than just the MAC address.
>> There ends up being configuration for some of the PCIe interface in
>> the hardware as well as PHY configuration. If that is somehow mangled
>> we shouldn't be bringing up the part because there are one or more
>> pieces of the device configuration that are likely wrong.
>>
>> The checksum is being used to make sure the EEPROM is valid, without
>> that we would need to go through and validate each individual section
>> of the EEPROM before enabling the the portions of the device related
>> to it. The concern is that this will become a slippery slope where we
>> eventually have to code all the configuration of the EEPROM into the
>> driver itself.
>  
> 
> I don't think you can say because the checksum is valid that all data contained
> inside is also valid. You can have a valid checksum , and someone screwed up the
> data prior to the checksum getting computed.
> 
> 
>> We need to make the checksum a hard stop. If the part is broken then
>> it needs to be addressed. Workarounds just end up being used and
>> forgotten, which makes it that much harder to support the product.
>> Better to mark the part as being broken, and get it fixed now, than to
>> have parts start shipping that require workarounds in order to
>> function.o
> 
> I don't think it's realistic to define the development process for large
> corporations like Cisco, or like what your doing , to define the development
> process for all corporations and products which may use intel parts. It's better
> to be flexible.

Nikunj indicated that "manufacturing fixes NIC" so that sounds like a
workaround for an issue that would not affect a final product, in which
case, keeping downstream changes for development boards/intermediate
revisions of a product and focusing on relevant upstreaming changes for
the actual product would make a lot more sense, no?
-- 
Florian


More information about the Intel-wired-lan mailing list