[Intel-wired-lan] [net-next PATCH v3 00/17] Future-proof tunnel offload handlers

Alexander Duyck alexander.duyck at gmail.com
Tue Jun 21 18:17:58 UTC 2016


On Tue, Jun 21, 2016 at 10:40 AM, Hannes Frederic Sowa
<hannes at redhat.com> wrote:
> On 21.06.2016 10:27, Edward Cree wrote:
>> On 21/06/16 18:05, Alexander Duyck wrote:
>>> On Tue, Jun 21, 2016 at 1:22 AM, David Miller <davem at davemloft.net> wrote:
>>>> But anyways, the vastness of the key is why we want to keep "sockets"
>>>> out of network cards, because proper support of "sockets" requires
>>>> access to information the card simply does not and should not have.
>>> Right.  Really what I would like to see for most of these devices is a
>>> 2 tuple filter where you specify the UDP port number, and the PF/VF ID
>>> that the traffic is received on.
>> But that doesn't make sense - the traffic is received on a physical network
>> port, and it's the headers (i.e. flow) at that point that determine whether
>> the traffic is encap or not.  After all, those headers are all that can
>> determine which PF or VF it's sent to; and if it's multicast and goes to
>> more than one of them, it seems odd for one to treat it as encap and the
>> other to treat it as normal UDP - one of them must be misinterpreting it
>> (unless the UDP is going to a userspace tunnel endpoint, but I'm ignoring
>> that complication for now).
>
> Disabling offloading of packets is never going to cause data corruptions
> or misinterpretations. In some cases we can hint the network card to do
> even more (RSS+checksumming). We always have a safe choice, namely not
> doing hw offloading.

Agreed.  Also we need to keep in mind that in many cases things like
RSS and checksumming can be very easily made port specific since what
we are talking about is just what is reported in the Rx descriptor and
not any sort of change to the packet data.
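The point that per-port RSS and checksum results only affect what goes into the Rx descriptor can be sketched as a driver-side mapping. This is a minimal sketch under stated assumptions. The struct `rx_desc_status`, its fields, and `rx_csum_level()` are hypothetical, loosely modelled on the idea behind the kernel's CHECKSUM_UNNECESSARY level rather than any real driver. The safe fallback Hannes mentions is always "0 checksums verified", meaning software re-checks everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status bits a NIC might write into an Rx descriptor.
 * Nothing in the packet data itself is modified. */
struct rx_desc_status {
	bool outer_l4_csum_ok;	/* outer UDP checksum verified by HW */
	bool is_tunnel;		/* HW parsed the packet as encapsulated */
	bool inner_l4_csum_ok;	/* inner L4 checksum verified by HW */
};

/* Return how many checksums (outermost first) the stack may treat as
 * already verified. Returning 0 is always safe: software re-checks. */
static unsigned int rx_csum_level(const struct rx_desc_status *d,
				  bool tunnel_port_enabled)
{
	assert(d != NULL);

	if (!d->outer_l4_csum_ok)
		return 0;

	/* Only trust the inner result if this UDP port was explicitly
	 * enabled for tunnel offload on the receiving function. */
	if (tunnel_port_enabled && d->is_tunnel && d->inner_l4_csum_ok)
		return 2;

	return 1;
}
```

Because only the reported level changes, turning the tunnel port off for a given function degrades to outer-only validation instead of misinterpreting the packet.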

> Multicast is often scoped, in some cases we have different multicast
> scopes but the same addresses. In case of scoped traffic, we must verify
> the device as well and can't install the same flow on every NIC.

Right.  Hopefully the NIC vendors are thinking ahead and testing to
validate that multicast or broadcast traffic doesn't do anything weird
to their NICs in terms of offloads.
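The 2-tuple filter proposed earlier in the thread (UDP destination port plus the PF/VF ID the traffic is received on) also addresses the need to verify the device rather than installing the same flow on every NIC. A minimal sketch follows; the type and function names, the table size, and keeping ports in host byte order are all assumptions for illustration. 4789 is the IANA-assigned VXLAN port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: maximum number of filters a NIC might expose. */
#define MAX_TUNNEL_FILTERS 8

/* One 2-tuple filter: UDP destination port (host byte order) plus the
 * PF/VF function ID the traffic is received on. */
struct tunnel_filter {
	uint16_t udp_dport;
	uint16_t func_id;
};

struct tunnel_filter_table {
	struct tunnel_filter entries[MAX_TUNNEL_FILTERS];
	size_t count;
};

/* Add a filter; returns false if it already exists or the table is full. */
static bool tunnel_filter_add(struct tunnel_filter_table *t,
			      uint16_t udp_dport, uint16_t func_id)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->count; i++)
		if (t->entries[i].udp_dport == udp_dport &&
		    t->entries[i].func_id == func_id)
			return false;
	if (t->count == MAX_TUNNEL_FILTERS)
		return false;
	t->entries[t->count].udp_dport = udp_dport;
	t->entries[t->count].func_id = func_id;
	t->count++;
	return true;
}

/* Treat a packet as tunnel traffic only if BOTH its UDP destination
 * port and the receiving function match the same filter entry. */
static bool tunnel_filter_match(const struct tunnel_filter_table *t,
				uint16_t udp_dport, uint16_t func_id)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->count; i++)
		if (t->entries[i].udp_dport == udp_dport &&
		    t->entries[i].func_id == func_id)
			return true;
	return false;
}
```

With this keying, VXLAN offload enabled for one VF does not leak onto a different function that happens to receive the same multicast traffic.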

>> At a given physical point in the network, a given UDP flow either is or is
>> not carrying encapsulated traffic, and if it tries to be both then things
>> are certain to break, just as much as if two different applications try to
>> use the same UDP flow for two different application protocols.
>
> I think the example Tom was hinting at initially is like that:
>
> A net namespace acts as a router and has a vxlan endpoint active. The
> vxlan endpoint enables vxlan offloading on all net_devices in the same
> namespace. Because we only identify the tunnel endpoint by UDP port
> number, traffic which should actually just be forwarded and should never
> be processed locally suddenly can become processed by the offloading hw
> units. Because UDP ports only form a contract between the end points and
> not with the router in between it would be illegal to treat those not
> locally designated packets as vxlan by the router.

Yes.  The problem is I am sure there are some vendors out there
wanting to tout their product as being excellent at routing VXLAN
traffic so they are probably exploiting this to try and claim
performance gains.
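The router-namespace problem Hannes describes comes down to classifying by UDP port alone. The sketch below contrasts a port-only classifier with one that also requires the outer destination address to be a local tunnel endpoint. It is a hypothetical illustration: the struct, function names, and the flat list of local IPv4 addresses (host byte order) are assumptions, not how any particular NIC or the kernel tracks endpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal view of an outer IPv4/UDP flow (host byte order). */
struct udp4_flow {
	uint32_t daddr;
	uint16_t dport;
};

/* Port-only match: any packet to the port is treated as VXLAN,
 * including traffic that is merely being routed through this host. */
static bool is_vxlan_port_only(const struct udp4_flow *f, uint16_t vxlan_port)
{
	assert(f != NULL);
	return f->dport == vxlan_port;
}

/* Stricter match: the port must match AND the destination must be one
 * of this host's own addresses, i.e. a locally terminated tunnel. */
static bool is_vxlan_local_endpoint(const struct udp4_flow *f,
				    uint16_t vxlan_port,
				    const uint32_t *local_addrs, size_t n_local)
{
	assert(f != NULL);
	if (f->dport != vxlan_port)
		return false;
	for (size_t i = 0; i < n_local; i++)
		if (local_addrs[i] == f->daddr)
			return true;
	return false;
}
```

A packet to 10.0.1.2:4789 that the namespace only forwards is claimed by the port-only check but rejected by the endpoint check, which is exactly the misclassification described above.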

There is also an argument to be had about theory versus practice.
Arguably it is the customers who are driving some of the dirty hacks,
since I think vendors are building NICs around customer use cases
rather than following any specification.  In most data centers the
tunnel underlay will be deployed throughout the network, and UDP will
likely be blocked for anything that isn't explicitly used for
tunneling.  As a result we seem to be seeing a lot of NICs that
support only a single UDP port for tunnel offloads, instead of being
designed to handle whatever we can throw at them.

I really think it may be a few more years before we hit the point
where the vendors catch on to the fact that they need a generic
approach that works in all cases, versus what we have now, where they
support whatever the buzzword of the day is and don't look much
further down the road than that.  The fact is that in a few years'
time we might even have to start dealing with tunnel-in-tunnel
workloads to address the use of containers inside of KVM guests.  I'm
pretty sure we don't have support for recursive tunnel offloads in
hardware, and likely never will.  To that end, all I would really need
is support for CHECKSUM_COMPLETE or outer Rx checksum offload, RSS
based on the outer source port when the destination port is
recognized as a tunnel port, the ability to have the DF bit set for
any of the inner tunnel headers, and GSO partial extended to support
tunnel-in-tunnel scenarios.

- Alex

