[Intel-wired-lan] Regression 5.12.0-rc4 net: ice: significant throughput drop

Robin Murphy robin.murphy at arm.com
Thu Jun 3 13:09:46 UTC 2021


On 2021-06-03 13:32, Jussi Maki wrote:
> On Wed, Jun 2, 2021 at 2:49 PM Robin Murphy <robin.murphy at arm.com> wrote:
>>>> Thanks for the quick response & patch. I tried it out and indeed it
>>>> does solve the issue:
>>
>> Cool, thanks Jussi. May I infer a Tested-by tag from that?
> 
> Of course!
> 
>> Given that the race looks to have been pretty theoretical until now, I'm
>> not convinced it's worth the bother of digging through the long history
>> of default domain and DMA ops movement to figure out where it started, much
>> less attempt invasive backports. The flush queue change which made it
>> apparent only landed in 5.13-rc1, so as long as we can get this in as a
>> fix in the current cycle we should be golden - in the meantime, note
>> that booting with "iommu.strict=0" should also restore the expected
>> behaviour.
>>
>> FWIW I do still plan to resend the patch "properly" soon (in all honesty
>> it wasn't even compile-tested!)
> 
> BTW, even with the patch there's quite a bit of spin lock contention
> coming from ice_xmit_xdp_ring->dma_map_page_attrs->...->alloc_iova.
> CPU load drops from 85% to 20% (~80Mpps, 64b UDP) when the IOMMU is
> disabled. Is this type of overhead to be expected?

Yes, IOVA allocation can still be a bottleneck - the percpu caching 
system mostly alleviates it, but certain workloads can still defeat 
that, and if you're spending significant time in alloc_iova() rather 
than alloc_iova_fast() then it sounds like yours is one of them.
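The fast/slow split can be shown with a deliberately simplified model. Everything in it is hypothetical: the struct names, MAG_SIZE and slow_alloc(). It is also much smaller than the real per-CPU rcache in drivers/iommu/iova.c. It only shows why a cache hit avoids the shared lock, while a miss falls back to the locked allocator that shows up as alloc_iova() in a profile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size: real magazines hold far more entries. */
#define MAG_SIZE 4

/* A per-CPU stack of cached free IOVA page frame numbers (PFNs). */
struct magazine {
	size_t size;
	unsigned long pfns[MAG_SIZE];
};

struct cpu_cache {
	struct magazine mag;
	unsigned long slow_allocs; /* how often we fell back to the shared path */
	unsigned long next_pfn;    /* stand-in for the shared rbtree allocator */
};

/*
 * Slow path. In the kernel the analogous path takes a domain-wide
 * spinlock and searches an rbtree; this toy just hands out the next PFN
 * and counts the call so the cost is visible.
 */
static unsigned long slow_alloc(struct cpu_cache *c)
{
	c->slow_allocs++;
	return c->next_pfn++;
}

/* Fast path: pop a cached PFN from this CPU's magazine, no lock needed. */
unsigned long cache_alloc(struct cpu_cache *c)
{
	if (c->mag.size > 0)
		return c->mag.pfns[--c->mag.size];
	return slow_alloc(c);
}

/*
 * Return a PFN to this CPU's magazine. Returns false when the magazine is
 * full, in which case the caller would have to hand it to a shared pool.
 */
bool cache_free(struct cpu_cache *c, unsigned long pfn)
{
	if (c->mag.size == MAG_SIZE)
		return false;
	c->mag.pfns[c->mag.size++] = pfn;
	assert(c->mag.size <= MAG_SIZE);
	return true;
}
```

Some workloads keep landing in slow_alloc() in this model. One case is keeping more mappings in flight than the local magazine can hold. Another is freeing on a different CPU from the one that allocated, since each cpu_cache only refills from its own frees.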

If you're using small IOVA sizes which *should* be cached, then you 
might be running into a pathological case of thrashing the global depot. 
I've ranted before about the fixed MAX_GLOBAL_MAGS probably being too 
small for systems with more than 16 CPUs, which on a modern AMD system I 
imagine you may well have.
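The depot thrashing can be sketched as a toy model. It assumes a shared depot with room for a fixed number of full magazines. DEPOT_CAP is set to 32 on the assumption that it mirrors MAX_GLOBAL_MAGS in kernels of this era; check include/linux/iova.h in your tree. simulate_round() is a hypothetical helper. When more CPUs spill full magazines than the depot can hold, the excess is thrown back and later requests find the depot empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror MAX_GLOBAL_MAGS in kernels of this era. */
#define DEPOT_CAP 32

/* Shared depot of full magazines for one IOVA size class. */
struct depot {
	size_t count;             /* full magazines currently held */
	unsigned long overflows;  /* magazines that did not fit and were dropped */
	unsigned long underflows; /* requests that found the depot empty */
};

/* A CPU whose local magazines are full offers one full magazine. */
void depot_push(struct depot *d)
{
	if (d->count < DEPOT_CAP)
		d->count++;
	else
		d->overflows++; /* kernel analogue: IOVAs freed back to the rbtree */
	assert(d->count <= DEPOT_CAP);
}

/* A CPU whose local magazines are empty asks for a full one. */
bool depot_pop(struct depot *d)
{
	if (d->count > 0) {
		d->count--;
		return true;
	}
	d->underflows++; /* kernel analogue: fall back to the locked allocator */
	return false;
}

/* Every CPU frees a full magazine, then every CPU wants one back. */
void simulate_round(struct depot *d, unsigned int cpus)
{
	for (unsigned int i = 0; i < cpus; i++)
		depot_push(d);
	for (unsigned int i = 0; i < cpus; i++)
		depot_pop(d);
}
```

With 16 CPUs a round fits entirely in the depot. With 64 CPUs, 32 magazines overflow and the same number of later requests miss and fall through to the slow path.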

If on the other hand your workload is making larger mappings above the 
IOVA caching threshold, then please take a look at John's series for 
making that tuneable:

https://lore.kernel.org/linux-iommu/1622557781-211697-1-git-send-email-john.garry@huawei.com/
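To check which side of the caching threshold a mapping falls on, here is a rough sketch. It assumes the rcache only handles sizes whose rounded-up power-of-two page count is below 2^IOVA_RANGE_CACHE_MAX_SIZE. It also assumes that constant is 6 in 5.13-era kernels, which gives 32 pages or 128KiB with 4KiB granules. Both assumptions should be checked against your tree. RANGE_CACHE_MAX_ORDER and iova_size_is_cached() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of IOVA_RANGE_CACHE_MAX_SIZE in 5.13-era kernels. */
#define RANGE_CACHE_MAX_ORDER 6

/* ceil(log2(n)) for n >= 1, in the spirit of the kernel's order_base_2(). */
static unsigned int order_base_2_ul(unsigned long n)
{
	unsigned int order = 0;

	while (order < sizeof(unsigned long) * 8 - 1 && (1UL << order) < n)
		order++;
	return order;
}

/*
 * Would a mapping of `bytes` bytes, with `page_size`-byte IOVA granules,
 * be eligible for the per-CPU cache under the assumed threshold?
 * Intended for sizes well below ULONG_MAX.
 */
bool iova_size_is_cached(unsigned long bytes, unsigned long page_size)
{
	unsigned long pages;

	assert(page_size > 0);
	pages = (bytes + page_size - 1) / page_size;
	if (pages == 0)
		pages = 1;
	return order_base_2_ul(pages) < RANGE_CACHE_MAX_ORDER;
}
```

Under these assumptions, a 64-byte packet buffer is cacheable, while a 1MiB mapping is not and always takes the locked path.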

Cheers,
Robin.

