[Intel-wired-lan] [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
Laatz, Kevin
kevin.laatz at intel.com
Fri Jul 26 09:58:14 UTC 2019
On 26/07/2019 10:47, Laatz, Kevin wrote:
> On 25/07/2019 16:39, Jonathan Lemon wrote:
>> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
>>
>>> Currently, addresses are chunk size aligned. This means we are very
>>> restricted in where we can place a chunk within the umem. For
>>> example, if we have a chunk size of 2k, then our chunks can only be
>>> placed at 0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting
>>> from 0).
>>>
>>> This patch introduces the ability to use unaligned chunks. With these
>>> changes, we are no longer bound to having to place chunks at a 2k (or
>>> whatever your chunk size is) interval. Since we are no longer
>>> dealing with
>>> aligned chunks, they can now cross page boundaries. Checks for page
>>> contiguity have been added in order to keep track of which pages are
>>> followed by a physically contiguous page.
>>>
>>> Signed-off-by: Kevin Laatz <kevin.laatz at intel.com>
>>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
>>> Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
>>>
>>> ---
>>> v2:
>>> - Add checks for the flags coming from userspace
>>> - Fix how we get chunk_size in xsk_diag.c
>>> - Add defines for masking the new descriptor format
>>> - Modified the rx functions to use new descriptor format
>>> - Modified the tx functions to use new descriptor format
>>>
>>> v3:
>>> - Add helper function to do address/offset masking/addition
>>> ---
>>> include/net/xdp_sock.h | 17 ++++++++
>>> include/uapi/linux/if_xdp.h | 9 ++++
>>> net/xdp/xdp_umem.c | 18 +++++---
>>> net/xdp/xsk.c | 86 ++++++++++++++++++++++++++++++-------
>>> net/xdp/xsk_diag.c | 2 +-
>>> net/xdp/xsk_queue.h | 68 +++++++++++++++++++++++++----
>>> 6 files changed, 170 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
>>> index 69796d264f06..738996c0f995 100644
>>> --- a/include/net/xdp_sock.h
>>> +++ b/include/net/xdp_sock.h
>>> @@ -19,6 +19,7 @@ struct xsk_queue;
>>> struct xdp_umem_page {
>>> void *addr;
>>> dma_addr_t dma;
>>> + bool next_pg_contig;
>>> };
>>>
>>> struct xdp_umem_fq_reuse {
>>> @@ -48,6 +49,7 @@ struct xdp_umem {
>>> bool zc;
>>> spinlock_t xsk_list_lock;
>>> struct list_head xsk_list;
>>> + u32 flags;
>>> };
>>>
>>> struct xdp_sock {
>>> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct
>>> xdp_umem *umem, u64 addr)
>>>
>>> rq->handles[rq->length++] = addr;
>>> }
>>> +
>>> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64
>>> handle,
>>> + u64 offset)
>>> +{
>>> + if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>>> + return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>>> + else
>>> + return handle += offset;
>>> +}
>>
>> This should be something like 'xsk_umem_adjust_offset()', and use
>> "+=" for both cases.
>
> I'll try to come up with a better name for this; I see how
> "handle_offset" could be misleading :)
>
> In terms of the "+=", we can't do that for both. We need to use | for
> the unaligned case since we store the offset in the upper 16-bits of
> the address field so that we leave the lower 48-bits (the original
> base address) untouched.
>
Correction: we can use "+=" for both cases. With that, the lower
48-bits still remain untouched, so we keep the original base address :)
>
> <...>
>
>>>
>>> if (!PAGE_ALIGNED(addr)) {
>>> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem,
>>> struct xdp_umem_reg *mr)
>>> if (chunks == 0)
>>> return -EINVAL;
>>>
>>> - chunks_per_page = PAGE_SIZE / chunk_size;
>>> - if (chunks < chunks_per_page || chunks % chunks_per_page)
>>> - return -EINVAL;
>>> + if (!unaligned_chunks) {
>>> + chunks_per_page = PAGE_SIZE / chunk_size;
>>> + if (chunks < chunks_per_page || chunks % chunks_per_page)
>>> + return -EINVAL;
>>> + }
>>>
>>> headroom = ALIGN(headroom, 64);
>>>
>>> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem,
>>> struct xdp_umem_reg *mr)
>>> return -EINVAL;
>>>
>>> umem->address = (unsigned long)addr;
>>> - umem->chunk_mask = ~((u64)chunk_size - 1);
>>> + umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
>>> + : ~((u64)chunk_size - 1);
>>
>> The handle needs to be cleaned (reset to base address) when removed
>> from the fill queue or recycle stack. This will not provide the correct
>> semantics for unaligned mode.
>
> When we use aligned mode, the chunk mask is ~0x07FF which will remove
> the low 11-bits in order to get us back to the original base address.
>
> In unaligned mode, the chunk mask is ~0xFFFF00....00 which will remove
> the upper 16-bits. This will remove the offset from the address field
> and, since we have not directly modified the low 48-bits (original
> base address), leave us with the original base address.
>
> Cleaning the handle will therefore work as it does now, using the
> appropriate mask based on which mode we are using.
>
> <...>
>
>>>
>>> if (xskq_produce_addr_lazy(umem->cq, desc->addr))
>>> @@ -243,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk,
>>> struct msghdr *m,
>>> if (xs->queue_id >= xs->dev->real_num_tx_queues)
>>> goto out;
>>>
>>> - while (xskq_peek_desc(xs->tx, &desc)) {
>>> + while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>>> char *buffer;
>>> u64 addr;
>>> u32 len;
>>> @@ -262,6 +288,10 @@ static int xsk_generic_xmit(struct sock *sk,
>>> struct msghdr *m,
>>>
>>> skb_put(skb, len);
>>> addr = desc.addr;
>>> + if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>>> + addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
>>> + (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>>
>> This doesn't look right to me. Shouldn't it be "(addr & mask) +
>> (addr >> shift)"?
>> I'd also prefer to see this type of logic in an inline/macro
>
> Will fix in the v4, thanks!
>
> I'll look to move this to an inline function as well.
>
> <...>