[Intel-wired-lan] FW: [PATCH bpf-next 2/4] xsk: allow AF_XDP sockets to receive packets directly from a queue

Wed Oct 9 17:17:19 UTC 2019

On Wed, Oct 9, 2019 at 9:53 AM Samudrala, Sridhar
<sridhar.samudrala at intel.com> wrote:
>
>
> >> +
> >> +u32 bpf_direct_xsk(const struct bpf_prog *prog, struct xdp_buff *xdp)
> >> +{
> >> +       struct xdp_sock *xsk;
> >> +
> >> +       xsk = xdp_get_xsk_from_qid(xdp->rxq->dev, xdp->rxq->queue_index);
> >> +       if (xsk) {
> >> +               struct bpf_redirect_info *ri =
> >> +                       this_cpu_ptr(&bpf_redirect_info);
> >> +
> >> +               ri->xsk = xsk;
> >> +               return XDP_REDIRECT;
> >> +       }
> >> +
> >> +       return XDP_PASS;
> >> +}
> >> +EXPORT_SYMBOL(bpf_direct_xsk);
> >
> > So you're saying there is a:
> > """
> > xdpsock rxdrop 1 core (both app and queue's irq pinned to the same core)
> >     default    : taskset -c 1 ./xdpsock -i enp66s0f0 -r -q 1
> >     direct-xsk : taskset -c 1 ./xdpsock -i enp66s0f0 -r -q 1
> >     6.1x improvement in drop rate
> > """
> >
> > 6.1x gain running the above C code vs. exactly equivalent BPF code?
> > How is that possible?
>
> It seems to be due to the overhead of ___bpf_prog_run on older processors
> (Ivy Bridge). The overhead is smaller on newer processors, but even on
> Skylake I see around a 1.5x improvement.
>
> perf report with default xdpsock
> ================================
> Samples: 2K of event 'cycles:ppp', Event count (approx.): 8437658090
> Overhead  Command          Shared Object     Symbol
>    34.57%  xdpsock          xdpsock           [.] main
>    17.19%  ksoftirqd/1      [kernel.vmlinux]  [k] ___bpf_prog_run
>    13.12%  xdpsock          [kernel.vmlinux]  [k] ___bpf_prog_run

That must be a bad joke.
The whole patch set is based on comparing native code to the interpreter?!
It's pretty awesome that the interpreter is only 1.5x slower than native x86.
Just turn the JIT on.

Obvious Nack to the patch set.