[Intel-wired-lan] [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
Jesper Dangaard Brouer
brouer at redhat.com
Thu Oct 4 21:18:48 UTC 2018
On Tue, 2 Oct 2018 10:00:29 +0200
Björn Töpel <bjorn.topel at gmail.com> wrote:
> From: Björn Töpel <bjorn.topel at intel.com>
>
> Jeff: Please remove the v1 patches from your dev-queue!
>
> This patch set introduces zero-copy AF_XDP support for Intel's ixgbe
> driver.
>
> The ixgbe zero-copy code is located in its own file ixgbe_xsk.[ch],
> analogous to the i40e ZC support. Again, as in i40e, code paths have
> been copied from the XDP path to the zero-copy path. Going forward we
> will try to generalize more code between the AF_XDP ZC drivers, and
> also reduce the heavy C&P.
>
> We have run some benchmarks on a dual socket system with two Broadwell
> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> cores which gives a total of 28, but only two cores are used in these
> experiments. One for TX/RX and one for the user space application. The
> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> memory. The compiler used is GCC 7.3.0. The NIC is Intel
> 82599ES/X520-2 10Gbit/s using the ixgbe driver.
>
> Below are the results in Mpps of the 82599ES/X520-2 NIC benchmark runs
> for 64B and 1500B packets, generated by a commercial packet generator
> HW blasting packets at full 10Gbit/s line rate. The results are with
> retpoline and all other spectre and meltdown fixes.
>
> AF_XDP performance 64B packets:
> Benchmark   XDP_DRV with zerocopy
> rxdrop      14.7
> txpush      14.6
I see similar performance numbers, but my system can crash with 'txonly'.
See full crash log and my analysis, below.
> l2fwd       11.1
Got l2fwd 13.2 Mpps.
>
> AF_XDP performance 1500B packets:
> Benchmark   XDP_DRV with zerocopy
> rxdrop      0.8
> l2fwd       0.8
>
> XDP performance on our system as a base line.
>
> 64B packets:
> XDP stats       CPU     Mpps        issue-pps
> XDP-RX CPU      16      14.7        0
>
> 1500B packets:
> XDP stats       CPU     Mpps        issue-pps
> XDP-RX CPU      16      0.8         0
>
> The structure of the patch set is as follows:
>
> Patch 1: Introduce Rx/Tx ring enable/disable functionality
> Patch 2: Preparatory patch to ixgbe driver code for RX
> Patch 3: ixgbe zero-copy support for RX
> Patch 4: Preparatory patch to ixgbe driver code for TX
> Patch 5: ixgbe zero-copy support for TX
>
> Changes since v1:
>
> * Removed redundant AF_XDP precondition checks, pointed out by
> Jakub. Now, the preconditions are only checked at XDP enable time.
> * Fixed a crash in the egress path, due to incorrect usage of
> ixgbe_ring queue_index member. In v2 a ring_idx back reference is
> introduced, and used in favor of queue_index. William reported the
> crash, and helped me smoke out the issue. Kudos!
> * In ixgbe_xsk_async_xmit, validate qid against num_xdp_queues,
> instead of num_rx_queues.
>
> Cheers!
> Björn
>
> Björn Töpel (5):
> ixgbe: added Rx/Tx ring disable/enable functions
> ixgbe: move common Rx functions to ixgbe_txrx_common.h
> ixgbe: add AF_XDP zero-copy Rx support
> ixgbe: move common Tx functions to ixgbe_txrx_common.h
> ixgbe: add AF_XDP zero-copy Tx support
>
> drivers/net/ethernet/intel/ixgbe/Makefile | 3 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe.h | 28 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 17 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 291 ++++++-
> .../ethernet/intel/ixgbe/ixgbe_txrx_common.h | 50 ++
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 803 ++++++++++++++++++
> 6 files changed, 1146 insertions(+), 46 deletions(-)
> create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
 sock0@ixgbe2:0 rxdrop
                pps         pkts        1.00
rx              14,572,284  36,093,496
tx              0           0
 sock0@ixgbe2:0 l2fwd
                pps         pkts        1.00
rx              13,287,830  108,616,192
tx              13,287,830  108,616,284
Note that the crash only happens sometimes (on the second invocation):
$ sudo ./xdpsock --interface ixgbe2 --txonly --zero
samples/bpf/xdpsock_user.c:kick_tx:749: Assertion failed: 0: errno: 100/"Network is down"
 sock0@ixgbe2:0 txonly
                pps         pkts        0.05
rx              0           0
tx              33,763      1,709
$ sudo ./xdpsock --interface ixgbe2 --txonly --zero
 sock0@ixgbe2:0 txonly
                pps         pkts        1.00
rx              0           0
tx              14,730,354  14,733,404
$ sudo ./xdpsock --interface ixgbe2 --txonly --zero
samples/bpf/xdpsock_user.c:kick_tx:749: Assertion failed: 0: errno: 100/"Network is down"
 sock0@ixgbe2:0 txonly
                pps         pkts        0.26
rx              0           0
tx              2,054,927   524,680
$ sudo ./xdpsock --interface ixgbe2 --txonly --zero
[ 249.953547] ixgbe 0000:01:00.1 ixgbe2: detected SFP+: 4
[ 250.204158] ixgbe 0000:01:00.1 ixgbe2: NIC Link is Up 10 Gbps, Flow Control: None
[ 257.217496] ixgbe 0000:01:00.1: removed PHC on ixgbe2
[ 257.279328] ixgbe 0000:01:00.1: Multiqueue Disabled: Rx Queue count = 1, Tx Queue count = 1 XDP Queue count = 6
[ 257.308463] ixgbe 0000:01:00.1: registered PHC device on ixgbe2
[ 257.489166] ixgbe 0000:01:00.1 ixgbe2: detected SFP+: 4
[ 257.494923] ixgbe 0000:01:00.1 ixgbe2: initiating reset to clear Tx work after link loss
[ 257.716190] ixgbe 0000:01:00.1 ixgbe2: Reset adapter
[ 257.968552] ixgbe 0000:01:00.1 ixgbe2: detected SFP+: 4
[ 258.185273] ixgbe 0000:01:00.1 ixgbe2: NIC Link is Up 10 Gbps, Flow Control: None
[ 260.836196] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 260.844652] PGD 0 P4D 0
[ 260.847527] Oops: 0002 [#1] PREEMPT SMP PTI
[ 260.852042] CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.19.0-rc5-bpf-next-xdp-ixgbe-ZC+ #66
[ 260.861269] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[ 260.869381] RIP: 0010:xsk_umem_consume_tx+0xc9/0x180
[ 260.874682] Code: 24 75 be 48 8b 86 08 03 00 00 48 8d b0 f8 fc ff ff 48 39 c7 75 96 e8 26 bd 8a ff 5b 31 c0 41 5a 41 5c 41 5d 5d 49 8d 62 f8 c3 <89> 41 40 8b 4a 24 8b 42 1c 29 c8 75 0b 48 8b 42 28 8b 00 89 42 1c
[ 260.894317] RSP: 0018:ffffc9000323bd00 EFLAGS: 00010246
[ 260.899873] RAX: 0000000000000000 RBX: ffffc9000323bd68 RCX: 0000000000000000
[ 260.907339] RDX: ffff8808553e1c00 RSI: ffff880826e43000 RDI: ffff880854940818
[ 260.914801] RBP: ffffc9000323bd20 R08: 0000000000000010 R09: 0000000000000000
[ 260.922263] R10: ffffc9000323bd40 R11: 0000000000000000 R12: ffffc9000323bd64
[ 260.929726] R13: ffff880854940780 R14: 0000000000000000 R15: 0000000000000000
[ 260.937189] FS: 0000000000000000(0000) GS:ffff88085c640000(0000) knlGS:0000000000000000
[ 260.945871] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 260.951943] CR2: 0000000000000040 CR3: 000000087f20a006 CR4: 00000000003606e0
[ 260.959409] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 260.966872] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 260.974333] Call Trace:
[ 260.977115] ? ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[ 260.982843] ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[ 260.988426] ixgbe_poll+0x5a/0x700 [ixgbe]
[ 260.992850] net_rx_action+0x141/0x3f0
[ 260.996931] ? sort_range+0x20/0x20
[ 261.000743] __do_softirq+0xe3/0x2f7
[ 261.004656] ? sort_range+0x20/0x20
[ 261.008490] run_ksoftirqd+0x26/0x30
[ 261.012420] smpboot_thread_fn+0x114/0x1d0
[ 261.016848] kthread+0x111/0x130
[ 261.020423] ? kthread_create_worker_on_cpu+0x50/0x50
[ 261.025802] ret_from_fork+0x1f/0x30
[ 261.029707] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables x_tables tun nfnetlink bridge nf_defrag_ipv6 nf_defrag_ipv4 bpfilter sunrpc coretemp intel_cstate intel_uncore intel_rapl_perf pcspkr i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad pcc_cpufreq sch_fq_codel ixgbe mdio mlx5_core i40e igb nfp ptp i2c_algo_bit devlink i2c_core pps_core hid_generic [last unloaded: x_tables]
[ 261.067878] CR2: 0000000000000040
[ 261.071526] ---[ end trace f0011e17c3744ee4 ]---
[ 261.077903] RIP: 0010:xsk_umem_consume_tx+0xc9/0x180
[ 261.083191] Code: 24 75 be 48 8b 86 08 03 00 00 48 8d b0 f8 fc ff ff 48 39 c7 75 96 e8 26 bd 8a ff 5b 31 c0 41 5a 41 5c 41 5d 5d 49 8d 62 f8 c3 <89> 41 40 8b 4a 24 8b 42 1c 29 c8 75 0b 48 8b 42 28 8b 00 89 42 1c
[ 261.102852] RSP: 0018:ffffc9000323bd00 EFLAGS: 00010246
[ 261.108423] RAX: 0000000000000000 RBX: ffffc9000323bd68 RCX: 0000000000000000
[ 261.115889] RDX: ffff8808553e1c00 RSI: ffff880826e43000 RDI: ffff880854940818
[ 261.123382] RBP: ffffc9000323bd20 R08: 0000000000000010 R09: 0000000000000000
[ 261.130847] R10: ffffc9000323bd40 R11: 0000000000000000 R12: ffffc9000323bd64
[ 261.138325] R13: ffff880854940780 R14: 0000000000000000 R15: 0000000000000000
[ 261.145788] FS: 0000000000000000(0000) GS:ffff88085c640000(0000) knlGS:0000000000000000
[ 261.154503] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.160594] CR2: 0000000000000040 CR3: 000000087f20a006 CR4: 00000000003606e0
[ 261.168070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 261.175547] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 261.183012] Kernel panic - not syncing: Fatal exception in interrupt
[ 261.189743] Kernel Offset: disabled
[ 261.194954] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
[ 261.203123] ------------[ cut here ]------------
[ 261.208071] sched: Unexpected reschedule of offline CPU#0!
[ 261.213885] WARNING: CPU: 1 PID: 18 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x31/0x40
[ 261.223698] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables x_tables tun nfnetlink bridge nf_defrag_ipv6 nf_defrag_ipv4 bpfilter sunrpc coretemp intel_cstate intel_uncore intel_rapl_perf pcspkr i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad pcc_cpufreq sch_fq_codel ixgbe mdio mlx5_core i40e igb nfp ptp i2c_algo_bit devlink i2c_core pps_core hid_generic [last unloaded: x_tables]
[ 261.261869] CPU: 1 PID: 18 Comm: ksoftirqd/1 Tainted: G D 4.19.0-rc5-bpf-next-xdp-ixgbe-ZC+ #66
[ 261.272468] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[ 261.280549] RIP: 0010:native_smp_send_reschedule+0x31/0x40
[ 261.286361] Code: 48 0f a3 05 91 c7 3d 01 73 12 48 8b 05 e8 11 0c 01 be fd 00 00 00 48 8b 40 30 ff e0 89 fe 48 c7 c7 b8 36 09 82 e8 ff 7d 02 00 <0f> 0b c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48
[ 261.306001] RSP: 0018:ffff88085c643cc0 EFLAGS: 00010082
[ 261.311553] RAX: 000000000000002e RBX: ffff88085c6213c0 RCX: 0000000000000006
[ 261.319023] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff88085c6555e0
[ 261.326483] RBP: ffff88085306a0d4 R08: 0000000000000000 R09: 0000000000000478
[ 261.333943] R10: ffff88085c643bf8 R11: ffffffff82acfbad R12: ffff880853069640
[ 261.341407] R13: ffff88085c643d10 R14: 0000000000000086 R15: 00000000000213c0
[ 261.348869] FS: 0000000000000000(0000) GS:ffff88085c640000(0000) knlGS:0000000000000000
[ 261.357555] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 261.363624] CR2: 0000000000000040 CR3: 000000087f20a006 CR4: 00000000003606e0
[ 261.371090] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 261.378554] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 261.386014] Call Trace:
[ 261.388788] <IRQ>
[ 261.391128] check_preempt_curr+0x6f/0x80
[ 261.395466] ttwu_do_wakeup+0x19/0x150
[ 261.399548] try_to_wake_up+0x19c/0x450
[ 261.403715] ? enqueue_entity+0xad/0x2c0
[ 261.407964] __wake_up_common+0x71/0x170
[ 261.412220] ep_poll_callback+0xb5/0x2a0
[ 261.416474] __wake_up_common+0x71/0x170
[ 261.420729] __wake_up_common_lock+0x6c/0x90
[ 261.425335] ? tick_sched_do_timer+0x60/0x60
[ 261.429935] irq_work_run_list+0x47/0x70
[ 261.434190] update_process_times+0x3b/0x50
[ 261.438705] tick_sched_handle+0x21/0x70
[ 261.442959] ? tick_sched_do_timer+0x50/0x60
[ 261.447554] tick_sched_timer+0x37/0x70
[ 261.451719] __hrtimer_run_queues+0xf8/0x2a0
[ 261.456317] hrtimer_interrupt+0xe5/0x240
[ 261.460657] ? sched_clock+0x5/0x10
[ 261.464478] smp_apic_timer_interrupt+0x5e/0x140
[ 261.469420] apic_timer_interrupt+0xf/0x20
[ 261.473847] </IRQ>
[ 261.476271] RIP: 0010:panic+0x1e3/0x232
[ 261.480433] Code: eb ac 83 3d 30 07 a0 01 00 74 05 e8 39 36 02 00 48 c7 c6 a0 8b ac 82 48 c7 c7 10 af 09 82 e8 84 6a 05 00 fb 66 0f 1f 44 00 00 <31> db e8 f8 22 0b 00 4c 39 eb 7c 17 41 83 f4 01 44 89 e7 ff 15 d6
[ 261.500066] RSP: 0018:ffffc9000323baf8 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13
[ 261.508234] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
[ 261.515696] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff88085c6555e0
[ 261.523160] RBP: ffffc9000323bb68 R08: 0000000000000000 R09: 0000000000000476
[ 261.530620] R10: 0000000000000008 R11: ffffffff82acfbad R12: 0000000000000000
[ 261.538084] R13: 0000000000000000 R14: 0000000000000009 R15: 0000000000000001
[ 261.545546] ? panic+0x1dc/0x232
[ 261.549101] oops_end+0xb9/0xd0
[ 261.552569] no_context+0x156/0x3a0
[ 261.556392] ? cpumask_next_and+0x1a/0x20
[ 261.560730] ? find_busiest_group+0x112/0xa80
[ 261.565413] __do_page_fault+0xd5/0x500
[ 261.569579] page_fault+0x1e/0x30
[ 261.573220] RIP: 0010:xsk_umem_consume_tx+0xc9/0x180
[ 261.578508] Code: 24 75 be 48 8b 86 08 03 00 00 48 8d b0 f8 fc ff ff 48 39 c7 75 96 e8 26 bd 8a ff 5b 31 c0 41 5a 41 5c 41 5d 5d 49 8d 62 f8 c3 <89> 41 40 8b 4a 24 8b 42 1c 29 c8 75 0b 48 8b 42 28 8b 00 89 42 1c
[ 261.598148] RSP: 0018:ffffc9000323bd00 EFLAGS: 00010246
[ 261.603703] RAX: 0000000000000000 RBX: ffffc9000323bd68 RCX: 0000000000000000
[ 261.611169] RDX: ffff8808553e1c00 RSI: ffff880826e43000 RDI: ffff880854940818
[ 261.618631] RBP: ffffc9000323bd20 R08: 0000000000000010 R09: 0000000000000000
[ 261.626094] R10: ffffc9000323bd40 R11: 0000000000000000 R12: ffffc9000323bd64
[ 261.633557] R13: ffff880854940780 R14: 0000000000000000 R15: 0000000000000000
[ 261.641021] ? ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[ 261.646755] ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[ 261.652308] ixgbe_poll+0x5a/0x700 [ixgbe]
[ 261.656735] net_rx_action+0x141/0x3f0
[ 261.660814] ? sort_range+0x20/0x20
[ 261.664627] __do_softirq+0xe3/0x2f7
[ 261.668530] ? sort_range+0x20/0x20
[ 261.672351] run_ksoftirqd+0x26/0x30
[ 261.676250] smpboot_thread_fn+0x114/0x1d0
[ 261.680671] kthread+0x111/0x130
[ 261.684223] ? kthread_create_worker_on_cpu+0x50/0x50
[ 261.689603] ret_from_fork+0x1f/0x30
[ 261.701291] ---[ end trace f0011e17c3744ee5 ]---
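As a cross-check of the oops above, the fault details can be decoded by hand. The following is only a sketch: the helper names (decode_pf_error, decode_modrm, effective_address, check_oops_example) are made up for illustration, and it relies on the standard x86 page-fault error-code bits and ModRM encoding. It decodes the "Oops: 0002" error code and the faulting bytes "<89> 41 40" from the Code: line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, as printed after "Oops:". */
struct pf_error {
	bool present;	/* bit 0: 0 = page not present, 1 = protection fault */
	bool write;	/* bit 1: 1 = write access, 0 = read access */
	bool user;	/* bit 2: 1 = fault taken in user mode */
};

static struct pf_error decode_pf_error(uint32_t code)
{
	struct pf_error e = {
		.present = (code & 0x1) != 0,
		.write   = (code & 0x2) != 0,
		.user    = (code & 0x4) != 0,
	};
	return e;
}

/* Fields of an x86 ModRM byte. */
struct modrm {
	unsigned mod;	/* 1 = register base plus 8-bit displacement */
	unsigned reg;	/* register operand (0 = EAX/RAX) */
	unsigned rm;	/* base register (1 = ECX/RCX) */
};

static struct modrm decode_modrm(uint8_t b)
{
	struct modrm m = {
		.mod = b >> 6,
		.reg = (b >> 3) & 0x7,
		.rm  = b & 0x7,
	};
	return m;
}

/* Effective address for mod == 1: base register plus sign-extended disp8. */
static uint64_t effective_address(uint64_t base, uint8_t disp8)
{
	return base + (uint64_t)(int64_t)(int8_t)disp8;
}

/* Values taken from the log: "Oops: 0002", Code "<89> 41 40", RCX = 0. */
static void check_oops_example(void)
{
	struct pf_error e = decode_pf_error(0x0002);
	struct modrm m = decode_modrm(0x41);

	assert(!e.present && e.write && !e.user);	/* kernel write, page not present */
	assert(m.mod == 1 && m.reg == 0 && m.rm == 1);	/* opcode 0x89: mov %eax,disp8(%rcx) */
	assert(effective_address(0, 0x40) == 0x40);	/* equals CR2 in the oops */
}
```

With RCX = 0 from the register dump, the faulting instruction reads as a 32-bit store of EAX to RCX + 0x40, that is, a kernel-mode write to a not-present page at address 0x40, which matches the reported CR2.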
(gdb) list *(xsk_umem_consume_tx)+0xc9
0xffffffff81883fe9 is in xsk_umem_consume_tx (./include/linux/compiler.h:214).
209	static __always_inline void __write_once_size(volatile void *p, void *res, int size)
210	{
211		switch (size) {
212		case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
213		case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
214		case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
215		case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
216		default:
217			barrier();
218			__builtin_memcpy((void *)p, (const void *)res, size);
I think the bug occurs in the WRITE_ONCE in xskq_peek_desc(), and
it corresponds to q->ring == NULL (as ring has offset 40):
static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
					      struct xdp_desc *desc)
{
	if (q->cons_tail == q->cons_head) {
		WRITE_ONCE(q->ring->consumer, q->cons_tail);
		q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);

		/* Order consumer and data */
		smp_rmb();
	}

	return xskq_validate_desc(q, desc);
}
$ pahole -C xsk_queue vmlinux
struct xsk_queue {
	u64                        chunk_mask;           /*     0     8 */
	u64                        size;                 /*     8     8 */
	u32                        ring_mask;            /*    16     4 */
	u32                        nentries;             /*    20     4 */
	u32                        prod_head;            /*    24     4 */
	u32                        prod_tail;            /*    28     4 */
	u32                        cons_head;            /*    32     4 */
	u32                        cons_tail;            /*    36     4 */
	struct xdp_ring *          ring;                 /*    40     8 */
	u64                        invalid_descs;        /*    48     8 */
	/* size: 56, cachelines: 1, members: 10 */
	/* last cacheline: 56 bytes */
};
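The offsets can be cross-checked against the faulting address with a small userspace model. This is a sketch under stated assumptions: struct xsk_queue_model mirrors the pahole output above, while struct xdp_ring_model is an assumed layout (producer and consumer each aligned to a 64-byte cache line) that is not shown in this mail. The type and function names are hypothetical, and an LP64 target is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed SMP cache line size on x86-64 */

/* ASSUMPTION: each ring index sits on its own cache line. */
struct xdp_ring_model {
	_Alignas(CACHELINE) uint32_t producer;
	_Alignas(CACHELINE) uint32_t consumer;
};

/* Mirrors the pahole output for struct xsk_queue. */
struct xsk_queue_model {
	uint64_t chunk_mask;
	uint64_t size;
	uint32_t ring_mask;
	uint32_t nentries;
	uint32_t prod_head;
	uint32_t prod_tail;
	uint32_t cons_head;
	uint32_t cons_tail;
	struct xdp_ring_model *ring;
	uint64_t invalid_descs;
};

static_assert(offsetof(struct xsk_queue_model, ring) == 40,
	      "ring pointer at decimal offset 40, as pahole reports");
static_assert(sizeof(struct xsk_queue_model) == 56,
	      "total size 56, as pahole reports");

/* Address touched when loading q->ring through a NULL q. */
static uintptr_t fault_addr_if_queue_null(void)
{
	return offsetof(struct xsk_queue_model, ring);
}

/* Address touched when storing q->ring->consumer through a NULL q->ring. */
static uintptr_t fault_addr_if_ring_null(void)
{
	return offsetof(struct xdp_ring_model, consumer);
}
```

Under these assumptions, a NULL q would fault on a read at 0x28 (decimal 40), whereas a NULL q->ring would fault on a write at 0x40. The oops reports a write with CR2 = 0x40, which fits the q->ring == NULL reading.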
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer