[Intel-wired-lan] [PATCH iwl-net v4] igb: Fix trigger of incorrect irq in igb_xsk_wakeup

Behera, VIVEK vivek.behera at siemens.com
Sun Jan 11 14:27:39 UTC 2026



> -----Original Message-----
> From: Maciej Fijalkowski <maciej.fijalkowski at intel.com>
> Sent: Friday, January 9, 2026 1:15 AM
> To: Behera, Vivek (DI FA DSP ICC PRC1) <vivek.behera at siemens.com>
> Cc: aleksandr.loktionov at intel.com; jacob.e.keller at intel.com;
> anthony.l.nguyen at intel.com; przemyslaw.kitszel at intel.com;
> sriram.yagnaraman at ericsson.com; kurt at linutronix.de; intel-wired-
> lan at lists.osuosl.org
> Subject: Re: [PATCH iwl-net v4] igb: Fix trigger of incorrect irq in
> igb_xsk_wakeup
> 
> On Mon, Dec 22, 2025 at 12:57:47PM +0100, Vivek Behera wrote:
> > The current implementation in the igb_xsk_wakeup expects the Rx and Tx
> > queues to share the same irq. This would lead to triggering of
> > incorrect irq in split irq configuration.
> > This patch addresses this issue which could impact environments with 2
> > active cpu cores or when the number of queues is reduced to 2 or less
> >
> > cat /proc/interrupts | grep eno2
> >  167:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
> >  0-edge      eno2
> >  168:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
> >  1-edge      eno2-rx-0
> >  169:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
> >  2-edge      eno2-rx-1
> >  170:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
> >  3-edge      eno2-tx-0
> >  171:          0          0          0          0 IR-PCI-MSIX-0000:08:00.0
> >  4-edge      eno2-tx-1
> >
> > Furthermore it uses the flags input argument to trigger either rx, tx
> > or both rx and tx irqs as specified in the ndo_xsk_wakeup api
> > documentation
> >
> > Fixes: 80f6ccf9f116 ("igb: Introduce XSK data structures and helpers")
> > Signed-off-by: Vivek Behera <vivek.behera at siemens.com>
> > Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov at intel.com>
> > ---
> > v1:
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Fintel-wired-lan%2F20251212131454.124116-1-vivek.behera%4
> >
> 0siemens.com%2F&data=05%7C02%7Cvivek.behera%40siemens.com%7C2e7cb
> 8169b
> >
> f4457c291908de4f1428a2%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%
> 7C63
> >
> 9035145171065657%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRyd
> WUsIlYiO
> >
> iIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0
> %7C
> >
> %7C%7C&sdata=yCLwTDeoPNeub3tvzXTA7vX5p8I%2BmqcArOk0BzogiA0%3D&
> reserved
> > =0
> > v2:
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Fintel-wired-lan%2F20251215115416.410619-1-vivek.behera%4
> >
> 0siemens.com%2F&data=05%7C02%7Cvivek.behera%40siemens.com%7C2e7cb
> 8169b
> >
> f4457c291908de4f1428a2%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%
> 7C63
> >
> 9035145171089890%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRyd
> WUsIlYiO
> >
> iIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0
> %7C
> >
> %7C%7C&sdata=wm2ylxc0sATU4XL5LFP026DwMxUWe7EWAVjCQ3yyPGM%3
> D&reserved=0
> > v3:
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Fintel-wired-lan%2F20251220114936.140473-1-vivek.behera%4
> >
> 0siemens.com%2F&data=05%7C02%7Cvivek.behera%40siemens.com%7C2e7cb
> 8169b
> >
> f4457c291908de4f1428a2%7C38ae3bcd95794fd4addab42e1495d55a%7C1%7C0%
> 7C63
> >
> 9035145171106278%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRyd
> WUsIlYiO
> >
> iIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0
> %7C
> >
> %7C%7C&sdata=q7VkYNY8YzNn5rsbunTF04mt0gkBe1jeF6DrSbvw5s4%3D&res
> erved=0
> >
> > changelog:
> > v1
> > - Inital description of the Bug and fixes made in the patch
> >
> > v1 -> v2
> > - Handling of RX and TX Wakeup in igc_xsk_wakeup for a split IRQ
> > configuration
> > - Review suggestions by Aleksander: Modified sequence to complete all
> >   error checks for rx and tx before updating napi states and
> > triggering irqs
> > - Corrected trigger of TX and RX interrupts over E1000_ICS (non msix
> > use case)
> > - Added define for Tx interrupt trigger bit mask for E1000_ICS
> >
> > v2 -> v3
> > - Included applicable feedback and suggestions from igc patch
> > - Fixed logic in updating eics value when  both TX and RX need wakeup
> >
> > v3 -> v4
> > - Added comments to explain trigerring of both TX and RX with active
> > queue pairs
> > - Fixed check of xsk pools in if statement
> > ---
> >  .../net/ethernet/intel/igb/e1000_defines.h    |  1 +
> >  drivers/net/ethernet/intel/igb/igb_xsk.c      | 90 +++++++++++++++++--
> >  2 files changed, 83 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h
> > b/drivers/net/ethernet/intel/igb/e1000_defines.h
> > index fa028928482f..9357564a2d58 100644
> > --- a/drivers/net/ethernet/intel/igb/e1000_defines.h
> > +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
> > @@ -443,6 +443,7 @@
> >  #define E1000_ICS_LSC       E1000_ICR_LSC       /* Link Status Change */
> >  #define E1000_ICS_RXDMT0    E1000_ICR_RXDMT0    /* rx desc min.
> threshold */
> >  #define E1000_ICS_DRSTA     E1000_ICR_DRSTA     /* Device Reset Aserted
> */
> > +#define E1000_ICS_TXDW      E1000_ICR_TXDW	/* Transmit desc written
> back */
> >
> >  /* Extended Interrupt Cause Set */
> >  /* E1000_EITR_CNT_IGNR is only for 82576 and newer */ diff --git
> > a/drivers/net/ethernet/intel/igb/igb_xsk.c
> > b/drivers/net/ethernet/intel/igb/igb_xsk.c
> > index 30ce5fbb5b77..1d21674c0f33 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> > @@ -529,6 +529,7 @@ int igb_xsk_wakeup(struct net_device *dev, u32 qid,
> u32 flags)
> >  	struct igb_adapter *adapter = netdev_priv(dev);
> >  	struct e1000_hw *hw = &adapter->hw;
> >  	struct igb_ring *ring;
> > +	struct igb_q_vector *q_vector;
> >  	u32 eics = 0;
> >
> >  	if (test_bit(__IGB_DOWN, &adapter->state)) @@ -536,14 +537,82 @@
> int
> > igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
> >
> >  	if (!igb_xdp_is_enabled(adapter))
> >  		return -EINVAL;
> > -
> > -	if (qid >= adapter->num_tx_queues)
> > +	/* Check if queue_id is valid. Tx and Rx queue numbers are always same
> */
> > +	if (qid >= adapter->num_rx_queues)
> >  		return -EINVAL;
> >
> > -	ring = adapter->tx_ring[qid];
> > -
> > -	if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
> > -		return -ENETDOWN;
> > +	if ((flags & XDP_WAKEUP_RX) && (flags & XDP_WAKEUP_TX)) {
> > +		/* If both TX and RX need to be woken up check if queue pairs are
> active */
> > +		if (adapter->flags & IGB_FLAG_QUEUE_PAIRS) {
> > +			/* In queue-pair mode, rx_ring and tx_ring share the same
> q_vector,
> > +			 * so a single IRQ trigger will wake both RX and TX
> processing
> > +			 */
> > +			ring = adapter->rx_ring[qid];
> > +		} else {
> > +			/* Two irqs for Rx AND Tx need to be triggered */
> > +			struct napi_struct *rx_napi;
> > +			struct napi_struct *tx_napi;
> > +			bool trigger_irq_tx = false;
> > +			bool trigger_irq_rx = false;
> > +			u32 eics_tx = 0;
> > +			u32 eics_rx = 0;
> > +			/* IRQ trigger preparation for Rx */
> > +			ring = adapter->rx_ring[qid];
> > +			if (!READ_ONCE(ring->xsk_pool))
> > +				return -ENXIO;
> > +			q_vector = ring->q_vector;
> > +			rx_napi  = &q_vector->napi;
> > +			/* Extend the BIT mask for eics */
> > +			eics_rx = ring->q_vector->eims_value;
> > +
> > +			/* IRQ trigger preparation for Tx */
> > +			ring = adapter->tx_ring[qid];
> > +			if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring-
> >flags))
> > +				return -ENETDOWN;
> > +
> > +			if (!READ_ONCE(ring->xsk_pool))
> > +				return -ENXIO;
> > +			q_vector = ring->q_vector;
> > +			tx_napi  = &q_vector->napi;
> > +			/* Extend the BIT mask for eics */
> > +			eics_tx = ring->q_vector->eims_value;
> > +
> > +			/* Check and update napi states for rx and tx */
> > +			if (!napi_if_scheduled_mark_missed(rx_napi)) {
> > +				trigger_irq_rx = true;
> > +				eics |= eics_rx;
> > +			}
> > +			if (!napi_if_scheduled_mark_missed(tx_napi)) {
> > +				trigger_irq_tx = true;
> > +				eics |= eics_tx;
> > +			}
> > +			/* Now we trigger the required irqs for Rx and Tx */
> > +			if ((trigger_irq_rx) || (trigger_irq_tx)) {
> > +				if (adapter->flags & IGB_FLAG_HAS_MSIX) {
> > +					wr32(E1000_EICS, eics);
> > +				} else {
> > +					if ((trigger_irq_rx) && (trigger_irq_tx))
> > +						wr32(E1000_ICS,
> E1000_ICS_RXDMT0 | E1000_ICS_TXDW);
> > +					else if (trigger_irq_rx)
> > +						wr32(E1000_ICS,
> E1000_ICS_RXDMT0);
> > +					else
> > +						wr32(E1000_ICS,
> E1000_ICS_TXDW);
> > +				}
> > +			}
> > +			return 0;
> > +		}
> > +	} else if (flags & XDP_WAKEUP_TX) {
> > +		/* Get the ring params from Tx */
> > +		ring = adapter->tx_ring[qid];
> > +		if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
> > +			return -ENETDOWN;
> > +	} else if (flags & XDP_WAKEUP_RX) {
> > +		/* Get the ring params from Rx */
> > +		ring = adapter->rx_ring[qid];
> > +	} else {
> > +		/* Invalid Flags */
> > +		return -EINVAL;
> > +	}
> 
> This is too complicated IMHO. Wouldn't something like this work:
> - if wakeup rx, pick napi from rx ring's q_vector
>   * napi_if_scheduled_mark_missed() logic that exists in igc_xsk_wakeup()
> 
> repeat for tx; if IGB_FLAG_QUEUE_PAIRS then the branch of second
> napi_if_scheduled_mark_missed() call would not be executed as we had
> previously marked the missed bit in napi state;


Okay. I hope I understood the change you are suggesting. Does the code below reflect what you have in mind?

int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
{
	struct igb_adapter *adapter = netdev_priv(dev);
	struct e1000_hw *hw = &adapter->hw;
	struct igb_ring *ring;
	struct igb_q_vector *q_vector;
	struct napi_struct *rx_napi;
	struct napi_struct *tx_napi;
	bool trigger_irq_tx = false;
	bool trigger_irq_rx = false;
	u32 eics_tx = 0;
	u32 eics_rx = 0;
	u32 eics = 0;

	if (test_bit(__IGB_DOWN, &adapter->state))
		return -ENETDOWN;

	if (!igb_xdp_is_enabled(adapter))
		return -EINVAL;
	/* Check if queue_id is valid. Tx and Rx queue numbers are always same */
	if (qid >= adapter->num_rx_queues)
		return -EINVAL;
	/* Check if flags are valid */
	if (!(flags & XDP_WAKEUP_RX) && !(flags & XDP_WAKEUP_TX))
		return -EINVAL;
	if (flags & XDP_WAKEUP_RX) {
		/* IRQ trigger preparation for Rx */
		ring = adapter->rx_ring[qid];
		if (!READ_ONCE(ring->xsk_pool))
			return -ENXIO;
		q_vector = ring->q_vector;
		rx_napi = &q_vector->napi;
		/* Extend the BIT mask for eics */
		eics_rx = ring->q_vector->eims_value;
		trigger_irq_rx = true;
	}
	if (flags & XDP_WAKEUP_TX) {
		if (adapter->flags & IGB_FLAG_QUEUE_PAIRS) {
			/* In queue-pair mode, rx_ring and tx_ring share the same q_vector,
			* so a single IRQ trigger will wake both RX and TX processing
			*/

		} else {
			/* IRQ trigger preparation for Tx */
			ring = adapter->tx_ring[qid];
			if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
				return -ENETDOWN;

			if (!READ_ONCE(ring->xsk_pool))
				return -ENXIO;
			q_vector = ring->q_vector;
			tx_napi = &q_vector->napi;
			/* Extend the BIT mask for eics */
			eics_tx = ring->q_vector->eims_value;
			trigger_irq_tx = true;
		}
	}
	/* All error checks are finished. Check and update napi states for rx and tx */
	if (trigger_irq_rx) {
		if (!napi_if_scheduled_mark_missed(rx_napi)) {
			eics |= eics_rx;
		}
	}
	if (trigger_irq_tx) {
		if (!napi_if_scheduled_mark_missed(tx_napi)) {
			eics |= eics_tx;
		}
	}
	/* Now we trigger the required irqs for Rx and Tx */
	if ((trigger_irq_rx) || (trigger_irq_tx)) {
		if (adapter->flags & IGB_FLAG_HAS_MSIX) {
			wr32(E1000_EICS, eics);
		} else {
			if ((trigger_irq_rx) && (trigger_irq_tx))
				wr32(E1000_ICS,
				     E1000_ICS_RXDMT0 | E1000_ICS_TXDW);
			else if (trigger_irq_rx)
				wr32(E1000_ICS, E1000_ICS_RXDMT0);
			else
				wr32(E1000_ICS, E1000_ICS_TXDW);
		}
	}
	return 0;
}
> 
> >
> >  	if (!READ_ONCE(ring->xsk_pool))
> >  		return -EINVAL;
> > @@ -551,10 +620,15 @@ int igb_xsk_wakeup(struct net_device *dev, u32 qid,
> u32 flags)
> >  	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
> >  		/* Cause software interrupt */
> >  		if (adapter->flags & IGB_FLAG_HAS_MSIX) {
> > -			eics |= ring->q_vector->eims_value;
> > +			eics = ring->q_vector->eims_value;
> >  			wr32(E1000_EICS, eics);
> >  		} else {
> > -			wr32(E1000_ICS, E1000_ICS_RXDMT0);
> > +			if ((flags & XDP_WAKEUP_RX) && (flags &
> XDP_WAKEUP_TX))
> > +				wr32(E1000_ICS, E1000_ICS_RXDMT0 |
> E1000_ICS_TXDW);
> > +			else if (flags & XDP_WAKEUP_RX)
> > +				wr32(E1000_ICS, E1000_ICS_RXDMT0);
> > +			else
> > +				wr32(E1000_ICS, E1000_ICS_TXDW);
> >  		}
> >  	}
> >
> > --
> > 2.34.1
> >


More information about the Intel-wired-lan mailing list