[Intel-wired-lan] [PATCH net-next v1 5/7] taprio: Add support for txtime offload mode.
Patel, Vedang
vedang.patel at intel.com
Tue Jun 4 20:06:22 UTC 2019
Hi Murali,
> On Jun 3, 2019, at 7:15 AM, Murali Karicheri <m-karicheri2 at ti.com> wrote:
>
> Hi Vedang,
>
> On 05/28/2019 01:46 PM, Vedang Patel wrote:
>> Currently, we are seeing non-critical packets being transmitted outside
>> of their timeslice. We can confirm that the packets are being dequeued
>> at the right time. So, the delay is induced in the hardware side. The
>> most likely reason is the hardware queues are starving the lower
>> priority queues.
>> In order to improve the performance of taprio, we will be making use of the
>> txtime feature provided by the ETF qdisc. For all the packets which do not have
>> the SO_TXTIME option set, taprio will set the transmit timestamp (set in
>> skb->tstamp) in this mode. TAPrio Qdisc will ensure that the transmit time for
>> the packet is set to when the gate is open. If SO_TXTIME is set, the TAPrio
>> qdisc will validate whether the timestamp (in skb->tstamp) occurs when the gate
>> corresponding to skb's traffic class is open.
>> Following is the example configuration for enabling txtime offload:
>> tc qdisc replace dev eth0 parent root handle 100 taprio \\
>> num_tc 3 \\
>> map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
>> queues 1 at 0 1 at 0 1 at 0 \\
>> base-time 1558653424279842568 \\
>> sched-entry S 01 300000 \\
>> sched-entry S 02 300000 \\
>> sched-entry S 04 400000 \\
>> offload 2 \\
>> txtime-delay 40000 \\
>> clockid CLOCK_TAI
>> tc qdisc replace dev $IFACE parent 100:1 etf skip_sock_check \\
>> offload delta 200000 clockid CLOCK_TAI
>> Here, the "offload" parameter is indicating that the TXTIME_OFFLOAD mode is
>> enabled. Also, all the traffic classes are mapped to the same queue. This is
>> only possible in taprio when txtime offload is enabled. Also note that the ETF
>> Qdisc is enabled with offload mode set.
>> In this mode, if the packet's traffic class is open and the complete packet can
>> be transmitted, taprio will try to transmit the packet immediately. This will
>> be done by setting skb->tstamp to current_time + the time delta indicated in
>> the txtime_delay parameter. This parameter indicates the time taken (in
>> software) for packet to reach the network adapter.
>
> In TSN Time aware shaper, packets are sent when gate for a specific
> traffic class is open. So packets that are available in the queues are
> sent by the scheduler. So the ETF is not strictly required for this
> function.
> I understand if the application needs to send packets with
> some latency expectation should use ETF to schedule the packet in sync
> with the next gate open time. So txtime_delay is used to account for
> the delay for packets to travel from user space to nic.
This is not true. As explained in the other email, txtime-delay is the maximum time a packet might take after reaching taprio_enqueue() assuming the gate corresponding to that packet was already open.
> So it is ETF
> that need to inspect the skb->tstamp and allow or if time match or
> discard if late. Is this the case?
>
The role of ETF is just to sort packets according to their transmit timestamp and send them to the hardware queues to transmit whenever their time comes. This is needed because the i210 hardware does not have any sort feature within its queue. If 2 packets arrive and the first packet has a transmit timestamp later than the second packet, the second packet won’t be transmitted on time.
Taprio in the txtime offload mode (btw, this will soon be renamed to txtime-assist) will set the transmit timestamp of each packet (in skb->tstamp) and then send it to ETF so that it can be sorted with the other packets and sent to hardware whenever the time comes. ETF will discard the packet if the transmit time is in the past or before the transmit time of the packet which is already been sent to the hardware.
Let me know if you have more questions about this.
Thanks,
Vedang
> For offload case, the h/w that offload ETF needs to send a trigger to
> software to get packet for transmission ahead of the Gate open event?
>
> Thanks
>
> Murali
>> If the packet cannot be transmitted in the current interval or if the packet's
>> traffic is not currently transmitting, the skb->tstamp is set to the next
>> available timestamp value. This is tracked in the next_launchtime parameter in
>> the struct sched_entry.
>> The behaviour w.r.t admin and oper schedules is not changed from what is
>> present in software mode.
>> The transmit time is already known in advance. So, we do not need the HR timers
>> to advance the schedule and wakeup the dequeue side of taprio. So, HR timer
>> won't be run when this mode is enabled.
>> Signed-off-by: Vedang Patel <vedang.patel at intel.com>
>> ---
>> include/uapi/linux/pkt_sched.h | 1 +
>> net/sched/sch_taprio.c | 326 ++++++++++++++++++++++++++++++---
>> 2 files changed, 306 insertions(+), 21 deletions(-)
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 3319255ffa25..afffda24e055 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -1174,6 +1174,7 @@ enum {
>> TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
>> TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
>> TCA_TAPRIO_ATTR_OFFLOAD_FLAGS, /* u32 */
>> + TCA_TAPRIO_ATTR_TXTIME_DELAY, /* s32 */
>> __TCA_TAPRIO_ATTR_MAX,
>> };
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index 8d87ba099130..1cd19eabc53b 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -21,6 +21,7 @@
>> #include <net/pkt_sched.h>
>> #include <net/pkt_cls.h>
>> #include <net/sch_generic.h>
>> +#include <net/sock.h>
>> static LIST_HEAD(taprio_list);
>> static DEFINE_SPINLOCK(taprio_list_lock);
>> @@ -40,6 +41,7 @@ struct sched_entry {
>> * packet leaves after this time.
>> */
>> ktime_t close_time;
>> + ktime_t next_txtime;
>> atomic_t budget;
>> int index;
>> u32 gate_mask;
>> @@ -76,6 +78,7 @@ struct taprio_sched {
>> struct sk_buff *(*peek)(struct Qdisc *sch);
>> struct hrtimer advance_timer;
>> struct list_head taprio_list;
>> + int txtime_delay;
>> };
>> static ktime_t sched_base_time(const struct sched_gate_list *sched)
>> @@ -116,6 +119,235 @@ static void switch_schedules(struct taprio_sched *q,
>> *admin = NULL;
>> }
>> +/* Get how much time has been already elapsed in the current cycle. */
>> +static inline s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
>> +{
>> + ktime_t time_since_sched_start;
>> + s32 time_elapsed;
>> +
>> + time_since_sched_start = ktime_sub(time, sched->base_time);
>> + div_s64_rem(time_since_sched_start, sched->cycle_time, &time_elapsed);
>> +
>> + return time_elapsed;
>> +}
>> +
>> +static ktime_t get_interval_end_time(struct sched_gate_list *sched,
>> + struct sched_gate_list *admin,
>> + struct sched_entry *entry,
>> + ktime_t intv_start)
>> +{
>> + s32 cycle_elapsed = get_cycle_time_elapsed(sched, intv_start);
>> + ktime_t intv_end, cycle_ext_end, cycle_end;
>> +
>> + cycle_end = ktime_add_ns(intv_start, sched->cycle_time - cycle_elapsed);
>> + intv_end = ktime_add_ns(intv_start, entry->interval);
>> + cycle_ext_end = ktime_add(cycle_end, sched->cycle_time_extension);
>> +
>> + if (ktime_before(intv_end, cycle_end))
>> + return intv_end;
>> + else if (admin && admin != sched &&
>> + ktime_after(admin->base_time, cycle_end) &&
>> + ktime_before(admin->base_time, cycle_ext_end))
>> + return admin->base_time;
>> + else
>> + return cycle_end;
>> +}
>> +
>> +static inline int length_to_duration(struct taprio_sched *q, int len)
>> +{
>> + return (len * atomic64_read(&q->picos_per_byte)) / 1000;
>> +}
>> +
>> +/* Returns the entry corresponding to next available interval. If
>> + * validate_interval is set, it only validates whether the timestamp occurs
>> + * when the gate corresponding to the skb's traffic class is open.
>> + */
>> +static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
>> + struct Qdisc *sch,
>> + struct sched_gate_list *sched,
>> + struct sched_gate_list *admin,
>> + ktime_t time,
>> + ktime_t *interval_start,
>> + ktime_t *interval_end,
>> + bool validate_interval)
>> +{
>> + ktime_t curr_intv_start, curr_intv_end, cycle_end, packet_transmit_time;
>> + ktime_t earliest_txtime = KTIME_MAX, txtime, cycle, transmit_end_time;
>> + struct sched_entry *entry = NULL, *entry_found = NULL;
>> + struct taprio_sched *q = qdisc_priv(sch);
>> + struct net_device *dev = qdisc_dev(sch);
>> + int tc, entry_available = 0, n;
>> + s32 cycle_elapsed;
>> +
>> + tc = netdev_get_prio_tc_map(dev, skb->priority);
>> + packet_transmit_time = length_to_duration(q, qdisc_pkt_len(skb));
>> +
>> + *interval_start = 0;
>> + *interval_end = 0;
>> +
>> + if (!sched)
>> + return NULL;
>> +
>> + cycle = sched->cycle_time;
>> + cycle_elapsed = get_cycle_time_elapsed(sched, time);
>> + curr_intv_end = ktime_sub_ns(time, cycle_elapsed);
>> + cycle_end = ktime_add_ns(curr_intv_end, cycle);
>> +
>> + list_for_each_entry(entry, &sched->entries, list) {
>> + curr_intv_start = curr_intv_end;
>> + curr_intv_end = get_interval_end_time(sched, admin, entry,
>> + curr_intv_start);
>> +
>> + if (ktime_after(curr_intv_start, cycle_end))
>> + break;
>> +
>> + if (!(entry->gate_mask & BIT(tc)) ||
>> + packet_transmit_time > entry->interval)
>> + continue;
>> +
>> + txtime = entry->next_txtime;
>> +
>> + if (ktime_before(txtime, time) || validate_interval) {
>> + transmit_end_time = ktime_add_ns(time, packet_transmit_time);
>> + if ((ktime_before(curr_intv_start, time) &&
>> + ktime_before(transmit_end_time, curr_intv_end)) ||
>> + (ktime_after(curr_intv_start, time) && !validate_interval)) {
>> + entry_found = entry;
>> + *interval_start = curr_intv_start;
>> + *interval_end = curr_intv_end;
>> + break;
>> + } else if (!entry_available && !validate_interval) {
>> + /* Here, we are just trying to find out the
>> + * first available interval in the next cycle.
>> + */
>> + entry_available = 1;
>> + entry_found = entry;
>> + *interval_start = ktime_add_ns(curr_intv_start, cycle);
>> + *interval_end = ktime_add_ns(curr_intv_end, cycle);
>> + }
>> + } else if (ktime_before(txtime, earliest_txtime) &&
>> + !entry_available) {
>> + earliest_txtime = txtime;
>> + entry_found = entry;
>> + n = div_s64(ktime_sub(txtime, curr_intv_start), cycle);
>> + *interval_start = ktime_add(curr_intv_start, n * cycle);
>> + *interval_end = ktime_add(curr_intv_end, n * cycle);
>> + }
>> + }
>> +
>> + return entry_found;
>> +}
>> +
>> +static bool is_valid_interval(struct sk_buff *skb, struct Qdisc *sch)
>> +{
>> + struct taprio_sched *q = qdisc_priv(sch);
>> + struct sched_gate_list *sched, *admin;
>> + ktime_t interval_start, interval_end;
>> + struct sched_entry *entry;
>> +
>> + rcu_read_lock();
>> + sched = rcu_dereference(q->oper_sched);
>> + admin = rcu_dereference(q->admin_sched);
>> +
>> + entry = find_entry_to_transmit(skb, sch, sched, admin, skb->tstamp,
>> + &interval_start, &interval_end, true);
>> + rcu_read_unlock();
>> +
>> + return entry;
>> +}
>> +
>> +static inline ktime_t get_cycle_start(struct sched_gate_list *sched,
>> + ktime_t time)
>> +{
>> + ktime_t cycle_elapsed;
>> +
>> + cycle_elapsed = get_cycle_time_elapsed(sched, time);
>> +
>> + return ktime_sub(time, cycle_elapsed);
>> +}
>> +
>> +/* There are a few scenarios where we will have to modify the txtime from
>> + * what is read from next_txtime in sched_entry. They are:
>> + * 1. If txtime is in the past,
>> + * a. The gate for the traffic class is currently open and packet can be
>> + * transmitted before it closes, schedule the packet right away.
>> + * b. If the gate corresponding to the traffic class is going to open later
>> + * in the cycle, set the txtime of packet to the interval start.
>> + * 2. If txtime is in the future, there are packets corresponding to the
>> + * current traffic class waiting to be transmitted. So, the following
>> + * possibilities exist:
>> + * a. We can transmit the packet before the window containing the txtime
>> + * closes.
>> + * b. The window might close before the transmission can be completed
>> + * successfully. So, schedule the packet in the next open window.
>> + */
>> +static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
>> +{
>> + ktime_t transmit_end_time, interval_end, interval_start;
>> + int len, packet_transmit_time, sched_changed;
>> + struct taprio_sched *q = qdisc_priv(sch);
>> + ktime_t minimum_time, now, txtime;
>> + struct sched_gate_list *sched, *admin;
>> + struct sched_entry *entry;
>> +
>> + now = q->get_time();
>> + minimum_time = ktime_add_ns(now, q->txtime_delay);
>> +
>> + rcu_read_lock();
>> + admin = rcu_dereference(q->admin_sched);
>> + sched = rcu_dereference(q->oper_sched);
>> + if (admin && ktime_after(minimum_time, admin->base_time))
>> + switch_schedules(q, &admin, &sched);
>> +
>> + /* Until the schedule starts, all the queues are open */
>> + if (!sched || ktime_before(minimum_time, sched->base_time)) {
>> + txtime = minimum_time;
>> + goto done;
>> + }
>> +
>> + len = qdisc_pkt_len(skb);
>> + packet_transmit_time = length_to_duration(q, len);
>> +
>> + do {
>> + sched_changed = 0;
>> +
>> + entry = find_entry_to_transmit(skb, sch, sched, admin,
>> + minimum_time,
>> + &interval_start, &interval_end,
>> + false);
>> + if (!entry) {
>> + txtime = 0;
>> + goto done;
>> + }
>> +
>> + txtime = entry->next_txtime;
>> + txtime = max_t(ktime_t, txtime, minimum_time);
>> + txtime = max_t(ktime_t, txtime, interval_start);
>> +
>> + if (admin && admin != sched &&
>> + ktime_after(txtime, admin->base_time)) {
>> + sched = admin;
>> + sched_changed = 1;
>> + continue;
>> + }
>> +
>> + transmit_end_time = ktime_add(txtime, packet_transmit_time);
>> + minimum_time = transmit_end_time;
>> +
>> + /* Update the txtime of current entry to the next time it's
>> + * interval starts.
>> + */
>> + if (ktime_after(transmit_end_time, interval_end))
>> + entry->next_txtime = ktime_add(interval_start, sched->cycle_time);
>> + } while (sched_changed || ktime_after(transmit_end_time, interval_end));
>> +
>> + entry->next_txtime = transmit_end_time;
>> +
>> +done:
>> + rcu_read_unlock();
>> + return txtime;
>> +}
>> +
>> static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>> struct sk_buff **to_free)
>> {
>> @@ -129,6 +361,15 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>> if (unlikely(!child))
>> return qdisc_drop(skb, sch, to_free);
>> + if (skb->sk && sock_flag(skb->sk, SOCK_TXTIME)) {
>> + if (!is_valid_interval(skb, sch))
>> + return qdisc_drop(skb, sch, to_free);
>> + } else if (TXTIME_OFFLOAD_IS_ON(q->offload_flags)) {
>> + skb->tstamp = get_packet_txtime(skb, sch);
>> + if (!skb->tstamp)
>> + return qdisc_drop(skb, sch, to_free);
>> + }
>> +
>> qdisc_qstats_backlog_inc(sch, skb);
>> sch->q.qlen++;
>> @@ -206,11 +447,6 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch)
>> return q->peek(sch);
>> }
>> -static inline int length_to_duration(struct taprio_sched *q, int len)
>> -{
>> - return div_u64(len * atomic64_read(&q->picos_per_byte), 1000);
>> -}
>> -
>> static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry)
>> {
>> atomic_set(&entry->budget,
>> @@ -594,7 +830,8 @@ static int parse_taprio_schedule(struct nlattr **tb,
>> static int taprio_parse_mqprio_opt(struct net_device *dev,
>> struct tc_mqprio_qopt *qopt,
>> - struct netlink_ext_ack *extack)
>> + struct netlink_ext_ack *extack,
>> + u32 offload_flags)
>> {
>> int i, j;
>> @@ -642,6 +879,9 @@ static int taprio_parse_mqprio_opt(struct net_device *dev,
>> return -EINVAL;
>> }
>> + if (TXTIME_OFFLOAD_IS_ON(offload_flags))
>> + continue;
>> +
>> /* Verify that the offset and counts do not overlap */
>> for (j = i + 1; j < qopt->num_tc; j++) {
>> if (last > qopt->offset[j]) {
>> @@ -804,6 +1044,9 @@ static int taprio_enable_offload(struct net_device *dev,
>> const struct net_device_ops *ops = dev->netdev_ops;
>> int err = 0;
>> + if (TXTIME_OFFLOAD_IS_ON(offload_flags))
>> + goto done;
>> +
>> if (!FULL_OFFLOAD_IS_ON(offload_flags)) {
>> NL_SET_ERR_MSG(extack, "Offload mode is not supported");
>> return -EOPNOTSUPP;
>> @@ -816,15 +1059,28 @@ static int taprio_enable_offload(struct net_device *dev,
>> /* FIXME: enable offloading */
>> - q->dequeue = taprio_dequeue_offload;
>> - q->peek = taprio_peek_offload;
>> -
>> - if (err == 0)
>> +done:
>> + if (err == 0) {
>> + q->dequeue = taprio_dequeue_offload;
>> + q->peek = taprio_peek_offload;
>> q->offload_flags = offload_flags;
>> + }
>> return err;
>> }
>> +static void setup_txtime(struct taprio_sched *q,
>> + struct sched_gate_list *sched, ktime_t base)
>> +{
>> + struct sched_entry *entry;
>> + u32 interval = 0;
>> +
>> + list_for_each_entry(entry, &sched->entries, list) {
>> + entry->next_txtime = ktime_add_ns(base, interval);
>> + interval += entry->interval;
>> + }
>> +}
>> +
>> static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>> struct netlink_ext_ack *extack)
>> {
>> @@ -846,7 +1102,10 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>> if (tb[TCA_TAPRIO_ATTR_PRIOMAP])
>> mqprio = nla_data(tb[TCA_TAPRIO_ATTR_PRIOMAP]);
>> - err = taprio_parse_mqprio_opt(dev, mqprio, extack);
>> + if (tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS])
>> + offload_flags = nla_get_u32(tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS]);
>> +
>> + err = taprio_parse_mqprio_opt(dev, mqprio, extack, offload_flags);
>> if (err < 0)
>> return err;
>> @@ -887,6 +1146,10 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>> goto free_sched;
>> }
>> + /* preserve offload flags when changing the schedule. */
>> + if (q->offload_flags)
>> + offload_flags = q->offload_flags;
>> +
>> if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
>> clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
>> @@ -914,7 +1177,10 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>> /* Protects against enqueue()/dequeue() */
>> spin_lock_bh(qdisc_lock(sch));
>> - if (!hrtimer_active(&q->advance_timer)) {
>> + if (tb[TCA_TAPRIO_ATTR_TXTIME_DELAY])
>> + q->txtime_delay = nla_get_s32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
>> +
>> + if (!TXTIME_OFFLOAD_IS_ON(offload_flags) && !hrtimer_active(&q->advance_timer)) {
>> hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
>> q->advance_timer.function = advance_sched;
>> }
>> @@ -966,20 +1232,35 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
>> goto unlock;
>> }
>> - setup_first_close_time(q, new_admin, start);
>> + if (TXTIME_OFFLOAD_IS_ON(offload_flags)) {
>> + setup_txtime(q, new_admin, start);
>> +
>> + if (!oper) {
>> + rcu_assign_pointer(q->oper_sched, new_admin);
>> + err = 0;
>> + new_admin = NULL;
>> + goto unlock;
>> + }
>> +
>> + rcu_assign_pointer(q->admin_sched, new_admin);
>> + if (admin)
>> + call_rcu(&admin->rcu, taprio_free_sched_cb);
>> + } else {
>> + setup_first_close_time(q, new_admin, start);
>> - /* Protects against advance_sched() */
>> - spin_lock_irqsave(&q->current_entry_lock, flags);
>> + /* Protects against advance_sched() */
>> + spin_lock_irqsave(&q->current_entry_lock, flags);
>> - taprio_start_sched(sch, start, new_admin);
>> + taprio_start_sched(sch, start, new_admin);
>> - rcu_assign_pointer(q->admin_sched, new_admin);
>> - if (admin)
>> - call_rcu(&admin->rcu, taprio_free_sched_cb);
>> - new_admin = NULL;
>> + rcu_assign_pointer(q->admin_sched, new_admin);
>> + if (admin)
>> + call_rcu(&admin->rcu, taprio_free_sched_cb);
>> - spin_unlock_irqrestore(&q->current_entry_lock, flags);
>> + spin_unlock_irqrestore(&q->current_entry_lock, flags);
>> + }
>> + new_admin = NULL;
>> err = 0;
>> unlock:
>> @@ -1225,6 +1506,9 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
>> if (nla_put_u32(skb, TCA_TAPRIO_ATTR_OFFLOAD_FLAGS, q->offload_flags))
>> goto options_error;
>> + if (nla_put_s32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
>> + goto options_error;
>> +
>> if (oper && dump_schedule(skb, oper))
>> goto options_error;
>>
>
More information about the Intel-wired-lan
mailing list