1. 22 12月, 2017 3 次提交
  2. 09 12月, 2017 1 次提交
  3. 08 11月, 2017 1 次提交
  4. 28 10月, 2017 1 次提交
  5. 19 10月, 2017 1 次提交
  6. 17 10月, 2017 1 次提交
  7. 14 10月, 2017 1 次提交
    • A
      mqprio: Introduce new hardware offload mode and shaper in mqprio · 4e8b86c0
      Amritha Nambiar 提交于
      The offload types currently supported in mqprio are 0 (no offload) and
      1 (offload only TCs) by setting these values for the 'hw' option. If
      offloads are supported by setting the 'hw' option to 1, the default
      offload mode is 'dcb' where only the TC values are offloaded to the
      device. This patch introduces a new hardware offload mode called
      'channel' with 'hw' set to 1 in mqprio which makes full use of the
      mqprio options, the TCs, the queue configurations and the QoS parameters
      for the TCs. This is achieved through a new netlink attribute for the
      'mode' option which takes values such as 'dcb' (default) and 'channel'.
      The 'channel' mode also supports QoS attributes for traffic class such as
      minimum and maximum values for bandwidth rate limits.
      
      This patch enables configuring additional HW shaper attributes associated
      with a traffic class. Currently the shaper for bandwidth rate limiting is
      supported which takes options such as minimum and maximum bandwidth rates
      and are offloaded to the hardware in the 'channel' mode. The min and max
      limits for bandwidth rates are provided by the user along with the TCs
      and the queue configurations when creating the mqprio qdisc. The interface
      can be extended to support new HW shapers in future through the 'shaper'
      attribute.
      
      Introduces a new data structure 'tc_mqprio_qopt_offload' for offloading
      mqprio queue options and use this to be shared between the kernel and
      device driver. This contains a copy of the existing data structure
      for mqprio queue options. This new data structure can be extended when
      adding new attributes for traffic class such as mode, shaper, shaper
      parameters (bandwidth rate limits). The existing data structure for mqprio
      queue options will be shared between the kernel and userspace.
      
      Example:
        queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
        min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit
      
      To dump the bandwidth rates:
      
      qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
                   queues:(0:3) (4:7)
                   mode:channel
                   shaper:bw_rlimit   min_rate:1Gbit 2Gbit   max_rate:4Gbit 5Gbit
      Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      4e8b86c0
  8. 26 8月, 2017 1 次提交
    • W
      net_sched: remove tc class reference counting · 143976ce
      WANG Cong 提交于
      For TC classes, their ->get() and ->put() are always paired, and the
      reference counting is completely useless, because:
      
      1) For class modification and dumping paths, we already hold RTNL lock,
         so all of these ->get(),->change(),->put() are atomic.
      
      2) For filter bindiing/unbinding, we use other reference counter than
         this one, and they should have RTNL lock too.
      
      3) For ->qlen_notify(), it is special because it is called on ->enqueue()
         path, but we already hold qdisc tree lock there, and we hold this
         tree lock when graft or delete the class too, so it should not be gone
         or changed until we release the tree lock.
      
      Therefore, this patch removes ->get() and ->put(), but:
      
      1) Adds a new ->find() to find the pointer to a class by classid, no
         refcnt.
      
      2) Move the original class destroy upon the last refcnt into ->delete(),
         right after releasing tree lock. This is fine because the class is
         already removed from hash when holding the lock.
      
      For those who also use ->put() as ->unbind(), just rename them to reflect
      this change.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      143976ce
  9. 08 8月, 2017 3 次提交
  10. 08 6月, 2017 1 次提交
  11. 16 3月, 2017 2 次提交
    • A
      mqprio: Modify mqprio to pass user parameters via ndo_setup_tc. · 56f36acd
      Amritha Nambiar 提交于
      The configurable priority to traffic class mapping and the user specified
      queue ranges are used to configure the traffic class, overriding the
      hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
      option is non-zero, the hardware QOS defaults are used.
      
      This patch makes it so that we can pass the data the user provided to
      ndo_setup_tc. This allows us to pull in the queue configuration if the
      user requested it as well as any additional hardware offload type
      requested by using a value other than 1 for the hw value.
      
      Finally it also provides a means for the device driver to return the level
      supported for the offload type via the qopt->hw value. Previously we were
      just always assuming the value to be 1, in the future values beyond just 1
      may be supported.
      Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56f36acd
    • A
      mqprio: Change handling of hw u8 to allow for multiple hardware offload modes · 2026fecf
      Alexander Duyck 提交于
      This patch is meant to allow for support of multiple hardware offload type
      for a single device. There is currently no bounds checking for the hw
      member of the mqprio_qopt structure.  This results in us being able to pass
      values from 1 to 255 with all being treated the same.  On retreiving the
      value it is returned as 1 for anything 1 or greater being set.
      
      With this change we are currently adding limited bounds checking by
      defining an enum and using those values to limit the reported hardware
      offloads.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2026fecf
  12. 13 3月, 2017 1 次提交
  13. 12 2月, 2017 1 次提交
    • E
      net_sched: fix error recovery at qdisc creation · 87b60cfa
      Eric Dumazet 提交于
      Dmitry reported uses after free in qdisc code [1]
      
      The problem here is that ops->init() can return an error.
      
      qdisc_create_dflt() then call ops->destroy(),
      while qdisc_create() does _not_ call it.
      
      Four qdisc chose to call their own ops->destroy(), assuming their caller
      would not.
      
      This patch makes sure qdisc_create() calls ops->destroy()
      and fixes the four qdisc to avoid double free.
      
      [1]
      BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440
      Read of size 8 by task syz-executor2/5030
      CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
       ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
       ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
      Call Trace:
       [<ffffffff81bbbed4>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81bbbed4>] dump_stack+0x6c/0x98 lib/dump_stack.c:51
       [<ffffffff816682b1>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
       [<ffffffff81668524>] print_address_description mm/kasan/report.c:196 [inline]
       [<ffffffff81668524>] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
       [<ffffffff81668953>] kasan_report mm/kasan/report.c:305 [inline]
       [<ffffffff81668953>] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
       [<ffffffff82527b02>] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
       [<ffffffff82524bdd>] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
       [<ffffffff82524e30>] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
       [<ffffffff8252550d>] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
       [<ffffffff8252550d>] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
       [<ffffffff824b1db1>] __dev_open+0x221/0x320 net/core/dev.c:1403
       [<ffffffff824b24ce>] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
       [<ffffffff824b27de>] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
       [<ffffffff824f5bf6>] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
       [<ffffffff824f61fa>] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
       [<ffffffff82430509>] sock_do_ioctl+0x99/0xb0 net/socket.c:879
       [<ffffffff82430d30>] sock_ioctl+0x2a0/0x390 net/socket.c:958
       [<ffffffff816f3b68>] vfs_ioctl fs/ioctl.c:44 [inline]
       [<ffffffff816f3b68>] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
       [<ffffffff816f41a4>] SYSC_ioctl fs/ioctl.c:626 [inline]
       [<ffffffff816f41a4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
       [<ffffffff8123e357>] entry_SYSCALL_64_fastpath+0x12/0x17
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87b60cfa
  14. 11 8月, 2016 1 次提交
  15. 08 6月, 2016 1 次提交
    • E
      net: sched: do not acquire qdisc spinlock in qdisc/class stats dump · edb09eb1
      Eric Dumazet 提交于
      Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
      agent [1] are problematic at scale :
      
      For each qdisc/class found in the dump, we currently lock the root qdisc
      spinlock in order to get stats. Sampling stats every 5 seconds from
      thousands of HTB classes is a challenge when the root qdisc spinlock is
      under high pressure. Not only the dumps take time, they also slow
      down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.
      
      An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
      that might need the qdisc lock in fq_codel_dump_stats() and
      fq_codel_dump_class_stats()
      
      In v2 of this patch, I now use the Qdisc running seqcount to provide
      consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
      
      I also changed rate estimators to use the same infrastructure
      so that they no longer need to lock root qdisc lock.
      
      [1]
      http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdfSigned-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Kevin Athey <kda@google.com>
      Cc: Xiaotian Pei <xiaotian@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      edb09eb1
  16. 04 3月, 2016 1 次提交
  17. 02 3月, 2016 1 次提交
    • D
      sch_mqprio: Fix build with older gcc. · 241deec9
      David S. Miller 提交于
        CC [M]  net/sched/sch_mqprio.o
      net/sched/sch_mqprio.c: In function ?mqprio_init?:
      net/sched/sch_mqprio.c:145: error: unknown field ?tc? specified in initializer
      net/sched/sch_mqprio.c:145: warning: missing braces around initializer
      net/sched/sch_mqprio.c:145: warning: (near initialization for ?tc.<anonymous>?)
      make[2]: *** [net/sched/sch_mqprio.o] Error 1
      make[1]: *** [net/sched] Error 2
      make: *** [net] Error 2
      
      Several people reported this, surround the unnamed union
      member initialization with braces to fix.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      241deec9
  18. 17 2月, 2016 2 次提交
  19. 04 12月, 2015 1 次提交
    • E
      net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84
      Eric Dumazet 提交于
      qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
      devices.
      
      One problem is that it updates sch->q.qlen and sch->qstats.drops
      on the mq/mqprio root qdisc, while it should not : Daniele
      reported underflows errors :
      [  681.774821] PAX: sch->q.qlen: 0 n: 1
      [  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
      [  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
      [  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
      [  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
      [  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
      [  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
      [  681.774962] Call Trace:
      [  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
      [  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
      [  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
      [  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
      [  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
      [  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
      [  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
      [  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
      [  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
      [  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
      [  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
      [  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
      [  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
      [  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
      
      mq/mqprio have their own ways to report qlen/drops by folding stats on
      all their queues, with appropriate locking.
      
      A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
      without proper locking : concurrent qdisc updates could corrupt the list
      that qdisc_match_from_root() parses to find a qdisc given its handle.
      
      Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
      qdisc_tree_decrease_qlen() can use to abort its tree traversal,
      as soon as it meets a mq/mqprio qdisc children.
      
      Second problem can be fixed by RCU protection.
      Qdisc are already freed after RCU grace period, so qdisc_list_add() and
      qdisc_list_del() simply have to use appropriate rcu list variants.
      
      A future patch will add a per struct netdev_queue list anchor, so that
      qdisc_tree_decrease_qlen() can have more efficient lookups.
      Reported-by: NDaniele Fucini <dfucini@gmail.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4eaf3b84
  20. 30 9月, 2014 3 次提交
  21. 14 9月, 2014 2 次提交
  22. 10 12月, 2013 1 次提交
  23. 31 8月, 2013 1 次提交
    • S
      qdisc: allow setting default queuing discipline · 6da7c8fc
      stephen hemminger 提交于
      By default, the pfifo_fast queue discipline has been used by default
      for all devices. But we have better choices now.
      
      This patch allow setting the default queueing discipline with sysctl.
      This allows easy use of better queueing disciplines on all devices
      without having to use tc qdisc scripts. It is intended to allow
      an easy path for distributions to make fq_codel or sfq the default
      qdisc.
      
      This patch also makes pfifo_fast more of a first class qdisc, since
      it is now possible to manually override the default and explicitly
      use pfifo_fast. The behavior for systems who do not use the sysctl
      is unchanged, they still get pfifo_fast
      
      Also removes leftover random # in sysctl net core.
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6da7c8fc
  24. 12 12月, 2012 1 次提交
    • E
      pkt_sched: avoid requeues if possible · 1abbe139
      Eric Dumazet 提交于
      With BQL being deployed, we can more likely have following behavior :
      
      We dequeue a packet from qdisc in dequeue_skb(), then we realize target
      tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
      skb into gso_skb for later.
      
      This shows in stats (tc -s qdisc dev eth0) as requeues.
      
      Problem of these requeues is that high priority packets can not be
      dequeued as long as this (possibly low prio and big TSO packet) is not
      removed from gso_skb.
      
      At 1Gbps speed, a full size TSO packet is 500 us of extra latency.
      
      In some cases, we know that all packets dequeued from a qdisc are
      for a particular and known txq :
      
      - If device is non multi queue
      - For all MQ/MQPRIO slave qdiscs
      
      This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
      this capability, so that dequeue_skb() is allowed to dequeue a packet
      only if the associated txq is not stopped.
      
      This indeed reduce latencies for high prio packets (or improve fairness
      with sfq/fq_codel), and almost remove qdisc 'requeues'.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1abbe139
  25. 02 4月, 2012 1 次提交
  26. 23 12月, 2011 1 次提交
  27. 01 11月, 2011 1 次提交
  28. 24 2月, 2011 1 次提交
  29. 15 2月, 2011 1 次提交
  30. 27 1月, 2011 1 次提交
  31. 22 1月, 2011 1 次提交
    • E
      net_sched: TCQ_F_CAN_BYPASS generalization · 23624935
      Eric Dumazet 提交于
      Now qdisc stab is handled before TCQ_F_CAN_BYPASS test in
      __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to other qdiscs
      than pfifo_fast : pfifo, bfifo, pfifo_head_drop and sfq
      
      SFQ is special because it can have external classifiers, and in these
      cases, we cannot bypass queue discipline (packet could be dropped by
      classifier) without admin asking it, or further changes.
      
      Its worth doing this, especially for SFQ, avoiding dirtying memory in
      case no packets are already waiting in queue.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23624935