1. 20 Sep, 2016 1 commit
  2. 19 Aug, 2016 2 commits
    • net: sched: avoid duplicates in qdisc dump · ea327469
      Jiri Kosina committed
      tc_dump_qdisc() dumps the per-device qdiscs in two phases: first, the
      "standard" dev->qdisc is dumped; second, any ingress queue(s) are dumped
      as well.
      
      After the conversion of the netdevice's qdisc linked list into a hashtable,
      these two sets are no longer disjoint, but are both "reachable" directly
      from the netdevice's hashtable. As a consequence, the
      "full-depth" dump of the ingress qdiscs results in immediately hitting the
      netdevice hashtable again, and duplicating the dump that has already been
      performed for dev->qdisc.
      What in fact needs to be dumped in case of ingress queue is "just" the
      top-level ingress qdisc, as everything else has been dumped already.
      
      Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed
      whether it should (while performing the "full" per-netdev qdisc dump) perform
      the whole recursion, or just dump "additional" top-level (ingress) qdiscs
      without performing any kind of recursion.
      
      This fixes duplicate dumps such as
      
      	qdisc mq 0: root
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc clsact ffff: parent ffff:fff1
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
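
The recursion switch described above can be sketched in user space. This is only an illustration under stated assumptions: `toy_qdisc`, `toy_dev`, `toy_dump_root()` and `toy_dump_dev()` are hypothetical names, not kernel symbols, and the per-device hashtable is modelled as a flat array that holds only the child qdiscs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct toy_qdisc {
	const char *kind;
};

struct toy_dev {
	struct toy_qdisc *root;		/* plays the role of dev->qdisc */
	struct toy_qdisc *ingress;	/* top-level ingress qdisc, may be NULL */
	struct toy_qdisc *hash[8];	/* plays the role of the per-device hashtable */
	size_t nhash;
};

/*
 * Emit `root`, and only when `recur` is true also emit every qdisc that is
 * reachable through the device table. Returns the new output position.
 */
size_t toy_dump_root(const struct toy_dev *dev, const struct toy_qdisc *root,
		     bool recur, const char **out, size_t pos)
{
	size_t i;

	out[pos++] = root->kind;
	if (!recur)
		return pos;
	for (i = 0; i < dev->nhash; i++)
		out[pos++] = dev->hash[i]->kind;
	return pos;
}

/*
 * Two-phase dump. `recur_ingress` selects the old behaviour (true, which
 * walks the device table a second time) or the fixed one (false).
 */
size_t toy_dump_dev(const struct toy_dev *dev, bool recur_ingress,
		    const char **out)
{
	size_t pos = toy_dump_root(dev, dev->root, true, out, 0);

	if (dev->ingress)
		pos = toy_dump_root(dev, dev->ingress, recur_ingress, out, pos);
	return pos;
}
```

With one mq root, four pfifo_fast children and one clsact ingress qdisc, the recursive second phase yields ten entries (the duplicated listing shown above), while the non-recursive second phase yields six.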
    • net: sched: fix handling of singleton qdiscs with qdisc_hash · 69012ae4
      Jiri Kosina committed
      qdisc_match_from_root() now iterates over the per-netdevice qdisc
      hashtable instead of walking a linked list of qdiscs (independently
      of the actual underlying netdev), which was the case before the switch
      to a hashtable for qdiscs.
      
      For singleton qdiscs, there is no underlying netdev associated though, and
      therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will
      always be NULL.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
       IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       PGD 1aceba067 PUD 1aceb7067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
      [ ... ]
       task: ffff8801ec996e00 task.stack: ffff8801ec934000
       RIP: 0010:[<ffffffff8167efac>]  [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       RSP: 0018:ffff8801ec937ab0  EFLAGS: 00010203
       RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
       RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
       RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
       R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
       R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
       FS:  00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
       Stack:
        ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
        ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
        ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
       Call Trace:
        [<ffffffff81681210>] qdisc_lookup+0x40/0x50
        [<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
        [<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
        [<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
        [<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
        [<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
        [<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
        [<ffffffff8168919b>] netlink_unicast+0x15b/0x210
        [<ffffffff81689569>] netlink_sendmsg+0x319/0x390
        [<ffffffff816379f8>] sock_sendmsg+0x38/0x50
        [<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
        [<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
        [<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
        [<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
        [<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
        [<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
        [<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
        [<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
        [<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
        [<ffffffff810038e7>] do_syscall_64+0x57/0xb0
      
      Fix this by special-casing singleton qdiscs (those that don't have an
      underlying netdevice) and handling them immediately rather than
      trying to go over an underlying netdevice. We're in the same
      situation in tc_dump_qdisc_root() and tc_dump_tclass_root().
      
      Ultimately, this will have to be slightly reworked so that we are actually
      able to show singleton qdiscs (noop) in the dump properly; but we're not
      currently doing that anyway, so no regression there, and better do this in
      a gradual manner.
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Reported-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
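
A minimal user-space sketch of the special case, assuming a singleton is simply a qdisc whose device pointer is NULL. The types and `toy_match_from_root()` are hypothetical stand-ins written for illustration, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_dev;

/* Hypothetical stand-in for a qdisc; dev is NULL for a singleton (noop). */
struct toy_qdisc {
	uint32_t handle;
	struct toy_dev *dev;
};

/* Hypothetical stand-in for the per-netdevice qdisc table. */
struct toy_dev {
	struct toy_qdisc *hash[8];
	size_t nhash;
};

struct toy_qdisc *toy_match_from_root(struct toy_qdisc *root, uint32_t handle)
{
	size_t i;

	if (!root)
		return NULL;
	/* Singleton: there is no device table to walk, so decide right here. */
	if (!root->dev)
		return root->handle == handle ? root : NULL;
	if (root->handle == handle)
		return root;
	for (i = 0; i < root->dev->nhash; i++)
		if (root->dev->hash[i]->handle == handle)
			return root->dev->hash[i];
	return NULL;
}
```

Without the early return, the lookup would dereference the NULL device pointer, which corresponds to the crash in the trace above.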
  3. 11 Aug, 2016 1 commit
  4. 13 Jun, 2016 1 commit
  5. 11 Jun, 2016 1 commit
    • net_sched: remove generic throttled management · 45f50bed
      Eric Dumazet committed
      __QDISC_STATE_THROTTLED bit manipulation is rather expensive
      for HTB and a few others.
      
      I already removed it for sch_fq in commit f2600cf0
      ("net: sched: avoid costly atomic operation in fq_dequeue()")
      and so far nobody complained.
      
      When one or more packets are stuck in one or more throttled
      HTB classes, an htb dequeue() performs two atomic operations
      to clear/set the __QDISC_STATE_THROTTLED bit, while the root qdisc
      lock is held.
      
      Removing this pair of atomic operations brings me an 8 % performance
      increase on 200 TCP_RR tests, in the presence of throttled classes.
      
      This patch has no side effect, since nothing actually uses
      qdisc_is_throttled() anymore.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 08 Jun, 2016 1 commit
    • net: sched: do not acquire qdisc spinlock in qdisc/class stats dump · edb09eb1
      Eric Dumazet committed
      Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
      agent [1] are problematic at scale :
      
      For each qdisc/class found in the dump, we currently lock the root qdisc
      spinlock in order to get stats. Sampling stats every 5 seconds from
      thousands of HTB classes is a challenge when the root qdisc spinlock is
      under high pressure. Not only do the dumps take time, they also slow
      down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.
      
      An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
      that might need the qdisc lock in fq_codel_dump_stats() and
      fq_codel_dump_class_stats().
      
      In v2 of this patch, I now use the Qdisc running seqcount to provide
      consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
      
      I also changed rate estimators to use the same infrastructure
      so that they no longer need to take the root qdisc lock.
      
      [1]
      http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Kevin Athey <kda@google.com>
      Cc: Xiaotian Pei <xiaotian@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
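
The idea of reading counters consistently under a sequence count, without taking a lock, can be sketched with C11 atomics. This is a generic seqcount-style retry loop written for illustration; `toy_stats` and its helpers are hypothetical names, the sketch assumes a single writer at a time, and it is not the kernel's seqcount API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counters guarded by a sequence count. */
struct toy_stats {
	atomic_uint seq;		/* odd while the writer is mid-update */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

void toy_stats_init(struct toy_stats *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->bytes, 0);
	atomic_init(&s->packets, 0);
}

/* Writer side; the sketch assumes only one writer runs at a time. */
void toy_stats_update(struct toy_stats *s, uint64_t len)
{
	atomic_fetch_add(&s->seq, 1);		/* becomes odd */
	atomic_fetch_add(&s->bytes, len);
	atomic_fetch_add(&s->packets, 1);
	atomic_fetch_add(&s->seq, 1);		/* becomes even again */
}

/* Reader side: no lock taken; retry if a writer was active meanwhile. */
void toy_stats_read(struct toy_stats *s, uint64_t *bytes, uint64_t *packets)
{
	unsigned int start;

	do {
		start = atomic_load(&s->seq);
		*bytes = atomic_load(&s->bytes);
		*packets = atomic_load(&s->packets);
	} while ((start & 1) || start != atomic_load(&s->seq));
}
```

The reader never blocks the writer. It only repeats its own work when it observes an odd or changed sequence number, which is why a stats dump built this way need not slow down the fast path.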
  7. 25 May, 2016 1 commit
    • net_sched: avoid too many hrtimer_start() calls · a9efad8b
      Eric Dumazet committed
      I found a serious performance bug in packet schedulers using hrtimers.
      
      sch_htb and sch_fq are definitely impacted by this problem.
      
      We constantly rearm high resolution timers if some packets are throttled
      in one (or more) class, and other packets are flying through qdisc on
      another (non throttled) class.
      
      hrtimer_start() does not have the mod_timer() trick of doing nothing if
      expires value does not change :
      
      	if (timer_pending(timer) &&
      	    timer->expires == expires)
      		return 1;
      
      This issue is particularly visible when multiple cpus can queue/dequeue
      packets on the same qdisc, as hrtimer code has to lock a remote base.
      
      I used the following fix :
      
      1) Change htb to use qdisc_watchdog_schedule_ns() instead of open-coding
      it.
      
      2) Cache watchdog prior expiration. hrtimer might provide this, but I
      prefer to not rely on some hrtimer internal.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
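
Point 2 above (caching the previous expiration) can be sketched as follows. `toy_watchdog` and `toy_watchdog_schedule_ns()` are hypothetical stand-ins, and the counter merely simulates how often a real timer would be rearmed; timer expiry and cancellation are not modelled.

```c
#include <stdint.h>

/* Hypothetical watchdog that remembers the last requested expiry. */
struct toy_watchdog {
	uint64_t last_expires;	/* expiry requested by the previous arm */
	unsigned int arm_calls;	/* counts simulated hrtimer_start() calls */
};

void toy_watchdog_schedule_ns(struct toy_watchdog *wd, uint64_t expires)
{
	/* Same deadline as last time: the timer is already set up, do nothing. */
	if (wd->last_expires == expires)
		return;

	wd->last_expires = expires;
	wd->arm_calls++;	/* a real implementation would rearm the timer here */
}
```

Three requests for the same deadline arm the simulated timer once instead of three times. Note that in this sketch a zero-initialised watchdog treats an expiry of 0 as already armed.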
  8. 27 Apr, 2016 1 commit
  9. 01 Mar, 2016 1 commit
  10. 19 Feb, 2016 2 commits
  11. 16 Dec, 2015 1 commit
  12. 04 Dec, 2015 1 commit
    • net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84
      Eric Dumazet committed
      qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
      devices.
      
      One problem is that it updates sch->q.qlen and sch->qstats.drops
      on the mq/mqprio root qdisc, while it should not : Daniele
      reported underflow errors :
      [  681.774821] PAX: sch->q.qlen: 0 n: 1
      [  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
      [  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
      [  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
      [  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
      [  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
      [  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
      [  681.774962] Call Trace:
      [  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
      [  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
      [  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
      [  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
      [  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
      [  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
      [  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
      [  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
      [  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
      [  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
      [  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
      [  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
      [  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
      [  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
      
      mq/mqprio have their own ways to report qlen/drops by folding stats on
      all their queues, with appropriate locking.
      
      A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
      without proper locking : concurrent qdisc updates could corrupt the list
      that qdisc_match_from_root() parses to find a qdisc given its handle.
      
      Fix the first problem by adding a TCQ_F_NOPARENT qdisc flag that
      qdisc_tree_decrease_qlen() can use to abort its tree traversal
      as soon as it meets a child of an mq/mqprio qdisc.
      
      The second problem can be fixed by RCU protection.
      Qdiscs are already freed after an RCU grace period, so qdisc_list_add() and
      qdisc_list_del() simply have to use the appropriate rcu list variants.
      
      A future patch will add a per struct netdev_queue list anchor, so that
      qdisc_tree_decrease_qlen() can have more efficient lookups.
      Reported-by: Daniele Fucini <dfucini@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
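
The effect of the flag on the upward traversal can be sketched in user space. Assumptions: parents are plain pointers rather than handles resolved through a lookup, and `TOY_F_NOPARENT`, `toy_qdisc` and `toy_tree_decrease_qlen()` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define TOY_F_NOPARENT 0x1u	/* hypothetical counterpart of TCQ_F_NOPARENT */

struct toy_qdisc {
	struct toy_qdisc *parent;
	unsigned int flags;
	unsigned int qlen;
	unsigned int drops;
};

/* Propagate a queue-length decrease of `n` packets to the ancestors of sch. */
void toy_tree_decrease_qlen(struct toy_qdisc *sch, unsigned int n)
{
	if (n == 0)
		return;
	while (sch->parent) {
		/* Direct child of an mq-like root: stop before touching the root. */
		if (sch->flags & TOY_F_NOPARENT)
			break;
		sch = sch->parent;
		sch->qlen -= n;
		sch->drops += n;
	}
}
```

Because the walk stops at the flagged child, the root's qlen is never decremented, so it cannot underflow in the way the report above shows.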
  13. 29 Aug, 2015 1 commit
  14. 28 Aug, 2015 2 commits
  15. 28 May, 2015 1 commit
    • net_sched: invoke ->attach() after setting dev->qdisc · 86e363dc
      WANG Cong committed
      For the mq qdisc, we add the per-tx-queue qdiscs to the root qdisc
      for display purposes. However, that happens too early,
      before the new dev->qdisc is finally set, which causes
      q->list to point to an old root qdisc that is going to be
      freed right before the new one is assigned.
      
      Fix this by moving ->attach() after setting dev->qdisc.
      
      For the record, this fixes the following crash:
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
       list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
       CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
        ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
        ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
       Call Trace:
        [<ffffffff81a44e7f>] dump_stack+0x4c/0x65
        [<ffffffff810790da>] warn_slowpath_common+0x9c/0xb6
        [<ffffffff814e725b>] ? __list_del_entry+0x5a/0x98
        [<ffffffff81079162>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff81820eb0>] ? dev_graft_qdisc+0x5e/0x6a
        [<ffffffff814e725b>] __list_del_entry+0x5a/0x98
        [<ffffffff814e72a7>] list_del+0xe/0x2d
        [<ffffffff81822f05>] qdisc_list_del+0x1e/0x20
        [<ffffffff81820cd1>] qdisc_destroy+0x30/0xd6
        [<ffffffff81822676>] qdisc_graft+0x11d/0x243
        [<ffffffff818233c1>] tc_get_qdisc+0x1a6/0x1d4
        [<ffffffff810b5eaf>] ? mark_lock+0x2e/0x226
        [<ffffffff817ff8f5>] rtnetlink_rcv_msg+0x181/0x194
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff774>] ? __rtnl_unlock+0x17/0x17
        [<ffffffff81855dc6>] netlink_rcv_skb+0x4d/0x93
        [<ffffffff817ff756>] rtnetlink_rcv+0x26/0x2d
        [<ffffffff818544b2>] netlink_unicast+0xcb/0x150
        [<ffffffff81161db9>] ? might_fault+0x59/0xa9
        [<ffffffff81854f78>] netlink_sendmsg+0x4fa/0x51c
        [<ffffffff817d6e09>] sock_sendmsg_nosec+0x12/0x1d
        [<ffffffff817d8967>] sock_sendmsg+0x29/0x2e
        [<ffffffff817d8cf3>] ___sys_sendmsg+0x1b4/0x23a
        [<ffffffff8100a1b8>] ? native_sched_clock+0x35/0x37
        [<ffffffff810a1d83>] ? sched_clock_local+0x12/0x72
        [<ffffffff810a1fd4>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810def2a>] ? current_kernel_time+0xe/0x32
        [<ffffffff810b4bc5>] ? lock_release_holdtime.part.29+0x71/0x7f
        [<ffffffff810ddebf>] ? read_seqcount_begin.constprop.27+0x5f/0x76
        [<ffffffff810b6292>] ? trace_hardirqs_on_caller+0x17d/0x199
        [<ffffffff811b14d5>] ? __fget_light+0x50/0x78
        [<ffffffff817d9808>] __sys_sendmsg+0x42/0x60
        [<ffffffff817d9838>] SyS_sendmsg+0x12/0x1c
        [<ffffffff81a50e97>] system_call_fastpath+0x12/0x6f
       ---[ end trace ef29d3fb28e97ae7 ]---
      
      In the long term, we probably need to clean up the qdisc_graft() code
      in case it hides other bugs like this.
      
      Fixes: 95dc1929 ("pkt_sched: give visibility to mq slave qdiscs")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 14 May, 2015 1 commit
  17. 22 Apr, 2015 1 commit
  18. 10 Mar, 2015 1 commit
    • net_sched: destroy proto tp when all filters are gone · 1e052be6
      Cong Wang committed
      The kernel automatically creates a tp, which has handle 0, for each
      (kind, protocol, priority) tuple when we add a new filter,
      but the tp is still left there after we remove our own
      filter, unless we omit the handle (which literally
      means all the filters under the tuple).
      For example this one is left:
      
        # tc filter show dev eth0
        filter parent 8001: protocol arp pref 49152 basic
      
      It is hard for user space to clean these up on the kernel's
      behalf because filters like u32 are organized in a complex way.
      So the kernel is responsible for removing the tp after all filters
      are gone.  Each type of filter has its own way to
      store the filters, so each type has to provide its
      own way to check whether all filters are gone.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
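
The "each type provides its own emptiness check" idea can be sketched with a per-classifier callback. Everything here is a hypothetical illustration: the commit text does not show the real interface, so `toy_tp`, `toy_tp_ops`, `is_empty` and `toy_filter_delete()` are assumed names, and a simple counter stands in for a classifier's internal filter storage.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_tp;

/* Hypothetical per-classifier operations table. */
struct toy_tp_ops {
	bool (*is_empty)(const struct toy_tp *tp);
};

struct toy_tp {
	const struct toy_tp_ops *ops;
	unsigned int nfilters;	/* stand-in for classifier-private storage */
	bool destroyed;
};

/* A "basic"-style classifier only needs to look at its own counter. */
bool toy_basic_is_empty(const struct toy_tp *tp)
{
	return tp->nfilters == 0;
}

const struct toy_tp_ops toy_basic_ops = { toy_basic_is_empty };

/*
 * Remove one filter, then ask the classifier whether anything is left.
 * Returns true once the tp itself has been destroyed.
 */
bool toy_filter_delete(struct toy_tp *tp)
{
	if (tp->nfilters > 0)
		tp->nfilters--;
	if (tp->ops->is_empty(tp))
		tp->destroyed = true;
	return tp->destroyed;
}
```

The generic delete path stays ignorant of how a classifier stores its filters; a more complex classifier would supply a different `is_empty` implementation.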
  19. 14 Jan, 2015 1 commit
  20. 22 Oct, 2014 1 commit
  21. 06 Oct, 2014 1 commit
  22. 05 Oct, 2014 1 commit
    • net: sched: suspicious RCU usage in qdisc_watchdog · 1e203c1a
      John Fastabend committed
      The qdisc_watchdog call triggers a suspicious RCU usage warning; the
      dereference needs to be done inside rcu_read_lock/rcu_read_unlock. In
      addition, Qdisc destroy operations need to ensure the timer is cancelled
      before removing the qdisc structure.
      
      [ 3992.191339] ===============================
      [ 3992.191340] [ INFO: suspicious RCU usage. ]
      [ 3992.191343] 3.17.0-rc6net-next+ #72 Not tainted
      [ 3992.191345] -------------------------------
      [ 3992.191347] include/net/sch_generic.h:272 suspicious rcu_dereference_check() usage!
      [ 3992.191348]
      [ 3992.191348] other info that might help us debug this:
      [ 3992.191348]
      [ 3992.191351]
      [ 3992.191351] rcu_scheduler_active = 1, debug_locks = 1
      [ 3992.191353] no locks held by swapper/1/0.
      [ 3992.191355]
      [ 3992.191355] stack backtrace:
      [ 3992.191358] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-rc6net-next+ #72
      [ 3992.191360] Hardware name:                  /DZ77RE-75K, BIOS GAZ7711H.86A.0060.2012.1115.1750 11/15/2012
      [ 3992.191362]  0000000000000001 ffff880235803e48 ffffffff8178f92c 0000000000000000
      [ 3992.191366]  ffff8802322224a0 ffff880235803e78 ffffffff810c9966 ffff8800a5fe3000
      [ 3992.191370]  ffff880235803f30 ffff8802359cd768 ffff8802359cd6e0 ffff880235803e98
      [ 3992.191374] Call Trace:
      [ 3992.191376]  <IRQ>  [<ffffffff8178f92c>] dump_stack+0x4e/0x68
      [ 3992.191387]  [<ffffffff810c9966>] lockdep_rcu_suspicious+0xe6/0x130
      [ 3992.191392]  [<ffffffff8167213a>] qdisc_watchdog+0x8a/0xb0
      [ 3992.191396]  [<ffffffff810f93f2>] __run_hrtimer+0x72/0x420
      [ 3992.191399]  [<ffffffff810f9bcd>] ? hrtimer_interrupt+0x7d/0x240
      [ 3992.191403]  [<ffffffff816720b0>] ? tc_classify+0xc0/0xc0
      [ 3992.191406]  [<ffffffff810f9c4f>] hrtimer_interrupt+0xff/0x240
      [ 3992.191410]  [<ffffffff8109e4a5>] ? __atomic_notifier_call_chain+0x5/0x140
      [ 3992.191415]  [<ffffffff8103577b>] local_apic_timer_interrupt+0x3b/0x60
      [ 3992.191419]  [<ffffffff8179c2b5>] smp_apic_timer_interrupt+0x45/0x60
      [ 3992.191422]  [<ffffffff8179a6bf>] apic_timer_interrupt+0x6f/0x80
      [ 3992.191424]  <EOI>  [<ffffffff815ed233>] ? cpuidle_enter_state+0x73/0x2e0
      [ 3992.191432]  [<ffffffff815ed22e>] ? cpuidle_enter_state+0x6e/0x2e0
      [ 3992.191437]  [<ffffffff815ed567>] cpuidle_enter+0x17/0x20
      [ 3992.191441]  [<ffffffff810c0741>] cpu_startup_entry+0x3d1/0x4a0
      [ 3992.191445]  [<ffffffff81106fc6>] ? clockevents_config_and_register+0x26/0x30
      [ 3992.191448]  [<ffffffff81033c16>] start_secondary+0x1b6/0x260
      
      Fixes: b26b0d1e ("net: qdisc: use rcu prefix and silence sparse warnings")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 30 Sep, 2014 4 commits
  24. 26 Sep, 2014 1 commit
    • net: sched: use pinned timers · 4a8e320c
      Eric Dumazet committed
      While using a MQ + NETEM setup, I had confirmation that the default
      timer migration ( /proc/sys/kernel/timer_migration ) is killing us.
      
      Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
      queues) :
      
      EST="est 1sec 4sec"
      for ETH in eth1
      do
       tc qd del dev $ETH root 2>/dev/null
       tc qd add dev $ETH root handle 1: mq
       tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
       tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
       tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
       tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
       tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
       tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
       tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
       tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
      done
      
      We can see that timers get migrated into a single cpu, presumably idle
      at the time timers are set up.
      Then all qdisc dequeues run from this cpu and huge lock contention
      happens. This single cpu is stuck in softirq mode and cannot dequeue
      fast enough.
      
          39.24%  [kernel]          [k] _raw_spin_lock
           2.65%  [kernel]          [k] netem_enqueue
           1.80%  [kernel]          [k] netem_dequeue
           1.63%  [kernel]          [k] copy_user_enhanced_fast_string
           1.45%  [kernel]          [k] _raw_spin_lock_bh
      
      By pinning qdisc timers on the cpu running the qdisc, we respect proper
      XPS setting and remove this lock contention.
      
           5.84%  [kernel]          [k] netem_enqueue
           4.83%  [kernel]          [k] _raw_spin_lock
           2.92%  [kernel]          [k] copy_user_enhanced_fast_string
      
      Current Qdiscs that benefit from this change are :
      
      	netem, cbq, fq, hfsc, tbf, htb.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 14 Sep, 2014 2 commits
  26. 12 Jun, 2014 1 commit
    • net_sched: drr: warn when qdisc is not work conserving · 6e765a00
      Florian Westphal committed
      The DRR scheduler requires that items on the active list are work
      conserving, i.e. do not hold on to skbs for throttling purposes, etc.
      Attaching e.g. tbf renders DRR useless because all other classes on the
      active list are delayed as well.
      
      So, warn users that this configuration won't work as expected; we
      already do this in a couple of other qdiscs, see e.g.
      
      commit b00355db
      ('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')
      
      The 'const' change is needed to avoid compiler warning ("discards 'const'
      qualifier from pointer target type").
      
      tested with:
      drr_hier() {
              parent=$1
              classes=$2
              for i in  $(seq 1 $classes); do
                      classid=$parent$(printf %x $i)
                      tc class add dev eth0 parent $parent classid $classid drr
                      tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
              done
      }
      tc qdisc add dev eth0 root handle 1: drr
      drr_hier 1: 32
      tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 03 May, 2014 1 commit
  28. 25 Apr, 2014 1 commit
  29. 12 Mar, 2014 2 commits
  30. 11 Mar, 2014 1 commit
  31. 01 Jan, 2014 1 commit
  32. 14 Dec, 2013 1 commit