1. 23 Aug 2017 · 1 commit
  2. 17 Aug 2017 · 1 commit
  3. 16 Aug 2017 · 1 commit
  4. 10 Aug 2017 · 1 commit
  5. 05 Jul 2017 · 1 commit
  6. 30 Jun 2017 · 1 commit
    • net: sched: Fix one possible panic when no destroy callback · c1a4872e
      Authored by Gao Feng
      When a qdisc fails to init, qdisc_create would invoke the destroy
      callback to clean up. But there is no check that the callback actually
      exists, so this would panic for qdiscs that define no destroy callback,
      such as codel, fq, and so on.

      Take codel as an example:
      When a malicious user constructs an invalid netlink msg, it causes
      codel_init->codel_change->nla_parse_nested to fail.
      The kernel then invokes the destroy callback directly, but qdisc codel
      doesn't define one, so it panics.

      Now add a check for destroy to avoid the possible panic.
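
      A minimal sketch of the shape of the fix, in qdisc_create()'s error
      path (surrounding code abbreviated):

      	/* Only invoke ->destroy() when the qdisc provides one;
      	 * codel, fq and friends leave it NULL. */
      	if (ops->destroy)
      		ops->destroy(sch);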
      
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 18 May 2017 · 2 commits
  8. 12 May 2017 · 1 commit
    • net: sched: optimize class dumps · cb395b20
      Authored by Eric Dumazet
      In commit 59cc1f61 ("net: sched: convert qdisc linked list to
      hashtable") we missed the opportunity to considerably speed up
      tc_dump_tclass_root() if a qdisc handle is provided by the user.

      Instead of iterating over all the qdiscs, use qdisc_match_from_root()
      to directly get the one we are looking for.
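
      A sketch of the idea in tc_dump_tclass_root(); the helper split is my
      reading of the patch, so treat the names as illustrative:

      	if (tcm->tcm_parent) {
      		struct Qdisc *q = qdisc_match_from_root(root,
      					TC_H_MAJ(tcm->tcm_parent));

      		/* Resolved via the hashtable: dump just this qdisc. */
      		if (q && tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
      			return -1;
      		return 0;
      	}
      	/* No handle given: fall back to walking every qdisc. */
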
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 18 Apr 2017 · 2 commits
  10. 14 Apr 2017 · 1 commit
  11. 13 Mar 2017 · 1 commit
  12. 18 Feb 2017 · 1 commit
  13. 12 Feb 2017 · 1 commit
    • net_sched: fix error recovery at qdisc creation · 87b60cfa
      Authored by Eric Dumazet
      Dmitry reported a use-after-free in the qdisc code [1].

      The problem here is that ops->init() can return an error.

      qdisc_create_dflt() then calls ops->destroy(),
      while qdisc_create() does _not_ call it.

      Four qdiscs chose to call their own ops->destroy(), assuming their
      caller would not.

      This patch makes sure qdisc_create() calls ops->destroy()
      and fixes the four qdiscs to avoid a double free.
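
      Roughly, the change makes the error path of qdisc_create() look like
      this (a sketch; labels and surrounding code abbreviated):

      	err = ops->init(sch, tca[TCA_OPTIONS]);
      	if (err != 0)
      		goto err_out4;
      	...
      err_out4:
      	/* Cleanup happens exactly once, here, so individual qdiscs no
      	 * longer need their own ops->destroy() calls on init failure.
      	 * (A later fix, c1a4872e above, gates this behind a NULL check.) */
      	ops->destroy(sch);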
      
      [1]
      BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440
      Read of size 8 by task syz-executor2/5030
      CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
       ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
       ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
      Call Trace:
       [<ffffffff81bbbed4>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81bbbed4>] dump_stack+0x6c/0x98 lib/dump_stack.c:51
       [<ffffffff816682b1>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
       [<ffffffff81668524>] print_address_description mm/kasan/report.c:196 [inline]
       [<ffffffff81668524>] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
       [<ffffffff81668953>] kasan_report mm/kasan/report.c:305 [inline]
       [<ffffffff81668953>] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
       [<ffffffff82527b02>] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
       [<ffffffff82524bdd>] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
       [<ffffffff82524e30>] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
       [<ffffffff8252550d>] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
       [<ffffffff8252550d>] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
       [<ffffffff824b1db1>] __dev_open+0x221/0x320 net/core/dev.c:1403
       [<ffffffff824b24ce>] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
       [<ffffffff824b27de>] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
       [<ffffffff824f5bf6>] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
       [<ffffffff824f61fa>] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
       [<ffffffff82430509>] sock_do_ioctl+0x99/0xb0 net/socket.c:879
       [<ffffffff82430d30>] sock_ioctl+0x2a0/0x390 net/socket.c:958
       [<ffffffff816f3b68>] vfs_ioctl fs/ioctl.c:44 [inline]
       [<ffffffff816f3b68>] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
       [<ffffffff816f41a4>] SYSC_ioctl fs/ioctl.c:626 [inline]
       [<ffffffff816f41a4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
       [<ffffffff8123e357>] entry_SYSCALL_64_fastpath+0x12/0x17
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 11 Feb 2017 · 2 commits
  15. 09 Jan 2017 · 1 commit
  16. 06 Dec 2016 · 1 commit
    • net_sched: gen_estimator: complete rewrite of rate estimators · 1c0d32fd
      Authored by Eric Dumazet
      1) Old code was hard to maintain, due to complex lock chains.
         (We probably will be able to remove some kfree_rcu() in callers)

      2) Using a single timer to update all estimators does not scale.

      3) Code was buggy on 32bit kernels (WRITE_ONCE() on a 64bit quantity
         is not supposed to work well)

      In this rewrite:
      
      - I removed the RB tree that had to be scanned in
        gen_estimator_active(). qdisc dumps should be much faster.
      
      - Each estimator has its own timer.
      
      - Estimations are maintained in net_rate_estimator structure,
        instead of dirtying the qdisc. Minor, but part of the simplification.
      
      - Reading the estimator uses RCU and a seqcount to provide proper
        support for 32bit kernels (see the read-side sketch after this list).
      
      - We reduce memory need when estimators are not used, since
        we store a pointer, instead of the bytes/packets counters.
      
      - xt_rateest_mt() no longer has to grab a spinlock.
        (In the future, xt_rateest_tg() could be switched to per cpu counters)
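
      A read-side sketch of the new scheme (struct and field names follow
      my reading of the rewrite; treat them as assumptions):

      	struct net_rate_estimator *est;
      	unsigned int seq;

      	rcu_read_lock();
      	est = rcu_dereference(*rate_est);	/* NULL when unused: saves memory */
      	if (est) {
      		do {
      			seq = read_seqcount_begin(&est->seq);
      			sample->bps = est->avbps >> 8;
      			sample->pps = est->avpps >> 8;
      		} while (read_seqcount_retry(&est->seq, seq));
      	}
      	rcu_read_unlock();
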
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 08 Nov 2016 · 1 commit
    • qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device · 84c46dd8
      Authored by Jesper Dangaard Brouer
      It is a clear misconfiguration to attach a qdisc to a device with
      tx_queue_len zero, because some qdiscs (namely, pfifo, bfifo, gred,
      htb, plug and sfb) inherit/copy this value as their queue length.
      
      Why should the kernel catch such a misconfiguration?  Because prior to
      introducing the IFF_NO_QUEUE device flag, userspace found a loophole
      in the qdisc config system that allowed them to achieve the equivalent
      of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
      a device.  The loophole on older kernels is setting tx_queue_len=0,
      *prior* to device qdisc init (the config time is significant, simply
      setting tx_queue_len=0 doesn't trigger the loophole).
      
      This loophole is currently used by Docker[1] to get better performance
      and scalability out of the veth device.  The Docker developers were
      warned[1] that they needed to adjust the tx_queue_len if ever
      attaching a qdisc.  The OpenShift project didn't remember this warning
      and attached a qdisc; this was caught and fixed in [2].
      
      [1] https://github.com/docker/libcontainer/pull/193
      [2] https://github.com/openshift/origin/pull/11126
      
      Instead of fixing every userspace program that used this loophole and
      forgot to reset the tx_queue_len prior to attaching a qdisc, let's
      catch the misconfiguration on the kernel side.
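
      One plausible shape of the kernel-side check at qdisc attach time
      (placement and the exact condition are assumptions on my part):

      	/* Refuse to attach a qlen-inheriting qdisc to a device whose
      	 * tx_queue_len is 0; the supported knob for "no queue" is
      	 * IFF_NO_QUEUE, not a zero queue length. */
      	if (dev->tx_queue_len == 0) {
      		netdev_info(dev, "Caught tx_queue_len zero misconfig\n");
      		return -EINVAL;
      	}
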
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 20 Sep 2016 · 1 commit
  19. 19 Aug 2016 · 2 commits
    • net: sched: avoid duplicates in qdisc dump · ea327469
      Authored by Jiri Kosina
      tc_dump_qdisc() dumps the per-device qdiscs in two phases: first,
      the "standard" dev->qdisc is dumped; second, if there are ingress
      queue(s), they are dumped as well.

      After the conversion of the netdevice's qdisc linked list into a
      hashtable, these two sets are no longer disjoint; both are "reachable"
      directly from the netdevice's hashtable. As a consequence, the
      "full-depth" dump of the ingress qdiscs immediately hits the netdevice
      hashtable again, duplicating the dump that has already been performed
      for dev->qdisc.
      What in fact needs to be dumped in the ingress case is "just" the
      top-level ingress qdisc, as everything else has been dumped already.
      
      Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed
      whether it should (while performing the "full" per-netdev qdisc dump) perform
      the whole recursion, or just dump "additional" top-level (ingress) qdiscs
      without performing any kind of recursion.
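
      A sketch of the extended helper and its short-circuit (the parameter
      name is my reading of the patch; treat it as illustrative):

      	static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
      				      struct netlink_callback *cb,
      				      int *q_idx_p, int s_q_idx, bool recur)
      	{
      		...
      		/* Ingress pass: dump only the top-level qdisc itself. */
      		if (!recur)
      			goto out;
      		/* Full pass: walk the whole per-netdev hashtable. */
      		...
      	}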
      
      This fixes duplicate dumps such as
      
      	qdisc mq 0: root
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc clsact ffff: parent ffff:fff1
      	qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      	qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: fix handling of singleton qdiscs with qdisc_hash · 69012ae4
      Authored by Jiri Kosina
      qdisc_match_from_root() now iterates over the per-netdevice qdisc
      hashtable instead of going through a linked list of qdiscs
      (independent of the actual underlying netdev), which was the case
      before the switch to a hashtable for qdiscs.

      For singleton qdiscs there is no underlying netdev associated though,
      and therefore dumping a singleton qdisc will panic, as qdisc_dev(root)
      will always be NULL.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000410
       IP: [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       PGD 1aceba067 PUD 1aceb7067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
      [ ... ]
       task: ffff8801ec996e00 task.stack: ffff8801ec934000
       RIP: 0010:[<ffffffff8167efac>]  [<ffffffff8167efac>] qdisc_match_from_root+0x2c/0x70
       RSP: 0018:ffff8801ec937ab0  EFLAGS: 00010203
       RAX: 0000000000000408 RBX: ffff88025e612000 RCX: ffffffffffffffd8
       RDX: 0000000000000000 RSI: 00000000ffff0000 RDI: ffffffff81cf8100
       RBP: ffff8801ec937ab0 R08: 000000000001c160 R09: ffff8802668032c0
       R10: ffffffff81cf8100 R11: 0000000000000030 R12: 00000000ffff0000
       R13: ffff88025e612000 R14: ffffffff81cf3140 R15: 0000000000000000
       FS:  00007f24b9af6740(0000) GS:ffff88026f280000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000410 CR3: 00000001aceec000 CR4: 00000000001406e0
       Stack:
        ffff8801ec937ad0 ffffffff81681210 ffff88025dd51a00 00000000fffffff1
        ffff8801ec937b88 ffffffff81681e4e ffffffff81c42bc0 ffff880262431500
        ffffffff81cf3140 ffff88025dd51a10 ffff88025dd51a24 00000000ec937b38
       Call Trace:
        [<ffffffff81681210>] qdisc_lookup+0x40/0x50
        [<ffffffff81681e4e>] tc_modify_qdisc+0x21e/0x550
        [<ffffffff8166ae25>] rtnetlink_rcv_msg+0x95/0x220
        [<ffffffff81209602>] ? __kmalloc_track_caller+0x172/0x230
        [<ffffffff8166ad90>] ? rtnl_newlink+0x870/0x870
        [<ffffffff816897b7>] netlink_rcv_skb+0xa7/0xc0
        [<ffffffff816657c8>] rtnetlink_rcv+0x28/0x30
        [<ffffffff8168919b>] netlink_unicast+0x15b/0x210
        [<ffffffff81689569>] netlink_sendmsg+0x319/0x390
        [<ffffffff816379f8>] sock_sendmsg+0x38/0x50
        [<ffffffff81638296>] ___sys_sendmsg+0x256/0x260
        [<ffffffff811b1275>] ? __pagevec_lru_add_fn+0x135/0x280
        [<ffffffff811b1a90>] ? pagevec_lru_move_fn+0xd0/0xf0
        [<ffffffff811b1140>] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
        [<ffffffff811b1b85>] ? __lru_cache_add+0x75/0xb0
        [<ffffffff817708a6>] ? _raw_spin_unlock+0x16/0x40
        [<ffffffff811d8dff>] ? handle_mm_fault+0x39f/0x1160
        [<ffffffff81638b15>] __sys_sendmsg+0x45/0x80
        [<ffffffff81638b62>] SyS_sendmsg+0x12/0x20
        [<ffffffff810038e7>] do_syscall_64+0x57/0xb0
      
      Fix this by special-casing singleton qdiscs (those that don't have an
      underlying netdevice) and introducing immediate handling of them,
      rather than trying to go through an underlying netdevice. We're in
      the same situation in tc_dump_qdisc_root() and tc_dump_tclass_root().

      Ultimately, this will have to be slightly reworked so that we are
      actually able to show singleton qdiscs (noop) in the dump properly;
      but we're not currently doing that anyway, so there is no regression,
      and it is better to do this in a gradual manner.
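
      The special case boils down to something like this at the top of
      qdisc_match_from_root() (my reading of the fix; illustrative):

      	/* Singleton qdiscs have no netdev, hence no hashtable to walk:
      	 * match them directly instead of dereferencing qdisc_dev(root). */
      	if (!qdisc_dev(root))
      		return (root->handle == handle) ? root : NULL;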
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Reported-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 11 Aug 2016 · 1 commit
  21. 13 Jun 2016 · 1 commit
  22. 11 Jun 2016 · 1 commit
    • net_sched: remove generic throttled management · 45f50bed
      Authored by Eric Dumazet
      __QDISC_STATE_THROTTLED bit manipulation is rather expensive
      for HTB and a few others.
      
      I already removed it for sch_fq in commit f2600cf0
      ("net: sched: avoid costly atomic operation in fq_dequeue()")
      and so far nobody complained.
      
      When one or more packets are stuck in one or more throttled
      HTB classes, an htb dequeue() performs two atomic operations
      to clear/set the __QDISC_STATE_THROTTLED bit, while the root qdisc
      lock is held.

      Removing this pair of atomic operations brings me an 8 % performance
      increase on 200 TCP_RR tests, in the presence of throttled classes.

      This patch has no side effect, since nothing actually uses
      qdisc_is_throttled() anymore.
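
      For reference, this is the pair of helpers whose atomic bit ops the
      dequeue path no longer pays for (as they existed in sch_generic.h;
      sketched from memory, so treat as illustrative):

      	static inline void qdisc_throttled(struct Qdisc *qdisc)
      	{
      		set_bit(__QDISC_STATE_THROTTLED, &qdisc->state);	/* atomic RMW */
      	}

      	static inline void qdisc_unthrottled(struct Qdisc *qdisc)
      	{
      		clear_bit(__QDISC_STATE_THROTTLED, &qdisc->state);	/* atomic RMW */
      	}
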
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 08 Jun 2016 · 1 commit
    • net: sched: do not acquire qdisc spinlock in qdisc/class stats dump · edb09eb1
      Authored by Eric Dumazet
      Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by the Google
      BwE host agent [1] are problematic at scale:

      For each qdisc/class found in the dump, we currently lock the root
      qdisc spinlock in order to get stats. Sampling stats every 5 seconds
      from thousands of HTB classes is a challenge when the root qdisc
      spinlock is under high pressure. Not only do the dumps take time, they
      also slow down the fast path (queue/dequeue packets) by 10 % to 20 %
      in some cases.

      An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
      that might need the qdisc lock in fq_codel_dump_stats() and
      fq_codel_dump_class_stats().
      
      In v2 of this patch, I now use the Qdisc running seqcount to provide
      consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
      
      I also changed rate estimators to use the same infrastructure
      so that they no longer need to lock root qdisc lock.
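
      A sketch of a seqcount-consistent read of the basic counters (the
      'running' seqcount is the one bumped by the dequeue path; the exact
      helper shape is my assumption):

      	do {
      		seq = read_seqcount_begin(running);
      		bstats->bytes   = b->bytes;
      		bstats->packets = b->packets;
      	} while (read_seqcount_retry(running, seq));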
      
      [1]
      http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Kevin Athey <kda@google.com>
      Cc: Xiaotian Pei <xiaotian@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 25 May 2016 · 1 commit
    • net_sched: avoid too many hrtimer_start() calls · a9efad8b
      Authored by Eric Dumazet
      I found a serious performance bug in packet schedulers using hrtimers.
      
      sch_htb and sch_fq are definitely impacted by this problem.
      
      We constantly rearm high resolution timers if some packets are throttled
      in one (or more) class, and other packets are flying through qdisc on
      another (non throttled) class.
      
      hrtimer_start() does not have the mod_timer() trick of doing nothing
      if the expires value does not change:
      
      	if (timer_pending(timer) &&
                  timer->expires == expires)
                      return 1;
      
      This issue is particularly visible when multiple cpus can queue/dequeue
      packets on the same qdisc, as hrtimer code has to lock a remote base.
      
      I used the following fix:

      1) Change htb to use qdisc_watchdog_schedule_ns() instead of
      open-coding it.

      2) Cache the watchdog's prior expiration. hrtimer might provide this,
      but I prefer not to rely on hrtimer internals.
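
      The cached-expiry check amounts to something like this in
      qdisc_watchdog_schedule_ns() (field name last_expires as I read the
      patch):

      	if (wd->last_expires == expires)
      		return;		/* hrtimer is already armed for this instant */
      	wd->last_expires = expires;
      	hrtimer_start(&wd->timer, ns_to_ktime(expires),
      		      HRTIMER_MODE_ABS_PINNED);
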
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 27 Apr 2016 · 1 commit
  26. 01 Mar 2016 · 1 commit
  27. 19 Feb 2016 · 2 commits
  28. 16 Dec 2015 · 1 commit
  29. 04 Dec 2015 · 1 commit
    • net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84
      Authored by Eric Dumazet
      qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
      devices.
      
      One problem is that it updates sch->q.qlen and sch->qstats.drops
      on the mq/mqprio root qdisc, while it should not: Daniele
      reported underflow errors:
      [  681.774821] PAX: sch->q.qlen: 0 n: 1
      [  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
      [  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
      [  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
      [  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
      [  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
      [  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
      [  681.774962] Call Trace:
      [  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
      [  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
      [  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
      [  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
      [  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
      [  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
      [  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
      [  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
      [  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
      [  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
      [  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
      [  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
      [  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
      [  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
      
      mq/mqprio have their own ways to report qlen/drops by folding stats on
      all their queues, with appropriate locking.
      
      A second problem is that qdisc_tree_decrease_qlen() calls
      qdisc_lookup() without proper locking: concurrent qdisc updates could
      corrupt the list that qdisc_match_from_root() parses to find a qdisc
      given its handle.

      Fix the first problem by adding a TCQ_F_NOPARENT qdisc flag that
      qdisc_tree_decrease_qlen() can use to abort its tree traversal
      as soon as it meets an mq/mqprio qdisc's children.

      The second problem can be fixed by RCU protection.
      Qdiscs are already freed after an RCU grace period, so qdisc_list_add()
      and qdisc_list_del() simply have to use the appropriate rcu list
      variants.
      
      A future patch will add a per struct netdev_queue list anchor, so that
      qdisc_tree_decrease_qlen() can have more efficient lookups.
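
      The traversal bail-out is essentially this (flag name from the patch;
      surrounding loop sketched):

      	while ((parentid = sch->parent)) {
      		if (sch->flags & TCQ_F_NOPARENT)
      			break;	/* mq/mqprio child: the parent folds
      				 * qlen/drops itself, stop here */
      		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
      		...
      	}
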
      Reported-by: Daniele Fucini <dfucini@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 29 Aug 2015 · 1 commit
  31. 28 Aug 2015 · 2 commits
  32. 28 May 2015 · 1 commit
    • net_sched: invoke ->attach() after setting dev->qdisc · 86e363dc
      Authored by WANG Cong
      For the mq qdisc, we add the per-tx-queue qdiscs to the root qdisc
      for display purposes; however, that happens too early, before the new
      dev->qdisc is finally set. This causes q->list to point to an old root
      qdisc which is going to be freed right before a new one is assigned.

      Fix this by moving ->attach() after setting dev->qdisc.
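
      The reordering in qdisc_graft() looks roughly like this (a sketch;
      names follow the surrounding code as I read it):

      	notify_and_destroy(net, skb, n, classid, dev->qdisc, new);
      	dev->qdisc = new ? : &noop_qdisc;
      	/* Attach per-txq qdiscs only now, so q->list hangs off the
      	 * new root instead of the one just destroyed. */
      	if (new && new->ops->attach)
      		new->ops->attach(new);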
      
      For the record, this fixes the following crash:
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
       list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
       CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
        ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
        ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
       Call Trace:
        [<ffffffff81a44e7f>] dump_stack+0x4c/0x65
        [<ffffffff810790da>] warn_slowpath_common+0x9c/0xb6
        [<ffffffff814e725b>] ? __list_del_entry+0x5a/0x98
        [<ffffffff81079162>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff81820eb0>] ? dev_graft_qdisc+0x5e/0x6a
        [<ffffffff814e725b>] __list_del_entry+0x5a/0x98
        [<ffffffff814e72a7>] list_del+0xe/0x2d
        [<ffffffff81822f05>] qdisc_list_del+0x1e/0x20
        [<ffffffff81820cd1>] qdisc_destroy+0x30/0xd6
        [<ffffffff81822676>] qdisc_graft+0x11d/0x243
        [<ffffffff818233c1>] tc_get_qdisc+0x1a6/0x1d4
        [<ffffffff810b5eaf>] ? mark_lock+0x2e/0x226
        [<ffffffff817ff8f5>] rtnetlink_rcv_msg+0x181/0x194
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
        [<ffffffff817ff774>] ? __rtnl_unlock+0x17/0x17
        [<ffffffff81855dc6>] netlink_rcv_skb+0x4d/0x93
        [<ffffffff817ff756>] rtnetlink_rcv+0x26/0x2d
        [<ffffffff818544b2>] netlink_unicast+0xcb/0x150
        [<ffffffff81161db9>] ? might_fault+0x59/0xa9
        [<ffffffff81854f78>] netlink_sendmsg+0x4fa/0x51c
        [<ffffffff817d6e09>] sock_sendmsg_nosec+0x12/0x1d
        [<ffffffff817d8967>] sock_sendmsg+0x29/0x2e
        [<ffffffff817d8cf3>] ___sys_sendmsg+0x1b4/0x23a
        [<ffffffff8100a1b8>] ? native_sched_clock+0x35/0x37
        [<ffffffff810a1d83>] ? sched_clock_local+0x12/0x72
        [<ffffffff810a1fd4>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810def2a>] ? current_kernel_time+0xe/0x32
        [<ffffffff810b4bc5>] ? lock_release_holdtime.part.29+0x71/0x7f
        [<ffffffff810ddebf>] ? read_seqcount_begin.constprop.27+0x5f/0x76
        [<ffffffff810b6292>] ? trace_hardirqs_on_caller+0x17d/0x199
        [<ffffffff811b14d5>] ? __fget_light+0x50/0x78
        [<ffffffff817d9808>] __sys_sendmsg+0x42/0x60
        [<ffffffff817d9838>] SyS_sendmsg+0x12/0x1c
        [<ffffffff81a50e97>] system_call_fastpath+0x12/0x6f
       ---[ end trace ef29d3fb28e97ae7 ]---
      
      In the long term, we probably need to clean up the qdisc_graft() code
      in case it hides other bugs like this.
      
      Fixes: 95dc1929 ("pkt_sched: give visibility to mq slave qdiscs")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 14 May 2015 · 1 commit
  34. 22 Apr 2015 · 1 commit