1. 27 12月, 2019 1 次提交
  2. 19 5月, 2018 1 次提交
    • P
      net: sched: red: avoid hashing NULL child · 44a63b13
      Paolo Abeni 提交于
      Hangbin reported an Oops triggered by the syzkaller qdisc rules:
      
       kasan: GPF could be caused by NULL-ptr deref or user memory access
       general protection fault: 0000 [#1] SMP KASAN PTI
       Modules linked in: sch_red
       CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:qdisc_hash_add+0x26/0xa0
       RSP: 0018:ffff8800589cf470 EFLAGS: 00010203
       RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971
       RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c
       RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0
       R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0
       R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0
       FS:  00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        red_change+0x2d2/0xed0 [sch_red]
        qdisc_create+0x57e/0xef0
        tc_modify_qdisc+0x47f/0x14e0
        rtnetlink_rcv_msg+0x6a8/0x920
        netlink_rcv_skb+0x2a2/0x3c0
        netlink_unicast+0x511/0x740
        netlink_sendmsg+0x825/0xc30
        sock_sendmsg+0xc5/0x100
        ___sys_sendmsg+0x778/0x8e0
        __sys_sendmsg+0xf5/0x1b0
        do_syscall_64+0xbd/0x3b0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x450869
       RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869
       RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013
       RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
       R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700
       Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51
       RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470
      
      When a red qdisc is updated with a 0 limit, the child qdisc is left
      unmodified, no additional scheduler is created in red_change(),
      the 'child' local variable is rightfully NULL and must not add it
      to the hash table.
      
      This change addresses the above issue moving qdisc_hash_add() right
      after the child qdisc creation. It additionally removes unneeded checks
      for noop_qdisc.
      Reported-by: NHangbin Liu <liuhangbin@gmail.com>
      Fixes: 49b49971 ("net: sched: make default fifo qdiscs appear in the dump")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44a63b13
  3. 05 3月, 2018 1 次提交
  4. 01 2月, 2018 1 次提交
  5. 22 12月, 2017 6 次提交
  6. 31 8月, 2017 1 次提交
    • N
      sch_tbf: fix two null pointer dereferences on init failure · c2d6511e
      Nikolay Aleksandrov 提交于
      sch_tbf calls qdisc_watchdog_cancel() in both its ->reset and ->destroy
      callbacks but it may fail before the timer is initialized due to missing
      options (either not supplied by user-space or set as a default qdisc),
      also q->qdisc is used by ->reset and ->destroy so we need it initialized.
      
      Reproduce:
      $ sysctl net.core.default_qdisc=tbf
      $ ip l set ethX up
      
      Crash log:
      [  959.160172] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [  959.160323] IP: qdisc_reset+0xa/0x5c
      [  959.160400] PGD 59cdb067
      [  959.160401] P4D 59cdb067
      [  959.160466] PUD 59ccb067
      [  959.160532] PMD 0
      [  959.160597]
      [  959.160706] Oops: 0000 [#1] SMP
      [  959.160778] Modules linked in: sch_tbf sch_sfb sch_prio sch_netem
      [  959.160891] CPU: 2 PID: 1562 Comm: ip Not tainted 4.13.0-rc6+ #62
      [  959.160998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [  959.161157] task: ffff880059c9a700 task.stack: ffff8800376d0000
      [  959.161263] RIP: 0010:qdisc_reset+0xa/0x5c
      [  959.161347] RSP: 0018:ffff8800376d3610 EFLAGS: 00010286
      [  959.161531] RAX: ffffffffa001b1dd RBX: ffff8800373a2800 RCX: 0000000000000000
      [  959.161733] RDX: ffffffff8215f160 RSI: ffffffff8215f160 RDI: 0000000000000000
      [  959.161939] RBP: ffff8800376d3618 R08: 00000000014080c0 R09: 00000000ffffffff
      [  959.162141] R10: ffff8800376d3578 R11: 0000000000000020 R12: ffffffffa001d2c0
      [  959.162343] R13: ffff880037538000 R14: 00000000ffffffff R15: 0000000000000001
      [  959.162546] FS:  00007fcc5126b740(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000
      [  959.162844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  959.163030] CR2: 0000000000000018 CR3: 000000005abc4000 CR4: 00000000000406e0
      [  959.163233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  959.163436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  959.163638] Call Trace:
      [  959.163788]  tbf_reset+0x19/0x64 [sch_tbf]
      [  959.163957]  qdisc_destroy+0x8b/0xe5
      [  959.164119]  qdisc_create_dflt+0x86/0x94
      [  959.164284]  ? dev_activate+0x129/0x129
      [  959.164449]  attach_one_default_qdisc+0x36/0x63
      [  959.164623]  netdev_for_each_tx_queue+0x3d/0x48
      [  959.164795]  dev_activate+0x4b/0x129
      [  959.164957]  __dev_open+0xe7/0x104
      [  959.165118]  __dev_change_flags+0xc6/0x15c
      [  959.165287]  dev_change_flags+0x25/0x59
      [  959.165451]  do_setlink+0x30c/0xb3f
      [  959.165613]  ? check_chain_key+0xb0/0xfd
      [  959.165782]  rtnl_newlink+0x3a4/0x729
      [  959.165947]  ? rtnl_newlink+0x117/0x729
      [  959.166121]  ? ns_capable_common+0xd/0xb1
      [  959.166288]  ? ns_capable+0x13/0x15
      [  959.166450]  rtnetlink_rcv_msg+0x188/0x197
      [  959.166617]  ? rcu_read_unlock+0x3e/0x5f
      [  959.166783]  ? rtnl_newlink+0x729/0x729
      [  959.166948]  netlink_rcv_skb+0x6c/0xce
      [  959.167113]  rtnetlink_rcv+0x23/0x2a
      [  959.167273]  netlink_unicast+0x103/0x181
      [  959.167439]  netlink_sendmsg+0x326/0x337
      [  959.167607]  sock_sendmsg_nosec+0x14/0x3f
      [  959.167772]  sock_sendmsg+0x29/0x2e
      [  959.167932]  ___sys_sendmsg+0x209/0x28b
      [  959.168098]  ? do_raw_spin_unlock+0xcd/0xf8
      [  959.168267]  ? _raw_spin_unlock+0x27/0x31
      [  959.168432]  ? __handle_mm_fault+0x651/0xdb1
      [  959.168602]  ? check_chain_key+0xb0/0xfd
      [  959.168773]  __sys_sendmsg+0x45/0x63
      [  959.168934]  ? __sys_sendmsg+0x45/0x63
      [  959.169100]  SyS_sendmsg+0x19/0x1b
      [  959.169260]  entry_SYSCALL_64_fastpath+0x23/0xc2
      [  959.169432] RIP: 0033:0x7fcc5097e690
      [  959.169592] RSP: 002b:00007ffd0d5c7b48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  959.169887] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007fcc5097e690
      [  959.170089] RDX: 0000000000000000 RSI: 00007ffd0d5c7b90 RDI: 0000000000000003
      [  959.170292] RBP: ffff8800376d3f98 R08: 0000000000000001 R09: 0000000000000003
      [  959.170494] R10: 00007ffd0d5c7910 R11: 0000000000000246 R12: 0000000000000006
      [  959.170697] R13: 000000000066f1a0 R14: 00007ffd0d5cfc40 R15: 0000000000000000
      [  959.170900]  ? trace_hardirqs_off_caller+0xa7/0xcf
      [  959.171076] Code: 00 41 c7 84 24 14 01 00 00 00 00 00 00 41 c7 84 24
      98 00 00 00 00 00 00 00 41 5c 41 5d 41 5e 5d c3 66 66 66 66 90 55 48 89
      e5 53 <48> 8b 47 18 48 89 fb 48 8b 40 48 48 85 c0 74 02 ff d0 48 8b bb
      [  959.171637] RIP: qdisc_reset+0xa/0x5c RSP: ffff8800376d3610
      [  959.171821] CR2: 0000000000000018
      
      Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation")
      Fixes: 0fbbeb1b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2d6511e
  7. 26 8月, 2017 1 次提交
    • W
      net_sched: remove tc class reference counting · 143976ce
      WANG Cong 提交于
      For TC classes, their ->get() and ->put() are always paired, and the
      reference counting is completely useless, because:
      
      1) For class modification and dumping paths, we already hold RTNL lock,
         so all of these ->get(),->change(),->put() are atomic.
      
      2) For filter bindiing/unbinding, we use other reference counter than
         this one, and they should have RTNL lock too.
      
      3) For ->qlen_notify(), it is special because it is called on ->enqueue()
         path, but we already hold qdisc tree lock there, and we hold this
         tree lock when graft or delete the class too, so it should not be gone
         or changed until we release the tree lock.
      
      Therefore, this patch removes ->get() and ->put(), but:
      
      1) Adds a new ->find() to find the pointer to a class by classid, no
         refcnt.
      
      2) Move the original class destroy upon the last refcnt into ->delete(),
         right after releasing tree lock. This is fine because the class is
         already removed from hash when holding the lock.
      
      For those who also use ->put() as ->unbind(), just rename them to reflect
      this change.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      143976ce
  8. 14 4月, 2017 1 次提交
  9. 13 3月, 2017 2 次提交
  10. 26 6月, 2016 1 次提交
    • E
      net_sched: drop packets after root qdisc lock is released · 520ac30f
      Eric Dumazet 提交于
      Qdisc performance suffers when packets are dropped at enqueue()
      time because drops (kfree_skb()) are done while qdisc lock is held,
      delaying a dequeue() draining the queue.
      
      Nominal throughput can be reduced by 50 % when this happens,
      at a time we would like the dequeue() to proceed as fast as possible.
      
      Even FQ is vulnerable to this problem, while one of FQ goals was
      to provide some flow isolation.
      
      This patch adds a 'struct sk_buff **to_free' parameter to all
      qdisc->enqueue(), and in qdisc_drop() helper.
      
      I measured a performance increase of up to 12 %, but this patch
      is a prereq so that future batches in enqueue() can fly.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      520ac30f
  11. 11 6月, 2016 1 次提交
    • E
      net_sched: remove generic throttled management · 45f50bed
      Eric Dumazet 提交于
      __QDISC_STATE_THROTTLED bit manipulation is rather expensive
      for HTB and few others.
      
      I already removed it for sch_fq in commit f2600cf0
      ("net: sched: avoid costly atomic operation in fq_dequeue()")
      and so far nobody complained.
      
      When one ore more packets are stuck in one or more throttled
      HTB class, a htb dequeue() performs two atomic operations
      to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc
      lock is held.
      
      Removing this pair of atomic operations bring me a 8 % performance
      increase on 200 TCP_RR tests, in presence of throttled classes.
      
      This patch has no side effect, since nothing actually uses
      disc_is_throttled() anymore.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45f50bed
  12. 09 6月, 2016 2 次提交
  13. 04 6月, 2016 1 次提交
  14. 26 4月, 2016 1 次提交
  15. 01 3月, 2016 2 次提交
  16. 06 10月, 2014 1 次提交
  17. 30 9月, 2014 1 次提交
  18. 23 8月, 2014 1 次提交
  19. 14 3月, 2014 1 次提交
  20. 04 3月, 2014 1 次提交
  21. 28 2月, 2014 1 次提交
  22. 27 1月, 2014 1 次提交
  23. 27 12月, 2013 1 次提交
  24. 12 12月, 2013 2 次提交
  25. 24 11月, 2013 1 次提交
    • E
      sch_tbf: handle too small burst · 4d0820cf
      Eric Dumazet 提交于
      If a too small burst is inadvertently set on TBF, we might trigger
      a bug in tbf_segment(), as 'skb' instead of 'segs' was used in a
      qdisc_reshape_fail() call.
      
      tc qdisc add dev eth0 root handle 1: tbf latency 50ms burst 1KB rate
      50mbit
      
      Fix the bug, and add a warning, as such configuration is not
      going to work anyway for non GSO packets.
      
      (For some reason, one has to use a burst >= 1520 to get a working
      configuration, even with old kernels. This is a probable iproute2/tc
      bug)
      
      Based on a report and initial patch from Yang Yingliang
      
      Fixes: e43ac79a ("sch_tbf: segment too big GSO packets")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d0820cf
  26. 10 11月, 2013 1 次提交
  27. 21 9月, 2013 1 次提交
  28. 03 6月, 2013 1 次提交
  29. 23 5月, 2013 1 次提交
  30. 13 2月, 2013 1 次提交
    • J
      tbf: improved accuracy at high rates · b757c933
      Jiri Pirko 提交于
      Current TBF uses rate table computed by the "tc" userspace program,
      which has the following issue:
      
      The rate table has 256 entries to map packet lengths to
      token (time units).  With TSO sized packets, the 256 entry granularity
      leads to loss/gain of rate, making the token bucket inaccurate.
      
      Thus, instead of relying on rate table, this patch explicitly computes
      the time and accounts for packet transmission times with nanosecond
      granularity.
      
      This is a followup to 56b765b7
      ("htb: improved accuracy at high rates").
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b757c933
  31. 02 4月, 2012 1 次提交