1. 18 10月, 2021 1 次提交
  2. 30 9月, 2021 1 次提交
  3. 26 9月, 2021 1 次提交
    • net: prevent user from passing illegal stab size · b193e15a
      王贇 提交于
      We observed below report when playing with netlink sock:
      
        UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10
        shift exponent 249 is too large for 32-bit type
        CPU: 0 PID: 685 Comm: a.out Not tainted
        Call Trace:
         dump_stack_lvl+0x8d/0xcf
         ubsan_epilogue+0xa/0x4e
         __ubsan_handle_shift_out_of_bounds+0x161/0x182
         __qdisc_calculate_pkt_len+0xf0/0x190
         __dev_queue_xmit+0x2ed/0x15b0
      
      it seems like kernel won't check the stab log value passing from
      user, and will use the insane value later to calculate pkt_len.
      
      This patch just add a check on the size/cell_log to avoid insane
      calculation.
      Reported-by: NAbaci <abaci@linux.alibaba.com>
      Signed-off-by: NMichael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b193e15a
  4. 17 7月, 2021 1 次提交
  5. 05 3月, 2021 1 次提交
  6. 23 1月, 2021 1 次提交
  7. 16 1月, 2021 1 次提交
    • E
      net_sched: reject silly cell_log in qdisc_get_rtab() · e4bedf48
      Eric Dumazet 提交于
      iproute2 probably never goes beyond 8 for the cell exponent,
      but stick to the max shift exponent for signed 32bit.
      
      UBSAN reported:
      UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
      shift exponent 130 is too large for 32-bit type 'int'
      CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x183/0x22e lib/dump_stack.c:120
       ubsan_epilogue lib/ubsan.c:148 [inline]
       __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
       __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
       qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
       cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
       qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
       tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
       rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
       netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
       netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg net/socket.c:672 [inline]
       ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
       ___sys_sendmsg net/socket.c:2399 [inline]
       __sys_sendmsg+0x319/0x400 net/socket.c:2432
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NCong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e4bedf48
  8. 02 12月, 2020 1 次提交
  9. 17 11月, 2020 2 次提交
  10. 21 7月, 2020 1 次提交
    • J
      sched: sch_api: add missing rcu read lock to silence the warning · a8b7b2d0
      Jiri Pirko 提交于
      In case the qdisc_match_from_root function() is called from non-rcu path
      with rtnl mutex held, a suspiciout rcu usage warning appears:
      
      [  241.504354] =============================
      [  241.504358] WARNING: suspicious RCU usage
      [  241.504366] 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32 Not tainted
      [  241.504370] -----------------------------
      [  241.504378] net/sched/sch_api.c:270 RCU-list traversed in non-reader section!!
      [  241.504382]
                     other info that might help us debug this:
      [  241.504388]
                     rcu_scheduler_active = 2, debug_locks = 1
      [  241.504394] 1 lock held by tc/1391:
      [  241.504398]  #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
      [  241.504431]
                     stack backtrace:
      [  241.504440] CPU: 0 PID: 1391 Comm: tc Not tainted 5.8.0-rc4-custom-01521-g72a7c7d549c3 #32
      [  241.504446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
      [  241.504453] Call Trace:
      [  241.504465]  dump_stack+0x100/0x184
      [  241.504482]  lockdep_rcu_suspicious+0x153/0x15d
      [  241.504499]  qdisc_match_from_root+0x293/0x350
      
      Fix this by passing the rtnl held lockdep condition down to
      hlist_for_each_entry_rcu()
      Reported-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8b7b2d0
  11. 21 6月, 2020 1 次提交
  12. 28 5月, 2020 1 次提交
    • C
      net_sched: add a tracepoint for qdisc creation · f5a7833e
      Cong Wang 提交于
      With this tracepoint, we could know when qdisc's are created,
      especially those default qdisc's.
      
      Sample output:
      
        tc-736   [001] ...1    56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0
        tc-736   [001] ...1    56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff
        tc-738   [001] ...1    56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100
        tc-739   [001] ...1    56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200
        tc-740   [001] ...1    56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100
        tc-741   [001] ...1    56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200
        tc-745   [000] .N.1   111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5a7833e
  13. 18 3月, 2020 2 次提交
  14. 27 1月, 2020 2 次提交
    • C
      net_sched: walk through all child classes in tc_bind_tclass() · 760d228e
      Cong Wang 提交于
      In a complex TC class hierarchy like this:
      
      tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit         \
        avpkt 1000 cell 8
      tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit  \
        rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20      \
        avpkt 1000 bounded
      
      tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
        sport 80 0xffff flowid 1:3
      tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
        sport 25 0xffff flowid 1:4
      
      tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit  \
        rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20      \
        avpkt 1000
      tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit  \
        rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20      \
        avpkt 1000
      
      where filters are installed on qdisc 1:0, so we can't merely
      search from class 1:1 when creating class 1:3 and class 1:4. We have
      to walk through all the child classes of the direct parent qdisc.
      Otherwise we would miss filters those need reverse binding.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      760d228e
    • C
      net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang 提交于
      The current implementations of ops->bind_class() are merely
      searching for classid and updating class in the struct tcf_result,
      without invoking either of cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note, we merely pass the Qdisc pointer as an opaque pointer to
      each filter, they only need to pass it down to the helper
      functions without understanding it at all.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e24cd75
  15. 09 10月, 2019 1 次提交
  16. 22 9月, 2019 1 次提交
  17. 11 9月, 2019 1 次提交
  18. 31 5月, 2019 1 次提交
  19. 28 4月, 2019 2 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  20. 11 4月, 2019 1 次提交
    • P
      net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too · 8a53e616
      Paolo Abeni 提交于
      Since stats updating is always consistent with TCQ_F_CPUSTATS flag,
      we can disable it at qdisc creation time flipping such bit.
      
      In my experiments, if the NOLOCK flag is cleared, per CPU stats
      accounting does not give any measurable performance gain, but it
      waste some memory.
      
      Let's clear TCQ_F_CPUSTATS together with NOLOCK, when enslaving
      a NOLOCK qdisc to 'lock' one.
      
      Use stats update helper inside pfifo_fast, to cope correctly with
      TCQ_F_CPUSTATS flag change.
      
      As a side effect, q.qlen value for any child qdiscs is always
      consistent for all lock classfull qdiscs.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a53e616
  21. 14 3月, 2019 1 次提交
  22. 19 2月, 2019 1 次提交
  23. 18 2月, 2019 1 次提交
  24. 13 2月, 2019 3 次提交
  25. 20 1月, 2019 1 次提交
  26. 16 12月, 2018 1 次提交
  27. 02 12月, 2018 1 次提交
    • P
      net/sched: Replace call_rcu_bh() and rcu_barrier_bh() · ae0e3349
      Paul E. McKenney 提交于
      Now that call_rcu()'s callback is not invoked until after bh-disable
      regions of code have completed (in addition to explicitly marked
      RCU read-side critical sections), call_rcu() can be used in place
      of call_rcu_bh().  Similarly, rcu_barrier() can be used in place o
      frcu_barrier_bh().  This commit therefore makes these changes.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.ibm.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: <netdev@vger.kernel.org>
      ae0e3349
  28. 15 11月, 2018 1 次提交
  29. 09 11月, 2018 3 次提交
  30. 25 10月, 2018 1 次提交
  31. 19 10月, 2018 1 次提交
    • P
      net: sched: Fix for duplicate class dump · 3c53ed8f
      Phil Sutter 提交于
      When dumping classes by parent, kernel would return classes twice:
      
      | # tc qdisc add dev lo root prio
      | # tc class show dev lo
      | class prio 8001:1 parent 8001:
      | class prio 8001:2 parent 8001:
      | class prio 8001:3 parent 8001:
      | # tc class show dev lo parent 8001:
      | class prio 8001:1 parent 8001:
      | class prio 8001:2 parent 8001:
      | class prio 8001:3 parent 8001:
      | class prio 8001:1 parent 8001:
      | class prio 8001:2 parent 8001:
      | class prio 8001:3 parent 8001:
      
      This comes from qdisc_match_from_root() potentially returning the root
      qdisc itself if its handle matched. Though in that case, root's classes
      were already dumped a few lines above.
      
      Fixes: cb395b20 ("net: sched: optimize class dumps")
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c53ed8f
  32. 16 10月, 2018 1 次提交
    • D
      net/sched: cls_api: add missing validation of netlink attributes · e331473f
      Davide Caratti 提交于
      Similarly to what has been done in 8b4c3cdd ("net: sched: Add policy
      validation for tc attributes"), fix classifier code to add validation of
      TCA_CHAIN and TCA_KIND netlink attributes.
      
      tested with:
       # ./tdc.py -c filter
      
      v2: Let sch_api and cls_api share nla_policy they have in common, thanks
          to David Ahern.
      v3: Avoid EXPORT_SYMBOL(), as validation of those attributes is not done
          by TC modules, thanks to Cong Wang.
          While at it, restore the 'Delete / get qdisc' comment to its orginal
          position, just above tc_get_qdisc() function prototype.
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e331473f