1. 29 3月, 2023 2 次提交
  2. 15 10月, 2021 1 次提交
  3. 13 10月, 2021 1 次提交
  4. 08 2月, 2021 1 次提交
    • E
      net_sched: avoid shift-out-of-bounds in tcindex_set_parms() · 871a58e6
      Eric Dumazet 提交于
      stable inclusion
      from stable-5.10.11
      commit 56ef551205e4f0a90291f3e481cfb753eae0d15e
      bugzilla: 47621
      
      --------------------------------
      
      commit bcd0cf19 upstream.
      
      tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT
      attribute is not silly.
      
      UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29
      shift exponent 255 is too large for 32-bit type 'int'
      CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
       valid_perfect_hash net/sched/cls_tcindex.c:260 [inline]
       tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425
       tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546
       tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127
       rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
       netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2336
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2390
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2423
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20210114185229.1742255-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      871a58e6
  5. 23 6月, 2020 1 次提交
  6. 04 4月, 2020 1 次提交
  7. 02 4月, 2020 1 次提交
    • C
      net_sched: add a temporary refcnt for struct tcindex_data · 304e0242
      Cong Wang 提交于
      Although we intentionally use an ordered workqueue for all tc
      filter works, the ordering is not guaranteed by RCU work,
      given that tcf_queue_work() is esstenially a call_rcu().
      
      This problem is demostrated by Thomas:
      
        CPU 0:
          tcf_queue_work()
            tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work);
      
        -> Migration to CPU 1
      
        CPU 1:
           tcf_queue_work(&p->rwork, tcindex_destroy_work);
      
      so the 2nd work could be queued before the 1st one, which leads
      to a free-after-free.
      
      Enforcing this order in RCU work is hard as it requires to change
      RCU code too. Fortunately we can workaround this problem in tcindex
      filter by taking a temporary refcnt, we only refcnt it right before
      we begin to destroy it. This simplifies the code a lot as a full
      refcnt requires much more changes in tcindex_set_parms().
      
      Reported-by: syzbot+46f513c3033d592409d2@syzkaller.appspotmail.com
      Fixes: 3d210534 ("net_sched: fix a race condition in tcindex_destroy()")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: NPaul E. McKenney <paulmck@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      304e0242
  8. 15 3月, 2020 2 次提交
    • C
      net_sched: keep alloc_hash updated after hash allocation · 0d1c3530
      Cong Wang 提交于
      In commit 599be01e ("net_sched: fix an OOB access in cls_tcindex")
      I moved cp->hash calculation before the first
      tcindex_alloc_perfect_hash(), but cp->alloc_hash is left untouched.
      This difference could lead to another out of bound access.
      
      cp->alloc_hash should always be the size allocated, we should
      update it after this tcindex_alloc_perfect_hash().
      
      Reported-and-tested-by: syzbot+dcc34d54d68ef7d2d53d@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+c72da7b9ed57cde6fca2@syzkaller.appspotmail.com
      Fixes: 599be01e ("net_sched: fix an OOB access in cls_tcindex")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d1c3530
    • C
      net_sched: hold rtnl lock in tcindex_partial_destroy_work() · b1be2e8c
      Cong Wang 提交于
      syzbot reported a use-after-free in tcindex_dump(). This is due to
      the lack of RTNL in the deferred rcu work. We queue this work with
      RTNL in tcindex_change(), later, tcindex_dump() is called:
      
              fh = tp->ops->get(tp, t->tcm_handle);
      	...
              err = tp->ops->change(..., &fh, ...);
              tfilter_notify(..., fh, ...);
      
      but there is nothing to serialize the pending
      tcindex_partial_destroy_work() with tcindex_dump().
      
      Fix this by simply holding RTNL in tcindex_partial_destroy_work(),
      so that it won't be called until RTNL is released after
      tc_new_tfilter() is completed.
      
      Reported-and-tested-by: syzbot+653090db2562495901dc@syzkaller.appspotmail.com
      Fixes: 3d210534 ("net_sched: fix a race condition in tcindex_destroy()")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1be2e8c
  9. 05 2月, 2020 1 次提交
  10. 04 2月, 2020 1 次提交
    • C
      net_sched: fix an OOB access in cls_tcindex · 599be01e
      Cong Wang 提交于
      As Eric noticed, tcindex_alloc_perfect_hash() uses cp->hash
      to compute the size of memory allocation, but cp->hash is
      set again after the allocation, this caused an out-of-bound
      access.
      
      So we have to move all cp->hash initialization and computation
      before the memory allocation. Move cp->mask and cp->shift together
      as cp->hash may need them for computation too.
      
      Reported-and-tested-by: syzbot+35d4dea36c387813ed31@syzkaller.appspotmail.com
      Fixes: 331b7292 ("net: sched: RCU cls_tcindex")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      599be01e
  11. 27 1月, 2020 1 次提交
    • C
      net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang 提交于
      The current implementations of ops->bind_class() are merely
      searching for classid and updating class in the struct tcf_result,
      without invoking either of cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note, we merely pass the Qdisc pointer as an opaque pointer to
      each filter, they only need to pass it down to the helper
      functions without understanding it at all.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e24cd75
  12. 21 5月, 2019 1 次提交
  13. 28 4月, 2019 2 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  14. 23 2月, 2019 1 次提交
  15. 21 2月, 2019 2 次提交
    • C
      net_sched: fix a memory leak in cls_tcindex · 51dcb69d
      Cong Wang 提交于
      (cherry picked from commit 033b228e)
      
      When tcindex_destroy() destroys all the filter results in
      the perfect hash table, it invokes the walker to delete
      each of them. However, results with class==0 are skipped
      in either tcindex_walk() or tcindex_delete(), which causes
      a memory leak reported by kmemleak.
      
      This patch fixes it by skipping the walker and directly
      deleting these filter results so we don't miss any filter
      result.
      
      As a result of this change, we have to initialize exts->net
      properly in tcindex_alloc_perfect_hash(). For net-next, we
      need to consider whether we should initialize ->net in
      tcf_exts_init() instead, before that just directly test
      CONFIG_NET_CLS_ACT=y.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51dcb69d
    • C
      net_sched: fix a race condition in tcindex_destroy() · 3d210534
      Cong Wang 提交于
      (cherry picked from commit 8015d93e)
      
      tcindex_destroy() invokes tcindex_destroy_element() via
      a walker to delete each filter result in its perfect hash
      table, and tcindex_destroy_element() calls tcindex_delete()
      which schedules tcf RCU works to do the final deletion work.
      Unfortunately this races with the RCU callback
      __tcindex_destroy(), which could lead to use-after-free as
      reported by Adrian.
      
      Fix this by migrating this RCU callback to tcf RCU work too,
      as that workqueue is ordered, we will not have use-after-free.
      
      Note, we don't need to hold netns refcnt because we don't call
      tcf_exts_destroy() here.
      
      Fixes: 27ce4f05 ("net_sched: use tcf_queue_work() in tcindex filter")
      Reported-by: NAdrian <bugs@abtelecom.ro>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d210534
  16. 13 2月, 2019 5 次提交
    • C
      net_sched: fix two more memory leaks in cls_tcindex · 1db817e7
      Cong Wang 提交于
      struct tcindex_filter_result contains two parts:
      struct tcf_exts and struct tcf_result.
      
      For the local variable 'cr', its exts part is never used but
      initialized without being released properly on success path. So
      just completely remove the exts part to fix this leak.
      
      For the local variable 'new_filter_result', it is never properly
      released if not used by 'r' on success path.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1db817e7
    • C
      net_sched: fix a memory leak in cls_tcindex · 033b228e
      Cong Wang 提交于
      When tcindex_destroy() destroys all the filter results in
      the perfect hash table, it invokes the walker to delete
      each of them. However, results with class==0 are skipped
      in either tcindex_walk() or tcindex_delete(), which causes
      a memory leak reported by kmemleak.
      
      This patch fixes it by skipping the walker and directly
      deleting these filter results so we don't miss any filter
      result.
      
      As a result of this change, we have to initialize exts->net
      properly in tcindex_alloc_perfect_hash(). For net-next, we
      need to consider whether we should initialize ->net in
      tcf_exts_init() instead, before that just directly test
      CONFIG_NET_CLS_ACT=y.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      033b228e
    • C
      net_sched: fix a race condition in tcindex_destroy() · 8015d93e
      Cong Wang 提交于
      tcindex_destroy() invokes tcindex_destroy_element() via
      a walker to delete each filter result in its perfect hash
      table, and tcindex_destroy_element() calls tcindex_delete()
      which schedules tcf RCU works to do the final deletion work.
      Unfortunately this races with the RCU callback
      __tcindex_destroy(), which could lead to use-after-free as
      reported by Adrian.
      
      Fix this by migrating this RCU callback to tcf RCU work too,
      as that workqueue is ordered, we will not have use-after-free.
      
      Note, we don't need to hold netns refcnt because we don't call
      tcf_exts_destroy() here.
      
      Fixes: 27ce4f05 ("net_sched: use tcf_queue_work() in tcindex filter")
      Reported-by: NAdrian <bugs@abtelecom.ro>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8015d93e
    • V
      net: sched: extend proto ops to support unlocked classifiers · 12db03b6
      Vlad Buslov 提交于
      Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk
      functions to track rtnl lock status. Extend users of these function in cls
      API to propagate rtnl lock status to them. This allows classifiers to
      obtain rtnl lock when necessary and to pass rtnl lock status to extensions
      and driver offload callbacks.
      
      Add flags field to tcf proto ops. Add flag value to indicate that
      classifier doesn't require rtnl lock.
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12db03b6
    • V
      net: sched: track rtnl lock status when validating extensions · ec6743a1
      Vlad Buslov 提交于
      Actions API is already updated to not rely on rtnl lock for
      synchronization. However, it need to be provided with rtnl status when
      called from classifiers API in order to be able to correctly release the
      lock when loading kernel module.
      
      Extend extension validation function with 'rtnl_held' flag which is passed
      to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls
      API. No classifier is currently updated to support unlocked execution, so
      pass hardcoded 'true' flag parameter value.
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec6743a1
  17. 14 8月, 2018 2 次提交
    • H
      net_sched: Fix missing res info when create new tc_index filter · 008369dc
      Hangbin Liu 提交于
      Li Shuang reported the following warn:
      
      [  733.484610] WARNING: CPU: 6 PID: 21123 at net/sched/sch_cbq.c:1418 cbq_destroy_class+0x5d/0x70 [sch_cbq]
      [  733.495190] Modules linked in: sch_cbq cls_tcindex sch_dsmark rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat l
      [  733.574155]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ixgbe ahci libahci i2c_algo_bit libata i40e i2c_core dca mdio megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
      [  733.592500] CPU: 6 PID: 21123 Comm: tc Not tainted 4.18.0-rc8.latest+ #131
      [  733.600169] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016
      [  733.608518] RIP: 0010:cbq_destroy_class+0x5d/0x70 [sch_cbq]
      [  733.614734] Code: e7 d9 d2 48 8b 7b 48 e8 61 05 da d2 48 8d bb f8 00 00 00 e8 75 ae d5 d2 48 39 eb 74 0a 48 89 df 5b 5d e9 16 6c 94 d2 5b 5d c3 <0f> 0b eb b6 0f 1f 44 00 00 66 2e 0f 1f 84
      [  733.635798] RSP: 0018:ffffbfbb066bb9d8 EFLAGS: 00010202
      [  733.641627] RAX: 0000000000000001 RBX: ffff9cdd17392800 RCX: 000000008010000f
      [  733.649588] RDX: ffff9cdd1df547e0 RSI: ffff9cdd17392800 RDI: ffff9cdd0f84c800
      [  733.657547] RBP: ffff9cdd0f84c800 R08: 0000000000000001 R09: 0000000000000000
      [  733.665508] R10: ffff9cdd0f84d000 R11: 0000000000000001 R12: 0000000000000001
      [  733.673469] R13: 0000000000000000 R14: 0000000000000001 R15: ffff9cdd17392200
      [  733.681430] FS:  00007f911890a740(0000) GS:ffff9cdd1f8c0000(0000) knlGS:0000000000000000
      [  733.690456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  733.696864] CR2: 0000000000b5544c CR3: 0000000859374002 CR4: 00000000001606e0
      [  733.704826] Call Trace:
      [  733.707554]  cbq_destroy+0xa1/0xd0 [sch_cbq]
      [  733.712318]  qdisc_destroy+0x62/0x130
      [  733.716401]  dsmark_destroy+0x2a/0x70 [sch_dsmark]
      [  733.721745]  qdisc_destroy+0x62/0x130
      [  733.725829]  qdisc_graft+0x3ba/0x470
      [  733.729817]  tc_get_qdisc+0x2a6/0x2c0
      [  733.733901]  ? cred_has_capability+0x7d/0x130
      [  733.738761]  rtnetlink_rcv_msg+0x263/0x2d0
      [  733.743330]  ? rtnl_calcit.isra.30+0x110/0x110
      [  733.748287]  netlink_rcv_skb+0x4d/0x130
      [  733.752576]  netlink_unicast+0x1a3/0x250
      [  733.756949]  netlink_sendmsg+0x2ae/0x3a0
      [  733.761324]  sock_sendmsg+0x36/0x40
      [  733.765213]  ___sys_sendmsg+0x26f/0x2d0
      [  733.769493]  ? handle_pte_fault+0x586/0xdf0
      [  733.774158]  ? __handle_mm_fault+0x389/0x500
      [  733.778919]  ? __sys_sendmsg+0x5e/0xa0
      [  733.783099]  __sys_sendmsg+0x5e/0xa0
      [  733.787087]  do_syscall_64+0x5b/0x180
      [  733.791171]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  733.796805] RIP: 0033:0x7f9117f23f10
      [  733.800791] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8
      [  733.821873] RSP: 002b:00007ffe96818398 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  733.830319] RAX: ffffffffffffffda RBX: 000000005b71244c RCX: 00007f9117f23f10
      [  733.838280] RDX: 0000000000000000 RSI: 00007ffe968183e0 RDI: 0000000000000003
      [  733.846241] RBP: 00007ffe968183e0 R08: 000000000000ffff R09: 0000000000000003
      [  733.854202] R10: 00007ffe96817e20 R11: 0000000000000246 R12: 0000000000000000
      [  733.862161] R13: 0000000000662ee0 R14: 0000000000000000 R15: 0000000000000000
      [  733.870121] ---[ end trace 28edd4aad712ddca ]---
      
      This is because we didn't update f->result.res when create new filter. Then in
      tcindex_delete() -> tcf_unbind_filter(), we will failed to find out the res
      and unbind filter, which will trigger the WARN_ON() in cbq_destroy_class().
      
      Fix it by updating f->result.res when create new filter.
      
      Fixes: 6e056569 ("net_sched: fix another crash in cls_tcindex")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      008369dc
    • H
      net_sched: fix NULL pointer dereference when delete tcindex filter · 2df8bee5
      Hangbin Liu 提交于
      Li Shuang reported the following crash:
      
      [   71.267724] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [   71.276456] PGD 800000085d9bd067 P4D 800000085d9bd067 PUD 859a0b067 PMD 0
      [   71.284127] Oops: 0000 [#1] SMP PTI
      [   71.288015] CPU: 12 PID: 2386 Comm: tc Not tainted 4.18.0-rc8.latest+ #131
      [   71.295686] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016
      [   71.304037] RIP: 0010:tcindex_delete+0x72/0x280 [cls_tcindex]
      [   71.310446] Code: 00 31 f6 48 87 75 20 48 85 f6 74 11 48 8b 47 18 48 8b 40 08 48 8b 40 50 e8 fb a6 f8 fc 48 85 db 0f 84 dc 00 00 00 48 8b 73 18 <8b> 56 04 48 8d 7e 04 85 d2 0f 84 7b 01 00
      [   71.331517] RSP: 0018:ffffb45207b3f898 EFLAGS: 00010282
      [   71.337345] RAX: ffff8ad3d72d6360 RBX: ffff8acc84393680 RCX: 000000000000002e
      [   71.345306] RDX: ffff8ad3d72c8570 RSI: 0000000000000000 RDI: ffff8ad847a45800
      [   71.353277] RBP: ffff8acc84393688 R08: ffff8ad3d72c8400 R09: 0000000000000000
      [   71.361238] R10: ffff8ad3de786e00 R11: 0000000000000000 R12: ffffb45207b3f8c7
      [   71.369199] R13: ffff8ad3d93bd2a0 R14: 000000000000002e R15: ffff8ad3d72c9600
      [   71.377161] FS:  00007f9d3ec3e740(0000) GS:ffff8ad3df980000(0000) knlGS:0000000000000000
      [   71.386188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   71.392597] CR2: 0000000000000004 CR3: 0000000852f06003 CR4: 00000000001606e0
      [   71.400558] Call Trace:
      [   71.403299]  tcindex_destroy_element+0x25/0x40 [cls_tcindex]
      [   71.409611]  tcindex_walk+0xbb/0x110 [cls_tcindex]
      [   71.414953]  tcindex_destroy+0x44/0x90 [cls_tcindex]
      [   71.420492]  ? tcindex_delete+0x280/0x280 [cls_tcindex]
      [   71.426323]  tcf_proto_destroy+0x16/0x40
      [   71.430696]  tcf_chain_flush+0x51/0x70
      [   71.434876]  tcf_block_put_ext.part.30+0x8f/0x1b0
      [   71.440122]  tcf_block_put+0x4d/0x70
      [   71.444108]  cbq_destroy+0x4d/0xd0 [sch_cbq]
      [   71.448869]  qdisc_destroy+0x62/0x130
      [   71.452951]  dsmark_destroy+0x2a/0x70 [sch_dsmark]
      [   71.458300]  qdisc_destroy+0x62/0x130
      [   71.462373]  qdisc_graft+0x3ba/0x470
      [   71.466359]  tc_get_qdisc+0x2a6/0x2c0
      [   71.470443]  ? cred_has_capability+0x7d/0x130
      [   71.475307]  rtnetlink_rcv_msg+0x263/0x2d0
      [   71.479875]  ? rtnl_calcit.isra.30+0x110/0x110
      [   71.484832]  netlink_rcv_skb+0x4d/0x130
      [   71.489109]  netlink_unicast+0x1a3/0x250
      [   71.493482]  netlink_sendmsg+0x2ae/0x3a0
      [   71.497859]  sock_sendmsg+0x36/0x40
      [   71.501748]  ___sys_sendmsg+0x26f/0x2d0
      [   71.506029]  ? handle_pte_fault+0x586/0xdf0
      [   71.510694]  ? __handle_mm_fault+0x389/0x500
      [   71.515457]  ? __sys_sendmsg+0x5e/0xa0
      [   71.519636]  __sys_sendmsg+0x5e/0xa0
      [   71.523626]  do_syscall_64+0x5b/0x180
      [   71.527711]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   71.533345] RIP: 0033:0x7f9d3e257f10
      [   71.537331] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8
      [   71.558401] RSP: 002b:00007fff6f893398 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [   71.566848] RAX: ffffffffffffffda RBX: 000000005b71274d RCX: 00007f9d3e257f10
      [   71.574810] RDX: 0000000000000000 RSI: 00007fff6f8933e0 RDI: 0000000000000003
      [   71.582770] RBP: 00007fff6f8933e0 R08: 000000000000ffff R09: 0000000000000003
      [   71.590729] R10: 00007fff6f892e20 R11: 0000000000000246 R12: 0000000000000000
      [   71.598689] R13: 0000000000662ee0 R14: 0000000000000000 R15: 0000000000000000
      [   71.606651] Modules linked in: sch_cbq cls_tcindex sch_dsmark xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_coni
      [   71.685425]  libahci i2c_algo_bit i2c_core i40e libata dca mdio megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
      [   71.697075] CR2: 0000000000000004
      [   71.700792] ---[ end trace f604eb1acacd978b ]---
      
      Reproducer:
      tc qdisc add dev lo handle 1:0 root dsmark indices 64 set_tc_index
      tc filter add dev lo parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2
      tc qdisc add dev lo parent 1:0 handle 2:0 cbq bandwidth 10Mbit cell 8 avpkt 1000 mpu 64
      tc class add dev lo parent 2:0 classid 2:1 cbq bandwidth 10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated allot 1514 weight 1 maxburst 10
      tc filter add dev lo parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on
      tc qdisc add dev lo parent 2:1 pfifo limit 5
      tc qdisc del dev lo root
      
      This is because in tcindex_set_parms, when there is no old_r, we set new
      exts to cr.exts. And we didn't set it to filter when r == &new_filter_result.
      
      Then in tcindex_delete() -> tcf_exts_get_net(), we will get NULL pointer
      dereference as we didn't init exts.
      
      Fix it by moving tcf_exts_change() after "if (old_r && old_r != r)" check.
      Then we don't need "cr" as there is no errout after that.
      
      Fixes: bf63ac73 ("net_sched: fix an oops in tcindex filter")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2df8bee5
  18. 25 5月, 2018 1 次提交
  19. 25 1月, 2018 1 次提交
  20. 20 1月, 2018 3 次提交
  21. 09 11月, 2017 1 次提交
  22. 29 10月, 2017 1 次提交
  23. 17 10月, 2017 1 次提交
  24. 01 9月, 2017 1 次提交
    • C
      net_sched: add reverse binding for tc class · 07d79fc7
      Cong Wang 提交于
      TC filters when used as classifiers are bound to TC classes.
      However, there is a hidden difference when adding them in different
      orders:
      
      1. If we add tc classes before its filters, everything is fine.
         Logically, the classes exist before we specify their ID's in
         filters, it is easy to bind them together, just as in the current
         code base.
      
      2. If we add tc filters before the tc classes they bind, we have to
         do dynamic lookup in fast path. What's worse, this happens all
         the time not just once, because on fast path tcf_result is passed
         on stack, there is no way to propagate back to the one in tc filters.
      
      This hidden difference hurts performance silently if we have many tc
      classes in hierarchy.
      
      This patch intends to close this gap by doing the reverse binding when
      we create a new class, in this case we can actually search all the
      filters in its parent, match and fixup by classid. And because
      tcf_result is specific to each type of tc filter, we have to introduce
      a new ops for each filter to tell how to bind the class.
      
      Note, we still can NOT totally get rid of those class lookup in
      ->enqueue() because cgroup and flow filters have no way to determine
      the classid at setup time, they still have to go through dynamic lookup.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07d79fc7
  25. 08 8月, 2017 1 次提交
  26. 05 8月, 2017 2 次提交
  27. 22 4月, 2017 1 次提交
    • W
      net_sched: move the empty tp check from ->destroy() to ->delete() · 763dbf63
      WANG Cong 提交于
      We could have a race condition where in ->classify() path we
      dereference tp->root and meanwhile a parallel ->destroy() makes it
      a NULL. Daniel cured this bug in commit d9363774
      ("net, sched: respect rcu grace period on cls destruction").
      
      This happens when ->destroy() is called for deleting a filter to
      check if we are the last one in tp, this tp is still linked and
      visible at that time. The root cause of this problem is the semantic
      of ->destroy(), it does two things (for non-force case):
      
      1) check if tp is empty
      2) if tp is empty we could really destroy it
      
      and its caller, if cares, needs to check its return value to see if it
      is really destroyed. Therefore we can't unlink tp unless we know it is
      empty.
      
      As suggested by Daniel, we could actually move the test logic to ->delete()
      so that we can safely unlink tp after ->delete() tells us the last one is
      just deleted and before ->destroy().
      
      Fixes: 1e052be6 ("net_sched: destroy proto tp when all filters are gone")
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      763dbf63
  28. 14 4月, 2017 1 次提交