  1. 09 Nov, 2017: 1 commit
    • net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net() · e4b95c41
      Committed by Cong Wang
      Instead of holding the netns refcnt in tc actions, we can minimize
      the holding time by saving it in struct tcf_exts. This means we
      only need to hold the netns refcnt right before call_rcu() and
      release it after tcf_exts_destroy() is done.
      
      However, because tcf_proto_destroy() is also called on the netns
      cleanup path, we cannot take a netns reference once its refcnt has
      already dropped to zero; in that case we have to do the cleanup
      synchronously. This is fine for RCU too, because the caller,
      cleanup_net(), already waits for a grace period.

      In all other cases the refcnt is non-zero, so we can safely grab
      it as usual and release it once we are done.
      
      This patch provides two new APIs for each filter to use:
      tcf_exts_get_net() and tcf_exts_put_net(). All filters can now
      use the following pattern:
      
      void __destroy_filter() {
        tcf_exts_destroy();
        tcf_exts_put_net();  // <== release netns refcnt
        kfree();
      }
      void some_work() {
        rtnl_lock();
        __destroy_filter();
        rtnl_unlock();
      }
      void some_rcu_callback() {
        tcf_queue_work(some_work);
      }
      
      if (tcf_exts_get_net())  // <== hold netns refcnt
        call_rcu(some_rcu_callback);
      else
        __destroy_filter();
      
      Cc: Lucas Bates <lucasb@mojatatu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
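The decision above can be illustrated with a small user-space model: take a reference and defer, or clean up synchronously. This is a sketch under stated assumptions, not kernel code. `struct model_net`, `struct model_filter` and every `model_*` function are hypothetical names, and the model is single-threaded. "Deferral" is represented by returning `true`, so the caller runs the destroy step later instead of going through call_rcu() and a workqueue. The increment-unless-zero step plays the role of the kernel's refcount_inc_not_zero().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a network namespace with a refcount. */
struct model_net {
	int refcnt;
};

/* Hypothetical stand-in for a filter whose tcf_exts remembers its netns. */
struct model_filter {
	struct model_net *net;
	bool destroyed;
};

static int deferred_destroys;   /* destroys handed off for later */
static int sync_destroys;       /* destroys done on the spot */

/* Take a reference only if the object is still alive (refcnt > 0),
 * in the spirit of refcount_inc_not_zero(). */
static bool model_net_get_unless_zero(struct model_net *net)
{
	if (net->refcnt == 0)
		return false;
	net->refcnt++;
	return true;
}

static void model_net_put(struct model_net *net)
{
	assert(net->refcnt > 0);
	net->refcnt--;
}

/* Plays the role of __destroy_filter(): tear down, then drop the netns
 * reference if one was taken. */
static void model_destroy_filter(struct model_filter *f, bool holds_net)
{
	f->destroyed = true;
	if (holds_net)
		model_net_put(f->net);
}

/* Returns true when destruction was deferred (a netns reference is now
 * held and the caller must run model_destroy_filter(f, true) later);
 * returns false when the netns was already dying and the filter was
 * destroyed synchronously. */
static bool model_filter_delete(struct model_filter *f)
{
	if (model_net_get_unless_zero(f->net)) {
		deferred_destroys++;
		return true;
	}
	model_destroy_filter(f, false);
	sync_destroys++;
	return false;
}
```

A zero refcnt is never resurrected: if the namespace is already being torn down, the model destroys the filter immediately. The commit relies on cleanup_net() waiting for a grace period to make that safe.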
  2. 31 Oct, 2017: 1 commit
    • net_sched: remove tcf_block_put_deferred() · 822e86d9
      Committed by Cong Wang
      In commit 7aa0045d ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
      I deferred tcf_chain_flush() to a workqueue. This causes a use-after-free,
      because the qdisc has already been destroyed by the time the queued work runs.

      tcf_block_put_deferred() is no longer necessary now that each tc filter
      destroy work takes RTNL: nothing else can jump in at this point.
      The same holds for tcf_chain_hold(); we are fully serialized now.

      This also removes one level of indirection and therefore makes the code
      more readable. Note that this brings back an rcu_barrier(); however,
      compared to the code prior to commit 7aa0045d, we still have one fewer
      rcu_barrier(). For net-next, we can consider refcounting the tcf block
      to avoid it.
      
      Fixes: 7aa0045d ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 29 Oct, 2017: 2 commits
    • net_sched: add rtnl assertion to tcf_exts_destroy() · 2d132eba
      Committed by Cong Wang
      With the previous patches applied, it is now safe to claim that
      tcf_exts_destroy() is always called with the RTNL lock held.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
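Here is a minimal model of what such an assertion buys. Every deferred destroy work takes the lock itself, so the check inside the destroy routine always holds. The names (`model_rtnl_lock()`, `MODEL_ASSERT_RTNL()`, `model_exts_destroy()`) are hypothetical, and the lock is reduced to a boolean flag in a single-threaded program. Here `assert()` aborts, whereas the kernel-side RTNL check is assumed to report a warning instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex: only tracks whether it is held. */
static bool model_rtnl_held;

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);   /* not recursive, like a mutex */
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

/* Analogue of an RTNL assertion: trap if the lock is not held. */
#define MODEL_ASSERT_RTNL() assert(model_rtnl_held)

static int actions_destroyed;

/* Analogue of tcf_exts_destroy(): must only run under the lock. */
static void model_exts_destroy(void)
{
	MODEL_ASSERT_RTNL();
	actions_destroyed++;
}

/* Analogue of a filter's deferred destroy work: it takes the lock itself,
 * which is what makes the assertion above always hold. */
static void model_destroy_work(void)
{
	model_rtnl_lock();
	model_exts_destroy();
	model_rtnl_unlock();
}
```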
    • net_sched: introduce a workqueue for RCU callbacks of tc filter · 7aa0045d
      Committed by Cong Wang
      This patch introduces a dedicated workqueue for tc filters
      so that each tc filter's RCU callback can defer its action
      destroy work to this workqueue. The helper tcf_queue_work()
      is introduced for them to use.

      Because we hold the RTNL lock when calling tcf_block_put(), we
      cannot simply flush the works inside it. Therefore we have to
      defer it again to this workqueue and make sure all in-flight RCU
      callbacks have already queued their work before this one; in
      other words, we ensure this is the last one to execute, to
      prevent any use-after-free.
      
      On the other hand, this makes tcf_block_put() ugly and
      harder to understand. Since David and Eric strongly dislike
      adding synchronize_rcu(), this is probably the only
      solution that could make everyone happy.
      
      Please also see the code comments below.

      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
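The ordering argument (all in-flight RCU callbacks must queue their work before the block's final work) can be sketched with a FIFO queue. The sketch makes several assumptions:

- Grace periods are simulated by storing callbacks in an array and invoking them all in `model_rcu_barrier()`.
- Everything is single-threaded.
- `model_call_rcu()`, `model_queue_work()` and the other `model_*` names are hypothetical stand-ins for call_rcu(), tcf_queue_work() and rcu_barrier().

The callbacks only queue work because, in the kernel, RCU callbacks may run in softirq context and cannot take the RTNL mutex.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 16

typedef void (*model_fn)(int arg);

struct model_item {
	model_fn fn;
	int arg;
};

/* Callbacks waiting for a (simulated) grace period. */
static struct model_item pending_cbs[MAX_ITEMS];
static size_t n_pending;

/* FIFO work queue, drained in order. */
static struct model_item workq[MAX_ITEMS];
static size_t wq_head, wq_tail;

/* Record of the order in which works ran. */
static int run_log[MAX_ITEMS];
static size_t n_run;

static void model_queue_work(model_fn fn, int arg)
{
	assert(wq_tail < MAX_ITEMS);
	workq[wq_tail++] = (struct model_item){ fn, arg };
}

static void log_work(int arg)
{
	assert(n_run < MAX_ITEMS);
	run_log[n_run++] = arg;
}

/* A filter's RCU callback: it only queues the real destroy work. */
static void filter_rcu_cb(int id)
{
	model_queue_work(log_work, id);
}

static void model_call_rcu(model_fn fn, int arg)
{
	assert(n_pending < MAX_ITEMS);
	pending_cbs[n_pending++] = (struct model_item){ fn, arg };
}

/* Simulated rcu_barrier(): returns only after every pending callback
 * has been invoked. */
static void model_rcu_barrier(void)
{
	for (size_t i = 0; i < n_pending; i++)
		pending_cbs[i].fn(pending_cbs[i].arg);
	n_pending = 0;
}

static void model_drain_workqueue(void)
{
	while (wq_head < wq_tail) {
		struct model_item w = workq[wq_head++];
		w.fn(w.arg);
	}
}

/* Block teardown: wait for callbacks, then queue the final work last. */
static void model_block_put(int block_id)
{
	model_rcu_barrier();
	model_queue_work(log_work, block_id);
}
```

The barrier runs every pending callback before the final work is queued, and the queue is FIFO. So the block's work (id 100 in the test) is guaranteed to run after every filter's work.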
  4. 13 Sep, 2017: 2 commits
    • net_sched: carefully handle tcf_block_put() · 1697c4bb
      Committed by Cong Wang
      As pointed out by Jiri, there is still a race condition between
      tcf_block_put() and tcf_chain_destroy() in an RCU callback. There
      is no way to make it correct without proper locking or synchronization,
      because both operate on a shared list.

      Locking is hard, because the only lock we could pick here is a spinlock;
      however, tc_dump_tfilter() iterates this list while calling a sleeping
      function (tcf_chain_dump()), which makes using a lock to protect
      chain_list almost impossible.

      Jiri suggested holding a refcnt before flushing. This works because it
      guarantees that there can be no parallel tcf_chain_destroy() during the
      loop, so the race condition is gone. But we have to be very careful to
      synchronize properly with RCU callbacks.

      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
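The hold-before-flush idea can be modeled with plain counters. In this hypothetical sketch, each classifier on a chain owns one chain reference, and flushing drops those references. A chain whose refcnt reaches zero is marked freed, standing in for list removal plus kfree(). Without the first loop in `model_block_put()`, flushing a chain could free it in the middle of the iteration. With that loop, no chain can be freed until the final put loop. All `model_*` names are illustrative. The model is single-threaded, so it shows the invariant rather than the actual concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCHAINS 3

/* Hypothetical chain: a refcount plus a count of classifiers (tp) on it,
 * each of which holds one reference. */
struct model_chain {
	int refcnt;
	int nr_tp;
	bool freed;
};

static void model_chain_hold(struct model_chain *c)
{
	assert(!c->freed);
	c->refcnt++;
}

static void model_chain_put(struct model_chain *c)
{
	assert(!c->freed && c->refcnt > 0);
	if (--c->refcnt == 0)
		c->freed = true;   /* stands in for list removal + kfree() */
}

/* Flushing drops every tp, and with it each tp's chain reference. */
static void model_chain_flush(struct model_chain *c)
{
	while (c->nr_tp > 0) {
		c->nr_tp--;
		model_chain_put(c);
	}
}

/* Returns how many chain-freed observations were made inside the flush
 * loop; with the hold step in place this must be 0. */
static int model_block_put(struct model_chain *chains, size_t n)
{
	int freed_during_flush = 0;

	for (size_t i = 0; i < n; i++)   /* pin every chain first */
		model_chain_hold(&chains[i]);
	for (size_t i = 0; i < n; i++) {
		model_chain_flush(&chains[i]);
		for (size_t j = 0; j < n; j++)
			freed_during_flush += chains[j].freed;
	}
	for (size_t i = 0; i < n; i++)   /* drop the pins: now they go away */
		model_chain_put(&chains[i]);
	return freed_during_flush;
}
```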
    • net_sched: fix reference counting of tc filter chain · e2ef7544
      Committed by Cong Wang
      This patch fixes the following ugliness in the tc filter chain refcnt:

      a) A tp (tcf_proto) should hold a refcnt to its chain too. This
         significantly simplifies the logic.

      b) Chain 0 is no longer special: it is created with refcnt=1 like any
         other chain. All the ugliness in tcf_chain_put() is gone!

      c) No need to handle flushing specially: because the block still holds
         chain 0, the chain cannot be released, which guarantees that the
         block is the last user.

      d) The race condition with RCU callbacks is easier to handle with just
         an rcu_barrier(). Much easier to understand, nothing to hide. Thanks
         to the previous patch. Please also see the comments in the code.

      e) Make the code understandable by humans and much less error-prone.
      
      Fixes: 744a4cf6 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
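Point b) can be sketched as a uniform get/put pair in which chain 0 follows exactly the same rules as any other chain. This is a hypothetical user-space model without RCU or locking; `model_chain_create()`, `model_chain_get()` and `model_chain_put()` are not kernel functions. The block owns the initial reference of chain 0 and each tp takes one more. The memory is released only when the last reference goes away, which makes the block the last user, as point c) describes.

```c
#include <assert.h>
#include <stdlib.h>

struct model_chain {
	unsigned int index;
	unsigned int refcnt;
};

static int chains_freed;

/* Every chain, including chain 0, starts with refcnt = 1 owned by its creator. */
static struct model_chain *model_chain_create(unsigned int index)
{
	struct model_chain *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->index = index;
	c->refcnt = 1;
	return c;
}

static void model_chain_get(struct model_chain *c)
{
	c->refcnt++;
}

/* Uniform put: no special case for chain 0. */
static void model_chain_put(struct model_chain *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0) {
		free(c);
		chains_freed++;
	}
}
```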
  5. 08 Sep, 2017: 1 commit
  6. 26 Aug, 2017: 1 commit
    • net_sched: remove tc class reference counting · 143976ce
      Committed by WANG Cong
      For TC classes, their ->get() and ->put() are always paired, and the
      reference counting is completely useless, because:

      1) For the class modification and dumping paths, we already hold the
         RTNL lock, so each ->get(), ->change(), ->put() sequence is atomic.

      2) For filter binding/unbinding, we use a reference counter other than
         this one, and those paths should hold the RTNL lock too.

      3) ->qlen_notify() is special because it is called on the ->enqueue()
         path, but we already hold the qdisc tree lock there, and we also hold
         this tree lock when grafting or deleting the class, so the class
         cannot disappear or change until we release the tree lock.

      Therefore, this patch removes ->get() and ->put(), but:

      1) Adds a new ->find() to look up a class pointer by classid, without
         taking a refcnt.

      2) Moves the class destruction, previously done when the last refcnt
         was dropped, into ->delete(), right after releasing the tree lock.
         This is fine because the class has already been removed from the
         hash while the lock was held.

      Those who also used ->put() as ->unbind() simply have their functions
      renamed to reflect this change.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
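Below is a sketch of the ->find()/->delete() split described above, using a small chained hash keyed by classid. Everything here is hypothetical: `struct model_class`, the `model_*` functions, and the `tree_locked` flag. The flag only marks where the qdisc tree lock would be held in this single-threaded model. Lookup takes no reference. Deletion unlinks the class while the "lock" is held and frees it right after releasing it. This is safe because the class is no longer reachable from the hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_SIZE 8

/* Hypothetical class object in a classid-keyed hash with chaining. */
struct model_class {
	uint32_t classid;
	struct model_class *next;
};

static struct model_class *class_hash[HASH_SIZE];
static bool tree_locked;   /* marks where the qdisc tree lock would be held */

static unsigned int model_hash(uint32_t classid)
{
	return classid % HASH_SIZE;
}

static struct model_class *model_class_add(uint32_t classid)
{
	struct model_class *cl = calloc(1, sizeof(*cl));

	if (!cl)
		return NULL;
	cl->classid = classid;
	cl->next = class_hash[model_hash(classid)];
	class_hash[model_hash(classid)] = cl;
	return cl;
}

/* Analogue of the new ->find(): a plain lookup, no refcount taken. */
static struct model_class *model_class_find(uint32_t classid)
{
	for (struct model_class *cl = class_hash[model_hash(classid)]; cl; cl = cl->next)
		if (cl->classid == classid)
			return cl;
	return NULL;
}

/* Analogue of ->delete(): unlink while "holding" the tree lock, then
 * destroy the class right after releasing it. Returns -1 if not found. */
static int model_class_delete(uint32_t classid)
{
	struct model_class **pp = &class_hash[model_hash(classid)];
	struct model_class *victim = NULL;

	tree_locked = true;
	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->classid == classid) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	tree_locked = false;

	if (!victim)
		return -1;
	assert(!tree_locked);   /* destruction happens outside the lock */
	free(victim);           /* safe: no longer reachable from the hash */
	return 0;
}
```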
  7. 23 Aug, 2017: 2 commits
    • net: sched: don't do tcf_chain_flush from tcf_chain_destroy · 30d65e8f
      Committed by Jiri Pirko
      tcf_chain_flush needs to be called with RTNL. However, on
      free_tcf->
       tcf_action_goto_chain_fini->
        tcf_chain_put->
         tcf_chain_destroy->
          tcf_chain_flush
      callpath, it is called without RTNL.
      This issue was reported by the following warning:
      
      [  155.599052] WARNING: suspicious RCU usage
      [  155.603165] 4.13.0-rc5jiri+ #54 Not tainted
      [  155.607456] -----------------------------
      [  155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage!
      
      Since on this callpath the chain is guaranteed to be already empty
      by the check in tcf_chain_put, move the tcf_chain_flush call out and
      call it only where it is needed: in tcf_block_put.
      
      Fixes: db50514f ("net: sched: add termination action to allow goto chain")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
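The resulting division of labor can be modeled as follows. The put path may run without RTNL (for example from an action's free path), so it only destroys a chain that is already empty. Flushing happens solely in block teardown, under the lock. This is a single-threaded sketch with hypothetical names; `model_rtnl_held` is just a flag standing in for the RTNL lock.

```c
#include <assert.h>
#include <stdbool.h>

struct model_chain {
	int refcnt;
	int nr_filters;
	bool destroyed;
};

static bool model_rtnl_held;   /* stand-in for holding RTNL */

/* Only block teardown flushes, and it does so under the lock. */
static void model_chain_flush(struct model_chain *c)
{
	assert(model_rtnl_held);
	c->nr_filters = 0;
}

/* Destroy no longer flushes: callers guarantee the chain is empty. */
static void model_chain_destroy(struct model_chain *c)
{
	assert(c->nr_filters == 0);
	c->destroyed = true;
}

/* May run without the lock, so it must not flush; it only destroys a
 * chain that has no references left and is already empty. */
static void model_chain_put(struct model_chain *c)
{
	if (--c->refcnt == 0 && c->nr_filters == 0)
		model_chain_destroy(c);
}

static void model_block_put(struct model_chain *c)
{
	model_rtnl_held = true;
	model_chain_flush(c);
	model_chain_put(c);
	model_rtnl_held = false;
}
```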
    • net: sched: fix use after free when tcf_chain_destroy is called multiple times · 744a4cf6
      Committed by Jiri Pirko
      The goto_chain termination action takes a reference to a chain. In that
      case, there is an issue when block_put calls tcf_chain_destroy directly:
      the follow-up call of tcf_chain_put from the goto_chain action's free path
      works with memory that is already freed. This was caught by KASAN:
      
      [  220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50
      [  220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261
      [  220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54
      [  220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
      [  220.371784] Call Trace:
      [  220.374290]  <IRQ>
      [  220.376355]  dump_stack+0xd5/0x150
      [  220.391485]  print_address_description+0x86/0x410
      [  220.396308]  kasan_report+0x181/0x4c0
      [  220.415211]  tcf_chain_put+0x1b/0x50
      [  220.418949]  free_tcf+0x95/0xc0
      
      So allow tcf_chain_destroy to be called multiple times, and free the
      chain only when its reference count drops to 0.
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
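The fix makes destroy safe to repeat for the unlink step and defers the actual free to the last reference. A minimal single-threaded model uses hypothetical `model_chain_destroy()`/`model_chain_put()` names and a `linked` flag standing in for list membership:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_chain {
	unsigned int refcnt;
	bool linked;   /* still on the block's chain list */
};

static int chains_freed;

/* Safe to call while references remain: unlinking is idempotent, and
 * the memory is released only once nobody holds a reference. */
static void model_chain_destroy(struct model_chain *c)
{
	c->linked = false;
	if (c->refcnt == 0) {
		free(c);
		chains_freed++;
	}
}

/* Called, for example, when a goto_chain action is freed. */
static void model_chain_put(struct model_chain *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0)
		model_chain_destroy(c);
}
```

In the test, block teardown destroys the chain while a goto_chain action still holds it, so only the unlink happens. The later put from the action's free path drops the last reference and performs the free exactly once.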
  8. 19 Aug, 2017: 1 commit
  9. 10 Aug, 2017: 1 commit
  10. 09 Aug, 2017: 1 commit
  11. 08 Aug, 2017: 2 commits
  12. 05 Aug, 2017: 3 commits
  13. 26 May, 2017: 2 commits
  14. 23 May, 2017: 2 commits
  15. 18 May, 2017: 10 commits
  16. 22 Apr, 2017: 1 commit
    • net_sched: move the empty tp check from ->destroy() to ->delete() · 763dbf63
      Committed by WANG Cong
      We could have a race condition where the ->classify() path
      dereferences tp->root while a parallel ->destroy() sets it to
      NULL. Daniel cured this bug in commit d9363774
      ("net, sched: respect rcu grace period on cls destruction").

      This happens when ->destroy() is called while deleting a filter, to
      check whether it was the last one in tp; tp is still linked and
      visible at that time. The root cause of this problem is the semantics
      of ->destroy(): it does two things (for the non-force case):

      1) check whether tp is empty
      2) if tp is empty, actually destroy it

      and its caller, if it cares, needs to check the return value to see
      whether tp was really destroyed. Therefore we can't unlink tp unless
      we know it is empty.

      As suggested by Daniel, we can move the test logic to ->delete(),
      so that we can safely unlink tp after ->delete() tells us the last
      filter was just deleted, and before ->destroy() is called.
      
      Fixes: 1e052be6 ("net_sched: destroy proto tp when all filters are gone")
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
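Here is a sketch of the new contract. ->delete() reports whether it removed the last filter, and the caller unlinks tp before calling ->destroy(). A concurrent classifier can therefore no longer observe a tp that is being torn down. The out-parameter for the "last filter" result and all `model_*` names are assumptions of this single-threaded model. The real code must also respect RCU grace periods when unlinking, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classifier instance ("tp") holding some filters. */
struct model_tp {
	int nr_filters;
	bool linked;      /* visible to the classify path */
	bool destroyed;
};

/* Analogue of ->delete(): removes one filter and reports through *last
 * whether it was the final one, instead of leaving that test to ->destroy().
 * Returns -1 if there is nothing to delete. */
static int model_tp_delete(struct model_tp *tp, bool *last)
{
	if (tp->nr_filters == 0)
		return -1;
	tp->nr_filters--;
	*last = (tp->nr_filters == 0);
	return 0;
}

/* Analogue of ->destroy(): only ever called on a tp that is already unlinked. */
static void model_tp_destroy(struct model_tp *tp)
{
	assert(!tp->linked);
	tp->destroyed = true;
}

/* Caller side: unlink first, so the classify path can no longer reach tp,
 * then destroy it. */
static int model_filter_delete(struct model_tp *tp)
{
	bool last = false;
	int err = model_tp_delete(tp, &last);

	if (err)
		return err;
	if (last) {
		tp->linked = false;
		model_tp_destroy(tp);
	}
	return 0;
}
```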
  17. 18 Apr, 2017: 1 commit
  18. 14 Apr, 2017: 1 commit
  19. 15 Feb, 2017: 1 commit
  20. 11 Feb, 2017: 4 commits