  1. 28 Mar, 2018 · 1 commit
  2. 10 Mar, 2018 · 1 commit
  3. 28 Feb, 2018 · 1 commit
  4. 21 Feb, 2018 · 1 commit
    • net: sched: report if filter is too large to dump · 5ae437ad
      Roman Kapl committed
      So far, if the filter was too large to fit in the allocated skb, the
      kernel did not return any error and stopped dumping. Modify the dumper
      so that it returns -EMSGSIZE when a filter fails to dump and it is the
      first filter in the skb. If we are not first, we will get a next chance
      with more room.
      
      I understand this is pretty near to being an API change, but the
      original design (silent truncation) can be considered a bug.
      
      Note: The error case can happen pretty easily if you create a filter
      with 32 actions and have 4kb pages. Also recent versions of iproute try
      to be clever with their buffer allocation size, which in turn leads to
      Signed-off-by: Roman Kapl <code@rkapl.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 17 Feb, 2018 · 3 commits
  6. 14 Feb, 2018 · 1 commit
  7. 07 Feb, 2018 · 3 commits
  8. 25 Jan, 2018 · 1 commit
  9. 20 Jan, 2018 · 4 commits
  10. 18 Jan, 2018 · 8 commits
  11. 27 Dec, 2017 · 1 commit
  12. 22 Dec, 2017 · 2 commits
  13. 16 Dec, 2017 · 1 commit
  14. 14 Dec, 2017 · 1 commit
  15. 09 Dec, 2017 · 1 commit
    • net: sched: fix use-after-free in tcf_block_put_ext · df45bf84
      Jiri Pirko committed
      Since the block is freed when the last chain is put, once we reach the
      end of the list_for_each_entry_safe iteration, the block may
      already be freed. I'm hitting this just by creating and deleting clsact:
      
      [  202.171952] ==================================================================
      [  202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390
      [  202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796
      [  202.194508]
      [  202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5
      [  202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
      [  202.213613] Call Trace:
      [  202.216369]  dump_stack+0xda/0x169
      [  202.220192]  ? dma_virt_map_sg+0x147/0x147
      [  202.224790]  ? show_regs_print_info+0x54/0x54
      [  202.229691]  ? tcf_chain_destroy+0x1dc/0x250
      [  202.234494]  print_address_description+0x83/0x3d0
      [  202.239781]  ? tcf_block_put_ext+0x240/0x390
      [  202.244575]  kasan_report+0x1ba/0x460
      [  202.248707]  ? tcf_block_put_ext+0x240/0x390
      [  202.253518]  tcf_block_put_ext+0x240/0x390
      [  202.258117]  ? tcf_chain_flush+0x290/0x290
      [  202.262708]  ? qdisc_hash_del+0x82/0x1a0
      [  202.267111]  ? qdisc_hash_add+0x50/0x50
      [  202.271411]  ? __lock_is_held+0x5f/0x1a0
      [  202.275843]  clsact_destroy+0x3d/0x80 [sch_ingress]
      [  202.281323]  qdisc_destroy+0xcb/0x240
      [  202.285445]  qdisc_graft+0x216/0x7b0
      [  202.289497]  tc_get_qdisc+0x260/0x560
      
      Fix this by also holding the block via chain 0 and putting chain 0
      explicitly, outside the list_for_each_entry_safe loop, at the very
      end of tcf_block_put_ext.
      
      Fixes: efbf7897 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 06 Dec, 2017 · 1 commit
    • net_sched: get rid of rcu_barrier() in tcf_block_put_ext() · efbf7897
      Cong Wang committed
      Both Eric and Paolo noticed the rcu_barrier() we use in
      tcf_block_put_ext() could be a performance bottleneck when
      we have a lot of tc classes.
      
      Paolo provided the following to demonstrate the issue:
      
      tc qdisc add dev lo root htb
      for I in `seq 1 1000`; do
              tc class add dev lo parent 1: classid 1:$I htb rate 100kbit
              tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb
              for J in `seq 1 10`; do
                      tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J
              done
      done
      time tc qdisc del dev lo root
      
      real    0m54.764s
      user    0m0.023s
      sys     0m0.000s
      
      The rcu_barrier() there is to ensure we free the block after all chains
      are gone, that is, to queue tcf_block_put_final() at the tail of workqueue.
      We can achieve this ordering requirement by refcnt'ing tcf block instead,
      that is, the tcf block is freed only when the last chain in this block is
      gone. This also simplifies the code.
      
      Paolo reported after this patch we get:
      
      real    0m0.017s
      user    0m0.000s
      sys     0m0.017s

      Tested-by: Paolo Abeni <pabeni@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 25 Nov, 2017 · 1 commit
    • net: sched: crash on blocks with goto chain action · a60b3f51
      Roman Kapl committed
      tcf_block_put_ext has assumed that all filters (and thus their goto
      actions) are destroyed in RCU callback and thus can not race with our
      list iteration. However, that is not true during netns cleanup (see
      tcf_exts_get_net comment).
      
      Prevent the use-after-free by holding all chains (except chain 0, which is
      already held). foreach_safe is not enough in this case.
      
      To reproduce, run the following in a netns and then delete the ns:
          ip link add dtest type dummy
          tc qdisc add dev dtest ingress
          tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2
      
      Fixes: 822e86d9 ("net_sched: remove tcf_block_put_deferred()")
      Signed-off-by: Roman Kapl <code@rkapl.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 24 Nov, 2017 · 1 commit
    • net: sched: fix crash when deleting secondary chains · d7aa04a5
      Roman Kapl committed
      If you flush (delete) a filter chain other than chain 0 (such as when
      deleting the device), the kernel may run into a use-after-free. The
      chain refcount must not be decremented unless we are sure we are done
      with the chain.
      
      To reproduce the bug, run:
          ip link add dtest type dummy
          tc qdisc add dev dtest ingress
          tc filter add dev dtest chain 1  parent ffff: flower
          ip link del dtest
      
      Introduced in: commit f93e1cdc ("net/sched: fix filter flushing"),
      but unless you have KASAN or luck, you won't notice it until
      commit 0dadc117 ("cls_flower: use tcf_exts_get_net() before call_rcu()").
      
      Fixes: f93e1cdc ("net/sched: fix filter flushing")
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Roman Kapl <code@rkapl.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 09 Nov, 2017 · 1 commit
    • net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net() · e4b95c41
      Cong Wang committed
      Instead of holding a netns refcnt in tc actions, we can minimize
      the holding time by saving it in struct tcf_exts. This means
      we can hold the netns refcnt right before call_rcu() and
      release it after tcf_exts_destroy() is done.

      However, because we call tcf_proto_destroy() on the netns cleanup
      path too, and we obviously cannot take a reference on a netns whose
      refcnt is already zero, in that case we have to do the cleanup
      synchronously. This is fine for RCU too: the caller, cleanup_net(),
      already waits for a grace period.
      
      For other cases, refcnt is non-zero and we can safely grab it as
      normal and release it after we are done.
      
      This patch provides two new APIs for each filter to use:
      tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can
      use the following pattern:
      
      void __destroy_filter() {
        tcf_exts_destroy();
        tcf_exts_put_net();  // <== release netns refcnt
        kfree();
      }
      void some_work() {
        rtnl_lock();
        __destroy_filter();
        rtnl_unlock();
      }
      void some_rcu_callback() {
        tcf_queue_work(some_work);
      }
      
      if (tcf_exts_get_net())  // <== hold netns refcnt
        call_rcu(some_rcu_callback);
      else
        __destroy_filter();
      
      Cc: Lucas Bates <lucasb@mojatatu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 03 Nov, 2017 · 3 commits
  21. 02 Nov, 2017 · 2 commits
  22. 31 Oct, 2017 · 1 commit
    • net_sched: remove tcf_block_put_deferred() · 822e86d9
      Cong Wang committed
      In commit 7aa0045d ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
      I deferred tcf_chain_flush() to a workqueue; this causes a use-after-free
      because the qdisc is already destroyed after we queue this work.

      tcf_block_put_deferred() is no longer necessary now that we take RTNL
      for each tc filter destroy work; no one else can jump in at this point.
      Same for tcf_chain_hold(): we are fully serialized now.
      
      This also removes one level of indirection and therefore makes the code
      more readable. Note this brings back an rcu_barrier(); however, compared
      to the code prior to commit 7aa0045d, we still removed one
      rcu_barrier(). For net-next, we can consider refcounting the tcf block
      to avoid it.
      
      Fixes: 7aa0045d ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>