1. 26 Aug, 2017 1 commit
    • W
      net_sched: remove tc class reference counting · 143976ce
      WANG Cong committed
      For TC classes, their ->get() and ->put() are always paired, and the
      reference counting is completely useless, because:
      
      1) For class modification and dumping paths, we already hold RTNL lock,
         so all of these ->get(),->change(),->put() are atomic.
      
      2) For filter binding/unbinding, we use a different reference counter from
         this one, and they should hold the RTNL lock too.
      
      3) ->qlen_notify() is special because it is called on the ->enqueue()
         path, but we already hold the qdisc tree lock there, and we hold this
         tree lock when grafting or deleting the class too, so the class cannot
         go away or change until we release the tree lock.
      
      Therefore, this patch removes ->get() and ->put(), but:
      
      1) Adds a new ->find() to find the pointer to a class by classid, without
         taking a refcnt.
      
      2) Moves the original class destruction upon the last refcnt into ->delete(),
         right after releasing the tree lock. This is fine because the class is
         already removed from the hash while holding the lock.
      
      For those who also use ->put() as ->unbind(), just rename them to reflect
      this change.
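
A minimal userspace sketch of the lookup pattern described above, assuming a simplified class hash. The `toy_*` names and structures are hypothetical and are not the kernel's definitions; the point is only that the lookup returns a bare pointer without a reference count, and that deletion unlinks under the lock so the object can be released afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 16u

/* Hypothetical, simplified class object: note there is no refcnt field. */
struct toy_class {
    uint32_t classid;
    struct toy_class *next;   /* hash bucket chain */
};

struct toy_qdisc {
    struct toy_class *buckets[TOY_HASH_SIZE];
};

static unsigned int toy_hash(uint32_t classid)
{
    return classid % TOY_HASH_SIZE;
}

/* Link a class into the hash (the caller is assumed to hold the control lock). */
static void toy_class_insert(struct toy_qdisc *q, struct toy_class *cl)
{
    unsigned int h;

    assert(cl != NULL);
    h = toy_hash(cl->classid);
    cl->next = q->buckets[h];
    q->buckets[h] = cl;
}

/* ->find() analogue: plain lookup by classid, no reference is taken. */
static struct toy_class *toy_class_find(struct toy_qdisc *q, uint32_t classid)
{
    struct toy_class *cl;

    for (cl = q->buckets[toy_hash(classid)]; cl; cl = cl->next)
        if (cl->classid == classid)
            return cl;
    return NULL;
}

/*
 * ->delete() analogue: unlink the class from the hash and hand it back,
 * so the caller can release it after dropping the (assumed) tree lock.
 */
static struct toy_class *toy_class_unlink(struct toy_qdisc *q, uint32_t classid)
{
    struct toy_class **pp = &q->buckets[toy_hash(classid)];

    for (; *pp; pp = &(*pp)->next) {
        if ((*pp)->classid == classid) {
            struct toy_class *cl = *pp;

            *pp = cl->next;
            cl->next = NULL;
            return cl;
        }
    }
    return NULL;
}
```

Once `toy_class_unlink()` returns, later `toy_class_find()` calls can no longer see the class, which mirrors why destroying it after releasing the lock is described as safe.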
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Aug, 2017 2 commits
    • J
      net: sched: don't do tcf_chain_flush from tcf_chain_destroy · 30d65e8f
      Jiri Pirko committed
      tcf_chain_flush needs to be called with RTNL held. However, on the
      free_tcf->
       tcf_action_goto_chain_fini->
        tcf_chain_put->
         tcf_chain_destroy->
          tcf_chain_flush
      call path, it is called without RTNL.
      This issue was reported by the following warning:
      
      [  155.599052] WARNING: suspicious RCU usage
      [  155.603165] 4.13.0-rc5jiri+ #54 Not tainted
      [  155.607456] -----------------------------
      [  155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage!
      
      Since the chain is guaranteed to be empty on this call path by the
      check in tcf_chain_put, move the tcf_chain_flush call out and call it
      only where it is needed, in tcf_block_put.
      
      Fixes: db50514f ("net: sched: add termination action to allow goto chain")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: sched: fix use after free when tcf_chain_destroy is called multiple times · 744a4cf6
      Jiri Pirko committed
      The goto_chain termination action takes a reference to a chain. In that
      case, there is an issue when block_put calls tcf_chain_destroy
      directly. The follow-up call of tcf_chain_put by the goto_chain action free
      then works with memory that is already freed. This was caught by KASAN:
      
      [  220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50
      [  220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261
      [  220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54
      [  220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
      [  220.371784] Call Trace:
      [  220.374290]  <IRQ>
      [  220.376355]  dump_stack+0xd5/0x150
      [  220.391485]  print_address_description+0x86/0x410
      [  220.396308]  kasan_report+0x181/0x4c0
      [  220.415211]  tcf_chain_put+0x1b/0x50
      [  220.418949]  free_tcf+0x95/0xc0
      
      So allow tcf_chain_destroy to be called multiple times, and free the chain
      only when the reference count drops to 0.
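
The idea of a destroy routine that tolerates repeated calls can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: the `toy_chain` structure, its `linked` flag, and the boolean return values are all hypothetical and exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a filter chain. */
struct toy_chain {
    unsigned int refcnt;   /* references held by goto_chain-like users */
    bool linked;           /* still reachable from its block */
};

static struct toy_chain *toy_chain_create(void)
{
    struct toy_chain *c = calloc(1, sizeof(*c));

    if (c)
        c->linked = true;
    return c;
}

static void toy_chain_hold(struct toy_chain *c)
{
    c->refcnt++;
}

/*
 * Safe to call more than once: unlinking is idempotent, and the memory is
 * released only when nobody holds a reference. Returns true if freed.
 */
static bool toy_chain_destroy(struct toy_chain *c)
{
    c->linked = false;
    if (c->refcnt == 0) {
        free(c);
        return true;
    }
    return false;
}

/* Drop one reference; the last put ends up freeing the chain. */
static bool toy_chain_put(struct toy_chain *c)
{
    assert(c->refcnt > 0);
    if (--c->refcnt == 0)
        return toy_chain_destroy(c);
    return false;
}
```

With one outstanding reference, a direct `toy_chain_destroy()` only unlinks the chain; the later `toy_chain_put()` from the reference holder performs the free, so it never touches released memory.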
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 19 Aug, 2017 1 commit
  4. 10 Aug, 2017 1 commit
  5. 09 Aug, 2017 1 commit
  6. 08 Aug, 2017 2 commits
  7. 05 Aug, 2017 3 commits
  8. 26 May, 2017 2 commits
  9. 23 May, 2017 2 commits
  10. 18 May, 2017 10 commits
  11. 22 Apr, 2017 1 commit
    • W
      net_sched: move the empty tp check from ->destroy() to ->delete() · 763dbf63
      WANG Cong committed
      We could have a race condition where the ->classify() path
      dereferences tp->root while a parallel ->destroy() sets it
      to NULL. Daniel cured this bug in commit d9363774
      ("net, sched: respect rcu grace period on cls destruction").
      
      This happens when ->destroy() is called while deleting a filter to
      check whether it was the last one in tp; the tp is still linked and
      visible at that time. The root cause of this problem is the semantics
      of ->destroy(), which does two things (in the non-force case):
      
      1) check if tp is empty
      2) if tp is empty, really destroy it
      
      Its caller, if it cares, needs to check the return value to see whether
      tp was really destroyed. Therefore we can't unlink tp unless we know it
      is empty.
      
      As suggested by Daniel, we could actually move the test logic to ->delete()
      so that we can safely unlink tp after ->delete() tells us the last filter
      was just deleted and before calling ->destroy().
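
A rough sketch of the resulting control flow, assuming a toy filter container. The `toy_tp` structure, the handle array, and the `last` out-parameter are modelling choices made for this illustration, not a copy of the kernel interface. The delete step reports whether the container became empty, and only then does the caller unlink it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FILTERS 8

/* Hypothetical classifier instance holding a few filter handles. */
struct toy_tp {
    unsigned int handles[TOY_MAX_FILTERS];
    size_t nfilters;
    bool linked;   /* visible to the (imagined) classify path */
};

/*
 * ->delete() analogue: remove one filter and report through *last
 * whether that was the final filter in tp.
 */
static int toy_tp_delete(struct toy_tp *tp, unsigned int handle, bool *last)
{
    for (size_t i = 0; i < tp->nfilters; i++) {
        if (tp->handles[i] == handle) {
            tp->nfilters--;
            tp->handles[i] = tp->handles[tp->nfilters];
            *last = (tp->nfilters == 0);
            return 0;
        }
    }
    return -1;   /* no such filter */
}

/* Caller side: unlink tp only once delete says it is empty. */
static int toy_filter_del(struct toy_tp *tp, unsigned int handle)
{
    bool last = false;
    int err = toy_tp_delete(tp, handle, &last);

    if (err)
        return err;
    if (last) {
        /* Unlink first, so no new reader can find tp ... */
        tp->linked = false;
        /* ... a real implementation would then destroy tp safely. */
    }
    return 0;
}
```

Because the emptiness check happens inside the delete step, the destroy step no longer needs to double as a test, and the container is never torn down while still linked.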
      
      Fixes: 1e052be6 ("net_sched: destroy proto tp when all filters are gone")
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 18 Apr, 2017 1 commit
  13. 14 Apr, 2017 1 commit
  14. 15 Feb, 2017 1 commit
  15. 11 Feb, 2017 6 commits
  16. 27 Dec, 2016 1 commit
    • D
      net, sched: fix soft lockup in tc_classify · 628185cf
      Daniel Borkmann committed
      Shahar reported a soft lockup in tc_classify(), where we run into an
      endless loop when walking the classifier chain due to tp->next == tp
      which is a state we should never run into. The issue only seems to
      trigger under load in the tc control path.
      
      What happens is that in tc_ctl_tfilter(), thread A allocates a new
      tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
      with it. In that classifier callback we had to unlock/lock the rtnl
      mutex and returned with -EAGAIN. One reason why we need to drop there
      is, for example, that we need to request an action module to be loaded.
      
      This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
      after we loaded and found the requested action, we need to redo the
      whole request so we don't race against others. While we had to unlock
      rtnl in that time, thread B's request was processed next on that CPU.
      Thread B added a new tp instance successfully to the classifier chain.
      When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
      and destroying its tp instance which never got linked, we goto replay
      and redo A's request.
      
      This time when walking the classifier chain in tc_ctl_tfilter() for
      checking for existing tp instances we had a priority match and found
      the tp instance that was created and linked by thread B. Now calling
      again into tp->ops->change() with that tp was successful and returned
      without error.
      
      tp_created was never cleared in the second round, thus the kernel thinks
      that we need to link it into the classifier chain (once again). tp and
      *back point to the same object due to the match we had earlier on. Thus
      for thread B's already public tp, we reset tp->next to tp itself and
      link it into the chain, which eventually causes the mentioned endless
      loop in tc_classify() once a packet hits the data path.
      
      The fix is to clear tp_created at the beginning of each request, also when
      we replay it. On the paths that can cause -EAGAIN we already destroy
      the original tp instance we had and on replay we really need to start
      from scratch. It seems that this issue was first introduced in commit
      12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
      and avoid kernel panic when we use cls_cgroup").
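
The replay hazard described above can be modelled in a few lines of userspace C. Everything here is a simplification made for illustration: `toy_ctl_filter()` and its parameters are hypothetical, the "other thread" is simulated inline, and the first pass is hard-coded to return -EAGAIN. The sketch shows that resetting the flag at the replay label keeps an already linked instance from being linked a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classifier instance on a singly linked chain. */
struct toy_tp {
    struct toy_tp *next;
};

/*
 * Control-path sketch. 'fresh' is the instance this request would create;
 * 'other' is an instance that a concurrent request links while the lock
 * is (notionally) dropped during the first pass.
 */
static int toy_ctl_filter(struct toy_tp **chain, struct toy_tp *fresh,
                          struct toy_tp *other)
{
    struct toy_tp *tp;
    bool tp_created;
    int attempt = 0;
    int err;

replay:
    tp_created = false;   /* reset on every pass, including replays */
    attempt++;

    tp = *chain;          /* look for an existing instance */
    if (!tp) {
        tp = fresh;
        tp->next = NULL;
        tp_created = true;
    }

    if (attempt == 1) {
        /* Simulated lock drop: another request links its own tp. */
        other->next = *chain;
        *chain = other;
        err = -EAGAIN;
    } else {
        err = 0;
    }

    if (err == -EAGAIN)
        goto replay;      /* the unlinked fresh tp is simply discarded */

    if (tp_created) {     /* link only what this pass actually created */
        tp->next = *chain;
        *chain = tp;
    }
    return err;
}

/* Bounded walk, so a self-loop shows up as hitting the limit. */
static size_t toy_chain_len(const struct toy_tp *head, size_t limit)
{
    size_t n = 0;

    while (head && n < limit) {
        n++;
        head = head->next;
    }
    return n;
}
```

If the reset at `replay:` were moved above the label, the second pass would still see a stale `tp_created`, set `tp->next` to `tp` itself, and `toy_chain_len()` would run to its limit, which is the toy equivalent of the endless loop in the classify path.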
      
      Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
      Reported-by: Shahar Klein <shahark@mellanox.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Shahar Klein <shahark@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 03 Dec, 2016 1 commit
    • H
      net/sched: cls_flower: Add offload support using egress Hardware device · 7091d8c7
      Hadar Hen Zion committed
      In order to support hardware offloading when the device given by the tc
      rule is different from the underlying hardware device, extract the mirred
      (egress) device from the tc action when a filter is added, using the new
      tc_action_ops callback, get_dev().
      
      Flower caches the information about the mirred device and uses it for
      calling ndo_setup_tc in filter change, update stats and delete.
      
      Calling ndo_setup_tc of the mirred (egress) device instead of the
      ingress device will allow a resolution between the software ingress
      device and the underlying hardware device.
      
      The resolution will take place inside the offloading driver using the
      'egress_device' flag added to the tc_to_netdev struct, which is provided
      to the offloading driver.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 25 Nov, 2016 1 commit
  19. 18 Nov, 2016 1 commit
  20. 28 Oct, 2016 1 commit
    • J
      net sched filters: fix notification of filter delete with proper handle · 9ee78374
      Jamal Hadi Salim committed
      Daniel says:
      
      While trying out [1][2], I noticed that tc monitor doesn't show the
      correct handle on delete:
      
      $ tc monitor
      qdisc clsact ffff: dev eno1 parent ffff:fff1
      filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
      deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80
      
      Some context to explain the above:
      The user identity of any tc filter is represented by a 32-bit
      identifier encoded in tcm->tcm_handle, for example 0x2a in the bpf filter
      above. A user wishing to delete, get or even modify a specific filter
      uses this handle to reference it.
      Every classifier is free to provide its own semantics for the 32-bit handle.
      For example, classifiers like u32 use schemes like 800:1:801 to describe
      the semantics of their filters, represented as hash table, bucket and
      node ids etc.
      Classifiers also have internal per-filter representation which is different
      from this externally visible identity. Most classifiers set this
      internal representation to be a pointer address (which allows fast retrieval
      of said filters in their implementations). This internal representation
      is referenced with the "fh" variable in the kernel control code.
      
      When a user successfully deletes a specific filter, by specifying the correct
      tcm->tcm_handle, an event is generated to user space which indicates
      which specific filter was deleted.
      
      Before this patch, the "fh" value was sent to user space as the identity.
      As an example, what is shown in the sample bpf filter delete event above
      is 0xf3be0c80. This is in fact a 32-bit truncation of 0xffff8807f3be0c80,
      which happens to be the 64-bit memory address of the internal filter
      representation (the address of the corresponding filter's struct cls_bpf_prog).
      
      After this patch, the appropriate user-identifiable handle, as encoded
      in the originating request's tcm->tcm_handle, is generated in the event.
      One of the cardinal rules of netlink is to be able to take an
      event (such as a delete in this case) and reflect it back to the
      kernel and successfully delete the filter. This patch achieves that.
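
The truncation mentioned above is plain integer narrowing and can be checked with a short C sketch. The helper names and the `toy_filter` structure are hypothetical; only the two constants and the 0x2a handle come from the message itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical filter object carrying its user-visible handle. */
struct toy_filter {
    uint32_t handle;
};

/* What effectively happened before: a 64-bit value squeezed into 32 bits. */
static uint32_t toy_truncate_fh(uint64_t fh)
{
    return (uint32_t)fh;   /* keeps only the low 32 bits */
}

/* What the event should carry instead: the handle the user supplied. */
static uint32_t toy_notify_handle(const struct toy_filter *f)
{
    return f->handle;
}
```

Feeding 0xffff8807f3be0c80 to `toy_truncate_fh()` yields 0xf3be0c80, the meaningless value seen in the monitor output, whereas `toy_notify_handle()` on a filter created with handle 0x2a reports 0x2a, which a user could pass back in a delete request.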
      
      Note, this issue has existed since the original TC action
      infrastructure code patch back in 2004 as found in:
      https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/
      
      [1] http://patchwork.ozlabs.org/patch/682828/
      [2] http://patchwork.ozlabs.org/patch/682829/
      
      Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>