1. 01 9月, 2017 1 次提交
    • C
      net_sched: add reverse binding for tc class · 07d79fc7
      Cong Wang 提交于
      TC filters when used as classifiers are bound to TC classes.
      However, there is a hidden difference when adding them in different
      orders:
      
      1. If we add tc classes before its filters, everything is fine.
         Logically, the classes exist before we specify their ID's in
         filters, it is easy to bind them together, just as in the current
         code base.
      
      2. If we add tc filters before the tc classes they bind, we have to
         do dynamic lookup in fast path. What's worse, this happens all
         the time not just once, because on fast path tcf_result is passed
         on stack, there is no way to propagate back to the one in tc filters.
      
      This hidden difference hurts performance silently if we have many tc
      classes in hierarchy.
      
      This patch intends to close this gap by doing the reverse binding when
      we create a new class, in this case we can actually search all the
      filters in its parent, match and fixup by classid. And because
      tcf_result is specific to each type of tc filter, we have to introduce
      a new ops for each filter to tell how to bind the class.
      
      Note, we still can NOT totally get rid of those class lookup in
      ->enqueue() because cgroup and flow filters have no way to determine
      the classid at setup time, they still have to go through dynamic lookup.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07d79fc7
  2. 31 8月, 2017 6 次提交
  3. 30 8月, 2017 10 次提交
  4. 29 8月, 2017 20 次提交
  5. 26 8月, 2017 3 次提交
    • E
      tcp: fix hang in tcp_sendpage_locked() · bd9dfc54
      Eric Dumazet 提交于
      syszkaller got a hang in tcp stack, related to a bug in
      tcp_sendpage_locked()
      
      root@syzkaller:~# cat /proc/3059/stack
      [<ffffffff83de926c>] __lock_sock+0x1dc/0x2f0
      [<ffffffff83de9473>] lock_sock_nested+0xf3/0x110
      [<ffffffff8408ce01>] tcp_sendmsg+0x21/0x50
      [<ffffffff84163b6f>] inet_sendmsg+0x11f/0x5e0
      [<ffffffff83dd8eea>] sock_sendmsg+0xca/0x110
      [<ffffffff83dd9547>] kernel_sendmsg+0x47/0x60
      [<ffffffff83de35dc>] sock_no_sendpage+0x1cc/0x280
      [<ffffffff8408916b>] tcp_sendpage_locked+0x10b/0x160
      [<ffffffff84089203>] tcp_sendpage+0x43/0x60
      [<ffffffff841641da>] inet_sendpage+0x1aa/0x660
      [<ffffffff83dd4fcd>] kernel_sendpage+0x8d/0xe0
      [<ffffffff83dd50ac>] sock_sendpage+0x8c/0xc0
      [<ffffffff81b63300>] pipe_to_sendpage+0x290/0x3b0
      [<ffffffff81b67243>] __splice_from_pipe+0x343/0x750
      [<ffffffff81b6a459>] splice_from_pipe+0x1e9/0x330
      [<ffffffff81b6a5e0>] generic_splice_sendpage+0x40/0x50
      [<ffffffff81b6b1d7>] SyS_splice+0x7b7/0x1610
      [<ffffffff84d77a01>] entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 306b13eb ("proto_ops: Add locked held versions of sendmsg and sendpage")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd9dfc54
    • W
      net_sched: kill u32_node pointer in Qdisc · 3cd904ec
      WANG Cong 提交于
      It is ugly to hide a u32-filter-specific pointer inside Qdisc,
      this breaks the TC layers:
      
      1. Qdisc is a generic representation, should not have any specific
         data of any type
      
      2. Qdisc layer is above filter layer, should only save filters in
         the list of struct tcf_proto.
      
      This pointer is used as the head of the chain of u32 hash tables,
      that is struct tc_u_hnode, because u32 filter is very special,
      it allows to create multiple hash tables within one qdisc and
      across multiple u32 filters.
      
      Instead of using this ugly pointer, we can just save it in a global
      hash table key'ed by (dev ifindex, qdisc handle), therefore we can
      still treat it as a per qdisc basis data structure conceptually.
      
      Of course, because of network namespaces, this key is not unique
      at all, but it is fine as we already have a pointer to Qdisc in
      struct tc_u_common, we can just compare the pointers when collision.
      
      And this only affects slow paths, has no impact to fast path,
      thanks to the pointer ->tp_c.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3cd904ec
    • W
      net_sched: remove tc class reference counting · 143976ce
      WANG Cong 提交于
      For TC classes, their ->get() and ->put() are always paired, and the
      reference counting is completely useless, because:
      
      1) For class modification and dumping paths, we already hold RTNL lock,
         so all of these ->get(),->change(),->put() are atomic.
      
      2) For filter bindiing/unbinding, we use other reference counter than
         this one, and they should have RTNL lock too.
      
      3) For ->qlen_notify(), it is special because it is called on ->enqueue()
         path, but we already hold qdisc tree lock there, and we hold this
         tree lock when graft or delete the class too, so it should not be gone
         or changed until we release the tree lock.
      
      Therefore, this patch removes ->get() and ->put(), but:
      
      1) Adds a new ->find() to find the pointer to a class by classid, no
         refcnt.
      
      2) Move the original class destroy upon the last refcnt into ->delete(),
         right after releasing tree lock. This is fine because the class is
         already removed from hash when holding the lock.
      
      For those who also use ->put() as ->unbind(), just rename them to reflect
      this change.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      143976ce