1. 08 10月, 2018 1 次提交
    • A
      net: sched: cls_u32: fix hnode refcounting · 6d4c4077
      Al Viro 提交于
      cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references
      via ->hlist and via ->tp_root together.  u32_destroy() drops the former
      and, in case when there had been links, leaves the sucker on the list.
      As the result, there's nothing to protect it from getting freed once links
      are dropped.
      That also makes the "is it busy" check incapable of catching the root
      hnode - it *is* busy (there's a reference from tp), but we don't see it as
      something separate.  "Is it our root?" check partially covers that, but
      the problem exists for others' roots as well.
      
      AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
      include oopsen, that is) would be this:
              * count tp->root and tp_c->hlist as separate references.  I.e.
      have u32_init() set refcount to 2, not 1.
      	* in u32_destroy() we always drop the former;
      in u32_destroy_hnode() - the latter.
      
      	That way we have *all* references contributing to refcount.  List
      removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
      an in u32_destroy() in case of tc_u_common going away, along with
      everything reachable from it.  IOW, that way we know that
      u32_destroy_key() won't free something still on the list (or pointed to by
      someone's ->root).
      
      Reproducer:
      
      tc qdisc add dev eth0 ingress
      tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: \
      u32 divisor 1
      tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: \
      u32 divisor 1
      tc filter add dev eth0 parent ffff: protocol ip prio 100 \
      handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 \
      plus 0 eat match ip protocol 6 ff
      tc filter delete dev eth0 parent ffff: protocol ip prio 200
      tc filter change dev eth0 parent ffff: protocol ip prio 100 \
      handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 \
      eat match ip protocol 6 ff
      tc filter delete dev eth0 parent ffff: protocol ip prio 100
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d4c4077
  2. 27 8月, 2018 1 次提交
  3. 26 6月, 2018 1 次提交
  4. 25 5月, 2018 1 次提交
  5. 08 4月, 2018 1 次提交
  6. 14 2月, 2018 1 次提交
  7. 09 2月, 2018 1 次提交
    • I
      net/sched: cls_u32: fix cls_u32 on filter replace · eb53f7af
      Ivan Vecera 提交于
      The following sequence is currently broken:
      
       # tc qdisc add dev foo ingress
       # tc filter replace dev foo protocol all ingress \
         u32 match u8 0 0 action mirred egress mirror dev bar1
       # tc filter replace dev foo protocol all ingress \
         handle 800::800 pref 49152 \
         u32 match u8 0 0 action mirred egress mirror dev bar2
       Error: cls_u32: Key node flags do not match passed flags.
       We have an error talking to the kernel, -1
      
      The error comes from u32_change() when comparing new and
      existing flags. The existing ones always contains one of
      TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state.
      These flags cannot be passed from userspace so the condition
      (n->flags != flags) in u32_change() always fails.
      
      Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
      TCA_CLS_FLAGS_IN_HW are not taken into account.
      
      Fixes: 24d3dc6d ("net/sched: cls_u32: Reflect HW offload status")
      Signed-off-by: NIvan Vecera <ivecera@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb53f7af
  8. 07 2月, 2018 5 次提交
    • M
      cls_u32: Convert to idr_alloc_u32 · f730cb93
      Matthew Wilcox 提交于
      No real benefit to this classifier, but since we're allocating a u32
      anyway, we should use this function.
      Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      f730cb93
    • M
      cls_u32: Reinstate cyclic allocation · ffdc2d9e
      Matthew Wilcox 提交于
      Commit e7614370 ("net_sched: use idr to allocate u32 filter handles)
      converted htid allocation to use the IDR.  The ID allocated by this
      scheme changes; it used to be cyclic, but now always allocates the
      lowest available.  The IDR supports cyclic allocation, so just use
      the right function.
      Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      ffdc2d9e
    • M
      idr: Delete idr_replace_ext function · 234a4624
      Matthew Wilcox 提交于
      Changing idr_replace's 'id' argument to 'unsigned long' works for all
      callers.  Callers which passed a negative ID now get -ENOENT instead of
      -EINVAL.  No callers relied on this error value.
      Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      234a4624
    • M
      idr: Delete idr_remove_ext function · 9c160941
      Matthew Wilcox 提交于
      Simply changing idr_remove's 'id' argument to 'unsigned long' suffices
      for all callers.
      Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      9c160941
    • P
      cls_u32: fix use after free in u32_destroy_key() · d7cdee5e
      Paolo Abeni 提交于
      Li Shuang reported an Oops with cls_u32 due to an use-after-free
      in u32_destroy_key(). The use-after-free can be triggered with:
      
      dev=lo
      tc qdisc add dev $dev root handle 1: htb default 10
      tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256
      tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\
       10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1:
      tc qdisc del dev $dev root
      
      Which causes the following kasan splat:
      
       ==================================================================
       BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
       Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571
      
       CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
       Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32]
       Call Trace:
        dump_stack+0xd6/0x182
        ? dma_virt_map_sg+0x22e/0x22e
        print_address_description+0x73/0x290
        kasan_report+0x277/0x360
        ? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
        u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
        u32_delete_key_freepf_work+0x1c/0x30 [cls_u32]
        process_one_work+0xae0/0x1c80
        ? sched_clock+0x5/0x10
        ? pwq_dec_nr_in_flight+0x3c0/0x3c0
        ? _raw_spin_unlock_irq+0x29/0x40
        ? trace_hardirqs_on_caller+0x381/0x570
        ? _raw_spin_unlock_irq+0x29/0x40
        ? finish_task_switch+0x1e5/0x760
        ? finish_task_switch+0x208/0x760
        ? preempt_notifier_dec+0x20/0x20
        ? __schedule+0x839/0x1ee0
        ? check_noncircular+0x20/0x20
        ? firmware_map_remove+0x73/0x73
        ? find_held_lock+0x39/0x1c0
        ? worker_thread+0x434/0x1820
        ? lock_contended+0xee0/0xee0
        ? lock_release+0x1100/0x1100
        ? init_rescuer.part.16+0x150/0x150
        ? retint_kernel+0x10/0x10
        worker_thread+0x216/0x1820
        ? process_one_work+0x1c80/0x1c80
        ? lock_acquire+0x1a5/0x540
        ? lock_downgrade+0x6b0/0x6b0
        ? sched_clock+0x5/0x10
        ? lock_release+0x1100/0x1100
        ? compat_start_thread+0x80/0x80
        ? do_raw_spin_trylock+0x190/0x190
        ? _raw_spin_unlock_irq+0x29/0x40
        ? trace_hardirqs_on_caller+0x381/0x570
        ? _raw_spin_unlock_irq+0x29/0x40
        ? finish_task_switch+0x1e5/0x760
        ? finish_task_switch+0x208/0x760
        ? preempt_notifier_dec+0x20/0x20
        ? __schedule+0x839/0x1ee0
        ? kmem_cache_alloc_trace+0x143/0x320
        ? firmware_map_remove+0x73/0x73
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1c0
        ? schedule+0xf3/0x3b0
        ? lock_downgrade+0x6b0/0x6b0
        ? __schedule+0x1ee0/0x1ee0
        ? do_wait_intr_irq+0x340/0x340
        ? do_raw_spin_trylock+0x190/0x190
        ? _raw_spin_unlock_irqrestore+0x32/0x60
        ? process_one_work+0x1c80/0x1c80
        ? process_one_work+0x1c80/0x1c80
        kthread+0x312/0x3d0
        ? kthread_create_worker_on_cpu+0xc0/0xc0
        ret_from_fork+0x3a/0x50
      
       Allocated by task 1688:
        kasan_kmalloc+0xa0/0xd0
        __kmalloc+0x162/0x380
        u32_change+0x1220/0x3c9e [cls_u32]
        tc_ctl_tfilter+0x1ba6/0x2f80
        rtnetlink_rcv_msg+0x4f0/0x9d0
        netlink_rcv_skb+0x124/0x320
        netlink_unicast+0x430/0x600
        netlink_sendmsg+0x8fa/0xd60
        sock_sendmsg+0xb1/0xe0
        ___sys_sendmsg+0x678/0x980
        __sys_sendmsg+0xc4/0x210
        do_syscall_64+0x232/0x7f0
        return_from_SYSCALL_64+0x0/0x75
      
       Freed by task 112:
        kasan_slab_free+0x71/0xc0
        kfree+0x114/0x320
        rcu_process_callbacks+0xc3f/0x1600
        __do_softirq+0x2bf/0xc06
      
       The buggy address belongs to the object at ffff881b83dae600
        which belongs to the cache kmalloc-4096 of size 4096
       The buggy address is located 24 bytes inside of
        4096-byte region [ffff881b83dae600, ffff881b83daf600)
       The buggy address belongs to the page:
       page:ffffea006e0f6a00 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
       flags: 0x17ffffc0008100(slab|head)
       raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007
       raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                   ^
        ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      
      The problem is that the htnode is freed before the linked knodes and the
      latter will try to access the first at u32_destroy_key() time.
      This change addresses the issue using the htnode refcnt to guarantee
      the correct free order. While at it also add a RCU annotation,
      to keep sparse happy.
      
      v1 -> v2: use rtnl_derefence() instead of RCU read locks
      v2 -> v3:
        - don't check refcnt in u32_destroy_hnode()
        - cleaned-up u32_destroy() implementation
        - cleaned-up code comment
      v3 -> v4:
        - dropped unneeded comment
      Reported-by: NLi Shuang <shuali@redhat.com>
      Fixes: c0d378ef ("net_sched: use tcf_queue_work() in u32 filter")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7cdee5e
  9. 03 2月, 2018 1 次提交
  10. 25 1月, 2018 4 次提交
  11. 23 1月, 2018 2 次提交
  12. 20 1月, 2018 6 次提交
  13. 18 1月, 2018 1 次提交
  14. 14 12月, 2017 1 次提交
  15. 09 11月, 2017 1 次提交
  16. 04 11月, 2017 1 次提交
  17. 29 10月, 2017 1 次提交
  18. 21 10月, 2017 3 次提交
  19. 20 10月, 2017 1 次提交
  20. 17 10月, 2017 1 次提交
  21. 29 9月, 2017 1 次提交
  22. 01 9月, 2017 1 次提交
    • C
      net_sched: add reverse binding for tc class · 07d79fc7
      Cong Wang 提交于
      TC filters when used as classifiers are bound to TC classes.
      However, there is a hidden difference when adding them in different
      orders:
      
      1. If we add tc classes before its filters, everything is fine.
         Logically, the classes exist before we specify their ID's in
         filters, it is easy to bind them together, just as in the current
         code base.
      
      2. If we add tc filters before the tc classes they bind, we have to
         do dynamic lookup in fast path. What's worse, this happens all
         the time not just once, because on fast path tcf_result is passed
         on stack, there is no way to propagate back to the one in tc filters.
      
      This hidden difference hurts performance silently if we have many tc
      classes in hierarchy.
      
      This patch intends to close this gap by doing the reverse binding when
      we create a new class, in this case we can actually search all the
      filters in its parent, match and fixup by classid. And because
      tcf_result is specific to each type of tc filter, we have to introduce
      a new ops for each filter to tell how to bind the class.
      
      Note, we still can NOT totally get rid of those class lookup in
      ->enqueue() because cgroup and flow filters have no way to determine
      the classid at setup time, they still have to go through dynamic lookup.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07d79fc7
  23. 26 8月, 2017 1 次提交
    • W
      net_sched: kill u32_node pointer in Qdisc · 3cd904ec
      WANG Cong 提交于
      It is ugly to hide a u32-filter-specific pointer inside Qdisc,
      this breaks the TC layers:
      
      1. Qdisc is a generic representation, should not have any specific
         data of any type
      
      2. Qdisc layer is above filter layer, should only save filters in
         the list of struct tcf_proto.
      
      This pointer is used as the head of the chain of u32 hash tables,
      that is struct tc_u_hnode, because u32 filter is very special,
      it allows to create multiple hash tables within one qdisc and
      across multiple u32 filters.
      
      Instead of using this ugly pointer, we can just save it in a global
      hash table key'ed by (dev ifindex, qdisc handle), therefore we can
      still treat it as a per qdisc basis data structure conceptually.
      
      Of course, because of network namespaces, this key is not unique
      at all, but it is fine as we already have a pointer to Qdisc in
      struct tc_u_common, we can just compare the pointers when collision.
      
      And this only affects slow paths, has no impact to fast path,
      thanks to the pointer ->tp_c.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3cd904ec
  24. 12 8月, 2017 1 次提交
  25. 08 8月, 2017 1 次提交