1. 22 Mar, 2019 1 commit
  2. 07 Mar, 2019 1 commit
    • net: sched: flower: insert new filter to idr after setting its mask · ecb3dea4
      Committed by Vlad Buslov
      When adding new filter to flower classifier, fl_change() inserts it to
      handle_idr before initializing filter extensions and assigning it a mask.
      Normally this ordering doesn't matter because all flower classifier ops
      callbacks assume rtnl lock protection. However, when filter has an action
      that doesn't have its kernel module loaded, rtnl lock is released before
      call to request_module(). During this time the filter can be accessed by a
      concurrent task before its initialization is completed, which can lead to a
      crash.
      
      Example case of NULL pointer dereference in concurrent dump:
      
      Task 1                           Task 2
      
      tc_new_tfilter()
       fl_change()
        idr_alloc_u32(fnew)
        fl_set_parms()
         tcf_exts_validate()
          tcf_action_init()
           tcf_action_init_1()
            rtnl_unlock()
            request_module()
            ...                        rtnl_lock()
                                       tc_dump_tfilter()
                                        tcf_chain_dump()
                                         fl_walk()
                                          idr_get_next_ul()
                                          tcf_node_dump()
                                           tcf_fill_node()
                                            fl_dump()
                                             mask = &f->mask->key; <- NULL ptr
            rtnl_lock()
      
      Extension initialization and mask assignment don't depend on fnew->handle
      that is allocated by idr_alloc_u32(). Move idr allocation code after action
      creation and mask assignment in fl_change() to prevent concurrent access
      to not fully initialized filter when rtnl lock is released to load action
      module.
      
      Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
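
The ordering problem described in this commit can be illustrated outside the kernel. What follows is a minimal Python sketch, not the kernel code: `Filter`, `add_filter`, `dump`, and the `while_unlocked` hook are hypothetical names, and a plain dict stands in for handle_idr. The hook models the window in which the rtnl lock is dropped during filter setup.

```python
class Filter:
    """Hypothetical stand-in for a flower filter; the mask is assigned late."""

    def __init__(self):
        self.handle = None
        self.mask = None


def dump(registry):
    """Model of a concurrent dump: dereference the mask of every visible filter."""
    return [f.mask["key"] for f in registry.values()]


def add_filter(registry, handle, publish_first, while_unlocked):
    """Create a filter; while_unlocked() models the window with the lock dropped."""
    f = Filter()
    if publish_first:
        # Old ordering: the filter becomes visible before it has a mask.
        f.handle = handle
        registry[handle] = f
    while_unlocked()  # another task may run here
    f.mask = {"key": "dst_port"}
    if not publish_first:
        # Fixed ordering: publish only after initialization is complete.
        f.handle = handle
        registry[handle] = f
    return f
```

With `publish_first=True`, a `dump` call made from the hook raises `TypeError`, which is the Python analogue of the NULL pointer dereference in the trace. With `publish_first=False`, the hook sees an empty registry and the filter only becomes visible once its mask is set.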
  3. 23 Feb, 2019 1 commit
  4. 14 Feb, 2019 1 commit
  5. 13 Feb, 2019 2 commits
  6. 07 Feb, 2019 5 commits
  7. 05 Feb, 2019 1 commit
  8. 18 Jan, 2019 1 commit
  9. 20 Dec, 2018 1 commit
  10. 15 Dec, 2018 1 commit
  11. 10 Dec, 2018 1 commit
  12. 16 Nov, 2018 1 commit
    • net: sched: cls_flower: Classify packets using port ranges · 5c72299f
      Committed by Amritha Nambiar
      Added support in tc flower for filtering based on port ranges.
      
      Example:
      1. Match on a port range:
      -------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
        action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        ip_proto tcp
        dst_port range 20-30
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 85 sec used 3 sec
              Action statistics:
              Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
      
      2. Match on IP address and port range:
      --------------------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
        skip_hw action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0 handle 0x2
        eth_type ipv4
        ip_proto tcp
        dst_ip 192.168.1.1
        dst_port range 100-200
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 2 ref 1 bind 1 installed 58 sec used 2 sec
              Action statistics:
              Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
      
      v4:
      1. Added condition before setting port key.
      2. Organized setting and dumping port range keys into functions
         and added validation of input range.
      
      v3:
      1. Moved new fields in UAPI enum to the end of enum.
      2. Removed couple of empty lines.
      
      v2:
      Addressed Jiri's comments:
      1. Added separate functions for dst and src comparisons.
      2. Removed endpoint enum.
      3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
        lookup.
      4. Cleaned up fl_lookup function.
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
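
The range semantics used in the examples above (`dst_port range 20-30`) can be sketched in a few lines of Python. This is an illustration only, not the kernel or iproute2 implementation: `parse_port_range` and `port_in_range` are hypothetical helpers, and both the inclusive endpoints and the validation rule (MIN strictly below MAX, both valid ports) are assumptions, since the commit message only says that input validation was added.

```python
def parse_port_range(text):
    """Parse 'MIN-MAX' into a (min, max) tuple of ints."""
    lo_text, sep, hi_text = text.partition("-")
    if not sep:
        raise ValueError("expected MIN-MAX")
    lo, hi = int(lo_text), int(hi_text)
    # Assumed validation: both ends are valid ports and MIN is below MAX.
    if not (0 < lo < hi <= 65535):
        raise ValueError("invalid port range")
    return lo, hi


def port_in_range(port, port_range):
    """Return True if port lies within the range, endpoints included (assumed)."""
    lo, hi = port_range
    return lo <= port <= hi
```

For example, `port_in_range(25, parse_port_range("20-30"))` is true, while a reversed range such as `"30-20"` is rejected with `ValueError`.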
  13. 11 Nov, 2018 1 commit
  14. 05 Oct, 2018 1 commit
    • net_sched: convert idrinfo->lock from spinlock to a mutex · 95278dda
      Committed by Cong Wang
      In commit ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      we moved fl_hw_destroy_tmplt() to a workqueue to avoid blocking
      with the spinlock held. Unfortunately, this causes a lot of
      trouble here:
      
      1. tcf_chain_destroy() could be called right after we queue the work
         but before the work runs. This is a use-after-free.
      
      2. The chain refcnt is already 0, we can't even just hold it again.
         We can check refcnt==1 but it is ugly.
      
      3. The chain with refcnt 0 is still visible in its block, which means
         it could be still found and used!
      
      4. The block has a refcnt too, we can't hold it without introducing a
         proper API either.
      
      We can make it work but the end result is ugly. Instead of wasting
      time on reviewing it, let's just convert the troubling spinlock to
      a mutex, which allows us to use non-atomic allocations too.
      
      Fixes: ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 21 Sep, 2018 1 commit
    • net_sched: change tcf_del_walker() to take idrinfo->lock · ec3ed293
      Committed by Vlad Buslov
      Action API was changed to work with actions and action_idr in concurrency
      safe manner, however tcf_del_walker() still uses actions without taking a
      reference or idrinfo->lock first, and deletes them directly, disregarding
      possible concurrent delete.
      
      Change tcf_del_walker() to take idrinfo->lock while iterating over actions
      and use new tcf_idr_release_unsafe() to release them while holding the
      lock.
      
      And the blocking function fl_hw_destroy_tmplt() could be called when we
      put a filter chain, so defer it to a work queue.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      [xiyou.wangcong@gmail.com: heavily modify the code and changelog]
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 20 Sep, 2018 1 commit
  17. 11 Sep, 2018 1 commit
  18. 08 Aug, 2018 2 commits
    • net: sched: cls_flower: set correct offload data in fl_reoffload · 9ca61630
      Committed by Vlad Buslov
      fl_reoffload implementation sets following members of struct
      tc_cls_flower_offload incorrectly:
       - masked key instead of mask
       - key instead of masked key
      
      Fix fl_reoffload to provide correct data to offload callback.
      
      Fixes: 31533cba ("net: sched: cls_flower: implement offload tcf_proto_op")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
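
The three values that this fix tells apart (key, mask, and masked key) are easy to confuse. As a rough Python illustration, assuming the masked key is the bytewise AND of key and mask, the hypothetical helper `masked_key` below shows that all three are distinct in general. It does not reproduce the layout of struct tc_cls_flower_offload.

```python
def masked_key(key, mask):
    """Return key AND mask, byte by byte (assumed definition of the masked key)."""
    if len(key) != len(mask):
        raise ValueError("key and mask must have the same length")
    return bytes(k & m for k, m in zip(key, mask))
```

For a key of 192.168.1.1 (`bytes([192, 168, 1, 1])`) and a /24 mask (`bytes([255, 255, 255, 0])`), the masked key is 192.168.1.0, which equals neither the key nor the mask.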
    • net/sched: allow flower to match tunnel options · 0a6e7778
      Committed by Pieter Jansen van Vuuren
      Allow matching on options in Geneve tunnel headers.
      This makes use of existing tunnel metadata support.
      
      The options can be described in the form
      CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
      represented as a 16bit hexadecimal value, TYPE as an 8bit
      hexadecimal value and DATA as a variable length hexadecimal value.
      
      e.g.
       # ip link add name geneve0 type geneve dstport 0 external
       # tc qdisc add dev geneve0 ingress
       # tc filter add dev geneve0 protocol ip parent ffff: \
           flower \
             enc_src_ip 10.0.99.192 \
             enc_dst_ip 10.0.99.193 \
             enc_key_id 11 \
             geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
             ip_proto udp \
             action mirred egress redirect dev eth1
      
      This patch adds support for matching Geneve options in the order
      supplied by the user. This leads to an efficient implementation in
      the software datapath (and in our opinion hardware datapaths that
      offload this feature). It is also compatible with Geneve options
      matching provided by the Open vSwitch kernel datapath which is
      relevant here as the Flower classifier may be used as a mechanism
      to program flows into hardware as a form of Open vSwitch datapath
      offload (sometimes referred to as OVS-TC). The netlink
      Kernel/Userspace API may be extended, for example by adding a flag,
      if other matching options are desired, for example matching given
      options in any order. This would require an implementation in the
      TC software datapath. And be done in a way that drivers that
      facilitate offload of the Flower classifier can reject or accept
      such flows based on hardware datapath capabilities.
      
      This approach was discussed and agreed on at Netconf 2017 in Seoul.
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
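
The option format described above, CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, can be decoded with a short parser. The following Python sketch is an illustration of the textual format only: `parse_geneve_opt` is a hypothetical helper, not part of iproute2 or the kernel, and the requirement that DATA and DATA_MASK have equal length is an assumption.

```python
def parse_geneve_opt(text):
    """Parse CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK into two dicts."""
    value_text, sep, mask_text = text.partition("/")
    if not sep:
        raise ValueError("expected VALUE/MASK")

    def parse_triple(triple):
        cls, typ, data = triple.split(":")
        return {
            "class": int(cls, 16),        # 16-bit hexadecimal value
            "type": int(typ, 16),         # 8-bit hexadecimal value
            "data": bytes.fromhex(data),  # variable-length hexadecimal value
        }

    value, mask = parse_triple(value_text), parse_triple(mask_text)
    if value["class"] > 0xFFFF or value["type"] > 0xFF:
        raise ValueError("class or type out of range")
    # Assumption: the data mask covers exactly the bytes of the data field.
    if len(value["data"]) != len(mask["data"]):
        raise ValueError("data and data mask differ in length")
    return value, mask
```

Applied to the option from the example command, `0102:80:1122334421314151/ffff:ff:ffffffffffffffff`, this yields class 0x0102, type 0x80, and eight data bytes, each with an all-ones mask.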
  19. 06 Aug, 2018 1 commit
  20. 26 Jul, 2018 1 commit
  21. 24 Jul, 2018 4 commits
  22. 20 Jul, 2018 1 commit
  23. 14 Jul, 2018 1 commit
  24. 12 Jul, 2018 1 commit
  25. 07 Jul, 2018 3 commits
  26. 26 Jun, 2018 1 commit
  27. 22 Jun, 2018 1 commit
    • cls_flower: fix use after free in flower S/W path · 44a5cd43
      Committed by Paolo Abeni
      If flower filter is created without the skip_sw flag, fl_mask_put()
      can race with fl_classify() and we can destroy the mask rhashtable
      while a lookup operation is accessing it.
      
       BUG: unable to handle kernel paging request at 00000000000911d1
       PGD 0 P4D 0
       SMP PTI
       CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
       RIP: 0010:rht_bucket_nested+0x20/0x60
       Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
       RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
       RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
       RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
       RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
       R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
       R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
       FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
       Call Trace:
        fl_lookup+0x134/0x140 [cls_flower]
        fl_classify+0xf3/0x180 [cls_flower]
        tcf_classify+0x78/0x150
        __netif_receive_skb_core+0x69e/0xa50
        netif_receive_skb_internal+0x42/0xf0
        tun_get_user+0xdd5/0xfd0 [tun]
        tun_sendmsg+0x52/0x70 [tun]
        handle_tx+0x2b3/0x5f0 [vhost_net]
        vhost_worker+0xab/0x100 [vhost]
        kthread+0xf8/0x130
        ret_from_fork+0x35/0x40
       Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
       CR2: 00000000000911d1
      
      Fix the above by waiting for an RCU grace period before destroying the
      rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
      must run in process context, as pointed out by Cong Wang.
      
      v1 -> v2: use tcf_queue_work to run rhashtable_destroy().
      
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 05 Jun, 2018 2 commits