1. 08 May 2019, 1 commit
  2. 06 May 2019, 1 commit
  3. 28 Apr 2019, 2 commits
    • netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg authored
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
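
      As a rough sketch (illustrative only; names and bit values are not
      taken verbatim from the patch), the levels above could be expressed
      as flag combinations:

          /* sketch: strictness as individual validation flags */
          enum netlink_validation {
                  NL_VALIDATE_LIBERAL      = 0,
                  NL_VALIDATE_TRAILING     = BIT(0),
                  NL_VALIDATE_MAXTYPE      = BIT(1),
                  NL_VALIDATE_UNSPEC       = BIT(2),
                  NL_VALIDATE_STRICT_ATTRS = BIT(3),
          };

          /* today's *_strict(): TRAILING + MAXTYPE */
          #define NL_VALIDATE_DEPRECATED_STRICT \
                  (NL_VALIDATE_TRAILING | NL_VALIDATE_MAXTYPE)

          /* the default for future things: everything */
          #define NL_VALIDATE_STRICT \
                  (NL_VALIDATE_DEPRECATED_STRICT | \
                   NL_VALIDATE_UNSPEC | NL_VALIDATE_STRICT_ATTRS)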
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, and would then prevent
      forgetting attribute entries going forward. The same can apply to
      the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
       * nlmsg_validate      -> nlmsg_validate_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet, so that compilation breaks if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc., as re-introduced in
      the next patch, while existing things will continue to work as-is.

      In effect then, this adds fully strict validation for any new
      command.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek authored
      Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
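
      The new wrapper then reduces to a one-liner, roughly:

          /* sketch: nla_nest_start() now always sets NLA_F_NESTED */
          static inline struct nlattr *nla_nest_start(struct sk_buff *skb,
                                                      int attrtype)
          {
                  return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
          }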
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 25 Apr 2019, 1 commit
    • net: sched: flower: refactor reoffload for concurrent access · c049d56e
      Vlad Buslov authored
      Recent changes that introduced unlocked flower did not properly
      account for the case where reoffload is initiated concurrently with
      filter updates. To fix the issue, extend flower with a 'hw_filters'
      list that stores filters that don't have the 'skip_hw' flag set. A
      filter is added to the list when it is inserted to hardware and only
      removed from it after being unoffloaded from all drivers that the
      parent block is attached to. This ensures that concurrent reoffload
      can still access a filter that is being deleted, and prevents the
      race where a driver callback could be removed while the filter is no
      longer accessible through the idr but is still present in hardware.
      
      Refactor fl_change() to respect the new filter reference counter and
      to release the filter reference with __fl_put() in case of error,
      instead of deallocating the filter memory directly. This allows
      concurrent access to the filter from fl_reoffload() and protects it
      with reference counting. Refactor fl_reoffload() to iterate over the
      hw_filters list instead of the idr. Implement the helper function
      fl_get_next_hw_filter(), which iterates over the hw_filters list
      with reference counting and skips filters that are being
      concurrently deleted.
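
      A simplified sketch of that iteration (list-cursor handling is
      abbreviated; tp->lock is the classifier spinlock from the
      unlocked-flower work):

          /* sketch: advance over hw_filters under tp->lock, skipping
           * filters whose refcount already dropped to zero because a
           * concurrent delete is in flight */
          spin_lock(&tp->lock);
          list_for_each_entry_continue(f, &head->hw_filters, hw_list) {
                  if (refcount_inc_not_zero(&f->refcnt)) {
                          spin_unlock(&tp->lock);
                          return f; /* caller drops it with __fl_put() */
                  }
          }
          spin_unlock(&tp->lock);
          return NULL;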
      
      Fixes: 92149190 ("net: sched: flower: set unlocked flag for flower proto ops")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 Apr 2019, 2 commits
    • net: sched: flower: fix filter net reference counting · 9994677c
      Vlad Buslov authored
      Fix net reference counting in fl_change() and remove the redundant
      call to tcf_exts_get_net() from __fl_delete(). __fl_put() already
      tries to get the net before releasing exts and deallocating a
      filter, so this code caused the flower classifier to obtain the net
      twice per deleted filter.
      
      The implementation of __fl_delete() called tcf_exts_get_net() to
      pass its result as the 'async' flag to fl_mask_put(). However, the
      'async' flag is redundant and only complicates the fl_mask_put()
      implementation. This functionality seems to have been copied from
      the filter cleanup code, where it was added by Cong with the
      following explanation:
      
          This patchset tries to fix the race between call_rcu() and
          cleanup_net() again. Without holding the netns refcnt the
          tc_action_net_exit() in netns workqueue could be called before
          filter destroy works in tc filter workqueue. This patchset
          moves the netns refcnt from tc actions to tcf_exts, without
          breaking per-netns tc actions.
      
      This doesn't apply to the flower mask, which doesn't call any tc
      action code during cleanup. Simplify fl_mask_put() by removing the
      flag parameter and always using tcf_queue_work() to free mask
      objects.
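
      After the change, fl_mask_put() is roughly the following (a sketch;
      helper names such as fl_mask_free_work are assumed from the
      surrounding code):

          static bool fl_mask_put(struct cls_fl_head *head,
                                  struct fl_flow_mask *mask)
          {
                  if (!refcount_dec_and_test(&mask->refcnt))
                          return false;

                  rhashtable_remove_fast(&head->ht, &mask->ht_node,
                                         mask_ht_params);
                  /* no 'async' flag: always free via the tc workqueue */
                  tcf_queue_work(&mask->rwork, fl_mask_free_work);
                  return true;
          }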
      
      Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: use correct ht function to prevent duplicates · 9e35552a
      Vlad Buslov authored
      The implementation of rhashtable_insert_fast() checks whether its
      internal helper __rhashtable_insert_fast() returns a non-NULL
      pointer and seemingly returns -EEXIST in that case. However, since
      __rhashtable_insert_fast() is called with a NULL key pointer, it
      never actually checks for duplicates, which means that -EEXIST is
      never returned to the user. Use the rhashtable_lookup_insert_fast()
      hash table API instead. To verify that it works as expected and to
      prevent the problem from recurring, extend tc-tests with a new test
      that verifies that no new filter with an existing key can be
      inserted into the flower classifier.
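
      In cls_flower terms the fix is a one-call swap, roughly:

          /* before: never reports duplicates, since the internal key
           * pointer is NULL */
          err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
                                       fnew->mask->filter_ht_params);

          /* after: looks the key up first, so a duplicate yields -EEXIST */
          err = rhashtable_lookup_insert_fast(&fnew->mask->ht,
                                              &fnew->ht_node,
                                              fnew->mask->filter_ht_params);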
      
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 08 Apr 2019, 1 commit
    • net: sched: flower: insert filter to ht before offloading it to hw · 1f17f774
      Vlad Buslov authored
      John reports:
      
      Recent refactoring of fl_change aims to use the classifier spinlock
      to avoid the need for the rtnl lock. In doing so, the
      fl_hw_replace_filter() function was moved to before the lock is
      taken. This can create problems for drivers if duplicate filters are
      created (common in ovs tc offload due to filters being triggered by
      user-space matches).
      
      Drivers registered for such filters will now receive multiple copies of
      the same rule, each with a different cookie value. This means that the
      drivers would need to do a full match field lookup to determine
      duplicates, repeating work that will happen in flower __fl_lookup().
      Currently, drivers do not expect to receive duplicate filters.
      
      To fix this, verify that a filter with the same key is not present
      in the flower classifier hash table, and insert the new filter into
      the hash table before offloading it to hardware. Implement the
      helper function fl_ht_insert_unique() to atomically verify and
      insert a filter (see the sketch below).
      
      This change makes the filter visible to the fast path at the
      beginning of the fl_change() function, which means it can no longer
      be freed directly in case of error. Refactor the fl_change() error
      handling code to deallocate the filter after an RCU grace period
      instead.
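
      A sketch of what such a helper can look like (shown with the
      duplicate-detecting rhashtable call from the later fix 9e35552a
      above; the version merged here used rhashtable_insert_fast()):

          static int fl_ht_insert_unique(struct cls_fl_filter *fnew,
                                         struct cls_fl_filter *fold,
                                         bool *in_ht)
          {
                  int err;

                  err = rhashtable_lookup_insert_fast(&fnew->mask->ht,
                                                      &fnew->ht_node,
                                                      fnew->mask->filter_ht_params);
                  if (err) {
                          *in_ht = false;
                          /* a duplicate is fine when overwriting */
                          return fold && err == -EEXIST ? 0 : err;
                  }
                  *in_ht = true;
                  return 0;
          }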
      
      Fixes: 620da486 ("net: sched: flower: refactor fl_change")
      Reported-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 05 Apr 2019, 1 commit
  8. 22 Mar 2019, 12 commits
  9. 07 Mar 2019, 1 commit
    • net: sched: flower: insert new filter to idr after setting its mask · ecb3dea4
      Vlad Buslov authored
      When adding a new filter to the flower classifier, fl_change()
      inserts it into handle_idr before initializing filter extensions
      and assigning it a mask. Normally this ordering doesn't matter
      because all flower classifier ops callbacks assume rtnl lock
      protection. However, when the filter has an action whose kernel
      module is not loaded, the rtnl lock is released before the call to
      request_module(). During this time the filter can be accessed by a
      concurrent task before its initialization is completed, which can
      lead to a crash.
      
      Example case of NULL pointer dereference in concurrent dump:
      
      Task 1                           Task 2

      tc_new_tfilter()
       fl_change()
        idr_alloc_u32(fnew)
        fl_set_parms()
         tcf_exts_validate()
          tcf_action_init()
           tcf_action_init_1()
            rtnl_unlock()
            request_module()
            ...                        rtnl_lock()
                                       tc_dump_tfilter()
                                        tcf_chain_dump()
                                         fl_walk()
                                          idr_get_next_ul()
                                          tcf_node_dump()
                                           tcf_fill_node()
                                            fl_dump()
                                             mask = &f->mask->key; <- NULL ptr
            rtnl_lock()
      
      Extension initialization and mask assignment don't depend on
      fnew->handle, which is allocated by idr_alloc_u32(). Move the idr
      allocation code after action creation and mask assignment in
      fl_change() to prevent concurrent access to a not yet fully
      initialized filter while the rtnl lock is released to load an
      action module.
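
      The reordered sequence, as a comment sketch (function names taken
      from the trace above; error handling omitted):

          /* sketch of the reordered fl_change():
           *
           *   1. fl_set_parms()         - init extensions/actions; may
           *                               drop rtnl to request_module()
           *   2. fl_check_assign_mask() - assign fnew->mask
           *   3. idr_alloc_u32(&head->handle_idr, fnew, &handle,
           *                    handle, GFP_ATOMIC)
           *                             - only now does fnew become
           *                               visible to fl_walk()/fl_dump()
           */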
      
      Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 23 Feb 2019, 1 commit
  11. 14 Feb 2019, 1 commit
  12. 13 Feb 2019, 2 commits
  13. 07 Feb 2019, 5 commits
  14. 05 Feb 2019, 1 commit
  15. 18 Jan 2019, 1 commit
  16. 20 Dec 2018, 1 commit
  17. 15 Dec 2018, 1 commit
  18. 10 Dec 2018, 1 commit
  19. 16 Nov 2018, 1 commit
    • net: sched: cls_flower: Classify packets using port ranges · 5c72299f
      Amritha Nambiar authored
      Added support in tc flower for filtering based on port ranges.
      
      Example:
      1. Match on a port range:
      -------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
        action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        ip_proto tcp
        dst_port range 20-30
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 85 sec used 3 sec
              Action statistics:
              Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
      
      2. Match on IP address and port range:
      --------------------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
        skip_hw action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0 handle 0x2
        eth_type ipv4
        ip_proto tcp
        dst_ip 192.168.1.1
        dst_port range 100-200
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 2 ref 1 bind 1 installed 58 sec used 2 sec
              Action statistics:
              Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
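
      Conceptually, the range lookup replaces the exact port compare with
      a bounds check; a minimal sketch (helper name hypothetical):

          /* hypothetical helper: true if port falls within [min, max] */
          static bool fl_port_in_range(__be16 port, __be16 min, __be16 max)
          {
                  u16 val = ntohs(port);

                  return val >= ntohs(min) && val <= ntohs(max);
          }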
      
      v4:
      1. Added a condition before setting the port key.
      2. Organized setting and dumping of port range keys into functions
         and added validation of the input range.

      v3:
      1. Moved the new fields in the UAPI enum to the end of the enum.
      2. Removed a couple of empty lines.

      v2:
      Addressed Jiri's comments:
      1. Added separate functions for dst and src comparisons.
      2. Removed the endpoint enum.
      3. Added a new bit, TCA_FLOWER_FLAGS_RANGE, to select normal vs.
         range lookup.
      4. Cleaned up the fl_lookup() function.
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 11 Nov 2018, 1 commit
  21. 05 Oct 2018, 1 commit
    • net_sched: convert idrinfo->lock from spinlock to a mutex · 95278dda
      Cong Wang authored
      In commit ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      we moved fl_hw_destroy_tmplt() to a workqueue to avoid blocking
      with the spinlock held. Unfortunately, this causes a lot of
      trouble here:
      
      1. tcf_chain_destroy() could be called right after we queue the work
         but before the work runs. This is a use-after-free.

      2. The chain refcnt is already 0, so we can't even just hold it
         again. We could check refcnt==1, but that is ugly.

      3. The chain with refcnt 0 is still visible in its block, which
         means it could still be found and used!

      4. The block has a refcnt too; we can't hold it without introducing
         a proper API either.

      We can make it work, but the end result is ugly. Instead of wasting
      time on reviewing it, let's just convert the troubling spinlock to
      a mutex, which allows us to use non-atomic allocations too.
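
      The conversion itself is mechanical; the payoff is that allocations
      under the lock no longer need to be atomic. A sketch:

          struct tcf_idrinfo {
                  struct mutex    lock;   /* was: spinlock_t lock */
                  struct idr      action_idr;
          };

          mutex_lock(&idrinfo->lock);
          /* sleeping is now allowed while the lock is held */
          p = kzalloc(sizeof(*p), GFP_KERNEL);
          mutex_unlock(&idrinfo->lock);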
      
      Fixes: ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 21 Sep 2018, 1 commit
    • net_sched: change tcf_del_walker() to take idrinfo->lock · ec3ed293
      Vlad Buslov authored
      The action API was changed to work with actions and action_idr in a
      concurrency-safe manner; however, tcf_del_walker() still uses
      actions without first taking a reference or idrinfo->lock, and
      deletes them directly, disregarding possible concurrent deletes.

      Change tcf_del_walker() to take idrinfo->lock while iterating over
      actions, and use the new tcf_idr_release_unsafe() to release them
      while holding the lock (see the sketch below).
      
      And the blocking function fl_hw_destroy_tmplt() could be called when we
      put a filter chain, so defer it to a work queue.
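
      A sketch of the resulting walk, with the iteration written out via
      idr_get_next_ul() (the in-tree loop may differ in detail):

          unsigned long id = 0;
          struct tc_action *p;

          spin_lock(&idrinfo->lock);
          while ((p = idr_get_next_ul(&idrinfo->action_idr, &id))) {
                  /* lock already held: use the _unsafe release variant */
                  tcf_idr_release_unsafe(p);
                  id++;
          }
          spin_unlock(&idrinfo->lock);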
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      [xiyou.wangcong@gmail.com: heavily modify the code and changelog]
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>