1. 31 Mar, 2020 1 commit
    • net: sched: expose HW stats types per action used by drivers · 93a129eb
      Committed by Jiri Pirko
      When ANY HW stats type is passed, it is up to the driver to select
      which type of HW stats it is going to use. Add infrastructure to
      expose this information to the user.
      
      $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
      $ tc -s filter show dev enp3s0np1 ingress
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        dst_ip 192.168.1.1
        in_hw in_hw_count 2
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 10 sec used 10 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
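      The selection described in this commit can be modelled in a few lines of
      Python. This is only a sketch: the HwStats names, their bit values, the
      preference for IMMEDIATE over DELAYED, and both function names are
      assumptions made for illustration, not the kernel's identifiers.

```python
from enum import IntFlag


class HwStats(IntFlag):
    """Hypothetical stand-ins for the HW stats types a user may request."""
    IMMEDIATE = 1
    DELAYED = 2
    ANY = IMMEDIATE | DELAYED


def driver_pick_stats(requested: HwStats, supported: HwStats) -> HwStats:
    """Return the single stats type the modelled driver ends up using.

    Preferring IMMEDIATE over DELAYED is an assumption of this sketch.
    """
    usable = requested & supported
    for candidate in (HwStats.IMMEDIATE, HwStats.DELAYED):
        if usable & candidate:
            return candidate
    raise ValueError("no requested HW stats type is supported")


def used_hw_stats_line(used: HwStats) -> str:
    """Format the driver's choice like the used_hw_stats line in the dump."""
    return f"used_hw_stats {used.name.lower()}"
```

      With ANY requested, the modelled driver reports whichever type it
      actually chose, which is the information the new dump line exposes.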
  2. 27 Mar, 2020 3 commits
  3. 18 Feb, 2020 2 commits
  4. 14 Feb, 2020 1 commit
  5. 27 Jan, 2020 1 commit
    • net_sched: fix ops->bind_class() implementations · 2e24cd75
      Committed by Cong Wang
      The current implementations of ops->bind_class() merely search
      for the classid and update the class in struct tcf_result,
      without invoking either cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks their design, because qdiscs
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note that we merely pass the Qdisc pointer as an opaque pointer to
      each filter; they only need to pass it down to the helper
      functions without understanding it at all.
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
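      The counting problem fixed here can be illustrated with a toy model.
      The sketch below is not kernel code: ClassfulQdisc, FilterResult and
      the helper names are hypothetical, and locking is left out. It only
      shows why bind_class() has to go through the bind/unbind helpers so
      that a qdisc which counts filters per class stays balanced.

```python
class ClassfulQdisc:
    """Toy qdisc that, like cbq, counts the filters bound to each class."""

    def __init__(self, classids):
        self.filters = {cid: 0 for cid in classids}

    def bind_tcf(self, classid):
        self.filters[classid] += 1
        return classid

    def unbind_tcf(self, classid):
        self.filters[classid] -= 1


class FilterResult:
    """Stand-in for the classid/class pair kept in struct tcf_result."""

    def __init__(self, classid):
        self.classid = classid
        self.bound = None  # class currently bound, if any


def bind_filter(qdisc, res):
    """Bind res to its class, releasing any previously bound class."""
    new = qdisc.bind_tcf(res.classid)
    old, res.bound = res.bound, new
    if old is not None:
        qdisc.unbind_tcf(old)


def unbind_filter(qdisc, res):
    """Drop the binding, if there is one, and update the qdisc counter."""
    if res.bound is not None:
        qdisc.unbind_tcf(res.bound)
        res.bound = None


def bind_class(qdisc, res, classid, cl):
    """Fixed-style bind_class(): always use the counting helpers."""
    if res.classid != classid:
        return
    if cl is not None:
        bind_filter(qdisc, res)
    else:
        unbind_filter(qdisc, res)
```

      In this model a class can only be destroyed cleanly when its counter is
      back at zero, which is the invariant the cbq warning checks.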
  6. 31 Dec, 2019 1 commit
    • net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Committed by Davide Caratti
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantics of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, like it was done on older Linux kernels.
      This fixes situations where walk() was wrongly detecting a non-empty
      filter, like it happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic into a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      that no IDRs have been allocated.
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
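      A rough model of the callback split described in this commit: classifiers
      that cannot be modified concurrently are simply marked for deletion,
      while a flower-like classifier supplies its own delete_empty() that
      checks, under its lock, that no handle is allocated. The class and
      function names below are hypothetical, and a dict stands in for the IDR.

```python
import threading


class Proto:
    """Toy classifier instance; 'handles' plays the role of the IDR."""

    def __init__(self, delete_empty=None):
        self.lock = threading.Lock()
        self.handles = {}
        self.deleting = False
        self.delete_empty = delete_empty  # optional per-classifier callback


def flower_like_delete_empty(tp):
    """Lockless-capable classifier: delete only if no handle is allocated."""
    with tp.lock:
        tp.deleting = not tp.handles
    return tp.deleting


def check_delete(tp):
    """Generic path: defer to the callback when the classifier has one."""
    if tp.delete_empty is not None:
        return tp.delete_empty(tp)
    tp.deleting = True  # no concurrent insertion can happen for this type
    return tp.deleting
```

      The generic branch never walks the filters, which is the shortcut the
      commit restores for classifiers without lockless insertion.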
  7. 10 Dec, 2019 1 commit
  8. 04 Dec, 2019 1 commit
    • cls_flower: Fix the behavior using port ranges with hw-offload · 8ffb055b
      Committed by Yoshiki Komachi
      The recent commit 5c72299f ("net: sched: cls_flower: Classify
      packets using port ranges") added filtering based on port ranges
      to tc flower. However, the commit missed necessary changes in the
      hw-offload code, so the feature generated incorrect offloaded flow
      keys in the NIC.
      
      A more detailed example follows:
      
      $ tc qdisc add dev eth0 ingress
      $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
        dst_port 100-200 action drop
      
      With the setup above, an exact match filter with dst_port == 0 will be
      installed in NIC by hw-offload. IOW, the NIC will have a rule which is
      equivalent to the following one.
      
      $ tc qdisc add dev eth0 ingress
      $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
        dst_port 0 action drop
      
      The behavior was caused by the flow dissector which extracts packet
      data into the flow key in the tc flower. More specifically, regardless
      of exact match or specified port ranges, fl_init_dissector() set the
      FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port
      numbers from skb in skb_flow_dissect() called by fl_classify(). Note
      that device drivers received the same struct flow_dissector object as
      used in skb_flow_dissect(). Thus, offloaded drivers could not identify
      which of these is used because the FLOW_DISSECTOR_KEY_PORTS flag was
      set to struct flow_dissector in either case.
      
      This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
      tp_range field in struct fl_flow_key so that drivers can recognize which
      kind of filter is being offloaded. At this point, when filters based on
      port ranges are passed to drivers, the drivers return the EOPNOTSUPP error
      because they do not support the feature (the newly created
      FLOW_DISSECTOR_KEY_PORTS_RANGE flag).
      
      Fixes: 5c72299f ("net: sched: cls_flower: Classify packets using port ranges")
      Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
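      The effect of the separate key can be sketched as follows. The bit
      values, the function names and the driver model are assumptions made
      for illustration; only the two key names and the EOPNOTSUPP outcome
      come from the commit message.

```python
import errno

EOPNOTSUPP = errno.EOPNOTSUPP

# Hypothetical bit values; only the key names come from the commit text.
KEY_PORTS = 1 << 0
KEY_PORTS_RANGE = 1 << 1


def dissector_keys(dst_port=None, dst_port_range=None):
    """Pick the key bits for a filter with an exact port or a port range."""
    used = 0
    if dst_port is not None:
        used |= KEY_PORTS
    if dst_port_range is not None:
        used |= KEY_PORTS_RANGE
    return used


def driver_offload(used_keys, supported=KEY_PORTS):
    """Model a driver that rejects any key it does not understand."""
    if used_keys & ~supported:
        return -EOPNOTSUPP
    return 0
```

      Because a range filter no longer looks like an exact match, the modelled
      driver refuses it instead of installing a rule for port 0.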
  9. 22 Nov, 2019 2 commits
    • net: sched: allow flower to match erspan options · 79b1011c
      Committed by Xin Long
      This patch allows matching erspan options.
      
      The options can be described in the form:
      VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK.
      When ver is set to 1, index will be applied while dir
      and hwid will be ignored, and when ver is set to 2,
      dir and hwid will be used while index will be ignored.
      
      Unlike geneve, only one option can be set. Also,
      geneve options, vxlan options and erspan options
      can't be set at the same time.
      
        # ip link add name erspan1 type erspan external
        # tc qdisc add dev erspan1 ingress
        # tc filter add dev erspan1 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              erspan_opts 1:12:0:0/1:ffff:0:0 \
              ip_proto udp \
              action mirred egress redirect dev eth0
      
      v1->v2:
        - improve some err msgs of extack.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
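      A small parser makes the option format and the version rule concrete.
      This is a sketch under stated assumptions: every field is read as
      hexadecimal (the mask ffff in the example suggests it, but the commit
      does not say so), a mask is required, and the names ErspanOpts and
      parse_erspan_opts are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ErspanOpts:
    ver: int
    index: int
    dir: int
    hwid: int


def _parse_fields(text: str) -> ErspanOpts:
    # Assumption of this sketch: every field is written in hexadecimal.
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected VER:INDEX:DIR:HWID, got {text!r}")
    return ErspanOpts(*(int(p, 16) for p in parts))


def parse_erspan_opts(text: str):
    """Split 'KEY/MASK' and clear the fields the version does not use."""
    key_text, _, mask_text = text.partition("/")
    key, mask = _parse_fields(key_text), _parse_fields(mask_text)
    if key.ver == 1:
        # Version 1: index applies, dir and hwid are ignored.
        key, mask = replace(key, dir=0, hwid=0), replace(mask, dir=0, hwid=0)
    elif key.ver == 2:
        # Version 2: dir and hwid apply, index is ignored.
        key, mask = replace(key, index=0), replace(mask, index=0)
    else:
        raise ValueError(f"unsupported erspan version {key.ver}")
    return key, mask
```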
    • net: sched: allow flower to match vxlan options · d8f9dfae
      Committed by Xin Long
      This patch allows matching the gbp option in vxlan.
      
      The options can be described in the form GBP/GBP_MASK,
      where GBP is represented as a 32bit hexadecimal value.
      Unlike geneve, only one option can be set. Also,
      geneve options and vxlan options can't be set at
      the same time.
      
        # ip link add name vxlan0 type vxlan dstport 0 external
        # tc qdisc add dev vxlan0 ingress
        # tc filter add dev vxlan0 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              vxlan_opts 01020304/ffffffff \
              ip_proto udp \
              action mirred egress redirect dev eth0
      
      v1->v2:
        - add .strict_start_type for enc_opts_policy as Jakub noticed.
        - use Duplicate instead of Wrong in err msg for extack as Jakub
          suggested.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
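      The GBP/GBP_MASK form amounts to a masked 32-bit comparison. The sketch
      below assumes hexadecimal input as stated in the commit; treating a
      missing mask as all-ones and the function names are assumptions of
      this example only.

```python
def parse_vxlan_opts(text: str):
    """Parse 'GBP/GBP_MASK', both 32-bit hexadecimal values."""
    gbp_text, sep, mask_text = text.partition("/")
    gbp = int(gbp_text, 16)
    # Treating a missing mask as all-ones is an assumption of this sketch.
    mask = int(mask_text, 16) if sep else 0xFFFFFFFF
    for value in (gbp, mask):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("GBP and mask must fit in 32 bits")
    return gbp, mask


def gbp_matches(packet_gbp: int, gbp: int, mask: int) -> bool:
    """Masked comparison: only bits set in mask take part in the match."""
    return (packet_gbp & mask) == (gbp & mask)
```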
  10. 27 Aug, 2019 5 commits
    • net: sched: flower: don't take rtnl lock for cls hw offloads API · 918190f5
      Committed by Vlad Buslov
      Don't manually take rtnl lock in flower classifier before calling cls
      hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held'
      parameter.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: take reference to action dev before calling offloads · 5a6ff4b1
      Committed by Vlad Buslov
      In order to remove the dependency on rtnl lock when calling the hardware
      offload API, take a reference to the action mirred dev when initializing
      the flow_action structure in tc_setup_flow_action(). Implement function
      tc_cleanup_flow_action() and use it to release the device after the
      hardware offload API is done using it.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: take rtnl lock in tc_setup_flow_action() · 9838b20a
      Committed by Vlad Buslov
      In order to allow using new flow_action infrastructure from unlocked
      classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
      argument. Take rtnl lock before accessing tc_action data. This is necessary
      to protect from concurrent action replace.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: notify classifier on successful offload add/delete · a449a3e7
      Committed by Vlad Buslov
      To remove the dependency on rtnl lock, extend classifier ops with new
      ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while
      holding cb_lock every time a filter is successfully added to or deleted
      from hardware.
      
      Implement the new API in flower classifier. Use it to manage hw_filters
      list under cb_lock protection, instead of relying on rtnl lock to
      synchronize with concurrent fl_reoffload() call.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: refactor block offloads counter usage · 40119211
      Committed by Vlad Buslov
      Without rtnl lock protection filters can no longer safely manage block
      offloads counter themselves. Refactor cls API to protect block offloadcnt
      with tcf_block->cb_lock that is already used to protect driver callback
      list and nooffloaddevcnt counter. The counter can be modified by concurrent
      tasks by new functions that execute block callbacks (which is safe with
      previous patch that changed its type to atomic_t), however, block
      bind/unbind code that checks the counter value takes cb_lock in write mode
      to exclude any concurrent modifications. This approach prevents race
      conditions between bind/unbind and callback execution code but allows for
      concurrency for tc rule update path.
      
      Move block offload counter, filter in hardware counter and filter flags
      management from classifiers into cls hardware offloads API. Make functions
      tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
      private. Implement following new cls API to be used instead:
      
        tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
        already in hardware is successfully offloaded, increment block offloads
        counter, set filter in hardware counter and flag. On failure, previously
        offloaded filter is considered to be intact and offloads counter is not
        decremented.
      
        tc_setup_cb_replace() - destructive filter replace. Release existing
        filter block offload counter and reset its in hardware counter and flag.
        Set new filter in hardware counter and flag. On failure, previously
        offloaded filter is considered to be destroyed and offload counter is
        decremented.
      
        tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
        offloads counter.
      
        tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
        call tc_cls_offload_cnt_update() if cb() didn't return an error.
      
      Refactor all offload-capable classifiers to atomically offload filters to
      hardware, change block offload counter, and set filter in hardware counter
      and flag by means of the new cls API functions.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
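      The counter rules for add, replace and destroy listed in this commit can
      be summarised with a toy model. Everything here is a simplification for
      illustration: Block, Filter and the setup_cb_* functions are
      hypothetical Python names, driver callbacks are plain callables that
      return True on success, and cb_lock is not modelled.

```python
class Block:
    def __init__(self, callbacks):
        self.callbacks = callbacks  # callables returning True on success
        self.offloadcnt = 0         # number of filters currently in hardware


class Filter:
    def __init__(self):
        self.in_hw = False          # the "in hardware" flag
        self.in_hw_count = 0        # how many callbacks accepted the filter


def _run_callbacks(block, command):
    """Return how many driver callbacks accepted the command."""
    return sum(1 for cb in block.callbacks if cb(command))


def _cnt_add(block, flt, ok_count):
    if not flt.in_hw:
        flt.in_hw = True
        block.offloadcnt += 1
    flt.in_hw_count += ok_count


def _cnt_reset(block, flt):
    if flt.in_hw:
        flt.in_hw = False
        block.offloadcnt -= 1
    flt.in_hw_count = 0


def setup_cb_add(block, flt):
    """Non-destructive add: counters change only on success."""
    ok_count = _run_callbacks(block, "add")
    if ok_count > 0:
        _cnt_add(block, flt, ok_count)
    return ok_count


def setup_cb_replace(block, old, new):
    """Destructive replace: the old filter's counters are released first."""
    if old is not None:
        _cnt_reset(block, old)
    ok_count = _run_callbacks(block, "replace")
    if ok_count > 0:
        _cnt_add(block, new, ok_count)
    return ok_count


def setup_cb_destroy(block, flt):
    """Destroy: counters are released regardless of callback results."""
    _run_callbacks(block, "destroy")
    _cnt_reset(block, flt)
```

      Keeping all counter updates inside these three entry points is what lets
      a single lock protect them, rather than each classifier doing its own
      bookkeeping.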
  11. 20 Jul, 2019 1 commit
  12. 10 Jul, 2019 2 commits
  13. 02 Jul, 2019 1 commit
  14. 19 Jun, 2019 1 commit
  15. 16 Jun, 2019 1 commit
  16. 15 Jun, 2019 1 commit
  17. 31 May, 2019 1 commit
  18. 08 May, 2019 1 commit
  19. 06 May, 2019 1 commit
  20. 28 Apr, 2019 2 commits
    • netlink: make validation more configurable for future strictness · 8cb08174
      Committed by Johannes Berg
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
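      How the four strictness options combine can be shown with a small
      model. The flag names follow the option names in the commit message,
      but the numeric values, the Validate class and the validate() function
      are assumptions of this sketch; attributes are reduced to (type,
      length) pairs and a policy entry of None stands for NLA_UNSPEC.

```python
from enum import IntFlag


class Validate(IntFlag):
    LIBERAL = 0
    TRAILING = 1
    MAXTYPE = 2
    UNSPEC = 4
    STRICT_ATTRS = 8
    DEPRECATED_STRICT = TRAILING | MAXTYPE
    STRICT = TRAILING | MAXTYPE | UNSPEC | STRICT_ATTRS


def validate(attrs, trailing_bytes, policy, maxtype, flags):
    """Return the list of rejections that the given flags turn on.

    attrs is a list of (type, length) pairs; policy maps a type to its
    expected length, and a missing entry models an NLA_UNSPEC policy.
    An empty result means the message is accepted.
    """
    errors = []
    if trailing_bytes and flags & Validate.TRAILING:
        errors.append("trailing data")
    for atype, length in attrs:
        if atype > maxtype:
            if flags & Validate.MAXTYPE:
                errors.append(f"type {atype} > maxtype")
            continue
        expected = policy.get(atype)
        if expected is None:
            if flags & Validate.UNSPEC:
                errors.append(f"type {atype} has no policy")
        elif length < expected:
            errors.append(f"type {atype} too short")
        elif length > expected and flags & Validate.STRICT_ATTRS:
            errors.append(f"type {atype} too long")
    return errors
```

      The same message passes under LIBERAL, fails two checks under
      DEPRECATED_STRICT, and fails all four under STRICT, which mirrors the
      split between renamed legacy parsers and fully strict new ones.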
    • netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Committed by Michal Kubecek
      Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
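      The compatibility concern is easier to see at the byte level. The
      sketch below encodes attributes the way netlink lays them out (a
      16-bit length, a 16-bit type, then padded payload) and shows that a
      reader masking out the flag bits sees the same type either way, while
      a reader comparing the raw type field does not. The Python helper
      names are hypothetical; the flag and mask values are the ones defined
      in the kernel's netlink UAPI header.

```python
import struct

NLA_F_NESTED = 1 << 15
NLA_F_NET_BYTEORDER = 1 << 14
NLA_TYPE_MASK = 0xFFFF & ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
NLA_HDRLEN = 4


def nla_put(attrtype: int, payload: bytes) -> bytes:
    """Encode one attribute: u16 len, u16 type, payload padded to 4 bytes."""
    header = struct.pack("=HH", NLA_HDRLEN + len(payload), attrtype)
    padding = b"\0" * (-len(payload) % 4)
    return header + payload + padding


def nla_nest_noflag(attrtype: int, children: bytes) -> bytes:
    """Container attribute without the nested flag (the old behaviour)."""
    return nla_put(attrtype, children)


def nla_nest(attrtype: int, children: bytes) -> bytes:
    """Same container, with NLA_F_NESTED or-ed into the type field."""
    return nla_put(attrtype | NLA_F_NESTED, children)


def nla_raw_type(raw: bytes) -> int:
    """Read the type field as-is, flags included."""
    return struct.unpack_from("=HH", raw)[1]


def nla_type(raw: bytes) -> int:
    """Read the type the way a helper masking out the flags would."""
    return nla_raw_type(raw) & NLA_TYPE_MASK
```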
  21. 25 Apr, 2019 1 commit
    • net: sched: flower: refactor reoffload for concurrent access · c049d56e
      Committed by Vlad Buslov
      Recent changes that introduced unlocked flower did not properly account for
      the case when reoffload is initiated concurrently with filter updates. To fix
      the issue, extend flower with a 'hw_filters' list that is used to store
      filters that don't have the 'skip_hw' flag set. A filter is added to the list
      when it is inserted into hardware and only removed from it after being
      unoffloaded from all drivers that the parent block is attached to. This ensures
      that concurrent reoffload can still access a filter that is being deleted and
      prevents a race condition where a driver callback can be removed when the
      filter is no longer accessible through idr, but is still present in hardware.
      
      Refactor fl_change() to respect new filter reference counter and to release
      filter reference with __fl_put() in case of error, instead of directly
      deallocating filter memory. This allows for concurrent access to filter
      from fl_reoffload() and protects it with reference counting. Refactor
      fl_reoffload() to iterate over hw_filters list instead of idr. Implement
      fl_get_next_hw_filter() helper function that is used to iterate over
      hw_filters list with reference counting and skips filters that are being
      concurrently deleted.
      
      Fixes: 92149190 ("net: sched: flower: set unlocked flag for flower proto ops")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
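      The iteration scheme can be sketched in Python. This is a single-list
      model with hypothetical names, not the kernel implementation: a filter
      is handed out only with a reference held, and filters flagged as
      deleted are skipped when the walk is adding rules to a driver.

```python
import threading


class Filter:
    def __init__(self, name):
        self.name = name
        self.refcnt = 1
        self.deleted = False

    def get(self):
        """Take a reference unless the count already dropped to zero."""
        if self.refcnt == 0:
            return False
        self.refcnt += 1
        return True

    def put(self):
        self.refcnt -= 1


class Head:
    def __init__(self):
        self.lock = threading.Lock()
        self.hw_filters = []  # filters currently offloaded to hardware


def get_next_hw_filter(head, prev, add):
    """Return the next live filter after prev with a reference held."""
    with head.lock:
        start = head.hw_filters.index(prev) + 1 if prev is not None else 0
        for flt in head.hw_filters[start:]:
            if not (add and flt.deleted) and flt.get():
                return flt
    return None


def reoffload(head, callback, add=True):
    """Walk hw_filters, calling callback(filter, add) on each live entry."""
    prev = None
    while (flt := get_next_hw_filter(head, prev, add)) is not None:
        callback(flt, add)
        flt.put()
        prev = flt
```

      Walking with add=False still visits filters that are being deleted, so
      a departing driver callback can remove them from hardware.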
  22. 12 Apr, 2019 2 commits
    • net: sched: flower: fix filter net reference counting · 9994677c
      Committed by Vlad Buslov
      Fix net reference counting in fl_change() and remove redundant call to
      tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net
      before releasing exts and deallocating a filter, so this code caused flower
      classifier to obtain net twice per filter that is being deleted.
      
      Implementation of __fl_delete() called tcf_exts_get_net() to pass its
      result as 'async' flag to fl_mask_put(). However, 'async' flag is redundant
      and only complicates fl_mask_put() implementation. This functionality seems
      to be copied from filter cleanup code, where it was added by Cong with
      following explanation:
      
          This patchset tries to fix the race between call_rcu() and
          cleanup_net() again. Without holding the netns refcnt the
          tc_action_net_exit() in netns workqueue could be called before
          filter destroy works in tc filter workqueue. This patchset
          moves the netns refcnt from tc actions to tcf_exts, without
          breaking per-netns tc actions.
      
      This doesn't apply to flower mask, which doesn't call any tc action code
      during cleanup. Simplify fl_mask_put() by removing the flag parameter and
      always use tcf_queue_work() to free mask objects.
      
      Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: use correct ht function to prevent duplicates · 9e35552a
      Committed by Vlad Buslov
      The implementation of rhashtable_insert_fast() checks whether its internal
      helper function __rhashtable_insert_fast() returns a non-NULL pointer and
      seemingly returns -EEXIST in that case. However, since
      __rhashtable_insert_fast() is called with a NULL key pointer, it never
      actually checks for duplicates, which means that -EEXIST is never returned
      to the user. Use the rhashtable_lookup_insert_fast() hash table API instead. In
      order to verify that it works as expected and prevent the problem from
      happening in the future, extend tc-tests with a new test that verifies that no
      new filters with an existing key can be inserted into the flower classifier.
      
      Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
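      The difference between the two insertion calls can be modelled with a
      toy table that, unlike a Python dict, is able to hold duplicate keys.
      ToyTable and its method names are hypothetical and only mirror the
      behaviour described in this commit; they are not the rhashtable API.

```python
import errno

EEXIST = errno.EEXIST


class ToyTable:
    """Table that, unlike a dict, can hold several entries per key."""

    def __init__(self):
        self.entries = []

    def insert_fast(self, key, obj):
        """No lookup is done, so a duplicate key is silently accepted."""
        self.entries.append((key, obj))
        return 0

    def lookup_insert_fast(self, key, obj):
        """Look the key up first and refuse a second entry for it."""
        if any(k == key for k, _ in self.entries):
            return -EEXIST
        self.entries.append((key, obj))
        return 0
```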
  23. 08 Apr, 2019 1 commit
    • net: sched: flower: insert filter to ht before offloading it to hw · 1f17f774
      Committed by Vlad Buslov
      John reports:
      
      Recent refactoring of fl_change aims to use the classifier spinlock to
      avoid the need for rtnl lock. In doing so, the fl_hw_replace_filter()
      function was moved to before the lock is taken. This can create problems
      for drivers if duplicate filters are created (common in ovs tc offload
      due to filters being triggered by user-space matches).
      
      Drivers registered for such filters will now receive multiple copies of
      the same rule, each with a different cookie value. This means that the
      drivers would need to do a full match field lookup to determine
      duplicates, repeating work that will happen in flower __fl_lookup().
      Currently, drivers do not expect to receive duplicate filters.
      
      To fix this, verify that filter with same key is not present in flower
      classifier hash table and insert the new filter to the flower hash table
      before offloading it to hardware. Implement helper function
      fl_ht_insert_unique() to atomically verify/insert a filter.
      
      This change makes filter visible to fast path at the beginning of
      fl_change() function, which means it can no longer be freed directly in
      case of error. Refactor fl_change() error handling code to deallocate the
      filter with rcu timeout.
      
      Fixes: 620da486 ("net: sched: flower: refactor fl_change")
      Reported-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
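      The ordering this commit introduces can be sketched as follows: the
      filter goes into the table first, under a lock, so a duplicate is
      rejected before any driver sees it, and a failed offload removes the
      entry again. ToyFlower and its offload hook are hypothetical, and the
      immediate removal stands in for the RCU-deferred free the commit
      describes.

```python
import errno
import threading

EEXIST = errno.EEXIST


class ToyFlower:
    def __init__(self, offload):
        self.lock = threading.Lock()
        self.table = {}         # key -> cookie, the classifier hash table
        self.offload = offload  # hypothetical hook: (key, cookie) -> 0 or -errno

    def ht_insert_unique(self, key, cookie):
        """Verify and insert in one step so duplicates cannot slip in."""
        with self.lock:
            if key in self.table:
                return -EEXIST
            self.table[key] = cookie
            return 0

    def change(self, key, cookie):
        """Insert into the table first, then hand the filter to the driver."""
        err = self.ht_insert_unique(key, cookie)
        if err:
            return err
        err = self.offload(key, cookie)
        if err:
            with self.lock:
                del self.table[key]  # the kernel defers this free via RCU
        return err
```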
  24. 05 Apr, 2019 1 commit
  25. 22 Mar, 2019 5 commits