  1. 23 March 2021 (1 commit)
    • net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports · 6215afcb
      Vladimir Oltean authored
      A make W=1 build complains that:
      
      net/sched/cls_flower.c:214:20: warning: cast from restricted __be16
      net/sched/cls_flower.c:214:20: warning: incorrect type in argument 1 (different base types)
      net/sched/cls_flower.c:214:20:    expected unsigned short [usertype] val
      net/sched/cls_flower.c:214:20:    got restricted __be16 [usertype] dst
      
      This is because we use htons on struct flow_dissector_key_ports members
      src and dst, which are defined as __be16, so they are already in network
      byte order, not host order. The byte-swap function for the opposite
      direction, ntohs, should have been used.
      
      Because htons and ntohs do the same thing (either both swap, or none
      does), this change has no functional effect except to silence the
      warnings.
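
      As a minimal illustration of the fix (a sketch only; the helper below is
      made up for this example and is not the cls_flower code), a comparison
      against host-order bounds has to convert with ntohs():

        #include <net/flow_dissector.h>   /* struct flow_dissector_key_ports */

        static bool port_in_range(const struct flow_dissector_key_ports *key,
                                  u16 min_dst, u16 max_dst)
        {
                u16 dst = ntohs(key->dst);      /* __be16 -> host order */

                return dst >= min_dst && dst <= max_dst;
        }
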
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 February 2021 (1 commit)
  3. 11 February 2021 (1 commit)
  4. 30 January 2021 (1 commit)
  5. 21 January 2021 (1 commit)
  6. 16 January 2021 (1 commit)
  7. 10 December 2020 (1 commit)
  8. 15 September 2020 (2 commits)
  9. 25 July 2020 (1 commit)
  10. 04 July 2020 (1 commit)
    • sched: consistently handle layer3 header accesses in the presence of VLANs · d7bf2ebe
      Toke Høiland-Jørgensen authored
      There are a couple of places in net/sched/ that check skb->protocol and act
      on the value there. However, in the presence of VLAN tags, the value stored
      in skb->protocol can be inconsistent based on whether VLAN acceleration is
      enabled. The commit quoted in the Fixes tag below fixed the users of
      skb->protocol to use a helper that will always see the VLAN ethertype.
      
      However, most of the callers don't actually handle the VLAN ethertype, but
      expect to find the IP header type in the protocol field. This means that
      things like changing the ECN field, or parsing diffserv values, stop
      working if there's a VLAN tag, or if there are multiple nested VLAN
      tags (QinQ).
      
      To fix this, change the helper to take an argument that indicates whether
      the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
      make sure to skip all of them, so behaviour is consistent even in QinQ
      mode.
      
      To make the helper usable from the ECN code, move it to if_vlan.h instead
      of pkt_sched.h.
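
      A sketch of what such a helper can look like, based on the description
      above (an approximation for illustration, not necessarily the exact code
      that ends up in if_vlan.h):

        static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
        {
                unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
                __be16 proto = skb->protocol;

                if (!skip_vlan)
                        /* VLAN acceleration strips the tag from the packet
                         * data, so report the VLAN ethertype here */
                        return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;

                while (eth_type_vlan(proto)) {
                        struct vlan_hdr vhdr, *vh;

                        vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
                        if (!vh)
                                break;

                        proto = vh->h_vlan_encapsulated_proto;
                        offset += sizeof(vhdr);
                }

                return proto;
        }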
      
      v3:
      - Remove empty lines
      - Move vlan variable definitions inside loop in skb_protocol()
      - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
        bpf_skb_ecn_set_ce()
      
      v2:
      - Use eth_type_vlan() helper in skb_protocol()
      - Also fix code that reads skb->protocol directly
      - Change a couple of 'if/else if' statements to switch constructs to avoid
        calling the helper twice
      Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
      Fixes: d8b9605d ("net: sched: fix skb->protocol use in case of accelerated vlan path")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 20 June 2020 (1 commit)
  12. 02 June 2020 (2 commits)
    • cls_flower: remove mpls_opts_policy · 4e4f4ce6
      Guillaume Nault authored
      Compiling with W=1 gives the following warning:
      net/sched/cls_flower.c:731:1: warning: ‘mpls_opts_policy’ defined but not used [-Wunused-const-variable=]
      
      The TCA_FLOWER_KEY_MPLS_OPTS attribute contains a list of
      TCA_FLOWER_KEY_MPLS_OPTS_LSE attributes. Since all entries in the list
      have the same type, we can't parse the list with nla_parse*() and have
      the attributes validated automatically against an nla_policy.
      
      fl_set_key_mpls_opts() properly verifies that all attributes in the
      list are TCA_FLOWER_KEY_MPLS_OPTS_LSE. Then fl_set_key_mpls_lse()
      uses nla_parse_nested() on all these attributes, thus verifying that
      they have the NLA_F_NESTED flag. So we can safely drop the
      mpls_opts_policy.
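
      For illustration, the hand-rolled walk of such a same-typed list looks
      roughly like this (a sketch with a made-up function name, not the exact
      fl_set_key_mpls_opts() code):

        static int walk_mpls_opts(struct nlattr *opts, struct netlink_ext_ack *extack)
        {
                struct nlattr *lse;
                int rem;

                nla_for_each_nested(lse, opts, rem) {
                        if (nla_type(lse) != TCA_FLOWER_KEY_MPLS_OPTS_LSE) {
                                NL_SET_ERR_MSG_ATTR(extack, lse,
                                                    "Invalid MPLS option type");
                                return -EINVAL;
                        }
                        /* each LSE is then parsed with nla_parse_nested(),
                         * which also enforces NLA_F_NESTED */
                }
                return 0;
        }
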
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: work around stack frame size warning · 0af413bd
      Arnd Bergmann authored
      The fl_flow_key structure is around 500 bytes, so having two of them
      on the stack in one function now exceeds the warning limit after an
      otherwise correct change:
      
      net/sched/cls_flower.c:298:12: error: stack frame size of 1056 bytes in function 'fl_classify' [-Werror,-Wframe-larger-than=]
      
      I suspect the fl_classify function could be reworked to only have one
      of them on the stack and modify it in place, but I could not work out
      how to do that.
      
      As a somewhat hacky workaround, move one of them into an out-of-line
      function to reduce its scope. This does not necessarily reduce the stack
      usage of the outer function, but at least the second copy is off the
      stack for most of that function and does not add to the stack usage of
      whatever is called from there.
      
      I now see 552 bytes of stack usage for fl_classify(), plus 528 bytes
      for fl_mask_lookup().
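
      The shape of the workaround is roughly the following (simplified sketch;
      the masking and lookup calls stand in for the existing cls_flower
      internals):

        static noinline_for_stack
        struct cls_fl_filter *fl_mask_lookup(struct fl_flow_mask *mask,
                                             struct fl_flow_key *key)
        {
                struct fl_flow_key mkey;  /* the second ~500 byte key lives here */

                fl_set_masked_key(&mkey, key, mask);    /* apply the mask */
                return fl_lookup(mask, &mkey, key);     /* hash-table lookup */
        }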
      
      Fixes: 58cff782 ("flow_dissector: Parse multiple MPLS Label Stack Entries")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 27 May 2020 (2 commits)
    • cls_flower: Support filtering on multiple MPLS Label Stack Entries · 61aec25a
      Guillaume Nault authored
      With struct flow_dissector_key_mpls now recording the first
      FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
      these LSEs independently.
      
      In order to avoid creating new netlink attributes for every possible
      depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
      that contains the list of LSEs to match. Each LSE is represented by
      another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
      the attributes representing the depth and the MPLS fields to match at
      this depth (label, TTL, etc.).
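
      As an illustration of the nesting (the per-LSE attribute names below are
      placeholders for this sketch, not necessarily the exact uapi names),
      dumping a single LSE could look like:

        static int dump_one_lse(struct sk_buff *skb, u8 depth, u32 label, u8 ttl)
        {
                struct nlattr *lse;

                lse = nla_nest_start(skb, TCA_FLOWER_KEY_MPLS_OPTS_LSE);
                if (!lse)
                        return -EMSGSIZE;
                if (nla_put_u8(skb, TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH, depth) ||
                    nla_put_u32(skb, TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL, label) ||
                    nla_put_u8(skb, TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL, ttl)) {
                        nla_nest_cancel(skb, lse);
                        return -EMSGSIZE;
                }
                nla_nest_end(skb, lse);
                return 0;
        }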
      
      For each MPLS field, the mask is always set to all-ones, as this is
      what the original API did. We could allow user configurable masks in
      the future if there is demand for more flexibility.
      
      The new API also allows specifying only an LSE depth. In that case,
      Flower only verifies that the MPLS label stack depth is greater than or
      equal to the provided depth (that is, an LSE exists at this depth).
      
      Filters that only match on one (or more) fields of the first LSE are
      dumped using the old netlink attributes, to avoid confusing user space
      programs that don't understand the new API.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Parse multiple MPLS Label Stack Entries · 58cff782
      Guillaume Nault authored
      The current MPLS dissector only parses the first MPLS Label Stack
      Entry (the second LSE can be parsed too, but only to set a key_id).
      
      This patch adds the possibility to parse several LSEs by making
      __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
      as the Bottom Of Stack bit hasn't been seen, up to a maximum of
      FLOW_DIS_MPLS_MAX entries.
      
      FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
      many practical purposes, without wasting too much space.
      
      To record the parsed values, flow_dissector_key_mpls is modified to
      store an array of stack entries, instead of just the values of the
      first one. A bit field, "used_lses", is also added to keep track of
      the LSEs that have been set. The objective is to avoid defining a
      new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.
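
      The resulting layout is roughly the following (a sketch; member names
      may differ slightly, and the per-LSE field widths simply follow the MPLS
      label stack entry format):

        #define FLOW_DIS_MPLS_MAX 7

        struct flow_dissector_mpls_lse {
                u32     mpls_ttl:8,
                        mpls_bos:1,
                        mpls_tc:3,
                        mpls_label:20;
        };

        struct flow_dissector_key_mpls {
                struct flow_dissector_mpls_lse  ls[FLOW_DIS_MPLS_MAX];
                u8                              used_lses;  /* one bit per populated LSE */
        };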
      
      TC flower is adapted for the new struct flow_dissector_key_mpls layout.
      Matching on several MPLS Label Stack Entries will be added in the next
      patch.
      
      The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
      mlx5's parse_tunnel() now verify that the rule only uses the first LSE
      and fail if it doesn't.
      
      Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
      slightly modified. Instead of recording the first Entropy Label, it
      now records the last one. This shouldn't have any consequences since
      there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
      in the tree. We'd probably better do a hash of all parsed MPLS labels
      instead (excluding reserved labels) anyway. That'd give better entropy
      and would probably also simplify the code. But that's not the purpose
      of this patch, so I'm keeping that as a future possible improvement.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 16 May 2020 (1 commit)
  15. 31 March 2020 (1 commit)
    • net: sched: expose HW stats types per action used by drivers · 93a129eb
      Jiri Pirko authored
      It may be up to the driver (in case ANY HW stats type is passed) to
      select which type of HW stats it is going to use. Add infrastructure to
      expose this information to the user.
      
      $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
      $ tc -s filter show dev enp3s0np1 ingress
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        dst_ip 192.168.1.1
        in_hw in_hw_count 2
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 10 sec used 10 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 27 March 2020 (3 commits)
  17. 18 February 2020 (2 commits)
  18. 14 February 2020 (1 commit)
  19. 27 January 2020 (1 commit)
    • net_sched: fix ops->bind_class() implementations · 2e24cd75
      Cong Wang authored
      The current implementations of ops->bind_class() merely
      search for the classid and update the class in struct tcf_result,
      without invoking either cl_ops->bind_tcf() or
      cl_ops->unbind_tcf(). This breaks their design, as qdiscs
      like cbq use them to count filters too. This is why syzbot triggered
      the warning in cbq_destroy_class().
      
      In order to fix this, we have to call cl_ops->bind_tcf() and
      cl_ops->unbind_tcf() like the filter binding path. This patch does
      so by refactoring out two helper functions __tcf_bind_filter()
      and __tcf_unbind_filter(), which are lockless and accept a Qdisc
      pointer, then teaching each implementation to call them correctly.
      
      Note that we merely pass the Qdisc pointer as an opaque pointer to
      each filter; the filters only need to pass it down to the helper
      functions without understanding it at all.
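
      A sketch of what the bind helper boils down to (simplified; the
      class-assignment step is shown through a hypothetical
      cls_set_class()-style helper):

        static inline void __tcf_bind_filter(struct Qdisc *q, struct tcf_result *r,
                                             unsigned long base)
        {
                unsigned long cl;

                /* let the qdisc's class ops account for the new binding ... */
                cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
                /* ... publish it in the tcf_result, fetching the old class ... */
                cl = cls_set_class(&r->class, cl);
                /* ... and drop the binding that was replaced, if any */
                if (cl)
                        q->ops->cl_ops->unbind_tcf(q, cl);
        }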
      
      Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
      Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 31 December 2019 (1 commit)
    • net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Davide Caratti authored
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantic of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, like it was done on older Linux kernels.
      This fixes situations where walk() wrongly detected a non-empty
      filter, as happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic into a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      that no IDRs have been allocated.
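
      For cls_flower that check can be as small as the following sketch
      (assuming the per-tp handle IDR and tp->lock described elsewhere in the
      driver; not necessarily the exact merged code):

        static bool fl_delete_empty(struct tcf_proto *tp)
        {
                struct cls_fl_head *head = fl_head_dereference(tp);

                spin_lock(&tp->lock);
                tp->deleting = idr_is_empty(&head->handle_idr);
                spin_unlock(&tp->lock);

                return tp->deleting;
        }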
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 10 December 2019 (1 commit)
  22. 04 December 2019 (1 commit)
    • cls_flower: Fix the behavior using port ranges with hw-offload · 8ffb055b
      Yoshiki Komachi authored
      The recent commit 5c72299f ("net: sched: cls_flower: Classify
      packets using port ranges") added filtering based on port ranges
      to tc flower. However, the commit missed the necessary changes in the
      hw-offload code, so the feature generated incorrect offloaded flow
      keys in the NIC.
      
      A more detailed example is below:
      
      $ tc qdisc add dev eth0 ingress
      $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
        dst_port 100-200 action drop
      
      With the setup above, an exact-match filter with dst_port == 0 will be
      installed in the NIC by hw-offload. In other words, the NIC will have a
      rule equivalent to the following one.
      
      $ tc qdisc add dev eth0 ingress
      $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
        dst_port 0 action drop
      
      The behavior was caused by the flow dissector, which extracts packet
      data into the flow key in tc flower. More specifically, regardless of
      whether an exact match or a port range was specified, fl_init_dissector()
      set the FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract
      port numbers from the skb in skb_flow_dissect(), called by fl_classify().
      Note that device drivers received the same struct flow_dissector object
      as used in skb_flow_dissect(). Thus, offload drivers could not identify
      which kind of match was used, because the FLOW_DISSECTOR_KEY_PORTS flag
      was set in struct flow_dissector in either case.
      
      This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
      tp_range field in struct fl_flow_key so that drivers can recognize which
      kind of filter is being offloaded. For now, when filters based on port
      ranges are passed to drivers, the drivers return EOPNOTSUPP because they
      do not support the feature (the newly created FLOW_DISSECTOR_KEY_PORTS_RANGE
      flag).
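
      A driver-side sketch of that rejection (the function name is
      illustrative; flow_rule_match_key()/flow_rule_match_ports() are the
      existing flow_offload helpers):

        static int example_parse_ports(struct flow_rule *rule)
        {
                struct flow_match_ports match;

                if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS_RANGE))
                        return -EOPNOTSUPP;     /* port ranges not supported in HW */

                if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS)) {
                        flow_rule_match_ports(rule, &match);
                        /* program the exact src/dst port match into HW here */
                }
                return 0;
        }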
      
      Fixes: 5c72299f ("net: sched: cls_flower: Classify packets using port ranges")
      Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 22 November 2019 (2 commits)
    • net: sched: allow flower to match erspan options · 79b1011c
      Xin Long authored
      This patch allows matching options in erspan.
      
      The options can be described in the form:
      VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK.
      When ver is set to 1, index will be applied while dir
      and hwid will be ignored, and when ver is set to 2,
      dir and hwid will be used while index will be ignored.
      
      Unlike geneve, only one option can be set. Also, geneve
      options, vxlan options and erspan options can't be set at the
      same time.
      
        # ip link add name erspan1 type erspan external
        # tc qdisc add dev erspan1 ingress
        # tc filter add dev erspan1 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              erspan_opts 1:12:0:0/1:ffff:0:0 \
              ip_proto udp \
              action mirred egress redirect dev eth0
      
      v1->v2:
        - improve some err msgs of extack.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: allow flower to match vxlan options · d8f9dfae
      Xin Long authored
      This patch allows matching the gbp option in vxlan.
      
      The options can be described in the form GBP/GBP_MASK,
      where GBP is represented as a 32bit hexadecimal value.
      Unlike geneve, only one option can be set. Also, geneve
      options and vxlan options can't be set at the same time.
      
        # ip link add name vxlan0 type vxlan dstport 0 external
        # tc qdisc add dev vxlan0 ingress
        # tc filter add dev vxlan0 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              vxlan_opts 01020304/ffffffff \
              ip_proto udp \
              action mirred egress redirect dev eth0
      
      v1->v2:
        - add .strict_start_type for enc_opts_policy as Jakub noticed.
        - use Duplicate instead of Wrong in err msg for extack as Jakub
          suggested.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 27 August 2019 (5 commits)
    • net: sched: flower: don't take rtnl lock for cls hw offloads API · 918190f5
      Vlad Buslov authored
      Don't manually take rtnl lock in flower classifier before calling cls
      hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held'
      parameter.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: take reference to action dev before calling offloads · 5a6ff4b1
      Vlad Buslov authored
      In order to remove dependency on rtnl lock when calling hardware offload
      API, take reference to action mirred dev when initializing flow_action
      structure in tc_setup_flow_action(). Implement the function
      tc_cleanup_flow_action() and use it to release the device after the
      hardware offload API is done using it.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: take rtnl lock in tc_setup_flow_action() · 9838b20a
      Vlad Buslov authored
      In order to allow using new flow_action infrastructure from unlocked
      classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
      argument. Take rtnl lock before accessing tc_action data. This is necessary
      to protect from concurrent action replace.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: notify classifier on successful offload add/delete · a449a3e7
      Vlad Buslov authored
      To remove dependency on rtnl lock, extend classifier ops with new
      ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while
      holding cb_lock every time a filter is successfully added to or deleted
      from hardware.
      
      Implement the new API in flower classifier. Use it to manage hw_filters
      list under cb_lock protection, instead of relying on rtnl lock to
      synchronize with concurrent fl_reoffload() call.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: refactor block offloads counter usage · 40119211
      Vlad Buslov authored
      Without rtnl lock protection filters can no longer safely manage block
      offloads counter themselves. Refactor cls API to protect block offloadcnt
      with tcf_block->cb_lock that is already used to protect driver callback
      list and the nooffloaddevcnt counter. The counter can be modified by
      concurrent tasks through the new functions that execute block callbacks
      (which is safe with the previous patch that changed its type to
      atomic_t); however, block bind/unbind code that checks the counter value
      takes cb_lock in write mode to exclude any concurrent modifications.
      This approach prevents race conditions between bind/unbind and callback
      execution code but allows concurrency on the tc rule update path.
      
      Move block offload counter, filter in hardware counter and filter flags
      management from classifiers into the cls hardware offloads API. Make the
      functions tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update()
      private to the cls API. Implement the following new cls API to be used
      instead:
      
        tc_setup_cb_add() - non-destructive filter add. If a filter that wasn't
        already in hardware is successfully offloaded, increment block offloads
        counter, set filter in hardware counter and flag. On failure, previously
        offloaded filter is considered to be intact and offloads counter is not
        decremented.
      
        tc_setup_cb_replace() - destructive filter replace. Release existing
        filter block offload counter and reset its in hardware counter and flag.
        Set new filter in hardware counter and flag. On failure, previously
        offloaded filter is considered to be destroyed and offload counter is
        decremented.
      
        tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
        offloads counter.
      
        tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
        call tc_cls_offload_cnt_update() if cb() didn't return an error.
      
      Refactor all offload-capable classifiers to atomically offload filters to
      hardware, change block offload counter, and set filter in hardware counter
      and flag by means of the new cls API functions.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 20 July 2019 (1 commit)
  26. 10 July 2019 (2 commits)
  27. 02 July 2019 (1 commit)
  28. 19 June 2019 (1 commit)