1. 20 Sep, 2022 1 commit
  2. 27 Jul, 2022 1 commit
  3. 29 Jun, 2022 1 commit
  4. 20 Apr, 2022 1 commit
  5. 12 Mar, 2022 1 commit
    • net/sched: Allow flower to match on GTP options · e3acda7a
      Wojciech Drewek committed
      Options are as follows: PDU_TYPE:QFI and they refer to
      the fields of the PDU Session Protocol. PDU Session data
      is conveyed in the GTP-U Extension Header.
      
      GTP-U Extension Header is described in 3GPP TS 29.281.
      PDU Session Protocol is described in 3GPP TS 38.415.
      
      PDU_TYPE -  indicates the type of the PDU Session Information (4 bits)
      QFI      -  QoS Flow Identifier (6 bits)
      
        # ip link add gtp_dev type gtp role sgsn
        # tc qdisc add dev gtp_dev ingress
        # tc filter add dev gtp_dev protocol ip parent ffff: \
            flower \
              enc_key_id 11 \
              gtp_opts 1:8/ff:ff \
            action mirred egress redirect dev eth0
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
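The `PDU_TYPE:QFI/MASK:MASK` form used by `gtp_opts` above can be illustrated with a small user-space sketch. It is not the kernel or iproute2 code: the helper names are hypothetical, and it assumes every field is written in hexadecimal and that a missing mask means an exact match.

```python
def parse_gtp_opts(text):
    """Parse 'PDU_TYPE:QFI[/PDU_TYPE_MASK:QFI_MASK]' (hex fields assumed)."""
    value, _, mask = text.partition("/")
    pdu_type, qfi = (int(field, 16) for field in value.split(":"))
    if mask:
        pdu_mask, qfi_mask = (int(field, 16) for field in mask.split(":"))
    else:
        # Assumption: no mask given means "match all bits".
        pdu_mask, qfi_mask = 0xFF, 0xFF
    return (pdu_type, qfi), (pdu_mask, qfi_mask)


def gtp_opts_match(key, mask, pkt_pdu_type, pkt_qfi):
    """Masked comparison of the packet's PDU type and QFI against the key."""
    packet = (pkt_pdu_type, pkt_qfi)
    return all((p & m) == (k & m) for k, m, p in zip(key, mask, packet))


key, mask = parse_gtp_opts("1:8/ff:ff")
```

With the filter from the example, a packet carrying PDU type 1 and QFI 8 matches, while any other PDU type does not.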
  6. 19 Dec, 2021 1 commit
  7. 02 Aug, 2021 1 commit
    • net_sched: refactor TC action init API · 695176bf
      Cong Wang committed
      The TC action ->init() API has 10 parameters, which makes it
      hard to read. Some of them are just booleans and can be replaced
      by flags. The same applies to the internal APIs tcf_action_init()
      and tcf_exts_validate().
      
      This patch converts them to flags and folds them into
      the upper 16 bits of "flags", whose lower 16 bits are still
      reserved for user-space. More specifically, the following
      kernel flags are introduced:
      
      TCA_ACT_FLAGS_POLICE replaces 'name' in a few contexts, to
      distinguish whether it is compatible with policer.
      
      TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
      this action is bound to a filter.
      
      TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, and
      means we are replacing an existing action.
      
      TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
      opposite meaning, because we still hold RTNL in most
      cases.
      
      The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
      untouched and still stored as before.
      
      I have tested this patch with tdc and I do not see any
      failure related to this patch.
      Tested-by: Vlad Buslov <vladbu@nvidia.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
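The folding of the old boolean parameters into one "flags" word can be sketched as follows. The flag names come from the commit message; the exact bit positions and the helper `legacy_args` are assumptions made for illustration, not a copy of the kernel headers.

```python
# Lower 16 bits: user-space flags. Upper 16 bits: kernel-internal flags.
TCA_ACT_FLAGS_USER_BITS = 16
TCA_ACT_FLAGS_USER_MASK = (1 << TCA_ACT_FLAGS_USER_BITS) - 1

# Bit positions below are assumed for this sketch.
TCA_ACT_FLAGS_NO_PERCPU_STATS = 1 << 0                        # user-space flag
TCA_ACT_FLAGS_POLICE = 1 << (TCA_ACT_FLAGS_USER_BITS + 0)
TCA_ACT_FLAGS_BIND = 1 << (TCA_ACT_FLAGS_USER_BITS + 1)
TCA_ACT_FLAGS_REPLACE = 1 << (TCA_ACT_FLAGS_USER_BITS + 2)
TCA_ACT_FLAGS_NO_RTNL = 1 << (TCA_ACT_FLAGS_USER_BITS + 3)


def legacy_args(flags):
    """Recover the old boolean parameters from a combined flags word."""
    return {
        "bind": bool(flags & TCA_ACT_FLAGS_BIND),
        "ovr": bool(flags & TCA_ACT_FLAGS_REPLACE),
        # NO_RTNL has the opposite meaning of the old 'rtnl_held'.
        "rtnl_held": not (flags & TCA_ACT_FLAGS_NO_RTNL),
        "user_flags": flags & TCA_ACT_FLAGS_USER_MASK,
    }


flags = TCA_ACT_FLAGS_BIND | TCA_ACT_FLAGS_NO_PERCPU_STATS
args = legacy_args(flags)
```

Because RTNL is held in most cases, the common case needs no extra bit: an unset TCA_ACT_FLAGS_NO_RTNL reads back as `rtnl_held == True`.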
  8. 14 Mar, 2021 1 commit
  9. 11 Feb, 2021 1 commit
  10. 30 Jan, 2021 1 commit
  11. 21 Jan, 2021 1 commit
  12. 25 Jul, 2020 1 commit
  13. 27 May, 2020 1 commit
    • cls_flower: Support filtering on multiple MPLS Label Stack Entries · 61aec25a
      Guillaume Nault committed
      With struct flow_dissector_key_mpls now recording the first
      FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
      these LSEs independently.
      
      In order to avoid creating new netlink attributes for every possible
      depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
      that contains the list of LSEs to match. Each LSE is represented by
      another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
      the attributes representing the depth and the MPLS fields to match at
      this depth (label, TTL, etc.).
      
      For each MPLS field, the mask is always set to all-ones, as this is
      what the original API did. We could allow user configurable masks in
      the future if there is demand for more flexibility.
      
      The new API also allows specifying only an LSE depth. In that case,
      Flower only verifies that the MPLS label stack depth is greater than or
      equal to the provided depth (that is, an LSE exists at this depth).
      
      Filters that only match on one (or more) fields of the first LSE are
      dumped using the old netlink attributes, to avoid confusing user space
      programs that don't understand the new API.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
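The per-depth matching described above can be modelled in a few lines. This is a simplified sketch, not the Flower implementation: the dictionary layout and the function name `mpls_match` are hypothetical, depths are assumed to be 1-based, and every specified field is compared exactly (all-ones mask).

```python
def mpls_match(lse_filters, label_stack):
    """Return True if every LSE filter matches the packet's label stack.

    lse_filters: list of dicts with a 1-based "depth" and optional fields
                 ("label", "ttl", "tc", "bos") to compare exactly.
    label_stack: list of dicts, outermost label stack entry first.
    """
    for lse in lse_filters:
        depth = lse["depth"]
        # A depth-only filter just checks that an LSE exists at this depth.
        if depth > len(label_stack):
            return False
        entry = label_stack[depth - 1]
        for field, wanted in lse.items():
            if field != "depth" and entry.get(field) != wanted:
                return False
    return True


stack = [
    {"label": 100, "ttl": 64, "tc": 0, "bos": 0},
    {"label": 200, "ttl": 64, "tc": 0, "bos": 1},
]
```

For the two-entry stack above, a filter on `{"depth": 2, "label": 200}` matches, while a depth-only filter on depth 3 does not.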
  14. 02 May, 2020 1 commit
    • net: qos: introduce a gate control flow action · a51c328d
      Po Liu committed
      Introduce an ingress frame gate control flow action.
      The tc gate action works like this:
      Assume there is a gate that lets specified ingress frames pass during
      certain time slots and drops them during others. The tc filter
      selects the ingress frames, and the tc gate action specifies in which
      time slots these frames are passed to the device and in which time
      slots they are dropped.
      The tc gate action provides an entry list that tells how long the gate
      stays open and how long it stays closed. The gate action also
      assigns a start time that tells when the entry list starts. The driver
      then repeats the gate entry list cyclically.
      For the software simulation, the gate action requires the user to
      assign a clock type.
      
      Below is an example of the user-space configuration. The tc filter
      selects the stream whose source IP address is 192.168.0.20, and the gate
      action has two time slots: one lasting 200ms with the gate open, letting
      frames pass, and one lasting 100ms with the gate closed, dropping
      frames. Once the ingress frames in the 200000000ns time slot total more
      than 8000000 bytes, the excess frames are dropped.
      
      > tc qdisc add dev eth0 ingress
      
      > tc filter add dev eth0 parent ffff: protocol ip \
      	   flower src_ip 192.168.0.20 \
      	   action gate index 2 clockid CLOCK_TAI \
      	   sched-entry open 200000000 -1 8000000 \
      	   sched-entry close 100000000 -1 -1
      
      > tc chain del dev eth0 ingress chain 0
      
      "sched-entry" follows the taprio naming style. The gate state is
      "open" or "close", followed by the period in nanoseconds. The next item
      is the internal priority value, which selects the ingress queue the
      frames are put on; "-1" means wildcard. The last value is optional and
      specifies the maximum number of MSDU octets that are permitted to pass
      the gate during the specified time interval.
      If base-time is not set, it defaults to 0; as a result, the start time is
      ((N + 1) * cycletime), the earliest such value that lies in the future.
      
      The example below filters a stream whose destination MAC address is
      10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
      action. The gate action runs with a single close time slot, which means
      the gate always stays closed. The total cycle time is 200000000ns. The
      start time is calculated from the base-time as:
      
       1357000000000 + (N + 1) * cycletime
      
      The first such value that lies in the future becomes the start time.
      The cycletime in this case is 200000000ns.
      
      > tc filter add dev eth0 parent ffff:  protocol ip \
      	   flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
      	   action gate index 12 base-time 1357000000000 \
      	   sched-entry close 200000000 -1 -1 \
      	   clockid CLOCK_TAI
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
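The start-time rule and the cyclic entry list described above can be worked through with a short sketch. The function names are hypothetical, and returning the base-time unchanged when it already lies in the future is an assumption of this sketch rather than something the commit message states.

```python
def gate_start_time(base_time, cycletime, now):
    """Earliest base_time + (N + 1) * cycletime that lies after 'now' (ns)."""
    if base_time > now:
        # Assumption: a base-time already in the future is used as-is.
        return base_time
    n = (now - base_time) // cycletime
    return base_time + (n + 1) * cycletime


def gate_state_at(entries, start_time, t):
    """Gate state at time t for a cyclically repeated (state, duration) list."""
    cycletime = sum(duration for _, duration in entries)
    offset = (t - start_time) % cycletime
    for state, duration in entries:
        if offset < duration:
            return state
        offset -= duration


# Entry list from the first example: 200ms open, then 100ms closed.
entries = [("open", 200_000_000), ("close", 100_000_000)]
start = gate_start_time(0, 300_000_000, 1_000_000_000)
```

With base-time 0, a 300000000ns cycle and a current time of 1000000000ns, N is 3 and the list starts at 1200000000ns.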
  15. 31 Mar, 2020 1 commit
    • net: sched: expose HW stats types per action used by drivers · 93a129eb
      Jiri Pirko committed
      It may be up to the driver (in case ANY HW stats is passed) to select
      which type of HW stats it is going to use. Add infrastructure to
      expose this information to the user.
      
      $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
      $ tc -s filter show dev enp3s0np1 ingress
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        dst_ip 192.168.1.1
        in_hw in_hw_count 2
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 10 sec used 10 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 24 Mar, 2020 1 commit
  17. 09 Mar, 2020 1 commit
  18. 22 Nov, 2019 2 commits
    • net: sched: allow flower to match erspan options · 79b1011c
      Xin Long committed
      This patch is to allow matching options in erspan.
      
      The options can be described in the form:
      VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK.
      When ver is set to 1, index will be applied while dir
      and hwid will be ignored, and when ver is set to 2,
      dir and hwid will be used while index will be ignored.
      
      Unlike geneve, only one option can be set. Also,
      geneve options, vxlan options and erspan options
      can't be set at the same time.
      
        # ip link add name erspan1 type erspan external
        # tc qdisc add dev erspan1 ingress
        # tc filter add dev erspan1 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              erspan_opts 1:12:0:0/1:ffff:0:0 \
              ip_proto udp \
              action mirred egress redirect dev eth0
      
      v1->v2:
        - improve some err msgs of extack.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
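The version-dependent handling of the erspan fields can be sketched as below. This works on already-parsed integers and is only a model of the rule stated in the commit message (ver 1 uses index, ver 2 uses dir and hwid); the function name and dictionary layout are hypothetical, and comparing the version exactly is an assumption based on the option format repeating VER in the mask part.

```python
def erspan_opts_match(flt, mask, pkt):
    """Match erspan metadata against a filter, honouring the version rule.

    flt, pkt: dicts with "ver", "index", "dir", "hwid".
    mask:     dict with "index", "dir", "hwid" masks.
    """
    def field_eq(name):
        return (pkt[name] & mask[name]) == (flt[name] & mask[name])

    # Assumption: the version itself is compared exactly.
    if pkt["ver"] != flt["ver"]:
        return False
    if flt["ver"] == 1:
        return field_eq("index")                    # dir and hwid are ignored
    if flt["ver"] == 2:
        return field_eq("dir") and field_eq("hwid")  # index is ignored
    return False


flt = {"ver": 1, "index": 0x12, "dir": 0, "hwid": 0}
mask = {"index": 0xFFFF, "dir": 0, "hwid": 0}
```

A version 1 packet with index 0x12 matches this filter whatever its dir and hwid values are.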
    • net: sched: allow flower to match vxlan options · d8f9dfae
      Xin Long committed
      This patch is to allow matching gbp option in vxlan.
      
      The options can be described in the form GBP/GBP_MASK,
      where GBP is represented as a 32bit hexadecimal value.
      Unlike geneve, only one option can be set. Also,
      geneve options and vxlan options can't be set at
      the same time.
      
        # ip link add name vxlan0 type vxlan dstport 0 external
        # tc qdisc add dev vxlan0 ingress
        # tc filter add dev vxlan0 protocol ip parent ffff: \
            flower \
              enc_src_ip 10.0.99.192 \
              enc_dst_ip 10.0.99.193 \
              enc_key_id 11 \
              vxlan_opts 01020304/ffffffff \
              ip_proto udp \
              action mirred egress redirect dev eth0
      
      v1->v2:
        - add .strict_start_type for enc_opts_policy as Jakub noticed.
        - use Duplicate instead of Wrong in err msg for extack as Jakub
          suggested.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 31 Oct, 2019 1 commit
  20. 06 Sep, 2019 1 commit
  21. 10 Jul, 2019 2 commits
    • net/sched: cls_flower: Add matching on conntrack info · e0ace68a
      Paul Blakey committed
      New matches for conntrack mark, label, zone, and state.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: Introduce action ct · b57dc7c1
      Paul Blakey committed
      Allow sending a packet to conntrack module for connection tracking.
      
      The packet will be marked with the conntrack connection's state, and
      any metadata such as conntrack mark and label. This state metadata
      can later be matched against with tc classifiers, for example with the
      flower classifier as below.
      
      In addition to committing new connections the user can optionally
      specify a zone to track within, set a mark/label and configure nat
      with an address range and port range.
      
      Usage is as follows:
      $ tc qdisc add dev ens1f0_0 ingress
      $ tc qdisc add dev ens1f0_1 ingress
      
      $ tc filter add dev ens1f0_0 ingress \
        prio 1 chain 0 proto ip \
        flower ip_proto tcp ct_state -trk \
        action ct zone 2 pipe \
        action goto chain 2
      $ tc filter add dev ens1f0_0 ingress \
        prio 1 chain 2 proto ip \
        flower ct_state +trk+new \
        action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
        action mirred egress redirect dev ens1f0_1
      $ tc filter add dev ens1f0_0 ingress \
        prio 1 chain 2 proto ip \
        flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
        action ct nat pipe \
        action mirred egress redirect dev ens1f0_1
      
      $ tc filter add dev ens1f0_1 ingress \
        prio 1 chain 0 proto ip \
        flower ip_proto tcp ct_state -trk \
        action ct zone 2 pipe \
        action goto chain 1
      $ tc filter add dev ens1f0_1 ingress \
        prio 1 chain 1 proto ip \
        flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
        action ct nat pipe \
        action mirred egress redirect dev ens1f0_0
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      
      Changelog:
      V5->V6:
      	Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case
      V4->V5:
      	Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached()
      V3->V4:
      	Added strict_start_type for act_ct policy
      V2->V3:
      	Fixed David's comments: removed extra newline after rcu in tcf_ct_params, and indent of break in act_ct.c
      V1->V2:
      	Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max
      	Refactored NAT_PORT_MIN_MAX range handling as well
      	Added ipv4/ipv6 defragmentation
      	Removed extra skb pull push of nw offset in execute nat
      	Refactored tcf_ct_skb_network_trim after pull
      	Removed TCA_ACT_CT define
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 09 Jul, 2019 1 commit
  23. 16 Jun, 2019 1 commit
  24. 30 May, 2019 1 commit
    • net: sched: Introduce act_ctinfo action · 24ec483c
      Kevin 'ldir' Darbyshire-Bryant committed
      ctinfo is a new tc filter action module.  It is designed to restore
      information contained in firewall conntrack marks to other packet fields
      and is typically used on packet ingress paths.  At present it has two
      independent sub-functions or operating modes, DSCP restoration mode &
      skb mark restoration mode.
      
      The DSCP restore mode:
      
      This mode copies DSCP values that have been placed in the firewall
      conntrack mark back into the IPv4/v6 diffserv fields of relevant
      packets.
      
      The DSCP restoration has been found useful for restoring ingress
      classifications based on egress classifications across links that
      bleach or otherwise change DSCP, typically home ISP Internet
      links.  Restoring DSCP on ingress on the WAN link allows qdiscs such as
      CAKE (but by no means limited to it) to shape inbound packets according
      to policies that are easier to set & mark on egress.
      
      Ingress classification is traditionally a challenging task since
      iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
      lookups, hence are unable to see internal IPv4 addresses as used on the
      typical home masquerading gateway.  Thus marking the connection in some
      manner on egress for later restoration of classification on ingress is
      easier to implement.
      
      Parameters related to DSCP restore mode:
      
      dscpmask - a 32 bit mask of 6 contiguous bits that indicates which bits
      of the conntrack mark field contain the DSCP value to be restored.
      
      statemask - a 32 bit mask of (usually) 1 bit length, outside the area
      specified by dscpmask.  This represents a conditional operation flag
      whereby the DSCP is only restored if the flag is set.  This is useful to
      implement a 'one shot' iptables based classification where the
      'complicated' iptables rules are only run once to classify the
      connection on initial (egress) packet and subsequent packets are all
      marked/restored with the same DSCP.  A mask of zero disables the
      conditional behaviour, i.e. the conntrack mark DSCP bits are always
      restored to the ip diffserv field (assuming the conntrack entry is found
      & the skb is an ipv4/ipv6 type).
      
      e.g. dscpmask 0xfc000000 statemask 0x01000000
      
      |----0xFC----conntrack mark----000000---|
      | Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
      | DSCP       | unused | flag  |unused   |
      |-----------------------0x01---000000---|
            |                   |
            |                   |
            ---|             Conditional flag
               v             only restore if set
      |-ip diffserv-|
      | 6 bits      |
      |-------------|
      
      The skb mark restore mode (cpmark):
      
      This mode copies the firewall conntrack mark to the skb's mark field.
      It is completely the functional equivalent of the existing act_connmark
      action with the additional feature of being able to apply a mask to the
      restored value.
      
      Parameters related to skb mark restore mode:
      
      mask - a 32 bit mask applied to the firewall conntrack mark to mask out
      bits unwanted for restoration.  This can be useful where the conntrack
      mark is being used for different purposes by different applications.  If
      not specified, the whole mark field is copied by default (i.e.
      default mask of 0xffffffff).
      
      e.g. mask 0x00ffffff to mask out the top 8 bits being used by the
      aforementioned DSCP restore mode.
      
      |----0x00----conntrack mark----ffffff---|
      | Bits 31-24 |                          |
      | DSCP & flag|      some value here     |
      |---------------------------------------|
      			|
      			|
      			v
      |------------skb mark-------------------|
      |            |                          |
      |  zeroed    |                          |
      |---------------------------------------|
      
      Overall parameters:
      
      zone - conntrack zone
      
      control - action related control (reclassify | pipe | drop | continue |
      ok | goto chain <CHAIN_INDEX>)
      Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
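The two ctinfo modes reduce to a little bit arithmetic, sketched here with the masks from the examples above (dscpmask 0xfc000000, statemask 0x01000000, mask 0x00ffffff). The function names are hypothetical, and the sketch assumes the restored DSCP replaces the upper six bits of the diffserv byte while the two ECN bits are left untouched.

```python
def restore_dscp(ct_mark, dscpmask, statemask, old_tos):
    """Return the new diffserv byte after DSCP restore mode."""
    # A non-zero statemask makes the restore conditional on that flag bit.
    if statemask and not (ct_mark & statemask):
        return old_tos
    shift = (dscpmask & -dscpmask).bit_length() - 1   # lowest set bit of mask
    dscp = (ct_mark & dscpmask) >> shift
    # Assumption: keep the two ECN bits of the existing diffserv byte.
    return (dscp << 2) | (old_tos & 0x3)


def restore_mark(ct_mark, mask=0xFFFFFFFF):
    """skb mark restore mode (cpmark): copy the masked conntrack mark."""
    return ct_mark & mask


# DSCP 46 in bits 31-26, flag bit 24 set, low bits used by another application.
ct_mark = (46 << 26) | 0x01000000 | 0x1234
new_tos = restore_dscp(ct_mark, 0xFC000000, 0x01000000, old_tos=0x01)
skb_mark = restore_mark(ct_mark, 0x00FFFFFF)
```

Here the conditional flag is set, so DSCP 46 is written back, and the cpmark mask clears the top 8 bits used by the DSCP restore mode.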
  25. 11 Feb, 2019 2 commits
  26. 20 Jan, 2019 1 commit
    • net_sched: add performance counters for basic filter · 5954894b
      Cong Wang committed
      Similar to u32 filter, it is useful to know how many times
      we reach each basic filter and how many times we pass the
      ematch attached to it.
      
      Sample output:
      
      filter protocol arp pref 49152 basic chain 0
      filter protocol arp pref 49152 basic chain 0 handle 0x1  (rule hit 3 success 3)
      	action order 1: gact action pass
      	 random type none pass val 0
      	 index 1 ref 1 bind 1 installed 81 sec used 4 sec
      	Action statistics:
      	Sent 126 bytes 3 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 19 Jan, 2019 1 commit
    • net_sched: add hit counter for matchall · f88c19aa
      Cong Wang committed
      Although matchall always matches packets, it still
      relies on a protocol match first. So it is still useful to have
      such a counter for matchall. Of course, unlike u32, every time
      we hit a matchall filter, it is always a success, so we don't
      have to distinguish them.
      
      Sample output:
      
      filter protocol 802.1Q pref 100 matchall chain 0
      filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1
        not_in_hw (rule hit 10)
      	action order 1: vlan  pop continue
      	 index 1 ref 1 bind 1 installed 40 sec used 1 sec
      	Action statistics:
      	Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 16 Nov, 2018 1 commit
    • net: sched: cls_flower: Classify packets using port ranges · 5c72299f
      Amritha Nambiar committed
      Added support in tc flower for filtering based on port ranges.
      
      Example:
      1. Match on a port range:
      -------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
        action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0
      filter protocol ip pref 1 flower chain 0 handle 0x1
        eth_type ipv4
        ip_proto tcp
        dst_port range 20-30
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 1 ref 1 bind 1 installed 85 sec used 3 sec
              Action statistics:
              Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
      
      2. Match on IP address and port range:
      --------------------------------------
      $ tc filter add dev enp4s0 protocol ip parent ffff:\
        prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
        skip_hw action drop
      
      $ tc -s filter show dev enp4s0 parent ffff:
      filter protocol ip pref 1 flower chain 0 handle 0x2
        eth_type ipv4
        ip_proto tcp
        dst_ip 192.168.1.1
        dst_port range 100-200
        skip_hw
        not_in_hw
              action order 1: gact action drop
               random type none pass val 0
               index 2 ref 1 bind 1 installed 58 sec used 2 sec
              Action statistics:
              Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
      
      v4:
      1. Added condition before setting port key.
      2. Organized setting and dumping port range keys into functions
         and added validation of input range.
      
      v3:
      1. Moved new fields in UAPI enum to the end of enum.
      2. Removed couple of empty lines.
      
      v2:
      Addressed Jiri's comments:
      1. Added separate functions for dst and src comparisons.
      2. Removed endpoint enum.
      3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
        lookup.
      4. Cleaned up fl_lookup function.
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
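The port range match and the input validation mentioned in the v4 notes can be sketched as follows. The helper names are hypothetical; treating both range ends as inclusive and requiring the minimum to be strictly below the maximum are assumptions of this sketch.

```python
def make_port_range(text):
    """Parse 'MIN-MAX' as used by 'dst_port range 20-30' and validate it."""
    low, high = (int(part) for part in text.split("-"))
    # Assumption: a valid range has 0 < MIN < MAX <= 65535.
    if not (0 < low < high <= 65535):
        raise ValueError(f"invalid port range: {text}")
    return low, high


def port_in_range(port, port_range):
    """True if the port lies within the range, both ends included."""
    low, high = port_range
    return low <= port <= high


dst_range = make_port_range("20-30")
```

With the range from the first example, ports 20 through 30 are dropped and port 31 falls through to the next filter.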
  29. 11 Sep, 2018 1 commit
  30. 08 Aug, 2018 1 commit
    • net/sched: allow flower to match tunnel options · 0a6e7778
      Pieter Jansen van Vuuren committed
      Allow matching on options in Geneve tunnel headers.
      This makes use of existing tunnel metadata support.
      
      The options can be described in the form
      CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is
      represented as a 16bit hexadecimal value, TYPE as an 8bit
      hexadecimal value and DATA as a variable length hexadecimal value.
      
      e.g.
       # ip link add name geneve0 type geneve dstport 0 external
       # tc qdisc add dev geneve0 ingress
       # tc filter add dev geneve0 protocol ip parent ffff: \
           flower \
             enc_src_ip 10.0.99.192 \
             enc_dst_ip 10.0.99.193 \
             enc_key_id 11 \
             geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \
             ip_proto udp \
             action mirred egress redirect dev eth1
      
      This patch adds support for matching Geneve options in the order
      supplied by the user. This leads to an efficient implementation in
      the software datapath (and in our opinion hardware datapaths that
      offload this feature). It is also compatible with Geneve options
      matching provided by the Open vSwitch kernel datapath which is
      relevant here as the Flower classifier may be used as a mechanism
      to program flows into hardware as a form of Open vSwitch datapath
      offload (sometimes referred to as OVS-TC). The netlink
      Kernel/Userspace API may be extended, for example by adding a flag,
      if other matching options are desired, for example matching given
      options in any order. This would require an implementation in the
      TC software datapath, and would have to be done in a way that lets
      drivers that facilitate offload of the Flower classifier reject or
      accept such flows based on hardware datapath capabilities.
      
      This approach was discussed and agreed on at Netconf 2017 in Seoul.
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
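The `CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK` form and the "match in the order supplied" rule can be illustrated with a user-space model. It is not the kernel datapath: the function names are hypothetical, and requiring the packet to carry the same number of options, each with the same data length as the filter, is a simplifying assumption.

```python
def parse_geneve_opt(text):
    """Parse one 'CLASS:TYPE:DATA[/CLASS_MASK:TYPE_MASK:DATA_MASK]' option."""
    value, _, mask = text.partition("/")
    cls, typ, data = value.split(":")
    key = (int(cls, 16), int(typ, 16), bytes.fromhex(data))
    if mask:
        mcls, mtyp, mdata = mask.split(":")
        msk = (int(mcls, 16), int(mtyp, 16), bytes.fromhex(mdata))
    else:
        # Assumption: no mask given means an exact match.
        msk = (0xFFFF, 0xFF, b"\xff" * len(key[2]))
    return key, msk


def _masked(value, mask):
    if isinstance(value, bytes):
        return bytes(v & m for v, m in zip(value, mask))
    return value & mask


def geneve_opts_match(filter_opts, packet_opts):
    """Compare options position by position, in the order the user gave them."""
    if len(filter_opts) != len(packet_opts):
        return False
    for (key, mask), pkt in zip(filter_opts, packet_opts):
        if len(pkt[2]) != len(key[2]):
            return False
        if any(_masked(p, m) != _masked(k, m) for k, m, p in zip(key, mask, pkt)):
            return False
    return True


opt = parse_geneve_opt("0102:80:1122334421314151/ffff:ff:ffffffffffffffff")
```

Because the comparison is positional, a packet that carries the same options in a different order does not match, which is the behaviour the commit message describes.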
  31. 31 Jul, 2018 1 commit
    • net/sched: user-space can't set unknown tcfa_action values · 802bfb19
      Paolo Abeni committed
      Currently, when initializing an action, user-space can specify
      and use arbitrary values for the tcfa_action field. If the value
      is unknown to the kernel, it is implicitly treated as TC_ACT_UNSPEC.
      
      This change explicitly checks for unknown values at action creation
      time, and explicitly converts them to TC_ACT_UNSPEC. No functional
      changes are introduced, but this will allow introducing tcfa_action
      values not exposed to user-space in a later patch.
      
      Note: we can't use the above to hide TC_ACT_REDIRECT from user-space,
      as the latter is already part of uAPI.
      
      v3 -> v4:
       - use a helper to check for action validity (JiriP)
       - emit an extack for invalid actions (JiriP)
      v4 -> v5:
       - keep messages on a single line, drop net_warn (Marcelo)
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 20 Jul, 2018 1 commit
  33. 07 Jul, 2018 1 commit
  34. 15 May, 2018 1 commit
    • sched: cls: enable verbose logging · 81c7288b
      Marcelo Ricardo Leitner committed
      Currently, when the rule is not to be exclusively executed by the
      hardware, extack is not passed along and offloading failures don't
      get logged. The idea was that hardware failures are okay because the
      rule will get executed in software then, and this way it doesn't confuse
      unaware users.
      
      But this is not helpful in case one needs to understand why a certain
      rule failed to get offloaded. Considering it may have been a temporary
      failure, like resources exceeded or so, reproducing it later and knowing
      that it is triggering the same reason may be challenging.
      
      The ultimate goal is to improve Open vSwitch debuggability when using
      flower offloading.
      
      This patch adds a new flag to enable verbose logging. With the flag set,
      extack will be passed to the driver, which will be able to log the
      error. As the operation itself probably won't fail (not because of this,
      at least), current iproute will already log it as a Warning.
      
      The flag is generic, so it can be reused later. No need to restrict it
      just for HW offloading. The command line will follow the syntax that
      tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning.
      
      For example:
      # ./tc qdisc add dev p7p1 ingress
      # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
      	flower verbose \
      	src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
      	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
      Warning: TC offload is disabled on net device.
      # echo $?
      0
      # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
      	flower \
      	src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
      	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
      # echo $?
      0
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 09 Mar, 2018 1 commit
  36. 22 Feb, 2018 1 commit
    • net: sched: add em_ipt ematch for calling xtables matches · ccc007e4
      Eyal Birger committed
      This commit adds a new tc ematch for using netfilter xtables matches.
      
      This allows early classification as well as mirroring/redirecting traffic
      based on logic implemented in netfilter extensions.
      
      The currently supported use case is classification based on the incoming
      IPSec state used during decapsulation using the 'policy' iptables
      extension (xt_policy).
      
      The module dynamically fetches the netfilter match module and calls
      it using a fake xt_action_param structure based on validated userspace
      provided parameters.
      
      As the xt_policy match does not access skb->data, no skb modifications
      are needed on match.
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  37. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman committed
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default, all files without license information are under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernel's COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>