1. 25 May 2019 (3 commits)
    • ipv6: Make fib6_nh optional at the end of fib6_info · 1cf844c7
      David Ahern committed
      Move fib6_nh to the end of fib6_info and make it an array of
      size 0. Pass a flag to fib6_info_alloc indicating if the
      allocation needs to add space for a fib6_nh.
      
      The current code path always has a fib6_nh allocated with a
      fib6_info; with nexthop objects they will be separate.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
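      The layout change can be sketched in userspace C. This is a minimal sketch with hypothetical stand-in types (demo_fib6_nh, demo_fib6_info) and a hypothetical allocator (demo_info_alloc), not the kernel definitions: the nexthop becomes a trailing array, and the allocator only adds room for it when asked. The commit uses a zero-length array (a GNU extension); the sketch uses the C99 flexible array member, which serves the same layout purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct fib6_nh. */
struct demo_fib6_nh {
	int dev_ifindex;
	unsigned char gw[16];
};

/* Hypothetical stand-in for struct fib6_info: the nexthop is optional
 * and sits at the very end, so it can be left out of the allocation. */
struct demo_fib6_info {
	unsigned int metric;
	bool has_nh;                   /* bookkeeping for this sketch only */
	struct demo_fib6_nh fib6_nh[]; /* C99 flexible array member */
};

/* Allocate a route entry, adding space for one nexthop only if requested. */
static struct demo_fib6_info *demo_info_alloc(bool with_fib6_nh)
{
	size_t sz = sizeof(struct demo_fib6_info);
	struct demo_fib6_info *f6i;

	if (with_fib6_nh)
		sz += sizeof(struct demo_fib6_nh);

	f6i = calloc(1, sz);
	if (f6i)
		f6i->has_nh = with_fib6_nh;
	return f6i;
}
```

      An entry that will reference a separate nexthop object would be allocated with demo_info_alloc(false), and code must then never touch fib6_nh[0]; that is the price of the smaller allocation.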
    • ipv6: Move exception bucket to fib6_nh · cc5c073a
      David Ahern committed
      Similar to the pcpu routes, exceptions are really per nexthop, so move
      rt6i_exception_bucket from fib6_info to fib6_nh.
      
      To avoid additional increases to the size of fib6_nh for a 1-bit flag,
      use the lowest bit in the allocated memory pointer for the flushed flag.
      Add helpers for retrieving the bucket pointer to mask off the flag.
      
      The cleanup of the exception bucket is moved to fib6_nh_release.
      
      fib6_nh_flush_exceptions can now be called from 2 contexts:
      1. deleting a fib entry
      2. deleting a fib6_nh
      
      For 1, fib6_nh_flush_exceptions is called for the specific fib6_info
      that is being deleted, and all exceptions in the cache that use that
      entry are deleted. For 2, the fib6_nh itself is being destroyed, so
      fib6_nh_flush_exceptions is called with a NULL fib6_info, which means
      flush all entries.
      
      The pmtu.sh selftest exercises the affected code paths - from creating
      exceptions to cleaning them up on device delete. All tests pass without
      any rcu locking or memleak warnings.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
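      Storing a 1-bit flag in the low bit of a pointer works because the allocator returns memory aligned to more than one byte, so bit 0 of a valid bucket address is always clear. A minimal userspace sketch, assuming hypothetical helper names (demo_bucket_get, demo_bucket_mark_flushed and friends) and plain pointers where the kernel uses RCU-protected ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the exception bucket array. */
struct demo_bucket {
	int depth;
};

/* Flag stored in bit 0 of the pointer value. */
#define DEMO_BUCKET_FLUSHED ((uintptr_t)1)

/* Hypothetical stand-in for the part of fib6_nh holding the bucket. */
struct demo_nh {
	uintptr_t bucket; /* pointer value, flushed flag in bit 0 */
};

/* Return the real bucket pointer with the flag bit masked off. */
static struct demo_bucket *demo_bucket_get(const struct demo_nh *nh)
{
	return (struct demo_bucket *)(nh->bucket & ~DEMO_BUCKET_FLUSHED);
}

static bool demo_bucket_flushed(const struct demo_nh *nh)
{
	return (nh->bucket & DEMO_BUCKET_FLUSHED) != 0;
}

static void demo_bucket_set(struct demo_nh *nh, struct demo_bucket *b)
{
	/* malloc() results are suitably aligned, so bit 0 must be clear. */
	assert(((uintptr_t)b & DEMO_BUCKET_FLUSHED) == 0);
	nh->bucket = (uintptr_t)b;
}

static void demo_bucket_mark_flushed(struct demo_nh *nh)
{
	nh->bucket |= DEMO_BUCKET_FLUSHED;
}
```

      Every reader has to go through the masking helper: dereferencing the raw field after the flag is set would use an address one byte past the start of the bucket.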
    • ipv6: Move pcpu cached routes to fib6_nh · f40b6ae2
      David Ahern committed
      rt6_info are specific instances of a fib entry and are tied to a
      device and gateway, i.e., a nexthop. Before nexthop objects, IPv6 fib
      entries had a separate fib6_info for each nexthop in a multipath route,
      so placing the pcpu cache in the fib6_info struct worked.
      However, with nexthop objects a fib6_info can point to a set of nexthops
      (yet another alignment of IPv6 with IPv4). Accordingly, the pcpu
      cache needs to be moved to the fib6_nh struct so the cached entries
      are local to the nexthop specification used to create the rt6_info.
      
      Initialization and freeing of the pcpu entries are moved to
      fib6_nh_init and fib6_nh_release.
      
      Change in location only, from fib6_info down to fib6_nh; no other
      functional change intended.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 May 2019 (1 commit)
  3. 23 May 2019 (9 commits)
  4. 21 May 2019 (3 commits)
  5. 17 May 2019 (2 commits)
  6. 16 May 2019 (3 commits)
  7. 13 May 2019 (3 commits)
    • net: dsa: Remove the now unused DSA_SKB_CB_COPY() macro · 1c9b1420
      Vladimir Oltean committed
      It is best not to expose this macro, because calling it may cause a
      performance hit.
      
      Fixes: b68b0dd0 ("net: dsa: Keep private info in the skb->cb")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Remove dangerous DSA_SKB_CLONE() macro · 506f0e09
      Vladimir Oltean committed
      This does not cause any bug now because it has no users, but its body
      contains two pointer definitions within a code block:
      
      		struct sk_buff *clone = _clone;	\
      		struct sk_buff *skb = _skb;	\
      
      When the macro is called as DSA_SKB_CLONE(clone, skb), these variables
      would shadow the arguments the macro was called with, so each
      initializer would read the newly declared, uninitialized local instead
      of the caller's variable and would not do its job (undefined behavior,
      although GCC happens to put NULL pointers there instead).
      
      So remove this broken macro and let users call
      "DSA_SKB_CB(skb)->clone = clone" by hand when needed.
      
      There is one functional difference with the approach suggested above:
      the control block is not copied from the original skb into the clone.
      Since there is no foreseen need for the control block in the clone at
      the moment, this is OK.
      
      Fixes: b68b0dd0 ("net: dsa: Keep private info in the skb->cb")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
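      The shadowing problem is easy to reproduce outside the kernel, because in C a declared identifier is already in scope inside its own initializer. In the sketch below, demo_skb, demo_cb and DEMO_SKB_CB are hypothetical stand-ins; the broken pattern is shown only in a comment, since compiling and running it would read an indeterminate value. The safe alternative is the one the commit recommends (assign the field directly), wrapped here in a static inline function, whose parameters cannot collide with the caller's names.

```c
#include <assert.h>

struct demo_skb;

/* Hypothetical stand-in for the DSA control block. */
struct demo_cb {
	struct demo_skb *clone;
};

/* Hypothetical stand-in for struct sk_buff; like skb->cb, the control
 * block is 48 bytes and 8-byte aligned. As in the kernel (built with
 * -fno-strict-aliasing), a struct is overlaid on the char array. */
struct demo_skb {
	_Alignas(8) char cb[48];
};

#define DEMO_SKB_CB(skb) ((struct demo_cb *)((skb)->cb))

/*
 * The removed macro followed this pattern:
 *
 *   #define BAD_CLONE(_clone, _skb) do {            \
 *           struct demo_skb *clone = _clone;        \
 *           struct demo_skb *skb = _skb;            \
 *           DEMO_SKB_CB(skb)->clone = clone;        \
 *   } while (0)
 *
 * Invoked as BAD_CLONE(clone, skb), the first line expands to
 * "struct demo_skb *clone = clone;": the new local is initialized from
 * itself, not from the caller's variable.
 */

/* Safe alternative: a function has its own parameter scope. */
static inline void demo_skb_set_clone(struct demo_skb *skb,
				      struct demo_skb *clone)
{
	DEMO_SKB_CB(skb)->clone = clone;
}
```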
    • net: dsa: Initialize DSA_SKB_CB(skb)->deferred_xmit variable · 87671375
      Vladimir Oltean committed
      On xmit, the sk_buff control block can contain whatever the stack put
      there, so initialization is mandatory: we check its value after the
      actual DSA xmit (the tagger may have changed it).
      
      The DSA_SKB_ZERO() macro could have been used for this purpose, but:
      - Zeroing a 48-byte memory region in the hot path is best avoided.
      - It would have triggered a warning with newer compilers, since
        __dsa_skb_cb contains a structure within a structure, and the {0}
        initializer was incorrect for that purpose.
      
      So simply remove the DSA_SKB_ZERO() macro and initialize the
      deferred_xmit variable by hand (which should be done for all further
      dsa_skb_cb variables which need initialization - currently none - to
      avoid the performance penalty).
      
      Fixes: 97a69a0d ("net: dsa: Add support for deferred xmit")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
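      The ordering this commit enforces (initialize the one field the core reads back, then run the tagger) can be sketched as follows. All names here (demo_skb, demo_cb, demo_tagger_xmit, demo_slave_xmit, the result enum) are hypothetical; the point is that a single store replaces zeroing the whole 48-byte block, and stale data left in cb[] can no longer be mistaken for a deferral request.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct demo_skb {
	_Alignas(8) char cb[48]; /* may hold leftovers from upper layers */
	bool link_local;         /* input to the example tagger below */
};

/* Hypothetical DSA view of the control block. */
struct demo_cb {
	bool deferred_xmit;
};

#define DEMO_SKB_CB(skb) ((struct demo_cb *)((skb)->cb))

enum demo_xmit_result { DEMO_SENT_NOW, DEMO_DEFERRED };

typedef void (*demo_tagger_xmit_fn)(struct demo_skb *skb);

/* Example tagger: only ever sets the flag, never clears it. */
static void demo_tagger_xmit(struct demo_skb *skb)
{
	if (skb->link_local)
		DEMO_SKB_CB(skb)->deferred_xmit = true;
}

static enum demo_xmit_result demo_slave_xmit(struct demo_skb *skb,
					      demo_tagger_xmit_fn xmit)
{
	/* Initialize only what is read back below; no memset of cb[]. */
	DEMO_SKB_CB(skb)->deferred_xmit = false;

	xmit(skb); /* the tagger may set deferred_xmit */

	return DEMO_SKB_CB(skb)->deferred_xmit ? DEMO_DEFERRED : DEMO_SENT_NOW;
}
```

      Without the first store in demo_slave_xmit, a cb[] filled with non-zero garbage would make every frame look deferred, even with a tagger that never sets the flag.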
  8. 10 May 2019 (1 commit)
  9. 08 May 2019 (1 commit)
  10. 06 May 2019 (14 commits)
    • net: dsa: sja1105: Add support for traffic through standalone ports · 227d07a0
      Vladimir Oltean committed
      In order to support this, we are creating a makeshift switch tag out of
      a VLAN trunk configured on the CPU port. Termination of normal traffic
      on switch ports only works when not under a vlan_filtering bridge.
      Termination of management (PTP, BPDU) traffic works under all
      circumstances because it uses a different tagging mechanism
      (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q
      code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105.
      
      There are two types of traffic: regular and link-local.
      
      Link-local traffic received on the CPU port is trapped, bypassing the
      switch's regular forwarding decisions, because it matches one of the two
      DMAC filters for management traffic.
      
      On transmission, the switch requires special massaging for these
      link-local frames. Due to a weird implementation of the switching IP, by
      default it drops link-local frames that originate on the CPU port.
      It needs to be told where to forward them to, through an SPI command
      ("management route") that is valid for only a single frame.
      So when we're sending link-local traffic, we are using the
      dsa_defer_xmit mechanism.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
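      Matching traffic against DMAC filters amounts to a masked compare of the destination MAC. The two filter values below are assumptions for illustration, not taken from the sja1105 driver: the IEEE reserved link-local block 01:80:C2:00:00:0X (which carries STP BPDUs) and the PTP-over-Ethernet multicast address 01:1B:19:00:00:00. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6

/* One destination-MAC filter: a frame matches if (dmac & mask) == addr. */
struct demo_dmac_filter {
	uint8_t addr[DEMO_ETH_ALEN];
	uint8_t mask[DEMO_ETH_ALEN];
};

/* Assumed filter values, for illustration only. */
static const struct demo_dmac_filter demo_mgmt_filters[2] = {
	/* 01:80:C2:00:00:0X, IEEE reserved link-local block */
	{ { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x00 },
	  { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xF0 } },
	/* 01:1B:19:00:00:00, PTP over IEEE 802.3 */
	{ { 0x01, 0x1B, 0x19, 0x00, 0x00, 0x00 },
	  { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF } },
};

static bool demo_dmac_match(const uint8_t *dmac,
			    const struct demo_dmac_filter *f)
{
	for (size_t i = 0; i < DEMO_ETH_ALEN; i++)
		if ((dmac[i] & f->mask[i]) != f->addr[i])
			return false;
	return true;
}

/* True if the frame would be trapped as management (link-local) traffic. */
static bool demo_is_mgmt_frame(const uint8_t *dmac)
{
	for (size_t i = 0; i < 2; i++)
		if (demo_dmac_match(dmac, &demo_mgmt_filters[i]))
			return true;
	return false;
}
```

      On transmit, a frame for which demo_is_mgmt_frame() returns true is the kind the commit sends through the deferred xmit path, since the switch first needs a one-shot management route programmed over SPI.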
    • net: dsa: Add a private structure pointer to dsa_port · c362beb0
      Vladimir Oltean committed
      This is meant to share information between the driver and the tagger,
      or to be used by the tagger to keep some state. Its use is optional.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Add support for deferred xmit · 97a69a0d
      Vladimir Oltean committed
      Some hardware needs extra work before it accepts frames received on
      the CPU port (such as the sja1105, which takes temporary L2 forwarding
      rules over SPI that last for a single frame). Such work needs a
      sleepable context, and because the regular .ndo_start_xmit is atomic,
      it cannot be done in the tagger. So introduce a generic DSA mechanism
      that sets up a transmit skb queue and a workqueue for deferred
      transmission.
      
      The new driver callback (.port_deferred_xmit) is in dsa_switch and not
      in the tagger because the operations that require sleeping typically
      also involve interacting with the hardware, and not simply skb
      manipulations. Therefore having it there simplifies the structure a bit
      and makes it unnecessary to export functions from the driver to the
      tagger.
      
      The driver is responsible for calling dsa_enqueue_skb, which transfers
      the skb to the master netdevice. This gives the driver a chance to
      perform more work afterwards, such as cleanup or TX timestamping.
      
      To tell DSA that skb xmit deferral is required, I have thought about
      changing the return type of the tagger .xmit from struct sk_buff * into
      an enum dsa_tx_t that could potentially encode a DSA_XMIT_DEFER value.
      
      But the trailer tagger is reallocating every skb on xmit and therefore
      making a valid use of the pointer return value. So instead of reworking
      the API in complicated ways, right now a boolean property in the newly
      introduced DSA_SKB_CB is set.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
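      The split described above (the atomic xmit path only queues the skb, and a worker in sleepable context drains the queue and calls the driver) can be modeled single-threaded. Everything here is hypothetical: a fixed-size ring stands in for the skb queue, demo_worker() for the workqueue item, and demo_port_deferred_xmit() for the driver's .port_deferred_xmit, which would do the sleeping hardware setup and then hand the frame to the master (the role the commit gives dsa_enqueue_skb).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QLEN 8

struct demo_skb {
	int id;
};

/* Fixed-size FIFO standing in for the per-port transmit skb queue. */
struct demo_queue {
	struct demo_skb *slot[DEMO_QLEN];
	size_t head, count;
};

/* Called from the atomic xmit path: only queues, never sleeps. */
static bool demo_enqueue(struct demo_queue *q, struct demo_skb *skb)
{
	if (q->count == DEMO_QLEN)
		return false; /* caller would drop the frame */
	q->slot[(q->head + q->count) % DEMO_QLEN] = skb;
	q->count++;
	return true;
}

static struct demo_skb *demo_dequeue(struct demo_queue *q)
{
	struct demo_skb *skb;

	if (q->count == 0)
		return NULL;
	skb = q->slot[q->head];
	q->head = (q->head + 1) % DEMO_QLEN;
	q->count--;
	return skb;
}

/* Record of frames handed to the master device, to check ordering. */
static int demo_sent[DEMO_QLEN];
static size_t demo_nsent;

/* Driver callback: may sleep (e.g. program a one-shot route over SPI),
 * then passes the frame on to the master device. */
static void demo_port_deferred_xmit(struct demo_skb *skb)
{
	/* ... sleeping hardware setup would happen here ... */
	if (demo_nsent < DEMO_QLEN)
		demo_sent[demo_nsent++] = skb->id;
}

/* Worker body: runs in process context and drains the queue in order. */
static void demo_worker(struct demo_queue *q)
{
	struct demo_skb *skb;

	while ((skb = demo_dequeue(q)) != NULL)
		demo_port_deferred_xmit(skb);
}
```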
    • net: dsa: Keep private info in the skb->cb · b68b0dd0
      Vladimir Oltean committed
      Map a DSA structure over the 48-byte control block that will hold
      skb info on transmit and receive. This is only for use within the DSA
      processing layer (e.g. communicating between DSA core and tagger) and
      not for passing info around with other layers such as the master net
      device.
      
      Also add a DSA_SKB_CB_PRIV() macro which retrieves a pointer to the
      space up to 48 bytes that the DSA structure does not use. This space can
      be used for drivers to add their own private info.
      
      One use is the PTP timestamping code path. When cloning an skb,
      annotate the original with a pointer to the clone, which the driver can
      then easily find and place the timestamp in. This avoids the need for a
      separate queue to hold clones and a way to match an original to a cloned
      skb.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
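      Overlaying a structure on the 48-byte control block, and exposing the unused tail as driver-private space, can be sketched as below. The field layout and all DEMO_/demo_ names are hypothetical; the compile-time checks are the important part, since neither the DSA part nor the driver part may grow past the 48 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CB_SIZE 48

struct demo_skb {
	_Alignas(8) char cb[DEMO_CB_SIZE]; /* like skb->cb: 48 bytes, 8-aligned */
};

/* DSA-owned head of the control block (hypothetical fields). */
struct demo_dsa_cb {
	struct demo_skb *clone; /* PTP: the original points at its clone */
	bool deferred_xmit;
};

_Static_assert(sizeof(struct demo_dsa_cb) <= DEMO_CB_SIZE,
	       "DSA part must fit in the control block");

#define DEMO_SKB_CB(skb) ((struct demo_dsa_cb *)((skb)->cb))

/* Bytes after the DSA part are left for driver-private data. The offset
 * is sizeof(struct demo_dsa_cb), a multiple of its pointer alignment. */
#define DEMO_SKB_CB_PRIV_LEN (DEMO_CB_SIZE - sizeof(struct demo_dsa_cb))
#define DEMO_SKB_CB_PRIV(skb) \
	((void *)((skb)->cb + sizeof(struct demo_dsa_cb)))

/* Example driver-private layout placed in the free tail. */
struct demo_drv_cb {
	unsigned long long tstamp;
};

_Static_assert(sizeof(struct demo_drv_cb) <= DEMO_SKB_CB_PRIV_LEN,
	       "driver-private part must fit after the DSA part");
```

      Because the private area starts after the DSA part, a driver writing its timestamp there cannot clobber the clone pointer that the DSA core relies on.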
    • net: dsa: Allow drivers to filter packets they can decode source port from · cc1939e4
      Vladimir Oltean committed
      Frames get processed by DSA and redirected to switch port net devices
      based on the ETH_P_XDSA multiplexed packet_type handler found by the
      network stack when calling eth_type_trans().
      
      The running assumption is that once the DSA .rcv function is called, DSA
      is always able to decode the switch tag in order to change the skb->dev
      from its master.
      
      However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105,
      user of DSA_TAG_PROTO_8021Q) where this assumption is not completely
      true, since switch tagging piggybacks on the absence of a vlan_filtering
      bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't
      rely on switch tagging, but on a different mechanism. So it would make
      sense to at least be able to terminate that.
      
      Having DSA receive traffic it can't decode would put it in an impossible
      situation: the eth_type_trans() function would invoke the DSA .rcv(),
      which could not change skb->dev, then eth_type_trans() would be invoked
      again, which again would call the DSA .rcv, and the packet would never
      be able to exit the DSA filter and would spiral in a loop until the
      whole system dies.
      
      This happens because eth_type_trans() doesn't actually look at the skb
      (so as to identify a potential tag) when it deems it as being
      ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer
      installed (therefore it's a DSA master) and that there exists a .rcv
      callback (everybody except DSA_TAG_PROTO_NONE has that). This is
      understandable as there are many switch tags out there, and exhaustively
      checking for all of them is far from ideal.
      
      The solution lies in introducing a filtering function for each tagging
      protocol. In the absence of a filtering function, all traffic is passed
      to the .rcv DSA callback. The tagging protocol should see the filtering
      function as a pre-validation that it can decode the incoming skb. The
      traffic that doesn't match the filter will bypass the DSA .rcv callback
      and be left on the master netdevice, which wasn't previously possible.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
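      The receive-side decision reduces to a few lines: consult the optional filter first, and only hand the frame to .rcv if the filter accepts it or if there is no filter. The types and names below are hypothetical simplifications, not the DSA core's; the example filter, which only claims frames carrying an 802.1Q header, is likewise an illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_rx_dest { DEMO_RX_MASTER, DEMO_RX_DSA };

struct demo_skb {
	uint16_t ethertype;
};

/* Hypothetical tagger ops: .filter is optional, .rcv is mandatory. */
struct demo_tagger_ops {
	bool (*filter)(const struct demo_skb *skb);
	void (*rcv)(struct demo_skb *skb);
};

static int demo_rcv_calls;

static void demo_rcv(struct demo_skb *skb)
{
	(void)skb;
	demo_rcv_calls++; /* would decode the tag and change skb->dev */
}

/* Example filter: only claim frames carrying an 802.1Q header. */
static bool demo_filter_8021q(const struct demo_skb *skb)
{
	return skb->ethertype == 0x8100;
}

static enum demo_rx_dest demo_master_rx(struct demo_skb *skb,
					 const struct demo_tagger_ops *ops)
{
	/* No filter: every frame goes to .rcv, as before the change. */
	if (ops->filter && !ops->filter(skb))
		return DEMO_RX_MASTER; /* left on the master netdevice */

	ops->rcv(skb);
	return DEMO_RX_DSA;
}
```

      A frame the filter rejects never reaches .rcv, so it cannot re-enter the eth_type_trans() loop the commit describes.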
    • net: dsa: Optional VLAN-based port separation for switches without tagging · f9bbe447
      Vladimir Oltean committed
      This patch provides generic DSA code for using VLAN (802.1Q) tags for
      the same purpose as a dedicated switch tag for injection/extraction.
      It is based on the discussions and interest that have so far been
      expressed in https://www.spinics.net/lists/netdev/msg556125.html.
      
      Unlike all other DSA-supported tagging protocols, CONFIG_NET_DSA_TAG_8021Q
      does not offer a complete solution for drivers (nor can it). Instead, it
      provides generic code that drivers can opt into calling:
      - dsa_8021q_xmit: Inserts a VLAN header with the specified contents.
        Can be called from another tagging protocol's xmit function.
        Currently the LAN9303 driver is inserting headers that are simply
        802.1Q with custom fields, so this is an opportunity for code reuse.
      - dsa_8021q_rcv: Retrieves the TPID and TCI from a VLAN-tagged skb.
        Removing the VLAN header is left as a decision for the caller to make.
      - dsa_port_setup_8021q_tagging: For each user port, installs an Rx VID
        and a Tx VID, for proper untagged traffic identification on ingress
        and steering on egress. Also sets up the VLAN trunk on the upstream
        (CPU or DSA) port. Drivers are intentionally left to call this
        function explicitly, depending on the context and hardware support.
        The expected switch behavior and VLAN semantics should not be violated
        under any conditions. That is, after calling
        dsa_port_setup_8021q_tagging, the hardware should still pass all
        ingress traffic, be it tagged or untagged.
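      What inserting and reading back a VLAN header means at the byte level can be illustrated on a flat buffer: a 4-byte tag (TPID, then TCI) goes right after the destination and source MAC addresses. This is a userspace sketch with hypothetical helper names, not dsa_8021q_xmit or dsa_8021q_rcv, which operate on struct sk_buff. The TCI layout (3-bit PCP, 1-bit DEI, 12-bit VID) is standard 802.1Q.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ADDRS_LEN 12 /* destination + source MAC */
#define DEMO_VLAN_HLEN 4      /* TPID (2 bytes) + TCI (2 bytes) */
#define DEMO_ETH_P_8021Q 0x8100

/* Build a TCI from priority code point, DEI and VLAN ID. */
static uint16_t demo_make_tci(unsigned int pcp, unsigned int dei,
			      unsigned int vid)
{
	return (uint16_t)(((pcp & 0x7) << 13) | ((dei & 0x1) << 12) |
			  (vid & 0xfff));
}

/* Insert a VLAN header into frame[0..len) in place. The buffer must have
 * room for len + 4 bytes and len must be at least 12. Returns new length. */
static size_t demo_vlan_insert(uint8_t *frame, size_t len, uint16_t tpid,
			       uint16_t tci)
{
	uint8_t *tag = frame + DEMO_ETH_ADDRS_LEN;

	/* Shift EtherType and payload right to open a 4-byte gap. */
	memmove(tag + DEMO_VLAN_HLEN, tag, len - DEMO_ETH_ADDRS_LEN);
	tag[0] = (uint8_t)(tpid >> 8);
	tag[1] = (uint8_t)(tpid & 0xff);
	tag[2] = (uint8_t)(tci >> 8);
	tag[3] = (uint8_t)(tci & 0xff);
	return len + DEMO_VLAN_HLEN;
}

/* Read TPID and TCI back; removing the header is left to the caller. */
static void demo_vlan_parse(const uint8_t *frame, uint16_t *tpid,
			    uint16_t *tci)
{
	const uint8_t *tag = frame + DEMO_ETH_ADDRS_LEN;

	*tpid = (uint16_t)((tag[0] << 8) | tag[1]);
	*tci = (uint16_t)((tag[2] << 8) | tag[3]);
}
```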
      
      For uniformity with the other tagging protocols, a module for the
      dsa_8021q_netdev_ops structure is registered, but the typical usage is
      to set up another tagging protocol which selects CONFIG_NET_DSA_TAG_8021Q,
      and calls the API from tag_8021q.h. Null function definitions are also
      provided so that a "depends on" is not forced in the Kconfig.
      
      This tagging protocol only works when switch ports are standalone, or
      when they are added to a VLAN-unaware bridge. It will probably remain
      this way for the reasons below.
      
      When added to a bridge that has vlan_filtering 1, the bridge core will
      install its own VLANs and reset the pvids through switchdev. For the
      bridge core, switchdev is a write-only pipe. All VLAN-related state is
      kept in the bridge core and nothing is read from DSA/switchdev or from
      the driver. So the bridge core will break this port separation because
      it will install the vlan_default_pvid into all switchdev ports.
      
      Even if we could teach the bridge driver about a switchdev preference
      for a certain vlan_default_pvid (a difficult task in itself, since the
      current setting is per-bridge but we would need it per-port), there
      would still be many other challenges.
      
      Firstly, in the DSA rcv callback, a driver would have to perform an
      iterative reverse lookup to find the correct switch port. That is
      because the port is a bridge slave, so its Rx VID (port PVID) is subject
      to user configuration. How would we ensure that the user doesn't reset
      the pvid to a different value (which would make an O(1) translation
      impossible), or to a non-unique value within this DSA switch tree (which
      would make any translation impossible)?
      
      Finally, not all switch ports are equal in DSA, and that makes it
      difficult for the bridge to be completely aware of this anyway.
      The CPU port needs to transmit tagged packets (VLAN trunk) in order for
      the DSA rcv code to be able to decode source information.
      But the bridge code has absolutely no idea which switch port is the CPU
      port, if only because DSA registers no netdevice for the CPU port.
      Also DSA does not currently allow the user to specify that they want the
      CPU port to do VLAN trunking anyway. VLANs are added to the CPU port
      using the same flags as they were added on the user port.
      
      So the VLANs installed by dsa_port_setup_8021q_tagging per driver
      request should remain private from the bridge's and user's perspective,
      and should not alter the VLAN semantics observed by the user.
      
      In the current implementation, a VLAN range ending at 4095
      (VLAN_N_VID - 1) is reserved for this purpose. Each port receives a
      unique Rx VLAN and a unique Tx VLAN. Separate VLANs are needed for Rx
      and Tx because they serve different purposes: on Rx the switch must
      treat traffic as untagged and classify it to a port-based VLAN, but
      with care not to hinder bridging. On the other hand, the Tx VLAN is
      where the reachability restrictions are imposed, since by tagging
      frames in the xmit callback we are telling the switch onto which port
      to steer the frame.
      
      Some general guidance on how this support might be employed for
      real-life hardware (some comments made by Florian Fainelli):
      
      - If the hardware supports VLAN tag stacking, it should somehow back
        up its private VLAN settings when the bridge tries to override them.
        Then the driver could re-apply them as outer tags. Dedicating an outer
        tag per bridge device would allow identical inner tag VID numbers to
        co-exist, yet preserve broadcast domain isolation.
      
      - If the switch cannot handle VLAN tag stacking, it should disable this
        port separation when added as slave to a vlan_filtering bridge, in
        that case having reduced functionality.
      
      - Drivers for old switches that don't support the entire VLAN_N_VID
        range will need to rework the current range selection mechanism.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
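      A per-port Rx/Tx VID reservation can be sketched as plain arithmetic counting down from the top of the VID space, which also makes the reverse lookup in the rcv path O(1), the property the commit says a user-configurable PVID could not guarantee. The layout is an assumption for illustration, not the formula used by tag_8021q: Tx VIDs take the top DEMO_MAX_PORTS values and Rx VIDs the block just below. This sketch stops at 4094 because 802.1Q reserves VID 4095 for implementation use; DEMO_MAX_PORTS and all function names are hypothetical.

```c
#include <assert.h>

#define DEMO_MAX_PORTS 16   /* hypothetical per-switch port limit */
#define DEMO_TOP_VID 4094   /* highest VID this sketch hands out */

/* Tx VIDs: [DEMO_TOP_VID - DEMO_MAX_PORTS + 1, DEMO_TOP_VID]. */
static int demo_tx_vid(int port)
{
	return DEMO_TOP_VID - port;
}

/* Rx VIDs: the DEMO_MAX_PORTS values just below the Tx block. */
static int demo_rx_vid(int port)
{
	return DEMO_TOP_VID - DEMO_MAX_PORTS - port;
}

/* O(1) reverse lookup used on receive; returns -1 for foreign VIDs. */
static int demo_rx_vid_to_port(int vid)
{
	int port = DEMO_TOP_VID - DEMO_MAX_PORTS - vid;

	return (port >= 0 && port < DEMO_MAX_PORTS) ? port : -1;
}
```

      A multi-switch tree would also need the switch index folded into the VID so that values stay unique across the whole tree, which this sketch ignores.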
    • net/sched: add block pointer to tc_cls_common_offload structure · 88c44a52
      Pieter Jansen van Vuuren committed
      Some actions like the police action are stateful and could share state
      between devices. This is incompatible with offloading to multiple devices
      and drivers might want to test for shared blocks when offloading.
      Store a pointer to the tcf_block structure in the tc_cls_common_offload
      structure to allow drivers to determine when offloads apply to a shared
      block.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: extend matchall offload for hardware statistics · b7fe4ab8
      Pieter Jansen van Vuuren committed
      Introduce a new command for matchall classifiers that allows hardware
      to update statistics.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: add police action to the hardware intermediate representation · 8c8cfc6e
      Pieter Jansen van Vuuren committed
      Add police action to the hardware intermediate representation which
      would subsequently allow it to be used by drivers for offload.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: move police action structures to header · fa762da9
      Pieter Jansen van Vuuren committed
      Move the tcf_police_params, tcf_police and tc_police_compat structures
      to a header, making them usable by other code, for example drivers that
      would offload police actions to hardware.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: remove unused functions for matchall offload · dfcb19f0
      Pieter Jansen van Vuuren committed
      Clean up unused functions and variables after porting to the newer
      intermediate representation.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: use intermediate representation for matchall offload · ab79af32
      Pieter Jansen van Vuuren committed
      Updates the Mellanox spectrum driver to use the newer intermediate
      representation for flow actions in matchall offloads.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: use the hardware intermediate representation for matchall · f00cbf19
      Pieter Jansen van Vuuren committed
      Extends matchall offload to make use of the hardware intermediate
      representation. More specifically, this patch moves the native TC
      actions in cls_matchall offload to the newer flow_action
      representation. This ultimately allows us to avoid a direct
      dependency on native TC actions for matchall.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
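      The benefit of the intermediate representation is that a driver walks one generic list of action entries instead of decoding each native TC action type. A minimal sketch of that consumer side, assuming hypothetical types that only mirror the idea (the kernel's struct flow_action and struct flow_action_entry carry more fields and action kinds): the driver accepts the rule only if every entry is something its pretend hardware supports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down action IDs. */
enum demo_action_id {
	DEMO_ACTION_DROP,
	DEMO_ACTION_SAMPLE,
	DEMO_ACTION_POLICE,
	DEMO_ACTION_MIRRED,
};

/* One generic action entry; the union holds per-action parameters. */
struct demo_action_entry {
	enum demo_action_id id;
	union {
		struct { uint32_t rate; uint32_t trunc_size; } sample;
		struct { uint64_t rate_bytes_ps; uint32_t burst; } police;
	};
};

struct demo_flow_action {
	size_t num_entries;
	struct demo_action_entry entries[4];
};

/* Driver-side check: offload only if every action is supported.
 * This pretend hardware supports drop, sample and police only. */
static bool demo_can_offload(const struct demo_flow_action *fa)
{
	for (size_t i = 0; i < fa->num_entries; i++) {
		switch (fa->entries[i].id) {
		case DEMO_ACTION_DROP:
		case DEMO_ACTION_SAMPLE:
			break;
		case DEMO_ACTION_POLICE:
			if (fa->entries[i].police.rate_bytes_ps == 0)
				return false; /* reject a meaningless policer */
			break;
		default:
			return false;
		}
	}
	return true;
}
```

      The driver never needs to know which TC classifier or action module produced the entries, which is what lets matchall drop its direct dependency on native TC actions.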
    • net/sched: add sample action to the hardware intermediate representation · a7a7be60
      Pieter Jansen van Vuuren committed
      Add sample action to the hardware intermediate representation model which
      would subsequently allow it to be used by drivers for offload.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>