1. 28 3月, 2020 2 次提交
    • V
      net: dsa: implement auto-normalization of MTU for bridge hardware datapath · bff33f7e
      Vladimir Oltean 提交于
      Many switches don't have an explicit knob for configuring the MTU
      (maximum transmission unit per interface).  Instead, they do the
      length-based packet admission checks on the ingress interface, for
      reasons that are easy to understand (why would you accept a packet in
      the queuing subsystem if you know you're going to drop it anyway).
      
      So it is actually the MRU that these switches permit configuring.
      
      In Linux there only exists the IFLA_MTU netlink attribute and the
      associated dev_set_mtu function. The comments like to play blind and say
      that it's changing the "maximum transfer unit", which is to say that
      there isn't any directionality in the meaning of the MTU word. So that
      is the interpretation that this patch is giving to things: MTU == MRU.
      
      When 2 interfaces having different MTUs are bridged, the bridge driver
      MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it
      adjusts the MTU of the bridge net device itself (and not that of the
      slave net devices) to the minimum value of all slave interfaces, in
      order for forwarded packets to not exceed the MTU regardless of the
      interface they are received and send on.
      
      The idea behind this behavior, and why the slave MTUs are not adjusted,
      is that normal termination from Linux over the L2 forwarding domain
      should happen over the bridge net device, which _is_ properly limited by
      the minimum MTU. And termination over individual slave devices is
      possible even if those are bridged. But that is not "forwarding", so
      there's no reason to do normalization there, since only a single
      interface sees that packet.
      
      The problem with those switches that can only control the MRU is with
      the offloaded data path, where a packet received on an interface with
      MRU 9000 would still be forwarded to an interface with MRU 1500. And the
      br_mtu_auto_adjust() function does not really help, since the MTU
      configured on the bridge net device is ignored.
      
      In order to enforce the de-facto MTU == MRU rule for these switches, we
      need to do MTU normalization, which means: in order for no packet larger
      than the MTU configured on this port to be sent, then we need to limit
      the MRU on all ports that this packet could possibly come from. AKA
      since we are configuring the MRU via MTU, it means that all ports within
      a bridge forwarding domain should have the same MTU.
      
      And that is exactly what this patch is trying to do.
      
      >From an implementation perspective, we try to follow the intent of the
      user, otherwise there is a risk that we might livelock them (they try to
      change the MTU on an already-bridged interface, but we just keep
      changing it back in an attempt to keep the MTU normalized). So the MTU
      that the bridge is normalized to is either:
      
       - The most recently changed one:
      
         ip link set dev swp0 master br0
         ip link set dev swp1 master br0
         ip link set dev swp0 mtu 1400
      
         This sequence will make swp1 inherit MTU 1400 from swp0.
      
       - The one of the most recently added interface to the bridge:
      
         ip link set dev swp0 master br0
         ip link set dev swp1 mtu 1400
         ip link set dev swp1 master br0
      
         The above sequence will make swp0 inherit MTU 1400 as well.
      Suggested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bff33f7e
    • V
      net: dsa: configure the MTU for switch ports · bfcb8132
      Vladimir Oltean 提交于
      It is useful be able to configure port policers on a switch to accept
      frames of various sizes:
      
      - Increase the MTU for better throughput from the default of 1500 if it
        is known that there is no 10/100 Mbps device in the network.
      - Decrease the MTU to limit the latency of high-priority frames under
        congestion, or work around various network segments that add extra
        headers to packets which can't be fragmented.
      
      For DSA slave ports, this is mostly a pass-through callback, called
      through the regular ndo ops and at probe time (to ensure consistency
      across all supported switches).
      
      The CPU port is called with an MTU equal to the largest configured MTU
      of the slave ports. The assumption is that the user might want to
      sustain a bidirectional conversation with a partner over any switch
      port.
      
      The DSA master is configured the same as the CPU port, plus the tagger
      overhead. Since the MTU is by definition L2 payload (sans Ethernet
      header), it is up to each individual driver to figure out if it needs to
      do anything special for its frame tags on the CPU port (it shouldn't
      except in special cases). So the MTU does not contain the tagger
      overhead on the CPU port.
      However the MTU of the DSA master, minus the tagger overhead, is used as
      a proxy for the MTU of the CPU port, which does not have a net device.
      This is to avoid uselessly calling the .change_mtu function on the CPU
      port when nothing should change.
      
      So it is safe to assume that the DSA master and the CPU port MTUs are
      apart by exactly the tagger's overhead in bytes.
      
      Some changes were made around dsa_master_set_mtu(), function which was
      now removed, for 2 reasons:
        - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to
          do the same thing in DSA
        - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method
      That is to say, there's no need for this function in DSA, we can safely
      call dev_set_mtu() directly, take the rtnl lock when necessary, and just
      propagate whatever errors get reported (since the user probably wants to
      be informed).
      
      Some inspiration (mainly in the MTU DSA notifier) was taken from a
      vaguely similar patch from Murali and Florian, who are credited as
      co-developers down below.
      Co-developed-by: NMurali Krishna Policharla <murali.policharla@broadcom.com>
      Signed-off-by: NMurali Krishna Policharla <murali.policharla@broadcom.com>
      Co-developed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfcb8132
  2. 04 3月, 2020 1 次提交
  3. 28 2月, 2020 1 次提交
  4. 09 1月, 2020 1 次提交
    • F
      net: dsa: Get information about stacked DSA protocol · 4d776482
      Florian Fainelli 提交于
      It is possible to stack multiple DSA switches in a way that they are not
      part of the tree (disjoint) but the DSA master of a switch is a DSA
      slave of another. When that happens switch drivers may have to know this
      is the case so as to determine whether their tagging protocol has a
      remove chance of working.
      
      This is useful for specific switch drivers such as b53 where devices
      have been known to be stacked in the wild without the Broadcom tag
      protocol supporting that feature. This allows b53 to continue supporting
      those devices by forcing the disabling of Broadcom tags on the outermost
      switches if necessary.
      
      The get_tag_protocol() function is therefore updated to gain an
      additional enum dsa_tag_protocol argument which denotes the current
      tagging protocol used by the DSA master we are attached to, else
      DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d776482
  5. 06 1月, 2020 2 次提交
    • V
      net: dsa: Pass pcs_poll flag from driver to PHYLINK · 787cac3f
      Vladimir Oltean 提交于
      The DSA drivers that implement .phylink_mac_link_state should normally
      register an interrupt for the PCS, from which they should call
      phylink_mac_change(). However not all switches implement this, and those
      who don't should set this flag in dsa_switch in the .setup callback, so
      that PHYLINK will poll for a few ms until the in-band AN link timer
      expires and the PCS state settles.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      787cac3f
    • V
      net: dsa: Make deferred_xmit private to sja1105 · a68578c2
      Vladimir Oltean 提交于
      There are 3 things that are wrong with the DSA deferred xmit mechanism:
      
      1. Its introduction has made the DSA hotpath ever so slightly more
         inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs
         to be initialized to false for every transmitted frame, in order to
         figure out whether the driver requested deferral or not (a very rare
         occasion, rare even for the only driver that does use this mechanism:
         sja1105). That was necessary to avoid kfree_skb from freeing the skb.
      
      2. Because L2 PTP is a link-local protocol like STP, it requires
         management routes and deferred xmit with this switch. But as opposed
         to STP, the deferred work mechanism needs to schedule the packet
         rather quickly for the TX timstamp to be collected in time and sent
         to user space. But there is no provision for controlling the
         scheduling priority of this deferred xmit workqueue. Too bad this is
         a rather specific requirement for a feature that nobody else uses
         (more below).
      
      3. Perhaps most importantly, it makes the DSA core adhere a bit too
         much to the NXP company-wide policy "Innovate Where It Doesn't
         Matter". The sja1105 is probably the only DSA switch that requires
         some frames sent from the CPU to be routed to the slave port via an
         out-of-band configuration (register write) rather than in-band (DSA
         tag). And there are indeed very good reasons to not want to do that:
         if that out-of-band register is at the other end of a slow bus such
         as SPI, then you limit that Ethernet flow's throughput to effectively
         the throughput of the SPI bus. So hardware vendors should definitely
         not be encouraged to design this way. We do _not_ want more
         widespread use of this mechanism.
      
      Luckily we have a solution for each of the 3 issues:
      
      For 1, we can just remove that variable in the skb->cb and counteract
      the effect of kfree_skb with skb_get, much to the same effect. The
      advantage, of course, being that anybody who doesn't use deferred xmit
      doesn't need to do any extra operation in the hotpath.
      
      For 2, we can create a kernel thread for each port's deferred xmit work.
      If the user switch ports are named swp0, swp1, swp2, the kernel threads
      will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15
      character length limit on kernel thread names). With this, the user can
      change the scheduling priority with chrt $(pidof swp2_xmit).
      
      For 3, we can actually move the entire implementation to the sja1105
      driver.
      
      So this patch deletes the generic implementation from the DSA core and
      adds a new one, more adequate to the requirements of PTP TX
      timestamping, in sja1105_main.c.
      Suggested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a68578c2
  6. 21 12月, 2019 1 次提交
  7. 16 11月, 2019 1 次提交
    • V
      net: dsa: ocelot: add tagger for Ocelot/Felix switches · 8dce89aa
      Vladimir Oltean 提交于
      While it is entirely possible that this tagger format is in fact more
      generic than just these 2 switch families, I don't have that knowledge.
      The Seville switch in NXP T1040 has a similar frame format, but there
      are enough differences (e.g. DEST field starts at bit 57 instead of 56)
      that calling this file tag_vitesse.c is a bit of a stretch at the
      moment. The frame format has been listed in a comment so that people who
      add support for further Vitesse switches can rework this tagger while
      keeping compatibility with Felix.
      
      The "ocelot" name was chosen instead of "felix" because even the Ocelot
      switch can act as a DSA device when it is used in NPI mode, and the Felix
      tagger format is almost identical. Currently it is only used for the
      Felix switch embedded in the NXP LS1028A chip.
      
      The ABI for this tagger should be considered "not stable" at the moment.
      The DSA tag is always placed before the Ethernet header and therefore,
      we are using the long prefix for RX tags to avoid putting the DSA master
      port in promiscuous mode. Once there will be an API in DSA for drivers
      to request DSA masters to be in promiscuous mode unconditionally, we
      will switch to the "no prefix" extraction frame header, which will save
      16 padding bytes for each RX frame.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dce89aa
  8. 06 11月, 2019 1 次提交
  9. 01 11月, 2019 3 次提交
  10. 30 10月, 2019 1 次提交
  11. 29 10月, 2019 1 次提交
  12. 23 10月, 2019 7 次提交
  13. 03 10月, 2019 1 次提交
    • V
      net: dsa: Remove unused __DSA_SKB_CB macro · 37048e94
      Vladimir Oltean 提交于
      The struct __dsa_skb_cb is supposed to span the entire 48-byte skb
      control block, while the struct dsa_skb_cb only the portion of it which
      is used by the DSA core (the rest is available as private data to
      drivers).
      
      The DSA_SKB_CB and __DSA_SKB_CB helpers are supposed to help retrieve
      this pointer based on a skb, but it turns out there is nobody directly
      interested in the struct __dsa_skb_cb in the kernel. So remove it.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37048e94
  14. 17 9月, 2019 1 次提交
  15. 28 8月, 2019 1 次提交
    • V
      net: dsa: remove bitmap operations · e65d45cc
      Vivien Didelot 提交于
      The bitmap operations were introduced to simplify the switch drivers
      in the future, since most of them could implement the common VLAN and
      MDB operations (add, del, dump) with simple functions taking all target
      ports at once, and thus limiting the number of hardware accesses.
      
      Programming an MDB or VLAN this way in a single operation would clearly
      simplify the drivers a lot but would require a new get-set interface
      in DSA. The usage of such bitmap from the stack also raised concerned
      in the past, leading to the dynamic allocation of a new ds->_bitmap
      member in the dsa_switch structure. So let's get rid of them for now.
      
      This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add}
      switch operations into new dsa_switch_{mdb,vlan}_{prepare,add}
      variants not using any bitmap argument anymore.
      
      New dsa_switch_{mdb,vlan}_match helpers have been introduced to make
      clear which local port of a switch must be programmed with the target
      object. While the targeted user port is an obvious candidate, the
      DSA links must also be programmed, as well as the CPU port for VLANs.
      
      While at it, also remove local variables that are only used once.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e65d45cc
  16. 31 7月, 2019 1 次提交
  17. 15 6月, 2019 1 次提交
  18. 09 6月, 2019 1 次提交
  19. 31 5月, 2019 1 次提交
  20. 30 5月, 2019 1 次提交
    • I
      net: phylink: Add struct phylink_config to PHYLINK API · 44cc27e4
      Ioana Ciornei 提交于
      The phylink_config structure will encapsulate a pointer to a struct
      device and the operation type requested for this instance of PHYLINK.
      This patch does not make any functional changes, it just transitions the
      PHYLINK internals and all its users to the new API.
      
      A pointer to a phylink_config structure will be passed to
      phylink_create() instead of the net_device directly. Also, the same
      phylink_config pointer will be passed back to all phylink_mac_ops
      callbacks instead of the net_device. Using this mechanism, a PHYLINK
      user can get the original net_device using a structure such as
      'to_net_dev(config->dev)' or directly the structure containing the
      phylink_config using a container_of call.
      
      At the moment, only the PHYLINK_NETDEV is defined as a valid operation
      type for PHYLINK. In this mode, a valid reference to a struct device
      linked to the original net_device should be passed to PHYLINK through
      the phylink_config structure.
      
      This API changes is mainly driven by the necessity of adding a new
      operation type in PHYLINK that disconnects the phy_device from the
      net_device and also works when the net_device is lacking.
      Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Tested-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44cc27e4
  21. 13 5月, 2019 3 次提交
    • V
      net: dsa: Remove the now unused DSA_SKB_CB_COPY() macro · 1c9b1420
      Vladimir Oltean 提交于
      It's best to not expose this, due to the performance hit it may cause
      when calling it.
      
      Fixes: b68b0dd0 ("net: dsa: Keep private info in the skb->cb")
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c9b1420
    • V
      net: dsa: Remove dangerous DSA_SKB_CLONE() macro · 506f0e09
      Vladimir Oltean 提交于
      This does not cause any bug now because it has no users, but its body
      contains two pointer definitions within a code block:
      
      		struct sk_buff *clone = _clone;	\
      		struct sk_buff *skb = _skb;	\
      
      When calling the macro as DSA_SKB_CLONE(clone, skb), these variables
      would obscure the arguments that the macro was called with, and the
      initializers would be a no-op instead of doing their job (undefined
      behavior, by the way, but GCC nicely puts NULL pointers instead).
      
      So simply remove this broken macro and leave users to simply call
      "DSA_SKB_CB(skb)->clone = clone" by hand when needed.
      
      There is one functional difference when doing what I just suggested
      above: the control block won't be transferred from the original skb into
      the clone. Since there's no foreseen need for the control block in the
      clone ATM, this is ok.
      
      Fixes: b68b0dd0 ("net: dsa: Keep private info in the skb->cb")
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      506f0e09
    • V
      net: dsa: Initialize DSA_SKB_CB(skb)->deferred_xmit variable · 87671375
      Vladimir Oltean 提交于
      The sk_buff control block can have any contents on xmit put there by the
      stack, so initialization is mandatory, since we are checking its value
      after the actual DSA xmit (the tagger may have changed it).
      
      The DSA_SKB_ZERO() macro could have been used for this purpose, but:
      - Zeroizing a 48-byte memory region in the hotpath is best avoided.
      - It would have triggered a warning with newer compilers since
        __dsa_skb_cb contains a structure within a structure, and the {0}
        initializer was incorrect for that purpose.
      
      So simply remove the DSA_SKB_ZERO() macro and initialize the
      deferred_xmit variable by hand (which should be done for all further
      dsa_skb_cb variables which need initialization - currently none - to
      avoid the performance penalty).
      
      Fixes: 97a69a0d ("net: dsa: Add support for deferred xmit")
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87671375
  22. 06 5月, 2019 6 次提交
    • V
      net: dsa: sja1105: Add support for traffic through standalone ports · 227d07a0
      Vladimir Oltean 提交于
      In order to support this, we are creating a make-shift switch tag out of
      a VLAN trunk configured on the CPU port. Termination of normal traffic
      on switch ports only works when not under a vlan_filtering bridge.
      Termination of management (PTP, BPDU) traffic works under all
      circumstances because it uses a different tagging mechanism
      (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q
      code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105.
      
      There are two types of traffic: regular and link-local.
      
      The link-local traffic received on the CPU port is trapped from the
      switch's regular forwarding decisions because it matched one of the two
      DMAC filters for management traffic.
      
      On transmission, the switch requires special massaging for these
      link-local frames. Due to a weird implementation of the switching IP, by
      default it drops link-local frames that originate on the CPU port.
      It needs to be told where to forward them to, through an SPI command
      ("management route") that is valid for only a single frame.
      So when we're sending link-local traffic, we are using the
      dsa_defer_xmit mechanism.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      227d07a0
    • V
      net: dsa: Add a private structure pointer to dsa_port · c362beb0
      Vladimir Oltean 提交于
      This is supposed to share information between the driver and the tagger,
      or used by the tagger to keep some state. Its use is optional.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c362beb0
    • V
      net: dsa: Add support for deferred xmit · 97a69a0d
      Vladimir Oltean 提交于
      Some hardware needs to take work to get convinced to receive frames on
      the CPU port (such as the sja1105 which takes temporary L2 forwarding
      rules over SPI that last for a single frame). Such work needs a
      sleepable context, and because the regular .ndo_start_xmit is atomic,
      this cannot be done in the tagger. So introduce a generic DSA mechanism
      that sets up a transmit skb queue and a workqueue for deferred
      transmission.
      
      The new driver callback (.port_deferred_xmit) is in dsa_switch and not
      in the tagger because the operations that require sleeping typically
      also involve interacting with the hardware, and not simply skb
      manipulations. Therefore having it there simplifies the structure a bit
      and makes it unnecessary to export functions from the driver to the
      tagger.
      
      The driver is responsible of calling dsa_enqueue_skb which transfers it
      to the master netdevice. This is so that it has a chance of performing
      some more work afterwards, such as cleanup or TX timestamping.
      
      To tell DSA that skb xmit deferral is required, I have thought about
      changing the return type of the tagger .xmit from struct sk_buff * into
      a enum dsa_tx_t that could potentially encode a DSA_XMIT_DEFER value.
      
      But the trailer tagger is reallocating every skb on xmit and therefore
      making a valid use of the pointer return value. So instead of reworking
      the API in complicated ways, right now a boolean property in the newly
      introduced DSA_SKB_CB is set.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97a69a0d
    • V
      net: dsa: Keep private info in the skb->cb · b68b0dd0
      Vladimir Oltean 提交于
      Map a DSA structure over the 48-byte control block that will hold
      skb info on transmit and receive. This is only for use within the DSA
      processing layer (e.g. communicating between DSA core and tagger) and
      not for passing info around with other layers such as the master net
      device.
      
      Also add a DSA_SKB_CB_PRIV() macro which retrieves a pointer to the
      space up to 48 bytes that the DSA structure does not use. This space can
      be used for drivers to add their own private info.
      
      One use is for the PTP timestamping code path. When cloning a skb,
      annotate the original with a pointer to the clone, which the driver can
      then find easily and place the timestamp to. This avoids the need of a
      separate queue to hold clones and a way to match an original to a cloned
      skb.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b68b0dd0
    • V
      net: dsa: Allow drivers to filter packets they can decode source port from · cc1939e4
      Vladimir Oltean 提交于
      Frames get processed by DSA and redirected to switch port net devices
      based on the ETH_P_XDSA multiplexed packet_type handler found by the
      network stack when calling eth_type_trans().
      
      The running assumption is that once the DSA .rcv function is called, DSA
      is always able to decode the switch tag in order to change the skb->dev
      from its master.
      
      However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105,
      user of DSA_TAG_PROTO_8021Q) where this assumption is not completely
      true, since switch tagging piggybacks on the absence of a vlan_filtering
      bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't
      rely on switch tagging, but on a different mechanism. So it would make
      sense to at least be able to terminate that.
      
      Having DSA receive traffic it can't decode would put it in an impossible
      situation: the eth_type_trans() function would invoke the DSA .rcv(),
      which could not change skb->dev, then eth_type_trans() would be invoked
      again, which again would call the DSA .rcv, and the packet would never
      be able to exit the DSA filter and would spiral in a loop until the
      whole system dies.
      
      This happens because eth_type_trans() doesn't actually look at the skb
      (so as to identify a potential tag) when it deems it as being
      ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer
      installed (therefore it's a DSA master) and that there exists a .rcv
      callback (everybody except DSA_TAG_PROTO_NONE has that). This is
      understandable as there are many switch tags out there, and exhaustively
      checking for all of them is far from ideal.
      
      The solution lies in introducing a filtering function for each tagging
      protocol. In the absence of a filtering function, all traffic is passed
      to the .rcv DSA callback. The tagging protocol should see the filtering
      function as a pre-validation that it can decode the incoming skb. The
      traffic that doesn't match the filter will bypass the DSA .rcv callback
      and be left on the master netdevice, which wasn't previously possible.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc1939e4
    • V
      net: dsa: Optional VLAN-based port separation for switches without tagging · f9bbe447
      Vladimir Oltean 提交于
      This patch provides generic DSA code for using VLAN (802.1Q) tags for
      the same purpose as a dedicated switch tag for injection/extraction.
      It is based on the discussions and interest that has been so far
      expressed in https://www.spinics.net/lists/netdev/msg556125.html.
      
      Unlike all other DSA-supported tagging protocols, CONFIG_NET_DSA_TAG_8021Q
      does not offer a complete solution for drivers (nor can it). Instead, it
      provides generic code that driver can opt into calling:
      - dsa_8021q_xmit: Inserts a VLAN header with the specified contents.
        Can be called from another tagging protocol's xmit function.
        Currently the LAN9303 driver is inserting headers that are simply
        802.1Q with custom fields, so this is an opportunity for code reuse.
      - dsa_8021q_rcv: Retrieves the TPID and TCI from a VLAN-tagged skb.
        Removing the VLAN header is left as a decision for the caller to make.
      - dsa_port_setup_8021q_tagging: For each user port, installs an Rx VID
        and a Tx VID, for proper untagged traffic identification on ingress
        and steering on egress. Also sets up the VLAN trunk on the upstream
        (CPU or DSA) port. Drivers are intentionally left to call this
        function explicitly, depending on the context and hardware support.
        The expected switch behavior and VLAN semantics should not be violated
        under any conditions. That is, after calling
        dsa_port_setup_8021q_tagging, the hardware should still pass all
        ingress traffic, be it tagged or untagged.
      
      For uniformity with the other tagging protocols, a module for the
      dsa_8021q_netdev_ops structure is registered, but the typical usage is
      to set up another tagging protocol which selects CONFIG_NET_DSA_TAG_8021Q,
      and calls the API from tag_8021q.h. Null function definitions are also
      provided so that a "depends on" is not forced in the Kconfig.
      
      This tagging protocol only works when switch ports are standalone, or
      when they are added to a VLAN-unaware bridge. It will probably remain
      this way for the reasons below.
      
      When added to a bridge that has vlan_filtering 1, the bridge core will
      install its own VLANs and reset the pvids through switchdev. For the
      bridge core, switchdev is a write-only pipe. All VLAN-related state is
      kept in the bridge core and nothing is read from DSA/switchdev or from
      the driver. So the bridge core will break this port separation because
      it will install the vlan_default_pvid into all switchdev ports.
      
      Even if we could teach the bridge driver about switchdev preference of a
      certain vlan_default_pvid (task difficult in itself since the current
      setting is per-bridge but we would need it per-port), there would still
      exist many other challenges.
      
      Firstly, in the DSA rcv callback, a driver would have to perform an
      iterative reverse lookup to find the correct switch port. That is
      because the port is a bridge slave, so its Rx VID (port PVID) is subject
      to user configuration. How would we ensure that the user doesn't reset
      the pvid to a different value (which would make an O(1) translation
      impossible), or to a non-unique value within this DSA switch tree (which
      would make any translation impossible)?
      
      Finally, not all switch ports are equal in DSA, and that makes it
      difficult for the bridge to be completely aware of this anyway.
      The CPU port needs to transmit tagged packets (VLAN trunk) in order for
      the DSA rcv code to be able to decode source information.
      But the bridge code has absolutely no idea which switch port is the CPU
      port, if nothing else then just because there is no netdevice registered
      by DSA for the CPU port.
      Also DSA does not currently allow the user to specify that they want the
      CPU port to do VLAN trunking anyway. VLANs are added to the CPU port
      using the same flags as they were added on the user port.
      
      So the VLANs installed by dsa_port_setup_8021q_tagging per driver
      request should remain private from the bridge's and user's perspective,
      and should not alter the VLAN semantics observed by the user.
      
      In the current implementation a VLAN range ending at 4095 (VLAN_N_VID)
      is reserved for this purpose. Each port receives a unique Rx VLAN and a
      unique Tx VLAN. Separate VLANs are needed for Rx and Tx because they
      serve different purposes: on Rx the switch must process traffic as
      untagged and process it with a port-based VLAN, but with care not to
      hinder bridging. On the other hand, the Tx VLAN is where the
      reachability restrictions are imposed, since by tagging frames in the
      xmit callback we are telling the switch onto which port to steer the
      frame.
      
      Some general guidance on how this support might be employed for
      real-life hardware (some comments made by Florian Fainelli):
      
      - If the hardware supports VLAN tag stacking, it should somehow back
        up its private VLAN settings when the bridge tries to override them.
        Then the driver could re-apply them as outer tags. Dedicating an outer
        tag per bridge device would allow identical inner tag VID numbers to
        co-exist, yet preserve broadcast domain isolation.
      
      - If the switch cannot handle VLAN tag stacking, it should disable this
        port separation when added as slave to a vlan_filtering bridge, in
        that case having reduced functionality.
      
      - Drivers for old switches that don't support the entire VLAN_N_VID
        range will need to rework the current range selection mechanism.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9bbe447
  23. 01 5月, 2019 1 次提交