1. 24 8月, 2021 1 次提交
    • V
      net: dsa: let drivers state that they need VLAN filtering while standalone · 58adf9dc
      Vladimir Oltean 提交于
      As explained in commit e358bef7 ("net: dsa: Give drivers the chance
      to veto certain upper devices"), the hellcreek driver uses some tricks
      to comply with the network stack expectations: it enforces port
      separation in standalone mode using VLANs. For untagged traffic,
      bridging between ports is prevented by using different PVIDs, and for
      VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on
      two ports, so packets with one VLAN cannot leak from one port to another.
      
      That is almost fine*, and has worked because hellcreek relied on an
      implicit behavior of the DSA core that was changed by the previous
      patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on
      [fixed]'. Since most of the DSA drivers are actually VLAN-unaware in
      standalone mode, that feature was actually incorrectly reflecting the
      hardware/driver state, so there was a desire to fix it. This leaves the
      hellcreek driver in a situation where it has to explicitly request this
      behavior from the DSA framework.
      
      We configure the ports as follows:
      
      - Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a
        standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid
        and will add a VLAN to the hardware tables, giving the driver the
        opportunity to refuse it through .port_prechangeupper.
      
      - Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper
        on top of a bridged hellcreek port will not go through
        dsa_slave_vlan_rx_add_vid, because there will not be any attempt to
        offload this VLAN. The driver already disables VLAN awareness, so that
        upper should receive the traffic it needs.
      
      - Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper
        on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid,
        and can again be vetoed through .port_prechangeupper.
      
      *It is not actually completely fine, because if I follow through
      correctly, we can have the following situation:
      
      ip link add br0 type bridge vlan_filtering 0
      ip link set lan0 master br0 # lan0 now becomes VLAN-unaware
      ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation
      
      This patch fixes that corner case by extending the DSA core logic, based
      on this requested attribute, to change the VLAN awareness state of the
      switch (port) when it leaves the bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NKurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58adf9dc
  2. 23 8月, 2021 1 次提交
    • V
      net: dsa: track unique bridge numbers across all DSA switch trees · f5e165e7
      Vladimir Oltean 提交于
      Right now, cross-tree bridging setups work somewhat by mistake.
      
      In the case of cross-tree bridging with sja1105, all switch instances
      need to agree upon a common VLAN ID for forwarding a packet that belongs
      to a certain bridging domain.
      
      With TX forwarding offload, the VLAN ID is the bridge VLAN for
      VLAN-aware bridging, and the tag_8021q TX forwarding offload VID
      (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging.
      
      The VBID for VLAN-unaware bridging is derived from the dp->bridge_num
      value calculated by DSA independently for each switch tree.
      
      If ports from one tree join one bridge, and ports from another tree join
      another bridge, DSA will assign them the same bridge_num, even though
      the bridges are different. If cross-tree bridging is supported, this
      is an issue.
      
      Modify DSA to calculate the bridge_num globally across all switch trees.
      This has the implication for a driver that the dp->bridge_num value that
      DSA will assign to its ports might not be contiguous, if there are
      boards with multiple DSA drivers instantiated. Additionally, all
      bridge_num values eat up towards each switch's
      ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate,
      and can be seen as a limitation introduced by this patch. However, that
      is the lesser evil for now.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5e165e7
  3. 09 8月, 2021 2 次提交
    • V
      net: dsa: sja1105: rely on DSA core tracking of port learning state · 5313a37b
      Vladimir Oltean 提交于
      Now that DSA keeps track of the port learning state, it becomes
      superfluous to keep an additional variable with this information in the
      sja1105 driver. Remove it.
      
      The DSA core's learning state is present in struct dsa_port *dp.
      To avoid the antipattern where we iterate through a DSA switch's
      ports and then call dsa_to_port to obtain the "dp" reference (which is
      bad because dsa_to_port iterates through the DSA switch tree once
      again), just iterate through the dst->ports and operate on those
      directly.
      
      The sja1105 had an extra use of priv->learn_ena on non-user ports. DSA
      does not touch the learning state of those ports - drivers are free to
      do what they wish on them. Mark that information with a comment in
      struct dsa_port and let sja1105 set dp->learning for cascade ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5313a37b
    • V
      net: dsa: centralize fast ageing when address learning is turned off · 045c45d1
      Vladimir Oltean 提交于
      Currently DSA leaves it down to device drivers to fast age the FDB on a
      port when address learning is disabled on it. There are 2 reasons for
      doing that in the first place:
      
      - when address learning is disabled by user space, through
        IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user
        space typically wants to achieve is to operate in a mode with no
        dynamic FDB entry on that port. But if the port is already up, some
        addresses might have been already learned on it, and it seems silly to
        wait for 5 minutes for them to expire until something useful can be
        done.
      
      - when a port leaves a bridge and becomes standalone, DSA turns off
        address learning on it. This also has the nice side effect of flushing
        the dynamically learned bridge FDB entries on it, which is a good idea
        because standalone ports should not have bridge FDB entries on them.
      
      We let drivers manage fast ageing under this condition because if DSA
      were to do it, it would need to track each port's learning state, and
      act upon the transition, which it currently doesn't.
      
      But there are 2 reasons why doing it is better after all:
      
      - drivers might get it wrong and not do it (see b53_port_set_learning)
      
      - we would like to flush the dynamic entries from the software bridge
        too, and letting drivers do that would be another pain point
      
      So track the port learning state and trigger a fast age process
      automatically within DSA.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      045c45d1
  4. 06 8月, 2021 1 次提交
    • V
      net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier · c73c5708
      Vladimir Oltean 提交于
      Commit 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER
      attribute") added an option for users to turn off multicast flooding
      towards the CPU if they turn off the IGMP querier on a bridge which
      already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router).
      
      And commit a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      simply papered over that issue, because it moved the decision to flood
      the CPU with multicast (or not) from the DSA core down to individual drivers,
      instead of taking a more radical position then.
      
      The truth is that disabling multicast flooding to the CPU is simply
      something we are not prepared to do now, if at all. Some reasons:
      
      - ICMP6 neighbor solicitation messages are unregistered multicast
        packets as far as the bridge is concerned. So if we stop flooding
        multicast, the outside world cannot ping the bridge device's IPv6
        link-local address.
      
      - There might be foreign interfaces bridged with our DSA switch ports
        (sending a packet towards the host does not necessarily equal
        termination, but maybe software forwarding). So if there is no one
        interested in that multicast traffic in the local network stack, that
        doesn't mean nobody is.
      
      - PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as
        the bridge is concerned. This should reach the CPU port.
      
      - The switch driver might not do FDB partitioning. And since we don't
        even bother to do more fine-grained flood disabling (such as "disable
        flooding _from_port_N_ towards the CPU port" as opposed to "disable
        flooding _from_any_port_ towards the CPU port"), this breaks standalone
        ports, or even multiple bridges where one has an IGMP querier and one
        doesn't.
      
      Reverting the logic makes all of the above work.
      
      Fixes: a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      Fixes: 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER attribute")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c73c5708
  5. 02 8月, 2021 1 次提交
  6. 28 7月, 2021 1 次提交
    • A
      dev_ioctl: split out ndo_eth_ioctl · a7605370
      Arnd Bergmann 提交于
      Most users of ndo_do_ioctl are ethernet drivers that implement
      the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
      timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
      
      Separate these from the few drivers that use ndo_do_ioctl to
      implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
      
      This is a purely cosmetic change intended to help readers find
      their way through the implementation.
      
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Vladimir Oltean <olteanv@gmail.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NJason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7605370
  7. 27 7月, 2021 1 次提交
  8. 23 7月, 2021 2 次提交
    • V
      net: dsa: add support for bridge TX forwarding offload · 123abc06
      Vladimir Oltean 提交于
      For a DSA switch, to offload the forwarding process of a bridge device
      means to send the packets coming from the software bridge as data plane
      packets. This is contrary to everything that DSA has done so far,
      because the current taggers only know to send control packets (ones that
      target a specific destination port), whereas data plane packets are
      supposed to be forwarded according to the FDB lookup, much like packets
      ingressing on any regular ingress port. If the FDB lookup process
      returns multiple destination ports (flooding, multicast), then
      replication is also handled by the switch hardware - the bridge only
      sends a single packet and avoids the skb_clone().
      
      DSA keeps for each bridge port a zero-based index (the number of the
      bridge). Multiple ports performing TX forwarding offload to the same
      bridge have the same dp->bridge_num value, and ports not offloading the
      TX data plane of a bridge have dp->bridge_num = -1.
      
      The tagger can check if the packet that is being transmitted on has
      skb->offload_fwd_mark = true or not. If it does, it can be sure that the
      packet belongs to the data plane of a bridge, further information about
      which can be obtained based on dp->bridge_dev and dp->bridge_num.
      It can then compose a DSA tag for injecting a data plane packet into
      that bridge number.
      
      For the switch driver side, we offer two new dsa_switch_ops methods,
      called .port_bridge_fwd_offload_{add,del}, which are modeled after
      .port_bridge_{join,leave}.
      These methods are provided in case the driver needs to configure the
      hardware to treat packets coming from that bridge software interface as
      data plane packets. The switchdev <-> bridge interaction happens during
      the netdev_master_upper_dev_link() call, so to switch drivers, the
      effect is that the .port_bridge_fwd_offload_add() method is called
      immediately after .port_bridge_join().
      
      If the bridge number exceeds the number of bridges for which the switch
      driver can offload the TX data plane (and this includes the case where
      the driver can offload none), DSA falls back to simply returning
      tx_fwd_offload = false in the switchdev_bridge_port_offload() call.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      123abc06
    • V
      net: dsa: track the number of switches in a tree · 5b22d366
      Vladimir Oltean 提交于
      In preparation of supporting data plane forwarding on behalf of a
      software bridge, some drivers might need to view bridges as virtual
      switches behind the CPU port in a cross-chip topology.
      
      Give them some help and let them know how many physical switches there
      are in the tree, so that they can count the virtual switches starting
      from that number on.
      
      Note that the first dsa_switch_ops method where this information is
      reliably available is .setup(). This is because of how DSA works:
      in a tree with 3 switches, each calling dsa_register_switch(), the first
      2 will advance until dsa_tree_setup() -> dsa_tree_setup_routing_table()
      and exit with error code 0 because the topology is not complete. Since
      probing is parallel at this point, one switch does not know about the
      existence of the other. Then the third switch comes, and for it,
      dsa_tree_setup_routing_table() returns complete = true. This switch goes
      ahead and calls dsa_tree_setup_switches() for everybody else, calling
      their .setup() methods too. This acts as the synchronization point.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b22d366
  9. 20 7月, 2021 2 次提交
    • V
      net: dsa: make tag_8021q operations part of the core · 5da11eb4
      Vladimir Oltean 提交于
      Make tag_8021q a more central element of DSA and move the 2 driver
      specific operations outside of struct dsa_8021q_context (which is
      supposed to hold dynamic data and not really constant function
      pointers).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5da11eb4
    • V
      net: dsa: let the core manage the tag_8021q context · d7b1fd52
      Vladimir Oltean 提交于
      The basic problem description is as follows:
      
      Be there 3 switches in a daisy chain topology:
      
                                                   |
          sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
       [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
                                         |
                                         +---------+
                                                   |
          sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
                                         |
                                         +---------+
                                                   |
          sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
       [  user ] [  user ] [  user ] [  user ] [  dsa  ]
      
      The CPU will not be able to ping through the user ports of the
      bottom-most switch (like for example sw2p0), simply because tag_8021q
      was not coded up for this scenario - it has always assumed DSA switch
      trees with a single switch.
      
      To add support for the topology above, we must admit that the RX VLAN of
      sw2p0 must be added on some ports of switches 0 and 1 as well. This is
      in fact a textbook example of thing that can use the cross-chip notifier
      framework that DSA has set up in switch.c.
      
      There is only one problem: core DSA (switch.c) is not able right now to
      make the connection between a struct dsa_switch *ds and a struct
      dsa_8021q_context *ctx. Right now, it is drivers who call into
      tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer,
      and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del}
      methods.
      
      But with cross-chip notifiers, it is possible for tag_8021q to call
      drivers without drivers having ever asked for anything. A good example
      is right above: when sw2p0 wants to set itself up for tag_8021q,
      the .tag_8021q_vlan_add method needs to be called for switches 1 and 0,
      so that they transport sw2p0's VLANs towards the CPU without dropping
      them.
      
      So instead of letting drivers manage the tag_8021q context, add a
      tag_8021q_ctx pointer inside of struct dsa_switch, which will be
      populated when dsa_tag_8021q_register() returns success.
      
      The patch is fairly long-winded because we are partly reverting commit
      5899ee36 ("net: dsa: tag_8021q: add a context structure") which made
      the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we
      can access "ctx" directly from "ds", this is no longer needed.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7b1fd52
  10. 30 6月, 2021 3 次提交
    • V
      net: dsa: reference count the FDB addresses at the cross-chip notifier level · 3f6e32f9
      Vladimir Oltean 提交于
      The same concerns expressed for host MDB entries are valid for host FDBs
      just as well:
      
      - in the case of multiple bridges spanning the same switch chip, deleting
        a host FDB entry that belongs to one bridge will result in breakage to
        the other bridge
      - not deleting FDB entries across DSA links means that the switch's
        hardware tables will eventually run out, given enough wear&tear
      
      So do the same thing and introduce reference counting for CPU ports and
      DSA links using the same data structures as we have for MDB entries.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f6e32f9
    • V
      net: dsa: reference count the MDB entries at the cross-chip notifier level · 161ca59d
      Vladimir Oltean 提交于
      Ever since the cross-chip notifiers were introduced, the design was
      meant to be simplistic and just get the job done without worrying too
      much about dangling resources left behind.
      
      For example, somebody installs an MDB entry on sw0p0 in this daisy chain
      topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
      sw1p4 and sw2p4.
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
      
      Then the same person deletes that MDB entry. The cross-chip notifier for
      deletion only matches sw0p0:
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
      
      Why?
      
      Because the DSA links are 'trunk' ports, if we just go ahead and delete
      the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
      entries when they are still needed. Just consider the fact that somebody
      does:
      
      - add a multicast MAC address towards sw0p0 [ via the cross-chip
        notifiers it gets installed on the DSA links too ]
      - add the same multicast MAC address towards sw0p1 (another port of that
        same switch)
      - delete the same multicast MAC address from sw0p0.
      
      At this point, if we deleted the MAC address from the DSA links, it
      would be flooded, even though there is still an entry on switch 0 which
      needs it not to.
      
      So that is why deletions only match the targeted source port and nothing
      on DSA links. Of course, dangling resources means that the hardware
      tables will eventually run out given enough additions/removals, but hey,
      at least it's simple.
      
      But there is a bigger concern which needs to be addressed, and that is
      our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
      object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
      on the upstream port, and a similar thing happens on deletion:
      dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
      upstream port.
      
      When there are 2 VLAN-unaware bridges spanning the same switch (which is
      a use case DSA proudly supports), each bridge will install its own
      SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
      emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
      user ports enslaved to br0 and the user ports enslaved to br1. Not good.
      The host-trapped multicast addresses installed by br1 will be deleted
      when any state changes in br0 (IGMP timers expire, or ports leave, etc).
      
      To avoid this, we could of course go the route of the zero-sum game and
      delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
      design is to just admit that on shared ports like DSA links and CPU
      ports, we should be reference counting calls, even if this consumes some
      dynamic memory which DSA has traditionally avoided. On the flip side,
      the hardware tables of switches are limited in size, so it would be good
      if the OS managed them properly instead of having them eventually
      overflow.
      
      To address the memory usage concern, we only apply the refcounting of
      MDB entries on ports that are really shared (CPU ports and DSA links)
      and not on user ports. In a typical single-switch setup, this means only
      the CPU port (and the host MDB entries are not that many, really).
      
      The name of the newly introduced data structures (dsa_mac_addr) is
      chosen in such a way that will be reusable for host FDB entries (next
      patch).
      
      With this change, we can finally have the same matching logic for the
      MDB additions and deletions, as well as for their host-trapped variants.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      161ca59d
    • V
      net: dsa: introduce dsa_is_upstream_port and dsa_switch_is_upstream_of · 63609c8f
      Vladimir Oltean 提交于
      In preparation for the new cross-chip notifiers for host addresses,
      let's introduce some more topology helpers which we are going to use to
      discern switches that are in our path towards the dedicated CPU port
      from switches that aren't.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63609c8f
  11. 22 6月, 2021 1 次提交
  12. 12 6月, 2021 2 次提交
    • V
      net: dsa: add support for the SJA1110 native tagging protocol · 4913b8eb
      Vladimir Oltean 提交于
      The SJA1110 has improved a few things compared to SJA1105:
      
      - To send a control packet from the host port with SJA1105, one needed
        to program a one-shot "management route" over SPI. This is no longer
        true with SJA1110, you can actually send "in-band control extensions"
        in the packets sent by DSA, these are in fact DSA tags which contain
        the destination port and switch ID.
      
      - When receiving a control packet from the switch with SJA1105, the
        source port and switch ID were written in bytes 3 and 4 of the
        destination MAC address of the frame (which was a very poor shot at a
        DSA header). If the control packet also had an RX timestamp, that
        timestamp was sent in an actual follow-up packet, so there were
        reordering concerns on multi-core/multi-queue DSA masters, where the
        metadata frame with the RX timestamp might get processed before the
        actual packet to which that timestamp belonged (there is no way to
        pair a packet to its timestamp other than the order in which they were
        received). On SJA1110, this is no longer true, control packets have
        the source port, switch ID and timestamp all in the DSA tags.
      
      - Timestamps from the switch were partial: to get a 64-bit timestamp as
        required by PTP stacks, one would need to take the partial 24-bit or
        32-bit timestamp from the packet, then read the current PTP time very
        quickly, and then patch in the high bits of the current PTP time into
        the captured partial timestamp, to reconstruct what the full 64-bit
        timestamp must have been. That is awful because packet processing is
        done in NAPI context, but reading the current PTP time is done over
        SPI and therefore needs sleepable context.
      
      But it also aggravated a few things:
      
      - Not only is there a DSA header in SJA1110, but there is a DSA trailer
        in fact, too. So DSA needs to be extended to support taggers which
        have both a header and a trailer. Very unconventional - my understanding
        is that the trailer exists because the timestamps couldn't be prepared
        in time for putting them in the header area.
      
      - Like SJA1105, not all packets sent to the CPU have the DSA tag added
        to them, only control packets do:
      
        * the ones which match the destination MAC filters/traps in
          MAC_FLTRES1 and MAC_FLTRES0
        * the ones which match FDB entries which have TRAP or TAKETS bits set
      
        So we could in theory hack something up to request the switch to take
        timestamps for all packets that reach the CPU, and those would be
        DSA-tagged and contain the source port / switch ID by virtue of the
        fact that there needs to be a timestamp trailer provided. BUT:
      
      - The SJA1110 does not parse its own DSA tags in a way that is useful
        for routing in cross-chip topologies, a la Marvell. And the sja1105
        driver already supports cross-chip bridging from the SJA1105 days.
        It does that by automatically setting up the DSA links as VLAN trunks
        which contain all the necessary tag_8021q RX VLANs that must be
        communicated between the switches that span the same bridge. So when
        using tag_8021q on sja1105, it is possible to have 2 switches with
        ports sw0p0, sw0p1, sw1p0, sw1p1, and 2 VLAN-unaware bridges br0 and
        br1, and br0 can take sw0p0 and sw1p0, and br1 can take sw0p1 and
        sw1p1, and forwarding will happen according to the expected rules of
        the Linux bridge.
        We like that, and we don't want that to go away, so as a matter of
        fact, the SJA1110 tagger still needs to support tag_8021q.
      
      So the sja1110 tagger is a hybrid between tag_8021q for data packets,
      and the native hardware support for control packets.
      
      On RX, packets have a 13-byte trailer if they contain an RX timestamp.
      That trailer is padded in such a way that its byte 8 (the start of the
      "residence time" field - not parsed by Linux because we don't care) is
      aligned on a 16 byte boundary. So the padding has a variable length
      between 0 and 15 bytes. The DSA header contains the offset of the
      beginning of the padding relative to the beginning of the frame (and the
      end of the padding is obviously the end of the packet minus 13 bytes,
      the length of the trailer). So we discard it.
      
      Packets which don't have a trailer contain the source port and switch ID
      information in the header (they are "trap-to-host" packets). Packets
      which have a trailer contain the source port and switch ID in the trailer.
      
      On TX, the destination port mask and switch ID is always in the trailer,
      so we always need to say in the header that a trailer is present.
      
      The header needs a custom EtherType and this was chosen as 0xdadc, after
      0xdada which is for Marvell and 0xdadb which is for VLANs in
      VLAN-unaware mode on SJA1105 (and SJA1110 in fact too).
      
      Because we use tag_8021q in concert with the native tagging protocol,
      control packets will have 2 DSA tags.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4913b8eb
    • V
      net: dsa: generalize overhead for taggers that use both headers and trailers · 4e500251
      Vladimir Oltean 提交于
      Some really really weird switches just couldn't decide whether to use a
      normal or a tail tagger, so they just did both.
      
      This creates problems for DSA, because we only have the concept of an
      'overhead' which can be applied to the headroom or to the tailroom of
      the skb (like for example during the central TX reallocation procedure),
      depending on the value of bool tail_tag, but not to both.
      
      We need to generalize DSA to cater for these odd switches by
      transforming the 'overhead / tail_tag' pair into 'needed_headroom /
      needed_tailroom'.
      
      The DSA master's MTU is increased to account for both.
      
      The flow dissector code is modified such that it only calls the DSA
      adjustment callback if the tagger has a non-zero header length.
      
      Taggers are trivially modified to declare either needed_headroom or
      needed_tailroom, based on the tail_tag value that they currently
      declare.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e500251
  13. 28 4月, 2021 3 次提交
  14. 21 4月, 2021 2 次提交
  15. 14 4月, 2021 1 次提交
    • M
      of: net: pass the dst buffer to of_get_mac_address() · 83216e39
      Michael Walle 提交于
      of_get_mac_address() returns a "const void*" pointer to a MAC address.
      Lately, support to fetch the MAC address by an NVMEM provider was added.
      But this will only work with platform devices. It will not work with
      PCI devices (e.g. of an integrated root complex) and esp. not with DSA
      ports.
      
      There is an of_* variant of the nvmem binding which works without
      devices. The returned data of a nvmem_cell_read() has to be freed after
      use. On the other hand the return of_get_mac_address() points to some
      static data without a lifetime. The trick for now, was to allocate a
      device resource managed buffer which is then returned. This will only
      work if we have an actual device.
      
      Change it, so that the caller of of_get_mac_address() has to supply a
      buffer where the MAC address is written to. Unfortunately, this will
      touch all drivers which use the of_get_mac_address().
      
      Usually the code looks like:
      
        const char *addr;
        addr = of_get_mac_address(np);
        if (!IS_ERR(addr))
          ether_addr_copy(ndev->dev_addr, addr);
      
      This can then be simply rewritten as:
      
        of_get_mac_address(np, ndev->dev_addr);
      
      Sometimes is_valid_ether_addr() is used to test the MAC address.
      of_get_mac_address() already makes sure, it just returns a valid MAC
      address. Thus we can just test its return code. But we have to be
      careful if there are still other sources for the MAC address before the
      of_get_mac_address(). In this case we have to keep the
      is_valid_ether_addr() call.
      
      The following coccinelle patch was used to convert common cases to the
      new style. Afterwards, I've manually gone over the drivers and fixed the
      return code variable: either used a new one or if one was already
      available use that. Mansour Moufid, thanks for that coccinelle patch!
      
      <spml>
      @a@
      identifier x;
      expression y, z;
      @@
      - x = of_get_mac_address(y);
      + x = of_get_mac_address(y, z);
        <...
      - ether_addr_copy(z, x);
        ...>
      
      @@
      identifier a.x;
      @@
      - if (<+... x ...+>) {}
      
      @@
      identifier a.x;
      @@
        if (<+... x ...+>) {
            ...
        }
      - else {}
      
      @@
      identifier a.x;
      expression e;
      @@
      - if (<+... x ...+>@e)
      -     {}
      - else
      + if (!(e))
            {...}
      
      @@
      expression x, y, z;
      @@
      - x = of_get_mac_address(y, z);
      + of_get_mac_address(y, z);
        ... when != x
      </spml>
      
      All drivers, except drivers/net/ethernet/aeroflex/greth.c, were
      compile-time tested.
      Suggested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NMichael Walle <michael@walle.cc>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83216e39
  16. 19 3月, 2021 1 次提交
  17. 18 3月, 2021 1 次提交
  18. 17 2月, 2021 1 次提交
  19. 15 2月, 2021 3 次提交
    • V
      net: dsa: propagate extack to .port_vlan_filtering · 89153ed6
      Vladimir Oltean 提交于
      Some drivers can't dynamically change the VLAN filtering option, or
      impose some restrictions, it would be nice to propagate this info
      through netlink instead of printing it to a kernel log that might never
      be read. Also netlink extack includes the module that emitted the
      message, which means that it's easier to figure out which ones are
      driver-generated errors as opposed to command misuse.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89153ed6
    • V
      net: dsa: propagate extack to .port_vlan_add · 31046a5f
      Vladimir Oltean 提交于
      Allow drivers to communicate their restrictions to user space directly,
      instead of printing to the kernel log. Where the conversion would have
      been lossy and things like VLAN ID could no longer be conveyed (due to
      the lack of support for printf format specifier in netlink extack), I
      chose to keep the messages in full form to the kernel log only, and
      leave it up to individual driver maintainers to move more messages to
      extack.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31046a5f
    • V
      net: dsa: tag_ocelot: create separate tagger for Seville · 7c4bb540
      Vladimir Oltean 提交于
      The ocelot tagger is a hot mess currently, it relies on memory
      initialized by the attached driver for basic frame transmission.
      This is against all that DSA tagging protocols stand for, which is that
      the transmission and reception of a DSA-tagged frame, the data path,
      should be independent from the switch control path, because the tag
      protocol is in principle hot-pluggable and reusable across switches
      (even if in practice it wasn't until very recently). But if another
      driver like dsa_loop wants to make use of tag_ocelot, it couldn't.
      
      This was done to have common code between Felix and Ocelot, which have
      one bit difference in the frame header format. Quoting from commit
      67c24049 ("net: dsa: felix: create a template for the DSA tags on
      xmit"):
      
          Other alternatives have been analyzed, such as:
          - Create a separate tag_seville.c: too much code duplication for just 1
            bit field difference.
          - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like
            tag_brcm.c, which would have a separate .xmit function. Again, too
            much code duplication for just 1 bit field difference.
          - Allocate the template from the init function of the tag_ocelot.c
            module, instead of from the driver: couldn't figure out a method of
            accessing the correct port template corresponding to the correct
            tagger in the .xmit function.
      
      The really interesting part is that Seville should have had its own
      tagging protocol defined - it is not compatible on the wire with Ocelot,
      even for that single bit. In principle, a packet generated by
      DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain
      way, but when booted on NXP T1040 it would look differently. The reverse
      is also true: a packet generated by a Seville switch would be
      interpreted incorrectly by Wireshark if it was told it was generated by
      an Ocelot switch.
      
      Actually things are a bit more nuanced. If we concentrate only on the
      DSA tag, what I said above is true, but Ocelot/Seville also support an
      optional DSA tag prefix, which can be short or long, and it is possible
      to distinguish the two taggers based on an integer constant put in that
      prefix. Nonetheless, creating a separate tagger is still justified,
      since the tag prefix is optional, and without it, there is again no way
      to distinguish.
      
      Claiming backwards binary compatibility is a bit more tough, since I've
      already changed the format of tag_ocelot once, in commit 5124197c
      ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress").
      Therefore I am not very concerned with treating this as a bugfix and
      backporting it to stable kernels (which would be another mess due to the
      fact that there would be lots of conflicts with the other DSA_TAG_PROTO*
      definitions). It's just simpler to say that the string values of the
      taggers have ABI value starting with kernel 5.12, which will be when the
      changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging
      goes live.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c4bb540
  20. 13 2月, 2021 1 次提交
    • V
      net: dsa: act as passthrough for bridge port flags · a8b659e7
      Vladimir Oltean 提交于
      There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
      expressed by the bridge through switchdev, and not all of them can be
      emulated by DSA mid-layer API at the same time.
      
      One possible configuration is when the bridge offloads the port flags
      using a mask that has a single bit set - therefore only one feature
      should change. However, DSA currently groups together unicast and
      multicast flooding in the .port_egress_floods method, which limits our
      options when we try to add support for turning off broadcast flooding:
      do we extend .port_egress_floods with a third parameter which b53 and
      mv88e6xxx will ignore? But that means that the DSA layer, which
      currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
      see that .port_egress_floods is implemented, and will report that all 3
      types of flooding are supported - not necessarily true.
      
      Another configuration is when the user specifies more than one flag at
      the same time, in the same netlink message. If we were to create one
      individual function per offloadable bridge port flag, we would limit the
      expressiveness of the switch driver of refusing certain combinations of
      flag values. For example, a switch may not have an explicit knob for
      flooding of unknown multicast, just for flooding in general. In that
      case, the only correct thing to do is to allow changes to BR_FLOOD and
      BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
      a separate .port_set_unicast_flood and .port_set_multicast_flood would
      not allow the driver to possibly reject that.
      
      Also, DSA doesn't consider it necessary to inform the driver that a
      SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
      just calls .port_egress_floods for the CPU port. When we'll add support
      for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
      problem because the flood settings will need to be held statefully in
      the DSA middle layer, otherwise changing the mrouter port attribute will
      impact the flooding attribute. And that's _assuming_ that the underlying
      hardware doesn't have anything else to do when a multicast router
      attaches to a port than flood unknown traffic to it.  If it does, there
      will need to be a dedicated .port_set_mrouter anyway.
      
      So we need to let the DSA drivers see the exact form that the bridge
      passes this switchdev attribute in, otherwise we are standing in the
      way. Therefore we also need to use this form of language when
      communicating to the driver that it needs to configure its initial
      (before bridge join) and final (after bridge leave) port flags.
      
      The b53 and mv88e6xxx drivers are converted to the passthrough API and
      their implementation of .port_egress_floods is split into two: a
      function that configures unicast flooding and another for multicast.
      The mv88e6xxx implementation is quite hairy, and it turns out that
      the implementations of unknown unicast flooding are actually the same
      for 6185 and for 6352:
      
      behind the confusing names actually lie two individual bits:
      NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
      NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)
      
      so there was no reason to entangle them in the first place.
      
      Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
      PORT_CTL0, which has the exact same bit index. I have left the
      implementations separate though, for the only reason that the names are
      different enough to confuse me, since I am not able to double-check with
      a user manual. The multicast flooding setting for 6185 is in a different
      register than for 6352 though.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8b659e7
  21. 12 2月, 2021 1 次提交
  22. 30 1月, 2021 3 次提交
    • V
      net: dsa: add a second tagger for Ocelot switches based on tag_8021q · 7c83a7c5
      Vladimir Oltean 提交于
      There are use cases for which the existing tagger, based on the NPI
      (Node Processor Interface) functionality, is insufficient.
      
      Namely:
      - Frames injected through the NPI port bypass the frame analyzer, so no
        source address learning is performed, no TSN stream classification,
        etc.
      - Flow control is not functional over an NPI port (PAUSE frames are
        encapsulated in the same Extraction Frame Header as all other frames)
      - There can be at most one NPI port configured for an Ocelot switch. But
        in NXP LS1028A and T1040 there are two Ethernet CPU ports. The non-NPI
        port is currently either disabled, or operated as a plain user port
        (albeit an internally-facing one). Having the ability to configure the
        two CPU ports symmetrically could pave the way for e.g. creating a LAG
        between them, to increase bandwidth seamlessly for the system.
      
      So there is a desire to have an alternative to the NPI mode. This change
      keeps the default tagger for the Seville and Felix switches as "ocelot",
      but it can be changed via the following device attribute:
      
      echo ocelot-8021q > /sys/class/<dsa-master>/dsa/tagging
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      7c83a7c5
    • V
      net: dsa: allow changing the tag protocol via the "tagging" device attribute · 53da0eba
      Vladimir Oltean 提交于
      Currently DSA exposes the following sysfs:
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      
      which is a read-only device attribute, introduced in the kernel as
      commit 98cdb480 ("net: dsa: Expose tagging protocol to user-space"),
      and used by libpcap since its commit 993db3800d7d ("Add support for DSA
      link-layer types").
      
      It would be nice if we could extend this device attribute by making it
      writable:
      $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
      
      This is useful with DSA switches that can make use of more than one
      tagging protocol. It may be useful in dsa_loop in the future too, to
      perform offline testing of various taggers, or for changing between dsa
      and edsa on Marvell switches, if that is desirable.
      
      In terms of implementation, drivers can support this feature by
      implementing .change_tag_protocol, which should always leave the switch
      in a consistent state: either with the new protocol if things went well,
      or with the old one if something failed. Teardown of the old protocol,
      if necessary, must be handled by the driver.
      
      Some things remain as before:
      - The .get_tag_protocol is currently only called at probe time, to load
        the initial tagging protocol driver. Nonetheless, new drivers should
        report the tagging protocol in current use now.
      - The driver should manage by itself the initial setup of tagging
        protocol, no later than the .setup() method, as well as destroying
        resources used by the last tagger in use, no earlier than the
        .teardown() method.
      
      For multi-switch DSA trees, error handling is a bit more complicated,
      since e.g. the 5th out of 7 switches may fail to change the tag
      protocol. When that happens, a revert to the original tag protocol is
      attempted, but that may fail too, leaving the tree in an inconsistent
      state despite each individual switch implementing .change_tag_protocol
      transactionally. Since the intersection between drivers that implement
      .change_tag_protocol and drivers that support D in DSA is currently the
      empty set, the possibility for this error to happen is ignored for now.
      
      Testing:
      
      $ insmod mscc_felix.ko
      [   79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14
      [   79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
      $ insmod tag_ocelot.ko
      $ rmmod mscc_felix.ko
      $ insmod mscc_felix.ko
      [   97.261724] libphy: VSC9959 internal MDIO bus: probed
      [   97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
      [   97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
      [   97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
      [   97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
      [   97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
      [   97.964278] 8021q: adding VLAN 0 to HW filter on device swp0
      [   98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
      [   98.251845] 8021q: adding VLAN 0 to HW filter on device swp1
      [   98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
      [   98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx
      [   98.527948] device eno2 entered promiscuous mode
      [   98.544755] DSA: tree 0 setup
      
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms
      ^C
       -  10.0.0.1 ping statistics  -
      2 packets transmitted, 2 packets received, 0% packet loss
      round-trip min/avg/max = 0.754/1.545/2.337 ms
      
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      $ cat ./test_ocelot_8021q.sh
              #!/bin/bash
      
              ip link set swp0 down
              ip link set swp1 down
              ip link set swp2 down
              ip link set swp3 down
              ip link set swp5 down
              ip link set eno2 down
              echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
              ip link set eno2 up
              ip link set swp0 up
              ip link set swp1 up
              ip link set swp2 up
              ip link set swp3 up
              ip link set swp5 up
      $ ./test_ocelot_8021q.sh
      ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available
      $ rmmod tag_ocelot.ko
      rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable
      $ insmod tag_ocelot_8021q.ko
      $ ./test_ocelot_8021q.sh
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot-8021q
      $ rmmod tag_ocelot.ko
      $ rmmod tag_ocelot_8021q.ko
      rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms
      64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms
      $ rmmod mscc_felix.ko
      [  645.544426] mscc_felix 0000:00:00.5: Link is Down
      [  645.838608] DSA: tree 0 torn down
      $ rmmod tag_ocelot_8021q.ko
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      53da0eba
    • V
      net: dsa: keep a copy of the tagging protocol in the DSA switch tree · 357f203b
      Vladimir Oltean 提交于
      Cascading DSA switches can be done multiple ways. There is the brute
      force approach / tag stacking, where one upstream switch, located
      between leaf switches and the host Ethernet controller, will just
      happily transport the DSA header of those leaf switches as payload.
      For this kind of setups, DSA works without any special kind of treatment
      compared to a single switch - they just aren't aware of each other.
      Then there's the approach where the upstream switch understands the tags
      it transports from its leaves below, as it doesn't push a tag of its own,
      but it routes based on the source port & switch id information present
      in that tag (as opposed to DMAC & VID) and it strips the tag when
      egressing a front-facing port. Currently only Marvell implements the
      latter, and Marvell DSA trees contain only Marvell switches.
      
      So it is safe to say that DSA trees already have a single tag protocol
      shared by all switches, and in fact this is what makes the switches able
      to understand each other. This fact is also implied by the fact that
      currently, the tagging protocol is reported as part of a sysfs installed
      on the DSA master and not per port, so it must be the same for all the
      ports connected to that DSA master regardless of the switch that they
      belong to.
      
      It's time to make this official and enforce it (yes, this also means we
      won't have any "switch understands tag to some extent but is not able to
      speak it" hardware oddities that we'll support in the future).
      
      This is needed due to the imminent introduction of the dsa_switch_ops::
      change_tag_protocol driver API. When that is introduced, we'll have
      to notify switches of the tagging protocol that they're configured to
      use. Currently the tag_ops structure pointer is held only for CPU ports.
      But there are switches which don't have CPU ports and nonetheless still
      need to be configured. These would be Marvell leaf switches whose
      upstream port is just a DSA link. How do we inform these of their
      tagging protocol setup/deletion?
      
      One answer to the above would be: iterate through the DSA switch tree's
      ports once, list the CPU ports, get their tag_ops, then iterate again
      now that we have it, and notify everybody of that tag_ops. But what to
      do if conflicts appear between one cpu_dp->tag_ops and another? There's
      no escaping the fact that conflict resolution needs to be done, so we
      can be upfront about it.
      
      Ease our work and just keep the master copy of the tag_ops inside the
      struct dsa_switch_tree. Reference counting is now moved to be per-tree
      too, instead of per-CPU port.
      
      There are many places in the data path that access master->dsa_ptr->tag_ops
      and we would introduce unnecessary performance penalty going through yet
      another indirection, so keep those right where they are.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      357f203b
  23. 16 1月, 2021 2 次提交
  24. 15 1月, 2021 1 次提交
    • T
      net: dsa: Link aggregation support · 058102a6
      Tobias Waldekranz 提交于
      Monitor the following events and notify the driver when:
      
      - A DSA port joins/leaves a LAG.
      - A LAG, made up of DSA ports, joins/leaves a bridge.
      - A DSA port in a LAG is enabled/disabled (enabled meaning
        "distributing" in 802.3ad LACP terms).
      
      When a LAG joins a bridge, the DSA subsystem will treat that as each
      individual port joining the bridge. The driver may look at the port's
      LAG device pointer to see if it is associated with any LAG, if that is
      required. This is analogue to how switchdev events are replicated out
      to all lower devices when reaching e.g. a LAG.
      
      Drivers can optionally request that DSA maintain a linear mapping from
      a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the
      desired size.
      
      In the event that the hardware is not capable of offloading a
      particular LAG for any reason (the typical case being use of exotic
      modes like broadcast), DSA will take a hands-off approach, allowing
      the LAG to be formed as a pure software construct. This is reported
      back through the extended ACK, but is otherwise transparent to the
      user.
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Tested-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      058102a6
  25. 13 1月, 2021 1 次提交
  26. 12 1月, 2021 1 次提交
    • V
      net: dsa: remove the transactional logic from VLAN objects · 1958d581
      Vladimir Oltean 提交于
      It should be the driver's business to logically separate its VLAN
      offloading into a preparation and a commit phase, and some drivers don't
      need / can't do this.
      
      So remove the transactional shim from DSA and let drivers propagate
      errors directly from the .port_vlan_add callback.
      
      It would appear that the code has worse error handling now than it had
      before. DSA is the only in-kernel user of switchdev that offloads one
      switchdev object to more than one port: for every VLAN object offloaded
      to a user port, that VLAN is also offloaded to the CPU port. So the
      "prepare for user port -> check for errors -> prepare for CPU port ->
      check for errors -> commit for user port -> commit for CPU port"
      sequence appears to make more sense than the one we are using now:
      "offload to user port -> check for errors -> offload to CPU port ->
      check for errors", but it is really a compromise. In the new way, we can
      catch errors from the commit phase that we previously had to ignore.
      But we have our hands tied and cannot do any rollback now: if we add a
      VLAN on the CPU port and it fails, we can't do the rollback by simply
      deleting it from the user port, because the switchdev API is not so nice
      with us: it could have simply been there already, even with the same
      flags. So we don't even attempt to rollback anything on addition error,
      just leave whatever VLANs managed to get offloaded right where they are.
      This should not be a problem at all in practice.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1958d581