1. 09 8月, 2021 1 次提交
    • V
      net: dsa: centralize fast ageing when address learning is turned off · 045c45d1
      Vladimir Oltean 提交于
      Currently DSA leaves it down to device drivers to fast age the FDB on a
      port when address learning is disabled on it. There are 2 reasons for
      doing that in the first place:
      
      - when address learning is disabled by user space, through
        IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user
        space typically wants to achieve is to operate in a mode with no
        dynamic FDB entry on that port. But if the port is already up, some
        addresses might have been already learned on it, and it seems silly to
        wait for 5 minutes for them to expire until something useful can be
        done.
      
      - when a port leaves a bridge and becomes standalone, DSA turns off
        address learning on it. This also has the nice side effect of flushing
        the dynamically learned bridge FDB entries on it, which is a good idea
        because standalone ports should not have bridge FDB entries on them.
      
      We let drivers manage fast ageing under this condition because if DSA
      were to do it, it would need to track each port's learning state, and
      act upon the transition, which it currently doesn't.
      
      But there are 2 reasons why doing it is better after all:
      
      - drivers might get it wrong and not do it (see b53_port_set_learning)
      
      - we would like to flush the dynamic entries from the software bridge
        too, and letting drivers do that would be another pain point
      
      So track the port learning state and trigger a fast age process
      automatically within DSA.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      045c45d1
  2. 08 8月, 2021 1 次提交
    • V
      net: dsa: don't fast age standalone ports · 39f32101
      Vladimir Oltean 提交于
      DSA drives the procedure to flush dynamic FDB entries from a port based
      on the change of STP state: whenever we go from a state where address
      learning is enabled (LEARNING, FORWARDING) to a state where it isn't
      (LISTENING, BLOCKING, DISABLED), we need to flush the existing dynamic
      entries.
      
      However, there are cases when this is not needed. Internally, when a
      DSA switch interface is not under a bridge, DSA still keeps it in the
      "FORWARDING" STP state. And when that interface joins a bridge, the
      bridge will meticulously iterate that port through all STP states,
      starting with BLOCKING and ending with FORWARDING. Because there is a
      state transition from the standalone version of FORWARDING into the
      temporary BLOCKING bridge port state, DSA calls the fast age procedure.
      
      Since commit 5e38c158 ("net: dsa: configure better brport flags when
      ports leave the bridge"), DSA asks standalone ports to disable address
      learning. Therefore, there can be no dynamic FDB entries on a standalone
      port. Therefore, it does not make sense to flush dynamic FDB entries on
      one.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39f32101
  3. 06 8月, 2021 1 次提交
    • V
      net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier · c73c5708
      Vladimir Oltean 提交于
      Commit 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER
      attribute") added an option for users to turn off multicast flooding
      towards the CPU if they turn off the IGMP querier on a bridge which
      already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router).
      
      And commit a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      simply papered over that issue, because it moved the decision to flood
      the CPU with multicast (or not) from the DSA core down to individual drivers,
      instead of taking a more radical position then.
      
      The truth is that disabling multicast flooding to the CPU is simply
      something we are not prepared to do now, if at all. Some reasons:
      
      - ICMP6 neighbor solicitation messages are unregistered multicast
        packets as far as the bridge is concerned. So if we stop flooding
        multicast, the outside world cannot ping the bridge device's IPv6
        link-local address.
      
      - There might be foreign interfaces bridged with our DSA switch ports
        (sending a packet towards the host does not necessarily equal
        termination, but maybe software forwarding). So if there is no one
        interested in that multicast traffic in the local network stack, that
        doesn't mean nobody is.
      
      - PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as
        the bridge is concerned. This should reach the CPU port.
      
      - The switch driver might not do FDB partitioning. And since we don't
        even bother to do more fine-grained flood disabling (such as "disable
        flooding _from_port_N_ towards the CPU port" as opposed to "disable
        flooding _from_any_port_ towards the CPU port"), this breaks standalone
        ports, or even multiple bridges where one has an IGMP querier and one
        doesn't.
      
      Reverting the logic makes all of the above work.
      
      Fixes: a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      Fixes: 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER attribute")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c73c5708
  4. 30 7月, 2021 1 次提交
    • V
      net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge · bea79078
      Vladimir Oltean 提交于
      DSA has gained the recent ability to deal gracefully with upper
      interfaces it cannot offload, such as the bridge, bonding or team
      drivers. When such uppers exist, the ports are still in standalone mode
      as far as the hardware is concerned.
      
      But when we deliver packets to the software bridge in order for that to
      do the forwarding, there is an unpleasant surprise in that the bridge
      will refuse to forward them. This is because we unconditionally set
      skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
      were already forwarded in hardware by us.
      
      Since dp->bridge_dev is populated only when there is hardware offload
      for it, but not in the software fallback case, let's introduce a new
      helper that can be called from the tagger data path which sets the
      skb->offload_fwd_mark accordingly to zero when there is no hardware
      offload for bridging. This lets the bridge forward packets back to other
      interfaces of our switch, if needed.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bea79078
  5. 27 7月, 2021 1 次提交
    • V
      net: dsa: sja1105: add support for imprecise RX · 884be12f
      Vladimir Oltean 提交于
      This is already common knowledge by now, but the sja1105 does not have
      hardware support for DSA tagging for data plane packets, and tag_8021q
      sets up a unique pvid per port, transmitted as VLAN-tagged towards the
      CPU, for the source port to be decoded nonetheless.
      
      When the port is part of a VLAN-aware bridge, the pvid committed to
      hardware is taken from the bridge and not from tag_8021q, so we need to
      work with that the best we can.
      
      Configure the switches to send all packets to the CPU as VLAN-tagged
      (even ones that were originally untagged on the wire) and make use of
      dsa_untag_bridge_pvid() to get rid of it before we send those packets up
      the network stack.
      
      With the classified VLAN used by hardware known to the tagger, we first
      peek at the VID in an attempt to figure out if the packet was received
      from a VLAN-unaware port (standalone or under a VLAN-unaware bridge),
      case in which we can continue to call dsa_8021q_rcv(). If that is not
      the case, the packet probably came from a VLAN-aware bridge. So we call
      the DSA helper that finds for us a "designated bridge port" - one that
      is a member of the VLAN ID from the packet, and is in the proper STP
      state - basically these are all checks performed by br_handle_frame() in
      the software RX data path.
      
      The bridge will accept the packet as valid even if the source port was
      maybe wrong. So it will maybe learn the MAC SA of the packet on the
      wrong port, and its software FDB will be out of sync with the hardware
      FDB. So replies towards this same MAC DA will not work, because the
      bridge will send towards a different netdev.
      
      This is where the bridge data plane offload ("imprecise TX") added by
      the next patch comes in handy. The software FDB is wrong, true, but the
      hardware FDB isn't, and by offloading the bridge forwarding plane we
      have a chance to right a wrong, and have the hardware look up the FDB
      for us for the reply packet. So it all cancels out.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      884be12f
  6. 23 7月, 2021 1 次提交
    • V
      net: dsa: add support for bridge TX forwarding offload · 123abc06
      Vladimir Oltean 提交于
      For a DSA switch, to offload the forwarding process of a bridge device
      means to send the packets coming from the software bridge as data plane
      packets. This is contrary to everything that DSA has done so far,
      because the current taggers only know to send control packets (ones that
      target a specific destination port), whereas data plane packets are
      supposed to be forwarded according to the FDB lookup, much like packets
      ingressing on any regular ingress port. If the FDB lookup process
      returns multiple destination ports (flooding, multicast), then
      replication is also handled by the switch hardware - the bridge only
      sends a single packet and avoids the skb_clone().
      
      DSA keeps for each bridge port a zero-based index (the number of the
      bridge). Multiple ports performing TX forwarding offload to the same
      bridge have the same dp->bridge_num value, and ports not offloading the
      TX data plane of a bridge have dp->bridge_num = -1.
      
      The tagger can check if the packet that is being transmitted on has
      skb->offload_fwd_mark = true or not. If it does, it can be sure that the
      packet belongs to the data plane of a bridge, further information about
      which can be obtained based on dp->bridge_dev and dp->bridge_num.
      It can then compose a DSA tag for injecting a data plane packet into
      that bridge number.
      
      For the switch driver side, we offer two new dsa_switch_ops methods,
      called .port_bridge_fwd_offload_{add,del}, which are modeled after
      .port_bridge_{join,leave}.
      These methods are provided in case the driver needs to configure the
      hardware to treat packets coming from that bridge software interface as
      data plane packets. The switchdev <-> bridge interaction happens during
      the netdev_master_upper_dev_link() call, so to switch drivers, the
      effect is that the .port_bridge_fwd_offload_add() method is called
      immediately after .port_bridge_join().
      
      If the bridge number exceeds the number of bridges for which the switch
      driver can offload the TX data plane (and this includes the case where
      the driver can offload none), DSA falls back to simply returning
      tx_fwd_offload = false in the switchdev_bridge_port_offload() call.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      123abc06
  7. 22 7月, 2021 1 次提交
    • V
      net: bridge: move the switchdev object replay helpers to "push" mode · 4e51bf44
      Vladimir Oltean 提交于
      Starting with commit 4f2673b3 ("net: bridge: add helper to replay
      port and host-joined mdb entries"), DSA has introduced some bridge
      helpers that replay switchdev events (FDB/MDB/VLAN additions and
      deletions) that can be lost by the switchdev drivers in a variety of
      circumstances:
      
      - an IP multicast group was host-joined on the bridge itself before any
        switchdev port joined the bridge, leading to the host MDB entries
        missing in the hardware database.
      - during the bridge creation process, the MAC address of the bridge was
        added to the FDB as an entry pointing towards the bridge device
        itself, but with no switchdev ports being part of the bridge yet, this
        local FDB entry would remain unknown to the switchdev hardware
        database.
      - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface,
        before any switchdev port joined that LAG, leading to the hardware
        database missing those entries.
      - a switchdev port left a LAG that is a bridge port, while the LAG
        remained part of the bridge, and all FDB/MDB/VLAN entries remained
        installed in the hardware database of the switchdev port.
      
      Also, since commit 0d2cfbd4 ("net: bridge: ignore switchdev events
      for LAG ports which didn't request replay"), DSA introduced a method,
      based on a const void *ctx, to ensure that two switchdev ports under the
      same LAG that is a bridge port do not see the same MDB/VLAN entry being
      replayed twice by the bridge, once for every bridge port that joins the
      LAG.
      
      With so many ordering corner cases being possible, it seems unreasonable
      to expect a switchdev driver writer to get it right from the first try.
      Therefore, now that DSA has experimented with the bridge replay helpers
      for a little bit, we can move the code to the bridge driver where it is
      more readily available to all switchdev drivers.
      
      To convert the switchdev object replay helpers from "pull mode" (where
      the driver asks for them) to a "push mode" (where the bridge offers them
      automatically), the biggest problem is that the bridge needs to be aware
      when a switchdev port joins and leaves, even when the switchdev is only
      indirectly a bridge port (for example when the bridge port is a LAG
      upper of the switchdev).
      
      Luckily, we already have a hook for that, in the form of the newly
      introduced switchdev_bridge_port_offload() and
      switchdev_bridge_port_unoffload() calls. These offer a natural place for
      hooking the object addition and deletion replays.
      
      Extend the above 2 functions with:
      - pointers to the switchdev atomic notifier (for FDB replays) and the
        blocking notifier (for MDB and VLAN replays).
      - the "const void *ctx" argument required for drivers to be able to
        disambiguate between which port is targeted, when multiple ports are
        lowers of the same LAG that is a bridge port. Most of the drivers pass
        NULL to this argument, except the ones that support LAG offload and have
        the proper context check already in place in the switchdev blocking
        notifier handler.
      
      Also unexport the replay helpers, since nobody except the bridge calls
      them directly now.
      
      Note that:
      (a) we abuse the terminology slightly, because FDB entries are not
          "switchdev objects", but we count them as objects nonetheless.
          With no direct way to prove it, I think they are not modeled as
          switchdev objects because those can only be installed by the bridge
          to the hardware (as opposed to FDB entries which can be propagated
          in the other direction too). This is merely an abuse of terms, FDB
          entries are replayed too, despite not being objects.
      (b) the bridge does not attempt to sync port attributes to newly joined
          ports, just the countable stuff (the objects). The reason for this
          is simple: no universal and symmetric way to sync and unsync them is
          known. For example, VLAN filtering: what to do on unsync, disable or
          leave it enabled? Similarly, STP state, ageing timer, etc etc. What
          a switchdev port does when it becomes standalone again is not really
          up to the bridge's competence, and the driver should deal with it.
          On the other hand, replaying deletions of switchdev objects can be
          seen a matter of cleanup and therefore be treated by the bridge,
          hence this patch.
      
      We make the replay helpers opt-in for drivers, because they might not
      bring immediate benefits for them:
      
      - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(),
        so br_vlan_replay() should not do anything for the new drivers on
        which we call it. The existing drivers where there was even a slight
        possibility for there to exist a VLAN on a bridge port before they
        join it are already guarded against this: mlxsw and prestera deny
        joining LAG interfaces that are members of a bridge.
      
      - br_fdb_replay() should now notify of local FDB entries, but I patched
        all drivers except DSA to ignore these new entries in commit
        2c4eca3e ("net: bridge: switchdev: include local flag in FDB
        notifications"). Driver authors can lift this restriction as they
        wish, and when they do, they can also opt into the FDB replay
        functionality.
      
      - br_mdb_replay() should fix a real issue which is described in commit
        4f2673b3 ("net: bridge: add helper to replay port and host-joined
        mdb entries"). However most drivers do not offload the
        SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw
        offload this switchdev object, and I don't completely understand the
        way in which they offload this switchdev object anyway. So I'll leave
        it up to these drivers' respective maintainers to opt into
        br_mdb_replay().
      
      So most of the drivers pass NULL notifier blocks for the replay helpers,
      except:
      - dpaa2-switch which was already acked/regression-tested with the
        helpers enabled (and there isn't much of a downside in having them)
      - ocelot which already had replay logic in "pull" mode
      - DSA which already had replay logic in "pull" mode
      
      An important observation is that the drivers which don't currently
      request bridge event replays don't even have the
      switchdev_bridge_port_{offload,unoffload} calls placed in proper places
      right now. This was done to avoid unnecessary rework for drivers which
      might never even add support for this. For driver writers who wish to
      add replay support, this can be used as a tentative placement guide:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/
      
      Cc: Vadym Kochan <vkochan@marvell.com>
      Cc: Taras Chornyi <tchornyi@marvell.com>
      Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
      Cc: Lars Povlsen <lars.povlsen@microchip.com>
      Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
      Cc: UNGLinuxDriver@microchip.com
      Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e51bf44
  8. 20 7月, 2021 3 次提交
    • V
      net: dsa: use switchdev_handle_fdb_{add,del}_to_device · b94dc99c
      Vladimir Oltean 提交于
      Using the new fan-out helper for FDB entries installed on the software
      bridge, we can install host addresses with the proper refcount on the
      CPU port, such that this case:
      
      ip link set swp0 master br0
      ip link set swp1 master br0
      ip link set swp2 master br0
      ip link set swp3 master br0
      ip link set br0 address 00:01:02:03:04:05
      ip link set swp3 nomaster
      
      works properly and the br0 address remains installed as a host entry
      with refcount 3 instead of getting deleted.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b94dc99c
    • V
      net: dsa: tag_8021q: add proper cross-chip notifier support · c64b9c05
      Vladimir Oltean 提交于
      The big problem which mandates cross-chip notifiers for tag_8021q is
      this:
      
                                                   |
          sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
       [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
                                         |
                                         +---------+
                                                   |
          sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
                                         |
                                         +---------+
                                                   |
          sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
      
      When the user runs:
      
      ip link add br0 type bridge
      ip link set sw0p0 master br0
      ip link set sw2p0 master br0
      
      It doesn't work.
      
      This is because dsa_8021q_crosschip_bridge_join() assumes that "ds" and
      "other_ds" are at most 1 hop away from each other, so it is sufficient
      to add the RX VLAN of {ds, port} into {other_ds, other_port} and vice
      versa and presto, the cross-chip link works. When there is another
      switch in the middle, such as in this case switch 1 with its DSA links
      sw1p3 and sw1p4, somebody needs to tell it about these VLANs too.
      
      Which is exactly why the problem is quadratic: when a port joins a
      bridge, for each port in the tree that's already in that same bridge we
      notify a tag_8021q VLAN addition of that port's RX VLAN to the entire
      tree. It is a very complicated web of VLANs.
      
      It must be mentioned that currently we install tag_8021q VLANs on too
      many ports (DSA links - to be precise, on all of them). For example,
      when sw2p0 joins br0, and assuming sw1p0 was part of br0 too, we add the
      RX VLAN of sw2p0 on the DSA links of switch 0 too, even though there
      isn't any port of switch 0 that is a member of br0 (at least yet).
      In theory we could notify only the switches which sit in between the
      port joining the bridge and the port reacting to that bridge_join event.
      But in practice that is impossible, because of the way 'link' properties
      are described in the device tree. The DSA bindings require DT writers to
      list out not only the real/physical DSA links, but in fact the entire
      routing table, like for example switch 0 above will have:
      
      	sw0p3: port@3 {
      		link = <&sw1p4 &sw2p4>;
      	};
      
      This was done because:
      
      /* TODO: ideally DSA ports would have a single dp->link_dp member,
       * and no dst->rtable nor this struct dsa_link would be needed,
       * but this would require some more complex tree walking,
       * so keep it stupid at the moment and list them all.
       */
      
      but it is a perfect example of a situation where too much information is
      actively detrimential, because we are now in the position where we
      cannot distinguish a real DSA link from one that is put there to avoid
      the 'complex tree walking'. And because DT is ABI, there is not much we
      can change.
      
      And because we do not know which DSA links are real and which ones
      aren't, we can't really know if DSA switch A is in the data path between
      switches B and C, in the general case.
      
      So this is why tag_8021q RX VLANs are added on all DSA links, and
      probably why it will never change.
      
      On the other hand, at least the number of additions/deletions is well
      balanced, and this means that once we implement reference counting at
      the cross-chip notifier level a la fdb/mdb, there is absolutely zero
      need for a struct dsa_8021q_crosschip_link, it's all self-managing.
      
      In fact, with the tag_8021q notifiers emitted from the bridge join
      notifiers, it becomes so generic that sja1105 does not need to do
      anything anymore, we can just delete its implementation of the
      .crosschip_bridge_{join,leave} methods.
      
      Among other things we can simply delete is the home-grown implementation
      of sja1105_notify_crosschip_switches(). The reason why that is wrong is
      because it is not quadratic - it only covers remote switches to which we
      have a cross-chip bridging link and that does not cover in-between
      switches. This deletion is part of the same patch because sja1105 used
      to poke deep inside the guts of the tag_8021q context in order to do
      that. Because the cross-chip links went away, so needs the sja1105 code.
      
      Last but not least, dsa_8021q_setup_port() is simplified (and also
      renamed). Because our TAG_8021Q_VLAN_ADD notifier is designed to react
      on the CPU port too, the four dsa_8021q_vid_apply() calls:
      - 1 for RX VLAN on user port
      - 1 for the user port's RX VLAN on the CPU port
      - 1 for TX VLAN on user port
      - 1 for the user port's TX VLAN on the CPU port
      
      now get squashed into only 2 notifier calls via
      dsa_port_tag_8021q_vlan_add.
      
      And because the notifiers to add and to delete a tag_8021q VLAN are
      distinct, now we finally break up the port setup and teardown into
      separate functions instead of relying on a "bool enabled" flag which
      tells us what to do. Arguably it should have been this way from the
      get go.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c64b9c05
    • V
      net: dsa: tag_8021q: manage RX VLANs dynamically at bridge join/leave time · e19cc13c
      Vladimir Oltean 提交于
      There has been at least one wasted opportunity for tag_8021q to be used
      by a driver:
      
      https://patchwork.ozlabs.org/project/netdev/patch/20200710113611.3398-3-kurt@linutronix.de/#2484272
      
      because of a design decision: the declared purpose of tag_8021q is to
      offer source port/switch identification for a tagging driver for packets
      coming from a switch with no hardware DSA tagging support. It is not
      intended to provide VLAN-based port isolation, because its first user,
      sja1105, had another mechanism for bridging domain isolation, the L2
      Forwarding Table. So even if 2 ports are in the same VLAN but they are
      separated via the L2 Forwarding Table, they will not communicate with
      one another. The L2 Forwarding Table is managed by the
      sja1105_bridge_join() and sja1105_bridge_leave() methods.
      
      As a consequence, today tag_8021q does not bother too much with hooking
      into .port_bridge_join() and .port_bridge_leave() because that would
      introduce yet another degree of freedom, it just iterates statically
      through all ports of a switch and adds the RX VLAN of one port to all
      the others. In this way, whenever .port_bridge_join() is called,
      bridging will magically work because the RX VLANs are already installed
      everywhere they need to be.
      
      This is not to say that the reason for the change in this patch is to
      satisfy the hellcreek and similar use cases, that is merely a nice side
      effect. Instead it is to make sja1105 cross-chip links work properly
      over a DSA link.
      
      For context, sja1105 today supports a degenerate form of cross-chip
      bridging, where the switches are interconnected through their CPU ports
      ("disjoint trees" topology). There is some code which has been
      generalized into dsa_8021q_crosschip_link_{add,del}, but it is not
      enough, and frankly it is impossible to build upon that.
      Real multi-switch DSA trees, like daisy chains or H trees, which have
      actual DSA links, do not work.
      
      The problem is that sja1105 is unlike mv88e6xxx, and does not have a PVT
      for cross-chip bridging, which is a table by which the local switch can
      select the forwarding domain for packets from a certain ingress switch
      ID and source port. The sja1105 switches cannot parse their own DSA
      tags, because, well, they don't really have support for DSA tags, it's
      all VLANs.
      
      So to make something like cross-chip bridging between sw0p0 and sw1p0 to
      work over the sw0p3/sw1p3 DSA link to work with sja1105 in the topology
      below:
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ] ---- [  dsa  ] [  cpu  ] [  user ] [  user ]
      
      we need to ask ourselves 2 questions:
      
      (1) how should the L2 Forwarding Table be managed?
      (2) how should the VLAN Lookup Table be managed?
      
      i.e. what should prevent packets from going to unwanted ports?
      
      Since as mentioned, there is no PVT, the L2 Forwarding Table only
      contains forwarding rules for local ports. So we can say "all user ports
      are allowed to forward to all CPU ports and all DSA links".
      
      If we allow forwarding to DSA links unconditionally, this means we must
      prevent forwarding using the VLAN Lookup Table. This is in fact
      asymmetric with what we do for tag_8021q on ports local to the same
      switch, and it matters because now that we are making tag_8021q a core
      DSA feature, we need to hook into .crosschip_bridge_join() to add/remove
      the tag_8021q VLANs. So for symmetry it makes sense to manage the VLANs
      for local forwarding in the same way as cross-chip forwarding.
      
      Note that there is a very precise reason why tag_8021q hooks into
      dsa_switch_bridge_join() which acts at the cross-chip notifier level,
      and not at a higher level such as dsa_port_bridge_join(). We need to
      install the RX VLAN of the newly joining port into the VLAN table of all
      the existing ports across the tree that are part of the same bridge, and
      the notifier already does the iteration through the switches for us.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e19cc13c
  9. 30 6月, 2021 3 次提交
    • V
      net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev · 4bed397c
      Vladimir Oltean 提交于
      When
      (a) "dev" is a bridge port which the DSA switch tree offloads, but is
          otherwise not a dsa slave (such as a LAG netdev), or
      (b) "dev" is the bridge net device itself
      
      then strange things happen to the dev_hold/dev_put pair:
      dsa_schedule_work() will still be called with a DSA port that offloads
      that netdev, but dev_hold() will be called on the non-DSA netdev.
      Then the "if" condition in dsa_slave_switchdev_event_work() does not
      pass, because "dev" is not a DSA netdev, so dev_put() is not called.
      
      This results in the simple fact that we have a reference counting
      mismatch on the "dev" net device.
      
      This can be seen when we add support for host addresses installed on the
      bridge net device.
      
      ip link add br1 type bridge
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 master br1
      ip link del br1
      [  968.512278] unregister_netdevice: waiting for br1 to become free. Usage count = 5
      
      It seems foolish to do penny pinching and not add the net_device pointer
      in the dsa_switchdev_event_work structure, so let's finally do that.
      As an added bonus, when we start offloading local entries pointing
      towards the bridge, these will now properly appear as 'offloaded' in
      'bridge fdb' (this was not possible before, because 'dev' was assumed to
      only be a DSA net device):
      
      00:01:02:03:04:05 dev br0 vlan 1 offload master br0 permanent
      00:01:02:03:04:05 dev br0 offload master br0 permanent
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bed397c
    • V
      net: dsa: introduce a separate cross-chip notifier type for host FDBs · 3dc80afc
      Vladimir Oltean 提交于
      DSA treats some bridge FDB entries by trapping them to the CPU port.
      Currently, the only class of such entries are FDB addresses learnt by
      the software bridge on a foreign interface. However there are many more
      to be added:
      
      - FDB entries with the is_local flag (for termination) added by the
        bridge on the user ports (typically containing the MAC address of the
        bridge port)
      - FDB entries pointing towards the bridge net device (for termination).
        Typically these contain the MAC address of the bridge net device.
      - Static FDB entries installed on a foreign interface that is in the
        same bridge with a DSA user port.
      
      The reason why a separate cross-chip notifier for host FDBs is justified
      compared to normal FDBs is the same as in the case of host MDBs: the
      cross-chip notifier matching function in switch.c should avoid
      installing these entries on routing ports that route towards the
      targeted switch, but not towards the CPU. This is required in order to
      have proper support for H-like multi-chip topologies.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dc80afc
    • V
      net: dsa: introduce a separate cross-chip notifier type for host MDBs · b8e997c4
      Vladimir Oltean 提交于
      Commit abd49535 ("net: dsa: execute dsa_switch_mdb_add only for
      routing port in cross-chip topologies") does a surprisingly good job
      even for the SWITCHDEV_OBJ_ID_HOST_MDB use case, where DSA simply
      translates a switchdev object received on dp into a cross-chip notifier
      for dp->cpu_dp.
      
      To visualize how that works, imagine the daisy chain topology below and
      consider a SWITCHDEV_OBJ_ID_HOST_MDB object emitted on sw2p0. How does
      the cross-chip notifier know to match on all the right ports (sw0p4, the
      dedicated CPU port, sw1p4, an upstream DSA link, and sw2p4, another
      upstream DSA link)?
      
                                                      |
             sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
          [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
          [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
          [  user ] [  user ] [  user ] [  user ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
      
      The answer is simple: the dedicated CPU port of sw2p0 is sw0p4, and
      dsa_routing_port returns the upstream port for all switches.
      
      That is fine, but there are other topologies where this does not work as
      well. There are trees with "H" topologies in the wild, where there are 2
      or more switches with DSA links between them, but every switch has its
      dedicated CPU port. For these topologies, it seems stupid for the neighbor
      switches to install an MDB entry on the routing port, since these
      multicast addresses are fundamentally different than the usual ones we
      support (and that is the justification for this patch, to introduce the
      concept of a termination plane multicast MAC address, as opposed to a
      forwarding plane multicast MAC address).
      
      For example, when a SWITCHDEV_OBJ_ID_HOST_MDB would get added to sw0p0,
      without this patch, it would get treated as a regular port MDB on sw0p2
      and it would match on the ports below (including the sw1p3 routing port).
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [   x   ] [       ] [       ] [       ]
      
      With the patch, the host MDB notifier on sw0p0 matches only on the local
      switch, which is what we want for a termination plane address.
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [       ] [       ] [       ] [       ]
      
      Name this new matching function "dsa_switch_host_address_match" since we
      will be reusing it soon for host FDB entries as well.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8e997c4
  10. 29 6月, 2021 1 次提交
    • V
      net: dsa: replay a deletion of switchdev objects for ports leaving a bridged LAG · 74918945
      Vladimir Oltean 提交于
      When a DSA switch port leaves a bonding interface that is under a
      bridge, there might be dangling switchdev objects on that port left
      behind, because the bridge is not aware that its lower interface (the
      bond) changed state in any way.
      
      Call the bridge replay helpers with adding=false before changing
      dp->bridge_dev to NULL, because we need to simulate to
      dsa_slave_port_obj_del() that these notifications were emitted by the
      bridge.
      
      We add this hook to the NETDEV_PRECHANGEUPPER event handler, because
      we are calling into switchdev (and the __switchdev_handle_port_obj_del
      fanout helpers expect the upper/lower adjacency lists to still be valid)
      and PRECHANGEUPPER is the last moment in time when they still are.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74918945
  11. 22 6月, 2021 1 次提交
    • V
      net: dsa: targeted MTU notifiers should only match on one port · 88faba20
      Vladimir Oltean 提交于
      dsa_slave_change_mtu() calls dsa_port_mtu_change() twice:
      - it sends a cross-chip notifier with the MTU of the CPU port which is
        used to update the DSA links.
      - it sends one targeted MTU notifier which is supposed to only match the
        user port on which we are changing the MTU. The "propagate_upstream"
        variable is used here to bypass the cross-chip notifier system from
        switch.c
      
      But due to a mistake, the second, targeted notifier matches not only on
      the user port, but also on the DSA link which is a member of the same
      switch, if that exists.
      
      And because the DSA links of the entire dst were programmed in a
      previous round to the largest_mtu via a "propagate_upstream == true"
      notification, then the dsa_port_mtu_change(propagate_upstream == false)
      call that is immediately upcoming will break the MTU on the one DSA link
      which is chip-wise local to the dp whose MTU is changing right now.
      
      Example given this daisy chain topology:
      
         sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
      [  cpu  ] [  user ] [  user ] [  dsa  ] [  user ]
      [   x   ] [       ] [       ] [   x   ] [       ]
                                        |
                                        +---------+
                                                  |
         sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
      [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
      [       ] [       ] [       ] [       ] [   x   ]
      
      ip link set sw0p1 mtu 9000
      ip link set sw1p1 mtu 9000 # at this stage, sw0p1 and sw1p1 can talk
                                 # to one another using jumbo frames
      ip link set sw0p2 mtu 1500 # this programs the sw0p3 DSA link first to
                                 # the largest_mtu of 9000, then reprograms it to
                                 # 1500 with the "propagate_upstream == false"
                                 # notifier, breaking communication between
                                 # sw0p1 and sw1p1
      
      To escape from this situation, make the targeted match really match on a
      single port - the user port, and rename the "propagate_upstream"
      variable to "targeted_match" to clarify the intention and avoid future
      issues.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88faba20
  12. 12 6月, 2021 1 次提交
    • V
      net: dsa: generalize overhead for taggers that use both headers and trailers · 4e500251
      Vladimir Oltean 提交于
      Some really really weird switches just couldn't decide whether to use a
      normal or a tail tagger, so they just did both.
      
      This creates problems for DSA, because we only have the concept of an
      'overhead' which can be applied to the headroom or to the tailroom of
      the skb (like for example during the central TX reallocation procedure),
      depending on the value of bool tail_tag, but not to both.
      
      We need to generalize DSA to cater for these odd switches by
      transforming the 'overhead / tail_tag' pair into 'needed_headroom /
      needed_tailroom'.
      
      The DSA master's MTU is increased to account for both.
      
      The flow dissector code is modified such that it only calls the DSA
      adjustment callback if the tagger has a non-zero header length.
      
      Taggers are trivially modified to declare either needed_headroom or
      needed_tailroom, based on the tail_tag value that they currently
      declare.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e500251
  13. 24 3月, 2021 2 次提交
  14. 19 3月, 2021 1 次提交
  15. 09 3月, 2021 1 次提交
    • V
      net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports · 03cbb870
      Vladimir Oltean 提交于
      Tobias reports that after the blamed patch, VLAN objects being added to
      a bridge device are being added to all slave ports instead (swp2, swp3).
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp2 master br0
      ip link set swp3 master br0
      bridge vlan add dev br0 vid 100 self
      
      This is because the fix was too broad: we made dsa_port_offloads_netdev
      say "yes, I offload the br0 bridge" for all slave ports, but we didn't
      add the checks whether the switchdev object was in fact meant for the
      physical port or for the bridge itself. So we are reacting on events in
      a way in which we shouldn't.
      
      The reason why the fix was too broad is because the question itself,
      "does this DSA port offload this netdev", was too broad in the first
      place. The solution is to disambiguate the question and separate it into
      two different functions, one to be called for each switchdev attribute /
      object that has an orig_dev == net_bridge (dsa_port_offloads_bridge),
      and the other for orig_dev == net_bridge_port (*_offloads_bridge_port).
      
      In the case of VLAN objects on the bridge interface, this solves the
      problem because we know that VLAN objects are per bridge port and not
      per bridge. And when orig_dev is equal to the net_bridge, we offload it
      as a bridge, but not as a bridge port; that's how we are able to skip
      reacting on those events. Note that this is compatible with future plans
      to have explicit offloading of VLAN objects on the bridge interface as a
      bridge port (in DSA, this signifies that we should add that VLAN towards
      the CPU port).
      
      Fixes: 99b8202b ("net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored")
      Reported-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NTobias Waldekranz <tobias@waldekranz.com>
      Tested-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03cbb870
  16. 17 2月, 2021 1 次提交
  17. 15 2月, 2021 2 次提交
  18. 13 2月, 2021 2 次提交
    • V
      net: dsa: act as passthrough for bridge port flags · a8b659e7
      Vladimir Oltean 提交于
      There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
      expressed by the bridge through switchdev, and not all of them can be
      emulated by DSA mid-layer API at the same time.
      
      One possible configuration is when the bridge offloads the port flags
      using a mask that has a single bit set - therefore only one feature
      should change. However, DSA currently groups together unicast and
      multicast flooding in the .port_egress_floods method, which limits our
      options when we try to add support for turning off broadcast flooding:
      do we extend .port_egress_floods with a third parameter which b53 and
      mv88e6xxx will ignore? But that means that the DSA layer, which
      currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
      see that .port_egress_floods is implemented, and will report that all 3
      types of flooding are supported - not necessarily true.
      
      Another configuration is when the user specifies more than one flag at
      the same time, in the same netlink message. If we were to create one
      individual function per offloadable bridge port flag, we would limit the
      expressiveness of the switch driver of refusing certain combinations of
      flag values. For example, a switch may not have an explicit knob for
      flooding of unknown multicast, just for flooding in general. In that
      case, the only correct thing to do is to allow changes to BR_FLOOD and
      BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
      a separate .port_set_unicast_flood and .port_set_multicast_flood would
      not allow the driver to possibly reject that.
      
      Also, DSA doesn't consider it necessary to inform the driver that a
      SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
      just calls .port_egress_floods for the CPU port. When we'll add support
      for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
      problem because the flood settings will need to be held statefully in
      the DSA middle layer, otherwise changing the mrouter port attribute will
      impact the flooding attribute. And that's _assuming_ that the underlying
      hardware doesn't have anything else to do when a multicast router
      attaches to a port than flood unknown traffic to it.  If it does, there
      will need to be a dedicated .port_set_mrouter anyway.
      
      So we need to let the DSA drivers see the exact form that the bridge
      passes this switchdev attribute in, otherwise we are standing in the
      way. Therefore we also need to use this form of language when
      communicating to the driver that it needs to configure its initial
      (before bridge join) and final (after bridge leave) port flags.
      
      The b53 and mv88e6xxx drivers are converted to the passthrough API and
      their implementation of .port_egress_floods is split into two: a
      function that configures unicast flooding and another for multicast.
      The mv88e6xxx implementation is quite hairy, and it turns out that
      the implementations of unknown unicast flooding are actually the same
      for 6185 and for 6352:
      
      behind the confusing names actually lie two individual bits:
      NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
      NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)
      
      so there was no reason to entangle them in the first place.
      
      Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
      PORT_CTL0, which has the exact same bit index. I have left the
      implementations separate though, for the only reason that the names are
      different enough to confuse me, since I am not able to double-check with
      a user manual. The multicast flooding setting for 6185 is in a different
      register than for 6352 though.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8b659e7
    • V
      net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes · e18f4c18
      Vladimir Oltean 提交于
      This switchdev attribute offers a counterproductive API for a driver
      writer, because although br_switchdev_set_port_flag gets passed a
      "flags" and a "mask", those are passed piecemeal to the driver, so while
      the PRE_BRIDGE_FLAGS listener knows what changed because it has the
      "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
      value. But certain drivers can offload only certain combinations of
      settings, like for example they cannot change unicast flooding
      independently of multicast flooding - they must be both on or both off.
      The way the information is passed to switchdev makes drivers not
      expressive enough, and unable to reject this request ahead of time, in
      the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
      the deferred BRIDGE_FLAGS attribute, where the rejection is currently
      ignored.
      
      This patch also changes drivers to make use of the "mask" field for edge
      detection when possible.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e18f4c18
  19. 12 2月, 2021 1 次提交
  20. 07 2月, 2021 1 次提交
    • V
      net: dsa: make assisted_learning_on_cpu_port bypass offloaded LAG interfaces · a324d3d4
      Vladimir Oltean 提交于
      Given the following topology, and focusing only on Box A:
      
               Box A
               +----------------------------------+
               | Board 1         br0              |
               |             +---------+          |
               |            /           \         |
               |            |           |         |
               |            |         bond0       |
               |            |        +-----+      |
               |192.168.1.1 |       /       \     |
               |  eno0     swp0    swp1    swp2   |
               +---|--------|-------|-------|-----+
                   |        |       |       |
                   +--------+       |       |
                     Cable          |       |
                               Cable|       |Cable
                     Cable          |       |
                   +--------+       |       |
                   |        |       |       |
               +---|--------|-------|-------|-----+
               |  eno0     swp0    swp1    swp2   |
               |192.168.1.2 |       \       /     |
               |            |        +-----+      |
               |            |         bond0       |
               |            |           |         |
               |            \           /         |
               |             +---------+          |
               | Board 2         br0              |
               +----------------------------------+
               Box B
      
      The assisted_learning_on_cpu_port logic will see that swp0 is bridged
      with a "foreign interface" (bond0) and will therefore install all
      addresses learnt by the software bridge towards bond0 (including the
      address of eno0 on Box B) as static addresses towards the CPU port.
      
      But that's not what we want - bond0 is not really a "foreign interface"
      but one we can offload including L2 forwarding from/towards it. So we
      need to refine our logic for assisted learning such that, whenever we
      see an address learnt on a non-DSA interface, we search through the tree
      for any port that offloads that non-DSA interface.
      
      Some confusion might arise as to why we search through the whole tree
      instead of just the local switch returned by dsa_slave_dev_lower_find.
      Or a different angle of the same confusion: why does
      dsa_slave_dev_lower_find(br_dev) return a single dp that's under br_dev
      instead of the whole list of bridged DSA ports?
      
      To answer the second question, it should be enough to install the static
      FDB entry on the CPU port of a single switch in the tree, because
      dsa_port_fdb_add uses DSA_NOTIFIER_FDB_ADD which ensures that all other
      switches in the tree get notified of that address, and add the entry
      themselves using dsa_towards_port().
      
      This should help understand the answer to the first question: the port
      returned by dsa_slave_dev_lower_find may not be on the same switch as
      the ports that offload the LAG. Nonetheless, if the driver implements
      .crosschip_lag_join and .crosschip_bridge_join as mv88e6xxx does, there
      still isn't any reason for trapping addresses learnt on the remote LAG
      towards the CPU, and we should prevent that.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a324d3d4
  21. 04 2月, 2021 1 次提交
    • V
      net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored · 99b8202b
      Vladimir Oltean 提交于
      The bridge emits VLAN filtering events and quite a few others via
      switchdev with orig_dev = br->dev. After the blamed commit, these events
      started getting ignored.
      
      The point of the patch was to not offload switchdev objects for ports
      that didn't go through dsa_port_bridge_join, because the configuration
      is unsupported:
      - ports that offload a bonding/team interface go through
        dsa_port_bridge_join when that bonding/team interface is later bridged
        with another switch port or LAG
      - ports that don't offload LAG don't get notified of the bridge that is
        on top of that LAG.
      
      Sadly, a check is missing, which is that the orig_dev is equal to the
      bridge device. This check is compatible with the original intention,
      because ports that don't offload bridging because they use a software
      LAG don't have dp->bridge_dev set.
      
      On a semi-related note, we should not offload switchdev objects or
      populate dp->bridge_dev if the driver doesn't implement .port_bridge_join
      either. However there is no regression associated with that, so it can
      be done separately.
      
      Fixes: 5696c8ae ("net: dsa: Don't offload port attributes on standalone ports")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NTobias Waldekranz <tobias@waldekranz.com>
      Tested-by: NTobias Waldekranz <tobias@waldekranz.com>
      Link: https://lore.kernel.org/r/20210202233109.1591466-1-olteanv@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      99b8202b
  22. 30 1月, 2021 2 次提交
    • V
      net: dsa: allow changing the tag protocol via the "tagging" device attribute · 53da0eba
      Vladimir Oltean 提交于
      Currently DSA exposes the following sysfs:
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      
      which is a read-only device attribute, introduced in the kernel as
      commit 98cdb480 ("net: dsa: Expose tagging protocol to user-space"),
      and used by libpcap since its commit 993db3800d7d ("Add support for DSA
      link-layer types").
      
      It would be nice if we could extend this device attribute by making it
      writable:
      $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
      
      This is useful with DSA switches that can make use of more than one
      tagging protocol. It may be useful in dsa_loop in the future too, to
      perform offline testing of various taggers, or for changing between dsa
      and edsa on Marvell switches, if that is desirable.
      
      In terms of implementation, drivers can support this feature by
      implementing .change_tag_protocol, which should always leave the switch
      in a consistent state: either with the new protocol if things went well,
      or with the old one if something failed. Teardown of the old protocol,
      if necessary, must be handled by the driver.
      
      Some things remain as before:
      - The .get_tag_protocol is currently only called at probe time, to load
        the initial tagging protocol driver. Nonetheless, new drivers should
        report the tagging protocol in current use now.
      - The driver should manage by itself the initial setup of tagging
        protocol, no later than the .setup() method, as well as destroying
        resources used by the last tagger in use, no earlier than the
        .teardown() method.
      
      For multi-switch DSA trees, error handling is a bit more complicated,
      since e.g. the 5th out of 7 switches may fail to change the tag
      protocol. When that happens, a revert to the original tag protocol is
      attempted, but that may fail too, leaving the tree in an inconsistent
      state despite each individual switch implementing .change_tag_protocol
      transactionally. Since the intersection between drivers that implement
      .change_tag_protocol and drivers that support D in DSA is currently the
      empty set, the possibility for this error to happen is ignored for now.
      
      Testing:
      
      $ insmod mscc_felix.ko
      [   79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14
      [   79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
      $ insmod tag_ocelot.ko
      $ rmmod mscc_felix.ko
      $ insmod mscc_felix.ko
      [   97.261724] libphy: VSC9959 internal MDIO bus: probed
      [   97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
      [   97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
      [   97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
      [   97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
      [   97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
      [   97.964278] 8021q: adding VLAN 0 to HW filter on device swp0
      [   98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
      [   98.251845] 8021q: adding VLAN 0 to HW filter on device swp1
      [   98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
      [   98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx
      [   98.527948] device eno2 entered promiscuous mode
      [   98.544755] DSA: tree 0 setup
      
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms
      ^C
       -  10.0.0.1 ping statistics  -
      2 packets transmitted, 2 packets received, 0% packet loss
      round-trip min/avg/max = 0.754/1.545/2.337 ms
      
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      $ cat ./test_ocelot_8021q.sh
              #!/bin/bash
      
              ip link set swp0 down
              ip link set swp1 down
              ip link set swp2 down
              ip link set swp3 down
              ip link set swp5 down
              ip link set eno2 down
              echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
              ip link set eno2 up
              ip link set swp0 up
              ip link set swp1 up
              ip link set swp2 up
              ip link set swp3 up
              ip link set swp5 up
      $ ./test_ocelot_8021q.sh
      ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available
      $ rmmod tag_ocelot.ko
      rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable
      $ insmod tag_ocelot_8021q.ko
      $ ./test_ocelot_8021q.sh
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot-8021q
      $ rmmod tag_ocelot.ko
      $ rmmod tag_ocelot_8021q.ko
      rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms
      64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms
      $ rmmod mscc_felix.ko
      [  645.544426] mscc_felix 0000:00:00.5: Link is Down
      [  645.838608] DSA: tree 0 torn down
      $ rmmod tag_ocelot_8021q.ko
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      53da0eba
    • V
      net: dsa: document the existing switch tree notifiers and add a new one · 886f8e26
      Vladimir Oltean 提交于
      The existence of dsa_broadcast has generated some confusion in the past:
      https://www.mail-archive.com/netdev@vger.kernel.org/msg365042.html
      
      So let's document the existing dsa_port_notify and dsa_broadcast
      functions and explain when each of them should be used.
      
      Also, in fact, the in-between function has always been there but was
      lacking a name, and is the main reason for this patch: dsa_tree_notify.
      Refactor dsa_broadcast to use it.
      
      This patch also moves dsa_broadcast (a top-level function) to dsa2.c,
      where it really belonged in the first place, but had no companion so it
      stood with dsa_port_notify.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      886f8e26
  23. 15 1月, 2021 1 次提交
    • T
      net: dsa: Link aggregation support · 058102a6
      Tobias Waldekranz 提交于
      Monitor the following events and notify the driver when:
      
      - A DSA port joins/leaves a LAG.
      - A LAG, made up of DSA ports, joins/leaves a bridge.
      - A DSA port in a LAG is enabled/disabled (enabled meaning
        "distributing" in 802.3ad LACP terms).
      
      When a LAG joins a bridge, the DSA subsystem will treat that as each
      individual port joining the bridge. The driver may look at the port's
      LAG device pointer to see if it is associated with any LAG, if that is
      required. This is analogue to how switchdev events are replicated out
      to all lower devices when reaching e.g. a LAG.
      
      Drivers can optionally request that DSA maintain a linear mapping from
      a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the
      desired size.
      
      In the event that the hardware is not capable of offloading a
      particular LAG for any reason (the typical case being use of exotic
      modes like broadcast), DSA will take a hands-off approach, allowing
      the LAG to be formed as a pure software construct. This is reported
      back through the extended ACK, but is otherwise transparent to the
      user.
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Tested-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      058102a6
  24. 12 1月, 2021 3 次提交
    • V
      net: dsa: remove the transactional logic from ageing time notifiers · 77b61365
      Vladimir Oltean 提交于
      Remove the shim introduced in DSA for offloading the bridge ageing time
      from switchdev, by first checking whether the ageing time is within the
      range limits requested by the driver.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      77b61365
    • V
      net: switchdev: remove the transaction structure from port attributes · bae33f2b
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port attributes were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      attribute notifier structures, and converts drivers to not look at this
      member.
      
      In part, this patch contains a revert of my previous commit 2e554a7a
      ("net: dsa: propagate switchdev vlan_filtering prepare phase to
      drivers").
      
      For the most part, the conversion was trivial except for:
      - Rocker's world implementation based on Broadcom OF-DPA had an odd
        implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
        done mechanically, by pasting the implementation twice, then only
        keeping the code that would get executed during prepare phase on top,
        then only keeping the code that gets executed during the commit phase
        on bottom, then simplifying the resulting code until this was obtained.
      - DSA's offloading of STP state, bridge flags, VLAN filtering and
        multicast router could be converted right away. But the ageing time
        could not, so a shim was introduced and this was left for a further
        commit.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      bae33f2b
    • V
      net: switchdev: remove the transaction structure from port object notifiers · ffb68fc5
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port objects were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      object notifier structures, and converts drivers to not look at this
      member.
      
      Where driver conversion is trivial (like in the case of the Marvell
      Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
      done in this patch.
      
      Where driver conversion needs more attention (DSA, Mellanox Spectrum),
      the conversion is left for subsequent patches and here we only fake the
      prepare/commit phases at a lower level, just not in the switchdev
      notifier itself.
      
      Where the code has a natural structure that is best left alone as a
      preparation and a commit phase (as in the case of the Ocelot switch),
      that structure is left in place, just made to not depend upon the
      switchdev transactional model.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ffb68fc5
  25. 10 1月, 2021 1 次提交
  26. 08 1月, 2021 2 次提交
  27. 10 11月, 2020 1 次提交
  28. 03 10月, 2020 2 次提交