1. 18 2月, 2022 2 次提交
  2. 16 2月, 2022 1 次提交
    • V
      net: dsa: add explicit support for host bridge VLANs · 134ef238
      Vladimir Oltean 提交于
      Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it
      does so on user ports. This is good for basic functionality but has
      several limitations:
      
      - the VLAN group which must reach the CPU may be radically different
        from the VLAN group that must be autonomously forwarded by the switch.
        In other words, the admin may want to isolate noisy stations and avoid
        traffic from them going to the control processor of the switch, where
        it would just waste useless cycles. The bridge already supports
        independent control of VLAN groups on bridge ports and on the bridge
        itself, and when VLAN-aware, it will drop packets in software anyway
        if their VID isn't added as a 'self' entry towards the bridge device.
      
      - Replaying host FDB entries may depend, for some drivers like mv88e6xxx,
        on replaying the host VLANs as well. The 2 VLAN groups are
        approximately the same in most regular cases, but there are corner
        cases when timing matters, and DSA's approximation of replicating
        VLANs on shared ports simply does not work.
      
      - If a user makes the bridge (implicitly the CPU port) join a VLAN by
        accident, there is no way for the CPU port to isolate itself from that
        noisy VLAN except by rebooting the system. This is because for each
        VLAN added on a user port, DSA will add it on shared ports too, but
        for each VLAN deletion on a user port, it will remain installed on
        shared ports, since DSA has no good indication of whether the VLAN is
        still in use or not.
      
      Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN
      addition and removal events, DSA has a simple and straightforward task
      of separating the bridge port VLANs (these have an orig_dev which is a
      DSA slave interface, or a LAG interface) from the host VLANs (these have
      an orig_dev which is a bridge interface), and to keep a simple reference
      count of each VID on each shared port.
      
      Forwarding VLANs must be installed on the bridge ports and on all DSA
      ports interconnecting them. We don't have a good view of the exact
      topology, so we simply install forwarding VLANs on all DSA ports, which
      is what has been done until now.
      
      Host VLANs must be installed primarily on the dedicated CPU port of each
      bridge port. More subtly, they must also be installed on upstream-facing
      and downstream-facing DSA ports that are connecting the bridge ports and
      the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB
      entry may be absent from VTU) is still addressed even if that switch is
      in a cross-chip setup, and it has no local CPU port.
      
      Therefore:
      - user ports contain only bridge port (forwarding) VLANs, and no
        refcounting is necessary
      - DSA ports contain both forwarding and host VLANs. Refcounting is
        necessary among these 2 types.
      - CPU ports contain only host VLANs. Refcounting is also necessary.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      134ef238
  3. 05 1月, 2022 2 次提交
  4. 10 12月, 2021 1 次提交
  5. 09 12月, 2021 5 次提交
    • V
      net: dsa: eliminate dsa_switch_ops :: port_bridge_tx_fwd_{,un}offload · 857fdd74
      Vladimir Oltean 提交于
      We don't really need new switch API for these, and with new switches
      which intend to add support for this feature, it will become cumbersome
      to maintain.
      
      The change consists in restructuring the two drivers that implement this
      offload (sja1105 and mv88e6xxx) such that the offload is enabled and
      disabled from the ->port_bridge_{join,leave} methods instead of the old
      ->port_bridge_tx_fwd_{,un}offload.
      
      The only non-trivial change is that mv88e6xxx_map_virtual_bridge_to_pvt()
      has been moved to avoid a forward declaration, and the
      mv88e6xxx_reg_lock() calls from inside it have been removed, since
      locking is now done from mv88e6xxx_port_bridge_{join,leave}.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      857fdd74
    • V
      net: dsa: keep the bridge_dev and bridge_num as part of the same structure · d3eed0e5
      Vladimir Oltean 提交于
      The main desire behind this is to provide coherent bridge information to
      the fast path without locking.
      
      For example, right now we set dp->bridge_dev and dp->bridge_num from
      separate code paths, it is theoretically possible for a packet
      transmission to read these two port properties consecutively and find a
      bridge number which does not correspond with the bridge device.
      
      Another desire is to start passing more complex bridge information to
      dsa_switch_ops functions. For example, with FDB isolation, it is
      expected that drivers will need to be passed the bridge which requested
      an FDB/MDB entry to be offloaded, and along with that bridge_dev, the
      associated bridge_num should be passed too, in case the driver might
      want to implement an isolation scheme based on that number.
      
      We already pass the {bridge_dev, bridge_num} pair to the TX forwarding
      offload switch API, however we'd like to remove that and squash it into
      the basic bridge join/leave API. So that means we need to pass this
      pair to the bridge join/leave API.
      
      During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we
      call the driver's .port_bridge_leave with what used to be our
      dp->bridge_dev, but provided as an argument.
      
      When bridge_dev and bridge_num get folded into a single structure, we
      need to preserve this behavior in dsa_port_bridge_leave: we need a copy
      of what used to be in dp->bridge.
      
      Switch drivers check bridge membership by comparing dp->bridge_dev with
      the provided bridge_dev, but now, if we provide the struct dsa_bridge as
      a pointer, they cannot keep comparing dp->bridge to the provided
      pointer, since this only points to an on-stack copy. To make this
      obvious and prevent driver writers from forgetting and doing stupid
      things, in this new API, the struct dsa_bridge is provided as a full
      structure (not very large, contains an int and a pointer) instead of a
      pointer. An explicit comparison function needs to be used to determine
      bridge membership: dsa_port_offloads_bridge().
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d3eed0e5
    • V
      net: dsa: hide dp->bridge_dev and dp->bridge_num in the core behind helpers · 36cbf39b
      Vladimir Oltean 提交于
      The location of the bridge device pointer and number is going to change.
      It is not going to be kept individually per port, but in a common
      structure allocated dynamically and which will have lockdep validation.
      
      Create helpers to access these elements so that we have a migration path
      to the new organization.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      36cbf39b
    • V
      net: dsa: assign a bridge number even without TX forwarding offload · 947c8746
      Vladimir Oltean 提交于
      The service where DSA assigns a unique bridge number for each forwarding
      domain is useful even for drivers which do not implement the TX
      forwarding offload feature.
      
      For example, drivers might use the dp->bridge_num for FDB isolation.
      
      So rename ds->num_fwd_offloading_bridges to ds->max_num_bridges, and
      calculate a unique bridge_num for all drivers that set this value.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      947c8746
    • V
      net: dsa: make dp->bridge_num one-based · 3f9bb030
      Vladimir Oltean 提交于
      I have seen too many bugs already due to the fact that we must encode an
      invalid dp->bridge_num as a negative value, because the natural tendency
      is to check that invalid value using (!dp->bridge_num). Latest example
      can be seen in commit 1bec0f05 ("net: dsa: fix bridge_num not
      getting cleared after ports leaving the bridge").
      
      Convert the existing users to assume that dp->bridge_num == 0 is the
      encoding for invalid, and valid bridge numbers start from 1.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      3f9bb030
  6. 02 12月, 2021 3 次提交
  7. 01 11月, 2021 1 次提交
  8. 26 10月, 2021 1 次提交
    • V
      net: dsa: flush switchdev workqueue when leaving the bridge · d7d0d423
      Vladimir Oltean 提交于
      DSA is preparing to offer switch drivers an API through which they can
      associate each FDB entry with a struct net_device *bridge_dev. This can
      be used to perform FDB isolation (the FDB lookup performed on the
      ingress of a standalone, or bridged port, should not find an FDB entry
      that is present in the FDB of another bridge).
      
      In preparation of that work, DSA needs to ensure that by the time we
      call the switch .port_fdb_add and .port_fdb_del methods, the
      dp->bridge_dev pointer is still valid, i.e. the port is still a bridge
      port.
      
      This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API
      requires drivers that must have sleepable context to handle those events
      to schedule the deferred work themselves. DSA does this through the
      dsa_owq.
      
      It can happen that a port leaves a bridge, del_nbp() flushes the FDB on
      that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context,
      DSA schedules its deferred work, but del_nbp() finishes unlinking the
      bridge as a master from the port before DSA's deferred work is run.
      
      Fundamentally, the port must not be unlinked from the bridge until all
      FDB deletion deferred work items have been flushed. The bridge must wait
      for the completion of these hardware accesses.
      
      An attempt has been made to address this issue centrally in switchdev by
      making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev
      level, which would offer implicit synchronization with del_nbp:
      
      https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/
      
      but it seems that any attempt to modify switchdev's behavior and make
      the events blocking there would introduce undesirable side effects in
      other switchdev consumers.
      
      The most undesirable behavior seems to be that
      switchdev_deferred_process_work() takes the rtnl_mutex itself, which
      would be worse off than having the rtnl_mutex taken individually from
      drivers which is what we have now (except DSA which has removed that
      lock since commit 0faf890f ("net: dsa: drop rtnl_lock from
      dsa_slave_switchdev_event_work")).
      
      So to offer the needed guarantee to DSA switch drivers, I have come up
      with a compromise solution that does not require switchdev rework:
      we already have a hook at the last moment in time when the bridge is
      still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush
      the dsa_owq manually from there, which makes all FDB deletions
      synchronous.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7d0d423
  9. 21 10月, 2021 2 次提交
  10. 24 8月, 2021 2 次提交
    • V
      net: dsa: don't advertise 'rx-vlan-filter' when not needed · 06cfb2df
      Vladimir Oltean 提交于
      There have been multiple independent reports about
      dsa_slave_vlan_rx_add_vid being called (and consequently calling the
      drivers' .port_vlan_add) when it isn't needed, and sometimes (not
      always) causing problems in the process.
      
      Case 1:
      mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on
      bridged ports. That is understandably so, because standalone mv88e6xxx
      ports are VLAN-unaware, and VTU entries are said to be a scarce
      resource.
      
      Otherwise said, the following fails lamentably on mv88e6xxx:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set lan3 master br0
      ip link add link lan10 name lan10.1 type vlan id 1
      [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0
      RTNETLINK answers: Operation not supported
      
      This has become a worse issue since commit 9b236d2a ("net: dsa:
      Advertise the VLAN offload netdev ability only if switch supports it").
      Up to that point, the driver was returning -EOPNOTSUPP and DSA was
      reconverting that error to 0, making the 8021q upper think all is ok
      (but obviously the error message was there even prior to this change).
      After that change the -EOPNOTSUPP is propagated to vlan_vid_add, and it
      is a hard error.
      
      Case 2:
      Ports that don't offload the Linux bridge (have a dp->bridge_dev = NULL
      because they don't implement .port_bridge_{join,leave}). Understandably,
      a standalone port should not offload VLANs either, it should remain VLAN
      unaware and any VLAN should be a software VLAN (as long as the hardware
      is not quirky, that is).
      
      In fact, dsa_slave_port_obj_add does do the right thing and rejects
      switchdev VLAN objects coming from the bridge when that bridge is not
      offloaded:
      
      	case SWITCHDEV_OBJ_ID_PORT_VLAN:
      		if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
      			return -EOPNOTSUPP;
      
      		err = dsa_slave_vlan_add(dev, obj, extack);
      
      But it seems that the bridge is able to trick us. The __vlan_vid_add
      from br_vlan.c has:
      
      	/* Try switchdev op first. In case it is not supported, fallback to
      	 * 8021q add.
      	 */
      	err = br_switchdev_port_vlan_add(dev, v->vid, flags, extack);
      	if (err == -EOPNOTSUPP)
      		return vlan_vid_add(dev, br->vlan_proto, v->vid);
      
      So it says "no, no, you need this VLAN in your life!". And we, naive as
      we are, say "oh, this comes from the vlan_vid_add code path, it must be
      an 8021q upper, sure, I'll take that". And we end up with that bridge
      VLAN installed on our port anyway. But this time, it has the wrong flags:
      if the bridge was trying to install VLAN 1 as a pvid/untagged VLAN,
      failed via switchdev, retried via vlan_vid_add, we have this comment:
      
      	/* This API only allows programming tagged, non-PVID VIDs */
      
      So what we do makes absolutely no sense.
      
      Backtracing a bit, we see the common pattern. We allow the network stack
      to think that our standalone ports are VLAN-aware, but they aren't, for
      the vast majority of switches. The quirky ones should not dictate the
      norm. The dsa_slave_vlan_rx_add_vid and dsa_slave_vlan_rx_kill_vid
      methods exist for drivers that need the 'rx-vlan-filter: on' feature in
      ethtool -k, which can be due to any of the following reasons:
      
      1. vlan_filtering_is_global = true, and some ports are under a
         VLAN-aware bridge while others are standalone, and the standalone
         ports would otherwise drop VLAN-tagged traffic. This is described in
         commit 061f6a50 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid
         implementation").
      
      2. the ports that are under a VLAN-aware bridge should also set this
         feature, for 8021q uppers having a VID not claimed by the bridge.
         In this case, the driver will essentially not even know that the VID
         is coming from the 8021q layer and not the bridge.
      
      3. Hellcreek. This driver needs it because in standalone mode, it uses
         unique VLANs per port to ensure separation. For separation of untagged
         traffic, it uses different PVIDs for each port, and for separation of
         VLAN-tagged traffic, it never accepts 8021q uppers with the same vid
         on two ports.
      
      If a driver does not fall under any of the above 3 categories, there is
      no reason why it should advertise the 'rx-vlan-filter' feature, therefore
      no reason why it should offload the VLANs added through vlan_vid_add.
      
      This commit fixes the problem by removing the 'rx-vlan-filter' feature
      from the slave devices when they operate in standalone mode, and when
      they offload a VLAN-unaware bridge.
      
      The way it works is that vlan_vid_add will now stop its processing here:
      
      vlan_add_rx_filter_info:
      	if (!vlan_hw_filter_capable(dev, proto))
      		return 0;
      
      So the VLAN will still be saved in the interface's VLAN RX filtering
      list, but because it does not declare VLAN filtering in its features,
      the 8021q module will return zero without committing that VLAN to
      hardware.
      
      This gives the drivers what they want, since it keeps the 8021q VLANs
      away from the VLAN table until VLAN awareness is enabled (point at which
      the ports are no longer standalone, hence in the mv88e6xxx case, the
      check in mv88e6xxx_port_vlan_prepare passes).
      
      Since the issue predates the existence of the hellcreek driver, case 3
      will be dealt with in a separate patch.
      
      The main change that this patch makes is to no longer set
      NETIF_F_HW_VLAN_CTAG_FILTER unconditionally, but toggle it dynamically
      (for most switches, never).
      
      The second part of the patch addresses an issue that the first part
      introduces: because the 'rx-vlan-filter' feature is now dynamically
      toggled, and our .ndo_vlan_rx_add_vid does not get called when
      'rx-vlan-filter' is off, we need to avoid bugs such as the following by
      replaying the VLANs from 8021q uppers every time we enable VLAN
      filtering:
      
      ip link add link lan0 name lan0.100 type vlan id 100
      ip addr add 192.168.100.1/24 dev lan0.100
      ping 192.168.100.2 # should work
      ip link add br0 type bridge vlan_filtering 0
      ip link set lan0 master br0
      ping 192.168.100.2 # should still work
      ip link set br0 type bridge vlan_filtering 1
      ping 192.168.100.2 # should still work but doesn't
      
      As reported by Florian, some drivers look at ds->vlan_filtering in
      their .port_vlan_add() implementation. So this patch also makes sure
      that ds->vlan_filtering is committed before calling the driver. This is
      the reason why it is first committed, then restored on the failure path.
      Reported-by: NTobias Waldekranz <tobias@waldekranz.com>
      Reported-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06cfb2df
    • V
      net: dsa: don't call switchdev_bridge_port_unoffload for unoffloaded bridge ports · 09dba21b
      Vladimir Oltean 提交于
      For ports that have a NULL dp->bridge_dev, dsa_port_to_bridge_port()
      also returns NULL as expected.
      
      Issue #1 is that we are performing a NULL pointer dereference on brport_dev.
      
      Issue #2 is that these are ports on which switchdev_bridge_port_offload
      has not been called, so we should not call switchdev_bridge_port_unoffload
      on them either.
      
      Both issues are addressed by checking against a NULL brport_dev in
      dsa_port_pre_bridge_leave and exiting early.
      
      Fixes: 2f5dc00f ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09dba21b
  11. 23 8月, 2021 1 次提交
    • V
      net: dsa: track unique bridge numbers across all DSA switch trees · f5e165e7
      Vladimir Oltean 提交于
      Right now, cross-tree bridging setups work somewhat by mistake.
      
      In the case of cross-tree bridging with sja1105, all switch instances
      need to agree upon a common VLAN ID for forwarding a packet that belongs
      to a certain bridging domain.
      
      With TX forwarding offload, the VLAN ID is the bridge VLAN for
      VLAN-aware bridging, and the tag_8021q TX forwarding offload VID
      (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging.
      
      The VBID for VLAN-unaware bridging is derived from the dp->bridge_num
      value calculated by DSA independently for each switch tree.
      
      If ports from one tree join one bridge, and ports from another tree join
      another bridge, DSA will assign them the same bridge_num, even though
      the bridges are different. If cross-tree bridging is supported, this
      is an issue.
      
      Modify DSA to calculate the bridge_num globally across all switch trees.
      This has the implication for a driver that the dp->bridge_num value that
      DSA will assign to its ports might not be contiguous, if there are
      boards with multiple DSA drivers instantiated. Additionally, all
      bridge_num values eat up towards each switch's
      ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate,
      and can be seen as a limitation introduced by this patch. However, that
      is the lesser evil for now.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5e165e7
  12. 12 8月, 2021 2 次提交
    • V
      net: dsa: tag_8021q: don't broadcast during setup/teardown · 724395f4
      Vladimir Oltean 提交于
      Currently, on my board with multiple sja1105 switches in disjoint trees
      described in commit f66a6a69 ("net: dsa: permit cross-chip bridging
      between all trees in the system"), rebooting the board triggers the
      following benign warnings:
      
      [   12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT
      [   12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT
      [   12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT
      [   12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT
      [   12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT
      [   12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT
      
      Basically switch 1 calls dsa_tag_8021q_unregister, and switch 1's TX and
      RX VLANs cannot be found on switch 2's CPU port.
      
      But why would switch 2 even attempt to delete switch 1's TX and RX
      tag_8021q VLANs from its CPU port? Well, because we use dsa_broadcast,
      and it is supposed that it had added those VLANs in the first place
      (because in dsa_port_tag_8021q_vlan_match, all CPU ports match
      regardless of their tree index or switch index).
      
      The two trees probe asynchronously, and when switch 1 probed, it called
      dsa_broadcast which did not notify the tree of switch 2, because that
      didn't probe yet. But during unbind, switch 2's tree _is_ probed, so it
      _is_ notified of the deletion.
      
      Before jumping to introduce a synchronization mechanism between the
      probing across disjoint switch trees, let's take a step back and see
      whether we _need_ to do that in the first place.
      
      The RX and TX VLANs of switch 1 would be needed on switch 2's CPU port
      only if switch 1 and 2 were part of a cross-chip bridge. And
      dsa_tag_8021q_bridge_join takes care precisely of that (but if probing
      was synchronous, the bridge_join would just end up bumping the VLANs'
      refcount, because they are already installed by the setup path).
      
      Since by the time the ports are bridged, all DSA trees are already set
      up, and we don't need the tag_8021q VLANs of one switch installed on the
      other switches during probe time, the answer is that we don't need to
      fix the synchronization issue.
      
      So make the setup and teardown code paths call dsa_port_notify, which
      notifies only the local tree, and the bridge code paths call
      dsa_broadcast, which let the other trees know as well.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      724395f4
    • V
      net: dsa: print more information when a cross-chip notifier fails · ab97462b
      Vladimir Oltean 提交于
      Currently this error message does not say a lot:
      
      [   32.693498] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
      [   32.699725] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
      [   32.705931] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
      [   32.712139] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
      [   32.718347] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
      [   32.724554] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
      
      but in this form, it is immediately obvious (at least to me) what the
      problem is, even without further looking at the code:
      
      [   12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT
      [   12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT
      [   12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT
      [   12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT
      [   12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT
      [   12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab97462b
  13. 09 8月, 2021 5 次提交
    • V
      net: dsa: avoid fast ageing twice when port leaves a bridge · bee7c577
      Vladimir Oltean 提交于
      Drivers that support both the toggling of address learning and dynamic
      FDB flushing (mv88e6xxx, b53, sja1105) currently need to fast-age a port
      twice when it leaves a bridge:
      
      - once, when del_nbp() calls br_stp_disable_port() which puts the port
        in the BLOCKING state
      - twice, when dsa_port_switchdev_unsync_attrs() calls
        dsa_port_clear_brport_flags() which disables address learning
      
      The knee-jerk reaction might be to say "dsa_port_clear_brport_flags does
      not need to fast-age the port at all", but the thing is, we still need
      both code paths to flush the dynamic FDB entries in different situations.
      When a DSA switch port leaves a bonding/team interface that is (still) a
      bridge port, no del_nbp() will be called, so we rely on
      dsa_port_clear_brport_flags() function to restore proper standalone port
      functionality with address learning disabled.
      
      So the solution is just to avoid double the work when both code paths
      are called in series. Luckily, DSA already caches the STP port state, so
      we can skip flushing the dynamic FDB when we disable address learning
      and the STP state is one where no address learning takes place at all.
      Under that condition, not flushing the FDB is safe because there is
      supposed to not be any dynamic FDB entry at all (they were flushed
      during the transition towards that state, and none were learned in the
      meanwhile).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bee7c577
    • V
      net: dsa: still fast-age ports joining a bridge if they can't configure learning · a4ffe09f
      Vladimir Oltean 提交于
      Commit 39f32101 ("net: dsa: don't fast age standalone ports")
      assumed that all standalone ports disable address learning, but if the
      switch driver implements .port_fast_age but not .port_bridge_flags (like
      ksz9477, ksz8795, lantiq_gswip, lan9303), then that might not actually
      be true.
      
      So whereas before, the bridge temporarily walking us through the
      BLOCKING STP state meant that the standalone ports had a checkpoint to
      flush their baggage and start fresh when they join a bridge, after that
      commit they no longer do.
      
      Restore the old behavior for these drivers by checking if the switch can
      toggle address learning. If it can't, disregard the "do_fast_age"
      argument and unconditionally perform fast ageing on STP state changes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4ffe09f
    • V
      net: dsa: flush the dynamic FDB of the software bridge when fast ageing a port · 9264e4ad
      Vladimir Oltean 提交于
      Currently, when DSA performs fast ageing on a port, 'bridge fdb' shows
      us that the 'self' entries (corresponding to the hardware bridge, as
      printed by dsa_slave_fdb_dump) are deleted, but the 'master' entries
      (corresponding to the software bridge) aren't.
      
      Indeed, searching through the bridge driver, neither the
      brport_attr_learning handler nor the IFLA_BRPORT_LEARNING handler call
      br_fdb_delete_by_port. However, br_stp_disable_port does, which is one
      of the paths which DSA uses to trigger a fast ageing process anyway.
      
      There is, however, one other very promising caller of
      br_fdb_delete_by_port, and that is the bridge driver's handler of the
      SWITCHDEV_FDB_FLUSH_TO_BRIDGE atomic notifier. Currently the s390/qeth
      HiperSockets card driver is the only user of this.
      
      I can't say I understand that driver's architecture or interaction with
      the bridge, but it appears to not be a switchdev driver in the traditional
      sense of the word. Nonetheless, the mechanism it provides is a useful
      way for DSA to express the fact that it performs fast ageing too, in a
      way that does not change the existing behavior for other drivers.
      
      Cc: Alexandra Winter <wintera@linux.ibm.com>
      Cc: Julian Wiedmann <jwi@linux.ibm.com>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9264e4ad
    • V
      net: dsa: don't fast age bridge ports with learning turned off · 4eab90d9
      Vladimir Oltean 提交于
      On topology changes, stations that were dynamically learned on ports
      that are no longer part of the active topology must be flushed - this is
      described by clause "17.11 Updating learned station location information"
      of IEEE 802.1D-2004.
      
      However, when address learning on the bridge port is turned off in the
      first place, there is nothing to flush, so skip a potentially expensive
      operation.
      
      We can finally do this now since DSA is aware of the learning state of
      its bridged ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4eab90d9
    • V
      net: dsa: centralize fast ageing when address learning is turned off · 045c45d1
      Vladimir Oltean 提交于
      Currently DSA leaves it down to device drivers to fast age the FDB on a
      port when address learning is disabled on it. There are 2 reasons for
      doing that in the first place:
      
      - when address learning is disabled by user space, through
        IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user
        space typically wants to achieve is to operate in a mode with no
        dynamic FDB entry on that port. But if the port is already up, some
        addresses might have been already learned on it, and it seems silly to
        wait for 5 minutes for them to expire until something useful can be
        done.
      
      - when a port leaves a bridge and becomes standalone, DSA turns off
        address learning on it. This also has the nice side effect of flushing
        the dynamically learned bridge FDB entries on it, which is a good idea
        because standalone ports should not have bridge FDB entries on them.
      
      We let drivers manage fast ageing under this condition because if DSA
      were to do it, it would need to track each port's learning state, and
      act upon the transition, which it currently doesn't.
      
      But there are 2 reasons why doing it is better after all:
      
      - drivers might get it wrong and not do it (see b53_port_set_learning)
      
      - we would like to flush the dynamic entries from the software bridge
        too, and letting drivers do that would be another pain point
      
      So track the port learning state and trigger a fast age process
      automatically within DSA.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      045c45d1
  14. 08 8月, 2021 1 次提交
    • V
      net: dsa: don't fast age standalone ports · 39f32101
      Vladimir Oltean 提交于
      DSA drives the procedure to flush dynamic FDB entries from a port based
      on the change of STP state: whenever we go from a state where address
      learning is enabled (LEARNING, FORWARDING) to a state where it isn't
      (LISTENING, BLOCKING, DISABLED), we need to flush the existing dynamic
      entries.
      
      However, there are cases when this is not needed. Internally, when a
      DSA switch interface is not under a bridge, DSA still keeps it in the
      "FORWARDING" STP state. And when that interface joins a bridge, the
      bridge will meticulously iterate that port through all STP states,
      starting with BLOCKING and ending with FORWARDING. Because there is a
      state transition from the standalone version of FORWARDING into the
      temporary BLOCKING bridge port state, DSA calls the fast age procedure.
      
      Since commit 5e38c158 ("net: dsa: configure better brport flags when
      ports leave the bridge"), DSA asks standalone ports to disable address
      learning. Therefore, there can be no dynamic FDB entries on a standalone
      port. Therefore, it does not make sense to flush dynamic FDB entries on
      one.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39f32101
  15. 06 8月, 2021 2 次提交
    • V
      net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier · c73c5708
      Vladimir Oltean 提交于
      Commit 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER
      attribute") added an option for users to turn off multicast flooding
      towards the CPU if they turn off the IGMP querier on a bridge which
      already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router).
      
      And commit a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      simply papered over that issue, because it moved the decision to flood
      the CPU with multicast (or not) from the DSA core down to individual drivers,
      instead of taking a more radical position then.
      
      The truth is that disabling multicast flooding to the CPU is simply
      something we are not prepared to do now, if at all. Some reasons:
      
      - ICMP6 neighbor solicitation messages are unregistered multicast
        packets as far as the bridge is concerned. So if we stop flooding
        multicast, the outside world cannot ping the bridge device's IPv6
        link-local address.
      
      - There might be foreign interfaces bridged with our DSA switch ports
        (sending a packet towards the host does not necessarily equal
        termination, but maybe software forwarding). So if there is no one
        interested in that multicast traffic in the local network stack, that
        doesn't mean nobody is.
      
      - PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as
        the bridge is concerned. This should reach the CPU port.
      
      - The switch driver might not do FDB partitioning. And since we don't
        even bother to do more fine-grained flood disabling (such as "disable
        flooding _from_port_N_ towards the CPU port" as opposed to "disable
        flooding _from_any_port_ towards the CPU port"), this breaks standalone
        ports, or even multiple bridges where one has an IGMP querier and one
        doesn't.
      
      Reverting the logic makes all of the above work.
      
      Fixes: a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      Fixes: 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER attribute")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c73c5708
    • V
      net: dsa: stop syncing the bridge mcast_router attribute at join time · 7df4e744
      Vladimir Oltean 提交于
      Qingfang points out that when a bridge with the default settings is
      created and a port joins it:
      
      ip link add br0 type bridge
      ip link set swp0 master br0
      
      DSA calls br_multicast_router() on the bridge to see if the br0 device
      is a multicast router port, and if it is, it enables multicast flooding
      to the CPU port, otherwise it disables it.
      
      If we look through the multicast_router_show() sysfs or at the
      IFLA_BR_MCAST_ROUTER netlink attribute, we see that the default mrouter
      attribute for the bridge device is "1" (MDB_RTR_TYPE_TEMP_QUERY).
      
      However, br_multicast_router() will return "0" (MDB_RTR_TYPE_DISABLED),
      because an mrouter port in the MDB_RTR_TYPE_TEMP_QUERY state may not be
      actually _active_ until it receives an actual IGMP query. So, the
      br_multicast_router() function should really have been called
      br_multicast_router_active() perhaps.
      
      When/if an IGMP query is received, the bridge device will transition via
      br_multicast_mark_router() into the active state until the
      ip4_mc_router_timer expires after an multicast_querier_interval.
      
      Of course, this does not happen if the bridge is created with an
      mcast_router attribute of "2" (MDB_RTR_TYPE_PERM).
      
      The point is that in lack of any IGMP query messages, and in the default
      bridge configuration, unregistered multicast packets will not be able to
      reach the CPU port through flooding, and this breaks many use cases
      (most obviously, IPv6 ND, with its ICMP6 neighbor solicitation multicast
      messages).
      
      Leave the multicast flooding setting towards the CPU port down to a driver
      level decision.
      
      Fixes: 010e269f ("net: dsa: sync up switchdev objects and port attributes when joining the bridge")
      Reported-by: NDENG Qingfang <dqfext@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7df4e744
  16. 27 7月, 2021 1 次提交
  17. 23 7月, 2021 2 次提交
    • V
      net: dsa: add support for bridge TX forwarding offload · 123abc06
      Vladimir Oltean 提交于
      For a DSA switch, to offload the forwarding process of a bridge device
      means to send the packets coming from the software bridge as data plane
      packets. This is contrary to everything that DSA has done so far,
      because the current taggers only know to send control packets (ones that
      target a specific destination port), whereas data plane packets are
      supposed to be forwarded according to the FDB lookup, much like packets
      ingressing on any regular ingress port. If the FDB lookup process
      returns multiple destination ports (flooding, multicast), then
      replication is also handled by the switch hardware - the bridge only
      sends a single packet and avoids the skb_clone().
      
      DSA keeps for each bridge port a zero-based index (the number of the
      bridge). Multiple ports performing TX forwarding offload to the same
      bridge have the same dp->bridge_num value, and ports not offloading the
      TX data plane of a bridge have dp->bridge_num = -1.
      
      The tagger can check if the packet that is being transmitted on has
      skb->offload_fwd_mark = true or not. If it does, it can be sure that the
      packet belongs to the data plane of a bridge, further information about
      which can be obtained based on dp->bridge_dev and dp->bridge_num.
      It can then compose a DSA tag for injecting a data plane packet into
      that bridge number.
      
      For the switch driver side, we offer two new dsa_switch_ops methods,
      called .port_bridge_fwd_offload_{add,del}, which are modeled after
      .port_bridge_{join,leave}.
      These methods are provided in case the driver needs to configure the
      hardware to treat packets coming from that bridge software interface as
      data plane packets. The switchdev <-> bridge interaction happens during
      the netdev_master_upper_dev_link() call, so to switch drivers, the
      effect is that the .port_bridge_fwd_offload_add() method is called
      immediately after .port_bridge_join().
      
      If the bridge number exceeds the number of bridges for which the switch
      driver can offload the TX data plane (and this includes the case where
      the driver can offload none), DSA falls back to simply returning
      tx_fwd_offload = false in the switchdev_bridge_port_offload() call.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      123abc06
    • T
      net: bridge: switchdev: allow the TX data plane forwarding to be offloaded · 47211192
      Tobias Waldekranz 提交于
      Allow switchdevs to forward frames from the CPU in accordance with the
      bridge configuration in the same way as is done between bridge
      ports. This means that the bridge will only send a single skb towards
      one of the ports under the switchdev's control, and expects the driver
      to deliver the packet to all eligible ports in its domain.
      
      Primarily this improves the performance of multicast flows with
      multiple subscribers, as it allows the hardware to perform the frame
      replication.
      
      The basic flow between the driver and the bridge is as follows:
      
      - When joining a bridge port, the switchdev driver calls
        switchdev_bridge_port_offload() with tx_fwd_offload = true.
      
      - The bridge sends offloadable skbs to one of the ports under the
        switchdev's control using skb->offload_fwd_mark = true.
      
      - The switchdev driver checks the skb->offload_fwd_mark field and lets
        its FDB lookup select the destination port mask for this packet.
      
      v1->v2:
      - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long
      - introduce a static key "br_switchdev_fwd_offload_used" to minimize the
        impact of the newly introduced feature on all the setups which don't
        have hardware that can make use of it
      - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache
        line access
      - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in
        __br_forward()
      - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge
        is being used
      - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP
      
      v2->v3:
      - replace the solution based on .ndo_dfwd_add_station with a solution
        based on switchdev_bridge_port_offload
      - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD
      v3->v4: rebase
      v4->v5:
      - make sure the static key is decremented on bridge port unoffload
      - more function and variable renaming and comments for them:
        br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload
        br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload
        nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom
        nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload
        fwd_accel to tx_fwd_offload
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47211192
  18. 22 7月, 2021 2 次提交
    • V
      net: bridge: move the switchdev object replay helpers to "push" mode · 4e51bf44
      Vladimir Oltean 提交于
      Starting with commit 4f2673b3 ("net: bridge: add helper to replay
      port and host-joined mdb entries"), DSA has introduced some bridge
      helpers that replay switchdev events (FDB/MDB/VLAN additions and
      deletions) that can be lost by the switchdev drivers in a variety of
      circumstances:
      
      - an IP multicast group was host-joined on the bridge itself before any
        switchdev port joined the bridge, leading to the host MDB entries
        missing in the hardware database.
      - during the bridge creation process, the MAC address of the bridge was
        added to the FDB as an entry pointing towards the bridge device
        itself, but with no switchdev ports being part of the bridge yet, this
        local FDB entry would remain unknown to the switchdev hardware
        database.
      - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface,
        before any switchdev port joined that LAG, leading to the hardware
        database missing those entries.
      - a switchdev port left a LAG that is a bridge port, while the LAG
        remained part of the bridge, and all FDB/MDB/VLAN entries remained
        installed in the hardware database of the switchdev port.
      
      Also, since commit 0d2cfbd4 ("net: bridge: ignore switchdev events
      for LAG ports which didn't request replay"), DSA introduced a method,
      based on a const void *ctx, to ensure that two switchdev ports under the
      same LAG that is a bridge port do not see the same MDB/VLAN entry being
      replayed twice by the bridge, once for every bridge port that joins the
      LAG.
      
      With so many ordering corner cases being possible, it seems unreasonable
      to expect a switchdev driver writer to get it right from the first try.
      Therefore, now that DSA has experimented with the bridge replay helpers
      for a little bit, we can move the code to the bridge driver where it is
      more readily available to all switchdev drivers.
      
      To convert the switchdev object replay helpers from "pull mode" (where
      the driver asks for them) to a "push mode" (where the bridge offers them
      automatically), the biggest problem is that the bridge needs to be aware
      when a switchdev port joins and leaves, even when the switchdev is only
      indirectly a bridge port (for example when the bridge port is a LAG
      upper of the switchdev).
      
      Luckily, we already have a hook for that, in the form of the newly
      introduced switchdev_bridge_port_offload() and
      switchdev_bridge_port_unoffload() calls. These offer a natural place for
      hooking the object addition and deletion replays.
      
      Extend the above 2 functions with:
      - pointers to the switchdev atomic notifier (for FDB replays) and the
        blocking notifier (for MDB and VLAN replays).
      - the "const void *ctx" argument required for drivers to be able to
        disambiguate between which port is targeted, when multiple ports are
        lowers of the same LAG that is a bridge port. Most of the drivers pass
        NULL to this argument, except the ones that support LAG offload and have
        the proper context check already in place in the switchdev blocking
        notifier handler.
      
      Also unexport the replay helpers, since nobody except the bridge calls
      them directly now.
      
      Note that:
      (a) we abuse the terminology slightly, because FDB entries are not
          "switchdev objects", but we count them as objects nonetheless.
          With no direct way to prove it, I think they are not modeled as
          switchdev objects because those can only be installed by the bridge
          to the hardware (as opposed to FDB entries which can be propagated
          in the other direction too). This is merely an abuse of terms, FDB
          entries are replayed too, despite not being objects.
      (b) the bridge does not attempt to sync port attributes to newly joined
          ports, just the countable stuff (the objects). The reason for this
          is simple: no universal and symmetric way to sync and unsync them is
          known. For example, VLAN filtering: what to do on unsync, disable or
          leave it enabled? Similarly, STP state, ageing timer, etc etc. What
          a switchdev port does when it becomes standalone again is not really
          up to the bridge's competence, and the driver should deal with it.
          On the other hand, replaying deletions of switchdev objects can be
          seen a matter of cleanup and therefore be treated by the bridge,
          hence this patch.
      
      We make the replay helpers opt-in for drivers, because they might not
      bring immediate benefits for them:
      
      - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(),
        so br_vlan_replay() should not do anything for the new drivers on
        which we call it. The existing drivers where there was even a slight
        possibility for there to exist a VLAN on a bridge port before they
        join it are already guarded against this: mlxsw and prestera deny
        joining LAG interfaces that are members of a bridge.
      
      - br_fdb_replay() should now notify of local FDB entries, but I patched
        all drivers except DSA to ignore these new entries in commit
        2c4eca3e ("net: bridge: switchdev: include local flag in FDB
        notifications"). Driver authors can lift this restriction as they
        wish, and when they do, they can also opt into the FDB replay
        functionality.
      
      - br_mdb_replay() should fix a real issue which is described in commit
        4f2673b3 ("net: bridge: add helper to replay port and host-joined
        mdb entries"). However most drivers do not offload the
        SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw
        offload this switchdev object, and I don't completely understand the
        way in which they offload this switchdev object anyway. So I'll leave
        it up to these drivers' respective maintainers to opt into
        br_mdb_replay().
      
      So most of the drivers pass NULL notifier blocks for the replay helpers,
      except:
      - dpaa2-switch which was already acked/regression-tested with the
        helpers enabled (and there isn't much of a downside in having them)
      - ocelot which already had replay logic in "pull" mode
      - DSA which already had replay logic in "pull" mode
      
      An important observation is that the drivers which don't currently
      request bridge event replays don't even have the
      switchdev_bridge_port_{offload,unoffload} calls placed in proper places
      right now. This was done to avoid unnecessary rework for drivers which
      might never even add support for this. For driver writers who wish to
      add replay support, this can be used as a tentative placement guide:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/
      
      Cc: Vadym Kochan <vkochan@marvell.com>
      Cc: Taras Chornyi <tchornyi@marvell.com>
      Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
      Cc: Lars Povlsen <lars.povlsen@microchip.com>
      Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
      Cc: UNGLinuxDriver@microchip.com
      Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e51bf44
    • V
      net: bridge: switchdev: let drivers inform which bridge ports are offloaded · 2f5dc00f
      Vladimir Oltean 提交于
      On reception of an skb, the bridge checks if it was marked as 'already
      forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
      is, it assigns the source hardware domain of that skb based on the
      hardware domain of the ingress port. Then during forwarding, it enforces
      that the egress port must have a different hardware domain than the
      ingress one (this is done in nbp_switchdev_allowed_egress).
      
      Non-switchdev drivers don't report any physical switch id (neither
      through devlink nor .ndo_get_port_parent_id), therefore the bridge
      assigns them a hardware domain of 0, and packets coming from them will
      always have skb->offload_fwd_mark = 0. So there aren't any restrictions.
      
      Problems appear due to the fact that DSA would like to perform software
      fallback for bonding and team interfaces that the physical switch cannot
      offload.
      
             +-- br0 ---+
            / /   |      \
           / /    |       \
          /  |    |      bond0
         /   |    |     /    \
       swp0 swp1 swp2 swp3 swp4
      
      There, it is desirable that the presence of swp3 and swp4 under a
      non-offloaded LAG does not preclude us from doing hardware bridging
      beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
      enough that software bridging between {swp0,swp1,swp2} and bond0 is not
      impractical.
      
      But this creates an impossible paradox given the current way in which
      port hardware domains are assigned. When the driver receives a packet
      from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
      something.
      
      - If we set it to 0, then the bridge will forward it towards swp1, swp2
        and bond0. But the switch has already forwarded it towards swp1 and
        swp2 (not to bond0, remember, that isn't offloaded, so as far as the
        switch is concerned, ports swp3 and swp4 are not looking up the FDB,
        and the entire bond0 is a destination that is strictly behind the
        CPU). But we don't want duplicated traffic towards swp1 and swp2, so
        it's not ok to set skb->offload_fwd_mark = 0.
      
      - If we set it to 1, then the bridge will not forward the skb towards
        the ports with the same switchdev mark, i.e. not to swp1, swp2 and
        bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
        have forwarded the skb there.
      
      So the real issue is that bond0 will be assigned the same hardware
      domain as {swp0,swp1,swp2}, because the function that assigns hardware
      domains to bridge ports, nbp_switchdev_add(), recurses through bond0's
      lower interfaces until it finds something that implements devlink (calls
      dev_get_port_parent_id with bool recurse = true). This is a problem
      because the fact that bond0 can be offloaded by swp3 and swp4 in our
      example is merely an assumption.
      
      A solution is to give the bridge explicit hints as to what hardware
      domain it should use for each port.
      
      Currently, the bridging offload is very 'silent': a driver registers a
      netdevice notifier, which is put on the netns's notifier chain, and
      which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
      bridge, and the lower is an interface it knows about (one registered by
      this driver, normally). Then, from within that notifier, it does a bunch
      of stuff behind the bridge's back, without the bridge necessarily
      knowing that there's somebody offloading that port. It looks like this:
      
           ip link set swp0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v
              call_netdevice_notifiers
                        |
                        v
             dsa_slave_netdevice_event
                        |
                        v
              oh, hey! it's for me!
                        |
                        v
                 .port_bridge_join
      
      What we do to solve the conundrum is to be less silent, and change the
      switchdev drivers to present themselves to the bridge. Something like this:
      
           ip link set swp0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v                    bridge: Aye! I'll use this
              call_netdevice_notifiers           ^  ppid as the
                        |                        |  hardware domain for
                        v                        |  this port, and zero
             dsa_slave_netdevice_event           |  if I got nothing.
                        |                        |
                        v                        |
              oh, hey! it's for me!              |
                        |                        |
                        v                        |
                 .port_bridge_join               |
                        |                        |
                        +------------------------+
                   switchdev_bridge_port_offload(swp0, swp0)
      
      Then stacked interfaces (like bond0 on top of swp3/swp4) would be
      treated differently in DSA, depending on whether we can or cannot
      offload them.
      
      The offload case:
      
          ip link set bond0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v                    bridge: Aye! I'll use this
              call_netdevice_notifiers           ^  ppid as the
                        |                        |  switchdev mark for
                        v                        |        bond0.
             dsa_slave_netdevice_event           | Coincidentally (or not),
                        |                        | bond0 and swp0, swp1, swp2
                        v                        | all have the same switchdev
              hmm, it's not quite for me,        | mark now, since the ASIC
               but my driver has already         | is able to forward towards
                 called .port_lag_join           | all these ports in hw.
                for it, because I have           |
            a port with dp->lag_dev == bond0.    |
                        |                        |
                        v                        |
                 .port_bridge_join               |
                 for swp3 and swp4               |
                        |                        |
                        +------------------------+
                  switchdev_bridge_port_offload(bond0, swp3)
                  switchdev_bridge_port_offload(bond0, swp4)
      
      And the non-offload case:
      
          ip link set bond0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v                    bridge waiting:
              call_netdevice_notifiers           ^  huh, switchdev_bridge_port_offload
                        |                        |  wasn't called, okay, I'll use a
                        v                        |  hwdom of zero for this one.
             dsa_slave_netdevice_event           :  Then packets received on swp0 will
                        |                        :  not be software-forwarded towards
                        v                        :  swp1, but they will towards bond0.
               it's not for me, but
             bond0 is an upper of swp3
            and swp4, but their dp->lag_dev
             is NULL because they couldn't
                  offload it.
      
      Basically we can draw the conclusion that the lowers of a bridge port
      can come and go, so depending on the configuration of lowers for a
      bridge port, it can dynamically toggle between offloaded and unoffloaded.
      Therefore, we need an equivalent switchdev_bridge_port_unoffload too.
      
      This patch changes the way any switchdev driver interacts with the
      bridge. From now on, everybody needs to call switchdev_bridge_port_offload
      and switchdev_bridge_port_unoffload, otherwise the bridge will treat the
      port as non-offloaded and allow software flooding to other ports from
      the same ASIC.
      
      Note that these functions lay the ground for a more complex handshake
      between switchdev drivers and the bridge in the future.
      
      For drivers that will request a replay of the switchdev objects when
      they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we
      place the call to switchdev_bridge_port_unoffload() strategically inside
      the NETDEV_PRECHANGEUPPER notifier's code path, and not inside
      NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers
      need the netdev adjacency lists to be valid, and that is only true in
      NETDEV_PRECHANGEUPPER.
      
      Cc: Vadym Kochan <vkochan@marvell.com>
      Cc: Taras Chornyi <tchornyi@marvell.com>
      Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
      Cc: Lars Povlsen <lars.povlsen@microchip.com>
      Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
      Cc: UNGLinuxDriver@microchip.com
      Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression
      Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
      Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f5dc00f
  19. 20 7月, 2021 1 次提交
    • V
      net: dsa: tag_8021q: add proper cross-chip notifier support · c64b9c05
      Vladimir Oltean 提交于
      The big problem which mandates cross-chip notifiers for tag_8021q is
      this:
      
                                                   |
          sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
       [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
                                         |
                                         +---------+
                                                   |
          sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
                                         |
                                         +---------+
                                                   |
          sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
      
      When the user runs:
      
      ip link add br0 type bridge
      ip link set sw0p0 master br0
      ip link set sw2p0 master br0
      
      It doesn't work.
      
      This is because dsa_8021q_crosschip_bridge_join() assumes that "ds" and
      "other_ds" are at most 1 hop away from each other, so it is sufficient
      to add the RX VLAN of {ds, port} into {other_ds, other_port} and vice
      versa and presto, the cross-chip link works. When there is another
      switch in the middle, such as in this case switch 1 with its DSA links
      sw1p3 and sw1p4, somebody needs to tell it about these VLANs too.
      
      Which is exactly why the problem is quadratic: when a port joins a
      bridge, for each port in the tree that's already in that same bridge we
      notify a tag_8021q VLAN addition of that port's RX VLAN to the entire
      tree. It is a very complicated web of VLANs.
      
      It must be mentioned that currently we install tag_8021q VLANs on too
      many ports (DSA links - to be precise, on all of them). For example,
      when sw2p0 joins br0, and assuming sw1p0 was part of br0 too, we add the
      RX VLAN of sw2p0 on the DSA links of switch 0 too, even though there
      isn't any port of switch 0 that is a member of br0 (at least yet).
      In theory we could notify only the switches which sit in between the
      port joining the bridge and the port reacting to that bridge_join event.
      But in practice that is impossible, because of the way 'link' properties
      are described in the device tree. The DSA bindings require DT writers to
      list out not only the real/physical DSA links, but in fact the entire
      routing table, like for example switch 0 above will have:
      
      	sw0p3: port@3 {
      		link = <&sw1p4 &sw2p4>;
      	};
      
      This was done because:
      
      /* TODO: ideally DSA ports would have a single dp->link_dp member,
       * and no dst->rtable nor this struct dsa_link would be needed,
       * but this would require some more complex tree walking,
       * so keep it stupid at the moment and list them all.
       */
      
      but it is a perfect example of a situation where too much information is
      actively detrimential, because we are now in the position where we
      cannot distinguish a real DSA link from one that is put there to avoid
      the 'complex tree walking'. And because DT is ABI, there is not much we
      can change.
      
      And because we do not know which DSA links are real and which ones
      aren't, we can't really know if DSA switch A is in the data path between
      switches B and C, in the general case.
      
      So this is why tag_8021q RX VLANs are added on all DSA links, and
      probably why it will never change.
      
      On the other hand, at least the number of additions/deletions is well
      balanced, and this means that once we implement reference counting at
      the cross-chip notifier level a la fdb/mdb, there is absolutely zero
      need for a struct dsa_8021q_crosschip_link, it's all self-managing.
      
      In fact, with the tag_8021q notifiers emitted from the bridge join
      notifiers, it becomes so generic that sja1105 does not need to do
      anything anymore, we can just delete its implementation of the
      .crosschip_bridge_{join,leave} methods.
      
      Among other things we can simply delete is the home-grown implementation
      of sja1105_notify_crosschip_switches(). The reason why that is wrong is
      because it is not quadratic - it only covers remote switches to which we
      have a cross-chip bridging link and that does not cover in-between
      switches. This deletion is part of the same patch because sja1105 used
      to poke deep inside the guts of the tag_8021q context in order to do
      that. Because the cross-chip links went away, so needs the sja1105 code.
      
      Last but not least, dsa_8021q_setup_port() is simplified (and also
      renamed). Because our TAG_8021Q_VLAN_ADD notifier is designed to react
      on the CPU port too, the four dsa_8021q_vid_apply() calls:
      - 1 for RX VLAN on user port
      - 1 for the user port's RX VLAN on the CPU port
      - 1 for TX VLAN on user port
      - 1 for the user port's TX VLAN on the CPU port
      
      now get squashed into only 2 notifier calls via
      dsa_port_tag_8021q_vlan_add.
      
      And because the notifiers to add and to delete a tag_8021q VLAN are
      distinct, now we finally break up the port setup and teardown into
      separate functions instead of relying on a "bool enabled" flag which
      tells us what to do. Arguably it should have been this way from the
      get go.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c64b9c05
  20. 30 6月, 2021 3 次提交
    • V
      net: dsa: replay the local bridge FDB entries pointing to the bridge dev too · 63c51453
      Vladimir Oltean 提交于
      When we join a bridge that already has some local addresses pointing to
      itself, we do not get those notifications. Similarly, when we leave that
      bridge, we do not get notifications for the deletion of those entries.
      The only switchdev notifications we get are those of entries added while
      the DSA port is enslaved to the bridge.
      
      This makes use cases such as the following work properly (with the
      number of additions and removals properly balanced):
      
      ip link add br0 type bridge
      ip link add br1 type bridge
      ip link set br0 address 00:01:02:03:04:05
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 up
      ip link set swp1 up
      ip link set swp0 master br0
      ip link set swp1 master br1
      ip link set br0 up
      ip link set br1 up
      ip link del br1 # 00:01:02:03:04:05 still installed on the CPU port
      ip link del br0 # 00:01:02:03:04:05 finally removed from the CPU port
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63c51453
    • V
      net: dsa: install the host MDB and FDB entries in the master's RX filter · 26ee7b06
      Vladimir Oltean 提交于
      If the DSA master implements strict address filtering, then the unicast
      and multicast addresses kept by the DSA CPU ports should be synchronized
      with the address lists of the DSA master.
      
      Note that we want the synchronization of the master's address lists even
      if the DSA switch doesn't support unicast/multicast database operations,
      on the premises that the packets will be flooded to the CPU in that
      case, and we should still instruct the master to receive them. This is
      why we do the dev_uc_add() etc first, even if dsa_port_notify() returns
      -EOPNOTSUPP. In turn, dev_uc_add() and friends return error only if
      memory allocation fails, so it is probably ok to check and propagate
      that error code and not just ignore it.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26ee7b06
    • V
      net: dsa: introduce a separate cross-chip notifier type for host FDBs · 3dc80afc
      Vladimir Oltean 提交于
      DSA treats some bridge FDB entries by trapping them to the CPU port.
      Currently, the only class of such entries are FDB addresses learnt by
      the software bridge on a foreign interface. However there are many more
      to be added:
      
      - FDB entries with the is_local flag (for termination) added by the
        bridge on the user ports (typically containing the MAC address of the
        bridge port)
      - FDB entries pointing towards the bridge net device (for termination).
        Typically these contain the MAC address of the bridge net device.
      - Static FDB entries installed on a foreign interface that is in the
        same bridge with a DSA user port.
      
      The reason why a separate cross-chip notifier for host FDBs is justified
      compared to normal FDBs is the same as in the case of host MDBs: the
      cross-chip notifier matching function in switch.c should avoid
      installing these entries on routing ports that route towards the
      targeted switch, but not towards the CPU. This is required in order to
      have proper support for H-like multi-chip topologies.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dc80afc