1. 03 4月, 2023 1 次提交
    • V
      net: create a netdev notifier for DSA to reject PTP on DSA master · 88c0a6b5
      Vladimir Oltean 提交于
      The fact that PTP 2-step TX timestamping is broken on DSA switches if
      the master also timestamps the same packets is documented by commit
      f685e609 ("net: dsa: Deny PTP on master if switch supports it").
      We attempt to help the users avoid shooting themselves in the foot by
      making DSA reject the timestamping ioctls on an interface that is a DSA
      master, and the switch tree beneath it contains switches which are aware
      of PTP.
      
      The only problem is that there isn't an established way of intercepting
      ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network
      stack by creating a struct dsa_netdevice_ops with overlaid function
      pointers that are manually checked from the relevant call sites. There
      used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only
      one left.
      
      There is an ongoing effort to migrate driver-visible hardware timestamping
      control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set()
      model, but DSA actively prevents that migration, since dsa_master_ioctl()
      is currently coded to manually call the master's legacy ndo_eth_ioctl(),
      and so, whenever a network device driver would be converted to the new
      API, DSA's restrictions would be circumvented, because any device could
      be used as a DSA master.
      
      The established way for unrelated modules to react on a net device event
      is via netdevice notifiers. So we create a new notifier which gets
      called whenever there is an attempt to change hardware timestamping
      settings on a device.
      
      Finally, there is another reason why a netdev notifier will be a good
      idea, besides strictly DSA, and this has to do with PHY timestamping.
      
      With ndo_eth_ioctl(), all MAC drivers must manually call
      phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP,
      otherwise they must pass this ioctl to the PHY driver via
      phy_mii_ioctl().
      
      With the new ndo_hwtstamp_set() API, it will be desirable to simply not
      make any calls into the MAC device driver when timestamping should be
      performed at the PHY level.
      
      But there exist drivers, such as the lan966x switch, which need to
      install packet traps for PTP regardless of whether they are the layer
      that provides the hardware timestamps, or the PHY is. That would be
      impossible to support with the new API.
      
      The proposal there, too, is to introduce a netdev notifier which acts as
      a better cue for switching drivers to add or remove PTP packet traps,
      than ndo_hwtstamp_set(). The one introduced here "almost" works there as
      well, except for the fact that packet traps should only be installed if
      the PHY driver succeeded to enable hardware timestamping, whereas here,
      we need to deny hardware timestamping on the DSA master before it
      actually gets enabled. This is why this notifier is called "PRE_", and
      the notifier that would get used for PHY timestamping and packet traps
      would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for
      example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing.
      
      In expectation of future netlink UAPI, we also pass a non-NULL extack
      pointer to the netdev notifier, and we make DSA populate it with an
      informative reason for the rejection. To avoid making it go to waste, we
      make the ioctl-based dev_set_hwtstamp() create a fake extack and print
      the message to the kernel log.
      
      Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/
      Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88c0a6b5
  2. 23 1月, 2023 1 次提交
  3. 23 11月, 2022 2 次提交
  4. 18 11月, 2022 1 次提交
    • V
      net: dsa: stop exposing tag proto module helpers to the world · 9999f85b
      Vladimir Oltean 提交于
      The DSA tagging protocol driver macros are in the public include/net/dsa.h
      probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are
      (MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions).
      
      But there is no reason to expose these helpers to <net/dsa.h>. That
      header is shared between switch drivers (drivers/net/dsa/), tagging
      protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c),
      and the rest of the world (DSA master drivers, network stack, etc).
      Too much exposure.
      
      On the other hand, net/dsa/dsa_priv.h is included only by the DSA core
      and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a
      bit too much exposure - I've contemplated creating a new header which is
      only included by tagging protocol drivers, but completely separating a
      new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for
      example dsa_slave_to_port() is used both from the fast path and from the
      control path.
      
      So for now, move these definitions to dsa_priv.h which at least hides
      them from the world.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NMichael Walle <michael@walle.cc>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      9999f85b
  5. 16 11月, 2022 1 次提交
  6. 01 10月, 2022 1 次提交
  7. 20 9月, 2022 4 次提交
    • V
      net: dsa: allow masters to join a LAG · acc43b7b
      Vladimir Oltean 提交于
      There are 2 ways in which a DSA user port may become handled by 2 CPU
      ports in a LAG:
      
      (1) its current DSA master joins a LAG
      
       ip link del bond0 && ip link add bond0 type bond mode 802.3ad
       ip link set eno2 master bond0
      
      When this happens, all user ports with "eno2" as DSA master get
      automatically migrated to "bond0" as DSA master.
      
      (2) it is explicitly configured as such by the user
      
       # Before, the DSA master was eno3
       ip link set swp0 type dsa master bond0
      
      The design of this configuration is that the LAG device dynamically
      becomes a DSA master through dsa_master_setup() when the first physical
      DSA master becomes a LAG slave, and stops being so through
      dsa_master_teardown() when the last physical DSA master leaves.
      
      A LAG interface is considered as a valid DSA master only if it contains
      existing DSA masters, and no other lower interfaces. Therefore, we
      mainly rely on method (1) to enter this configuration.
      
      Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when
      it becomes a standalone DSA master again. But the LAG master also has a
      dev->dsa_ptr, and this is actually duplicated from one of the physical
      LAG slaves, and therefore needs to be balanced when LAG slaves come and
      go.
      
      To the switch driver, putting DSA masters in a LAG is seen as putting
      their associated CPU ports in a LAG.
      
      We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG,
      by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      acc43b7b
    • V
      net: dsa: propagate extack to port_lag_join · 2e359b00
      Vladimir Oltean 提交于
      Drivers could refuse to offload a LAG configuration for a variety of
      reasons, mainly having to do with its TX type. Additionally, since DSA
      masters may now also be LAG interfaces, and this will translate into a
      call to port_lag_join on the CPU ports, there may be extra restrictions
      there. Propagate the netlink extack to this DSA method in order for
      drivers to give a meaningful error message back to the user.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      2e359b00
    • V
      net: dsa: allow the DSA master to be seen and changed through rtnetlink · 95f510d0
      Vladimir Oltean 提交于
      Some DSA switches have multiple CPU ports, which can be used to improve
      CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(),
      sets up only the first one, leading to suboptimal use of hardware.
      
      The desire is to not change the default configuration but to permit the
      user to create a dynamic mapping between individual user ports and the
      CPU port that they are served by, configurable through rtnetlink. It is
      also intended to permit load balancing between CPU ports, and in that
      case, the foreseen model is for the DSA master to be a bonding interface
      whose lowers are the physical DSA masters.
      
      To that end, we create a struct rtnl_link_ops for DSA user ports with
      the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that
      contains the ifindex of the newly desired DSA master.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      95f510d0
    • V
      net: dsa: introduce dsa_port_get_master() · 8f6a19c0
      Vladimir Oltean 提交于
      There is a desire to support for DSA masters in a LAG.
      
      That configuration is intended to work by simply enslaving the master to
      a bonding/team device. But the physical DSA master (the LAG slave) still
      has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical
      CPU port.
      
      However, we would like to be able to retrieve the LAG that's the upper
      of the physical DSA master. In preparation for that, introduce a helper
      called dsa_port_get_master() that replaces all occurrences of the
      dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will
      be made later within the helper itself.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      8f6a19c0
  8. 23 8月, 2022 1 次提交
  9. 02 7月, 2022 1 次提交
  10. 30 6月, 2022 1 次提交
  11. 27 6月, 2022 2 次提交
  12. 13 5月, 2022 3 次提交
    • V
      net: dsa: remove port argument from ->change_tag_protocol() · bacf93b0
      Vladimir Oltean 提交于
      DSA has not supported (and probably will not support in the future
      either) independent tagging protocols per CPU port.
      
      Different switch drivers have different requirements, some may need to
      replicate some settings for each CPU port, some may need to apply some
      settings on a single CPU port, while some may have to configure some
      global settings and then some per-CPU-port settings.
      
      In any case, the current model where DSA calls ->change_tag_protocol for
      each CPU port turns out to be impractical for drivers where there are
      global things to be done. For example, felix calls dsa_tag_8021q_register(),
      which makes no sense per CPU port, so it suppresses the second call.
      
      Let drivers deal with replication towards all CPU ports, and remove the
      CPU port argument from the function prototype.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      bacf93b0
    • V
      net: dsa: felix: manage host flooding using a specific driver callback · 72c3b0c7
      Vladimir Oltean 提交于
      At the time - commit 7569459a ("net: dsa: manage flooding on the CPU
      ports") - not introducing a dedicated switch callback for host flooding
      made sense, because for the only user, the felix driver, there was
      nothing different to do for the CPU port than set the flood flags on the
      CPU port just like on any other bridge port.
      
      There are 2 reasons why this approach is not good enough, however.
      
      (1) Other drivers, like sja1105, support configuring flooding as a
          function of {ingress port, egress port}, whereas the DSA
          ->port_bridge_flags() function only operates on an egress port.
          So with that driver we'd have useless host flooding from user ports
          which don't need it.
      
      (2) Even with the felix driver, support for multiple CPU ports makes it
          difficult to piggyback on ->port_bridge_flags(). The way in which
          the felix driver is going to support host-filtered addresses with
          multiple CPU ports is that it will direct these addresses towards
          both CPU ports (in a sort of multicast fashion), then restrict the
          forwarding to only one of the two using the forwarding masks.
          Consequently, flooding will also be enabled towards both CPU ports.
          However, ->port_bridge_flags() gets passed the index of a single CPU
          port, and that leaves the flood settings out of sync between the 2
          CPU ports.
      
      This is to say, it's better to have a specific driver method for host
      flooding, which takes the user port as argument. This solves problem (1)
      by allowing the driver to do different things for different user ports,
      and problem (2) by abstracting the operation and letting the driver do
      whatever, rather than explicitly making the DSA core point to the CPU
      port it thinks needs to be touched.
      
      This new method also creates a problem, which is that cross-chip setups
      are not handled. However I don't have hardware right now where I can
      test what is the proper thing to do, and there isn't hardware compatible
      with multi-switch trees that supports host flooding. So it remains a
      problem to be tackled in the future.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      72c3b0c7
    • V
      net: dsa: introduce the dsa_cpu_ports() helper · 465c3de4
      Vladimir Oltean 提交于
      Similar to dsa_user_ports() which retrieves a port mask of all user
      ports, introduce dsa_cpu_ports() which retrieves the mask of all CPU
      ports of a switch.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      465c3de4
  13. 07 5月, 2022 1 次提交
  14. 18 3月, 2022 3 次提交
  15. 14 3月, 2022 2 次提交
    • V
      net: dsa: report and change port dscp priority using dcbnl · 47d75f78
      Vladimir Oltean 提交于
      Similar to the port-based default priority, IEEE 802.1Q-2018 allows the
      Application Priority Table to define QoS classes (0 to 7) per IP DSCP
      value (0 to 63).
      
      In the absence of an app table entry for a packet with DSCP value X,
      QoS classification for that packet falls back to other methods (VLAN PCP
      or port-based default). The presence of an app table for DSCP value X
      with priority Y makes the hardware classify the packet to QoS class Y.
      
      As opposed to the default-prio where DSA exposes only a "set" in
      dsa_switch_ops (because the port-based default is the fallback, it
      always exists, either implicitly or explicitly), for DSCP priorities we
      expose an "add" and a "del". The addition of a DSCP entry means trusting
      that DSCP priority, the deletion means ignoring it.
      
      Drivers that already trust (at least some) DSCP values can describe
      their configuration in dsa_switch_ops :: port_get_dscp_prio(), which is
      called for each DSCP value from 0 to 63.
      
      Again, there can be more than one dcbnl app table entry for the same
      DSCP value, DSA chooses the one with the largest configured priority.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47d75f78
    • V
      net: dsa: report and change port default priority using dcbnl · d538eca8
      Vladimir Oltean 提交于
      The port-based default QoS class is assigned to packets that lack a
      VLAN PCP (or the port is configured to not trust the VLAN PCP),
      an IP DSCP (or the port is configured to not trust IP DSCP), and packets
      on which no tc-skbedit action has matched.
      
      Similar to other drivers, this can be exposed to user space using the
      DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table
      D-8 - Sel field values that when the Selector is 1, the Protocol ID
      value of 0 denotes the "Default application priority. For use when
      application priority is not otherwise specified."
      
      The way in which the dcbnl integration in DSA has been designed has to
      do with its requirements. Andrew Lunn explains that SOHO switches are
      expected to come with some sort of pre-configured QoS profile, and that
      it is desirable for this to come pre-loaded into the DSA slave interfaces'
      DCB application priority table.
      
      In the dcbnl design, this is possible because calls to dcb_ieee_setapp()
      can be initiated by anyone including being self-initiated by this device
      driver.
      
      However, what makes this challenging to implement in DSA is that the DSA
      core manages the net_devices (effectively hiding them from drivers),
      while drivers manage the hardware. The DSA core has no knowledge of what
      individual drivers' QoS policies are. DSA could export to drivers a
      wrapper over dcb_ieee_setapp() and these could call that function to
      pre-populate the app priority table, however drivers don't have a good
      moment in time to do this. The dsa_switch_ops :: setup() method gets
      called before the net_devices are created (dsa_slave_create), and so is
      dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops ::
      port_enable(), but this gets called upon each ndo_open. If we add app
      table entries on every open, we'd need to remove them on close, to avoid
      duplicate entry errors. But if we delete app priority entries on close,
      what we delete may not be the initial, driver pre-populated entries, but
      rather user-added entries.
      
      So it is clear that letting drivers choose the timing of the
      dcb_ieee_setapp() call is inappropriate. The alternative which was
      chosen is to introduce hardware-specific ops in dsa_switch_ops, and
      effectively hide dcbnl details from drivers as well. For pre-populating
      the application table, dsa_slave_dcbnl_init() will call
      ds->ops->port_get_default_prio() which is supposed to read from
      hardware. If the operation succeeds, DSA creates a default-prio app
      table entry. The method is called as soon as the slave_dev is
      registered, but before we release the rtnl_mutex. This is done such that
      user space sees the app table entries as soon as it sees the interface
      being registered.
      
      The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer
      changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to
      return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is
      NULL. Because there are still dcbnl-unaware DSA drivers even if they
      have dcbnl_ops populated, the way to restore the behavior is to make all
      dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific
      dsa_switch_ops method.
      
      The dcbnl framework absurdly allows there to be more than one app table
      entry for the same selector and protocol (in other words, more than one
      port-based default priority). In the iproute2 dcb program, there is a
      "replace" syntactical sugar command which performs an "add" and a "del"
      to hide this away. But we choose the largest configured priority when we
      call ds->ops->port_set_default_prio(), using __fls(). When there is no
      default-prio app table entry left, the port-default priority is restored
      to 0.
      
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d538eca8
  16. 09 3月, 2022 1 次提交
    • V
      net: dsa: felix: avoid early deletion of host FDB entries · 7e580490
      Vladimir Oltean 提交于
      The Felix driver declares FDB isolation but puts all standalone ports in
      VID 0. This is mostly problem-free as discussed with Alvin here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870
      
      however there is one catch. DSA still thinks that FDB entries are
      installed on the CPU port as many times as there are user ports, and
      this is problematic when multiple user ports share the same MAC address.
      
      Consider the default case where all user ports inherit their MAC address
      from the DSA master, and then the user runs:
      
      ip link set swp0 address 00:01:02:03:04:05
      
      The above will make dsa_slave_set_mac_address() call
      dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's
      standalone database, and dsa_port_standalone_host_fdb_del() for the old
      address of swp0, again in swp0's standalone database.
      
      Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down
      to the felix driver, which will end up deleting the old MAC address from
      the CPU port. But this is still in use by other user ports, so we end up
      breaking unicast termination for them.
      
      There isn't a problem in the fact that DSA keeps track of host
      standalone addresses in the individual database of each user port: some
      drivers like sja1105 need this. There also isn't a problem in the fact
      that some drivers choose the same VID/FID for all standalone ports.
      It is just that the deletion of these host addresses must be delayed
      until they are known to not be in use any longer, and only the driver
      has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and
      &cpu_db->mdbs, it is just a matter of walking over those lists and see
      whether the same MAC address is present on the CPU port in the port db
      of another user port.
      
      I have considered reusing the generic dsa_port_walk_fdbs() and
      dsa_port_walk_mdbs() schemes for this, but locking makes it difficult.
      In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but
      dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that
      we introduce an unlocked variant of the address iterator, we'd still
      need some relatively complex data structures, and a void *ctx in the
      dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are
      able to figure out, after iterating, whether the same MAC address is or
      isn't present in the port db of another port.
      
      All the above, plus the fact that I expect other drivers to follow the
      same model as felix where all standalone ports use the same FID, made me
      conclude that a generic method provided by DSA is necessary:
      dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this
      from the ->port_fdb_del() handler for the CPU port, when the database
      was classified to either a port db, or a LAG db.
      
      For symmetry, we also call this from ->port_fdb_add(), because if the
      address was installed once, then installing it a second time serves no
      purpose: it's already in hardware in VID 0 and it affects all standalone
      ports.
      
      This change moves dsa_db_equal() from switch.c to dsa.c, since it now
      has one more caller.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e580490
  17. 05 3月, 2022 1 次提交
  18. 03 3月, 2022 1 次提交
    • V
      net: dsa: felix: migrate host FDB and MDB entries when changing tag proto · f9cef64f
      Vladimir Oltean 提交于
      The "ocelot" and "ocelot-8021q" tagging protocols make use of different
      hardware resources, and host FDB entries have different destination
      ports in the switch analyzer module, practically speaking.
      
      So when the user requests a tagging protocol change, the driver must
      migrate all host FDB and MDB entries from the NPI port (in fact CPU port
      module) towards the same physical port, but this time used as a regular
      port.
      
      It is pointless for the felix driver to keep a copy of the host
      addresses, when we can create and export DSA helpers for walking through
      the addresses that it already needs to keep on the CPU port, for
      refcounting purposes.
      
      felix_classify_db() is moved up to avoid a forward declaration.
      
      We pass "bool change" because dp->fdbs and dp->mdbs are uninitialized
      lists when felix_setup() first calls felix_set_tag_protocol(), so we
      need to avoid calling dsa_port_walk_fdbs() during probe time.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9cef64f
  19. 27 2月, 2022 2 次提交
    • V
      net: dsa: pass extack to .port_bridge_join driver methods · 06b9cce4
      Vladimir Oltean 提交于
      As FDB isolation cannot be enforced between VLAN-aware bridges in lack
      of hardware assistance like extra FID bits, it seems plausible that many
      DSA switches cannot do it. Therefore, they need to reject configurations
      with multiple VLAN-aware bridges from the two code paths that can
      transition towards that state:
      
      - joining a VLAN-aware bridge
      - toggling VLAN awareness on an existing bridge
      
      The .port_vlan_filtering method already propagates the netlink extack to
      the driver, let's propagate it from .port_bridge_join too, to make sure
      that the driver can use the same function for both.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06b9cce4
    • V
      net: dsa: request drivers to perform FDB isolation · c2693363
      Vladimir Oltean 提交于
      For DSA, to encourage drivers to perform FDB isolation simply means to
      track which bridge does each FDB and MDB entry belong to. It then
      becomes the driver responsibility to use something that makes the FDB
      entry from one bridge not match the FDB lookup of ports from other
      bridges.
      
      The top-level functions where the bridge is determined are:
      - dsa_port_fdb_{add,del}
      - dsa_port_host_fdb_{add,del}
      - dsa_port_mdb_{add,del}
      - dsa_port_host_mdb_{add,del}
      
      aka the pre-crosschip-notifier functions.
      
      Changing the API to pass a reference to a bridge is not superfluous, and
      looking at the passed bridge argument is not the same as having the
      driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add()
      method.
      
      DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well,
      and those do not have any dp->bridge information to retrieve, because
      they are not in any bridge - they are merely the pipes that serve the
      user ports that are in one or multiple bridges.
      
      The struct dsa_bridge associated with each FDB/MDB entry is encapsulated
      in a larger "struct dsa_db" database. Although only databases associated
      to bridges are notified for now, this API will be the starting point for
      implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB
      entries on the CPU port which belong to the corresponding user port's
      port database. These are supposed to match only when the port is
      standalone.
      
      It is better to introduce the API in its expected final form than to
      introduce it for bridges first, then to have to change drivers which may
      have made one or more assumptions.
      
      Drivers can use the provided bridge.num, but they can also use a
      different numbering scheme that is more convenient.
      
      DSA must perform refcounting on the CPU and DSA ports by also taking
      into account the bridge number. So if two bridges request the same local
      address, DSA must notify the driver twice, once for each bridge.
      
      In fact, if the driver supports FDB isolation, DSA must perform
      refcounting per bridge, but if the driver doesn't, DSA must refcount
      host addresses across all bridges, otherwise it would be telling the
      driver to delete an FDB entry for a bridge and the driver would delete
      it for all bridges. So introduce a bool fdb_isolation in drivers which
      would make all bridge databases passed to the cross-chip notifier have
      the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal()
      say that all bridge databases are the same database - which is
      essentially the legacy behavior.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2693363
  20. 25 2月, 2022 4 次提交
    • V
      net: dsa: support FDB events on offloaded LAG interfaces · e212fa7c
      Vladimir Oltean 提交于
      This change introduces support for installing static FDB entries towards
      a bridge port that is a LAG of multiple DSA switch ports, as well as
      support for filtering towards the CPU local FDB entries emitted for LAG
      interfaces that are bridge ports.
      
      Conceptually, host addresses on LAG ports are identical to what we do
      for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply
      be replicated towards all member ports like we do for multicast, or VLAN.
      Instead we need new driver API. Hardware usually considers a LAG to be a
      "logical port", and sets the entire LAG as the forwarding destination.
      The physical egress port selection within the LAG is made by hashing
      policy, as usual.
      
      To represent the logical port corresponding to the LAG, we pass by value
      a copy of the dsa_lag structure to all switches in the tree that have at
      least one port in that LAG.
      
      To illustrate why a refcounted list of FDB entries is needed in struct
      dsa_lag, it is enough to say that:
      - a LAG may be a bridge port and may therefore receive FDB events even
        while it isn't yet offloaded by any DSA interface
      - DSA interfaces may be removed from a LAG while that is a bridge port;
        we don't want FDB entries lingering around, but we don't want to
        remove entries that are still in use, either
      
      For all the cases below to work, the idea is to always keep an FDB entry
      on a LAG with a reference count equal to the DSA member ports. So:
      - if a port joins a LAG, it requests the bridge to replay the FDB, and
        the FDB entries get created, or their refcount gets bumped by one
      - if a port leaves a LAG, the FDB replay deletes or decrements refcount
        by one
      - if an FDB is installed towards a LAG with ports already present, that
        entry is created (if it doesn't exist) and its refcount is bumped by
        the amount of ports already present in the LAG
      
      echo "Adding FDB entry to bond with existing ports"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      
      ip link del br0
      ip link del bond0
      
      echo "Adding FDB entry to empty bond"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      
      ip link del br0
      ip link del bond0
      
      echo "Adding FDB entry to empty bond, then removing ports one by one"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      
      ip link set swp1 nomaster
      ip link set swp2 nomaster
      ip link del br0
      ip link del bond0
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e212fa7c
    • V
      net: dsa: create a dsa_lag structure · dedd6a00
      Vladimir Oltean 提交于
      The main purpose of this change is to create a data structure for a LAG
      as seen by DSA. This is similar to what we have for bridging - we pass a
      copy of this structure by value to ->port_lag_join and ->port_lag_leave.
      For now we keep the lag_dev, id and a reference count in it. Future
      patches will add a list of FDB entries for the LAG (these also need to
      be refcounted to work properly).
      
      The LAG structure is created using dsa_port_lag_create() and destroyed
      using dsa_port_lag_destroy(), just like we have for bridging.
      
      Because now, the dsa_lag itself is refcounted, we can simplify
      dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in
      the dst->lags array only as long as at least one port uses it. The
      refcounting logic inside those functions can be removed now - they are
      called only when we should perform the operation.
      
      dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag
      structure instead of the lag_dev net_device.
      
      dsa_lag_foreach_port() now takes the dsa_lag structure as argument.
      
      dst->lags holds an array of dsa_lag structures.
      
      dsa_lag_map() now also saves the dsa_lag->id value, so that linear
      walking of dst->lags in drivers using dsa_lag_id() is no longer
      necessary. They can just look at lag.id.
      
      dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(),
      which can be used by drivers to get the LAG ID assigned by DSA to a
      given port.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      dedd6a00
    • V
      net: dsa: make LAG IDs one-based · 3d4a0a2a
      Vladimir Oltean 提交于
      The DSA LAG API will be changed to become more similar with the bridge
      data structures, where struct dsa_bridge holds an unsigned int num,
      which is generated by DSA and is one-based. We have a similar thing
      going with the DSA LAG, except that isn't stored anywhere, it is
      calculated dynamically by dsa_lag_id() by iterating through dst->lags.
      
      The idea of encoding an invalid (or not requested) LAG ID as zero for
      the purpose of simplifying checks in drivers means that the LAG IDs
      passed by DSA to drivers need to be one-based too. So back-and-forth
      conversion is needed when indexing the dst->lags array, as well as in
      drivers which assume a zero-based index.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      3d4a0a2a
    • V
      net: dsa: rename references to "lag" as "lag_dev" · 46a76724
      Vladimir Oltean 提交于
      In preparation of converting struct net_device *dp->lag_dev into a
      struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
      all occurrences of the "lag" variable in the DSA core to "lag_dev".
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      46a76724
  21. 20 2月, 2022 1 次提交
  22. 18 2月, 2022 2 次提交
  23. 16 2月, 2022 1 次提交
    • V
      net: dsa: add explicit support for host bridge VLANs · 134ef238
      Vladimir Oltean 提交于
      Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it
      does so on user ports. This is good for basic functionality but has
      several limitations:
      
      - the VLAN group which must reach the CPU may be radically different
        from the VLAN group that must be autonomously forwarded by the switch.
        In other words, the admin may want to isolate noisy stations and avoid
        traffic from them going to the control processor of the switch, where
        it would just waste useless cycles. The bridge already supports
        independent control of VLAN groups on bridge ports and on the bridge
        itself, and when VLAN-aware, it will drop packets in software anyway
        if their VID isn't added as a 'self' entry towards the bridge device.
      
      - Replaying host FDB entries may depend, for some drivers like mv88e6xxx,
        on replaying the host VLANs as well. The 2 VLAN groups are
        approximately the same in most regular cases, but there are corner
        cases when timing matters, and DSA's approximation of replicating
        VLANs on shared ports simply does not work.
      
      - If a user makes the bridge (implicitly the CPU port) join a VLAN by
        accident, there is no way for the CPU port to isolate itself from that
        noisy VLAN except by rebooting the system. This is because for each
        VLAN added on a user port, DSA will add it on shared ports too, but
        for each VLAN deletion on a user port, it will remain installed on
        shared ports, since DSA has no good indication of whether the VLAN is
        still in use or not.
      
      Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN
      addition and removal events, DSA has a simple and straightforward task
      of separating the bridge port VLANs (these have an orig_dev which is a
      DSA slave interface, or a LAG interface) from the host VLANs (these have
      an orig_dev which is a bridge interface), and to keep a simple reference
      count of each VID on each shared port.
      
      Forwarding VLANs must be installed on the bridge ports and on all DSA
      ports interconnecting them. We don't have a good view of the exact
      topology, so we simply install forwarding VLANs on all DSA ports, which
      is what has been done until now.
      
      Host VLANs must be installed primarily on the dedicated CPU port of each
      bridge port. More subtly, they must also be installed on upstream-facing
      and downstream-facing DSA ports that are connecting the bridge ports and
      the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB
      entry may be absent from VTU) is still addressed even if that switch is
      in a cross-chip setup, and it has no local CPU port.
      
      Therefore:
      - user ports contain only bridge port (forwarding) VLANs, and no
        refcounting is necessary
      - DSA ports contain both forwarding and host VLANs. Refcounting is
        necessary among these 2 types.
      - CPU ports contain only host VLANs. Refcounting is also necessary.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      134ef238
  24. 14 2月, 2022 1 次提交
  25. 09 2月, 2022 1 次提交