1. 13 10月, 2021 1 次提交
    • V
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Vladimir Oltean 提交于
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4ac0567e
  2. 23 9月, 2021 4 次提交
  3. 19 9月, 2021 1 次提交
    • V
      net: dsa: be compatible with masters which unregister on shutdown · 0650bf52
      Vladimir Oltean 提交于
      Lino reports that on his system with bcmgenet as DSA master and KSZ9897
      as a switch, rebooting or shutting down never works properly.
      
      What does the bcmgenet driver have special to trigger this, that other
      DSA masters do not? It has an implementation of ->shutdown which simply
      calls its ->remove implementation. Otherwise said, it unregisters its
      network interface on shutdown.
      
      This message can be seen in a loop, and it hangs the reboot process there:
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 3
      
      So why 3?
      
      A usage count of 1 is normal for a registered network interface, and any
      virtual interface which links itself as an upper of that will increment
      it via dev_hold. In the case of DSA, this is the call path:
      
      dsa_slave_create
      -> netdev_upper_dev_link
         -> __netdev_upper_dev_link
            -> __netdev_adjacent_dev_insert
               -> dev_hold
      
      So a DSA switch with 3 interfaces will result in a usage count elevated
      by two, and netdev_wait_allrefs will wait until they have gone away.
      
      Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
      delete themselves, but DSA cannot just vanish and go poof, at most it
      can unbind itself from the switch devices, but that must happen strictly
      earlier compared to when the DSA master unregisters its net_device, so
      reacting on the NETDEV_UNREGISTER event is way too late.
      
      It seems that it is a pretty established pattern to have a driver's
      ->shutdown hook redirect to its ->remove hook, so the same code is
      executed regardless of whether the driver is unbound from the device, or
      the system is just shutting down. As Florian puts it, it is quite a big
      hammer for bcmgenet to unregister its net_device during shutdown, but
      having a common code path with the driver unbind helps ensure it is well
      tested.
      
      So DSA, for better or for worse, has to live with that and engage in an
      arms race of implementing the ->shutdown hook too, from all individual
      drivers, and do something sane when paired with masters that unregister
      their net_device there. The only sane thing to do, of course, is to
      unlink from the master.
      
      However, complications arise really quickly.
      
      The pattern of redirecting ->shutdown to ->remove is not unique to
      bcmgenet or even to net_device drivers. In fact, SPI controllers do it
      too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
      and MDIO controllers do it too (this is something I have not researched
      too deeply, but even if this is not the case today, it is certainly
      plausible to happen in the future, and must be taken into consideration).
      
      Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
      insane implication is that for the exact same DSA switch device, we
      might have both ->shutdown and ->remove getting called.
      
      So we need to do something with that insane environment. The pattern
      I've come up with is "if this, then not that", so if either ->shutdown
      or ->remove gets called, we set the device's drvdata to NULL, and in the
      other hook, we check whether the drvdata is NULL and just do nothing.
      This is probably not necessary for platform devices, just for devices on
      buses, but I would really insist for consistency among drivers, because
      when code is copy-pasted, it is not always copy-pasted from the best
      sources.
      
      So depending on whether the DSA switch's ->remove or ->shutdown will get
      called first, we cannot really guarantee even for the same driver if
      rebooting will result in the same code path on all platforms. But
      nonetheless, we need to do something minimally reasonable on ->shutdown
      too to fix the bug. Of course, the ->remove will do more (a full
      teardown of the tree, with all data structures freed, and this is why
      the bug was not caught for so long). The new ->shutdown method is kept
      separate from dsa_unregister_switch not because we couldn't have
      unregistered the switch, but simply in the interest of doing something
      quick and to the point.
      
      The big question is: does the DSA switch's ->shutdown get called earlier
      than the DSA master's ->shutdown? If not, there is still a risk that we
      might still trigger the WARN_ON in unregister_netdevice that says we are
      attempting to unregister a net_device which has uppers. That's no good.
      Although the reference to the master net_device won't physically go away
      even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
      on it.
      
      The answer to that question lies in this comment above device_link_add:
      
       * A side effect of the link creation is re-ordering of dpm_list and the
       * devices_kset list by moving the consumer device and all devices depending
       * on it to the ends of these lists (that does not happen to devices that have
       * not been registered when this function is called).
      
      so the fact that DSA uses device_link_add towards its master is not
      exactly for nothing. device_shutdown() walks devices_kset from the back,
      so this is our guarantee that DSA's shutdown happens before the master's
      shutdown.
      
      Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0650bf52
  4. 25 8月, 2021 3 次提交
    • V
      net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid · 8ded9160
      Vladimir Oltean 提交于
      Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger
      for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
      function solved quite a different problem than our needs are now.
      
      Then, we used best-effort VLAN filtering and we were using the xmit_tpid
      to tunnel packets coming from an 8021q upper through the TX VLAN allocated
      by tag_8021q to that egress port. The need for a different VLAN protocol
      depending on switch revision came from the fact that this in itself was
      more of a hack to trick the hardware into accepting tunneled VLANs in
      the first place.
      
      Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
      supported them again, we would not do that using the same method of
      {tunneling the VLAN on egress, retagging the VLAN on ingress} that we
      had in the best-effort VLAN filtering mode. It seems rather simpler that
      we just allocate a VLAN in the VLAN table that is simply not used by the
      bridge at all, or by any other port.
      
      Anyway, I have 2 gripes with the current sja1105_xmit_tpid:
      
      1. When sending packets on behalf of a VLAN-aware bridge (with the new
         TX forwarding offload framework) plus untagged (with the tag_8021q
         VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
         and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
         through the DSA master have a VLAN protocol of 0x8100 and others of
         0x88a8. This is strange and there is no reason for it now. If we have
         a bridge and are therefore forced to send using that bridge's TPID,
         we can as well blend with that bridge's VLAN protocol for all packets.
      
      2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
         because it looks inside dp->priv. It is desirable to keep as much
         separation between taggers and switch drivers as possible. Now it
         doesn't do that anymore.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ded9160
    • V
      net: dsa: sja1105: drop untagged packets on the CPU and DSA ports · b0b8c67e
      Vladimir Oltean 提交于
      The sja1105 driver is a bit special in its use of VLAN headers as DSA
      tags. This is because in VLAN-aware mode, the VLAN headers use an actual
      TPID of 0x8100, which is understood even by the DSA master as an actual
      VLAN header.
      
      Furthermore, control packets such as PTP and STP are transmitted with no
      VLAN header as a DSA tag, because, depending on switch generation, there
      are ways to steer these control packets towards a precise egress port
      other than VLAN tags. Transmitting control packets as untagged means
      leaving a door open for traffic in general to be transmitted as untagged
      from the DSA master, and for it to traverse the switch and exit a random
      switch port according to the FDB lookup.
      
      This behavior is a bit out of line with other DSA drivers which have
      native support for DSA tagging. There, it is to be expected that the
      switch only accepts DSA-tagged packets on its CPU port, dropping
      everything that does not match this pattern.
      
      We perhaps rely a bit too much on the switches' hardware dropping on the
      CPU port, and place no other restrictions in the kernel data path to
      avoid that. For example, sja1105 is also a bit special in that STP/PTP
      packets are transmitted using "management routes"
      (sja1105_port_deferred_xmit): when sending a link-local packet from the
      CPU, we must first write a SPI message to the switch to tell it to
      expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
      it towards port 3 when it gets it. This entry expires as soon as it
      matches a packet received by the switch, and it needs to be reinstalled
      for the next packet etc. All in all quite a ghetto mechanism, but it is
      all that the sja1105 switches offer for injecting a control packet.
      The driver takes a mutex for serializing control packets and making the
      pairs of SPI writes of a management route and its associated skb atomic,
      but to be honest, a mutex is only relevant as long as all parties agree
      to take it. With the DSA design, it is possible to open an AF_PACKET
      socket on the DSA master net device, and blast packets towards
      01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
      it all goes kaput because management routes installed by the driver will
      match skbs sent by the DSA master, and not skbs generated by the driver
      itself. So they will end up being routed on the wrong port.
      
      So through the lens of that, maybe it would make sense to avoid that
      from happening by doing something in the network stack, like: introduce
      a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
      dev_hard_start_xmit(), introduce the following check:
      
      	if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
      		kfree_skb(skb);
      
      Ok, maybe that is a bit drastic, but that would at least prevent a bunch
      of problems. For example, right now, even though the majority of DSA
      switches drop packets without DSA tags sent by the DSA master (and
      therefore the majority of garbage that user space daemons like avahi and
      udhcpcd and friends create), it is still conceivable that an aggressive
      user space program can open an AF_PACKET socket and inject a spoofed DSA
      tag directly on the DSA master. We have no protection against that; the
      packet will be understood by the switch and be routed wherever user
      space says. Furthermore: there are some DSA switches where we even have
      register access over Ethernet, using DSA tags. So even user space
      drivers are possible in this way. This is a huge hole.
      
      However, the biggest thing that bothers me is that udhcpcd attempts to
      ask for an IP address on all interfaces by default, and with sja1105, it
      will attempt to get a valid IP address on both the DSA master as well as
      on sja1105 switch ports themselves. So with IP addresses in the same
      subnet on multiple interfaces, the routing table will be messed up and
      the system will be unusable for traffic until it is configured manually
      to not ask for an IP address on the DSA master itself.
      
      It turns out that it is possible to avoid that in the sja1105 driver, at
      least very superficially, by requesting the switch to drop VLAN-untagged
      packets on the CPU port. With the exception of control packets, all
      traffic originated from tag_sja1105.c is already VLAN-tagged, so only
      STP and PTP packets need to be converted. For that, we need to uphold
      the equivalence between an untagged and a pvid-tagged packet, and to
      remember that the CPU port of sja1105 uses a pvid of 4095.
      
      Now that we drop untagged traffic on the CPU port, non-aggressive user
      space applications like udhcpcd stop bothering us, and sja1105 effectively
      becomes just as vulnerable to the aggressive kind of user space programs
      as other DSA switches are (ok, users can also create 8021q uppers on top
      of the DSA master in the case of sja1105, but in future patches we can
      easily deny that, but it still doesn't change the fact that VLAN-tagged
      packets can still be injected over raw sockets).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0b8c67e
    • V
      net: dsa: sja1105: prevent tag_8021q VLANs from being received on user ports · 73ceab83
      Vladimir Oltean 提交于
      Currently it is possible for an attacker to craft packets with a fake
      DSA tag and send them to us, and our user ports will accept them and
      preserve that VLAN when transmitting towards the CPU. Then the tagger
      will be misled into thinking that the packets came on a different port
      than they really came on.
      
      Up until recently there wasn't a good option to prevent this from
      happening. In SJA1105P and later, the MAC Configuration Table introduced
      two options called:
      - DRPSITAG: Drop Single Inner Tagged Frames
      - DRPSOTAG: Drop Single Outer Tagged Frames
      
      Because the sja1105 driver classifies all VLANs as "outer VLANs" (S-Tags),
      it would be in principle possible to enable the DRPSOTAG bit on ports
      using tag_8021q, and drop on ingress all packets which have a VLAN tag.
      When the switch is VLAN-unaware, this works, because it uses a custom
      TPID of 0xdadb, so any "tagged" packets received on a user port are
      probably a spoofing attempt. But when the switch overall is VLAN-aware,
      and some ports are standalone (therefore they use tag_8021q), the TPID
      is 0x8100, and the port can receive a mix of untagged and VLAN-tagged
      packets. The untagged ones will be classified to the tag_8021q pvid, and
      the tagged ones to the VLAN ID from the packet header. Yes, it is true
      that since commit 4fbc08bd ("net: dsa: sja1105: deny 8021q uppers on
      ports") we no longer support this mixed mode, but that is a temporary
      limitation which will eventually be lifted. It would be nice to not
      introduce one more restriction via DRPSOTAG, which would make the
      standalone ports of a VLAN-aware switch drop genuinely VLAN-tagged
      packets.
      
      Also, the DRPSOTAG bit is not available on the first generation of
      switches (SJA1105E, SJA1105T). So since one of the key features of this
      driver is compatibility across switch generations, this makes it an even
      less desirable approach.
      
      The breakthrough comes from commit bef0746c ("net: dsa: sja1105:
      make sure untagged packets are dropped on ingress ports with no pvid"),
      where it became obvious that untagged packets are not dropped even if
      the ingress port is not in the VMEMB_PORT vector of that port's pvid.
      However, VLAN-tagged packets are subject to VLAN ingress
      checking/dropping. This means that instead of using the catch-all
      DRPSOTAG bit introduced in SJA1105P, we can drop tagged packets on a
      per-VLAN basis, and this is already compatible with SJA1105E/T.
      
      This patch adds an "allowed_ingress" argument to sja1105_vlan_add(), and
      we call it with "false" for tag_8021q VLANs on user ports. The tag_8021q
      VLANs still need to be allowed, of course, on ingress to DSA ports and
      CPU ports.
      
      We also need to refine the drop_untagged check in sja1105_commit_pvid to
      make it not freak out about this new configuration. Currently it will
      try to keep the configuration consistent between untagged and pvid-tagged
      packets, so if the pvid of a port is 1 but VLAN 1 is not in VMEMB_PORT,
      packets tagged with VID 1 will behave the same as untagged packets, and
      be dropped. This behavior is what we want for ports under a VLAN-aware
      bridge, but for the ports with a tag_8021q pvid, we want untagged
      packets to be accepted, but packets tagged with a header recognized by
      the switch as a tag_8021q VLAN to be dropped. So only restrict the
      drop_untagged check to apply to the bridge_pvid, not to the tag_8021q_pvid.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73ceab83
  5. 18 8月, 2021 1 次提交
    • V
      net: dsa: tag_sja1105: be dsa_loop-safe · 994d2cbb
      Vladimir Oltean 提交于
      Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
      sure that every time we dereference dp->priv, we check the switch's
      dsa_switch_ops (otherwise we access a struct sja1105_port structure that
      is in fact something else).
      
      This adds an unconditional build-time dependency between sja1105 being
      built as module => tag_sja1105 must also be built as module. This was
      there only for PTP before.
      
      Some sane defaults must also take place when not running on sja1105
      hardware. These are:
      
      - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
        depending on VLAN awareness and switch revision (when an encapsulated
        VLAN must be sent). Default to 0x8100.
      
      - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
        metadata timestamp frames. When running on non-sja1105 hardware, don't
        do that and accept all frames unmodified.
      
      - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
        which writes a management route over SPI. When not running on sja1105
        hardware, bypass the SPI write and send the frame as-is.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      994d2cbb
  6. 16 8月, 2021 1 次提交
    • V
      net: dsa: sja1105: reorganize probe, remove, setup and teardown ordering · 022522ac
      Vladimir Oltean 提交于
      The sja1105 driver's initialization and teardown sequence is a chaotic
      mess that has gathered a lot of cruft over time. It works because there
      is no strict dependency between the functions, but it could be improved.
      
      The basic principle that teardown should be the exact reverse of setup
      is obviously not held. We have initialization steps (sja1105_tas_setup,
      sja1105_flower_setup) in the probe method that are torn down in the DSA
      .teardown method instead of driver unbind time.
      
      We also have code after the dsa_register_switch() call, which implicitly
      means after the .setup() method has finished, which is pretty unusual.
      
      Also, sja1105_teardown() has calls set up in a different order than the
      error path of sja1105_setup(): see the reversed ordering between
      sja1105_ptp_clock_unregister and sja1105_mdiobus_unregister.
      
      Also, sja1105_static_config_load() is called towards the end of
      sja1105_setup(), but sja1105_static_config_free() is also towards the
      end of the error path and teardown path. The static_config_load() call
      should be earlier.
      
      Also, making and breaking the connections between struct sja1105_port
      and struct dsa_port could be refactored into dedicated functions, makes
      the code easier to follow.
      
      We move some code from the DSA .setup() method into the probe method,
      like the device tree parsing, and we move some code from the probe
      method into the DSA .setup() method to be symmetric with its placement
      in the DSA .teardown() method, which is nice because the unbind function
      has a single call to dsa_unregister_switch(). Example of the latter type
      of code movement are the connections between ports mentioned above, they
      are now in the .setup() method.
      
      Finally, due to fact that the kthread_init_worker() call is no longer
      in sja1105_probe() - located towards the bottom of the file - but in
      sja1105_setup() - located much higher - there is an inverse ordering
      with the worker function declaration, sja1105_port_deferred_xmit. To
      avoid that, the entire sja1105_setup() and sja1105_teardown() functions
      are moved towards the bottom of the file.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      022522ac
  7. 12 8月, 2021 1 次提交
  8. 10 8月, 2021 1 次提交
    • V
      net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 21b52fed
      Vladimir Oltean 提交于
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21b52fed
  9. 09 8月, 2021 2 次提交
    • V
      net: dsa: sja1105: add FDB fast ageing support · 5126ec72
      Vladimir Oltean 提交于
      Delete the dynamically learned FDB entries when the STP state changes
      and when address learning is disabled.
      
      On sja1105 there is no shorthand SPI command for this, so we need to
      walk through the entire FDB to delete.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5126ec72
    • V
      net: dsa: sja1105: rely on DSA core tracking of port learning state · 5313a37b
      Vladimir Oltean 提交于
      Now that DSA keeps track of the port learning state, it becomes
      superfluous to keep an additional variable with this information in the
      sja1105 driver. Remove it.
      
      The DSA core's learning state is present in struct dsa_port *dp.
      To avoid the antipattern where we iterate through a DSA switch's
      ports and then call dsa_to_port to obtain the "dp" reference (which is
      bad because dsa_to_port iterates through the DSA switch tree once
      again), just iterate through the dst->ports and operate on those
      directly.
      
      The sja1105 had an extra use of priv->learn_ena on non-user ports. DSA
      does not touch the learning state of those ports - drivers are free to
      do what they wish on them. Mark that information with a comment in
      struct dsa_port and let sja1105 set dp->learning for cascade ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5313a37b
  10. 05 8月, 2021 6 次提交
    • V
      net: dsa: sja1105: enable address learning on cascade ports · 81d45898
      Vladimir Oltean 提交于
      Right now, address learning is disabled on DSA ports, which means that a
      packet received over a DSA port from a cross-chip switch will be flooded
      to unrelated ports.
      
      It is desirable to eliminate that, but for that we need a breakdown of
      the possibilities for the sja1105 driver. A DSA port can be:
      
      - a downstream-facing cascade port. This is simple because it will
        always receive packets from a downstream switch, and there should be
        no other route to reach that downstream switch in the first place,
        which means it should be safe to learn that MAC address towards that
        switch.
      
      - an upstream-facing cascade port. This receives packets either:
        * autonomously forwarded by an upstream switch (and therefore these
          packets belong to the data plane of a bridge, so address learning
          should be ok), or
        * injected from the CPU. This deserves further discussion, as normally,
          an upstream-facing cascade port is no different than the CPU port
          itself. But with "H" topologies (a DSA link towards a switch that
          has its own CPU port), these are more "laterally-facing" cascade
          ports than they are "upstream-facing". Here, there is a risk that
          the port might learn the host addresses on the wrong port (on the
          DSA port instead of on its own CPU port), but this is solved by
          DSA's RX filtering infrastructure, which installs the host addresses
          as static FDB entries on the CPU port of all switches in a "H" tree.
          So even if there will be an attempt from the switch to migrate the
          FDB entry from the CPU port to the laterally-facing cascade port, it
          will fail to do that, because the FDB entry that already exists is
          static and cannot migrate. So address learning should be safe for
          this configuration too.
      
      Ok, so what about other MAC addresses coming from the host, not
      necessarily the bridge local FDB entries? What about MAC addresses
      dynamically learned on foreign interfaces, isn't there a risk that
      cascade ports will learn these entries dynamically when they are
      supposed to be delivered towards the CPU port? Well, that is correct,
      and this is why we also need to enable the assisted learning feature, to
      snoop for these addresses and write them to hardware as static FDB
      entries towards the CPU, to make the switch's learning process on the
      cascade ports ineffective for them. With assisted learning enabled, the
      hardware learning on the CPU port must be disabled.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81d45898
    • V
      net: dsa: sja1105: suppress TX packets from looping back in "H" topologies · 0f9b762c
      Vladimir Oltean 提交于
      H topologies like this one have a problem:
      
               eth0                                                     eth1
                |                                                        |
             CPU port                                                CPU port
                |                        DSA link                        |
       sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
         |             |      |                            |      |             |
       user          user   user                         user   user          user
       port          port   port                         port   port          port
      
      Basically any packet sent by the eth0 DSA master can be flooded on the
      interconnecting DSA link sw0p4 <-> sw1p4 and it will be received by the
      eth1 DSA master too. Basically we are talking to ourselves.
      
      In VLAN-unaware mode, these packets are encoded using a tag_8021q TX
      VLAN, which dsa_8021q_rcv() rightfully cannot decode and complains.
      Whereas in VLAN-aware mode, the packets are encoded with a bridge VLAN
      which _can_ be decoded by the tagger running on eth1, so it will attempt
      to reinject that packet into the network stack (the bridge, if there is
      any port under eth1 that is under a bridge). In the case where the ports
      under eth1 are under the same cross-chip bridge as the ports under eth0,
      the TX packets will even be learned as RX packets. The only thing that
      will prevent loops with the software bridging path, and therefore
      disaster, is that the source port and the destination port are in the
      same hardware domain, and the bridge will receive packets from the
      driver with skb->offload_fwd_mark = true and will not forward between
      the two.
      
      The proper solution to this problem is to detect H topologies and
      enforce that all packets are received through the local switch and we do
      not attempt to receive packets on our CPU port from switches that have
      their own. This is a viable solution which works thanks to the fact that
      MAC addresses which should be filtered towards the host are installed by
      DSA as static MAC addresses towards the CPU port of each switch.
      
      TX from a CPU port towards the DSA port continues to be allowed, this is
      because sja1105 supports bridge TX forwarding offload, and the skb->dev
      used initially for xmit does not have any direct correlation with where
      the station that will respond to that packet is connected. It may very
      well happen that when we send a ping through a br0 interface that spans
      all switch ports, the xmit packet will exit the system through a DSA
      switch interface under eth1 (say sw1p2), but the destination station is
      connected to a switch port under eth0, like sw0p0. So the switch under
      eth1 needs to communicate on TX with the switch under eth0. The
      response, however, will not follow the same path, but instead, this
      patch enforces that the response is sent by the first switch directly to
      its DSA master which is eth0.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f9b762c
    • V
      net: dsa: sja1105: increase MTU to account for VLAN header on DSA ports · 777e55e3
      Vladimir Oltean 提交于
      Since all packets are transmitted as VLAN-tagged over a DSA link (this
      VLAN tag represents the tag_8021q header), we need to increase the MTU
      of these interfaces to account for the possibility that we are already
      transporting a user-visible VLAN header.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      777e55e3
    • V
      net: dsa: sja1105: manage VLANs on cascade ports · c5130029
      Vladimir Oltean 提交于
      Since commit ed040abc ("net: dsa: sja1105: use 4095 as the private
      VLAN for untagged traffic"), this driver uses a reserved value as pvid
      for the host port (DSA CPU port). Control packets which are sent as
      untagged get classified to this VLAN, and all ports are members of it
      (this is to be expected for control packets).
      
      Manage all cascade ports in the same way and allow control packets to
      egress everywhere.
      
      Also, all VLANs need to be sent as egress-tagged on all cascade ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5130029
    • V
      net: dsa: sja1105: manage the forwarding domain towards DSA ports · 3fa21270
      Vladimir Oltean 提交于
      Manage DSA links towards other switches, be they host ports or cascade
      ports, the same as the CPU port, i.e. allow forwarding and flooding
      unconditionally from all user ports.
      
      We send packets as always VLAN-tagged on a DSA port, and we rely on the
      cross-chip notifiers from tag_8021q to install the RX VLAN of a switch
      port only on the proper remote ports of another switch (the ports that
      are in the same bridging domain). So if there is no cross-chip bridging
      in the system, the flooded packets will be sent on the DSA ports too,
      but they will be dropped by the remote switches due to either
      (a) a lack of the RX VLAN in the VLAN table of the ingress DSA port, or
      (b) a lack of valid destinations for those packets, due to a lack of the
          RX VLAN on the user ports of the switch
      
      Note that switches which only transport packets in a cross-chip bridge,
      but have no user ports of their own as part of that bridge, such as
      switch 1 in this case:
      
                          DSA link                   DSA link
        sw0p0 sw0p1 sw0p2 -------- sw1p0 sw1p2 sw1p3 -------- sw2p0 sw2p2 sw2p3
      
      ip link set sw0p0 master br0
      ip link set sw2p3 master br0
      
      will still work, because the tag_8021q cross-chip notifiers keep the RX
      VLANs installed on all DSA ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fa21270
    • V
      net: dsa: sja1105: configure the cascade ports based on topology · 30a100e6
      Vladimir Oltean 提交于
      The sja1105 switch family has a feature called "cascade ports" which can
      be used in topologies where multiple SJA1105/SJA1110 switches are daisy
      chained. Upstream switches set this bit for the DSA link towards the
      downstream switches. This is used when the upstream switch receives a
      control packet (PTP, STP) from a downstream switch, because if the
      source port for a control packet is marked as a cascade port, then the
      source port, switch ID and RX timestamp will not be taken again on the
      upstream switch, it is assumed that this has already been done by the
      downstream switch (the leaf port in the tree) and that the CPU has
      everything it needs to decode the information from this packet.
      
      We need to distinguish between an upstream-facing DSA link and a
      downstream-facing DSA link, because the upstream-facing DSA links are
      "host ports" for the SJA1105/SJA1110 switches, and the downstream-facing
      DSA links are "cascade ports".
      
      Note that SJA1105 supports a single cascade port, so only daisy chain
      topologies work. With SJA1110, there can be more complex topologies such
      as:
      
                          eth0
                           |
                       host port
                           |
       sw0p0    sw0p1    sw0p2    sw0p3    sw0p4
         |        |                 |        |
       cascade  cascade            user     user
        port     port              port     port
         |        |
         |        |
         |        |
         |       host
         |       port
         |        |
         |      sw1p0    sw1p1    sw1p2    sw1p3    sw1p4
         |                 |        |        |        |
         |                user     user     user     user
        host              port     port     port     port
        port
         |
       sw2p0    sw2p1    sw2p2    sw2p3    sw2p4
                  |        |        |        |
                 user     user     user     user
                 port     port     port     port
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30a100e6
  11. 02 8月, 2021 5 次提交
    • V
      net: dsa: sja1105: match FDB entries regardless of inner/outer VLAN tag · 47c2c0c2
      Vladimir Oltean 提交于
      On SJA1105P/Q/R/S and SJA1110, the L2 Lookup Table entries contain a
      maskable "inner/outer tag" bit which means:
      - when set to 1: match single-outer and double tagged frames
      - when set to 0: match untagged and single-inner tagged frames
      - when masked off: match all frames regardless of the type of tag
      
      This driver does not make any meaningful distinction between inner tags
      (matches on TPID) and outer tags (matches on TPID2). In fact, all VLAN
      table entries are installed as SJA1110_VLAN_D_TAG, which means that they
      match on both inner and outer tags.
      
      So it does not make sense that we install FDB entries with the IOTAG bit
      set to 1.
      
      In VLAN-unaware mode, we set both TPID and TPID2 to 0xdadb, so the
      switch will see frames as outer-tagged or double-tagged (never inner).
      So the FDB entries will match if IOTAG is set to 1.
      
      In VLAN-aware mode, we set TPID to 0x8100 and TPID2 to 0x88a8. So the
      switch will see untagged and 802.1Q-tagged packets as inner-tagged, and
      802.1ad-tagged packets as outer-tagged. So untagged and 802.1Q-tagged
      packets will not match FDB entries if IOTAG is set to 1, but 802.1ad
      tagged packets will. Strange.
      
      To fix this, simply mask off the IOTAG bit from FDB entries, and make
      them match regardless of whether the VLAN tag is inner or outer.
      
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47c2c0c2
    • V
      net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too · 589918df
      Vladimir Oltean 提交于
      Similar but not quite the same with what was done in commit b11f0a4c
      ("net: dsa: sja1105: be stateless when installing FDB entries") for
      SJA1105E/T, it is desirable to drop the priv->vlan_aware check and
      simply go ahead and install FDB entries in the VLAN that was given by
      the bridge.
      
      As opposed to SJA1105E/T, in SJA1105P/Q/R/S and SJA1110, the FDB is a
      maskable TCAM, and we are installing VLAN-unaware FDB entries with the
      VLAN ID masked off. However, such FDB entries might completely obscure
      VLAN-aware entries where the VLAN ID is included in the search mask,
      because the switch looks up the FDB from left to right and picks the
      first entry which results in a masked match. So it depends on whether
      the bridge installs first the VLAN-unaware or the VLAN-aware FDB entries.
      
      Anyway, if we had a VLAN-unaware FDB entry towards one set of DESTPORTS
      and a VLAN-aware one towards other set of DESTPORTS, the result is that
      the packets in VLAN-aware mode will be forwarded towards the DESTPORTS
      specified by the VLAN-unaware entry.
      
      To solve this, simply do not use the masked matching ability of the FDB
      for VLAN ID, and always match precisely on it. In VLAN-unaware mode, we
      configure the switch for shared VLAN learning, so the VLAN ID will be
      ignored anyway during lookup, so it is redundant to mask it off in the
      TCAM.
      
      This patch conflicts with net-next commit 0fac6aa0 ("net: dsa: sja1105:
      delete the best_effort_vlan_filtering mode") which changed this line:
      	if (priv->vlan_state != SJA1105_VLAN_UNAWARE) {
      into:
      	if (priv->vlan_aware) {
      
      When merging with net-next, the lines added by this patch should take
      precedence in the conflict resolution (i.e. the "if" condition should be
      deleted in both cases).
      
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      589918df
    • V
      net: dsa: sja1105: ignore the FDB entry for unknown multicast when adding a new address · 728db843
      Vladimir Oltean 提交于
      Currently, when sja1105pqrs_fdb_add() is called for a host-joined IPv6
      MDB entry such as 33:33:00:00:00:6a, the search for that address will
      return the FDB entry for SJA1105_UNKNOWN_MULTICAST, which has a
      destination MAC of 01:00:00:00:00:00 and a mask of 01:00:00:00:00:00.
      It returns that entry because, well, it matches, in the sense that
      unknown multicast is supposed by design to match it...
      
      But the issue is that we then proceed to overwrite this entry with the
      one for our precise host-joined multicast address, and the unknown
      multicast entry is no longer there - unknown multicast is now flooded to
      the same group of ports as broadcast, which does not look up the FDB.
      
      To solve this problem, we should ignore searches that return the unknown
      multicast address as the match, and treat them as "no match" which will
      result in the entry being installed to hardware.
      
      For this to work properly, we need to put the result of the FDB search
      in a temporary variable in order to avoid overwriting the l2_lookup
      entry we want to program. The l2_lookup entry returned by the search
      might not have the same set of DESTPORTS and not even the same MACADDR
      as the entry we're trying to add.
      
      Fixes: 4d942354 ("net: dsa: sja1105: offload bridge port flags to device")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      728db843
    • V
      net: dsa: sja1105: invalidate dynamic FDB entries learned concurrently with statically added ones · 6c5fc159
      Vladimir Oltean 提交于
      The procedure to add a static FDB entry in sja1105 is concurrent with
      dynamic learning performed on all bridge ports and the CPU port.
      
      The switch looks up the FDB from left to right, and also learns
      dynamically from left to right, so it is possible that between the
      moment when we pick up a free slot to install an FDB entry, another slot
      to the left of that one becomes free due to an address ageing out, and
      that other slot is then immediately used by the switch to learn
      dynamically the same address as we're trying to add statically.
      
      The result is that we succeeded to add our static FDB entry, but it is
      being shadowed by a dynamic FDB entry to its left, and the switch will
      behave as if our static FDB entry did not exist.
      
      We cannot really prevent this from happening unless we make the entire
      process to add a static FDB entry a huge critical section where address
      learning is temporarily disabled on _all_ ports, and then re-enabled
      according to the configuration done by sja1105_port_set_learning.
      However, that is kind of disruptive for the operation of the network.
      
      What we can do alternatively is to simply read back the FDB for dynamic
      entries located before our newly added static one, and delete them.
      This will guarantee that our static FDB entry is now operational. It
      will still not guarantee that there aren't dynamic FDB entries to the
      _right_ of that static FDB entry, but at least those entries will age
      out by themselves since they aren't hit, and won't bother anyone.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c5fc159
    • V
      net: dsa: sja1105: overwrite dynamic FDB entries with static ones in .port_fdb_add · e11e865b
      Vladimir Oltean 提交于
      The SJA1105 switch family leaves it up to software to decide where
      within the FDB to install a static entry, and to concatenate destination
      ports for already existing entries (the FDB is also used for multicast
      entries), it is not as simple as just saying "please add this entry".
      
      This means we first need to search for an existing FDB entry before
      adding a new one. The driver currently manages to fool itself into
      thinking that if an FDB entry already exists, there is nothing to be
      done. But that FDB entry might be dynamically learned, case in which it
      should be replaced with a static entry, but instead it is left alone.
      
      This patch checks the LOCKEDS ("locked/static") bit from found FDB
      entries, and lets the code "goto skip_finding_an_index;" if the FDB
      entry was not static. So we also need to move the place where we set
      LOCKEDS = true, to cover the new case where a dynamic FDB entry existed
      but was dynamic.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e11e865b
  12. 29 7月, 2021 3 次提交
    • V
      net: dsa: sja1105: make sure untagged packets are dropped on ingress ports with no pvid · bef0746c
      Vladimir Oltean 提交于
      Surprisingly, this configuration:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp2 master br0
      bridge vlan del dev swp2 vid 1
      
      still has the sja1105 switch sending untagged packets to the CPU (and
      failing to decode them, since dsa_find_designated_bridge_port_by_vid
      searches by VID 1 and rightfully finds no bridge VLAN 1 on a port).
      
      Dumping the switch configuration, the VLANs are managed properly:
      - the pvid of swp2 is 1 in the MAC Configuration Table, but
      - only the CPU port is in the port membership of VLANID 1 in the VLAN
        Lookup Table
      
      When the ingress packets are tagged with VID 1, they are properly
      dropped. But when they are untagged, they are able to reach the CPU
      port. Also, when the pvid in the MAC Configuration Table is changed to
      e.g. 55 (an unused VLAN), the untagged packets are also dropped.
      
      So it looks like:
      - the switch bypasses ingress VLAN membership checks for untagged traffic
      - the reason why the untagged traffic is dropped when I make the pvid 55
        is due to the lack of valid destination ports in VLAN 55, rather than
        an ingress membership violation
      - the ingress VLAN membership cheks are only done for VLAN-tagged traffic
      
      Interesting. It looks like there is an explicit bit to drop untagged
      traffic, so we should probably be using that to preserve user expectations.
      
      Note that only VLAN-aware ports should drop untagged packets due to no
      pvid - when VLAN-unaware, the software bridge doesn't do this even if
      there is no pvid on any bridge port and on the bridge itself. So the new
      sja1105_drop_untagged() function cannot simply be called with "false"
      from sja1105_bridge_vlan_add() and with "true" from sja1105_bridge_vlan_del.
      Instead, we need to also consider the VLAN awareness state. That means
      we need to hook the "drop untagged" setting in all the same places where
      the "commit pvid" logic is, and it needs to factor in all the state when
      flipping the "drop untagged" bit: is our current pvid in the VLAN Lookup
      Table, and is the current port in that VLAN's port membership list?
      VLAN-unaware ports will never drop untagged frames because these checks
      always succeed by construction, and the tag_8021q VLANs cannot be changed
      by the user.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bef0746c
    • V
      net: dsa: sja1105: reset the port pvid when leaving a VLAN-aware bridge · cde8078e
      Vladimir Oltean 提交于
      Now that we no longer have the ultra-central sja1105_build_vlan_table(),
      we need to be more careful about checking all corner cases manually.
      
      For example, when a port leaves a VLAN-aware bridge, it becomes
      standalone so its pvid should become a tag_8021q RX VLAN again. However,
      sja1105_commit_pvid() only gets called from sja1105_bridge_vlan_add()
      and from sja1105_vlan_filtering(), and no VLAN awareness change takes
      place (VLAN filtering is a global setting for sja1105, so the switch
      remains VLAN-aware overall).
      
      This means that we need to put another sja1105_commit_pvid() call in
      sja1105_bridge_member().
      
      Fixes: 6dfd23d3 ("net: dsa: sja1105: delete vlan delta save/restore logic")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cde8078e
    • V
      net: dsa: sja1105: be stateless when installing FDB entries · b11f0a4c
      Vladimir Oltean 提交于
      Currently there are issues when adding a bridge FDB entry as VLAN-aware
      and deleting it as VLAN-unaware, or vice versa.
      
      However this is an unneeded complication, since the bridge always
      installs its default FDB entries in VLAN 0 to match on VLAN-unaware
      ports, and in the default_pvid (VLAN 1) to match on VLAN-aware ports.
      So instead of trying to outsmart the bridge, just install all entries it
      gives us, and they will start matching packets when the vlan_filtering
      mode changes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b11f0a4c
  13. 27 7月, 2021 6 次提交
    • V
      net: dsa: sja1105: add bridge TX data plane offload based on tag_8021q · b6ad86e6
      Vladimir Oltean 提交于
      The main desire for having this feature in sja1105 is to support network
      stack termination for traffic coming from a VLAN-aware bridge.
      
      For sja1105, offloading the bridge data plane means sending packets
      as-is, with the proper VLAN tag, to the chip. The chip will look up its
      FDB and forward them to the correct destination port.
      
      But we support bridge data plane offload even for VLAN-unaware bridges,
      and the implementation there is different. In fact, VLAN-unaware
      bridging is governed by tag_8021q, so it makes sense to have the
      .bridge_fwd_offload_add() implementation fully within tag_8021q.
      The key difference is that we only support 1 VLAN-aware bridge, but we
      support multiple VLAN-unaware bridges. So we need to make sure that the
      forwarding domain is not crossed by packets injected from the stack.
      
      For this, we introduce the concept of a tag_8021q TX VLAN for bridge
      forwarding offload. As opposed to the regular TX VLANs which contain
      only 2 ports (the user port and the CPU port), a bridge data plane TX
      VLAN is "multicast" (or "imprecise"): it contains all the ports that are
      part of a certain bridge, and the hardware will select where the packet
      goes within this "imprecise" forwarding domain.
      
      Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use
      of the unique "bridge_num" provided by DSA for the data plane offload.
      We use the same 3 bits from the tag_8021q VLAN ID format to encode this
      bridge number.
      
      Note that these 3 bit positions have been used before for sub-VLANs in
      best-effort VLAN filtering mode. The difference is that for best-effort,
      the sub-VLANs were only valid on RX (and it was documented that the
      sub-VLAN field needed to be transmitted as zero). Whereas for the bridge
      data plane offload, these 3 bits are only valid on TX.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6ad86e6
    • V
      net: dsa: sja1105: add support for imprecise RX · 884be12f
      Vladimir Oltean 提交于
      This is already common knowledge by now, but the sja1105 does not have
      hardware support for DSA tagging for data plane packets, and tag_8021q
      sets up a unique pvid per port, transmitted as VLAN-tagged towards the
      CPU, for the source port to be decoded nonetheless.
      
      When the port is part of a VLAN-aware bridge, the pvid committed to
      hardware is taken from the bridge and not from tag_8021q, so we need to
      work with that the best we can.
      
      Configure the switches to send all packets to the CPU as VLAN-tagged
      (even ones that were originally untagged on the wire) and make use of
      dsa_untag_bridge_pvid() to get rid of it before we send those packets up
      the network stack.
      
      With the classified VLAN used by hardware known to the tagger, we first
      peek at the VID in an attempt to figure out if the packet was received
      from a VLAN-unaware port (standalone or under a VLAN-unaware bridge),
      case in which we can continue to call dsa_8021q_rcv(). If that is not
      the case, the packet probably came from a VLAN-aware bridge. So we call
      the DSA helper that finds for us a "designated bridge port" - one that
      is a member of the VLAN ID from the packet, and is in the proper STP
      state - basically these are all checks performed by br_handle_frame() in
      the software RX data path.
      
      The bridge will accept the packet as valid even if the source port was
      maybe wrong. So it will maybe learn the MAC SA of the packet on the
      wrong port, and its software FDB will be out of sync with the hardware
      FDB. So replies towards this same MAC DA will not work, because the
      bridge will send towards a different netdev.
      
      This is where the bridge data plane offload ("imprecise TX") added by
      the next patch comes in handy. The software FDB is wrong, true, but the
      hardware FDB isn't, and by offloading the bridge forwarding plane we
      have a chance to right a wrong, and have the hardware look up the FDB
      for us for the reply packet. So it all cancels out.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      884be12f
    • V
      net: dsa: sja1105: deny more than one VLAN-aware bridge · 19fa937a
      Vladimir Oltean 提交于
      With tag_sja1105.c's only ability being to perform an imprecise RX
      procedure and identify whether a packet comes from a VLAN-aware bridge
      or not, we have no way to determine whether a packet with VLAN ID 5
      comes from, say, br0 or br1. Actually we could, but it would mean that
      we need to restrict all VLANs from br0 to be different from all VLANs
      from br1, and this includes the default_pvid, which makes a setup with 2
      VLAN-aware bridges highly imprectical.
      
      The fact of the matter is that this isn't even that big of a practical
      limitation, since even with a single VLAN-aware bridge we can pretty
      much enforce forwarding isolation based on the VLAN port membership.
      
      So in the end, tell the user that they need to model their setup using a
      single VLAN-aware bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19fa937a
    • V
      net: dsa: sja1105: deny 8021q uppers on ports · 4fbc08bd
      Vladimir Oltean 提交于
      Now that best-effort VLAN filtering is gone and we are left with the
      imprecise RX and imprecise TX based in VLAN-aware mode, where the tagger
      just guesses the source port based on plausibility of the VLAN ID, 8021q
      uppers installed on top of a standalone port, while other ports of that
      switch are under a VLAN-aware bridge don't quite "just work".
      
      In fact it could be possible to restrict the VLAN IDs used by the 8021q
      uppers to not be shared with VLAN IDs used by that VLAN-aware bridge,
      but then the tagger needs to be patched to search for 8021q uppers too,
      not just for the "designated bridge port" which will be introduced in a
      later patch.
      
      I haven't given a possible implementation full thought, it seems maybe
      possible but not worth the effort right now. The only certain thing is
      that currently the tagger won't be able to figure out the source port
      for these packets because they will come with the VLAN ID of the 8021q
      upper and are no longer retagged to a tag_8021q sub-VLAN like the best
      effort VLAN filtering code used to do. So just deny these for the
      moment.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fbc08bd
    • V
      net: dsa: sja1105: delete vlan delta save/restore logic · 6dfd23d3
      Vladimir Oltean 提交于
      With the best_effort_vlan_filtering mode now gone, the driver does not
      have 3 operating modes anymore (VLAN-unaware, VLAN-aware and best effort),
      but only 2.
      
      The idea is that we will gain support for network stack I/O through a
      VLAN-aware bridge, using the data plane offload framework (imprecise RX,
      imprecise TX). So the VLAN-aware use case will be more functional.
      
      But standalone ports that are part of the same switch when some other
      ports are under a VLAN-aware bridge should work too. Termination on
      those should work through the tag_8021q RX VLAN and TX VLAN.
      
      This was not possible using the old logic, because:
      - in VLAN-unaware mode, only the tag_8021q VLANs were committed to hw
      - in VLAN-aware mode, only the bridge VLANs were committed to hw
      - in best-effort VLAN mode, both the tag_8021q and bridge VLANs were
        committed to hw
      
      The strategy for the new VLAN-aware mode is to allow the bridge and the
      tag_8021q VLANs to coexist in the VLAN table at the same time.
      
      [ yes, we need to make sure that the bridge cannot install a tag_8021q
        VLAN, but ]
      
      This means that the save/restore logic introduced by commit ec5ae610
      ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
      does not serve a purpose any longer. We can delete it and restore the
      old code that simply adds a VLAN to the VLAN table and calls it a day.
      
      Note that we keep the sja1105_commit_pvid() function from those days,
      but adapt it slightly. Ports that are under a VLAN-aware bridge use the
      bridge's pvid, ports that are standalone or under a VLAN-unaware bridge
      use the tag_8021q pvid, for local termination or VLAN-unaware forwarding.
      
      Now, when the vlan_filtering property is toggled for the bridge, the
      pvid of the ports beneath it is the only thing that's changing, we no
      longer delete some VLANs and restore others.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dfd23d3
    • C
      net: dsa: sja1105: remove redundant re-assignment of pointer table · d63f8877
      Colin Ian King 提交于
      The pointer table is being re-assigned with a value that is never
      read. The assignment is redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d63f8877
  14. 22 7月, 2021 1 次提交
    • V
      net: dsa: sja1105: make VID 4095 a bridge VLAN too · e40cba94
      Vladimir Oltean 提交于
      This simple series of commands:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp0 master br0
      
      fails on sja1105 with the following error:
      [   33.439103] sja1105 spi0.1: vlan-lookup-table needs to have at least the default untagged VLAN
      [   33.447710] sja1105 spi0.1: Invalid config, cannot upload
      Warning: sja1105: Failed to change VLAN Ethertype.
      
      For context, sja1105 has 3 operating modes:
      - SJA1105_VLAN_UNAWARE: the dsa_8021q_vlans are committed to hardware
      - SJA1105_VLAN_FILTERING_FULL: the bridge_vlans are committed to hardware
      - SJA1105_VLAN_FILTERING_BEST_EFFORT: both the dsa_8021q_vlans and the
        bridge_vlans are committed to hardware
      
      Swapping out a VLAN list and another in happens in
      sja1105_build_vlan_table(), which performs a delta update procedure.
      That function is called from a few places, notably from
      sja1105_vlan_filtering() which is called from the
      SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler.
      
      The above set of 2 commands fails when run on a kernel pre-commit
      8841f6e6 ("net: dsa: sja1105: make devlink property
      best_effort_vlan_filtering true by default"). So the priv->vlan_state
      transition that takes place is between VLAN-unaware and full VLAN
      filtering. So the dsa_8021q_vlans are swapped out and the bridge_vlans
      are swapped in.
      
      So why does it fail?
      
      Well, the bridge driver, through nbp_vlan_init(), first sets up the
      SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING attribute, and only then
      proceeds to call nbp_vlan_add for the default_pvid.
      
      So when we swap out the dsa_8021q_vlans and swap in the bridge_vlans in
      the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler, there are no bridge
      VLANs (yet). So we have wiped the VLAN table clean, and the low-level
      static config checker complains of an invalid configuration. We _will_
      add the bridge VLANs using the dynamic config interface, albeit later,
      when nbp_vlan_add() calls us. So it is natural that it fails.
      
      So why did it ever work?
      
      Surprisingly, it looks like I only tested this configuration with 2
      things set up in a particular way:
      - a network manager that brings all ports up
      - a kernel with CONFIG_VLAN_8021Q=y
      
      It is widely known that commit ad1afb00 ("vlan_dev: VLAN 0 should be
      treated as "no vlan tag" (802.1p packet)") installs VID 0 to every net
      device that comes up. DSA treats these VLANs as bridge VLANs, and
      therefore, in my testing, the list of bridge_vlans was never empty.
      
      However, if CONFIG_VLAN_8021Q is not enabled, or the port is not up when
      it joins a VLAN-aware bridge, the bridge_vlans list will be temporarily
      empty, and the sja1105_static_config_reload() call from
      sja1105_vlan_filtering() will fail.
      
      To fix this, the simplest thing is to keep VID 4095, the one used for
      CPU-injected control packets since commit ed040abc ("net: dsa:
      sja1105: use 4095 as the private VLAN for untagged traffic"), in the
      list of bridge VLANs too, not just the list of tag_8021q VLANs. This
      ensures that the list of bridge VLANs will never be empty.
      
      Fixes: ec5ae610 ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
      Reported-by: NRadu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e40cba94
  15. 20 7月, 2021 4 次提交
    • V
      net: dsa: tag_8021q: add proper cross-chip notifier support · c64b9c05
      Vladimir Oltean 提交于
      The big problem which mandates cross-chip notifiers for tag_8021q is
      this:
      
                                                   |
          sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
       [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
                                         |
                                         +---------+
                                                   |
          sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
                                         |
                                         +---------+
                                                   |
          sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
      
      When the user runs:
      
      ip link add br0 type bridge
      ip link set sw0p0 master br0
      ip link set sw2p0 master br0
      
      It doesn't work.
      
      This is because dsa_8021q_crosschip_bridge_join() assumes that "ds" and
      "other_ds" are at most 1 hop away from each other, so it is sufficient
      to add the RX VLAN of {ds, port} into {other_ds, other_port} and vice
      versa and presto, the cross-chip link works. When there is another
      switch in the middle, such as in this case switch 1 with its DSA links
      sw1p3 and sw1p4, somebody needs to tell it about these VLANs too.
      
      Which is exactly why the problem is quadratic: when a port joins a
      bridge, for each port in the tree that's already in that same bridge we
      notify a tag_8021q VLAN addition of that port's RX VLAN to the entire
      tree. It is a very complicated web of VLANs.
      
      It must be mentioned that currently we install tag_8021q VLANs on too
      many ports (DSA links - to be precise, on all of them). For example,
      when sw2p0 joins br0, and assuming sw1p0 was part of br0 too, we add the
      RX VLAN of sw2p0 on the DSA links of switch 0 too, even though there
      isn't any port of switch 0 that is a member of br0 (at least yet).
      In theory we could notify only the switches which sit in between the
      port joining the bridge and the port reacting to that bridge_join event.
      But in practice that is impossible, because of the way 'link' properties
      are described in the device tree. The DSA bindings require DT writers to
      list out not only the real/physical DSA links, but in fact the entire
      routing table, like for example switch 0 above will have:
      
      	sw0p3: port@3 {
      		link = <&sw1p4 &sw2p4>;
      	};
      
      This was done because:
      
      /* TODO: ideally DSA ports would have a single dp->link_dp member,
       * and no dst->rtable nor this struct dsa_link would be needed,
       * but this would require some more complex tree walking,
       * so keep it stupid at the moment and list them all.
       */
      
      but it is a perfect example of a situation where too much information is
      actively detrimential, because we are now in the position where we
      cannot distinguish a real DSA link from one that is put there to avoid
      the 'complex tree walking'. And because DT is ABI, there is not much we
      can change.
      
      And because we do not know which DSA links are real and which ones
      aren't, we can't really know if DSA switch A is in the data path between
      switches B and C, in the general case.
      
      So this is why tag_8021q RX VLANs are added on all DSA links, and
      probably why it will never change.
      
      On the other hand, at least the number of additions/deletions is well
      balanced, and this means that once we implement reference counting at
      the cross-chip notifier level a la fdb/mdb, there is absolutely zero
      need for a struct dsa_8021q_crosschip_link, it's all self-managing.
      
      In fact, with the tag_8021q notifiers emitted from the bridge join
      notifiers, it becomes so generic that sja1105 does not need to do
      anything anymore, we can just delete its implementation of the
      .crosschip_bridge_{join,leave} methods.
      
      Among other things we can simply delete is the home-grown implementation
      of sja1105_notify_crosschip_switches(). The reason why that is wrong is
      because it is not quadratic - it only covers remote switches to which we
      have a cross-chip bridging link and that does not cover in-between
      switches. This deletion is part of the same patch because sja1105 used
      to poke deep inside the guts of the tag_8021q context in order to do
      that. Because the cross-chip links went away, so needs the sja1105 code.
      
      Last but not least, dsa_8021q_setup_port() is simplified (and also
      renamed). Because our TAG_8021Q_VLAN_ADD notifier is designed to react
      on the CPU port too, the four dsa_8021q_vid_apply() calls:
      - 1 for RX VLAN on user port
      - 1 for the user port's RX VLAN on the CPU port
      - 1 for TX VLAN on user port
      - 1 for the user port's TX VLAN on the CPU port
      
      now get squashed into only 2 notifier calls via
      dsa_port_tag_8021q_vlan_add.
      
      And because the notifiers to add and to delete a tag_8021q VLAN are
      distinct, now we finally break up the port setup and teardown into
      separate functions instead of relying on a "bool enabled" flag which
      tells us what to do. Arguably it should have been this way from the
      get go.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c64b9c05
    • V
      net: dsa: tag_8021q: absorb dsa_8021q_setup into dsa_tag_8021q_{,un}register · 328621f6
      Vladimir Oltean 提交于
      Right now, setting up tag_8021q is a 2-step operation for a driver,
      first the context structure needs to be created, then the VLANs need to
      be installed on the ports. A similar thing is true for teardown.
      
      Merge the 2 steps into the register/unregister methods, to be as
      transparent as possible for the driver as to what tag_8021q does behind
      the scenes. This also gets rid of the funny "bool setup == true means
      setup, == false means teardown" API that tag_8021q used to expose.
      
      Note that dsa_tag_8021q_register() must be called at least in the
      .setup() driver method and never earlier (like in the driver probe
      function). This is because the DSA switch tree is not initialized at
      probe time, and the cross-chip notifiers will not work.
      
      For symmetry with .setup(), the unregister method should be put in
      .teardown().
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      328621f6
    • V
      net: dsa: make tag_8021q operations part of the core · 5da11eb4
      Vladimir Oltean 提交于
      Make tag_8021q a more central element of DSA and move the 2 driver
      specific operations outside of struct dsa_8021q_context (which is
      supposed to hold dynamic data and not really constant function
      pointers).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5da11eb4
    • V
      net: dsa: let the core manage the tag_8021q context · d7b1fd52
      Vladimir Oltean 提交于
      The basic problem description is as follows:
      
      Be there 3 switches in a daisy chain topology:
      
                                                   |
          sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
       [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
                                         |
                                         +---------+
                                                   |
          sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
       [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
                                         |
                                         +---------+
                                                   |
          sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
       [  user ] [  user ] [  user ] [  user ] [  dsa  ]
      
      The CPU will not be able to ping through the user ports of the
      bottom-most switch (like for example sw2p0), simply because tag_8021q
      was not coded up for this scenario - it has always assumed DSA switch
      trees with a single switch.
      
      To add support for the topology above, we must admit that the RX VLAN of
      sw2p0 must be added on some ports of switches 0 and 1 as well. This is
      in fact a textbook example of thing that can use the cross-chip notifier
      framework that DSA has set up in switch.c.
      
      There is only one problem: core DSA (switch.c) is not able right now to
      make the connection between a struct dsa_switch *ds and a struct
      dsa_8021q_context *ctx. Right now, it is drivers who call into
      tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer,
      and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del}
      methods.
      
      But with cross-chip notifiers, it is possible for tag_8021q to call
      drivers without drivers having ever asked for anything. A good example
      is right above: when sw2p0 wants to set itself up for tag_8021q,
      the .tag_8021q_vlan_add method needs to be called for switches 1 and 0,
      so that they transport sw2p0's VLANs towards the CPU without dropping
      them.
      
      So instead of letting drivers manage the tag_8021q context, add a
      tag_8021q_ctx pointer inside of struct dsa_switch, which will be
      populated when dsa_tag_8021q_register() returns success.
      
      The patch is fairly long-winded because we are partly reverting commit
      5899ee36 ("net: dsa: tag_8021q: add a context structure") which made
      the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we
      can access "ctx" directly from "ds", this is no longer needed.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7b1fd52