1. 13 10月, 2021 2 次提交
    • V
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Vladimir Oltean 提交于
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4ac0567e
    • V
      net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 28da0555
      Vladimir Oltean 提交于
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      28da0555
  2. 23 9月, 2021 5 次提交
    • V
      net: dsa: sja1105: stop using priv->vlan_aware · 9aad3e4e
      Vladimir Oltean 提交于
      Now that the sja1105 driver is finally sane enough again to stop having
      a ternary VLAN awareness state, we can remove priv->vlan_aware and query
      DSA for the ds->vlan_filtering value (for SJA1105, VLAN filtering is a
      global property).
      
      Also drop the paranoid checking that DSA calls ->port_vlan_filtering
      multiple times without the VLAN awareness state changing. It doesn't,
      the same check is present inside dsa_port_vlan_filtering too.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9aad3e4e
    • V
      net: dsa: sja1105: don't keep a persistent reference to the reset GPIO · 33e1501f
      Vladimir Oltean 提交于
      The driver only needs the reset GPIO for a very brief period, so instead
      of using devres and keeping the descriptor pointer inside priv, just use
      that descriptor inside the sja1105_hw_reset function and then let go of
      it.
      
      Also use gpiod_get_optional while at it, and error out on real errors
      (bad flags etc).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33e1501f
    • V
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · f5aef424
      Vladimir Oltean 提交于
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5aef424
    • V
      net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 6d709cad
      Vladimir Oltean 提交于
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d709cad
    • V
      net: dsa: sja1105: remove sp->dp · 68a81bb2
      Vladimir Oltean 提交于
      It looks like this field was never used since its introduction in commit
      227d07a0 ("net: dsa: sja1105: Add support for traffic through
      standalone ports") remove it.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68a81bb2
  3. 19 9月, 2021 1 次提交
    • V
      net: dsa: be compatible with masters which unregister on shutdown · 0650bf52
      Vladimir Oltean 提交于
      Lino reports that on his system with bcmgenet as DSA master and KSZ9897
      as a switch, rebooting or shutting down never works properly.
      
      What does the bcmgenet driver have special to trigger this, that other
      DSA masters do not? It has an implementation of ->shutdown which simply
      calls its ->remove implementation. Otherwise said, it unregisters its
      network interface on shutdown.
      
      This message can be seen in a loop, and it hangs the reboot process there:
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 3
      
      So why 3?
      
      A usage count of 1 is normal for a registered network interface, and any
      virtual interface which links itself as an upper of that will increment
      it via dev_hold. In the case of DSA, this is the call path:
      
      dsa_slave_create
      -> netdev_upper_dev_link
         -> __netdev_upper_dev_link
            -> __netdev_adjacent_dev_insert
               -> dev_hold
      
      So a DSA switch with 3 interfaces will result in a usage count elevated
      by two, and netdev_wait_allrefs will wait until they have gone away.
      
      Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
      delete themselves, but DSA cannot just vanish and go poof, at most it
      can unbind itself from the switch devices, but that must happen strictly
      earlier compared to when the DSA master unregisters its net_device, so
      reacting on the NETDEV_UNREGISTER event is way too late.
      
      It seems that it is a pretty established pattern to have a driver's
      ->shutdown hook redirect to its ->remove hook, so the same code is
      executed regardless of whether the driver is unbound from the device, or
      the system is just shutting down. As Florian puts it, it is quite a big
      hammer for bcmgenet to unregister its net_device during shutdown, but
      having a common code path with the driver unbind helps ensure it is well
      tested.
      
      So DSA, for better or for worse, has to live with that and engage in an
      arms race of implementing the ->shutdown hook too, from all individual
      drivers, and do something sane when paired with masters that unregister
      their net_device there. The only sane thing to do, of course, is to
      unlink from the master.
      
      However, complications arise really quickly.
      
      The pattern of redirecting ->shutdown to ->remove is not unique to
      bcmgenet or even to net_device drivers. In fact, SPI controllers do it
      too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
      and MDIO controllers do it too (this is something I have not researched
      too deeply, but even if this is not the case today, it is certainly
      plausible to happen in the future, and must be taken into consideration).
      
      Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
      insane implication is that for the exact same DSA switch device, we
      might have both ->shutdown and ->remove getting called.
      
      So we need to do something with that insane environment. The pattern
      I've come up with is "if this, then not that", so if either ->shutdown
      or ->remove gets called, we set the device's drvdata to NULL, and in the
      other hook, we check whether the drvdata is NULL and just do nothing.
      This is probably not necessary for platform devices, just for devices on
      buses, but I would really insist for consistency among drivers, because
      when code is copy-pasted, it is not always copy-pasted from the best
      sources.
      
      So depending on whether the DSA switch's ->remove or ->shutdown will get
      called first, we cannot really guarantee even for the same driver if
      rebooting will result in the same code path on all platforms. But
      nonetheless, we need to do something minimally reasonable on ->shutdown
      too to fix the bug. Of course, the ->remove will do more (a full
      teardown of the tree, with all data structures freed, and this is why
      the bug was not caught for so long). The new ->shutdown method is kept
      separate from dsa_unregister_switch not because we couldn't have
      unregistered the switch, but simply in the interest of doing something
      quick and to the point.
      
      The big question is: does the DSA switch's ->shutdown get called earlier
      than the DSA master's ->shutdown? If not, there is still a risk that we
      might still trigger the WARN_ON in unregister_netdevice that says we are
      attempting to unregister a net_device which has uppers. That's no good.
      Although the reference to the master net_device won't physically go away
      even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
      on it.
      
      The answer to that question lies in this comment above device_link_add:
      
       * A side effect of the link creation is re-ordering of dpm_list and the
       * devices_kset list by moving the consumer device and all devices depending
       * on it to the ends of these lists (that does not happen to devices that have
       * not been registered when this function is called).
      
      so the fact that DSA uses device_link_add towards its master is not
      exactly for nothing. device_shutdown() walks devices_kset from the back,
      so this is our guarantee that DSA's shutdown happens before the master's
      shutdown.
      
      Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0650bf52
  4. 17 9月, 2021 1 次提交
    • V
      net: update NXP copyright text · 3c9cfb52
      Vladimir Oltean 提交于
      NXP Legal insists that the following are not fine:
      
      - Saying "NXP Semiconductors" instead of "NXP", since the company's
        registered name is "NXP"
      
      - Putting a "(c)" sign in the copyright string
      
      - Putting a comma in the copyright string
      
      The only accepted copyright string format is "Copyright <year-range> NXP".
      
      This patch changes the copyright headers in the networking files that
      were sent by me, or derived from code sent by me.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c9cfb52
  5. 25 8月, 2021 3 次提交
    • V
      net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid · 8ded9160
      Vladimir Oltean 提交于
      Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger
      for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
      function solved quite a different problem than our needs are now.
      
      Then, we used best-effort VLAN filtering and we were using the xmit_tpid
      to tunnel packets coming from an 8021q upper through the TX VLAN allocated
      by tag_8021q to that egress port. The need for a different VLAN protocol
      depending on switch revision came from the fact that this in itself was
      more of a hack to trick the hardware into accepting tunneled VLANs in
      the first place.
      
      Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
      supported them again, we would not do that using the same method of
      {tunneling the VLAN on egress, retagging the VLAN on ingress} that we
      had in the best-effort VLAN filtering mode. It seems rather simpler that
      we just allocate a VLAN in the VLAN table that is simply not used by the
      bridge at all, or by any other port.
      
      Anyway, I have 2 gripes with the current sja1105_xmit_tpid:
      
      1. When sending packets on behalf of a VLAN-aware bridge (with the new
         TX forwarding offload framework) plus untagged (with the tag_8021q
         VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
         and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
         through the DSA master have a VLAN protocol of 0x8100 and others of
         0x88a8. This is strange and there is no reason for it now. If we have
         a bridge and are therefore forced to send using that bridge's TPID,
         we can as well blend with that bridge's VLAN protocol for all packets.
      
      2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
         because it looks inside dp->priv. It is desirable to keep as much
         separation between taggers and switch drivers as possible. Now it
         doesn't do that anymore.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ded9160
    • V
      net: dsa: sja1105: drop untagged packets on the CPU and DSA ports · b0b8c67e
      Vladimir Oltean 提交于
      The sja1105 driver is a bit special in its use of VLAN headers as DSA
      tags. This is because in VLAN-aware mode, the VLAN headers use an actual
      TPID of 0x8100, which is understood even by the DSA master as an actual
      VLAN header.
      
      Furthermore, control packets such as PTP and STP are transmitted with no
      VLAN header as a DSA tag, because, depending on switch generation, there
      are ways to steer these control packets towards a precise egress port
      other than VLAN tags. Transmitting control packets as untagged means
      leaving a door open for traffic in general to be transmitted as untagged
      from the DSA master, and for it to traverse the switch and exit a random
      switch port according to the FDB lookup.
      
      This behavior is a bit out of line with other DSA drivers which have
      native support for DSA tagging. There, it is to be expected that the
      switch only accepts DSA-tagged packets on its CPU port, dropping
      everything that does not match this pattern.
      
      We perhaps rely a bit too much on the switches' hardware dropping on the
      CPU port, and place no other restrictions in the kernel data path to
      avoid that. For example, sja1105 is also a bit special in that STP/PTP
      packets are transmitted using "management routes"
      (sja1105_port_deferred_xmit): when sending a link-local packet from the
      CPU, we must first write a SPI message to the switch to tell it to
      expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
      it towards port 3 when it gets it. This entry expires as soon as it
      matches a packet received by the switch, and it needs to be reinstalled
      for the next packet etc. All in all quite a ghetto mechanism, but it is
      all that the sja1105 switches offer for injecting a control packet.
      The driver takes a mutex for serializing control packets and making the
      pairs of SPI writes of a management route and its associated skb atomic,
      but to be honest, a mutex is only relevant as long as all parties agree
      to take it. With the DSA design, it is possible to open an AF_PACKET
      socket on the DSA master net device, and blast packets towards
      01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
      it all goes kaput because management routes installed by the driver will
      match skbs sent by the DSA master, and not skbs generated by the driver
      itself. So they will end up being routed on the wrong port.
      
      So through the lens of that, maybe it would make sense to avoid that
      from happening by doing something in the network stack, like: introduce
      a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
      dev_hard_start_xmit(), introduce the following check:
      
      	if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
      		kfree_skb(skb);
      
      Ok, maybe that is a bit drastic, but that would at least prevent a bunch
      of problems. For example, right now, even though the majority of DSA
      switches drop packets without DSA tags sent by the DSA master (and
      therefore the majority of garbage that user space daemons like avahi and
      udhcpcd and friends create), it is still conceivable that an aggressive
      user space program can open an AF_PACKET socket and inject a spoofed DSA
      tag directly on the DSA master. We have no protection against that; the
      packet will be understood by the switch and be routed wherever user
      space says. Furthermore: there are some DSA switches where we even have
      register access over Ethernet, using DSA tags. So even user space
      drivers are possible in this way. This is a huge hole.
      
      However, the biggest thing that bothers me is that udhcpcd attempts to
      ask for an IP address on all interfaces by default, and with sja1105, it
      will attempt to get a valid IP address on both the DSA master as well as
      on sja1105 switch ports themselves. So with IP addresses in the same
      subnet on multiple interfaces, the routing table will be messed up and
      the system will be unusable for traffic until it is configured manually
      to not ask for an IP address on the DSA master itself.
      
      It turns out that it is possible to avoid that in the sja1105 driver, at
      least very superficially, by requesting the switch to drop VLAN-untagged
      packets on the CPU port. With the exception of control packets, all
      traffic originated from tag_sja1105.c is already VLAN-tagged, so only
      STP and PTP packets need to be converted. For that, we need to uphold
      the equivalence between an untagged and a pvid-tagged packet, and to
      remember that the CPU port of sja1105 uses a pvid of 4095.
      
      Now that we drop untagged traffic on the CPU port, non-aggressive user
      space applications like udhcpcd stop bothering us, and sja1105 effectively
      becomes just as vulnerable to the aggressive kind of user space programs
      as other DSA switches are (ok, users can also create 8021q uppers on top
      of the DSA master in the case of sja1105, but in future patches we can
      easily deny that, but it still doesn't change the fact that VLAN-tagged
      packets can still be injected over raw sockets).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0b8c67e
    • V
      net: dsa: sja1105: prevent tag_8021q VLANs from being received on user ports · 73ceab83
      Vladimir Oltean 提交于
      Currently it is possible for an attacker to craft packets with a fake
      DSA tag and send them to us, and our user ports will accept them and
      preserve that VLAN when transmitting towards the CPU. Then the tagger
      will be misled into thinking that the packets came on a different port
      than they really came on.
      
      Up until recently there wasn't a good option to prevent this from
      happening. In SJA1105P and later, the MAC Configuration Table introduced
      two options called:
      - DRPSITAG: Drop Single Inner Tagged Frames
      - DRPSOTAG: Drop Single Outer Tagged Frames
      
      Because the sja1105 driver classifies all VLANs as "outer VLANs" (S-Tags),
      it would be in principle possible to enable the DRPSOTAG bit on ports
      using tag_8021q, and drop on ingress all packets which have a VLAN tag.
      When the switch is VLAN-unaware, this works, because it uses a custom
      TPID of 0xdadb, so any "tagged" packets received on a user port are
      probably a spoofing attempt. But when the switch overall is VLAN-aware,
      and some ports are standalone (therefore they use tag_8021q), the TPID
      is 0x8100, and the port can receive a mix of untagged and VLAN-tagged
      packets. The untagged ones will be classified to the tag_8021q pvid, and
      the tagged ones to the VLAN ID from the packet header. Yes, it is true
      that since commit 4fbc08bd ("net: dsa: sja1105: deny 8021q uppers on
      ports") we no longer support this mixed mode, but that is a temporary
      limitation which will eventually be lifted. It would be nice to not
      introduce one more restriction via DRPSOTAG, which would make the
      standalone ports of a VLAN-aware switch drop genuinely VLAN-tagged
      packets.
      
      Also, the DRPSOTAG bit is not available on the first generation of
      switches (SJA1105E, SJA1105T). So since one of the key features of this
      driver is compatibility across switch generations, this makes it an even
      less desirable approach.
      
      The breakthrough comes from commit bef0746c ("net: dsa: sja1105:
      make sure untagged packets are dropped on ingress ports with no pvid"),
      where it became obvious that untagged packets are not dropped even if
      the ingress port is not in the VMEMB_PORT vector of that port's pvid.
      However, VLAN-tagged packets are subject to VLAN ingress
      checking/dropping. This means that instead of using the catch-all
      DRPSOTAG bit introduced in SJA1105P, we can drop tagged packets on a
      per-VLAN basis, and this is already compatible with SJA1105E/T.
      
      This patch adds an "allowed_ingress" argument to sja1105_vlan_add(), and
      we call it with "false" for tag_8021q VLANs on user ports. The tag_8021q
      VLANs still need to be allowed, of course, on ingress to DSA ports and
      CPU ports.
      
      We also need to refine the drop_untagged check in sja1105_commit_pvid to
      make it not freak out about this new configuration. Currently it will
      try to keep the configuration consistent between untagged and pvid-tagged
      packets, so if the pvid of a port is 1 but VLAN 1 is not in VMEMB_PORT,
      packets tagged with VID 1 will behave the same as untagged packets, and
      be dropped. This behavior is what we want for ports under a VLAN-aware
      bridge, but for the ports with a tag_8021q pvid, we want untagged
      packets to be accepted, but packets tagged with a header recognized by
      the switch as a tag_8021q VLAN to be dropped. So only restrict the
      drop_untagged check to apply to the bridge_pvid, not to the tag_8021q_pvid.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73ceab83
  6. 18 8月, 2021 2 次提交
    • V
      net: dsa: tag_sja1105: be dsa_loop-safe · 994d2cbb
      Vladimir Oltean 提交于
      Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
      sure that every time we dereference dp->priv, we check the switch's
      dsa_switch_ops (otherwise we access a struct sja1105_port structure that
      is in fact something else).
      
      This adds an unconditional build-time dependency between sja1105 being
      built as module => tag_sja1105 must also be built as module. This was
      there only for PTP before.
      
      Some sane defaults must also take place when not running on sja1105
      hardware. These are:
      
      - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
        depending on VLAN awareness and switch revision (when an encapsulated
        VLAN must be sent). Default to 0x8100.
      
      - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
        metadata timestamp frames. When running on non-sja1105 hardware, don't
        do that and accept all frames unmodified.
      
      - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
        which writes a management route over SPI. When not running on sja1105
        hardware, bypass the SPI write and send the frame as-is.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      994d2cbb
    • V
      net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse · ed5d2937
      Vladimir Oltean 提交于
      It seems that of_find_compatible_node has a weird calling convention in
      which it calls of_node_put() on the "from" node argument, instead of
      leaving that up to the caller. This comes from the fact that
      of_find_compatible_node with a non-NULL "from" argument it only supposed
      to be used as the iterator function of for_each_compatible_node(). OF
      iterator functions call of_node_get on the next OF node and of_node_put()
      on the previous one.
      
      When of_find_compatible_node calls of_node_put, it actually never
      expects the refcount to drop to zero, because the call is done under the
      atomic devtree_lock context, and when the refcount drops to zero it
      triggers a kobject and a sysfs file deletion, which assume blocking
      context.
      
      So any driver call to of_find_compatible_node is probably buggy because
      an unexpected of_node_put() takes place.
      
      What should be done is to use the of_get_compatible_child() function.
      
      Fixes: 5a8f0974 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
      Link: https://lore.kernel.org/netdev/20210814010139.kzryimmp4rizlznt@skbuf/Suggested-by: NFrank Rowand <frowand.list@gmail.com>
      Suggested-by: NRob Herring <robh+dt@kernel.org>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed5d2937
  7. 16 8月, 2021 1 次提交
    • V
      net: dsa: sja1105: reorganize probe, remove, setup and teardown ordering · 022522ac
      Vladimir Oltean 提交于
      The sja1105 driver's initialization and teardown sequence is a chaotic
      mess that has gathered a lot of cruft over time. It works because there
      is no strict dependency between the functions, but it could be improved.
      
      The basic principle that teardown should be the exact reverse of setup
      is obviously not held. We have initialization steps (sja1105_tas_setup,
      sja1105_flower_setup) in the probe method that are torn down in the DSA
      .teardown method instead of driver unbind time.
      
      We also have code after the dsa_register_switch() call, which implicitly
      means after the .setup() method has finished, which is pretty unusual.
      
      Also, sja1105_teardown() has calls set up in a different order than the
      error path of sja1105_setup(): see the reversed ordering between
      sja1105_ptp_clock_unregister and sja1105_mdiobus_unregister.
      
      Also, sja1105_static_config_load() is called towards the end of
      sja1105_setup(), but sja1105_static_config_free() is also towards the
      end of the error path and teardown path. The static_config_load() call
      should be earlier.
      
      Also, making and breaking the connections between struct sja1105_port
      and struct dsa_port could be refactored into dedicated functions, makes
      the code easier to follow.
      
      We move some code from the DSA .setup() method into the probe method,
      like the device tree parsing, and we move some code from the probe
      method into the DSA .setup() method to be symmetric with its placement
      in the DSA .teardown() method, which is nice because the unbind function
      has a single call to dsa_unregister_switch(). Example of the latter type
      of code movement are the connections between ports mentioned above, they
      are now in the .setup() method.
      
      Finally, due to fact that the kthread_init_worker() call is no longer
      in sja1105_probe() - located towards the bottom of the file - but in
      sja1105_setup() - located much higher - there is an inverse ordering
      with the worker function declaration, sja1105_port_deferred_xmit. To
      avoid that, the entire sja1105_setup() and sja1105_teardown() functions
      are moved towards the bottom of the file.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      022522ac
  8. 14 8月, 2021 1 次提交
    • A
      ethernet: fix PTP_1588_CLOCK dependencies · e5f31552
      Arnd Bergmann 提交于
      The 'imply' keyword does not do what most people think it does, it only
      politely asks Kconfig to turn on another symbol, but does not prevent
      it from being disabled manually or built as a loadable module when the
      user is built-in. In the ICE driver, the latter now causes a link failure:
      
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
      ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
      ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
      aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
      ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
      ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
      ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':
      
      This is a recurring problem in many drivers, and we have discussed
      it several times befores, without reaching a consensus. I'm providing
      a link to the previous email thread for reference, which discusses
      some related problems.
      
      To solve the dependency issue better than the 'imply' keyword, introduce a
      separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
      can depend on if it is able to use PTP support when available, but works
      fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
      then prevented from being built-in, the same way as with a 'depends on
      PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
      but that can be rather confusing when you first see it.
      
      Since this should cover the dependencies correctly, the IS_REACHABLE()
      hack in the header is no longer needed now, and can be turned back
      into a normal IS_ENABLED() check. Any driver that gets the dependency
      wrong will now cause a link time failure rather than being unable to use
      PTP support when that is in a loadable module.
      
      However, the two recently added ptp_get_vclocks_index() and
      ptp_convert_timestamp() interfaces are only called from builtin code with
      ethtool and socket timestamps, so keep the current behavior by stubbing
      those out completely when PTP is in a loadable module. This should be
      addressed properly in a follow-up.
      
      As Richard suggested, we may want to actually turn PTP support into a
      'bool' option later on, preventing it from being a loadable module
      altogether, which would be one way to solve the problem with the ethtool
      interface.
      
      Fixes: 06c16d89 ("ice: register 1588 PTP clock device object for E810 devices")
      Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
      Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
      Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
      Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/Acked-by: NShannon Nelson <snelson@pensando.io>
      Acked-by: NJacob Keller <jacob.e.keller@intel.com>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e5f31552
  9. 12 8月, 2021 1 次提交
  10. 10 8月, 2021 1 次提交
    • V
      net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 21b52fed
      Vladimir Oltean 提交于
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21b52fed
  11. 09 8月, 2021 2 次提交
    • V
      net: dsa: sja1105: add FDB fast ageing support · 5126ec72
      Vladimir Oltean 提交于
      Delete the dynamically learned FDB entries when the STP state changes
      and when address learning is disabled.
      
      On sja1105 there is no shorthand SPI command for this, so we need to
      walk through the entire FDB to delete.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5126ec72
    • V
      net: dsa: sja1105: rely on DSA core tracking of port learning state · 5313a37b
      Vladimir Oltean 提交于
      Now that DSA keeps track of the port learning state, it becomes
      superfluous to keep an additional variable with this information in the
      sja1105 driver. Remove it.
      
      The DSA core's learning state is present in struct dsa_port *dp.
      To avoid the antipattern where we iterate through a DSA switch's
      ports and then call dsa_to_port to obtain the "dp" reference (which is
      bad because dsa_to_port iterates through the DSA switch tree once
      again), just iterate through the dst->ports and operate on those
      directly.
      
      The sja1105 had an extra use of priv->learn_ena on non-user ports. DSA
      does not touch the learning state of those ports - drivers are free to
      do what they wish on them. Mark that information with a comment in
      struct dsa_port and let sja1105 set dp->learning for cascade ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5313a37b
  12. 05 8月, 2021 6 次提交
    • V
      net: dsa: sja1105: enable address learning on cascade ports · 81d45898
      Vladimir Oltean 提交于
      Right now, address learning is disabled on DSA ports, which means that a
      packet received over a DSA port from a cross-chip switch will be flooded
      to unrelated ports.
      
      It is desirable to eliminate that, but for that we need a breakdown of
      the possibilities for the sja1105 driver. A DSA port can be:
      
      - a downstream-facing cascade port. This is simple because it will
        always receive packets from a downstream switch, and there should be
        no other route to reach that downstream switch in the first place,
        which means it should be safe to learn that MAC address towards that
        switch.
      
      - an upstream-facing cascade port. This receives packets either:
        * autonomously forwarded by an upstream switch (and therefore these
          packets belong to the data plane of a bridge, so address learning
          should be ok), or
        * injected from the CPU. This deserves further discussion, as normally,
          an upstream-facing cascade port is no different than the CPU port
          itself. But with "H" topologies (a DSA link towards a switch that
          has its own CPU port), these are more "laterally-facing" cascade
          ports than they are "upstream-facing". Here, there is a risk that
          the port might learn the host addresses on the wrong port (on the
          DSA port instead of on its own CPU port), but this is solved by
          DSA's RX filtering infrastructure, which installs the host addresses
          as static FDB entries on the CPU port of all switches in a "H" tree.
          So even if there will be an attempt from the switch to migrate the
          FDB entry from the CPU port to the laterally-facing cascade port, it
          will fail to do that, because the FDB entry that already exists is
          static and cannot migrate. So address learning should be safe for
          this configuration too.
      
      Ok, so what about other MAC addresses coming from the host, not
      necessarily the bridge local FDB entries? What about MAC addresses
      dynamically learned on foreign interfaces, isn't there a risk that
      cascade ports will learn these entries dynamically when they are
      supposed to be delivered towards the CPU port? Well, that is correct,
      and this is why we also need to enable the assisted learning feature, to
      snoop for these addresses and write them to hardware as static FDB
      entries towards the CPU, to make the switch's learning process on the
      cascade ports ineffective for them. With assisted learning enabled, the
      hardware learning on the CPU port must be disabled.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81d45898
    • V
      net: dsa: sja1105: suppress TX packets from looping back in "H" topologies · 0f9b762c
      Vladimir Oltean 提交于
      H topologies like this one have a problem:
      
               eth0                                                     eth1
                |                                                        |
             CPU port                                                CPU port
                |                        DSA link                        |
       sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
         |             |      |                            |      |             |
       user          user   user                         user   user          user
       port          port   port                         port   port          port
      
      Basically any packet sent by the eth0 DSA master can be flooded on the
      interconnecting DSA link sw0p4 <-> sw1p4 and it will be received by the
      eth1 DSA master too. Basically we are talking to ourselves.
      
      In VLAN-unaware mode, these packets are encoded using a tag_8021q TX
      VLAN, which dsa_8021q_rcv() rightfully cannot decode and complains.
      Whereas in VLAN-aware mode, the packets are encoded with a bridge VLAN
      which _can_ be decoded by the tagger running on eth1, so it will attempt
      to reinject that packet into the network stack (the bridge, if there is
      any port under eth1 that is under a bridge). In the case where the ports
      under eth1 are under the same cross-chip bridge as the ports under eth0,
      the TX packets will even be learned as RX packets. The only thing that
      will prevent loops with the software bridging path, and therefore
      disaster, is that the source port and the destination port are in the
      same hardware domain, and the bridge will receive packets from the
      driver with skb->offload_fwd_mark = true and will not forward between
      the two.
      
      The proper solution to this problem is to detect H topologies and
      enforce that all packets are received through the local switch and we do
      not attempt to receive packets on our CPU port from switches that have
      their own. This is a viable solution which works thanks to the fact that
      MAC addresses which should be filtered towards the host are installed by
      DSA as static MAC addresses towards the CPU port of each switch.
      
      TX from a CPU port towards the DSA port continues to be allowed, this is
      because sja1105 supports bridge TX forwarding offload, and the skb->dev
      used initially for xmit does not have any direct correlation with where
      the station that will respond to that packet is connected. It may very
      well happen that when we send a ping through a br0 interface that spans
      all switch ports, the xmit packet will exit the system through a DSA
      switch interface under eth1 (say sw1p2), but the destination station is
      connected to a switch port under eth0, like sw0p0. So the switch under
      eth1 needs to communicate on TX with the switch under eth0. The
      response, however, will not follow the same path, but instead, this
      patch enforces that the response is sent by the first switch directly to
      its DSA master which is eth0.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f9b762c
    • V
      net: dsa: sja1105: increase MTU to account for VLAN header on DSA ports · 777e55e3
      Vladimir Oltean 提交于
      Since all packets are transmitted as VLAN-tagged over a DSA link (this
      VLAN tag represents the tag_8021q header), we need to increase the MTU
      of these interfaces to account for the possibility that we are already
      transporting a user-visible VLAN header.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      777e55e3
    • V
      net: dsa: sja1105: manage VLANs on cascade ports · c5130029
      Vladimir Oltean 提交于
      Since commit ed040abc ("net: dsa: sja1105: use 4095 as the private
      VLAN for untagged traffic"), this driver uses a reserved value as pvid
      for the host port (DSA CPU port). Control packets which are sent as
      untagged get classified to this VLAN, and all ports are members of it
      (this is to be expected for control packets).
      
      Manage all cascade ports in the same way and allow control packets to
      egress everywhere.
      
      Also, all VLANs need to be sent as egress-tagged on all cascade ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5130029
    • V
      net: dsa: sja1105: manage the forwarding domain towards DSA ports · 3fa21270
      Vladimir Oltean 提交于
      Manage DSA links towards other switches, be they host ports or cascade
      ports, the same as the CPU port, i.e. allow forwarding and flooding
      unconditionally from all user ports.
      
      We send packets as always VLAN-tagged on a DSA port, and we rely on the
      cross-chip notifiers from tag_8021q to install the RX VLAN of a switch
      port only on the proper remote ports of another switch (the ports that
      are in the same bridging domain). So if there is no cross-chip bridging
      in the system, the flooded packets will be sent on the DSA ports too,
      but they will be dropped by the remote switches due to either
      (a) a lack of the RX VLAN in the VLAN table of the ingress DSA port, or
      (b) a lack of valid destinations for those packets, due to a lack of the
          RX VLAN on the user ports of the switch
      
      Note that switches which only transport packets in a cross-chip bridge,
      but have no user ports of their own as part of that bridge, such as
      switch 1 in this case:
      
                          DSA link                   DSA link
        sw0p0 sw0p1 sw0p2 -------- sw1p0 sw1p2 sw1p3 -------- sw2p0 sw2p2 sw2p3
      
      ip link set sw0p0 master br0
      ip link set sw2p3 master br0
      
      will still work, because the tag_8021q cross-chip notifiers keep the RX
      VLANs installed on all DSA ports.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fa21270
    • V
      net: dsa: sja1105: configure the cascade ports based on topology · 30a100e6
      Vladimir Oltean 提交于
      The sja1105 switch family has a feature called "cascade ports" which can
      be used in topologies where multiple SJA1105/SJA1110 switches are daisy
      chained. Upstream switches set this bit for the DSA link towards the
      downstream switches. This is used when the upstream switch receives a
      control packet (PTP, STP) from a downstream switch, because if the
      source port for a control packet is marked as a cascade port, then the
      source port, switch ID and RX timestamp will not be taken again on the
      upstream switch, it is assumed that this has already been done by the
      downstream switch (the leaf port in the tree) and that the CPU has
      everything it needs to decode the information from this packet.
      
      We need to distinguish between an upstream-facing DSA link and a
      downstream-facing DSA link, because the upstream-facing DSA links are
      "host ports" for the SJA1105/SJA1110 switches, and the downstream-facing
      DSA links are "cascade ports".
      
      Note that SJA1105 supports a single cascade port, so only daisy chain
      topologies work. With SJA1110, there can be more complex topologies such
      as:
      
                          eth0
                           |
                       host port
                           |
       sw0p0    sw0p1    sw0p2    sw0p3    sw0p4
         |        |                 |        |
       cascade  cascade            user     user
        port     port              port     port
         |        |
         |        |
         |        |
         |       host
         |       port
         |        |
         |      sw1p0    sw1p1    sw1p2    sw1p3    sw1p4
         |                 |        |        |        |
         |                user     user     user     user
        host              port     port     port     port
        port
         |
       sw2p0    sw2p1    sw2p2    sw2p3    sw2p4
                  |        |        |        |
                 user     user     user     user
                 port     port     port     port
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30a100e6
  13. 02 8月, 2021 6 次提交
    • V
      net: dsa: sja1105: match FDB entries regardless of inner/outer VLAN tag · 47c2c0c2
      Vladimir Oltean 提交于
      On SJA1105P/Q/R/S and SJA1110, the L2 Lookup Table entries contain a
      maskable "inner/outer tag" bit which means:
      - when set to 1: match single-outer and double tagged frames
      - when set to 0: match untagged and single-inner tagged frames
      - when masked off: match all frames regardless of the type of tag
      
      This driver does not make any meaningful distinction between inner tags
      (matches on TPID) and outer tags (matches on TPID2). In fact, all VLAN
      table entries are installed as SJA1110_VLAN_D_TAG, which means that they
      match on both inner and outer tags.
      
      So it does not make sense that we install FDB entries with the IOTAG bit
      set to 1.
      
      In VLAN-unaware mode, we set both TPID and TPID2 to 0xdadb, so the
      switch will see frames as outer-tagged or double-tagged (never inner).
      So the FDB entries will match if IOTAG is set to 1.
      
      In VLAN-aware mode, we set TPID to 0x8100 and TPID2 to 0x88a8. So the
      switch will see untagged and 802.1Q-tagged packets as inner-tagged, and
      802.1ad-tagged packets as outer-tagged. So untagged and 802.1Q-tagged
      packets will not match FDB entries if IOTAG is set to 1, but 802.1ad
      tagged packets will. Strange.
      
      To fix this, simply mask off the IOTAG bit from FDB entries, and make
      them match regardless of whether the VLAN tag is inner or outer.
      
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47c2c0c2
    • V
      net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too · 589918df
      Vladimir Oltean 提交于
      Similar but not quite the same with what was done in commit b11f0a4c
      ("net: dsa: sja1105: be stateless when installing FDB entries") for
      SJA1105E/T, it is desirable to drop the priv->vlan_aware check and
      simply go ahead and install FDB entries in the VLAN that was given by
      the bridge.
      
      As opposed to SJA1105E/T, in SJA1105P/Q/R/S and SJA1110, the FDB is a
      maskable TCAM, and we are installing VLAN-unaware FDB entries with the
      VLAN ID masked off. However, such FDB entries might completely obscure
      VLAN-aware entries where the VLAN ID is included in the search mask,
      because the switch looks up the FDB from left to right and picks the
      first entry which results in a masked match. So it depends on whether
      the bridge installs first the VLAN-unaware or the VLAN-aware FDB entries.
      
      Anyway, if we had a VLAN-unaware FDB entry towards one set of DESTPORTS
      and a VLAN-aware one towards other set of DESTPORTS, the result is that
      the packets in VLAN-aware mode will be forwarded towards the DESTPORTS
      specified by the VLAN-unaware entry.
      
      To solve this, simply do not use the masked matching ability of the FDB
      for VLAN ID, and always match precisely on it. In VLAN-unaware mode, we
      configure the switch for shared VLAN learning, so the VLAN ID will be
      ignored anyway during lookup, so it is redundant to mask it off in the
      TCAM.
      
      This patch conflicts with net-next commit 0fac6aa0 ("net: dsa: sja1105:
      delete the best_effort_vlan_filtering mode") which changed this line:
      	if (priv->vlan_state != SJA1105_VLAN_UNAWARE) {
      into:
      	if (priv->vlan_aware) {
      
      When merging with net-next, the lines added by this patch should take
      precedence in the conflict resolution (i.e. the "if" condition should be
      deleted in both cases).
      
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      589918df
    • V
      net: dsa: sja1105: ignore the FDB entry for unknown multicast when adding a new address · 728db843
      Vladimir Oltean 提交于
      Currently, when sja1105pqrs_fdb_add() is called for a host-joined IPv6
      MDB entry such as 33:33:00:00:00:6a, the search for that address will
      return the FDB entry for SJA1105_UNKNOWN_MULTICAST, which has a
      destination MAC of 01:00:00:00:00:00 and a mask of 01:00:00:00:00:00.
      It returns that entry because, well, it matches, in the sense that
      unknown multicast is supposed by design to match it...
      
      But the issue is that we then proceed to overwrite this entry with the
      one for our precise host-joined multicast address, and the unknown
      multicast entry is no longer there - unknown multicast is now flooded to
      the same group of ports as broadcast, which does not look up the FDB.
      
      To solve this problem, we should ignore searches that return the unknown
      multicast address as the match, and treat them as "no match" which will
      result in the entry being installed to hardware.
      
      For this to work properly, we need to put the result of the FDB search
      in a temporary variable in order to avoid overwriting the l2_lookup
      entry we want to program. The l2_lookup entry returned by the search
      might not have the same set of DESTPORTS and not even the same MACADDR
      as the entry we're trying to add.
      
      Fixes: 4d942354 ("net: dsa: sja1105: offload bridge port flags to device")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      728db843
    • V
      net: dsa: sja1105: invalidate dynamic FDB entries learned concurrently with statically added ones · 6c5fc159
      Vladimir Oltean 提交于
      The procedure to add a static FDB entry in sja1105 is concurrent with
      dynamic learning performed on all bridge ports and the CPU port.
      
      The switch looks up the FDB from left to right, and also learns
      dynamically from left to right, so it is possible that between the
      moment when we pick up a free slot to install an FDB entry, another slot
      to the left of that one becomes free due to an address ageing out, and
      that other slot is then immediately used by the switch to learn
      dynamically the same address as we're trying to add statically.
      
      The result is that we succeeded to add our static FDB entry, but it is
      being shadowed by a dynamic FDB entry to its left, and the switch will
      behave as if our static FDB entry did not exist.
      
      We cannot really prevent this from happening unless we make the entire
      process to add a static FDB entry a huge critical section where address
      learning is temporarily disabled on _all_ ports, and then re-enabled
      according to the configuration done by sja1105_port_set_learning.
      However, that is kind of disruptive for the operation of the network.
      
      What we can do alternatively is to simply read back the FDB for dynamic
      entries located before our newly added static one, and delete them.
      This will guarantee that our static FDB entry is now operational. It
      will still not guarantee that there aren't dynamic FDB entries to the
      _right_ of that static FDB entry, but at least those entries will age
      out by themselves since they aren't hit, and won't bother anyone.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c5fc159
    • V
      net: dsa: sja1105: overwrite dynamic FDB entries with static ones in .port_fdb_add · e11e865b
      Vladimir Oltean 提交于
      The SJA1105 switch family leaves it up to software to decide where
      within the FDB to install a static entry, and to concatenate destination
      ports for already existing entries (the FDB is also used for multicast
      entries), it is not as simple as just saying "please add this entry".
      
      This means we first need to search for an existing FDB entry before
      adding a new one. The driver currently manages to fool itself into
      thinking that if an FDB entry already exists, there is nothing to be
      done. But that FDB entry might be dynamically learned, case in which it
      should be replaced with a static entry, but instead it is left alone.
      
      This patch checks the LOCKEDS ("locked/static") bit from found FDB
      entries, and lets the code "goto skip_finding_an_index;" if the FDB
      entry was not static. So we also need to move the place where we set
      LOCKEDS = true, to cover the new case where a dynamic FDB entry existed
      but was dynamic.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e11e865b
    • V
      net: dsa: sja1105: fix static FDB writes for SJA1110 · cb81698f
      Vladimir Oltean 提交于
      The blamed commit made FDB access on SJA1110 functional only as far as
      dumping the existing entries goes, but anything having to do with an
      entry's index (adding, deleting) is still broken.
      
      There are in fact 2 problems, all caused by improperly inheriting the
      code from SJA1105P/Q/R/S:
      - An entry size is SJA1110_SIZE_L2_LOOKUP_ENTRY (24) bytes and not
        SJA1105PQRS_SIZE_L2_LOOKUP_ENTRY (20) bytes
      - The "index" field within an FDB entry is at bits 10:1 for SJA1110 and
        not 15:6 as in SJA1105P/Q/R/S
      
      This patch moves the packing function for the cmd->index outside of
      sja1105pqrs_common_l2_lookup_cmd_packing() and into the device specific
      functions sja1105pqrs_l2_lookup_cmd_packing and
      sja1110_l2_lookup_cmd_packing.
      
      Fixes: 74e7feff ("net: dsa: sja1105: fix dynamic access to L2 Address Lookup table for SJA1110")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb81698f
  14. 29 7月, 2021 3 次提交
    • V
      net: dsa: sja1105: make sure untagged packets are dropped on ingress ports with no pvid · bef0746c
      Vladimir Oltean 提交于
      Surprisingly, this configuration:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp2 master br0
      bridge vlan del dev swp2 vid 1
      
      still has the sja1105 switch sending untagged packets to the CPU (and
      failing to decode them, since dsa_find_designated_bridge_port_by_vid
      searches by VID 1 and rightfully finds no bridge VLAN 1 on a port).
      
      Dumping the switch configuration, the VLANs are managed properly:
      - the pvid of swp2 is 1 in the MAC Configuration Table, but
      - only the CPU port is in the port membership of VLANID 1 in the VLAN
        Lookup Table
      
      When the ingress packets are tagged with VID 1, they are properly
      dropped. But when they are untagged, they are able to reach the CPU
      port. Also, when the pvid in the MAC Configuration Table is changed to
      e.g. 55 (an unused VLAN), the untagged packets are also dropped.
      
      So it looks like:
      - the switch bypasses ingress VLAN membership checks for untagged traffic
      - the reason why the untagged traffic is dropped when I make the pvid 55
        is due to the lack of valid destination ports in VLAN 55, rather than
        an ingress membership violation
      - the ingress VLAN membership cheks are only done for VLAN-tagged traffic
      
      Interesting. It looks like there is an explicit bit to drop untagged
      traffic, so we should probably be using that to preserve user expectations.
      
      Note that only VLAN-aware ports should drop untagged packets due to no
      pvid - when VLAN-unaware, the software bridge doesn't do this even if
      there is no pvid on any bridge port and on the bridge itself. So the new
      sja1105_drop_untagged() function cannot simply be called with "false"
      from sja1105_bridge_vlan_add() and with "true" from sja1105_bridge_vlan_del.
      Instead, we need to also consider the VLAN awareness state. That means
      we need to hook the "drop untagged" setting in all the same places where
      the "commit pvid" logic is, and it needs to factor in all the state when
      flipping the "drop untagged" bit: is our current pvid in the VLAN Lookup
      Table, and is the current port in that VLAN's port membership list?
      VLAN-unaware ports will never drop untagged frames because these checks
      always succeed by construction, and the tag_8021q VLANs cannot be changed
      by the user.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bef0746c
    • V
      net: dsa: sja1105: reset the port pvid when leaving a VLAN-aware bridge · cde8078e
      Vladimir Oltean 提交于
      Now that we no longer have the ultra-central sja1105_build_vlan_table(),
      we need to be more careful about checking all corner cases manually.
      
      For example, when a port leaves a VLAN-aware bridge, it becomes
      standalone so its pvid should become a tag_8021q RX VLAN again. However,
      sja1105_commit_pvid() only gets called from sja1105_bridge_vlan_add()
      and from sja1105_vlan_filtering(), and no VLAN awareness change takes
      place (VLAN filtering is a global setting for sja1105, so the switch
      remains VLAN-aware overall).
      
      This means that we need to put another sja1105_commit_pvid() call in
      sja1105_bridge_member().
      
      Fixes: 6dfd23d3 ("net: dsa: sja1105: delete vlan delta save/restore logic")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cde8078e
    • V
      net: dsa: sja1105: be stateless when installing FDB entries · b11f0a4c
      Vladimir Oltean 提交于
      Currently there are issues when adding a bridge FDB entry as VLAN-aware
      and deleting it as VLAN-unaware, or vice versa.
      
      However this is an unneeded complication, since the bridge always
      installs its default FDB entries in VLAN 0 to match on VLAN-unaware
      ports, and in the default_pvid (VLAN 1) to match on VLAN-aware ports.
      So instead of trying to outsmart the bridge, just install all entries it
      gives us, and they will start matching packets when the vlan_filtering
      mode changes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b11f0a4c
  15. 27 7月, 2021 5 次提交
    • V
      net: dsa: sja1105: add bridge TX data plane offload based on tag_8021q · b6ad86e6
      Vladimir Oltean 提交于
      The main desire for having this feature in sja1105 is to support network
      stack termination for traffic coming from a VLAN-aware bridge.
      
      For sja1105, offloading the bridge data plane means sending packets
      as-is, with the proper VLAN tag, to the chip. The chip will look up its
      FDB and forward them to the correct destination port.
      
      But we support bridge data plane offload even for VLAN-unaware bridges,
      and the implementation there is different. In fact, VLAN-unaware
      bridging is governed by tag_8021q, so it makes sense to have the
      .bridge_fwd_offload_add() implementation fully within tag_8021q.
      The key difference is that we only support 1 VLAN-aware bridge, but we
      support multiple VLAN-unaware bridges. So we need to make sure that the
      forwarding domain is not crossed by packets injected from the stack.
      
      For this, we introduce the concept of a tag_8021q TX VLAN for bridge
      forwarding offload. As opposed to the regular TX VLANs which contain
      only 2 ports (the user port and the CPU port), a bridge data plane TX
      VLAN is "multicast" (or "imprecise"): it contains all the ports that are
      part of a certain bridge, and the hardware will select where the packet
      goes within this "imprecise" forwarding domain.
      
      Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use
      of the unique "bridge_num" provided by DSA for the data plane offload.
      We use the same 3 bits from the tag_8021q VLAN ID format to encode this
      bridge number.
      
      Note that these 3 bit positions have been used before for sub-VLANs in
      best-effort VLAN filtering mode. The difference is that for best-effort,
      the sub-VLANs were only valid on RX (and it was documented that the
      sub-VLAN field needed to be transmitted as zero). Whereas for the bridge
      data plane offload, these 3 bits are only valid on TX.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6ad86e6
    • V
      net: dsa: sja1105: add support for imprecise RX · 884be12f
      Vladimir Oltean 提交于
      This is already common knowledge by now, but the sja1105 does not have
      hardware support for DSA tagging for data plane packets, and tag_8021q
      sets up a unique pvid per port, transmitted as VLAN-tagged towards the
      CPU, for the source port to be decoded nonetheless.
      
      When the port is part of a VLAN-aware bridge, the pvid committed to
      hardware is taken from the bridge and not from tag_8021q, so we need to
      work with that the best we can.
      
      Configure the switches to send all packets to the CPU as VLAN-tagged
      (even ones that were originally untagged on the wire) and make use of
      dsa_untag_bridge_pvid() to get rid of it before we send those packets up
      the network stack.
      
      With the classified VLAN used by hardware known to the tagger, we first
      peek at the VID in an attempt to figure out if the packet was received
      from a VLAN-unaware port (standalone or under a VLAN-unaware bridge),
      case in which we can continue to call dsa_8021q_rcv(). If that is not
      the case, the packet probably came from a VLAN-aware bridge. So we call
      the DSA helper that finds for us a "designated bridge port" - one that
      is a member of the VLAN ID from the packet, and is in the proper STP
      state - basically these are all checks performed by br_handle_frame() in
      the software RX data path.
      
      The bridge will accept the packet as valid even if the source port was
      maybe wrong. So it will maybe learn the MAC SA of the packet on the
      wrong port, and its software FDB will be out of sync with the hardware
      FDB. So replies towards this same MAC DA will not work, because the
      bridge will send towards a different netdev.
      
      This is where the bridge data plane offload ("imprecise TX") added by
      the next patch comes in handy. The software FDB is wrong, true, but the
      hardware FDB isn't, and by offloading the bridge forwarding plane we
      have a chance to right a wrong, and have the hardware look up the FDB
      for us for the reply packet. So it all cancels out.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      884be12f
    • V
      net: dsa: sja1105: deny more than one VLAN-aware bridge · 19fa937a
      Vladimir Oltean 提交于
      With tag_sja1105.c's only ability being to perform an imprecise RX
      procedure and identify whether a packet comes from a VLAN-aware bridge
      or not, we have no way to determine whether a packet with VLAN ID 5
      comes from, say, br0 or br1. Actually we could, but it would mean that
      we need to restrict all VLANs from br0 to be different from all VLANs
      from br1, and this includes the default_pvid, which makes a setup with 2
      VLAN-aware bridges highly imprectical.
      
      The fact of the matter is that this isn't even that big of a practical
      limitation, since even with a single VLAN-aware bridge we can pretty
      much enforce forwarding isolation based on the VLAN port membership.
      
      So in the end, tell the user that they need to model their setup using a
      single VLAN-aware bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19fa937a
    • V
      net: dsa: sja1105: deny 8021q uppers on ports · 4fbc08bd
      Vladimir Oltean 提交于
      Now that best-effort VLAN filtering is gone and we are left with the
      imprecise RX and imprecise TX based in VLAN-aware mode, where the tagger
      just guesses the source port based on plausibility of the VLAN ID, 8021q
      uppers installed on top of a standalone port, while other ports of that
      switch are under a VLAN-aware bridge don't quite "just work".
      
      In fact it could be possible to restrict the VLAN IDs used by the 8021q
      uppers to not be shared with VLAN IDs used by that VLAN-aware bridge,
      but then the tagger needs to be patched to search for 8021q uppers too,
      not just for the "designated bridge port" which will be introduced in a
      later patch.
      
      I haven't given a possible implementation full thought, it seems maybe
      possible but not worth the effort right now. The only certain thing is
      that currently the tagger won't be able to figure out the source port
      for these packets because they will come with the VLAN ID of the 8021q
      upper and are no longer retagged to a tag_8021q sub-VLAN like the best
      effort VLAN filtering code used to do. So just deny these for the
      moment.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fbc08bd
    • V
      net: dsa: sja1105: delete vlan delta save/restore logic · 6dfd23d3
      Vladimir Oltean 提交于
      With the best_effort_vlan_filtering mode now gone, the driver does not
      have 3 operating modes anymore (VLAN-unaware, VLAN-aware and best effort),
      but only 2.
      
      The idea is that we will gain support for network stack I/O through a
      VLAN-aware bridge, using the data plane offload framework (imprecise RX,
      imprecise TX). So the VLAN-aware use case will be more functional.
      
      But standalone ports that are part of the same switch when some other
      ports are under a VLAN-aware bridge should work too. Termination on
      those should work through the tag_8021q RX VLAN and TX VLAN.
      
      This was not possible using the old logic, because:
      - in VLAN-unaware mode, only the tag_8021q VLANs were committed to hw
      - in VLAN-aware mode, only the bridge VLANs were committed to hw
      - in best-effort VLAN mode, both the tag_8021q and bridge VLANs were
        committed to hw
      
      The strategy for the new VLAN-aware mode is to allow the bridge and the
      tag_8021q VLANs to coexist in the VLAN table at the same time.
      
      [ yes, we need to make sure that the bridge cannot install a tag_8021q
        VLAN, but ]
      
      This means that the save/restore logic introduced by commit ec5ae610
      ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
      does not serve a purpose any longer. We can delete it and restore the
      old code that simply adds a VLAN to the VLAN table and calls it a day.
      
      Note that we keep the sja1105_commit_pvid() function from those days,
      but adapt it slightly. Ports that are under a VLAN-aware bridge use the
      bridge's pvid, ports that are standalone or under a VLAN-unaware bridge
      use the tag_8021q pvid, for local termination or VLAN-unaware forwarding.
      
      Now, when the vlan_filtering property is toggled for the bridge, the
      pvid of the ports beneath it is the only thing that's changing, we no
      longer delete some VLANs and restore others.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dfd23d3