1. 09 10月, 2021 1 次提交
    • V
      net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware · 8b6836d8
      Vladimir Oltean 提交于
      The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a679
      ("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
      some issues with VLAN and decided the best way to deal with them was to
      make the DSA core ignore VLANs added by the bridge while VLAN awareness
      is turned off. Those issues were never explained, just presented as
      "at least one corner case".
      
      That approach had problems of its own, presented by
      commit 54a0ed0d ("net: dsa: provide an option for drivers to always
      receive bridge VLANs") for the DSA core, followed by
      commit 1fb74191 ("net: dsa: mv88e6xxx: fix vlan setup") which
      applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
      particular.
      
      We still don't know what corner case Andrew saw when he wrote
      commit 2ea7a679 ("net: dsa: Don't add vlans when vlan filtering is
      disabled"), but Tobias now reports that when we use TX forwarding
      offload, pinging an external station from the bridge device is broken if
      the front-facing DSA user port has flooding turned off. The full
      description is in the link below, but for short, when a mv88e6xxx port
      is under a VLAN-unaware bridge, it inherits that bridge's pvid.
      So packets ingressing a user port will be classified to e.g. VID 1
      (assuming that value for the bridge_default_pvid), whereas when
      tag_dsa.c xmits towards a user port, it always sends packets using a VID
      of 0 if that port is standalone or under a VLAN-unaware bridge - or at
      least it did so prior to commit d82f8ab0 ("net: dsa: tag_dsa:
      offload the bridge forwarding process").
      
      In any case, when there is a conversation between the CPU and a station
      connected to a user port, the station's MAC address is learned in VID 1
      but the CPU tries to transmit through VID 0. The packets reach the
      intended station, but via flooding and not by virtue of matching the
      existing ATU entry.
      
      DSA has established (and enforced in other drivers: sja1105, felix,
      mt7530) that a VLAN-unaware port should use a private pvid, and not
      inherit the one from the bridge. The bridge's pvid should only be
      inherited when that bridge is VLAN-aware, so all state transitions need
      to be handled. On the other hand, all bridge VLANs should sit in the VTU
      starting with the moment when the bridge offloads them via switchdev,
      they are just not used.
      
      This solves the problem that Tobias sees because packets ingressing on
      VLAN-unaware user ports now get classified to VID 0, which is also the
      VID used by tag_dsa.c on xmit.
      
      Fixes: d82f8ab0 ("net: dsa: tag_dsa: offload the bridge forwarding process")
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003222312.284175-2-vladimir.oltean@nxp.com/#24491503Reported-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      8b6836d8
  2. 27 9月, 2021 3 次提交
  3. 21 9月, 2021 1 次提交
    • V
      net: dsa: realtek: register the MDIO bus under devres · 74b6d7d1
      Vladimir Oltean 提交于
      The Linux device model permits both the ->shutdown and ->remove driver
      methods to get called during a shutdown procedure. Example: a DSA switch
      which sits on an SPI bus, and the SPI bus driver calls this on its
      ->shutdown method:
      
      spi_unregister_controller
      -> device_for_each_child(&ctlr->dev, NULL, __unregister);
         -> spi_unregister_device(to_spi_device(dev));
            -> device_del(&spi->dev);
      
      So this is a simple pattern which can theoretically appear on any bus,
      although the only other buses on which I've been able to find it are
      I2C:
      
      i2c_del_adapter
      -> device_for_each_child(&adap->dev, NULL, __unregister_client);
         -> i2c_unregister_device(client);
            -> device_unregister(&client->dev);
      
      The implication of this pattern is that devices on these buses can be
      unregistered after having been shut down. The drivers for these devices
      might choose to return early either from ->remove or ->shutdown if the
      other callback has already run once, and they might choose that the
      ->shutdown method should only perform a subset of the teardown done by
      ->remove (to avoid unnecessary delays when rebooting).
      
      So in other words, the device driver may choose on ->remove to not
      do anything (therefore to not unregister an MDIO bus it has registered
      on ->probe), because this ->remove is actually triggered by the
      device_shutdown path, and its ->shutdown method has already run and done
      the minimally required cleanup.
      
      This used to be fine until the blamed commit, but now, the following
      BUG_ON triggers:
      
      void mdiobus_free(struct mii_bus *bus)
      {
      	/* For compatibility with error handling in drivers. */
      	if (bus->state == MDIOBUS_ALLOCATED) {
      		kfree(bus);
      		return;
      	}
      
      	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
      	bus->state = MDIOBUS_RELEASED;
      
      	put_device(&bus->dev);
      }
      
      In other words, there is an attempt to free an MDIO bus which was not
      unregistered. The attempt to free it comes from the devres release
      callbacks of the SPI device, which are executed after the device is
      unregistered.
      
      I'm not saying that the fact that MDIO buses allocated using devres
      would automatically get unregistered wasn't strange. I'm just saying
      that the commit didn't care about auditing existing call paths in the
      kernel, and now, the following code sequences are potentially buggy:
      
      (a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
          located on a bus that unregisters its children on shutdown. After
          the blamed patch, either both the alloc and the register should use
          devres, or none should.
      
      (b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
          mdiobus_unregister at all in the remove path. After the blamed
          patch, nobody unregisters the MDIO bus anymore, so this is even more
          buggy than the previous case which needs a specific bus
          configuration to be seen, this one is an unconditional bug.
      
      In this case, the Realtek drivers fall under category (b). To solve it,
      we can register the MDIO bus under devres too, which restores the
      previous behavior.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Reported-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74b6d7d1
  4. 19 9月, 2021 5 次提交
    • V
      net: dsa: tear down devlink port regions when tearing down the devlink port on error · fd292c18
      Vladimir Oltean 提交于
      Commit 86f8b1c0 ("net: dsa: Do not make user port errors fatal")
      decided it was fine to ignore errors on certain ports that fail to
      probe, and go on with the ports that do probe fine.
      
      Commit fb6ec87f ("net: dsa: Fix type was not set for devlink port")
      noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get
      called, and devlink notices after a timeout of 3600 seconds and prints a
      WARN_ON. So it went ahead to unregister the devlink port. And because
      there exists an UNUSED port flavour, we actually re-register the devlink
      port as UNUSED.
      
      Commit 08156ba4 ("net: dsa: Add devlink port regions support to
      DSA") added devlink port regions, which are set up by the driver and not
      by DSA.
      
      When we trigger the devlink port deregistration and reregistration as
      unused, devlink now prints another WARN_ON, from here:
      
      devlink_port_unregister:
      	WARN_ON(!list_empty(&devlink_port->region_list));
      
      So the port still has regions, which makes sense, because they were set
      up by the driver, and the driver doesn't know we're unregistering the
      devlink port.
      
      Somebody needs to tear them down, and optionally (actually it would be
      nice, to be consistent) set them up again for the new devlink port.
      
      But DSA's layering stays in our way quite badly here.
      
      The options I've considered are:
      
      1. Introduce a function in devlink to just change a port's type and
         flavour. No dice, devlink keeps a lot of state, it really wants the
         port to not be registered when you set its parameters, so changing
         anything can only be done by destroying what we currently have and
         recreating it.
      
      2. Make DSA cache the parameters passed to dsa_devlink_port_region_create,
         and the region returned, keep those in a list, then when the devlink
         port unregister needs to take place, the existing devlink regions are
         destroyed by DSA, and we replay the creation of new regions using the
         cached parameters. Problem: mv88e6xxx keeps the region pointers in
         chip->ports[port].region, and these will remain stale after DSA frees
         them. There are many things DSA can do, but updating mv88e6xxx's
         private pointers is not one of them.
      
      3. Just let the driver do it (i.e. introduce a very specific method
         called ds->ops->port_reinit_as_unused, which unregisters its devlink
         port devlink regions, then the old devlink port, then registers the
         new one, then the devlink port regions for it). While it does work,
         as opposed to the others, it's pretty horrible from an API
         perspective and we can do better.
      
      4. Introduce a new pair of methods, ->port_setup and ->port_teardown,
         which in the case of mv88e6xxx must register and unregister the
         devlink port regions. Call these 2 methods when the port must be
         reinitialized as unused.
      
      Naturally, I went for the 4th approach.
      
      Fixes: 08156ba4 ("net: dsa: Add devlink port regions support to DSA")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd292c18
    • V
      net: dsa: xrs700x: be compatible with masters which unregister on shutdown · a68e9da4
      Vladimir Oltean 提交于
      Since commit 2f1e8ea7 ("net: dsa: link interfaces with the DSA
      master to get rid of lockdep warnings"), DSA gained a requirement which
      it did not fulfill, which is to unlink itself from the DSA master at
      shutdown time.
      
      Since the Arrow SpeedChips XRS700x driver was introduced after the bad
      commit, it has never worked with DSA masters which decide to unregister
      their net_device on shutdown, effectively hanging the reboot process.
      To fix that, we need to call dsa_switch_shutdown.
      
      These devices can be connected by I2C or by MDIO, and if I search for
      I2C or MDIO bus drivers that implement their ->shutdown by redirecting
      it to ->remove I don't see any, however this does not mean it would not
      be possible. To be compatible with that pattern, it is necessary to
      implement an "if this then not that" scheme, to avoid ->remove and
      ->shutdown from being called both for the same struct device.
      
      Fixes: ee00b24f ("net: dsa: add Arrow SpeedChips XRS700x driver")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NGeorge McCollister <george.mccollister@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a68e9da4
    • V
      net: dsa: microchip: ksz8863: be compatible with masters which unregister on shutdown · fe405307
      Vladimir Oltean 提交于
      Since commit 2f1e8ea7 ("net: dsa: link interfaces with the DSA
      master to get rid of lockdep warnings"), DSA gained a requirement which
      it did not fulfill, which is to unlink itself from the DSA master at
      shutdown time.
      
      Since the Microchip sub-driver for KSZ8863 was introduced after the bad
      commit, it has never worked with DSA masters which decide to unregister
      their net_device on shutdown, effectively hanging the reboot process.
      To fix that, we need to call dsa_switch_shutdown.
      
      Since this driver expects the MDIO bus to be backed by mdio_bitbang, I
      don't think there is currently any MDIO bus driver which implements its
      ->shutdown by redirecting it to ->remove, but in any case, to be
      compatible with that pattern, it is necessary to implement an "if this
      then not that" scheme, to avoid ->remove and ->shutdown from being
      called both for the same struct device.
      
      Fixes: 60a36476 ("net: dsa: microchip: Add Microchip KSZ8863 SMI based driver support")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe405307
    • V
      net: dsa: hellcreek: be compatible with masters which unregister on shutdown · 46baae56
      Vladimir Oltean 提交于
      Since commit 2f1e8ea7 ("net: dsa: link interfaces with the DSA
      master to get rid of lockdep warnings"), DSA gained a requirement which
      it did not fulfill, which is to unlink itself from the DSA master at
      shutdown time.
      
      Since the hellcreek driver was introduced after the bad commit, it has
      never worked with DSA masters which decide to unregister their
      net_device on shutdown, effectively hanging the reboot process.
      
      Hellcreek is a platform device driver, so we probably cannot have the
      oddities of ->shutdown and ->remove getting both called for the exact
      same struct device. But to be in line with the pattern from the other
      device drivers which are on slow buses, implement the same "if this then
      not that" pattern of either running the ->shutdown or the ->remove hook.
      The driver's current ->remove implementation makes that very easy
      because it already zeroes out its device_drvdata on ->remove.
      
      Fixes: e4b27ebc ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NKurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46baae56
    • V
      net: dsa: be compatible with masters which unregister on shutdown · 0650bf52
      Vladimir Oltean 提交于
      Lino reports that on his system with bcmgenet as DSA master and KSZ9897
      as a switch, rebooting or shutting down never works properly.
      
      What does the bcmgenet driver have special to trigger this, that other
      DSA masters do not? It has an implementation of ->shutdown which simply
      calls its ->remove implementation. Otherwise said, it unregisters its
      network interface on shutdown.
      
      This message can be seen in a loop, and it hangs the reboot process there:
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 3
      
      So why 3?
      
      A usage count of 1 is normal for a registered network interface, and any
      virtual interface which links itself as an upper of that will increment
      it via dev_hold. In the case of DSA, this is the call path:
      
      dsa_slave_create
      -> netdev_upper_dev_link
         -> __netdev_upper_dev_link
            -> __netdev_adjacent_dev_insert
               -> dev_hold
      
      So a DSA switch with 3 interfaces will result in a usage count elevated
      by two, and netdev_wait_allrefs will wait until they have gone away.
      
      Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
      delete themselves, but DSA cannot just vanish and go poof, at most it
      can unbind itself from the switch devices, but that must happen strictly
      earlier compared to when the DSA master unregisters its net_device, so
      reacting on the NETDEV_UNREGISTER event is way too late.
      
      It seems that it is a pretty established pattern to have a driver's
      ->shutdown hook redirect to its ->remove hook, so the same code is
      executed regardless of whether the driver is unbound from the device, or
      the system is just shutting down. As Florian puts it, it is quite a big
      hammer for bcmgenet to unregister its net_device during shutdown, but
      having a common code path with the driver unbind helps ensure it is well
      tested.
      
      So DSA, for better or for worse, has to live with that and engage in an
      arms race of implementing the ->shutdown hook too, from all individual
      drivers, and do something sane when paired with masters that unregister
      their net_device there. The only sane thing to do, of course, is to
      unlink from the master.
      
      However, complications arise really quickly.
      
      The pattern of redirecting ->shutdown to ->remove is not unique to
      bcmgenet or even to net_device drivers. In fact, SPI controllers do it
      too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
      and MDIO controllers do it too (this is something I have not researched
      too deeply, but even if this is not the case today, it is certainly
      plausible to happen in the future, and must be taken into consideration).
      
      Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
      insane implication is that for the exact same DSA switch device, we
      might have both ->shutdown and ->remove getting called.
      
      So we need to do something with that insane environment. The pattern
      I've come up with is "if this, then not that", so if either ->shutdown
      or ->remove gets called, we set the device's drvdata to NULL, and in the
      other hook, we check whether the drvdata is NULL and just do nothing.
      This is probably not necessary for platform devices, just for devices on
      buses, but I would really insist for consistency among drivers, because
      when code is copy-pasted, it is not always copy-pasted from the best
      sources.
      
      So depending on whether the DSA switch's ->remove or ->shutdown will get
      called first, we cannot really guarantee even for the same driver if
      rebooting will result in the same code path on all platforms. But
      nonetheless, we need to do something minimally reasonable on ->shutdown
      too to fix the bug. Of course, the ->remove will do more (a full
      teardown of the tree, with all data structures freed, and this is why
      the bug was not caught for so long). The new ->shutdown method is kept
      separate from dsa_unregister_switch not because we couldn't have
      unregistered the switch, but simply in the interest of doing something
      quick and to the point.
      
      The big question is: does the DSA switch's ->shutdown get called earlier
      than the DSA master's ->shutdown? If not, there is still a risk that we
      might still trigger the WARN_ON in unregister_netdevice that says we are
      attempting to unregister a net_device which has uppers. That's no good.
      Although the reference to the master net_device won't physically go away
      even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
      on it.
      
      The answer to that question lies in this comment above device_link_add:
      
       * A side effect of the link creation is re-ordering of dpm_list and the
       * devices_kset list by moving the consumer device and all devices depending
       * on it to the ends of these lists (that does not happen to devices that have
       * not been registered when this function is called).
      
      so the fact that DSA uses device_link_add towards its master is not
      exactly for nothing. device_shutdown() walks devices_kset from the back,
      so this is our guarantee that DSA's shutdown happens before the master's
      shutdown.
      
      Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0650bf52
  5. 17 9月, 2021 2 次提交
  6. 13 9月, 2021 1 次提交
  7. 12 9月, 2021 1 次提交
  8. 06 9月, 2021 1 次提交
    • R
      net: dsa: b53: Fix IMP port setup on BCM5301x · 63f8428b
      Rafał Miłecki 提交于
      Broadcom's b53 switches have one IMP (Inband Management Port) that needs
      to be programmed using its own designed register. IMP port may be
      different than CPU port - especially on devices with multiple CPU ports.
      
      For that reason it's required to explicitly note IMP port index and
      check for it when choosing a register to use.
      
      This commit fixes BCM5301x support. Those switches use CPU port 5 while
      their IMP port is 8. Before this patch b53 was trying to program port 5
      with B53_PORT_OVERRIDE_CTRL instead of B53_GMII_PORT_OVERRIDE_CTRL(5).
      
      It may be possible to also replace "cpu_port" usages with
      dsa_is_cpu_port() but that is out of the scope of thix BCM5301x fix.
      
      Fixes: 967dd82f ("net: dsa: b53: Add support for Broadcom RoboSwitch")
      Signed-off-by: NRafał Miłecki <rafal@milecki.pl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63f8428b
  9. 02 9月, 2021 3 次提交
  10. 26 8月, 2021 2 次提交
  11. 25 8月, 2021 4 次提交
    • V
      net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid · 8ded9160
      Vladimir Oltean 提交于
      Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger
      for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
      function solved quite a different problem than our needs are now.
      
      Then, we used best-effort VLAN filtering and we were using the xmit_tpid
      to tunnel packets coming from an 8021q upper through the TX VLAN allocated
      by tag_8021q to that egress port. The need for a different VLAN protocol
      depending on switch revision came from the fact that this in itself was
      more of a hack to trick the hardware into accepting tunneled VLANs in
      the first place.
      
      Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
      supported them again, we would not do that using the same method of
      {tunneling the VLAN on egress, retagging the VLAN on ingress} that we
      had in the best-effort VLAN filtering mode. It seems rather simpler that
      we just allocate a VLAN in the VLAN table that is simply not used by the
      bridge at all, or by any other port.
      
      Anyway, I have 2 gripes with the current sja1105_xmit_tpid:
      
      1. When sending packets on behalf of a VLAN-aware bridge (with the new
         TX forwarding offload framework) plus untagged (with the tag_8021q
         VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
         and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
         through the DSA master have a VLAN protocol of 0x8100 and others of
         0x88a8. This is strange and there is no reason for it now. If we have
         a bridge and are therefore forced to send using that bridge's TPID,
         we can as well blend with that bridge's VLAN protocol for all packets.
      
      2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
         because it looks inside dp->priv. It is desirable to keep as much
         separation between taggers and switch drivers as possible. Now it
         doesn't do that anymore.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ded9160
    • V
      net: dsa: sja1105: drop untagged packets on the CPU and DSA ports · b0b8c67e
      Vladimir Oltean 提交于
      The sja1105 driver is a bit special in its use of VLAN headers as DSA
      tags. This is because in VLAN-aware mode, the VLAN headers use an actual
      TPID of 0x8100, which is understood even by the DSA master as an actual
      VLAN header.
      
      Furthermore, control packets such as PTP and STP are transmitted with no
      VLAN header as a DSA tag, because, depending on switch generation, there
      are ways to steer these control packets towards a precise egress port
      other than VLAN tags. Transmitting control packets as untagged means
      leaving a door open for traffic in general to be transmitted as untagged
      from the DSA master, and for it to traverse the switch and exit a random
      switch port according to the FDB lookup.
      
      This behavior is a bit out of line with other DSA drivers which have
      native support for DSA tagging. There, it is to be expected that the
      switch only accepts DSA-tagged packets on its CPU port, dropping
      everything that does not match this pattern.
      
      We perhaps rely a bit too much on the switches' hardware dropping on the
      CPU port, and place no other restrictions in the kernel data path to
      avoid that. For example, sja1105 is also a bit special in that STP/PTP
      packets are transmitted using "management routes"
      (sja1105_port_deferred_xmit): when sending a link-local packet from the
      CPU, we must first write a SPI message to the switch to tell it to
      expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
      it towards port 3 when it gets it. This entry expires as soon as it
      matches a packet received by the switch, and it needs to be reinstalled
      for the next packet etc. All in all quite a ghetto mechanism, but it is
      all that the sja1105 switches offer for injecting a control packet.
      The driver takes a mutex for serializing control packets and making the
      pairs of SPI writes of a management route and its associated skb atomic,
      but to be honest, a mutex is only relevant as long as all parties agree
      to take it. With the DSA design, it is possible to open an AF_PACKET
      socket on the DSA master net device, and blast packets towards
      01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
      it all goes kaput because management routes installed by the driver will
      match skbs sent by the DSA master, and not skbs generated by the driver
      itself. So they will end up being routed on the wrong port.
      
      So through the lens of that, maybe it would make sense to avoid that
      from happening by doing something in the network stack, like: introduce
      a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
      dev_hard_start_xmit(), introduce the following check:
      
      	if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
      		kfree_skb(skb);
      
      Ok, maybe that is a bit drastic, but that would at least prevent a bunch
      of problems. For example, right now, even though the majority of DSA
      switches drop packets without DSA tags sent by the DSA master (and
      therefore the majority of garbage that user space daemons like avahi and
      udhcpcd and friends create), it is still conceivable that an aggressive
      user space program can open an AF_PACKET socket and inject a spoofed DSA
      tag directly on the DSA master. We have no protection against that; the
      packet will be understood by the switch and be routed wherever user
      space says. Furthermore: there are some DSA switches where we even have
      register access over Ethernet, using DSA tags. So even user space
      drivers are possible in this way. This is a huge hole.
      
      However, the biggest thing that bothers me is that udhcpcd attempts to
      ask for an IP address on all interfaces by default, and with sja1105, it
      will attempt to get a valid IP address on both the DSA master as well as
      on sja1105 switch ports themselves. So with IP addresses in the same
      subnet on multiple interfaces, the routing table will be messed up and
      the system will be unusable for traffic until it is configured manually
      to not ask for an IP address on the DSA master itself.
      
      It turns out that it is possible to avoid that in the sja1105 driver, at
      least very superficially, by requesting the switch to drop VLAN-untagged
      packets on the CPU port. With the exception of control packets, all
      traffic originated from tag_sja1105.c is already VLAN-tagged, so only
      STP and PTP packets need to be converted. For that, we need to uphold
      the equivalence between an untagged and a pvid-tagged packet, and to
      remember that the CPU port of sja1105 uses a pvid of 4095.
      
      Now that we drop untagged traffic on the CPU port, non-aggressive user
      space applications like udhcpcd stop bothering us, and sja1105 effectively
      becomes just as vulnerable to the aggressive kind of user space programs
      as other DSA switches are (ok, users can also create 8021q uppers on top
      of the DSA master in the case of sja1105, but in future patches we can
      easily deny that, but it still doesn't change the fact that VLAN-tagged
      packets can still be injected over raw sockets).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0b8c67e
    • V
      net: dsa: sja1105: prevent tag_8021q VLANs from being received on user ports · 73ceab83
      Vladimir Oltean 提交于
      Currently it is possible for an attacker to craft packets with a fake
      DSA tag and send them to us, and our user ports will accept them and
      preserve that VLAN when transmitting towards the CPU. Then the tagger
      will be misled into thinking that the packets came on a different port
      than they really came on.
      
      Up until recently there wasn't a good option to prevent this from
      happening. In SJA1105P and later, the MAC Configuration Table introduced
      two options called:
      - DRPSITAG: Drop Single Inner Tagged Frames
      - DRPSOTAG: Drop Single Outer Tagged Frames
      
      Because the sja1105 driver classifies all VLANs as "outer VLANs" (S-Tags),
      it would be in principle possible to enable the DRPSOTAG bit on ports
      using tag_8021q, and drop on ingress all packets which have a VLAN tag.
      When the switch is VLAN-unaware, this works, because it uses a custom
      TPID of 0xdadb, so any "tagged" packets received on a user port are
      probably a spoofing attempt. But when the switch overall is VLAN-aware,
      and some ports are standalone (therefore they use tag_8021q), the TPID
      is 0x8100, and the port can receive a mix of untagged and VLAN-tagged
      packets. The untagged ones will be classified to the tag_8021q pvid, and
      the tagged ones to the VLAN ID from the packet header. Yes, it is true
      that since commit 4fbc08bd ("net: dsa: sja1105: deny 8021q uppers on
      ports") we no longer support this mixed mode, but that is a temporary
      limitation which will eventually be lifted. It would be nice to not
      introduce one more restriction via DRPSOTAG, which would make the
      standalone ports of a VLAN-aware switch drop genuinely VLAN-tagged
      packets.
      
      Also, the DRPSOTAG bit is not available on the first generation of
      switches (SJA1105E, SJA1105T). So since one of the key features of this
      driver is compatibility across switch generations, this makes it an even
      less desirable approach.
      
      The breakthrough comes from commit bef0746c ("net: dsa: sja1105:
      make sure untagged packets are dropped on ingress ports with no pvid"),
      where it became obvious that untagged packets are not dropped even if
      the ingress port is not in the VMEMB_PORT vector of that port's pvid.
      However, VLAN-tagged packets are subject to VLAN ingress
      checking/dropping. This means that instead of using the catch-all
      DRPSOTAG bit introduced in SJA1105P, we can drop tagged packets on a
      per-VLAN basis, and this is already compatible with SJA1105E/T.
      
      This patch adds an "allowed_ingress" argument to sja1105_vlan_add(), and
      we call it with "false" for tag_8021q VLANs on user ports. The tag_8021q
      VLANs still need to be allowed, of course, on ingress to DSA ports and
      CPU ports.
      
      We also need to refine the drop_untagged check in sja1105_commit_pvid to
      make it not freak out about this new configuration. Currently it will
      try to keep the configuration consistent between untagged and pvid-tagged
      packets, so if the pvid of a port is 1 but VLAN 1 is not in VMEMB_PORT,
      packets tagged with VID 1 will behave the same as untagged packets, and
      be dropped. This behavior is what we want for ports under a VLAN-aware
      bridge, but for the ports with a tag_8021q pvid, we want untagged
      packets to be accepted, but packets tagged with a header recognized by
      the switch as a tag_8021q VLAN to be dropped. So only restrict the
      drop_untagged check to apply to the bridge_pvid, not to the tag_8021q_pvid.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73ceab83
    • D
      net: dsa: mt7530: manually set up VLAN ID 0 · 1ca8a193
      DENG Qingfang 提交于
      The driver was relying on dsa_slave_vlan_rx_add_vid to add VLAN ID 0. After
      the blamed commit, VLAN ID 0 won't be set up anymore, breaking software
      bridging fallback on VLAN-unaware bridges.
      
      Manually set up VLAN ID 0 to fix this.
      
      Fixes: 06cfb2df ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
      Signed-off-by: NDENG Qingfang <dqfext@gmail.com>
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ca8a193
  12. 24 8月, 2021 2 次提交
    • N
      net: dsa: mv88e6xxx: Update mv88e6393x serdes errata · 3b0720ba
      Nathan Rossi 提交于
      In early erratas this issue only covered port 0 when changing from
      [x]MII (rev A 3.6). In subsequent errata versions this errata changed to
      cover the additional "Hardware reset in CPU managed mode" condition, and
      removed the note specifying that it only applied to port 0.
      
      In designs where the device is configured with CPU managed mode
      (CPU_MGD), on reset all SERDES ports (p0, p9, p10) have a stuck power
      down bit and require this initial power up procedure. As such apply this
      errata to all three SERDES ports of the mv88e6393x.
      Signed-off-by: NNathan Rossi <nathan.rossi@digi.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b0720ba
    • V
      net: dsa: let drivers state that they need VLAN filtering while standalone · 58adf9dc
      Vladimir Oltean 提交于
      As explained in commit e358bef7 ("net: dsa: Give drivers the chance
      to veto certain upper devices"), the hellcreek driver uses some tricks
      to comply with the network stack expectations: it enforces port
      separation in standalone mode using VLANs. For untagged traffic,
      bridging between ports is prevented by using different PVIDs, and for
      VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on
      two ports, so packets with one VLAN cannot leak from one port to another.
      
      That is almost fine*, and has worked because hellcreek relied on an
      implicit behavior of the DSA core that was changed by the previous
      patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on
      [fixed]'. Since most of the DSA drivers are actually VLAN-unaware in
      standalone mode, that feature was actually incorrectly reflecting the
      hardware/driver state, so there was a desire to fix it. This leaves the
      hellcreek driver in a situation where it has to explicitly request this
      behavior from the DSA framework.
      
      We configure the ports as follows:
      
      - Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a
        standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid
        and will add a VLAN to the hardware tables, giving the driver the
        opportunity to refuse it through .port_prechangeupper.
      
      - Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper
        on top of a bridged hellcreek port will not go through
        dsa_slave_vlan_rx_add_vid, because there will not be any attempt to
        offload this VLAN. The driver already disables VLAN awareness, so that
        upper should receive the traffic it needs.
      
      - Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper
        on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid,
        and can again be vetoed through .port_prechangeupper.
      
      *It is not actually completely fine, because if I follow through
      correctly, we can have the following situation:
      
      ip link add br0 type bridge vlan_filtering 0
      ip link set lan0 master br0 # lan0 now becomes VLAN-unaware
      ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation
      
      This patch fixes that corner case by extending the DSA core logic, based
      on this requested attribute, to change the VLAN awareness state of the
      switch (port) when it leaves the bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NKurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58adf9dc
  13. 20 8月, 2021 2 次提交
  14. 18 8月, 2021 2 次提交
    • V
      net: dsa: tag_sja1105: be dsa_loop-safe · 994d2cbb
      Vladimir Oltean 提交于
      Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
      sure that every time we dereference dp->priv, we check the switch's
      dsa_switch_ops (otherwise we access a struct sja1105_port structure that
      is in fact something else).
      
      This adds an unconditional build-time dependency between sja1105 being
      built as module => tag_sja1105 must also be built as module. This was
      there only for PTP before.
      
      Some sane defaults must also take place when not running on sja1105
      hardware. These are:
      
      - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
        depending on VLAN awareness and switch revision (when an encapsulated
        VLAN must be sent). Default to 0x8100.
      
      - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
        metadata timestamp frames. When running on non-sja1105 hardware, don't
        do that and accept all frames unmodified.
      
      - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
        which writes a management route over SPI. When not running on sja1105
        hardware, bypass the SPI write and send the frame as-is.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      994d2cbb
    • V
      net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse · ed5d2937
      Vladimir Oltean 提交于
      It seems that of_find_compatible_node has a weird calling convention in
      which it calls of_node_put() on the "from" node argument, instead of
      leaving that up to the caller. This comes from the fact that
      of_find_compatible_node with a non-NULL "from" argument it only supposed
      to be used as the iterator function of for_each_compatible_node(). OF
      iterator functions call of_node_get on the next OF node and of_node_put()
      on the previous one.
      
      When of_find_compatible_node calls of_node_put, it actually never
      expects the refcount to drop to zero, because the call is done under the
      atomic devtree_lock context, and when the refcount drops to zero it
      triggers a kobject and a sysfs file deletion, which assume blocking
      context.
      
      So any driver call to of_find_compatible_node is probably buggy because
      an unexpected of_node_put() takes place.
      
      What should be done is to use the of_get_compatible_child() function.
      
      Fixes: 5a8f0974 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
      Link: https://lore.kernel.org/netdev/20210814010139.kzryimmp4rizlznt@skbuf/Suggested-by: NFrank Rowand <frowand.list@gmail.com>
      Suggested-by: NRob Herring <robh+dt@kernel.org>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed5d2937
  15. 16 8月, 2021 3 次提交
    • V
      net: dsa: sja1105: reorganize probe, remove, setup and teardown ordering · 022522ac
      Vladimir Oltean 提交于
      The sja1105 driver's initialization and teardown sequence is a chaotic
      mess that has gathered a lot of cruft over time. It works because there
      is no strict dependency between the functions, but it could be improved.
      
      The basic principle that teardown should be the exact reverse of setup
      is obviously not held. We have initialization steps (sja1105_tas_setup,
      sja1105_flower_setup) in the probe method that are torn down in the DSA
      .teardown method instead of driver unbind time.
      
      We also have code after the dsa_register_switch() call, which implicitly
      means after the .setup() method has finished, which is pretty unusual.
      
      Also, sja1105_teardown() has calls set up in a different order than the
      error path of sja1105_setup(): see the reversed ordering between
      sja1105_ptp_clock_unregister and sja1105_mdiobus_unregister.
      
      Also, sja1105_static_config_load() is called towards the end of
      sja1105_setup(), but sja1105_static_config_free() is also towards the
      end of the error path and teardown path. The static_config_load() call
      should be earlier.
      
      Also, making and breaking the connections between struct sja1105_port
      and struct dsa_port could be refactored into dedicated functions, makes
      the code easier to follow.
      
      We move some code from the DSA .setup() method into the probe method,
      like the device tree parsing, and we move some code from the probe
      method into the DSA .setup() method to be symmetric with its placement
      in the DSA .teardown() method, which is nice because the unbind function
      has a single call to dsa_unregister_switch(). Example of the latter type
      of code movement are the connections between ports mentioned above, they
      are now in the .setup() method.
      
      Finally, due to fact that the kthread_init_worker() call is no longer
      in sja1105_probe() - located towards the bottom of the file - but in
      sja1105_setup() - located much higher - there is an inverse ordering
      with the worker function declaration, sja1105_port_deferred_xmit. To
      avoid that, the entire sja1105_setup() and sja1105_teardown() functions
      are moved towards the bottom of the file.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      022522ac
    • V
      net: mscc: ocelot: convert to phylink · e6e12df6
      Vladimir Oltean 提交于
      The felix DSA driver, which is a wrapper over the same hardware class as
      ocelot, is integrated with phylink, but ocelot is using the plain PHY
      library. It makes sense to bring together the two implementations, which
      is what this patch achieves.
      
      This is a large patch and hard to break up, but it does the following:
      
      The existing ocelot_adjust_link writes some registers, and
      felix_phylink_mac_link_up writes some registers, some of them are
      common, but both functions write to some registers to which the other
      doesn't.
      
      The main reasons for this are:
      - Felix switches so far have used an NXP PCS so they had no need to
        write the PCS1G registers that ocelot_adjust_link writes
      - Felix switches have the MAC fixed at 1G, so some of the MAC speed
        changes actually break the link and must be avoided.
      
      The naming conventions for the functions introduced in this patch are:
      - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot
        instantiations and placed in ocelot_net.c which is built only for the
        ocelot switchdev driver.
      - ocelot_phylink_mac_link_{up,down} are shared between the ocelot
        switchdev driver and the felix DSA driver (they are put in the common
        lib).
      
      One by one, the registers written by ocelot_adjust_link are:
      
      DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this
                         register since its out-of-reset value was fine and
                         did not need changing. The write is moved to the
                         common ocelot_phylink_mac_link_up and on felix it is
                         guarded by a quirk bit that makes the written value
                         identical with the out-of-reset one
      DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config
      PCS1G_MODE_CFG - same as above
      PCS1G_SD_CFG - same as above
      PCS1G_CFG - same as above
      PCS1G_ANEG_CFG - same as above
      PCS1G_LB_CFG - same as above
      DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable
                        touched this. felix_phylink_mac_link_{up,down} also
                        do. We go with what felix does and put it in
                        ocelot_phylink_mac_link_up.
      DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both
                      write this, but to different values. Move to the common
                      ocelot_phylink_mac_link_up and make sure via the quirk
                      that the old values are preserved for both.
      ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up
                        did not. Runtime invariant, speed does not matter since
                        PFC is disabled via the RX_PFC_ENA bits which are cleared.
                        Move to vsc7514_phylink_mac_config.
      QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and
                                       felix_phylink_mac_link_{up,down} wrote
                                       this. Ocelot also wrote this register
                                       from ocelot_port_disable. Keep what
                                       felix did, move in ocelot_phylink_mac_link_{up,down}
                                       and delete ocelot_port_disable.
      ANA_POL_FLOWC - same as above
      SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas
                       ocelot always enabled RX and TX flow control, felix
                       listened to phylink (for the most part, at least - see
                       the 2500base-X comment).
      
      The registers which only felix_phylink_mac_link_up wrote are:
      
      SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control
                                worked on ocelot. Not it should, since the
                                code is shared with felix where it does.
      ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink
                          should be the one touching them, deleted.
      
      Other changes:
      
      - The old phylib registration code was in mscc_ocelot_init_ports. It is
        hard to work with 2 levels of indentation already in, and with hard to
        follow teardown logic. The new phylink registration code was moved
        inside ocelot_probe_port(), right between alloc_etherdev() and
        register_netdev(). It could not be done before (=> outside of)
        ocelot_probe_port() because ocelot_probe_port() allocates the struct
        ocelot_port which we then use to assign ocelot_port->phy_mode to. It
        is more preferable to me to have all PHY handling logic inside the
        same function.
      - On the same topic: struct ocelot_port_private :: serdes is only used
        in ocelot_port_open to set the SERDES protocol to Ethernet. This is
        logically a runtime invariant and can be done just once, when the port
        registers with phylink. We therefore don't even need to keep the
        serdes reference inside struct ocelot_port_private, or to use the devm
        variant of of_phy_get().
      - Phylink needs a valid phy-mode for phylink_create() to succeed, and
        the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts
        don't define one for the internal PHY ports. So we patch
        PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL.
      - There was a strategically placed:
      
      	switch (priv->phy_mode) {
      	case PHY_INTERFACE_MODE_NA:
      	        continue;
      
        which made the code skip the serdes initialization for the internal
        PHY ports. Frankly that is not all that obvious, so now we explicitly
        initialize the serdes under an "if" condition and not rely on code
        jumps, so everything is clearer.
      - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII
        ports. Since that is in fact the default value for the register field
        DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear
        the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out
        of reset, which does match the comment. I don't even want to know why
        this code is placed there, but if there is indeed an issue that all
        ports that share a QSGMII lane must all be up, then this logic is
        already buggy, since mscc_ocelot_init_ports iterates using
        for_each_available_child_of_node, so nobody prevents the user from
        putting a 'status = "disabled";' for some QSGMII ports which would
        break the driver's assumption.
        In any case, in the eventuality that I'm right, we would have yet
        another issue if ocelot_phylink_mac_link_down would reset those ports
        and that would be forbidden, so since the ocelot_adjust_link logic did
        not do that (maybe for a reason), add another quirk to preserve the
        old logic.
      
      The ocelot driver teardown goes through all ports in one fell swoop.
      When initialization of one port fails, the ocelot->ports[port] pointer
      for that is reset to NULL, and teardown is done only for non-NULL ports,
      so there is no reason to do partial teardowns, let the central
      mscc_ocelot_release_ports() do its job.
      
      Tested bind, unbind, rebind, link up, link down, speed change on mock-up
      hardware (modified the driver to probe on Felix VSC9959). Also
      regression tested the felix DSA driver. Could not test the Ocelot
      specific bits (PCS1G, SERDES, device tree bindings).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6e12df6
    • V
      net: dsa: felix: stop calling ocelot_port_{enable,disable} · 46efe4ef
      Vladimir Oltean 提交于
      ocelot_port_enable touches ANA_PORT_PORT_CFG, which has the following
      fields:
      
      - LOCKED_PORTMOVE_CPU, LEARNDROP, LEARNCPU, LEARNAUTO, RECV_ENA, all of
        which are written with their hardware default values, also runtime
        invariants. So it makes no sense to write these during every .ndo_open.
      
      - PORTID_VAL: this field has an out-of-reset value of zero for all ports
        and must be initialized by software. Additionally, the
        ocelot_setup_logical_port_ids() code path sets up different logical
        port IDs for the ports in a hardware LAG, and we absolutely don't want
        .ndo_open to interfere there and reset those values.
      
      So in fact the write from ocelot_port_enable can better be moved to
      ocelot_init_port, and the .ndo_open hook deleted.
      
      ocelot_port_disable touches DEV_MAC_ENA_CFG and QSYS_SWITCH_PORT_MODE_PORT_ENA,
      in an attempt to undo what ocelot_adjust_link did. But since .ndo_stop
      does not get called each time the link falls (i.e. this isn't a
      substitute for .phylink_mac_link_down), felix already does better at
      this by writing those registers already in felix_phylink_mac_link_down.
      
      So keep ocelot_port_disable (for now, until ocelot is converted to
      phylink too), and just delete the felix call to it, which is not
      necessary.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46efe4ef
  16. 14 8月, 2021 1 次提交
    • A
      ethernet: fix PTP_1588_CLOCK dependencies · e5f31552
      Arnd Bergmann 提交于
      The 'imply' keyword does not do what most people think it does, it only
      politely asks Kconfig to turn on another symbol, but does not prevent
      it from being disabled manually or built as a loadable module when the
      user is built-in. In the ICE driver, the latter now causes a link failure:
      
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
      ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
      ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
      aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
      ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
      ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
      ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':
      
      This is a recurring problem in many drivers, and we have discussed
      it several times befores, without reaching a consensus. I'm providing
      a link to the previous email thread for reference, which discusses
      some related problems.
      
      To solve the dependency issue better than the 'imply' keyword, introduce a
      separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
      can depend on if it is able to use PTP support when available, but works
      fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
      then prevented from being built-in, the same way as with a 'depends on
      PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
      but that can be rather confusing when you first see it.
      
      Since this should cover the dependencies correctly, the IS_REACHABLE()
      hack in the header is no longer needed now, and can be turned back
      into a normal IS_ENABLED() check. Any driver that gets the dependency
      wrong will now cause a link time failure rather than being unable to use
      PTP support when that is in a loadable module.
      
      However, the two recently added ptp_get_vclocks_index() and
      ptp_convert_timestamp() interfaces are only called from builtin code with
      ethtool and socket timestamps, so keep the current behavior by stubbing
      those out completely when PTP is in a loadable module. This should be
      addressed properly in a follow-up.
      
      As Richard suggested, we may want to actually turn PTP support into a
      'bool' option later on, preventing it from being a loadable module
      altogether, which would be one way to solve the problem with the ethtool
      interface.
      
      Fixes: 06c16d89 ("ice: register 1588 PTP clock device object for E810 devices")
      Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
      Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
      Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
      Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/Acked-by: NShannon Nelson <snelson@pensando.io>
      Acked-by: NJacob Keller <jacob.e.keller@intel.com>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e5f31552
  17. 12 8月, 2021 2 次提交
  18. 10 8月, 2021 4 次提交
    • V
      net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 21b52fed
      Vladimir Oltean 提交于
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21b52fed
    • V
      net: dsa: lantiq: fix broken backpressure in .port_fdb_dump · 871a73a1
      Vladimir Oltean 提交于
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 58c59ef9 ("net: dsa: lantiq: Add Forwarding Database access")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      871a73a1
    • V
      net: dsa: lan9303: fix broken backpressure in .port_fdb_dump · ada2fee1
      Vladimir Oltean 提交于
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: ab335349 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ada2fee1
    • V
      net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump · cd391280
      Vladimir Oltean 提交于
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: e4b27ebc ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NKurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd391280