1. 09 10月, 2021 1 次提交
    • V
      net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware · 8b6836d8
      Vladimir Oltean 提交于
      The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a679
      ("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
      some issues with VLAN and decided the best way to deal with them was to
      make the DSA core ignore VLANs added by the bridge while VLAN awareness
      is turned off. Those issues were never explained, just presented as
      "at least one corner case".
      
      That approach had problems of its own, presented by
      commit 54a0ed0d ("net: dsa: provide an option for drivers to always
      receive bridge VLANs") for the DSA core, followed by
      commit 1fb74191 ("net: dsa: mv88e6xxx: fix vlan setup") which
      applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
      particular.
      
      We still don't know what corner case Andrew saw when he wrote
      commit 2ea7a679 ("net: dsa: Don't add vlans when vlan filtering is
      disabled"), but Tobias now reports that when we use TX forwarding
      offload, pinging an external station from the bridge device is broken if
      the front-facing DSA user port has flooding turned off. The full
      description is in the link below, but for short, when a mv88e6xxx port
      is under a VLAN-unaware bridge, it inherits that bridge's pvid.
      So packets ingressing a user port will be classified to e.g. VID 1
      (assuming that value for the bridge_default_pvid), whereas when
      tag_dsa.c xmits towards a user port, it always sends packets using a VID
      of 0 if that port is standalone or under a VLAN-unaware bridge - or at
      least it did so prior to commit d82f8ab0 ("net: dsa: tag_dsa:
      offload the bridge forwarding process").
      
      In any case, when there is a conversation between the CPU and a station
      connected to a user port, the station's MAC address is learned in VID 1
      but the CPU tries to transmit through VID 0. The packets reach the
      intended station, but via flooding and not by virtue of matching the
      existing ATU entry.
      
      DSA has established (and enforced in other drivers: sja1105, felix,
      mt7530) that a VLAN-unaware port should use a private pvid, and not
      inherit the one from the bridge. The bridge's pvid should only be
      inherited when that bridge is VLAN-aware, so all state transitions need
      to be handled. On the other hand, all bridge VLANs should sit in the VTU
      starting with the moment when the bridge offloads them via switchdev,
      they are just not used.
      
      This solves the problem that Tobias sees because packets ingressing on
      VLAN-unaware user ports now get classified to VID 0, which is also the
      VID used by tag_dsa.c on xmit.
      
      Fixes: d82f8ab0 ("net: dsa: tag_dsa: offload the bridge forwarding process")
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003222312.284175-2-vladimir.oltean@nxp.com/#24491503Reported-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      8b6836d8
  2. 27 9月, 2021 3 次提交
  3. 19 9月, 2021 2 次提交
    • V
      net: dsa: tear down devlink port regions when tearing down the devlink port on error · fd292c18
      Vladimir Oltean 提交于
      Commit 86f8b1c0 ("net: dsa: Do not make user port errors fatal")
      decided it was fine to ignore errors on certain ports that fail to
      probe, and go on with the ports that do probe fine.
      
      Commit fb6ec87f ("net: dsa: Fix type was not set for devlink port")
      noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get
      called, and devlink notices after a timeout of 3600 seconds and prints a
      WARN_ON. So it went ahead to unregister the devlink port. And because
      there exists an UNUSED port flavour, we actually re-register the devlink
      port as UNUSED.
      
      Commit 08156ba4 ("net: dsa: Add devlink port regions support to
      DSA") added devlink port regions, which are set up by the driver and not
      by DSA.
      
      When we trigger the devlink port deregistration and reregistration as
      unused, devlink now prints another WARN_ON, from here:
      
      devlink_port_unregister:
      	WARN_ON(!list_empty(&devlink_port->region_list));
      
      So the port still has regions, which makes sense, because they were set
      up by the driver, and the driver doesn't know we're unregistering the
      devlink port.
      
      Somebody needs to tear them down, and optionally (actually it would be
      nice, to be consistent) set them up again for the new devlink port.
      
      But DSA's layering stays in our way quite badly here.
      
      The options I've considered are:
      
      1. Introduce a function in devlink to just change a port's type and
         flavour. No dice, devlink keeps a lot of state, it really wants the
         port to not be registered when you set its parameters, so changing
         anything can only be done by destroying what we currently have and
         recreating it.
      
      2. Make DSA cache the parameters passed to dsa_devlink_port_region_create,
         and the region returned, keep those in a list, then when the devlink
         port unregister needs to take place, the existing devlink regions are
         destroyed by DSA, and we replay the creation of new regions using the
         cached parameters. Problem: mv88e6xxx keeps the region pointers in
         chip->ports[port].region, and these will remain stale after DSA frees
         them. There are many things DSA can do, but updating mv88e6xxx's
         private pointers is not one of them.
      
      3. Just let the driver do it (i.e. introduce a very specific method
         called ds->ops->port_reinit_as_unused, which unregisters its devlink
         port devlink regions, then the old devlink port, then registers the
         new one, then the devlink port regions for it). While it does work,
         as opposed to the others, it's pretty horrible from an API
         perspective and we can do better.
      
      4. Introduce a new pair of methods, ->port_setup and ->port_teardown,
         which in the case of mv88e6xxx must register and unregister the
         devlink port regions. Call these 2 methods when the port must be
         reinitialized as unused.
      
      Naturally, I went for the 4th approach.
      
      Fixes: 08156ba4 ("net: dsa: Add devlink port regions support to DSA")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd292c18
    • V
      net: dsa: be compatible with masters which unregister on shutdown · 0650bf52
      Vladimir Oltean 提交于
      Lino reports that on his system with bcmgenet as DSA master and KSZ9897
      as a switch, rebooting or shutting down never works properly.
      
      What does the bcmgenet driver have special to trigger this, that other
      DSA masters do not? It has an implementation of ->shutdown which simply
      calls its ->remove implementation. Otherwise said, it unregisters its
      network interface on shutdown.
      
      This message can be seen in a loop, and it hangs the reboot process there:
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 3
      
      So why 3?
      
      A usage count of 1 is normal for a registered network interface, and any
      virtual interface which links itself as an upper of that will increment
      it via dev_hold. In the case of DSA, this is the call path:
      
      dsa_slave_create
      -> netdev_upper_dev_link
         -> __netdev_upper_dev_link
            -> __netdev_adjacent_dev_insert
               -> dev_hold
      
      So a DSA switch with 3 interfaces will result in a usage count elevated
      by two, and netdev_wait_allrefs will wait until they have gone away.
      
      Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
      delete themselves, but DSA cannot just vanish and go poof, at most it
      can unbind itself from the switch devices, but that must happen strictly
      earlier compared to when the DSA master unregisters its net_device, so
      reacting on the NETDEV_UNREGISTER event is way too late.
      
      It seems that it is a pretty established pattern to have a driver's
      ->shutdown hook redirect to its ->remove hook, so the same code is
      executed regardless of whether the driver is unbound from the device, or
      the system is just shutting down. As Florian puts it, it is quite a big
      hammer for bcmgenet to unregister its net_device during shutdown, but
      having a common code path with the driver unbind helps ensure it is well
      tested.
      
      So DSA, for better or for worse, has to live with that and engage in an
      arms race of implementing the ->shutdown hook too, from all individual
      drivers, and do something sane when paired with masters that unregister
      their net_device there. The only sane thing to do, of course, is to
      unlink from the master.
      
      However, complications arise really quickly.
      
      The pattern of redirecting ->shutdown to ->remove is not unique to
      bcmgenet or even to net_device drivers. In fact, SPI controllers do it
      too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
      and MDIO controllers do it too (this is something I have not researched
      too deeply, but even if this is not the case today, it is certainly
      plausible to happen in the future, and must be taken into consideration).
      
      Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
      insane implication is that for the exact same DSA switch device, we
      might have both ->shutdown and ->remove getting called.
      
      So we need to do something with that insane environment. The pattern
      I've come up with is "if this, then not that", so if either ->shutdown
      or ->remove gets called, we set the device's drvdata to NULL, and in the
      other hook, we check whether the drvdata is NULL and just do nothing.
      This is probably not necessary for platform devices, just for devices on
      buses, but I would really insist for consistency among drivers, because
      when code is copy-pasted, it is not always copy-pasted from the best
      sources.
      
      So depending on whether the DSA switch's ->remove or ->shutdown will get
      called first, we cannot really guarantee even for the same driver if
      rebooting will result in the same code path on all platforms. But
      nonetheless, we need to do something minimally reasonable on ->shutdown
      too to fix the bug. Of course, the ->remove will do more (a full
      teardown of the tree, with all data structures freed, and this is why
      the bug was not caught for so long). The new ->shutdown method is kept
      separate from dsa_unregister_switch not because we couldn't have
      unregistered the switch, but simply in the interest of doing something
      quick and to the point.
      
      The big question is: does the DSA switch's ->shutdown get called earlier
      than the DSA master's ->shutdown? If not, there is still a risk that we
      might still trigger the WARN_ON in unregister_netdevice that says we are
      attempting to unregister a net_device which has uppers. That's no good.
      Although the reference to the master net_device won't physically go away
      even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
      on it.
      
      The answer to that question lies in this comment above device_link_add:
      
       * A side effect of the link creation is re-ordering of dpm_list and the
       * devices_kset list by moving the consumer device and all devices depending
       * on it to the ends of these lists (that does not happen to devices that have
       * not been registered when this function is called).
      
      so the fact that DSA uses device_link_add towards its master is not
      exactly for nothing. device_shutdown() walks devices_kset from the back,
      so this is our guarantee that DSA's shutdown happens before the master's
      shutdown.
      
      Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0650bf52
  4. 24 8月, 2021 1 次提交
    • N
      net: dsa: mv88e6xxx: Update mv88e6393x serdes errata · 3b0720ba
      Nathan Rossi 提交于
      In early erratas this issue only covered port 0 when changing from
      [x]MII (rev A 3.6). In subsequent errata versions this errata changed to
      cover the additional "Hardware reset in CPU managed mode" condition, and
      removed the note specifying that it only applied to port 0.
      
      In designs where the device is configured with CPU managed mode
      (CPU_MGD), on reset all SERDES ports (p0, p9, p10) have a stuck power
      down bit and require this initial power up procedure. As such apply this
      errata to all three SERDES ports of the mv88e6393x.
      Signed-off-by: NNathan Rossi <nathan.rossi@digi.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b0720ba
  5. 14 8月, 2021 1 次提交
    • A
      ethernet: fix PTP_1588_CLOCK dependencies · e5f31552
      Arnd Bergmann 提交于
      The 'imply' keyword does not do what most people think it does, it only
      politely asks Kconfig to turn on another symbol, but does not prevent
      it from being disabled manually or built as a loadable module when the
      user is built-in. In the ICE driver, the latter now causes a link failure:
      
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
      ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
      ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
      aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
      ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
      ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
      ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
      aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':
      
      This is a recurring problem in many drivers, and we have discussed
      it several times befores, without reaching a consensus. I'm providing
      a link to the previous email thread for reference, which discusses
      some related problems.
      
      To solve the dependency issue better than the 'imply' keyword, introduce a
      separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
      can depend on if it is able to use PTP support when available, but works
      fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
      then prevented from being built-in, the same way as with a 'depends on
      PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
      but that can be rather confusing when you first see it.
      
      Since this should cover the dependencies correctly, the IS_REACHABLE()
      hack in the header is no longer needed now, and can be turned back
      into a normal IS_ENABLED() check. Any driver that gets the dependency
      wrong will now cause a link time failure rather than being unable to use
      PTP support when that is in a loadable module.
      
      However, the two recently added ptp_get_vclocks_index() and
      ptp_convert_timestamp() interfaces are only called from builtin code with
      ethtool and socket timestamps, so keep the current behavior by stubbing
      those out completely when PTP is in a loadable module. This should be
      addressed properly in a follow-up.
      
      As Richard suggested, we may want to actually turn PTP support into a
      'bool' option later on, preventing it from being a loadable module
      altogether, which would be one way to solve the problem with the ethtool
      interface.
      
      Fixes: 06c16d89 ("ice: register 1588 PTP clock device object for E810 devices")
      Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
      Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
      Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
      Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/Acked-by: NShannon Nelson <snelson@pensando.io>
      Acked-by: NJacob Keller <jacob.e.keller@intel.com>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e5f31552
  6. 09 8月, 2021 1 次提交
    • V
      net: dsa: centralize fast ageing when address learning is turned off · 045c45d1
      Vladimir Oltean 提交于
      Currently DSA leaves it down to device drivers to fast age the FDB on a
      port when address learning is disabled on it. There are 2 reasons for
      doing that in the first place:
      
      - when address learning is disabled by user space, through
        IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user
        space typically wants to achieve is to operate in a mode with no
        dynamic FDB entry on that port. But if the port is already up, some
        addresses might have been already learned on it, and it seems silly to
        wait for 5 minutes for them to expire until something useful can be
        done.
      
      - when a port leaves a bridge and becomes standalone, DSA turns off
        address learning on it. This also has the nice side effect of flushing
        the dynamically learned bridge FDB entries on it, which is a good idea
        because standalone ports should not have bridge FDB entries on them.
      
      We let drivers manage fast ageing under this condition because if DSA
      were to do it, it would need to track each port's learning state, and
      act upon the transition, which it currently doesn't.
      
      But there are 2 reasons why doing it is better after all:
      
      - drivers might get it wrong and not do it (see b53_port_set_learning)
      
      - we would like to flush the dynamic entries from the software bridge
        too, and letting drivers do that would be another pain point
      
      So track the port learning state and trigger a fast age process
      automatically within DSA.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      045c45d1
  7. 06 8月, 2021 1 次提交
    • V
      net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier · c73c5708
      Vladimir Oltean 提交于
      Commit 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER
      attribute") added an option for users to turn off multicast flooding
      towards the CPU if they turn off the IGMP querier on a bridge which
      already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router).
      
      And commit a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      simply papered over that issue, because it moved the decision to flood
      the CPU with multicast (or not) from the DSA core down to individual drivers,
      instead of taking a more radical position then.
      
      The truth is that disabling multicast flooding to the CPU is simply
      something we are not prepared to do now, if at all. Some reasons:
      
      - ICMP6 neighbor solicitation messages are unregistered multicast
        packets as far as the bridge is concerned. So if we stop flooding
        multicast, the outside world cannot ping the bridge device's IPv6
        link-local address.
      
      - There might be foreign interfaces bridged with our DSA switch ports
        (sending a packet towards the host does not necessarily equal
        termination, but maybe software forwarding). So if there is no one
        interested in that multicast traffic in the local network stack, that
        doesn't mean nobody is.
      
      - PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as
        the bridge is concerned. This should reach the CPU port.
      
      - The switch driver might not do FDB partitioning. And since we don't
        even bother to do more fine-grained flood disabling (such as "disable
        flooding _from_port_N_ towards the CPU port" as opposed to "disable
        flooding _from_any_port_ towards the CPU port"), this breaks standalone
        ports, or even multiple bridges where one has an IGMP querier and one
        doesn't.
      
      Reverting the logic makes all of the above work.
      
      Fixes: a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      Fixes: 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER attribute")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c73c5708
  8. 24 7月, 2021 1 次提交
  9. 23 7月, 2021 1 次提交
    • V
      net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT · ce5df689
      Vladimir Oltean 提交于
      The mv88e6xxx switches have the ability to receive FORWARD (data plane)
      frames from the CPU port and route them according to the FDB. We can use
      this to offload the forwarding process of packets sent by the software
      bridge.
      
      Because DSA supports bridge domain isolation between user ports, just
      sending FORWARD frames is not enough, as they might leak the intended
      broadcast domain of the bridge on behalf of which the packets are sent.
      
      It should be noted that FORWARD frames are also (and typically) used to
      forward data plane packets on DSA links in cross-chip topologies. The
      FORWARD frame header contains the source port and switch ID, and
      switches receiving this frame header forward the packet according to
      their cross-chip port-based VLAN table (PVT).
      
      To address the bridging domain isolation in the context of offloading
      the forwarding on TX, the idea is that we can reuse the parts of the PVT
      that don't have any physical switch mapped to them, one entry for each
      software bridge. The switches will therefore think that behind their
      upstream port lie many switches, all in fact backed up by software
      bridges through tag_dsa.c, which constructs FORWARD packets with the
      right switch ID corresponding to each bridge.
      
      The mapping we use is absolutely trivial: DSA gives us a unique bridge
      number, and we add the number of the physical switches in the DSA switch
      tree to that, to obtain a unique virtual bridge device number to use in
      the PVT.
      Co-developed-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce5df689
  10. 16 7月, 2021 1 次提交
  11. 02 7月, 2021 6 次提交
  12. 22 6月, 2021 1 次提交
    • E
      net: dsa: mv88e6xxx: Fix adding vlan 0 · b8b79c41
      Eldar Gasanov 提交于
      8021q module adds vlan 0 to all interfaces when it starts.
      When 8021q module is loaded it isn't possible to create bond
      with mv88e6xxx interfaces, bonding module dipslay error
      "Couldn't add bond vlan ids", because it tries to add vlan 0
      to slave interfaces.
      
      There is unexpected behavior in the switch. When a PVID
      is assigned to a port the switch changes VID to PVID
      in ingress frames with VID 0 on the port. Expected
      that the switch doesn't assign PVID to tagged frames
      with VID 0. But there isn't a way to change this behavior
      in the switch.
      
      Fixes: 57e661aa ("net: dsa: mv88e6xxx: Link aggregation support")
      Signed-off-by: NEldar Gasanov <eldargasanov2@gmail.com>
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8b79c41
  13. 28 4月, 2021 4 次提交
  14. 22 4月, 2021 3 次提交
  15. 21 4月, 2021 3 次提交
  16. 13 4月, 2021 1 次提交
    • P
      net: phy: marvell: fix detection of PHY on Topaz switches · 1fe976d3
      Pali Rohár 提交于
      Since commit fee2d546 ("net: phy: marvell: mv88e6390 temperature
      sensor reading"), Linux reports the temperature of Topaz hwmon as
      constant -75°C.
      
      This is because switches from the Topaz family (88E6141 / 88E6341) have
      the address of the temperature sensor register different from Peridot.
      
      This address is instead compatible with 88E1510 PHYs, as was used for
      Topaz before the above mentioned commit.
      
      Create a new mapping table between switch family and PHY ID for families
      which don't have a model number. And define PHY IDs for Topaz and Peridot
      families.
      
      Create a new PHY ID and a new PHY driver for Topaz's internal PHY.
      The only difference from Peridot's PHY driver is the HWMON probing
      method.
      
      Prior this change Topaz's internal PHY is detected by kernel as:
      
        PHY [...] driver [Marvell 88E6390] (irq=63)
      
      And afterwards as:
      
        PHY [...] driver [Marvell 88E6341 Family] (irq=63)
      Signed-off-by: NPali Rohár <pali@kernel.org>
      BugLink: https://github.com/globalscaletechnologies/linux/issues/1
      Fixes: fee2d546 ("net: phy: marvell: mv88e6390 temperature sensor reading")
      Reviewed-by: NMarek Behún <kabel@kernel.org>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fe976d3
  17. 23 3月, 2021 1 次提交
  18. 19 3月, 2021 7 次提交
  19. 18 3月, 2021 1 次提交