1. 03 Apr 2023, 1 commit
    • net: create a netdev notifier for DSA to reject PTP on DSA master · 88c0a6b5
      Committed by Vladimir Oltean
      The fact that PTP 2-step TX timestamping is broken on DSA switches if
      the master also timestamps the same packets is documented by commit
      f685e609 ("net: dsa: Deny PTP on master if switch supports it").
      We attempt to help users avoid shooting themselves in the foot by
      making DSA reject the timestamping ioctls on an interface that is a DSA
      master whose switch tree contains switches that are aware of PTP.
      
      The only problem is that there isn't an established way of intercepting
      ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network
      stack by creating a struct dsa_netdevice_ops with overlaid function
      pointers that are manually checked from the relevant call sites. There
      used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only
      one left.
      
      There is an ongoing effort to migrate driver-visible hardware timestamping
      control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set()
      model, but DSA actively prevents that migration, since dsa_master_ioctl()
      is currently coded to manually call the master's legacy ndo_eth_ioctl(),
      and so, whenever a network device driver would be converted to the new
      API, DSA's restrictions would be circumvented, because any device could
      be used as a DSA master.
      
      The established way for unrelated modules to react on a net device event
      is via netdevice notifiers. So we create a new notifier which gets
      called whenever there is an attempt to change hardware timestamping
      settings on a device.
      
      Finally, there is another reason why a netdev notifier will be a good
      idea, besides strictly DSA, and this has to do with PHY timestamping.
      
      With ndo_eth_ioctl(), all MAC drivers must manually call
      phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP,
      otherwise they must pass this ioctl to the PHY driver via
      phy_mii_ioctl().
      
      With the new ndo_hwtstamp_set() API, it will be desirable to simply not
      make any calls into the MAC device driver when timestamping should be
      performed at the PHY level.
      
      But there exist drivers, such as the lan966x switch, which need to
      install packet traps for PTP regardless of whether they provide the
      hardware timestamps themselves or the PHY does. That would be
      impossible to support with the new API.
      
      The proposal there, too, is to introduce a netdev notifier which acts as
      a better cue for switching drivers to add or remove PTP packet traps,
      than ndo_hwtstamp_set(). The one introduced here "almost" works there as
      well, except for the fact that packet traps should only be installed if
      the PHY driver succeeded in enabling hardware timestamping, whereas here,
      we need to deny hardware timestamping on the DSA master before it
      actually gets enabled. This is why this notifier is called "PRE_", and
      the notifier that would get used for PHY timestamping and packet traps
      would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for
      example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing.
      
      In expectation of future netlink UAPI, we also pass a non-NULL extack
      pointer to the netdev notifier, and we make DSA populate it with an
      informative reason for the rejection. To avoid making it go to waste, we
      make the ioctl-based dev_set_hwtstamp() create a fake extack and print
      the message to the kernel log.
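      The mechanism described above can be sketched in plain userspace C. The
      chain, event name, and callback below are simplified, hypothetical
      stand-ins for the kernel's notifier infrastructure, not its actual API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative notifier chain: each subscriber may veto the change. */
#define NETDEV_PRE_CHANGE_HWTSTAMP 1

struct notifier {
    int (*call)(int event, void *info);
    struct notifier *next;
};

static struct notifier *chain;

static void notifier_register(struct notifier *n)
{
    n->next = chain;
    chain = n;
}

/* Walk the chain; the first non-zero return vetoes the change. */
static int call_notifiers(int event, void *info)
{
    for (struct notifier *n = chain; n; n = n->next) {
        int err = n->call(event, info);
        if (err)
            return err;
    }
    return 0;
}

struct hwtstamp_info {
    int dev_is_dsa_master;
    const char *extack_msg; /* reason reported back to the user */
};

/* DSA-like subscriber: deny hardware timestamping on a DSA master. */
static int dsa_veto(int event, void *info)
{
    struct hwtstamp_info *i = info;

    if (event == NETDEV_PRE_CHANGE_HWTSTAMP && i->dev_is_dsa_master) {
        i->extack_msg = "HW timestamping on a DSA master is not allowed";
        return -EBUSY;
    }
    return 0;
}

static int set_hwtstamp(struct hwtstamp_info *info)
{
    /* Notify *before* applying, so subscribers can reject the change. */
    int err = call_notifiers(NETDEV_PRE_CHANGE_HWTSTAMP, info);
    if (err)
        return err;
    /* ... apply the timestamping settings here ... */
    return 0;
}
```

      The point mirrored here is the "PRE_" semantics: the subscriber runs
      before the setting takes effect, and its error (plus the extack-style
      message) is what the user sees.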
      
      Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/
      Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 31 Mar 2023, 1 commit
    • net: dsa: sync unicast and multicast addresses for VLAN filters too · 64fdc5f3
      Committed by Vladimir Oltean
      If certain conditions are met, DSA can install all necessary MAC
      addresses on the CPU ports as FDB entries and disable flooding towards
      the CPU (we call this RX filtering).
      
      There is one corner case where this does not work.
      
      ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
      ip link set swp0 master br0 && ip link set swp0 up
      ip link add link swp0 name swp0.100 type vlan id 100
      ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100
      
      Traffic through swp0.100 is broken, because the bridge turns on VLAN
      filtering in the swp0 port (causing RX packets to be classified to the
      FDB database corresponding to the VID from their 802.1Q header), and
      although the 8021q module does call dev_uc_add() towards the real
      device, that API is VLAN-unaware, so it only contains the MAC address,
      not the VID; and DSA's current implementation of ndo_set_rx_mode() is
      only for VID 0 (corresponding to FDB entries which are installed in an
      FDB database which is only hit when the port is VLAN-unaware).
      
      It's interesting to understand why the bridge does not turn on
      IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
      that this is a regression caused by the logic in commit 2796d0c6
      ("bridge: Automatically manage port promiscuous mode."). After all,
      a bridge port needs to have IFF_PROMISC by its very nature - it needs to
      receive and forward frames with a MAC DA different from the bridge
      ports' MAC addresses.
      
      While that may be true, when the bridge is VLAN-aware *and* it has a
      single port, there is no real reason to enable promiscuity even if that
      is an automatic port, with flooding and learning (there is nowhere for
      packets to go except to the BR_FDB_LOCAL entries), and this is how the
      corner case appears. Adding a second automatic interface to the bridge
      would make swp0 promisc as well, and would mask the corner case.
      
      Given that the dev_uc_add() / ndo_set_rx_mode() API is what it is (it
      doesn't pass a VLAN ID), the only way to address that problem is to
      install host FDB entries for the Cartesian product of RX filtering MAC
      addresses and VLAN RX filters.
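      As a rough userspace sketch (all names hypothetical, not DSA's real
      internals), the Cartesian-product synchronization amounts to:

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6

/* A host FDB entry now carries both the MAC address and the VID. */
struct fdb_entry {
    unsigned char addr[ETH_ALEN];
    unsigned short vid;
};

/* Install one entry per (MAC address, VID) pair; returns how many
 * entries were written into 'out' (caller sizes it as naddrs * nvids).
 */
static int sync_uc_addrs(const unsigned char (*addrs)[ETH_ALEN], int naddrs,
                         const unsigned short *vids, int nvids,
                         struct fdb_entry *out)
{
    int n = 0;

    for (int a = 0; a < naddrs; a++)
        for (int v = 0; v < nvids; v++) {
            memcpy(out[n].addr, addrs[a], ETH_ALEN);
            out[n].vid = vids[v];
            n++;
        }
    return n;
}
```

      With VID 0 among the filters, the VLAN-unaware FDB database keeps
      working as before; the extra entries cover the VLAN-aware case.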
      
      Fixes: 7569459a ("net: dsa: manage flooding on the CPU ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 17 Mar 2023, 1 commit
    • net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu() · 636e8adf
      Committed by Vladimir Oltean
      Currently, when dsa_slave_change_mtu() is called on a user port where
      dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code
      will stumble upon this check:
      
      	if (new_master_mtu > mtu_limit)
      		return -ERANGE;
      
      because new_master_mtu is adjusted for the tagger overhead but mtu_limit
      is not.
      
      But it would be good if the logic went through, for example if the DSA
      master really depends on an MTU adjustment to accept DSA-tagged frames.
      
      To make the code pass through the check, we need to adjust mtu_limit for
      the overhead as well, if the minimum restriction was caused by the DSA
      user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU
      are always offset by the protocol overhead.
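      A minimal arithmetic sketch of the adjusted check (a hypothetical
      helper mirroring the reasoning above, not dsa_slave_change_mtu()
      itself):

```c
#include <assert.h>

/* Check whether a new user-port MTU fits, offsetting the limit derived
 * from the user port's max_mtu by the tagger overhead, exactly as
 * new_master_mtu is offset.
 */
static int check_master_mtu(int new_mtu, int user_max_mtu,
                            int master_max_mtu, int overhead)
{
    int new_master_mtu = new_mtu + overhead;
    /* A user port MTU and a master MTU are always offset by 'overhead',
     * so the user-port-derived limit must be offset the same way.
     */
    int mtu_limit = user_max_mtu + overhead;

    if (mtu_limit > master_max_mtu)
        mtu_limit = master_max_mtu;

    if (new_master_mtu > mtu_limit)
        return -1; /* -ERANGE in the kernel */
    return 0;
}
```

      Without the "+ overhead" on mtu_limit, a driver whose
      .port_max_mtu() returns 1500 would fail even for new_mtu == 1500,
      which is the bug being prepared against.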
      
      Currently no drivers return 1500 from .port_max_mtu(), but this is only
      temporary and a bug in itself - mv88e6xxx should have done that, but
      since commit b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when
      setting MTU for DSA and CPU ports") it no longer does. This is a
      preparation for fixing that.
      
      Fixes: bfcb8132 ("net: dsa: configure the MTU for switch ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 Feb 2023, 1 commit
  5. 02 Feb 2023, 1 commit
  6. 23 Jan 2023, 1 commit
  7. 23 Nov 2022, 7 commits
  8. 18 Nov 2022, 3 commits
  9. 04 Nov 2022, 2 commits
  10. 29 Oct 2022, 1 commit
  11. 15 Oct 2022, 1 commit
  12. 01 Oct 2022, 1 commit
    • net: dsa: don't leave dangling pointers in dp->pl when failing · cf5ca4dd
      Committed by Vladimir Oltean
      There is a desire to simplify the dsa_port registration path with
      devlink, and this involves reworking a bit how user ports which fail to
      connect to their PHY (because it's missing) get reinitialized as UNUSED
      devlink ports.
      
      The desire is for the change to look something like this: basically,
      dsa_port_setup() has failed, so we just change dp->type and call
      dsa_port_setup() again.
      
      -/* Destroy the current devlink port, and create a new one which has the UNUSED
      - * flavour.
      - */
      -static int dsa_port_reinit_as_unused(struct dsa_port *dp)
      +static int dsa_port_setup_as_unused(struct dsa_port *dp)
       {
      -	dsa_port_devlink_teardown(dp);
       	dp->type = DSA_PORT_TYPE_UNUSED;
      -	return dsa_port_devlink_setup(dp);
      +	return dsa_port_setup(dp);
       }
      
      For an UNUSED port, dsa_port_setup() mostly only calls dsa_port_devlink_setup()
      anyway, so we could get away with calling just that. But if we call the
      full blown dsa_port_setup(dp) (which will be needed to properly set
      dp->setup = true), the callee will have the tendency to go through this
      code block too, and call dsa_port_disable(dp):
      
      	switch (dp->type) {
      	case DSA_PORT_TYPE_UNUSED:
      		dsa_port_disable(dp);
      		break;
      
      That is not very good, because dsa_port_disable() has this hidden inside
      of it:
      
      	if (dp->pl)
      		phylink_stop(dp->pl);
      
      Fact is, we are not prepared to handle a call to dsa_port_disable() with
      a struct dsa_port that came from a previous (and failed) call to
      dsa_port_setup(). We do not clean up dp->pl, and this will make the
      second call to dsa_port_setup() call phylink_stop() on a dangling dp->pl
      pointer.
      
      Solve this by creating an API for phylink destruction which is symmetric
      to the phylink creation, and never leave dp->pl set to anything except
      NULL or a valid phylink structure.
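      The invariant can be sketched in plain C (illustrative names, not the
      actual phylink API): the destroy path always resets dp->pl to NULL, so
      a subsequent disable is safe no matter how far setup got:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct phylink { int started; };

struct dsa_port {
    struct phylink *pl;
};

/* Symmetric creation: on success, dp->pl points to a valid object. */
static int port_phylink_create(struct dsa_port *dp)
{
    dp->pl = calloc(1, sizeof(*dp->pl));
    return dp->pl ? 0 : -1;
}

/* Symmetric destruction: never leave a dangling pointer behind. */
static void port_phylink_destroy(struct dsa_port *dp)
{
    free(dp->pl);
    dp->pl = NULL;
}

/* Safe under the invariant: pl is either valid or NULL, never stale. */
static void port_disable(struct dsa_port *dp)
{
    if (dp->pl)
        dp->pl->started = 0;
}
```

      With this pairing, a second dsa_port_setup() that reaches the
      UNUSED-port disable path no longer touches freed memory.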
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  13. 20 Sep 2022, 3 commits
    • net: dsa: allow masters to join a LAG · acc43b7b
      Committed by Vladimir Oltean
      There are 2 ways in which a DSA user port may become handled by 2 CPU
      ports in a LAG:
      
      (1) its current DSA master joins a LAG
      
       ip link del bond0 && ip link add bond0 type bond mode 802.3ad
       ip link set eno2 master bond0
      
      When this happens, all user ports with "eno2" as DSA master get
      automatically migrated to "bond0" as DSA master.
      
      (2) it is explicitly configured as such by the user
      
       # Before, the DSA master was eno3
       ip link set swp0 type dsa master bond0
      
      The design of this configuration is that the LAG device dynamically
      becomes a DSA master through dsa_master_setup() when the first physical
      DSA master becomes a LAG slave, and stops being so through
      dsa_master_teardown() when the last physical DSA master leaves.
      
      A LAG interface is considered as a valid DSA master only if it contains
      existing DSA masters, and no other lower interfaces. Therefore, we
      mainly rely on method (1) to enter this configuration.
      
      Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when
      it becomes a standalone DSA master again. But the LAG master also has a
      dev->dsa_ptr, and this is actually duplicated from one of the physical
      LAG slaves, and therefore needs to be balanced when LAG slaves come and
      go.
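      The balancing act can be modeled in a small userspace sketch
      (hypothetical types and helpers, not DSA's real code): the LAG device
      borrows a dsa_ptr from its first physical master and rebalances it as
      slaves come and go:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 4

struct net_device {
    void *dsa_ptr; /* CPU port context, per physical master */
};

struct lag {
    struct net_device dev;                /* the bond/team device */
    struct net_device *slaves[MAX_SLAVES];
    int nslaves;
};

static void lag_slave_join(struct lag *lag, struct net_device *slave)
{
    lag->slaves[lag->nslaves++] = slave;
    /* The LAG duplicates dsa_ptr from its first DSA-master slave. */
    if (!lag->dev.dsa_ptr)
        lag->dev.dsa_ptr = slave->dsa_ptr;
}

static void lag_slave_leave(struct lag *lag, struct net_device *slave)
{
    int i, j;

    for (i = 0; i < lag->nslaves; i++)
        if (lag->slaves[i] == slave)
            break;
    for (j = i; j + 1 < lag->nslaves; j++)
        lag->slaves[j] = lag->slaves[j + 1];
    lag->nslaves--;

    /* Rebalance: borrow from a remaining slave, or stop being a DSA
     * master once the last physical master has left.
     */
    lag->dev.dsa_ptr = lag->nslaves ? lag->slaves[0]->dsa_ptr : NULL;
}
```

      Each physical master keeps its own dsa_ptr throughout, matching the
      text: it is only the LAG device's duplicated pointer that needs
      rebalancing.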
      
      To the switch driver, putting DSA masters in a LAG is seen as putting
      their associated CPU ports in a LAG.
      
      We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG,
      by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: allow the DSA master to be seen and changed through rtnetlink · 95f510d0
      Committed by Vladimir Oltean
      Some DSA switches have multiple CPU ports, which can be used to improve
      CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(),
      sets up only the first one, leading to suboptimal use of hardware.
      
      The desire is to not change the default configuration but to permit the
      user to create a dynamic mapping between individual user ports and the
      CPU port that they are served by, configurable through rtnetlink. It is
      also intended to permit load balancing between CPU ports, and in that
      case, the foreseen model is for the DSA master to be a bonding interface
      whose lowers are the physical DSA masters.
      
      To that end, we create a struct rtnl_link_ops for DSA user ports with
      the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that
      contains the ifindex of the newly desired DSA master.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: introduce dsa_port_get_master() · 8f6a19c0
      Committed by Vladimir Oltean
      There is a desire to support DSA masters in a LAG.
      
      That configuration is intended to work by simply enslaving the master to
      a bonding/team device. But the physical DSA master (the LAG slave) still
      has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical
      CPU port.
      
      However, we would like to be able to retrieve the LAG that's the upper
      of the physical DSA master. In preparation for that, introduce a helper
      called dsa_port_get_master() that replaces all occurrences of the
      dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will
      be made later within the helper itself.
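      A minimal sketch of the helper's intended shape (simplified stand-in
      types, with the LAG preference that, per the text, gets added later):

```c
#include <assert.h>
#include <stddef.h>

struct net_device { const char *name; };

struct dsa_port {
    struct net_device *master;  /* physical DSA master */
    struct net_device *lag_dev; /* LAG upper, if any (added later) */
    struct dsa_port *cpu_dp;
};

/* One place to resolve a user port's master, replacing the scattered
 * dp->cpu_dp->master pattern.
 */
static struct net_device *dsa_port_get_master(struct dsa_port *dp)
{
    struct dsa_port *cpu_dp = dp->cpu_dp;

    /* Once LAG support lands: prefer the LAG upper of the master. */
    if (cpu_dp->lag_dev)
        return cpu_dp->lag_dev;
    return cpu_dp->master;
}
```

      Centralizing the lookup means the LAG/non-LAG distinction can later
      be made in exactly one spot, with no call-site churn.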
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  14. 23 Aug 2022, 7 commits
  15. 30 Jun 2022, 1 commit
  16. 27 Jun 2022, 2 commits
  17. 10 Jun 2022, 1 commit
  18. 13 May 2022, 1 commit
    • net: dsa: felix: manage host flooding using a specific driver callback · 72c3b0c7
      Committed by Vladimir Oltean
      At the time - commit 7569459a ("net: dsa: manage flooding on the CPU
      ports") - not introducing a dedicated switch callback for host flooding
      made sense, because for the only user, the felix driver, there was
      nothing different to do for the CPU port than set the flood flags on the
      CPU port just like on any other bridge port.
      
      There are 2 reasons why this approach is not good enough, however.
      
      (1) Other drivers, like sja1105, support configuring flooding as a
          function of {ingress port, egress port}, whereas the DSA
          ->port_bridge_flags() function only operates on an egress port.
          So with that driver we'd have useless host flooding from user ports
          which don't need it.
      
      (2) Even with the felix driver, support for multiple CPU ports makes it
          difficult to piggyback on ->port_bridge_flags(). The way in which
          the felix driver is going to support host-filtered addresses with
          multiple CPU ports is that it will direct these addresses towards
          both CPU ports (in a sort of multicast fashion), then restrict the
          forwarding to only one of the two using the forwarding masks.
          Consequently, flooding will also be enabled towards both CPU ports.
          However, ->port_bridge_flags() gets passed the index of a single CPU
          port, and that leaves the flood settings out of sync between the 2
          CPU ports.
      
      This is to say, it's better to have a specific driver method for host
      flooding, which takes the user port as argument. This solves problem (1)
      by allowing the driver to do different things for different user ports,
      and problem (2) by abstracting the operation and letting the driver do
      whatever, rather than explicitly making the DSA core point to the CPU
      port it thinks needs to be touched.
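      The shape of such a driver method can be sketched as follows (all
      types and names here are hypothetical, not the actual dsa_switch_ops
      signature): the hook receives the user port, and the driver alone
      decides which CPU port(s) to touch:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PORTS 8

struct sketch_switch {
    /* host flood enabled, keyed by ingress (user) port */
    bool host_flood_from[MAX_PORTS];
};

struct sketch_ops {
    /* Takes the *user* port, unlike ->port_bridge_flags(), which only
     * sees a single egress port.
     */
    void (*port_set_host_flood)(struct sketch_switch *sw, int user_port,
                                bool uc, bool mc);
};

/* felix-like driver: internally it would program flood masks towards
 * every CPU port here, keeping multiple CPU ports in sync.
 */
static void felix_set_host_flood(struct sketch_switch *sw, int user_port,
                                 bool uc, bool mc)
{
    sw->host_flood_from[user_port] = uc || mc;
}
```

      An sja1105-like driver could instead use the user-port argument to
      enable host flooding only from the ports that need it, which is
      exactly what problem (1) asks for.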
      
      This new method also creates a problem, which is that cross-chip setups
      are not handled. However I don't have hardware right now where I can
      test what is the proper thing to do, and there isn't hardware compatible
      with multi-switch trees that supports host flooding. So it remains a
      problem to be tackled in the future.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  19. 25 Apr 2022, 1 commit
    • net: dsa: flood multicast to CPU when slave has IFF_PROMISC · 7c762e70
      Committed by Vladimir Oltean
      Certain DSA switches can eliminate flooding to the CPU when none of the
      ports have the IFF_ALLMULTI or IFF_PROMISC flags set. This is done by
      synthesizing a call to dsa_port_bridge_flags() for the CPU port, a call
      which normally comes from the bridge driver via switchdev.
      
      The bridge port flags and IFF_PROMISC|IFF_ALLMULTI have slightly
      different semantics, and due to inattention/lack of proper testing, the
      IFF_PROMISC flag allows unknown unicast to be flooded to the CPU, but
      not unknown multicast.
      
      This must be fixed by setting both BR_FLOOD (unicast) and BR_MCAST_FLOOD
      in the synthesized dsa_port_bridge_flags() call, since IFF_PROMISC means
      that packets should not be filtered regardless of their MAC DA.
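      The intended flag translation can be sketched as follows (flag values
      here are illustrative, not the kernel's actual constants):

```c
#include <assert.h>

#define IFF_PROMISC   0x1
#define IFF_ALLMULTI  0x2

#define BR_FLOOD        0x1 /* flood unknown unicast to this port */
#define BR_MCAST_FLOOD  0x2 /* flood unknown multicast to this port */

/* Translate interface flags into CPU-port bridge flood flags. */
static unsigned int cpu_flood_flags(unsigned int if_flags)
{
    unsigned int flags = 0;

    if (if_flags & IFF_ALLMULTI)
        flags |= BR_MCAST_FLOOD;
    if (if_flags & IFF_PROMISC)
        /* The fix: promiscuity must flood both unknown unicast and
         * unknown multicast, not just BR_FLOOD.
         */
        flags |= BR_FLOOD | BR_MCAST_FLOOD;
    return flags;
}
```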
      
      Fixes: 7569459a ("net: dsa: manage flooding on the CPU ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 20 Apr 2022, 3 commits