1. 03 Mar, 2022 1 commit
  2. 27 Feb, 2022 1 commit
    • net: mscc: ocelot: enforce FDB isolation when VLAN-unaware · 54c31984
      Committed by Vladimir Oltean
      Currently ocelot uses a pvid of 0 for standalone ports and ports under a
      VLAN-unaware bridge, and the pvid of the bridge for ports under a
      VLAN-aware bridge. Standalone ports do not perform learning, but packets
      received on them are still subject to FDB lookups. So if the MAC DA that
      a standalone port receives has been also learned on a VLAN-unaware
      bridge port, ocelot will attempt to forward to that port, even though it
      can't, so it will drop packets.
      
      We want to avoid that by isolating the FDBs of different bridges from
      one another, and from standalone ports.
      
      The ocelot switch library has two distinct entry points: the felix DSA
      driver and the ocelot switchdev driver.
      
      We need to code up a minimal bridge_num allocation in the ocelot
      switchdev driver too. This is copied from DSA, with the exception that
      ocelot does not care about DSA trees, cross-chip bridging etc., so it
      only looks at its own ports that are already in the same bridge.
      
      The ocelot switchdev driver uses the bridge_num it has allocated itself,
      while the felix driver uses the bridge_num allocated by DSA. They are
      both stored inside ocelot_port->bridge_num by the common function
      ocelot_port_bridge_join() which receives the bridge_num passed by value.
      
      Once we have a bridge_num, we can only use it to enforce isolation
      between VLAN-unaware bridges. As far as I can see, ocelot does not have
      anything like a FID that further makes VLAN 100 from a port be different
      to VLAN 100 from another port with regard to FDB lookup. So we simply
      deny multiple VLAN-aware bridges.
      
      For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we
      allocate a VLAN for each bridge_num. This will be used as the pvid of
      each port that is under that VLAN-unaware bridge, for as long as that
      bridge is VLAN-unaware.
      
      VID 0 remains only for standalone ports. It is okay if all standalone
      ports use the same VID 0: since they perform no address learning, the
      FDB will contain no entry in VLAN 0, so the packets will always be
      flooded to the only possible destination, the CPU port.
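
      A minimal sketch of the pvid selection described above. The text only
      says that one VLAN per bridge_num is taken from the 4000-4095 region;
      handing them out from 4095 downwards, the helper names, and the
      convention that a negative bridge_num means "standalone" are
      illustrative assumptions, not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define RSV_VLAN_RANGE_START	4000	/* first VID of the reserved region */
#define VLAN_N_VID		4096	/* number of VIDs in 802.1Q */

/* Hypothetical helper: pvid used by a port while it is VLAN-unaware.
 * bridge_num < 0 stands for a standalone port in this sketch.
 */
uint16_t vlan_unaware_pvid(int bridge_num)
{
	if (bridge_num < 0)
		return 0;	/* all standalone ports share VID 0 */

	/* Assumption: consecutive VIDs counting down from 4095 */
	return (uint16_t)(VLAN_N_VID - 1 - bridge_num);
}

/* True if the VID lies in the region cropped for VLAN-unaware bridges */
bool vid_is_reserved(uint16_t vid)
{
	return vid >= RSV_VLAN_RANGE_START && vid < VLAN_N_VID;
}
```

      With such a mapping, two VLAN-unaware bridges never share a pvid, so an
      address learned in one bridge cannot match a lookup coming from another
      bridge or from a standalone port.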
      
      The CPU port module doesn't need to be a member of the VLANs to receive
      packets, but if we use the DSA tag_8021q protocol, those packets are
      part of the data plane as far as ocelot is concerned, so in that case
      it does. Just ensure that the DSA tag_8021q CPU port is made a member
      of all reserved VLANs when it is created, and is removed from them when
      it is deleted.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Feb, 2022 1 commit
    • net: dsa: felix: support FDB entries on offloaded LAG interfaces · 961d8b69
      Committed by Vladimir Oltean
      This adds the logic in the Felix DSA driver and Ocelot switch library.
      For Ocelot switches, the DEST_IDX that is the output of the MAC table
      lookup is a logical port (equal to physical port, if no LAG is used, or
      a dynamically allocated number otherwise). The allocation we have in
      place for LAG IDs is different from DSA's, so we can't use that:
      - DSA allocates a contiguous range of LAG IDs starting from 1
      - Ocelot appears to require that physical ports and LAG IDs are in the
        same space of [0, num_phys_ports), and additionally, ports that aren't
        in a LAG must have physical port id == logical port id
      
      The implication is that an FDB entry towards a LAG might need to be
      deleted and reinstalled when the LAG ID changes.
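
      The ID scheme above can be sketched as follows. The rule that a LAG's
      logical ID equals its lowest-numbered member port is taken from the
      "fix incorrect balancing with down LAG ports" message further down in
      this log; the function names and the bitmask representation are
      hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lowest-numbered member port of a LAG, or -1 for an empty mask */
int lag_logical_id(uint32_t bond_mask)
{
	int port;

	for (port = 0; port < 32; port++)
		if (bond_mask & (UINT32_C(1) << port))
			return port;

	return -1;
}

/* Ports outside any LAG (bond_mask == 0) keep logical id == physical id */
int logical_port_id(int port, uint32_t bond_mask)
{
	return bond_mask ? lag_logical_id(bond_mask) : port;
}

/* True if FDB entries pointing at the LAG would have to be deleted and
 * reinstalled because the logical ID moved with the membership change.
 */
bool lag_fdb_needs_migration(uint32_t old_bond_mask, uint32_t new_bond_mask)
{
	return lag_logical_id(old_bond_mask) != lag_logical_id(new_bond_mask);
}
```

      For example, if swp1 and swp2 form a LAG (logical ID 1) and swp1 leaves,
      the logical ID becomes 2 and entries towards the LAG have to move.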
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 17 Feb, 2022 4 commits
    • net: mscc: ocelot: annotate which traps need PTP timestamping · 9d75b881
      Committed by Vladimir Oltean
      The ocelot switch library does not need this information, but the felix
      DSA driver does.
      
      As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line
      for packet extraction, so to be notified that a PTP packet needs to be
      dequeued, it receives that packet also over Ethernet, by setting up a
      packet trap. The Felix driver needs to install special kinds of traps
      for packets in need of RX timestamps, such that the packets are
      replicated both over Ethernet and over the CPU port module.
      
      But the Ocelot switch library sets up traps for more than just PTP
      event messages; it also traps PTP general messages, MRP control
      messages etc. Those packets don't need PTP timestamps, so there's no
      reason for the Felix driver to send them to the CPU port module.
      
      By knowing which traps need PTP timestamps, the Felix driver can
      adjust the traps installed using ocelot_trap_add() such that only those
      will actually get delivered to the CPU port module.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: keep traps in a list · e42bd4ed
      Committed by Vladimir Oltean
      When using the ocelot-8021q tagging protocol, the CPU port isn't
      configured as an NPI port, but is a regular port. So a "trap to CPU"
      operation is actually a "redirect" operation. So DSA needs to set up the
      trapping action one way or another, depending on the tagging protocol in
      use.
      
      To ease DSA's work of modifying the action, keep all currently installed
      traps in a list, so that DSA can live-patch them when the tagging
      protocol changes.
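
      A rough sketch of the idea, with hypothetical types that do not mirror
      the driver's structures: every installed trap sits on a list, and a
      tagging protocol change walks the list and rewrites each action.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two tagging modes discussed above */
enum tag_proto { TAG_PROTO_NPI, TAG_PROTO_8021Q };

enum trap_action {
	TRAP_TO_CPU_PORT_MODULE,	/* NPI mode: a genuine "trap to CPU" */
	TRAP_REDIRECT_TO_PORT,		/* 8021q mode: redirect to a regular port */
};

struct trap {
	unsigned long cookie;
	enum trap_action action;
	struct trap *next;
};

/* Live-patch every installed trap for the new tagging protocol */
void traps_retarget(struct trap *head, enum tag_proto proto)
{
	struct trap *t;

	for (t = head; t; t = t->next)
		t->action = (proto == TAG_PROTO_8021Q) ?
			    TRAP_REDIRECT_TO_PORT : TRAP_TO_CPU_PORT_MODULE;
}
```
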
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: use a single VCAP filter for all MRP traps · b9bace6e
      Committed by Vladimir Oltean
      The MRP assist code installs a VCAP IS2 trapping rule for each port.
      Since the key and the action are the same and only the ingress port
      mask differs, there isn't any need to do this. We can save some space
      in the TCAM by using a single filter and adjusting the ingress port mask.
      
      Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this
      purpose.
      
      Now that the cookies are no longer per port, we need to change the
      allocation scheme such that MRP traps use a fixed number.
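
      The single-filter scheme can be pictured like this. It is only a model
      with made-up names (it is not ocelot_trap_add() or ocelot_trap_del()):
      one shared filter exists while at least one port uses it, and ports
      are added or removed by editing its ingress port mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one shared trap filter */
struct shared_trap {
	bool installed;			/* filter currently occupies a TCAM entry */
	uint32_t ingress_port_mask;	/* ports on which the filter matches */
};

void shared_trap_add(struct shared_trap *t, int port)
{
	/* First user installs the filter, later users only widen the mask */
	t->installed = true;
	t->ingress_port_mask |= UINT32_C(1) << port;
}

void shared_trap_del(struct shared_trap *t, int port)
{
	t->ingress_port_mask &= ~(UINT32_C(1) << port);

	/* Last user gone: the TCAM entry can be freed */
	if (!t->ingress_port_mask)
		t->installed = false;
}
```
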
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: consolidate cookie allocation for private VCAP rules · c518afec
      Committed by Vladimir Oltean
      Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP,
      PTP traps) has hardcoded filter identifiers that worked well enough for
      that use case alone. But when two or more of those use cases would be
      used together, some of those identifiers would overlap, leading to
      breakage.
      
      Add definitions for each cookie and centralize them in ocelot_vcap.h,
      such that the overlaps are more obvious.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 15 Feb, 2022 1 commit
  6. 14 Feb, 2022 2 commits
  7. 11 Feb, 2022 1 commit
  8. 05 Feb, 2022 1 commit
    • net: mscc: ocelot: fix all IP traffic getting trapped to CPU with PTP over IP · 59085208
      Committed by Vladimir Oltean
      The filters for the PTP trap keys are incorrectly configured, in the
      sense that is2_entry_set() only looks at trap->key.ipv4.dport or
      trap->key.ipv6.dport if trap->key.ipv4.proto or trap->key.ipv6.proto is
      set to IPPROTO_TCP or IPPROTO_UDP.
      
      But we don't do that, so is2_entry_set() goes through the "else" branch
      of the IP protocol check, and ends up installing a rule for "Any IP
      protocol match" (because msk is also 0). The UDP port is ignored.
      
      This means that when we run "ptp4l -i swp0 -4", all IP traffic is
      trapped to the CPU, which hinders bridging.
      
      Fix this by specifying the IP protocol in the VCAP IS2 filters for PTP
      over UDP.
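
      The failure mode can be reproduced with a small software model of the
      matching logic described above. This is not is2_entry_set() itself;
      the structure and function names are hypothetical, and only the
      protocol numbers (TCP = 6, UDP = 17) are real.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_TCP	6	/* same value as IPPROTO_TCP */
#define PROTO_UDP	17	/* same value as IPPROTO_UDP */

/* Hypothetical, simplified view of an IP trap key */
struct ip_trap_key {
	uint8_t proto;
	uint8_t proto_mask;
	uint16_t dport;
	uint16_t dport_mask;
};

/* Model: the L4 port is only consulted when the key names TCP or UDP */
bool ip_trap_matches(const struct ip_trap_key *k, uint8_t pkt_proto,
		     uint16_t pkt_dport)
{
	if (k->proto == PROTO_TCP || k->proto == PROTO_UDP)
		return pkt_proto == k->proto &&
		       ((pkt_dport ^ k->dport) & k->dport_mask) == 0;

	/* "else" branch: protocol value/mask only, a mask of 0 matches any */
	return ((pkt_proto ^ k->proto) & k->proto_mask) == 0;
}
```

      A key with proto = 0 and proto_mask = 0 therefore matches a TCP packet
      to port 80 even though dport is set to 319, whereas a key with
      proto = PROTO_UDP only matches UDP packets to port 319.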
      
      Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 13 Jan, 2022 1 commit
    • net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI port · 33cb0ff3
      Committed by Vladimir Oltean
      Since commit b3964807 ("net: mscc: ocelot: disable flow control on
      NPI interface"), flow control should be disabled on the DSA CPU port
      when used in NPI mode.
      
      However, the commit blamed in the Fixes: tag below broke this, because
      it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA
      for the DSA CPU port.
      
      This issue became noticeable since the device tree update from commit
      8fcea7be ("arm64: dts: ls1028a: mark internal links between Felix
      and ENETC as capable of flow control").
      
      The solution is to check whether this is the currently configured NPI
      port from ocelot_phylink_mac_link_up(), and to not modify the statically
      disabled PAUSE frame transmission if it is.
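
      In outline, the check amounts to something like the following. The
      structure and the function name are invented for illustration; the
      real code works on register fields inside
      ocelot_phylink_mac_link_up().

```c
#include <stdbool.h>

/* Hypothetical, reduced switch state */
struct sw_state {
	int npi;	/* index of the NPI port, or -1 when none is configured */
};

/* Decide whether PAUSE frame transmission should be enabled on link up */
bool tx_pause_to_program(const struct sw_state *sw, int port,
			 bool resolved_tx_pause)
{
	/* Flow control stays statically disabled on the NPI port */
	if (port == sw->npi)
		return false;

	return resolved_tx_pause;
}
```
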
      
      When the port is configured for lossless mode as opposed to tail drop
      mode, but the link partner (DSA master) doesn't observe the transmitted
      PAUSE frames, the switch termination throughput is much worse, as can be
      seen below.
      
      Before:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  28.4 MBytes   238 Mbits/sec  357   22.6 KBytes
      [  5]   1.00-2.00   sec  33.6 MBytes   282 Mbits/sec  426   19.8 KBytes
      [  5]   2.00-3.00   sec  34.0 MBytes   285 Mbits/sec  343   21.2 KBytes
      [  5]   3.00-4.00   sec  32.9 MBytes   276 Mbits/sec  354   22.6 KBytes
      [  5]   4.00-5.00   sec  32.3 MBytes   271 Mbits/sec  297   18.4 KBytes
      ^C[  5]   5.00-5.06   sec  2.05 MBytes   270 Mbits/sec   45   19.8 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-5.06   sec   163 MBytes   271 Mbits/sec  1822             sender
      [  5]   0.00-5.06   sec  0.00 Bytes  0.00 bits/sec                  receiver
      
      After:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec  259    143 KBytes
      [  5]   1.00-2.00   sec   110 MBytes   920 Mbits/sec  329    144 KBytes
      [  5]   2.00-3.00   sec   112 MBytes   936 Mbits/sec  255    144 KBytes
      [  5]   3.00-4.00   sec   110 MBytes   927 Mbits/sec  355    105 KBytes
      [  5]   4.00-5.00   sec   110 MBytes   926 Mbits/sec  350    156 KBytes
      [  5]   5.00-6.00   sec   110 MBytes   925 Mbits/sec  305    148 KBytes
      [  5]   6.00-7.00   sec   110 MBytes   924 Mbits/sec  320    143 KBytes
      [  5]   7.00-8.00   sec   110 MBytes   925 Mbits/sec  273   97.6 KBytes
      [  5]   8.00-9.00   sec   109 MBytes   913 Mbits/sec  299    141 KBytes
      [  5]   9.00-10.00  sec   110 MBytes   922 Mbits/sec  287    146 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  1.08 GBytes   926 Mbits/sec  3032             sender
      [  5]   0.00-10.00  sec  1.08 GBytes   925 Mbits/sec                  receiver
      
      Fixes: de274be3 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution")
      Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 08 Jan, 2022 2 commits
    • net: dsa: felix: add port fast age support · 5cad43a5
      Committed by Vladimir Oltean
      Add support for flushing the MAC table on a given port in the ocelot
      switch library, and use this functionality in the felix DSA driver.
      
      This operation is needed when a port leaves a bridge to become
      standalone, when learning is disabled, and when the STP state
      changes to a state where no FDB entry should be present.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: fix incorrect balancing with down LAG ports · a14e6b69
      Committed by Vladimir Oltean
      Assuming the test setup described here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
      (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0)
      
      it can be seen that when swp1 goes down (on either board A or B), then
      traffic that should go through that port isn't forwarded anywhere.
      
      A dump of the PGID table shows the following:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1
      PGID_DST[2] = ports 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      Whereas a "good" PGID configuration for that setup should have looked
      like this:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1, 2
      PGID_DST[2] = ports 1, 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      In other words, in the "bad" configuration, the attempt is to remove the
      inactive swp1 from the destination ports via PGID_DST. But when a MAC
      table entry is learned, it is learned towards PGID_DST 1, because that
      is the logical port id of the LAG itself (it is equal to the lowest
      numbered member port). So when swp1 becomes inactive, if we set
      PGID_DST[1] to contain just swp1 and not swp2, the packet will not have
      any chance to reach the destination via swp2.
      
      The "correct" way to remove swp1 as a destination is via PGID_AGGR
      (remove swp1 from the aggregation port groups for all aggregation
      codes). This means that PGID_DST[1] and PGID_DST[2] must still contain
      both swp1 and swp2. This makes the MAC table still treat packets
      destined towards the single-port LAG as "multicast", and the inactive
      ports are removed via the aggregation code tables.
      
      The change presented here is a design one: the ocelot_get_bond_mask()
      function used to take an "only_active_ports" argument. We don't need
      that. The only call site that specifies only_active_ports=true,
      ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because
      it must program that into PGID_DST. Additionally, it must also clear the
      inactive ports from the bond mask here, which it can't do if bond_mask
      just contains the active ports:
      
      	ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
      	ac &= ~bond_mask;  /* <---- here */
      	/* Don't do division by zero if there was no active
      	 * port. Just make all aggregation codes zero.
      	 */
      	if (num_active_ports)
      		ac |= BIT(aggr_idx[i % num_active_ports]);
      	ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
      
      So it becomes the responsibility of ocelot_set_aggr_pgids() to take
      ocelot_port->lag_tx_active into consideration when populating the
      aggr_idx array.
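
      Put together, the resulting logic can be modelled in plain C as below.
      This is a standalone sketch: arrays stand in for the ANA_PGID_PGID
      register accesses, a single LAG is assumed, and the function signature
      is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS	6
#define NUM_AGGR_CODES	16

/* bond_mask holds all LAG members, active or not. Only members with
 * lag_tx_active set are put into aggr_idx, so inactive ports are cleared
 * from every aggregation PGID and never selected.
 */
void set_aggr_pgids(uint32_t bond_mask, const bool lag_tx_active[NUM_PORTS],
		    uint32_t pgid_aggr[NUM_AGGR_CODES])
{
	int aggr_idx[NUM_PORTS];
	int num_active_ports = 0;
	int port, i;

	for (port = 0; port < NUM_PORTS; port++) {
		if (!(bond_mask & (UINT32_C(1) << port)))
			continue;

		if (lag_tx_active[port])
			aggr_idx[num_active_ports++] = port;
	}

	for (i = 0; i < NUM_AGGR_CODES; i++) {
		uint32_t ac = pgid_aggr[i];

		ac &= ~bond_mask;
		/* No active port: leave all LAG members cleared */
		if (num_active_ports)
			ac |= UINT32_C(1) << aggr_idx[i % num_active_ports];
		pgid_aggr[i] = ac;
	}
}
```

      With swp1 and swp2 bonded and swp1 down, every aggregation code ends up
      with ports 0, 2, 3, 4, 5, which is the "good" PGID_AGGR dump shown
      earlier.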
      
      Fixes: 23ca3b72 ("net: mscc: ocelot: rebalance LAGs on link up/down events")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  11. 14 Dec, 2021 1 commit
    • net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX · 9c9211a3
      Committed by Hangbin Liu
      Since commit 94dd016a ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP
      ioctl to active device"), the user can get the bond active interface's
      PHC index directly. But when there is a failover, the bond active
      interface changes, and the PHC index changes with it. This may break
      the user's program if it does not update the PHC index in time.
      
      This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
      When the user wants to get the bond active interface's PHC, they need to
      set this flag and be aware that the PHC index may change.
      
      With the new flag, all flag checks in current drivers are removed. Only
      the check in net_hwtstamp_validate() is kept.
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 11 Dec, 2021 2 commits
  13. 30 Nov, 2021 2 commits
  14. 27 Nov, 2021 3 commits
    • net: mscc: ocelot: correctly report the timestamping RX filters in ethtool · c49a35ee
      Committed by Vladimir Oltean
      The driver doesn't support RX timestamping for non-PTP packets, but it
      declares that it does. Restrict the reported RX filters to PTP v2 over
      L2 and over L4.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: set up traps for PTP packets · 96ca08c0
      Committed by Vladimir Oltean
      IEEE 1588 support was declared too soon for the Ocelot switch. Out of
      reset, this switch does not apply any special treatment for PTP packets,
      i.e. when an event message is received, the natural tendency is to
      forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
      is under a bridge, since user space application stacks (written
      primarily for endpoint ports, not switches) like ptp4l expect that PTP
      messages are always received on AF_PACKET / AF_INET sockets (depending
      on the PTP transport being used), and are never autonomously
      forwarded. Any forwarding, if necessary (for example in Transparent
      Clock mode) is handled in software by ptp4l. Having the hardware forward
      these packets too will cause duplicates which will confuse endpoints
      connected to these switches.
      
      So PTP over L2 barely works, in the sense that PTP packets reach the CPU
      port, but they reach it via flooding, and therefore reach lots of other
      unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
      This is because the Ocelot switch has a separate destination port mask
      for unknown IP multicast (which PTP over IP is) flooding compared to
      unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
      the driver allows the CPU port to be in the PGID_MC port group, but not
      in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
      Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
      switches is not very powerful at all, so every penny they could save by
      not allowing flooding to the CPU port module matters. Unknown IP
      multicast did not make it.
      
      The de facto consensus is that when a switch is PTP-aware and an
      application stack for PTP is running, switches should have some sort of
      trapping mechanism for PTP packets, to extract them from the hardware
      data path. This avoids both problems:
      (a) PTP packets are no longer flooded to unwanted destinations
      (b) PTP over IP packets are no longer denied from reaching the CPU since
          they arrive there via a trap and not via flooding
      
      This is not the first time this change has been attempted. Last time, the
      feedback from Allan Nielsen and Andrew Lunn was that the traps should
      not be installed by default, and that PTP-unaware switching may be
      desired for some use cases:
      https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
      
      To address that feedback, the present patch adds the necessary packet
      traps according to the RX filter configuration transmitted by user space
      through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
      keep 5 filters, which are amended each time RX timestamping is enabled
      or disabled on a port:
      - 1 for PTP over L2
      - 2 for PTP over IPv4 (UDP ports 319 and 320)
      - 2 for PTP over IPv6 (UDP ports 319 and 320)
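
      The five keys listed above could be tabulated as follows. The table
      layout and names are illustrative only; the EtherType values and the
      UDP port numbers (319 for event messages, 320 for general messages) are
      the standard ones.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4		0x0800
#define ETHERTYPE_IPV6		0x86DD
#define ETHERTYPE_PTP		0x88F7	/* PTP over IEEE 802.3 */
#define PTP_EVENT_PORT		319
#define PTP_GENERAL_PORT	320

/* Hypothetical, simplified trap key */
struct ptp_trap_key {
	uint16_t ethertype;
	uint16_t udp_dport;	/* 0 when the key does not match on L4 */
};

const struct ptp_trap_key ptp_trap_keys[] = {
	{ ETHERTYPE_PTP,  0 },			/* PTP over L2 */
	{ ETHERTYPE_IPV4, PTP_EVENT_PORT },	/* PTP over IPv4, event */
	{ ETHERTYPE_IPV4, PTP_GENERAL_PORT },	/* PTP over IPv4, general */
	{ ETHERTYPE_IPV6, PTP_EVENT_PORT },	/* PTP over IPv6, event */
	{ ETHERTYPE_IPV6, PTP_GENERAL_PORT },	/* PTP over IPv6, general */
};

#define NUM_PTP_TRAP_KEYS (sizeof(ptp_trap_keys) / sizeof(ptp_trap_keys[0]))

/* Count the keys that apply to a given EtherType */
size_t ptp_trap_keys_for_ethertype(uint16_t ethertype)
{
	size_t i, n = 0;

	for (i = 0; i < NUM_PTP_TRAP_KEYS; i++)
		if (ptp_trap_keys[i].ethertype == ethertype)
			n++;

	return n;
}
```
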
      
      The cookie by which these filters (invisible to tc) are identified is
      strategically chosen such that it does not collide with the filters used
      for the ocelot-8021q tagging protocol by the Felix driver, or with the
      MRP traps set up by the Ocelot library.
      
      Other alternatives were considered, like patching user space to do
      something, but there are so many ways in which PTP packets could be made
      to reach the CPU, generically speaking, that "do what?" is a very valid
      question. The ptp4l program from the linuxptp stack already attempts to
      do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
      PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
      a dev_mc_add() on the interface, in the kernel:
      https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
      https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c
      
      Reality shows that this is not sufficient in case the interface belongs
      to a switchdev driver, as dev_mc_add() does not show the intention to
      trap a packet to the CPU, but rather the intention to not drop it (it is
      strictly for RX filtering, same as promiscuous does not mean to send all
      traffic to the CPU, but to not drop traffic with unknown MAC DA). This
      topic is a can of worms in itself, and it would be great if user space
      could just stay out of it.
      
      On the other hand, setting up PTP traps privately within the driver is
      not new by any stretch of the imagination:
      https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
      https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
      https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21
      
      So this is the approach taken here as well. The difference here being
      that we prepare and destroy the traps per port, dynamically at runtime,
      as opposed to driver init time, because apparently, PTP-unaware
      forwarding is a use case.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Reported-by: Po Liu <po.liu@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP · 8a075464
      Committed by Vladimir Oltean
      The ocelot driver, when asked to timestamp all receiving packets, 1588
      v1 or NTP, says "nah, here's 1588 v2 for you".
      
      According to this discussion:
      https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
      drivers that downgrade from a wider request to a narrower response (or
      even a response where the intersection with the request is empty) are
      buggy, and should return -ERANGE instead. This patch fixes that.
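
      A sketch of the intended behaviour, using a hypothetical enum that
      stands in for a few of the kernel's HWTSTAMP_FILTER_* values. Which
      filters the driver really accepts is an assumption here, based on the
      "PTP v2 over L2 and over L4" wording of the ethtool fix in this log.

```c
#include <errno.h>

/* Hypothetical stand-ins for some HWTSTAMP_FILTER_* values */
enum rx_filter {
	RX_FILTER_NONE,
	RX_FILTER_ALL,
	RX_FILTER_PTP_V1_L4_EVENT,
	RX_FILTER_NTP_ALL,
	RX_FILTER_PTP_V2_L2_EVENT,
	RX_FILTER_PTP_V2_L4_EVENT,
	RX_FILTER_PTP_V2_EVENT,
};

/* Return 0 if the request can be honoured as-is, -ERANGE otherwise.
 * The request is never silently narrowed to something else.
 */
int validate_rx_filter(enum rx_filter requested)
{
	switch (requested) {
	case RX_FILTER_NONE:
	case RX_FILTER_PTP_V2_L2_EVENT:
	case RX_FILTER_PTP_V2_L4_EVENT:
	case RX_FILTER_PTP_V2_EVENT:
		return 0;
	default:
		return -ERANGE;
	}
}
```
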
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Suggested-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  15. 26 Nov, 2021 2 commits
    • net: dsa: felix: enable cut-through forwarding between ports by default · 8abe1970
      Committed by Vladimir Oltean
      The VSC9959 switch embedded within NXP LS1028A (and that version of
      Ocelot switches only) supports cut-through forwarding - meaning it can
      start the process of looking up the destination ports for a packet, and
      forward towards those ports, before the entire packet has been received
      (as opposed to the store-and-forward mode).
      
      The up side is having lower forwarding latency for large packets. The
      down side is that frames with FCS errors are forwarded instead of being
      dropped. However, erroneous frames do not result in incorrect updates of
      the FDB or incorrect policer updates, since these processes are deferred
      inside the switch to the end of frame. Since the switch starts the
      cut-through forwarding process after all packet headers (including IP,
      if any) have been processed, packets with large headers and small
      payload do not see the benefit of lower forwarding latency.
      
      There are two cases that need special attention.
      
      The first is when a packet is multicast (or flooded) to multiple
      destinations, one of which doesn't have cut-through forwarding enabled.
      The switch deals with this automatically by disabling cut-through
      forwarding for the frame towards all destination ports.
      
      The second is when a packet is forwarded from a port of lower link speed
      towards a port of higher link speed. This is not handled by the hardware
      and needs software intervention.
      
      Since we practically need to update the cut-through forwarding domain
      from paths that aren't serialized by the rtnl_mutex (phylink
      mac_link_down/mac_link_up ops), this means we need to serialize physical
      link events with user space updates of bonding/bridging domains.
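
      One way to picture the software intervention for the second case is
      the following model, which recomputes the set of egress ports eligible
      for cut-through whenever a link speed or a forwarding domain changes.
      The eligibility rule (no peer with link up in the same forwarding
      domain may be slower than the egress port), the data layout and the
      names are assumptions made for this sketch, not the driver's
      implementation. Per-traffic-class control is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 4

/* speed[p]: link speed in Mbps, 0 when the link is down.
 * fwd_mask[p]: ports that share a forwarding domain with p.
 * Returns a bitmask of egress ports on which cut-through may be enabled.
 */
uint32_t cut_through_egress_mask(const int speed[NUM_PORTS],
				 const uint32_t fwd_mask[NUM_PORTS])
{
	uint32_t mask = 0;
	int port, other;
	bool eligible;

	for (port = 0; port < NUM_PORTS; port++) {
		if (!speed[port])
			continue;

		eligible = true;
		for (other = 0; other < NUM_PORTS; other++) {
			if (other == port ||
			    !(fwd_mask[port] & (UINT32_C(1) << other)))
				continue;

			/* A slower ingress port cannot feed this port fast enough */
			if (speed[other] && speed[other] < speed[port])
				eligible = false;
		}

		if (eligible)
			mask |= UINT32_C(1) << port;
	}

	return mask;
}
```

      Because link up/down events change the speed[] input outside of
      rtnl_mutex, a recomputation like this has to be serialized against
      bridge and bonding changes, which is the locking concern raised above.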
      
      Enabling cut-through forwarding is done per {egress port, traffic class}.
      I don't see any reason why this would be a configurable option as long
      as it works without issues, and there doesn't appear to be any user
      space configuration tool to toggle this on/off, so this patch enables
      cut-through forwarding on all eligible ports and traffic classes.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ocelot: remove "bridge" argument from ocelot_get_bridge_fwd_mask · a8bd9fa5
      Committed by Vladimir Oltean
      The only caller takes ocelot_port->bridge and passes it as the "bridge"
      argument to this function, which then compares it with
      ocelot_port->bridge. This is not useful.
      
      Instead, we would like this function to return 0 if ocelot_port->bridge
      is not present, which is what this patch does.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20211125125808.2383984-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  16. 18 Nov, 2021 2 commits
  17. 25 Oct, 2021 2 commits
  18. 24 Oct, 2021 1 commit
  19. 21 Oct, 2021 5 commits
    • net: mscc: ocelot: track the port pvid using a pointer · d4004422
      Committed by Vladimir Oltean
      Now that we have a list of struct ocelot_bridge_vlan entries, we can
      rewrite the pvid logic to simply point to one of those structures,
      instead of having a separate structure with a "bool valid".
      The NULL pointer will represent the lack of a bridge pvid (not to be
      confused with the hardware pvid of the port, which is present at all
      times).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: add the local station MAC addresses in VID 0 · bfbab310
      Committed by Vladimir Oltean
      The ocelot switchdev driver does not include the CPU port in the list of
      flooding destinations for unknown traffic; instead, that traffic is
      supposed to match FDB entries to reach the CPU.
      
      The addresses it installs are:
      (a) the station MAC address, in ocelot_probe_port() and later during
          runtime in ocelot_port_set_mac_address(). These are the VLAN-unaware
          addresses. The VLAN-aware addresses are in ocelot_vlan_vid_add().
      (b) multicast addresses added with dev_mc_add() (not bridge host MDB
          entries) in ocelot_mc_sync()
      (c) multicast destination MAC addresses for MRP in ocelot_mrp_save_mac(),
          to make sure those are dropped (not forwarded) by the bridging
          service, just trapped to the CPU
      
      So we can see that the logic is slightly buggy ever since the initial
      commit a556c76a ("net: mscc: Add initial Ocelot switch support").
      This is because, when ocelot_probe_port() runs, the port pvid is 0.
      Then we join a VLAN-aware bridge, the pvid becomes 1, we call
      ocelot_port_set_mac_address(), this learns the new MAC address in VID 1
      (also fails to forget the old one, since it thinks it's in VID 1, but
      that's not so important). Then when we leave the VLAN-aware bridge,
      the outside world is unable to ping our new MAC address because it isn't
      learned in VID 0, the VLAN-unaware pvid.
      
      [ note: this is strictly based on static analysis, I don't have hardware
        to test. But there are also many more corner cases ]
      
      The basic idea is that we should have a separation of concerns, and the
      FDB entries used for standalone operation should be managed by the
      driver, and the FDB entries used by the bridging service should be
      managed by the bridge. So the standalone and VLAN-unaware bridge FDB
      entries should not follow the bridge PVID, because that will only be
      active when the bridge is VLAN-aware. So since the port pvid is
      coincidentally zero during probe time, just make those entries
      statically go to VID 0.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfbab310
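The failure sequence described in this commit can be replayed on a toy model. The following is a minimal sketch in Python, not driver code: `ToyFdb`, `set_mac_following_pvid`, `set_mac_static_vid0` and `replay` are hypothetical names, and the FDB is reduced to a set of (MAC, VID) pairs.

```python
# Toy model of an FDB keyed by (MAC, VID). All names here are hypothetical
# and only illustrate the reasoning in the commit message.
OLD_MAC = "00:00:00:00:00:01"
NEW_MAC = "00:00:00:00:00:02"


class ToyFdb:
    def __init__(self):
        self.entries = set()

    def learn(self, mac, vid):
        self.entries.add((mac, vid))

    def forget(self, mac, vid):
        self.entries.discard((mac, vid))

    def reaches_cpu(self, mac, vid):
        # Unknown traffic is not flooded to the CPU, so only an exact
        # (MAC, VID) match delivers the frame.
        return (mac, vid) in self.entries


def set_mac_following_pvid(fdb, port_pvid):
    """Old behaviour: the station address follows the current port pvid."""
    fdb.forget(OLD_MAC, port_pvid)
    fdb.learn(NEW_MAC, port_pvid)


def set_mac_static_vid0(fdb):
    """Behaviour after the patch: station addresses always go to VID 0."""
    fdb.forget(OLD_MAC, 0)
    fdb.learn(NEW_MAC, 0)


def replay(set_mac):
    fdb = ToyFdb()
    fdb.learn(OLD_MAC, 0)  # probe time: the port pvid is 0
    set_mac(fdb)           # MAC changed while under a VLAN-aware bridge (pvid 1)
    # After leaving the bridge, standalone traffic is classified to VID 0.
    return fdb.reaches_cpu(NEW_MAC, 0)
```

In this model, `replay(lambda fdb: set_mac_following_pvid(fdb, 1))` leaves the new address unreachable in VID 0, while `replay(set_mac_static_vid0)` keeps it reachable.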
    • V
      net: mscc: ocelot: allow a config where all bridge VLANs are egress-untagged · 0da1a1c4
      Vladimir Oltean committed
      At present, the ocelot driver accepts a single egress-untagged bridge
      VLAN, meaning that this sequence of operations:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp0 master br0
      bridge vlan add dev swp0 vid 2 pvid untagged
      
      fails because the bridge automatically installs VID 1 as a pvid & untagged
      VLAN, and vid 2 would be the second untagged VLAN on this port. It is
      necessary to delete VID 1 before proceeding to add VID 2.
      
      This limitation comes from the fact that we operate the port tag, when
      it has an egress-untagged VID, in the OCELOT_PORT_TAG_NATIVE mode.
      The ocelot switches do not have full flexibility and can either have one
      single VID as egress-untagged, or all of them.
      
      There are use cases for having all VLANs as egress-untagged as well, and
      this patch adds support for that.
      
      The change rewrites ocelot_port_set_native_vlan() into a more generic
      ocelot_port_manage_port_tag() function. Because the software bridge's
      state, transmitted to us via switchdev, can become very complex, we
      don't attempt to track all possible state transitions, but instead take
      a more declarative approach and just make ocelot_port_manage_port_tag()
      figure out which mode to operate in:
      
      - port is VLAN-unaware: the classified VLAN (internal, unrelated to the
                              802.1Q header) is not inserted into packets on egress
      - port is VLAN-aware:
        - port has tagged VLANs:
          -> port has no untagged VLAN: set up as pure trunk
          -> port has one untagged VLAN: set up as trunk port + native VLAN
          -> port has more than one untagged VLAN: this is an invalid config
             which is rejected by ocelot_vlan_prepare
        - port has no tagged VLANs
          -> set up as pure egress-untagged port
      
      We don't keep the number of tagged and untagged VLANs, we just count the
      structures we keep.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0da1a1c4
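The declarative decision described in this commit can be written as a single function over the current VLAN set. This is a sketch in Python under stated assumptions: `pick_port_tag_mode` and the mode strings are hypothetical and do not correspond to the driver's enum values, and the invalid case raises an exception here, whereas the commit says it is rejected earlier by ocelot_vlan_prepare.

```python
def pick_port_tag_mode(vlan_aware, vlans):
    """Choose an egress tagging mode from the port's current VLAN set.

    vlans is an iterable of (vid, untagged) pairs. The returned strings
    are illustrative labels, not the driver's register values.
    """
    if not vlan_aware:
        # The classified VLAN is internal and is not inserted on egress.
        return "untagged"

    vlans = list(vlans)
    # Count the entries each time instead of tracking state transitions.
    num_untagged = sum(1 for _, untagged in vlans if untagged)
    num_tagged = len(vlans) - num_untagged

    if num_tagged == 0:
        return "untagged"        # pure egress-untagged port
    if num_untagged == 0:
        return "trunk"           # pure trunk
    if num_untagged == 1:
        return "trunk+native"    # trunk port plus one native VLAN
    raise ValueError("tagged VLANs with more than one untagged VLAN")
```

For example, a VLAN-aware port with VID 1 and VID 2 both untagged and no tagged VLAN yields `"untagged"`, which is the configuration the `bridge vlan add dev swp0 vid 2 pvid untagged` sequence above needs.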
    • V
      net: mscc: ocelot: convert the VLAN masks to a list · 90e0aa8d
      Vladimir Oltean committed
      First and foremost, the driver currently allocates a constant sized
      4K * u32 (16KB memory) array for the VLAN masks. However, a typical
      application might not need so many VLANs, so if we dynamically allocate
      the memory as needed, we might actually save some space.
      
      Secondly, we'll need to keep more advanced bookkeeping of the VLANs we
      have, notably we'll have to check how many untagged and how many tagged
      VLANs we have. This will have to stay in a structure, and allocating
      another 16 KB array for that is again a bit too much.
      
      So refactor the bridge VLANs in a linked list of structures.
      
      The hook points inside the driver are ocelot_vlan_member_add() and
      ocelot_vlan_member_del(), which previously used to operate on the
      ocelot->vlan_mask[vid] array element.
      
      ocelot_vlan_member_add() and ocelot_vlan_member_del() used to call
      ocelot_vlan_member_set() to commit to the ocelot->vlan_mask.
      Additionally, we had two calls to ocelot_vlan_member_set() from outside
      those callers, and those were directly from ocelot_vlan_init().
      Those calls do not set up bridging service VLANs, instead they:
      
      - clear the VLAN table on reset
      - set the port pvid to the value used by this driver for VLAN-unaware
        standalone port operation (VID 0)
      
      So now, when we have a structure which represents actual bridge VLANs,
      VID 0 doesn't belong in that structure, since it is not part of the
      bridging layer.
      
      So delete the middle man, ocelot_vlan_member_set(), and let
      ocelot_vlan_init() call directly ocelot_vlant_set_mask() which forgoes
      any data structure and writes directly to hardware, which is all that we
      need.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90e0aa8d
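The trade-off in this commit (a fixed 4K-entry mask array versus entries allocated on demand) can be illustrated with a small model. This is a sketch in Python, not the driver's code: `BridgeVlan`, `ToyVlanList` and the constant names are hypothetical, and dropping an entry once its port mask becomes empty is an assumption of the sketch.

```python
VLAN_TABLE_ENTRIES = 4096   # 4K VLANs
BYTES_PER_MASK = 4          # one u32 port mask per VLAN
FIXED_ARRAY_BYTES = VLAN_TABLE_ENTRIES * BYTES_PER_MASK  # 16384 bytes (16 KB)


class BridgeVlan:
    """One bridge VLAN: a VID plus a bitmask of member ports."""

    def __init__(self, vid):
        self.vid = vid
        self.portmask = 0


class ToyVlanList:
    """Bridge VLANs kept as a list that grows only as VLANs are added."""

    def __init__(self):
        self.vlans = []

    def find(self, vid):
        for vlan in self.vlans:
            if vlan.vid == vid:
                return vlan
        return None

    def member_add(self, port, vid):
        vlan = self.find(vid)
        if vlan is None:
            vlan = BridgeVlan(vid)
            self.vlans.append(vlan)
        vlan.portmask |= 1 << port

    def member_del(self, port, vid):
        vlan = self.find(vid)
        if vlan is None:
            return
        vlan.portmask &= ~(1 << port)
        if vlan.portmask == 0:
            # Assumption of this sketch: an entry with no members is freed.
            self.vlans.remove(vlan)
```

Because VID 0 is written straight to hardware for standalone operation, it never appears in such a list; only VLANs installed by the bridging layer do.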
    • V
      net: mscc: ocelot: add a type definition for REW_TAG_CFG_TAG_CFG · 62a22bcb
      Vladimir Oltean committed
      This is a cosmetic patch which clarifies what are the port tagging
      options for Ocelot switches.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62a22bcb
  20. 13 Oct, 2021 5 commits
    • V
      net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver · deab6b1c
      Vladimir Oltean committed
      As explained here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocol drivers cannot depend on symbols exported by switch
      drivers, because this creates a circular dependency that breaks module
      autoloading.
      
      The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
      exported by the common ocelot switch lib. This function looks at
      OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
      DSA tag, for PTP timestamping (the command: one-step/two-step, and the
      TX timestamp identifier).
      
      None of that requires deep insight into the driver, it is quite
      stateless, as it only depends upon the skb->cb. So let's make it a
      static inline function and put it in include/linux/dsa/ocelot.h, a
      file that despite its name is used by the ocelot switch driver for
      populating the injection header too - since commit 40d3f295 ("net:
      mscc: ocelot: use common tag parsing code with DSA").
      
      With that function declared as static inline, its body is expanded
      inside each call site, so the dependency is broken and the DSA tagger
      can be built without the switch library, upon which the felix driver
      depends.
      
      Fixes: 39e5308b ("net: mscc: ocelot: support PTP Sync one-step timestamping")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      deab6b1c
    • V
      net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header · ebb4c6a9
      Vladimir Oltean committed
      The sad reality is that when a PTP frame with a TX timestamping request
      is transmitted, it isn't guaranteed that it will make it all the way to
      the wire (due to congestion inside the switch), and that a timestamp
      will be taken by the hardware and placed in the timestamp FIFO where an
      IRQ will be raised for it.
      
      The implication is that if enough PTP frames are silently dropped by the
      hardware such that the timestamp ID has rolled over, it is possible to
      match a timestamp to an old skb.
      
      Furthermore, nobody will match on the real skb corresponding to this
      timestamp, since we stupidly matched on a previous one that was stale in
      the queue, and stopped there.
      
      So PTP timestamping will be broken and there will be no way to recover.
      
      It looks like the hardware parses the sequenceID from the PTP header,
      and also provides that metadata for each timestamp. The driver currently
      ignores this, but it shouldn't.
      
      As an extra resiliency measure, do the following:
      
      - check whether the PTP sequenceID also matches between the skb and the
        timestamp, treat the skb as stale otherwise and free it
      
      - if we see a stale skb, don't stop there and try to match an skb one
        more time, chances are there's one more skb in the queue with the same
        timestamp ID, otherwise we wouldn't have ever found the stale one (it
        is by timestamp ID that we matched it).
      
      While this does not prevent PTP packet drops, it at least prevents
      the catastrophic consequences of incorrect timestamp matching.
      
      Since we already call ptp_classify_raw in the TX path, save the result
      in the skb->cb of the clone, and just use that result in the interrupt
      code path.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ebb4c6a9
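The matching rule in this commit (match by timestamp ID, then cross-check the PTP sequenceID, and retry past stale entries) can be sketched as follows. This is a Python model, not the interrupt handler: `match_timestamp` and the dictionary layout of the queued frames are hypothetical, and the sketch keeps retrying until no candidate with that timestamp ID remains.

```python
def match_timestamp(queue, ts_id, seqid):
    """Return the queued frame that a hardware timestamp belongs to.

    queue is a list of dicts with "ts_id" and "seqid" keys, oldest first.
    Frames whose timestamp ID matches but whose sequenceID does not are
    treated as stale and removed. Returns None if nothing matches.
    """
    while True:
        candidate = next((f for f in queue if f["ts_id"] == ts_id), None)
        if candidate is None:
            return None
        queue.remove(candidate)
        if candidate["seqid"] == seqid:
            return candidate
        # Stale frame: its timestamp never arrived and the ID rolled over.
        # Drop it and look for another frame with the same timestamp ID.
```

With a queue holding a stale frame `{"ts_id": 5, "seqid": 10}` ahead of the real one `{"ts_id": 5, "seqid": 74}`, a timestamp reported for ID 5 and sequenceID 74 discards the first frame and returns the second.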
    • V
      net: mscc: ocelot: deny TX timestamping of non-PTP packets · fba01283
      Vladimir Oltean committed
      It appears that Ocelot switches cannot timestamp non-PTP frames,
      I tested this using the isochron program at:
      https://github.com/vladimiroltean/tsn-scripts
      
      with the result that the driver increments the ocelot_port->ts_id
      counter as expected, puts it in the REW_OP, but the hardware seems to
      not timestamp these packets at all, since no IRQ is emitted.
      
      Therefore check whether we are sending PTP frames, and refuse to
      populate REW_OP otherwise.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fba01283
    • V
      net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb · 9fde506e
      Vladimir Oltean committed
      When skb_match is NULL, it means we received a PTP IRQ for a timestamp
      ID that the kernel has no idea about, since there is no skb in the
      timestamping queue with that timestamp ID.
      
      This is a grave error and not something to just "continue" over.
      So print a big warning in case this happens.
      
      Also, move the check above ocelot_get_hwtimestamp(), there is no point
      in reading the full 64-bit current PTP time if we're not going to do
      anything with it anyway for this skb.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9fde506e
    • V
      net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO · 52849bcf
      Vladimir Oltean committed
      PTP packets with 2-step TX timestamp requests are matched to packets
      based on the egress port number and a 6-bit timestamp identifier.
      All PTP timestamps are held in a common FIFO that is 128 entry deep.
      
      This patch ensures that back-to-back timestamping requests cannot exceed
      the hardware FIFO capacity. If that happens, simply send the packets
      without requesting a TX timestamp to be taken (in the case of felix,
      since the DSA API has a void return code in ds->ops->port_txtstamp) or
      drop them (in the case of ocelot).
      
      I've moved the ts_id_lock from a per-port basis to a per-switch basis,
      because we need separate accounting for both numbers of PTP frames in
      flight. And since we need locking to inc/dec the per-switch counter,
      that also offers protection for the per-port counter and hence there is
      no reason to have a per-port lock anymore.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      52849bcf
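The accounting described in this commit can be modelled with two counters guarded by one switch-wide lock. This is a sketch in Python under stated assumptions: `ToySwitch`, its attributes and its methods are hypothetical names, and capping each port at 64 frames in flight (the range of a 6-bit identifier) is an assumption of the sketch rather than something the commit message spells out.

```python
import threading

TS_ID_BITS = 6
IDS_PER_PORT = 1 << TS_ID_BITS   # 64 distinct timestamp identifiers
FIFO_DEPTH = 128                 # shared timestamp FIFO entries


class ToySwitch:
    def __init__(self, num_ports):
        self.lock = threading.Lock()          # one lock for the whole switch
        self.in_flight_total = 0              # per-switch counter
        self.in_flight = [0] * num_ports      # per-port counters
        self.next_ts_id = [0] * num_ports

    def request_tx_timestamp(self, port):
        """Return a timestamp ID, or None if the request must be refused."""
        with self.lock:
            if (self.in_flight[port] >= IDS_PER_PORT
                    or self.in_flight_total >= FIFO_DEPTH):
                return None
            ts_id = self.next_ts_id[port]
            self.next_ts_id[port] = (ts_id + 1) % IDS_PER_PORT
            self.in_flight[port] += 1
            self.in_flight_total += 1
            return ts_id

    def timestamp_done(self, port):
        """Called once the timestamp for a frame on this port was collected."""
        with self.lock:
            self.in_flight[port] -= 1
            self.in_flight_total -= 1
```

A caller that gets `None` would then either send the frame without a timestamp request or drop it, matching the two behaviours the commit describes for felix and ocelot.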