1. 22 11月, 2021 2 次提交
  2. 10 11月, 2021 1 次提交
  3. 03 11月, 2021 2 次提交
    • V
      net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge · 92f62485
      Vladimir Oltean 提交于
      Normally it is expected that the dsa_device_ops :: rcv() method finishes
      parsing the DSA tag and consumes it, then never looks at it again.
      
      But commit c0bcf537 ("net: dsa: ocelot: add hardware timestamping
      support for Felix") added support for RX timestamping in a very
      unconventional way. On this switch, a partial timestamp is available in
      the DSA header, but the driver got away with not parsing that timestamp
      right away, but instead delayed that parsing for a little longer:
      
      dsa_switch_rcv():
      	nskb = cpu_dp->rcv(skb, dev); <------------- not here
      	-> ocelot_rcv()
      	...
      
      	skb = nskb;
      	skb_push(skb, ETH_HLEN);
      	skb->pkt_type = PACKET_HOST;
      	skb->protocol = eth_type_trans(skb, skb->dev);
      
      	...
      
      	if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here
      	-> felix_rxtstamp()
      		return 0;
      
      When in felix_rxtstamp(), this driver accounted for the fact that
      eth_type_trans() happened in the meanwhile, so it got a hold of the
      extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes
      from the current skb->data.
      
      This worked for quite some time but was quite fragile from the very
      beginning. Not to mention that having DSA tag parsing split in two
      different files, under different folders (net/dsa/tag_ocelot.c vs
      drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to
      come that they might break this.
      
      Finally, the blamed commit does the following: at the end of
      ocelot_rcv(), it checks whether the skb payload contains a VLAN header.
      If it does, and this port is under a VLAN-aware bridge, that VLAN ID
      might not be correct in the sense that the packet might have suffered
      VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID
      from the skb payload using __skb_vlan_pop(), and take the classified
      VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the
      classified VLAN, and the skb payload is VLAN-untagged.
      
      The big problem is that __skb_vlan_pop() does:
      
      	memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
      	__skb_pull(skb, VLAN_HLEN);
      
      aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes
      from the skb headroom (effectively also moving skb->data, by definition).
      So for felix_rxtstamp()'s fragile logic, all bets are off now.
      Instead of having the "extraction" pointer point to the DSA header,
      it actually points to 4 bytes _inside_ the extraction header.
      Corollary, the last 4 bytes of the "extraction" header are in fact 4
      stale bytes of the destination MAC address from the Ethernet header,
      from prior to the __skb_vlan_pop() movement.
      
      So of course, RX timestamps are completely bogus when the system is
      configured in this way.
      
      The fix is actually very simple: just don't structure the code like that.
      For better or worse, the DSA PTP timestamping API does not offer a
      straightforward way for drivers to present their RX timestamps, but
      other drivers (sja1105) have established a simple mechanism to carry
      their RX timestamp from dsa_device_ops :: rcv() all the way to
      dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to
      simply save the partial timestamp to the skb->cb, and complete it later.
      
      Question: why don't we simply populate the skb's struct
      skb_shared_hwtstamps from ocelot_rcv(), and bother with this
      complication of propagating the timestamp to felix_rxtstamp()?
      
      Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether
      PTP packets need sleepable context to retrieve the full RX timestamp.
      Currently felix_rxtstamp() answers "no, thanks" to that question, and
      calls ocelot_ptp_gettime64() from softirq atomic context. This is
      understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so
      hardware access does not require sleeping. But the felix driver is
      preparing for the introduction of other switches where hardware access
      is over a slow bus like SPI or MDIO:
      https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/
      
      So I would like to keep this code structure, so the rework needed when
      that driver will need PTP support will be minimal (answer "yes, I need
      deferred context for this skb's RX timestamp", then the partial
      timestamp will still be found in the skb->cb.
      
      Fixes: ea440cd2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available")
      Reported-by: NPo Liu <po.liu@nxp.com>
      Cc: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92f62485
    • A
      net: dsa: qca8k: make sure PAD0 MAC06 exchange is disabled · 5f15d392
      Ansuel Smith 提交于
      Some device set MAC06 exchange in the bootloader. This cause some
      problem as we don't support this strange mode and we just set the port6
      as the primary CPU port. With MAC06 exchange, PAD0 reg configure port6
      instead of port0. Add an extra check and explicitly disable MAC06 exchange
      to correctly configure the port PAD config.
      Signed-off-by: NAnsuel Smith <ansuelsmth@gmail.com>
      Fixes: 3fcf734a ("net: dsa: qca8k: add support for cpu port 6")
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f15d392
  4. 25 10月, 2021 5 次提交
    • V
      net: dsa: lantiq_gswip: serialize access to the PCE registers · cf231b43
      Vladimir Oltean 提交于
      The GSWIP switch accesses various bridging layer tables (VLANs, FDBs,
      forwarding rules) indirectly through PCE registers. These hardware
      accesses are non-atomic, being comprised of several register reads and
      writes.
      
      These accesses are currently serialized by the rtnl_lock, but DSA is
      changing its driver API and that lock will no longer be held when
      calling ->port_fdb_add() and ->port_fdb_del().
      
      So this driver needs to serialize the access to the PCE registers using
      its own locking scheme. This patch adds that.
      
      Note that the driver also uses the gswip_pce_load_microcode() function
      to load a static configuration for the packet classification engine into
      a table using the same registers. It is currently not protected, but
      since that configuration is only done from the dsa_switch_ops :: setup
      method, there is no risk of it being concurrent with other operations.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf231b43
    • V
      net: dsa: b53: serialize access to the ARL table · f7eb4a1c
      Vladimir Oltean 提交于
      The b53 driver performs non-atomic transactions to the ARL table when
      adding, deleting and reading FDB and MDB entries.
      
      Traditionally these were all serialized by the rtnl_lock(), but now it
      is possible that DSA calls ->port_fdb_add and ->port_fdb_del without
      holding that lock.
      
      So the driver must have its own serialization logic. Add a mutex and
      hold it from all entry points (->port_fdb_{add,del,dump},
      ->port_mdb_{add,del}).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7eb4a1c
    • V
      net: dsa: sja1105: serialize access to the dynamic config interface · eb016afd
      Vladimir Oltean 提交于
      The sja1105 hardware seems as concurrent as can be, but when we create a
      background script that adds/removes a rain of FDB entries without the
      rtnl_mutex taken, then in parallel we do another operation like run
      'bridge fdb show', we can notice these errors popping up:
      
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2
      
      Luckily what is going on does not require a major rework in the driver.
      The sja1105_dynamic_config_read() function sends multiple SPI buffers to
      the peripheral until the operation completes. We should not do anything
      until the hardware clears the VALID bit.
      
      But since there is no locking (i.e. right now we are implicitly
      serialized by the rtnl_mutex, but if we remove that), it might be
      possible that the process which performs the dynamic config read is
      preempted and another one performs a dynamic config write.
      
      What will happen in that case is that sja1105_dynamic_config_read(),
      when it resumes, expects to see VALIDENT set for the entry it reads
      back. But it won't.
      
      This can be corrected by introducing a mutex for serializing SPI
      accesses to the dynamic config interface which should be atomic with
      respect to each other.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb016afd
    • V
      net: dsa: sja1105: wait for dynamic config command completion on writes too · df405910
      Vladimir Oltean 提交于
      The hardware manual says that software should attempt a new dynamic
      config access (be it a a write or a read-back) only while the VALID bit
      is cleared. The VALID bit is set by software to 1, and it remains set as
      long as the hardware is still processing the request.
      
      Currently the driver only polls for the command completion only for
      reads, because that's when we need the actual data read back. Writes
      have been more or less "asynchronous", although this has never been an
      observable issue.
      
      This change makes sja1105_dynamic_config_write poll the VALID bit as
      well, to absolutely ensure that a follow-up access to the static config
      finds the VALID bit cleared.
      
      So VALID means "work in progress", while VALIDENT means "entry being
      read is valid". On reads we check the VALIDENT bit too, while on writes
      that bit is not always defined. So we need to factor it out of the loop,
      and make the loop provide back the unpacked command structure, so that
      sja1105_dynamic_config_read can check the VALIDENT bit.
      
      The change also attempts to convert the open-coded loop to use the
      read_poll_timeout macro, since I know this will come up during review.
      It's more code, but hey, it uses read_poll_timeout!
      
      Tested on SJA1105T, SJA1105S, SJA1110A.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df405910
    • D
      Revert "Merge branch 'dsa-rtnl'" · 2d7e73f0
      David S. Miller 提交于
      This reverts commit 965e6b26, reversing
      changes made to 4d98bb0d.
      2d7e73f0
  5. 24 10月, 2021 6 次提交
    • S
      net: convert users of bitmap_foo() to linkmode_foo() · 4973056c
      Sean Anderson 提交于
      This converts instances of
      	bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS)
      to
      	linkmode_foo(args...)
      
      I manually fixed up some lines to prevent them from being excessively
      long. Otherwise, this change was generated with the following semantic
      patch:
      
      // Generated with
      // echo linux/linkmode.h > includes
      // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \
      // | sort | uniq | tee new_includes | wc -l && mv new_includes includes
      // and repeating until the number stopped going up
      @i@
      @@
      
      (
       #include <linux/acpi_mdio.h>
      |
       #include <linux/brcmphy.h>
      |
       #include <linux/dsa/loop.h>
      |
       #include <linux/dsa/sja1105.h>
      |
       #include <linux/ethtool.h>
      |
       #include <linux/ethtool_netlink.h>
      |
       #include <linux/fec.h>
      |
       #include <linux/fs_enet_pd.h>
      |
       #include <linux/fsl/enetc_mdio.h>
      |
       #include <linux/fwnode_mdio.h>
      |
       #include <linux/linkmode.h>
      |
       #include <linux/lsm_audit.h>
      |
       #include <linux/mdio-bitbang.h>
      |
       #include <linux/mdio.h>
      |
       #include <linux/mdio-mux.h>
      |
       #include <linux/mii.h>
      |
       #include <linux/mii_timestamper.h>
      |
       #include <linux/mlx5/accel.h>
      |
       #include <linux/mlx5/cq.h>
      |
       #include <linux/mlx5/device.h>
      |
       #include <linux/mlx5/driver.h>
      |
       #include <linux/mlx5/eswitch.h>
      |
       #include <linux/mlx5/fs.h>
      |
       #include <linux/mlx5/port.h>
      |
       #include <linux/mlx5/qp.h>
      |
       #include <linux/mlx5/rsc_dump.h>
      |
       #include <linux/mlx5/transobj.h>
      |
       #include <linux/mlx5/vport.h>
      |
       #include <linux/of_mdio.h>
      |
       #include <linux/of_net.h>
      |
       #include <linux/pcs-lynx.h>
      |
       #include <linux/pcs/pcs-xpcs.h>
      |
       #include <linux/phy.h>
      |
       #include <linux/phy_led_triggers.h>
      |
       #include <linux/phylink.h>
      |
       #include <linux/platform_data/bcmgenet.h>
      |
       #include <linux/platform_data/xilinx-ll-temac.h>
      |
       #include <linux/pxa168_eth.h>
      |
       #include <linux/qed/qed_eth_if.h>
      |
       #include <linux/qed/qed_fcoe_if.h>
      |
       #include <linux/qed/qed_if.h>
      |
       #include <linux/qed/qed_iov_if.h>
      |
       #include <linux/qed/qed_iscsi_if.h>
      |
       #include <linux/qed/qed_ll2_if.h>
      |
       #include <linux/qed/qed_nvmetcp_if.h>
      |
       #include <linux/qed/qed_rdma_if.h>
      |
       #include <linux/sfp.h>
      |
       #include <linux/sh_eth.h>
      |
       #include <linux/smsc911x.h>
      |
       #include <linux/soc/nxp/lpc32xx-misc.h>
      |
       #include <linux/stmmac.h>
      |
       #include <linux/sunrpc/svc_rdma.h>
      |
       #include <linux/sxgbe_platform.h>
      |
       #include <net/cfg80211.h>
      |
       #include <net/dsa.h>
      |
       #include <net/mac80211.h>
      |
       #include <net/selftests.h>
      |
       #include <rdma/ib_addr.h>
      |
       #include <rdma/ib_cache.h>
      |
       #include <rdma/ib_cm.h>
      |
       #include <rdma/ib_hdrs.h>
      |
       #include <rdma/ib_mad.h>
      |
       #include <rdma/ib_marshall.h>
      |
       #include <rdma/ib_pack.h>
      |
       #include <rdma/ib_pma.h>
      |
       #include <rdma/ib_sa.h>
      |
       #include <rdma/ib_smi.h>
      |
       #include <rdma/ib_umem.h>
      |
       #include <rdma/ib_umem_odp.h>
      |
       #include <rdma/ib_verbs.h>
      |
       #include <rdma/iw_cm.h>
      |
       #include <rdma/mr_pool.h>
      |
       #include <rdma/opa_addr.h>
      |
       #include <rdma/opa_port_info.h>
      |
       #include <rdma/opa_smi.h>
      |
       #include <rdma/opa_vnic.h>
      |
       #include <rdma/rdma_cm.h>
      |
       #include <rdma/rdma_cm_ib.h>
      |
       #include <rdma/rdmavt_cq.h>
      |
       #include <rdma/rdma_vt.h>
      |
       #include <rdma/rdmavt_qp.h>
      |
       #include <rdma/rw.h>
      |
       #include <rdma/tid_rdma_defs.h>
      |
       #include <rdma/uverbs_ioctl.h>
      |
       #include <rdma/uverbs_named_ioctl.h>
      |
       #include <rdma/uverbs_std_types.h>
      |
       #include <rdma/uverbs_types.h>
      |
       #include <soc/mscc/ocelot.h>
      |
       #include <soc/mscc/ocelot_ptp.h>
      |
       #include <soc/mscc/ocelot_vcap.h>
      |
       #include <trace/events/ib_mad.h>
      |
       #include <trace/events/rdma_core.h>
      |
       #include <trace/events/rdma.h>
      |
       #include <trace/events/rpcrdma.h>
      |
       #include <uapi/linux/ethtool.h>
      |
       #include <uapi/linux/ethtool_netlink.h>
      |
       #include <uapi/linux/mdio.h>
      |
       #include <uapi/linux/mii.h>
      )
      
      @depends on i@
      expression list args;
      @@
      
      (
      - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_zero(args)
      |
      - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_copy(args)
      |
      - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_and(args)
      |
      - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_or(args)
      |
      - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_empty(args)
      |
      - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_andnot(args)
      |
      - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_equal(args)
      |
      - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_intersects(args)
      |
      - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_subset(args)
      )
      
      Add missing linux/mii.h include to mellanox. -DaveM
      Signed-off-by: NSean Anderson <sean.anderson@seco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4973056c
    • V
      net: dsa: lantiq_gswip: serialize access to the PCE table · 49753a75
      Vladimir Oltean 提交于
      Looking at the code, the GSWIP switch appears to hold bridging service
      structures (VLANs, FDBs, forwarding rules) in PCE table entries.
      Hardware access to the PCE table is non-atomic, and is comprised of
      several register reads and writes.
      
      These accesses are currently serialized by the rtnl_lock, but DSA is
      changing its driver API and that lock will no longer be held when
      calling ->port_fdb_add() and ->port_fdb_del().
      
      So this driver needs to serialize the access to the PCE table using its
      own locking scheme. This patch adds that.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49753a75
    • V
      net: dsa: b53: serialize access to the ARL table · f239934c
      Vladimir Oltean 提交于
      The b53 driver performs non-atomic transactions to the ARL table when
      adding, deleting and reading FDB and MDB entries.
      
      Traditionally these were all serialized by the rtnl_lock(), but now it
      is possible that DSA calls ->port_fdb_add and ->port_fdb_del without
      holding that lock.
      
      So the driver must have its own serialization logic. Add a mutex and
      hold it from all entry points (->port_fdb_{add,del,dump},
      ->port_mdb_{add,del}).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f239934c
    • V
      net: dsa: sja1105: serialize access to the dynamic config interface · 1681ae16
      Vladimir Oltean 提交于
      The sja1105 hardware seems as concurrent as can be, but when we create a
      background script that adds/removes a rain of FDB entries without the
      rtnl_mutex taken, then in parallel we do another operation like run
      'bridge fdb show', we can notice these errors popping up:
      
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2
      
      Luckily what is going on does not require a major rework in the driver.
      The sja1105_dynamic_config_read() function sends multiple SPI buffers to
      the peripheral until the operation completes. We should not do anything
      until the hardware clears the VALID bit.
      
      But since there is no locking (i.e. right now we are implicitly
      serialized by the rtnl_mutex, but if we remove that), it might be
      possible that the process which performs the dynamic config read is
      preempted and another one performs a dynamic config write.
      
      What will happen in that case is that sja1105_dynamic_config_read(),
      when it resumes, expects to see VALIDENT set for the entry it reads
      back. But it won't.
      
      This can be corrected by introducing a mutex for serializing SPI
      accesses to the dynamic config interface which should be atomic with
      respect to each other.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1681ae16
    • V
      net: dsa: sja1105: wait for dynamic config command completion on writes too · 643979cf
      Vladimir Oltean 提交于
      The hardware manual says that software should attempt a new dynamic
      config access (be it a a write or a read-back) only while the VALID bit
      is cleared. The VALID bit is set by software to 1, and it remains set as
      long as the hardware is still processing the request.
      
      Currently the driver only polls for the command completion only for
      reads, because that's when we need the actual data read back. Writes
      have been more or less "asynchronous", although this has never been an
      observable issue.
      
      This change makes sja1105_dynamic_config_write poll the VALID bit as
      well, to absolutely ensure that a follow-up access to the static config
      finds the VALID bit cleared.
      
      So VALID means "work in progress", while VALIDENT means "entry being
      read is valid". On reads we check the VALIDENT bit too, while on writes
      that bit is not always defined. So we need to factor it out of the loop,
      and make the loop provide back the unpacked command structure, so that
      sja1105_dynamic_config_read can check the VALIDENT bit.
      
      The change also attempts to convert the open-coded loop to use the
      read_poll_timeout macro, since I know this will come up during review.
      It's more code, but hey, it uses read_poll_timeout!
      
      Tested on SJA1105T, SJA1105S, SJA1110A.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      643979cf
    • S
      net: Convert more users of mdiobus_* to mdiodev_* · 65aa371e
      Sean Anderson 提交于
      This converts users of mdiobus to mdiodev using the following semantic
      patch:
      
      @@
      identifier mdiodev;
      expression regnum;
      @@
      
      - mdiobus_read(mdiodev->bus, mdiodev->addr, regnum)
      + mdiodev_read(mdiodev, regnum)
      
      @@
      identifier mdiodev;
      expression regnum, val;
      @@
      
      - mdiobus_write(mdiodev->bus, mdiodev->addr, regnum, val)
      + mdiodev_write(mdiodev, regnum, val)
      Signed-off-by: NSean Anderson <sean.anderson@seco.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65aa371e
  6. 23 10月, 2021 1 次提交
  7. 21 10月, 2021 1 次提交
  8. 20 10月, 2021 2 次提交
  9. 18 10月, 2021 4 次提交
  10. 15 10月, 2021 9 次提交
  11. 13 10月, 2021 6 次提交
    • V
      net: dsa: felix: break at first CPU port during init and teardown · 8d5f7954
      Vladimir Oltean 提交于
      The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
      one of them is capable of acting as an NPI port at a time (inject and
      extract packets using DSA tags).
      
      However, using the alternative ocelot-8021q tagging protocol, it should
      be possible to use both CPU ports symmetrically, but for that we need to
      mark both ports in the device tree as DSA masters.
      
      In the process of doing that, it can be seen that traffic to/from the
      network stack gets broken, and this is because the Felix driver iterates
      through all DSA CPU ports and configures them as NPI ports. But since
      there can only be a single NPI port, we effectively end up in a
      situation where DSA thinks the default CPU port is the first one, but
      the hardware port configured to be an NPI is the last one.
      
      I would like to treat this as a bug, because if the updated device trees
      are going to start circulating, it would be really good for existing
      kernels to support them, too.
      
      Fixes: adb3dccf ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      8d5f7954
    • V
      net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent · 1328a883
      Vladimir Oltean 提交于
      At present, when a PTP packet which requires TX timestamping gets
      dropped under congestion by the switch, things go downhill very fast.
      The driver keeps a clone of that skb in a queue of packets awaiting TX
      timestamp interrupts, but interrupts will never be raised for the
      dropped packets.
      
      Moreover, matching timestamped packets to timestamps is done by a 2-bit
      timestamp ID, and this can wrap around and we can match on the wrong skb.
      
      Since with the default NPI-based tagging protocol, we get no notification
      about packet drops, the best we can do is eventually recover from the
      drop of a PTP frame: its skb will be dead memory until another skb which
      was assigned the same timestamp ID happens to find it.
      
      However, with the ocelot-8021q tagger which injects packets using the
      manual register interface, it appears that we can check for more
      information, such as:
      
      - whether the input queue has reached the high watermark or not
      - whether the injection group's FIFO can accept additional data or not
      
      so we know that a PTP frame is likely to get dropped before actually
      sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
      can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).
      
      But when we do that, we can also remove the skb from the timestamping
      queue, because there surely won't be any timestamp that matches it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1328a883
    • V
      net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib · 49f885b2
      Vladimir Oltean 提交于
      Michael reported that when using the "ocelot-8021q" tagging protocol,
      the switch driver module must be manually loaded before the tagging
      protocol can be loaded/is available.
      
      This appears to be the same problem described here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      where due to the fact that DSA tagging protocols make use of symbols
      exported by the switch drivers, circular dependencies appear and this
      breaks module autoloading.
      
      The ocelot_8021q driver needs the ocelot_can_inject() and
      ocelot_port_inject_frame() functions from the switch library. Previously
      the wrong approach was taken to solve that dependency: shims were
      provided for the case where the ocelot switch library was compiled out,
      but that turns out to be insufficient, because the dependency when the
      switch lib _is_ compiled is problematic too.
      
      We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
      static inline functions, because these access I/O functions like
      __ocelot_write_ix() which is called by ocelot_write_rix(). Making those
      static inline basically means exposing the whole guts of the ocelot
      switch library, not ideal...
      
      We already have one tagging protocol driver which calls into the switch
      driver during xmit but not using any exported symbol: sja1105_defer_xmit.
      We can do the same thing here: create a kthread worker and one work item
      per skb, and let the switch driver itself do the register accesses to
      send the skb, and then consume it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Reported-by: NMichael Walle <michael@walle.cc>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      49f885b2
    • V
      net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO · 52849bcf
      Vladimir Oltean 提交于
      PTP packets with 2-step TX timestamp requests are matched to packets
      based on the egress port number and a 6-bit timestamp identifier.
      All PTP timestamps are held in a common FIFO that is 128 entry deep.
      
      This patch ensures that back-to-back timestamping requests cannot exceed
      the hardware FIFO capacity. If that happens, simply send the packets
      without requesting a TX timestamp to be taken (in the case of felix,
      since the DSA API has a void return code in ds->ops->port_txtstamp) or
      drop them (in the case of ocelot).
      
      I've moved the ts_id_lock from a per-port basis to a per-switch basis,
      because we need separate accounting for both numbers of PTP frames in
      flight. And since we need locking to inc/dec the per-switch counter,
      that also offers protection for the per-port counter and hence there is
      no reason to have a per-port counter anymore.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      52849bcf
    • V
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Vladimir Oltean 提交于
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4ac0567e
    • V
      net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 28da0555
      Vladimir Oltean 提交于
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      28da0555
  12. 12 10月, 2021 1 次提交
    • A
      net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work · ef1100ef
      Arun Ramadoss 提交于
      When the ksz module is installed and removed using rmmod, kernel crashes
      with null pointer dereferrence error. During rmmod, ksz_switch_remove
      function tries to cancel the mib_read_workqueue using
      cancel_delayed_work_sync routine and unregister switch from dsa.
      
      During dsa_unregister_switch it calls ksz_mac_link_down, which in turn
      reschedules the workqueue since mib_interval is non-zero.
      Due to which queue executed after mib_interval and it tries to access
      dp->slave. But the slave is unregistered in the ksz_switch_remove
      function. Hence kernel crashes.
      
      To avoid this crash, before canceling the workqueue, resetted the
      mib_interval to 0.
      
      v1 -> v2:
      -Removed the if condition in ksz_mib_read_work
      
      Fixes: 469b390e ("net: dsa: microchip: use delayed_work instead of timer + work")
      Signed-off-by: NArun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef1100ef