1. 25 10月, 2021 5 次提交
    • V
      net: dsa: lantiq_gswip: serialize access to the PCE registers · cf231b43
      Vladimir Oltean 提交于
      The GSWIP switch accesses various bridging layer tables (VLANs, FDBs,
      forwarding rules) indirectly through PCE registers. These hardware
      accesses are non-atomic, being comprised of several register reads and
      writes.
      
      These accesses are currently serialized by the rtnl_lock, but DSA is
      changing its driver API and that lock will no longer be held when
      calling ->port_fdb_add() and ->port_fdb_del().
      
      So this driver needs to serialize the access to the PCE registers using
      its own locking scheme. This patch adds that.
      
      Note that the driver also uses the gswip_pce_load_microcode() function
      to load a static configuration for the packet classification engine into
      a table using the same registers. It is currently not protected, but
      since that configuration is only done from the dsa_switch_ops :: setup
      method, there is no risk of it being concurrent with other operations.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf231b43
    • V
      net: dsa: b53: serialize access to the ARL table · f7eb4a1c
      Vladimir Oltean 提交于
      The b53 driver performs non-atomic transactions to the ARL table when
      adding, deleting and reading FDB and MDB entries.
      
      Traditionally these were all serialized by the rtnl_lock(), but now it
      is possible that DSA calls ->port_fdb_add and ->port_fdb_del without
      holding that lock.
      
      So the driver must have its own serialization logic. Add a mutex and
      hold it from all entry points (->port_fdb_{add,del,dump},
      ->port_mdb_{add,del}).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7eb4a1c
    • V
      net: dsa: sja1105: serialize access to the dynamic config interface · eb016afd
      Vladimir Oltean 提交于
      The sja1105 hardware seems as concurrent as can be, but when we create a
      background script that adds/removes a rain of FDB entries without the
      rtnl_mutex taken, then in parallel we do another operation like run
      'bridge fdb show', we can notice these errors popping up:
      
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2
      
      Luckily what is going on does not require a major rework in the driver.
      The sja1105_dynamic_config_read() function sends multiple SPI buffers to
      the peripheral until the operation completes. We should not do anything
      until the hardware clears the VALID bit.
      
      But since there is no locking (i.e. right now we are implicitly
      serialized by the rtnl_mutex, but if we remove that), it might be
      possible that the process which performs the dynamic config read is
      preempted and another one performs a dynamic config write.
      
      What will happen in that case is that sja1105_dynamic_config_read(),
      when it resumes, expects to see VALIDENT set for the entry it reads
      back. But it won't.
      
      This can be corrected by introducing a mutex for serializing SPI
      accesses to the dynamic config interface which should be atomic with
      respect to each other.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb016afd
    • V
      net: dsa: sja1105: wait for dynamic config command completion on writes too · df405910
      Vladimir Oltean 提交于
      The hardware manual says that software should attempt a new dynamic
      config access (be it a a write or a read-back) only while the VALID bit
      is cleared. The VALID bit is set by software to 1, and it remains set as
      long as the hardware is still processing the request.
      
      Currently the driver only polls for the command completion only for
      reads, because that's when we need the actual data read back. Writes
      have been more or less "asynchronous", although this has never been an
      observable issue.
      
      This change makes sja1105_dynamic_config_write poll the VALID bit as
      well, to absolutely ensure that a follow-up access to the static config
      finds the VALID bit cleared.
      
      So VALID means "work in progress", while VALIDENT means "entry being
      read is valid". On reads we check the VALIDENT bit too, while on writes
      that bit is not always defined. So we need to factor it out of the loop,
      and make the loop provide back the unpacked command structure, so that
      sja1105_dynamic_config_read can check the VALIDENT bit.
      
      The change also attempts to convert the open-coded loop to use the
      read_poll_timeout macro, since I know this will come up during review.
      It's more code, but hey, it uses read_poll_timeout!
      
      Tested on SJA1105T, SJA1105S, SJA1110A.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df405910
    • D
      Revert "Merge branch 'dsa-rtnl'" · 2d7e73f0
      David S. Miller 提交于
      This reverts commit 965e6b26, reversing
      changes made to 4d98bb0d.
      2d7e73f0
  2. 24 10月, 2021 6 次提交
    • S
      net: convert users of bitmap_foo() to linkmode_foo() · 4973056c
      Sean Anderson 提交于
      This converts instances of
      	bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS)
      to
      	linkmode_foo(args...)
      
      I manually fixed up some lines to prevent them from being excessively
      long. Otherwise, this change was generated with the following semantic
      patch:
      
      // Generated with
      // echo linux/linkmode.h > includes
      // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \
      // | sort | uniq | tee new_includes | wc -l && mv new_includes includes
      // and repeating until the number stopped going up
      @i@
      @@
      
      (
       #include <linux/acpi_mdio.h>
      |
       #include <linux/brcmphy.h>
      |
       #include <linux/dsa/loop.h>
      |
       #include <linux/dsa/sja1105.h>
      |
       #include <linux/ethtool.h>
      |
       #include <linux/ethtool_netlink.h>
      |
       #include <linux/fec.h>
      |
       #include <linux/fs_enet_pd.h>
      |
       #include <linux/fsl/enetc_mdio.h>
      |
       #include <linux/fwnode_mdio.h>
      |
       #include <linux/linkmode.h>
      |
       #include <linux/lsm_audit.h>
      |
       #include <linux/mdio-bitbang.h>
      |
       #include <linux/mdio.h>
      |
       #include <linux/mdio-mux.h>
      |
       #include <linux/mii.h>
      |
       #include <linux/mii_timestamper.h>
      |
       #include <linux/mlx5/accel.h>
      |
       #include <linux/mlx5/cq.h>
      |
       #include <linux/mlx5/device.h>
      |
       #include <linux/mlx5/driver.h>
      |
       #include <linux/mlx5/eswitch.h>
      |
       #include <linux/mlx5/fs.h>
      |
       #include <linux/mlx5/port.h>
      |
       #include <linux/mlx5/qp.h>
      |
       #include <linux/mlx5/rsc_dump.h>
      |
       #include <linux/mlx5/transobj.h>
      |
       #include <linux/mlx5/vport.h>
      |
       #include <linux/of_mdio.h>
      |
       #include <linux/of_net.h>
      |
       #include <linux/pcs-lynx.h>
      |
       #include <linux/pcs/pcs-xpcs.h>
      |
       #include <linux/phy.h>
      |
       #include <linux/phy_led_triggers.h>
      |
       #include <linux/phylink.h>
      |
       #include <linux/platform_data/bcmgenet.h>
      |
       #include <linux/platform_data/xilinx-ll-temac.h>
      |
       #include <linux/pxa168_eth.h>
      |
       #include <linux/qed/qed_eth_if.h>
      |
       #include <linux/qed/qed_fcoe_if.h>
      |
       #include <linux/qed/qed_if.h>
      |
       #include <linux/qed/qed_iov_if.h>
      |
       #include <linux/qed/qed_iscsi_if.h>
      |
       #include <linux/qed/qed_ll2_if.h>
      |
       #include <linux/qed/qed_nvmetcp_if.h>
      |
       #include <linux/qed/qed_rdma_if.h>
      |
       #include <linux/sfp.h>
      |
       #include <linux/sh_eth.h>
      |
       #include <linux/smsc911x.h>
      |
       #include <linux/soc/nxp/lpc32xx-misc.h>
      |
       #include <linux/stmmac.h>
      |
       #include <linux/sunrpc/svc_rdma.h>
      |
       #include <linux/sxgbe_platform.h>
      |
       #include <net/cfg80211.h>
      |
       #include <net/dsa.h>
      |
       #include <net/mac80211.h>
      |
       #include <net/selftests.h>
      |
       #include <rdma/ib_addr.h>
      |
       #include <rdma/ib_cache.h>
      |
       #include <rdma/ib_cm.h>
      |
       #include <rdma/ib_hdrs.h>
      |
       #include <rdma/ib_mad.h>
      |
       #include <rdma/ib_marshall.h>
      |
       #include <rdma/ib_pack.h>
      |
       #include <rdma/ib_pma.h>
      |
       #include <rdma/ib_sa.h>
      |
       #include <rdma/ib_smi.h>
      |
       #include <rdma/ib_umem.h>
      |
       #include <rdma/ib_umem_odp.h>
      |
       #include <rdma/ib_verbs.h>
      |
       #include <rdma/iw_cm.h>
      |
       #include <rdma/mr_pool.h>
      |
       #include <rdma/opa_addr.h>
      |
       #include <rdma/opa_port_info.h>
      |
       #include <rdma/opa_smi.h>
      |
       #include <rdma/opa_vnic.h>
      |
       #include <rdma/rdma_cm.h>
      |
       #include <rdma/rdma_cm_ib.h>
      |
       #include <rdma/rdmavt_cq.h>
      |
       #include <rdma/rdma_vt.h>
      |
       #include <rdma/rdmavt_qp.h>
      |
       #include <rdma/rw.h>
      |
       #include <rdma/tid_rdma_defs.h>
      |
       #include <rdma/uverbs_ioctl.h>
      |
       #include <rdma/uverbs_named_ioctl.h>
      |
       #include <rdma/uverbs_std_types.h>
      |
       #include <rdma/uverbs_types.h>
      |
       #include <soc/mscc/ocelot.h>
      |
       #include <soc/mscc/ocelot_ptp.h>
      |
       #include <soc/mscc/ocelot_vcap.h>
      |
       #include <trace/events/ib_mad.h>
      |
       #include <trace/events/rdma_core.h>
      |
       #include <trace/events/rdma.h>
      |
       #include <trace/events/rpcrdma.h>
      |
       #include <uapi/linux/ethtool.h>
      |
       #include <uapi/linux/ethtool_netlink.h>
      |
       #include <uapi/linux/mdio.h>
      |
       #include <uapi/linux/mii.h>
      )
      
      @depends on i@
      expression list args;
      @@
      
      (
      - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_zero(args)
      |
      - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_copy(args)
      |
      - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_and(args)
      |
      - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_or(args)
      |
      - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_empty(args)
      |
      - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_andnot(args)
      |
      - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_equal(args)
      |
      - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_intersects(args)
      |
      - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_subset(args)
      )
      
      Add missing linux/mii.h include to mellanox. -DaveM
      Signed-off-by: NSean Anderson <sean.anderson@seco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4973056c
    • V
      net: dsa: lantiq_gswip: serialize access to the PCE table · 49753a75
      Vladimir Oltean 提交于
      Looking at the code, the GSWIP switch appears to hold bridging service
      structures (VLANs, FDBs, forwarding rules) in PCE table entries.
      Hardware access to the PCE table is non-atomic, and is comprised of
      several register reads and writes.
      
      These accesses are currently serialized by the rtnl_lock, but DSA is
      changing its driver API and that lock will no longer be held when
      calling ->port_fdb_add() and ->port_fdb_del().
      
      So this driver needs to serialize the access to the PCE table using its
      own locking scheme. This patch adds that.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49753a75
    • V
      net: dsa: b53: serialize access to the ARL table · f239934c
      Vladimir Oltean 提交于
      The b53 driver performs non-atomic transactions to the ARL table when
      adding, deleting and reading FDB and MDB entries.
      
      Traditionally these were all serialized by the rtnl_lock(), but now it
      is possible that DSA calls ->port_fdb_add and ->port_fdb_del without
      holding that lock.
      
      So the driver must have its own serialization logic. Add a mutex and
      hold it from all entry points (->port_fdb_{add,del,dump},
      ->port_mdb_{add,del}).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f239934c
    • V
      net: dsa: sja1105: serialize access to the dynamic config interface · 1681ae16
      Vladimir Oltean 提交于
      The sja1105 hardware seems as concurrent as can be, but when we create a
      background script that adds/removes a rain of FDB entries without the
      rtnl_mutex taken, then in parallel we do another operation like run
      'bridge fdb show', we can notice these errors popping up:
      
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2
      
      Luckily what is going on does not require a major rework in the driver.
      The sja1105_dynamic_config_read() function sends multiple SPI buffers to
      the peripheral until the operation completes. We should not do anything
      until the hardware clears the VALID bit.
      
      But since there is no locking (i.e. right now we are implicitly
      serialized by the rtnl_mutex, but if we remove that), it might be
      possible that the process which performs the dynamic config read is
      preempted and another one performs a dynamic config write.
      
      What will happen in that case is that sja1105_dynamic_config_read(),
      when it resumes, expects to see VALIDENT set for the entry it reads
      back. But it won't.
      
      This can be corrected by introducing a mutex for serializing SPI
      accesses to the dynamic config interface which should be atomic with
      respect to each other.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1681ae16
    • V
      net: dsa: sja1105: wait for dynamic config command completion on writes too · 643979cf
      Vladimir Oltean 提交于
      The hardware manual says that software should attempt a new dynamic
      config access (be it a a write or a read-back) only while the VALID bit
      is cleared. The VALID bit is set by software to 1, and it remains set as
      long as the hardware is still processing the request.
      
      Currently the driver only polls for the command completion only for
      reads, because that's when we need the actual data read back. Writes
      have been more or less "asynchronous", although this has never been an
      observable issue.
      
      This change makes sja1105_dynamic_config_write poll the VALID bit as
      well, to absolutely ensure that a follow-up access to the static config
      finds the VALID bit cleared.
      
      So VALID means "work in progress", while VALIDENT means "entry being
      read is valid". On reads we check the VALIDENT bit too, while on writes
      that bit is not always defined. So we need to factor it out of the loop,
      and make the loop provide back the unpacked command structure, so that
      sja1105_dynamic_config_read can check the VALIDENT bit.
      
      The change also attempts to convert the open-coded loop to use the
      read_poll_timeout macro, since I know this will come up during review.
      It's more code, but hey, it uses read_poll_timeout!
      
      Tested on SJA1105T, SJA1105S, SJA1110A.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      643979cf
    • S
      net: Convert more users of mdiobus_* to mdiodev_* · 65aa371e
      Sean Anderson 提交于
      This converts users of mdiobus to mdiodev using the following semantic
      patch:
      
      @@
      identifier mdiodev;
      expression regnum;
      @@
      
      - mdiobus_read(mdiodev->bus, mdiodev->addr, regnum)
      + mdiodev_read(mdiodev, regnum)
      
      @@
      identifier mdiodev;
      expression regnum, val;
      @@
      
      - mdiobus_write(mdiodev->bus, mdiodev->addr, regnum, val)
      + mdiodev_write(mdiodev, regnum, val)
      Signed-off-by: NSean Anderson <sean.anderson@seco.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65aa371e
  3. 23 10月, 2021 1 次提交
  4. 21 10月, 2021 1 次提交
  5. 20 10月, 2021 2 次提交
  6. 18 10月, 2021 4 次提交
  7. 15 10月, 2021 9 次提交
  8. 13 10月, 2021 6 次提交
    • V
      net: dsa: felix: break at first CPU port during init and teardown · 8d5f7954
      Vladimir Oltean 提交于
      The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
      one of them is capable of acting as an NPI port at a time (inject and
      extract packets using DSA tags).
      
      However, using the alternative ocelot-8021q tagging protocol, it should
      be possible to use both CPU ports symmetrically, but for that we need to
      mark both ports in the device tree as DSA masters.
      
      In the process of doing that, it can be seen that traffic to/from the
      network stack gets broken, and this is because the Felix driver iterates
      through all DSA CPU ports and configures them as NPI ports. But since
      there can only be a single NPI port, we effectively end up in a
      situation where DSA thinks the default CPU port is the first one, but
      the hardware port configured to be an NPI is the last one.
      
      I would like to treat this as a bug, because if the updated device trees
      are going to start circulating, it would be really good for existing
      kernels to support them, too.
      
      Fixes: adb3dccf ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      8d5f7954
    • V
      net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent · 1328a883
      Vladimir Oltean 提交于
      At present, when a PTP packet which requires TX timestamping gets
      dropped under congestion by the switch, things go downhill very fast.
      The driver keeps a clone of that skb in a queue of packets awaiting TX
      timestamp interrupts, but interrupts will never be raised for the
      dropped packets.
      
      Moreover, matching timestamped packets to timestamps is done by a 2-bit
      timestamp ID, and this can wrap around and we can match on the wrong skb.
      
      Since with the default NPI-based tagging protocol, we get no notification
      about packet drops, the best we can do is eventually recover from the
      drop of a PTP frame: its skb will be dead memory until another skb which
      was assigned the same timestamp ID happens to find it.
      
      However, with the ocelot-8021q tagger which injects packets using the
      manual register interface, it appears that we can check for more
      information, such as:
      
      - whether the input queue has reached the high watermark or not
      - whether the injection group's FIFO can accept additional data or not
      
      so we know that a PTP frame is likely to get dropped before actually
      sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
      can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).
      
      But when we do that, we can also remove the skb from the timestamping
      queue, because there surely won't be any timestamp that matches it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1328a883
    • V
      net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib · 49f885b2
      Vladimir Oltean 提交于
      Michael reported that when using the "ocelot-8021q" tagging protocol,
      the switch driver module must be manually loaded before the tagging
      protocol can be loaded/is available.
      
      This appears to be the same problem described here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      where due to the fact that DSA tagging protocols make use of symbols
      exported by the switch drivers, circular dependencies appear and this
      breaks module autoloading.
      
      The ocelot_8021q driver needs the ocelot_can_inject() and
      ocelot_port_inject_frame() functions from the switch library. Previously
      the wrong approach was taken to solve that dependency: shims were
      provided for the case where the ocelot switch library was compiled out,
      but that turns out to be insufficient, because the dependency when the
      switch lib _is_ compiled is problematic too.
      
      We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
      static inline functions, because these access I/O functions like
      __ocelot_write_ix() which is called by ocelot_write_rix(). Making those
      static inline basically means exposing the whole guts of the ocelot
      switch library, not ideal...
      
      We already have one tagging protocol driver which calls into the switch
      driver during xmit but not using any exported symbol: sja1105_defer_xmit.
      We can do the same thing here: create a kthread worker and one work item
      per skb, and let the switch driver itself do the register accesses to
      send the skb, and then consume it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Reported-by: NMichael Walle <michael@walle.cc>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      49f885b2
    • V
      net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO · 52849bcf
      Vladimir Oltean 提交于
      PTP packets with 2-step TX timestamp requests are matched to packets
      based on the egress port number and a 6-bit timestamp identifier.
      All PTP timestamps are held in a common FIFO that is 128 entry deep.
      
      This patch ensures that back-to-back timestamping requests cannot exceed
      the hardware FIFO capacity. If that happens, simply send the packets
      without requesting a TX timestamp to be taken (in the case of felix,
      since the DSA API has a void return code in ds->ops->port_txtstamp) or
      drop them (in the case of ocelot).
      
      I've moved the ts_id_lock from a per-port basis to a per-switch basis,
      because we need separate accounting for both numbers of PTP frames in
      flight. And since we need locking to inc/dec the per-switch counter,
      that also offers protection for the per-port counter and hence there is
      no reason to have a per-port counter anymore.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      52849bcf
    • V
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Vladimir Oltean 提交于
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4ac0567e
    • V
      net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 28da0555
      Vladimir Oltean 提交于
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      28da0555
  9. 12 10月, 2021 2 次提交
  10. 09 10月, 2021 2 次提交
    • V
      net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports · 5bded825
      Vladimir Oltean 提交于
      Similar to commit 6087175b ("net: dsa: mt7530: use independent VLAN
      learning on VLAN-unaware bridges"), software forwarding between an
      unoffloaded LAG port (a bonding interface with an unsupported policy)
      and a mv88e6xxx user port directly under a bridge is broken.
      
      We adopt the same strategy, which is to make the standalone ports not
      find any ATU entry learned on a bridge port.
      
      Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
      as many FIDs as VIDs (4096). The FID is derived from the VID when
      possible (the VTU maps a VID to a FID), with a fallback to the port
      based default FID value when not (802.1Q Mode is disabled on the port,
      or the classified VID isn't present in the VTU).
      
      The mv88e6xxx driver makes the following use of FIDs and VIDs:
      
      - the port's DefaultVID (to which untagged & pvid-tagged packets get
        classified) is 0 and is absent from the VTU, so this kind of packets is
        processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.
      
      - every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
        mv88e6xxx_atu_new() associates a FID with that VID which increases
        linearly starting from 1. Like this:
      
        bridge vlan add dev lan0 vid 100 # FID 1
        bridge vlan add dev lan1 vid 100 # still FID 1
        bridge vlan add dev lan2 vid 1024 # FID 2
      
      The FID allocation made by the driver is sub-optimal for the following
      reasons:
      
      (a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too.
          A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID
          of 0 too. The difference is that the bridged ports may learn ATU
          entries, while the standalone port has the requirement that it must
          not, and must not find them either. Standalone ports must not use
          the same FID as ports belonging to a bridge. All standalone ports
          can use the same FID, since the ATU will never have an entry in
          that FID.
      
      (b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a
          default FID of 0 on all their ports. The FDBs will not be isolated
          between these bridges. Every VLAN-unaware bridge must use the same
          FID on all its ports, different from the FID of other bridge ports.
      
      (c) Each bridge VLAN uses a unique FID which is useful for Independent
          VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
          will result in the same FID being used by mv88e6xxx_atu_new().
          The correct behavior is for VLAN 1 in br0 to have a different FID
          compared to VLAN 1 in br1.
      
      This patch cannot fix all the above. Traditionally the DSA framework did
      not care about this, and the reality is that DSA core involvement is
      needed for the aforementioned issues to be solved. The only thing we can
      solve here is an issue which does not require API changes, and that is
      issue (a), aka use a different FID for standalone ports vs ports under
      VLAN-unaware bridges.
      
      The first step is deciding what VID and FID to use for standalone ports,
      and what VID and FID for bridged ports. The 0/0 pair for standalone
      ports is what they used up till now, let's keep using that. For bridged
      ports, there are 2 cases:
      
      - VLAN-aware ports will never end up using the port default FID, because
        packets will always be classified to a VID in the VTU or dropped
        otherwise. The FID is the one associated with the VID in the VTU.
      
      - On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
        zero (just as in the case of standalone ports), and just change the
        port's default FID from 0 to a different number (say 1).
      
      However, Tobias points out that there is one more requirement to cater to:
      cross-chip bridging. The Marvell DSA header does not carry the FID in
      it, only the VID. So once a packet crosses a DSA link, if it has a VID
      of zero it will get classified to the default FID of that cascade port.
      Relying on a port default FID for upstream cascade ports results in
      contradictions: a default FID of 0 breaks ATU isolation of bridged ports
      on the downstream switch, a default FID of 1 breaks standalone ports on
      the downstream switch.
      
      So not only must standalone ports have different FIDs compared to
      bridged ports, they must also have different DefaultVID values.
      IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
      choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
      bridges, and VID 4095 maps to FID 1.
      
      For the xmit operation to look up the same ATU database, we need to put
      VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
      too. All shared ports are configured to map this VID to the bridging
      FID, because they are members of that VLAN in the VTU. Shared ports
      don't need to have 802.1QMode enabled in any way, they always parse the
      VID from the DSA header, they don't need to look at the 802.1Q header.
      
      We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the
      mention that mv88e6xxx_vtu_setup() which was located right below that
      call was flushing the VTU so those entries wouldn't be preserved.
      So we need to relocate the VTU flushing prior to the port initialization
      during ->setup(). Also note that this is why it is safe to assume that
      VID 4095 will get associated with FID 1: the user ports haven't been
      created, so there is no avenue for the user to create a bridge VLAN
      which could otherwise race with the creation of another FID which would
      otherwise use up the non-reserved FID value of 1.
      
      [ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
        specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]
      
      mv88e6xxx_port_db_load_purge() is the function to access the ATU for
      FDB/MDB entries, and it used to determine the FID to use for
      VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
      But the driver only called mv88e6xxx_port_set_fid() once, during probe,
      so no surprises, the port FID was always 0, the call to get_fid() was
      redundant. As much as I would have wanted to not touch that code, the
      logic is broken when we add a new FID which is not the port-based
      default. Now the port-based default FID only corresponds to standalone
      ports, and FDB/MDB entries belong to the bridging service. So while in
      the future, when the DSA API will support FDB isolation, we will have to
      figure out the FID based on the bridge number, for now there's a single
      bridging FID, so hardcode that.
      
      Lastly, the tagger needs to check, when it is transmitting a VLAN
      untagged skb, whether it is sending it towards a bridged or a standalone
      port. When we see it is bridged we assume the bridge is VLAN-unaware.
      Not because it cannot be VLAN-aware but:
      
      - if we are transmitting from a VLAN-aware bridge we are likely doing so
        using TX forwarding offload. That code path guarantees that skbs have
        a vlan hwaccel tag in them, so we would not enter the "else" branch
        of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.
      
      - if we are transmitting on behalf of a VLAN-aware bridge but with no TX
        forwarding offload (no PVT support, out of space in the PVT, whatever),
        we would indeed be transmitting with VLAN 4095 instead of the bridge
        device's pvid. However we would be injecting a "From CPU" frame, and
        the switch won't learn from that - it only learns from "Forward" frames.
        So it is inconsequential for address learning. And VLAN 4095 is
        absolutely enough for the frame to exit the switch, since we never
        remove that VLAN from any port.
      
      Fixes: 57e661aa ("net: dsa: mv88e6xxx: Link aggregation support")
      Reported-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      5bded825
    • V
      net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware · 8b6836d8
      Vladimir Oltean 提交于
      The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a679
      ("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
      some issues with VLAN and decided the best way to deal with them was to
      make the DSA core ignore VLANs added by the bridge while VLAN awareness
      is turned off. Those issues were never explained, just presented as
      "at least one corner case".
      
      That approach had problems of its own, presented by
      commit 54a0ed0d ("net: dsa: provide an option for drivers to always
      receive bridge VLANs") for the DSA core, followed by
      commit 1fb74191 ("net: dsa: mv88e6xxx: fix vlan setup") which
      applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
      particular.
      
      We still don't know what corner case Andrew saw when he wrote
      commit 2ea7a679 ("net: dsa: Don't add vlans when vlan filtering is
      disabled"), but Tobias now reports that when we use TX forwarding
      offload, pinging an external station from the bridge device is broken if
      the front-facing DSA user port has flooding turned off. The full
      description is in the link below, but for short, when a mv88e6xxx port
      is under a VLAN-unaware bridge, it inherits that bridge's pvid.
      So packets ingressing a user port will be classified to e.g. VID 1
      (assuming that value for the bridge_default_pvid), whereas when
      tag_dsa.c xmits towards a user port, it always sends packets using a VID
      of 0 if that port is standalone or under a VLAN-unaware bridge - or at
      least it did so prior to commit d82f8ab0 ("net: dsa: tag_dsa:
      offload the bridge forwarding process").
      
      In any case, when there is a conversation between the CPU and a station
      connected to a user port, the station's MAC address is learned in VID 1
      but the CPU tries to transmit through VID 0. The packets reach the
      intended station, but via flooding and not by virtue of matching the
      existing ATU entry.
      
      DSA has established (and enforced in other drivers: sja1105, felix,
      mt7530) that a VLAN-unaware port should use a private pvid, and not
      inherit the one from the bridge. The bridge's pvid should only be
      inherited when that bridge is VLAN-aware, so all state transitions need
      to be handled. On the other hand, all bridge VLANs should sit in the VTU
      starting with the moment when the bridge offloads them via switchdev,
      they are just not used.
      
      This solves the problem that Tobias sees because packets ingressing on
      VLAN-unaware user ports now get classified to VID 0, which is also the
      VID used by tag_dsa.c on xmit.
      
      Fixes: d82f8ab0 ("net: dsa: tag_dsa: offload the bridge forwarding process")
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003222312.284175-2-vladimir.oltean@nxp.com/#24491503Reported-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      8b6836d8
  11. 08 10月, 2021 1 次提交
  12. 06 10月, 2021 1 次提交