1. 27 2月, 2022 2 次提交
    • V
      net: dsa: tag_8021q: merge RX and TX VLANs · 04b67e18
      Vladimir Oltean 提交于
      In the old Shared VLAN Learning mode of operation that tag_8021q
      previously used for forwarding, we needed to have distinct concepts for
      an RX and a TX VLAN.
      
      An RX VLAN could be installed on all ports that were members of a given
      bridge, so that autonomous forwarding could still work, while a TX VLAN
      was dedicated for precise packet steering, so it just contained the CPU
      port and one egress port.
      
      Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX
      all over, those lines have been blurred and we no longer have the need
      to do precise TX towards a port that is in a bridge. As for standalone
      ports, it is fine to use the same VLAN ID for both RX and TX.
      
      This patch changes the tag_8021q format by shifting the VLAN range it
      reserves, and halving it. Previously, our DIR bits were encoding the
      VLAN direction (RX/TX) and were set to either 1 or 2. This meant that
      tag_8021q reserved 2K VLANs, or 50% of the available range.
      
      Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q
      reserve only 1K VLANs, and a different range now (the last 1K). This is
      done so that we leave the old format in place in case we need to return
      to it.
      
      In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan
      functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and
      they are no longer distinct. For example, felix which did different
      things for different VLAN types, now needs to handle the RX and the TX
      logic for the same VLAN.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04b67e18
    • V
      net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging · 91495f21
      Vladimir Oltean 提交于
      For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too
      tied with the sja1105 switch: each port uses the same pvid which is also
      used for standalone operation (a unique one from which the source port
      and device ID can be retrieved when packets from that port are forwarded
      to the CPU). Since each port has a unique pvid when performing
      autonomous forwarding, the switch must be configured for Shared VLAN
      Learning (SVL) such that the VLAN ID itself is ignored when performing
      FDB lookups. Without SVL, packets would always be flooded, since FDB
      lookup in the source port's VLAN would never find any entry.
      
      First of all, to make tag_8021q more palatable to switches which might
      not support Shared VLAN Learning, let's just use a common VLAN for all
      ports that are under the same bridge.
      
      Secondly, using Shared VLAN Learning means that FDB isolation can never
      be enforced. But if all ports under the same VLAN-unaware bridge share
      the same VLAN ID, it can.
      
      The disadvantage is that the CPU port can no longer perform precise
      source port identification for these packets. But at least we have a
      mechanism which has proven to be adequate for that situation: imprecise
      RX (dsa_find_designated_bridge_port_by_vid), which is what we use for
      termination on VLAN-aware bridges.
      
      The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the
      same one as we were previously using for imprecise TX (bridge TX
      forwarding offload). It is already allocated, it is just a matter of
      using it.
      
      Note that because now all ports under the same bridge share the same
      VLAN, the complexity of performing a tag_8021q bridge join decreases
      dramatically. We no longer have to install the RX VLAN of a newly
      joining port into the port membership of the existing bridge ports.
      The newly joining port just becomes a member of the VLAN corresponding
      to that bridge, and the other ports are already members of it from when
      they joined the bridge themselves. So forwarding works properly.
      
      This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the
      cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put
      these calls directly into the sja1105 driver.
      
      With this new mode of operation, a port controlled by tag_8021q can have
      two pvids whereas before it could only have one. The pvid for standalone
      operation is different from the pvid used for VLAN-unaware bridging.
      This is done, again, so that FDB isolation can be enforced.
      Let tag_8021q manage this by deleting the standalone pvid when a port
      joins a bridge, and restoring it when it leaves it.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91495f21
  2. 25 2月, 2022 6 次提交
  3. 14 12月, 2021 1 次提交
  4. 12 12月, 2021 6 次提交
    • V
      Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver" · fcbf979a
      Vladimir Oltean 提交于
      This reverts commit 6d709cad.
      
      The above change was done to avoid calling symbols exported by the
      switch driver from the tagging protocol driver.
      
      With the tagger-owned storage model, we have a new option on our hands,
      and that is for the switch driver to provide a data consumer handler in
      the form of a function pointer inside the ->connect_tag_protocol()
      method. Having a function pointer avoids the problems of the exported
      symbols approach.
      
      By creating a handler for metadata frames holding TX timestamps on
      SJA1110, we are able to eliminate an skb queue from the tagger data, and
      replace it with a simple, and stateless, function pointer. This skb
      queue is now handled exclusively by sja1105_ptp.c, which makes the code
      easier to follow, as it used to be before the reverted patch.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcbf979a
    • V
      net: dsa: tag_sja1105: convert to tagger-owned data · c79e8486
      Vladimir Oltean 提交于
      Currently, struct sja1105_tagger_data is a part of struct
      sja1105_private, and is used by the sja1105 driver to populate dp->priv.
      
      With the movement towards tagger-owned storage, the sja1105 driver
      should not be the owner of this memory.
      
      This change implements the connection between the sja1105 switch driver
      and its tagging protocol, which means that sja1105_tagger_data no longer
      stays in dp->priv but in ds->tagger_data, and that the sja1105 driver
      now only populates the sja1105_port_deferred_xmit callback pointer.
      The kthread worker is now the responsibility of the tagger.
      
      The sja1105 driver also alters the tagger's state some more, especially
      with regard to the PTP RX timestamping state. This will be fixed up a
      bit in further changes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c79e8486
    • V
      net: dsa: sja1105: move ts_id from sja1105_tagger_data · 22ee9f8e
      Vladimir Oltean 提交于
      The TX timestamp ID is incremented by the SJA1110 PTP timestamping
      callback (->port_tx_timestamp) for every packet, when cloning it.
      It isn't used by the tagger at all, even though it sits inside the
      struct sja1105_tagger_data.
      
      Also, serialization to this structure is currently done through
      tagger_data->meta_lock, which is a cheap hack because the meta_lock
      isn't used for anything else on SJA1110 (sja1105_rcv_meta_state_machine
      isn't called).
      
      This change moves ts_id from sja1105_tagger_data to sja1105_private and
      introduces a dedicated spinlock for it, also in sja1105_private.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22ee9f8e
    • V
      net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_data · bfcf1425
      Vladimir Oltean 提交于
      The design of the sja1105 tagger dp->priv is that each port has a
      separate struct sja1105_port, and the sp->data pointer points to a
      common struct sja1105_tagger_data.
      
      We have removed all per-port members accessible by the tagger, and now
      only struct sja1105_tagger_data remains. Make dp->priv point directly to
      this.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfcf1425
    • V
      net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q · d38049bb
      Vladimir Oltean 提交于
      When the ocelot-8021q driver was converted to deferred xmit as part of
      commit 8d5f7954 ("net: dsa: felix: break at first CPU port during
      init and teardown"), the deferred implementation was deliberately made
      subtly different from what sja1105 has.
      
      The implementation differences lied on the following observations:
      
      - There might be a race between these two lines in tag_sja1105.c:
      
             skb_queue_tail(&sp->xmit_queue, skb_get(skb));
             kthread_queue_work(sp->xmit_worker, &sp->xmit_work);
      
        and the skb dequeue logic in sja1105_port_deferred_xmit(). For
        example, the xmit_work might be already queued, however the work item
        has just finished walking through the skb queue. Because we don't
        check the return code from kthread_queue_work, we don't do anything if
        the work item is already queued.
      
        However, nobody will take that skb and send it, at least until the
        next timestampable skb is sent. This creates additional (and
        avoidable) TX timestamping latency.
      
        To close that race, what the ocelot-8021q driver does is it doesn't
        keep a single work item per port, and a skb timestamping queue, but
        rather dynamically allocates a work item per packet.
      
      - It is also unnecessary to have more than one kthread that does the
        work. So delete the per-port kthread allocations and replace them with
        a single kthread which is global to the switch.
      
      This change brings the two implementations in line by applying those
      observations to the sja1105 driver as well.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d38049bb
    • V
      net: dsa: sja1105: let deferred packets time out when sent to ports going down · a3d74295
      Vladimir Oltean 提交于
      This code is not necessary and complicates the conversion of this driver
      to tagger-owned memory. If there is a PTP packet that is sent
      concurrently with the port getting disabled, the deferred xmit mechanism
      is robust enough to time out when it sees that it hasn't been delivered,
      and recovers.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3d74295
  5. 09 12月, 2021 5 次提交
    • V
      net: dsa: eliminate dsa_switch_ops :: port_bridge_tx_fwd_{,un}offload · 857fdd74
      Vladimir Oltean 提交于
      We don't really need new switch API for these, and with new switches
      which intend to add support for this feature, it will become cumbersome
      to maintain.
      
      The change consists in restructuring the two drivers that implement this
      offload (sja1105 and mv88e6xxx) such that the offload is enabled and
      disabled from the ->port_bridge_{join,leave} methods instead of the old
      ->port_bridge_tx_fwd_{,un}offload.
      
      The only non-trivial change is that mv88e6xxx_map_virtual_bridge_to_pvt()
      has been moved to avoid a forward declaration, and the
      mv88e6xxx_reg_lock() calls from inside it have been removed, since
      locking is now done from mv88e6xxx_port_bridge_{join,leave}.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      857fdd74
    • V
      net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_join · b079922b
      Vladimir Oltean 提交于
      This is a preparation patch for the removal of the DSA switch methods
      ->port_bridge_tx_fwd_offload() and ->port_bridge_tx_fwd_unoffload().
      The plan is for the switch to report whether it offloads TX forwarding
      directly as a response to the ->port_bridge_join() method.
      
      This change deals with the noisy portion of converting all existing
      function prototypes to take this new boolean pointer argument.
      The bool is placed in the cross-chip notifier structure for bridge join,
      and a reference to it is provided to drivers. In the next change, DSA
      will then actually look at this value instead of calling
      ->port_bridge_tx_fwd_offload().
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b079922b
    • V
      net: dsa: keep the bridge_dev and bridge_num as part of the same structure · d3eed0e5
      Vladimir Oltean 提交于
      The main desire behind this is to provide coherent bridge information to
      the fast path without locking.
      
      For example, right now we set dp->bridge_dev and dp->bridge_num from
      separate code paths, it is theoretically possible for a packet
      transmission to read these two port properties consecutively and find a
      bridge number which does not correspond with the bridge device.
      
      Another desire is to start passing more complex bridge information to
      dsa_switch_ops functions. For example, with FDB isolation, it is
      expected that drivers will need to be passed the bridge which requested
      an FDB/MDB entry to be offloaded, and along with that bridge_dev, the
      associated bridge_num should be passed too, in case the driver might
      want to implement an isolation scheme based on that number.
      
      We already pass the {bridge_dev, bridge_num} pair to the TX forwarding
      offload switch API, however we'd like to remove that and squash it into
      the basic bridge join/leave API. So that means we need to pass this
      pair to the bridge join/leave API.
      
      During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we
      call the driver's .port_bridge_leave with what used to be our
      dp->bridge_dev, but provided as an argument.
      
      When bridge_dev and bridge_num get folded into a single structure, we
      need to preserve this behavior in dsa_port_bridge_leave: we need a copy
      of what used to be in dp->bridge.
      
      Switch drivers check bridge membership by comparing dp->bridge_dev with
      the provided bridge_dev, but now, if we provide the struct dsa_bridge as
      a pointer, they cannot keep comparing dp->bridge to the provided
      pointer, since this only points to an on-stack copy. To make this
      obvious and prevent driver writers from forgetting and doing stupid
      things, in this new API, the struct dsa_bridge is provided as a full
      structure (not very large, contains an int and a pointer) instead of a
      pointer. An explicit comparison function needs to be used to determine
      bridge membership: dsa_port_offloads_bridge().
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d3eed0e5
    • V
      net: dsa: hide dp->bridge_dev and dp->bridge_num in drivers behind helpers · 41fb0cf1
      Vladimir Oltean 提交于
      The location of the bridge device pointer and number is going to change.
      It is not going to be kept individually per port, but in a common
      structure allocated dynamically and which will have lockdep validation.
      
      Use the helpers to access these elements so that we have a migration
      path to the new organization.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      41fb0cf1
    • V
      net: dsa: assign a bridge number even without TX forwarding offload · 947c8746
      Vladimir Oltean 提交于
      The service where DSA assigns a unique bridge number for each forwarding
      domain is useful even for drivers which do not implement the TX
      forwarding offload feature.
      
      For example, drivers might use the dp->bridge_num for FDB isolation.
      
      So rename ds->num_fwd_offloading_bridges to ds->max_num_bridges, and
      calculate a unique bridge_num for all drivers that set this value.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      947c8746
  6. 25 10月, 2021 2 次提交
    • V
      net: dsa: sja1105: serialize access to the dynamic config interface · eb016afd
      Vladimir Oltean 提交于
      The sja1105 hardware seems as concurrent as can be, but when we create a
      background script that adds/removes a rain of FDB entries without the
      rtnl_mutex taken, then in parallel we do another operation like run
      'bridge fdb show', we can notice these errors popping up:
      
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2
      
      Luckily what is going on does not require a major rework in the driver.
      The sja1105_dynamic_config_read() function sends multiple SPI buffers to
      the peripheral until the operation completes. We should not do anything
      until the hardware clears the VALID bit.
      
      But since there is no locking (i.e. right now we are implicitly
      serialized by the rtnl_mutex, but if we remove that), it might be
      possible that the process which performs the dynamic config read is
      preempted and another one performs a dynamic config write.
      
      What will happen in that case is that sja1105_dynamic_config_read(),
      when it resumes, expects to see VALIDENT set for the entry it reads
      back. But it won't.
      
      This can be corrected by introducing a mutex for serializing SPI
      accesses to the dynamic config interface which should be atomic with
      respect to each other.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb016afd
    • D
      Revert "Merge branch 'dsa-rtnl'" · 2d7e73f0
      David S. Miller 提交于
      This reverts commit 965e6b26, reversing
      changes made to 4d98bb0d.
      2d7e73f0
  7. 24 10月, 2021 2 次提交
    • S
      net: convert users of bitmap_foo() to linkmode_foo() · 4973056c
      Sean Anderson 提交于
      This converts instances of
      	bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS)
      to
      	linkmode_foo(args...)
      
      I manually fixed up some lines to prevent them from being excessively
      long. Otherwise, this change was generated with the following semantic
      patch:
      
      // Generated with
      // echo linux/linkmode.h > includes
      // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \
      // | sort | uniq | tee new_includes | wc -l && mv new_includes includes
      // and repeating until the number stopped going up
      @i@
      @@
      
      (
       #include <linux/acpi_mdio.h>
      |
       #include <linux/brcmphy.h>
      |
       #include <linux/dsa/loop.h>
      |
       #include <linux/dsa/sja1105.h>
      |
       #include <linux/ethtool.h>
      |
       #include <linux/ethtool_netlink.h>
      |
       #include <linux/fec.h>
      |
       #include <linux/fs_enet_pd.h>
      |
       #include <linux/fsl/enetc_mdio.h>
      |
       #include <linux/fwnode_mdio.h>
      |
       #include <linux/linkmode.h>
      |
       #include <linux/lsm_audit.h>
      |
       #include <linux/mdio-bitbang.h>
      |
       #include <linux/mdio.h>
      |
       #include <linux/mdio-mux.h>
      |
       #include <linux/mii.h>
      |
       #include <linux/mii_timestamper.h>
      |
       #include <linux/mlx5/accel.h>
      |
       #include <linux/mlx5/cq.h>
      |
       #include <linux/mlx5/device.h>
      |
       #include <linux/mlx5/driver.h>
      |
       #include <linux/mlx5/eswitch.h>
      |
       #include <linux/mlx5/fs.h>
      |
       #include <linux/mlx5/port.h>
      |
       #include <linux/mlx5/qp.h>
      |
       #include <linux/mlx5/rsc_dump.h>
      |
       #include <linux/mlx5/transobj.h>
      |
       #include <linux/mlx5/vport.h>
      |
       #include <linux/of_mdio.h>
      |
       #include <linux/of_net.h>
      |
       #include <linux/pcs-lynx.h>
      |
       #include <linux/pcs/pcs-xpcs.h>
      |
       #include <linux/phy.h>
      |
       #include <linux/phy_led_triggers.h>
      |
       #include <linux/phylink.h>
      |
       #include <linux/platform_data/bcmgenet.h>
      |
       #include <linux/platform_data/xilinx-ll-temac.h>
      |
       #include <linux/pxa168_eth.h>
      |
       #include <linux/qed/qed_eth_if.h>
      |
       #include <linux/qed/qed_fcoe_if.h>
      |
       #include <linux/qed/qed_if.h>
      |
       #include <linux/qed/qed_iov_if.h>
      |
       #include <linux/qed/qed_iscsi_if.h>
      |
       #include <linux/qed/qed_ll2_if.h>
      |
       #include <linux/qed/qed_nvmetcp_if.h>
      |
       #include <linux/qed/qed_rdma_if.h>
      |
       #include <linux/sfp.h>
      |
       #include <linux/sh_eth.h>
      |
       #include <linux/smsc911x.h>
      |
       #include <linux/soc/nxp/lpc32xx-misc.h>
      |
       #include <linux/stmmac.h>
      |
       #include <linux/sunrpc/svc_rdma.h>
      |
       #include <linux/sxgbe_platform.h>
      |
       #include <net/cfg80211.h>
      |
       #include <net/dsa.h>
      |
       #include <net/mac80211.h>
      |
       #include <net/selftests.h>
      |
       #include <rdma/ib_addr.h>
      |
       #include <rdma/ib_cache.h>
      |
       #include <rdma/ib_cm.h>
      |
       #include <rdma/ib_hdrs.h>
      |
       #include <rdma/ib_mad.h>
      |
       #include <rdma/ib_marshall.h>
      |
       #include <rdma/ib_pack.h>
      |
       #include <rdma/ib_pma.h>
      |
       #include <rdma/ib_sa.h>
      |
       #include <rdma/ib_smi.h>
      |
       #include <rdma/ib_umem.h>
      |
       #include <rdma/ib_umem_odp.h>
      |
       #include <rdma/ib_verbs.h>
      |
       #include <rdma/iw_cm.h>
      |
       #include <rdma/mr_pool.h>
      |
       #include <rdma/opa_addr.h>
      |
       #include <rdma/opa_port_info.h>
      |
       #include <rdma/opa_smi.h>
      |
       #include <rdma/opa_vnic.h>
      |
       #include <rdma/rdma_cm.h>
      |
       #include <rdma/rdma_cm_ib.h>
      |
       #include <rdma/rdmavt_cq.h>
      |
       #include <rdma/rdma_vt.h>
      |
       #include <rdma/rdmavt_qp.h>
      |
       #include <rdma/rw.h>
      |
       #include <rdma/tid_rdma_defs.h>
      |
       #include <rdma/uverbs_ioctl.h>
      |
       #include <rdma/uverbs_named_ioctl.h>
      |
       #include <rdma/uverbs_std_types.h>
      |
       #include <rdma/uverbs_types.h>
      |
       #include <soc/mscc/ocelot.h>
      |
       #include <soc/mscc/ocelot_ptp.h>
      |
       #include <soc/mscc/ocelot_vcap.h>
      |
       #include <trace/events/ib_mad.h>
      |
       #include <trace/events/rdma_core.h>
      |
       #include <trace/events/rdma.h>
      |
       #include <trace/events/rpcrdma.h>
      |
       #include <uapi/linux/ethtool.h>
      |
       #include <uapi/linux/ethtool_netlink.h>
      |
       #include <uapi/linux/mdio.h>
      |
       #include <uapi/linux/mii.h>
      )
      
      @depends on i@
      expression list args;
      @@
      
      (
      - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_zero(args)
      |
      - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_copy(args)
      |
      - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_and(args)
      |
      - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_or(args)
      |
      - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_empty(args)
      |
      - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_andnot(args)
      |
      - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_equal(args)
      |
      - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_intersects(args)
      |
      - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_subset(args)
      )
      
      Add missing linux/mii.h include to mellanox. -DaveM
      Signed-off-by: NSean Anderson <sean.anderson@seco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4973056c
    • V
      net: dsa: sja1105: serialize access to the dynamic config interface · 1681ae16
      Vladimir Oltean 提交于
      The sja1105 hardware seems as concurrent as can be, but when we create a
      background script that adds/removes a rain of FDB entries without the
      rtnl_mutex taken, then in parallel we do another operation like run
      'bridge fdb show', we can notice these errors popping up:
      
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
      sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
      sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2
      
      Luckily what is going on does not require a major rework in the driver.
      The sja1105_dynamic_config_read() function sends multiple SPI buffers to
      the peripheral until the operation completes. We should not do anything
      until the hardware clears the VALID bit.
      
      But since there is no locking (i.e. right now we are implicitly
      serialized by the rtnl_mutex, but if we remove that), it might be
      possible that the process which performs the dynamic config read is
      preempted and another one performs a dynamic config write.
      
      What will happen in that case is that sja1105_dynamic_config_read(),
      when it resumes, expects to see VALIDENT set for the entry it reads
      back. But it won't.
      
      This can be corrected by introducing a mutex for serializing SPI
      accesses to the dynamic config interface which should be atomic with
      respect to each other.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1681ae16
  8. 23 10月, 2021 1 次提交
  9. 20 10月, 2021 1 次提交
  10. 13 10月, 2021 1 次提交
    • V
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Vladimir Oltean 提交于
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4ac0567e
  11. 23 9月, 2021 4 次提交
  12. 19 9月, 2021 1 次提交
    • V
      net: dsa: be compatible with masters which unregister on shutdown · 0650bf52
      Vladimir Oltean 提交于
      Lino reports that on his system with bcmgenet as DSA master and KSZ9897
      as a switch, rebooting or shutting down never works properly.
      
      What does the bcmgenet driver have special to trigger this, that other
      DSA masters do not? It has an implementation of ->shutdown which simply
      calls its ->remove implementation. Otherwise said, it unregisters its
      network interface on shutdown.
      
      This message can be seen in a loop, and it hangs the reboot process there:
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 3
      
      So why 3?
      
      A usage count of 1 is normal for a registered network interface, and any
      virtual interface which links itself as an upper of that will increment
      it via dev_hold. In the case of DSA, this is the call path:
      
      dsa_slave_create
      -> netdev_upper_dev_link
         -> __netdev_upper_dev_link
            -> __netdev_adjacent_dev_insert
               -> dev_hold
      
      So a DSA switch with 3 interfaces will result in a usage count elevated
      by two, and netdev_wait_allrefs will wait until they have gone away.
      
      Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
      delete themselves, but DSA cannot just vanish and go poof, at most it
      can unbind itself from the switch devices, but that must happen strictly
      earlier compared to when the DSA master unregisters its net_device, so
      reacting on the NETDEV_UNREGISTER event is way too late.
      
      It seems that it is a pretty established pattern to have a driver's
      ->shutdown hook redirect to its ->remove hook, so the same code is
      executed regardless of whether the driver is unbound from the device, or
      the system is just shutting down. As Florian puts it, it is quite a big
      hammer for bcmgenet to unregister its net_device during shutdown, but
      having a common code path with the driver unbind helps ensure it is well
      tested.
      
      So DSA, for better or for worse, has to live with that and engage in an
      arms race of implementing the ->shutdown hook too, from all individual
      drivers, and do something sane when paired with masters that unregister
      their net_device there. The only sane thing to do, of course, is to
      unlink from the master.
      
      However, complications arise really quickly.
      
      The pattern of redirecting ->shutdown to ->remove is not unique to
      bcmgenet or even to net_device drivers. In fact, SPI controllers do it
      too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
      and MDIO controllers do it too (this is something I have not researched
      too deeply, but even if this is not the case today, it is certainly
      plausible to happen in the future, and must be taken into consideration).
      
      Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
      insane implication is that for the exact same DSA switch device, we
      might have both ->shutdown and ->remove getting called.
      
      So we need to do something with that insane environment. The pattern
      I've come up with is "if this, then not that", so if either ->shutdown
      or ->remove gets called, we set the device's drvdata to NULL, and in the
      other hook, we check whether the drvdata is NULL and just do nothing.
      This is probably not necessary for platform devices, just for devices on
      buses, but I would really insist for consistency among drivers, because
      when code is copy-pasted, it is not always copy-pasted from the best
      sources.
      
      So depending on whether the DSA switch's ->remove or ->shutdown will get
      called first, we cannot really guarantee even for the same driver if
      rebooting will result in the same code path on all platforms. But
      nonetheless, we need to do something minimally reasonable on ->shutdown
      too to fix the bug. Of course, the ->remove will do more (a full
      teardown of the tree, with all data structures freed, and this is why
      the bug was not caught for so long). The new ->shutdown method is kept
      separate from dsa_unregister_switch not because we couldn't have
      unregistered the switch, but simply in the interest of doing something
      quick and to the point.
      
      The big question is: does the DSA switch's ->shutdown get called earlier
      than the DSA master's ->shutdown? If not, there is still a risk that we
      might still trigger the WARN_ON in unregister_netdevice that says we are
      attempting to unregister a net_device which has uppers. That's no good.
      Although the reference to the master net_device won't physically go away
      even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
      on it.
      
      The answer to that question lies in this comment above device_link_add:
      
       * A side effect of the link creation is re-ordering of dpm_list and the
       * devices_kset list by moving the consumer device and all devices depending
       * on it to the ends of these lists (that does not happen to devices that have
       * not been registered when this function is called).
      
      so the fact that DSA uses device_link_add towards its master is not
      exactly for nothing. device_shutdown() walks devices_kset from the back,
      so this is our guarantee that DSA's shutdown happens before the master's
      shutdown.
      
      Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0650bf52
  13. 25 8月, 2021 3 次提交
    • V
      net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid · 8ded9160
      Vladimir Oltean 提交于
      Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger
      for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
      function solved quite a different problem than our needs are now.
      
      Then, we used best-effort VLAN filtering and we were using the xmit_tpid
      to tunnel packets coming from an 8021q upper through the TX VLAN allocated
      by tag_8021q to that egress port. The need for a different VLAN protocol
      depending on switch revision came from the fact that this in itself was
      more of a hack to trick the hardware into accepting tunneled VLANs in
      the first place.
      
      Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
      supported them again, we would not do that using the same method of
      {tunneling the VLAN on egress, retagging the VLAN on ingress} that we
      had in the best-effort VLAN filtering mode. It seems rather simpler that
      we just allocate a VLAN in the VLAN table that is simply not used by the
      bridge at all, or by any other port.
      
      Anyway, I have 2 gripes with the current sja1105_xmit_tpid:
      
      1. When sending packets on behalf of a VLAN-aware bridge (with the new
         TX forwarding offload framework) plus untagged (with the tag_8021q
         VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
         and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
         through the DSA master have a VLAN protocol of 0x8100 and others of
         0x88a8. This is strange and there is no reason for it now. If we have
         a bridge and are therefore forced to send using that bridge's TPID,
         we can as well blend with that bridge's VLAN protocol for all packets.
      
      2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
         because it looks inside dp->priv. It is desirable to keep as much
         separation between taggers and switch drivers as possible. Now it
         doesn't do that anymore.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ded9160
    • V
      net: dsa: sja1105: drop untagged packets on the CPU and DSA ports · b0b8c67e
      Vladimir Oltean 提交于
      The sja1105 driver is a bit special in its use of VLAN headers as DSA
      tags. This is because in VLAN-aware mode, the VLAN headers use an actual
      TPID of 0x8100, which is understood even by the DSA master as an actual
      VLAN header.
      
      Furthermore, control packets such as PTP and STP are transmitted with no
      VLAN header as a DSA tag, because, depending on switch generation, there
      are ways to steer these control packets towards a precise egress port
      other than VLAN tags. Transmitting control packets as untagged means
      leaving a door open for traffic in general to be transmitted as untagged
      from the DSA master, and for it to traverse the switch and exit a random
      switch port according to the FDB lookup.
      
      This behavior is a bit out of line with other DSA drivers which have
      native support for DSA tagging. There, it is to be expected that the
      switch only accepts DSA-tagged packets on its CPU port, dropping
      everything that does not match this pattern.
      
      We perhaps rely a bit too much on the switches' hardware dropping on the
      CPU port, and place no other restrictions in the kernel data path to
      avoid that. For example, sja1105 is also a bit special in that STP/PTP
      packets are transmitted using "management routes"
      (sja1105_port_deferred_xmit): when sending a link-local packet from the
      CPU, we must first write a SPI message to the switch to tell it to
      expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
      it towards port 3 when it gets it. This entry expires as soon as it
      matches a packet received by the switch, and it needs to be reinstalled
      for the next packet etc. All in all quite a ghetto mechanism, but it is
      all that the sja1105 switches offer for injecting a control packet.
      The driver takes a mutex for serializing control packets and making the
      pairs of SPI writes of a management route and its associated skb atomic,
      but to be honest, a mutex is only relevant as long as all parties agree
      to take it. With the DSA design, it is possible to open an AF_PACKET
      socket on the DSA master net device, and blast packets towards
      01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
      it all goes kaput because management routes installed by the driver will
      match skbs sent by the DSA master, and not skbs generated by the driver
      itself. So they will end up being routed on the wrong port.
      
      So through the lens of that, maybe it would make sense to avoid that
      from happening by doing something in the network stack, like: introduce
      a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
      dev_hard_start_xmit(), introduce the following check:
      
      	if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
      		kfree_skb(skb);
      
      Ok, maybe that is a bit drastic, but that would at least prevent a bunch
      of problems. For example, right now, even though the majority of DSA
      switches drop packets without DSA tags sent by the DSA master (and
      therefore the majority of garbage that user space daemons like avahi and
      udhcpcd and friends create), it is still conceivable that an aggressive
      user space program can open an AF_PACKET socket and inject a spoofed DSA
      tag directly on the DSA master. We have no protection against that; the
      packet will be understood by the switch and be routed wherever user
      space says. Furthermore: there are some DSA switches where we even have
      register access over Ethernet, using DSA tags. So even user space
      drivers are possible in this way. This is a huge hole.
      
      However, the biggest thing that bothers me is that udhcpcd attempts to
      ask for an IP address on all interfaces by default, and with sja1105, it
      will attempt to get a valid IP address on both the DSA master as well as
      on sja1105 switch ports themselves. So with IP addresses in the same
      subnet on multiple interfaces, the routing table will be messed up and
      the system will be unusable for traffic until it is configured manually
      to not ask for an IP address on the DSA master itself.
      
      It turns out that it is possible to avoid that in the sja1105 driver, at
      least very superficially, by requesting the switch to drop VLAN-untagged
      packets on the CPU port. With the exception of control packets, all
      traffic originated from tag_sja1105.c is already VLAN-tagged, so only
      STP and PTP packets need to be converted. For that, we need to uphold
      the equivalence between an untagged and a pvid-tagged packet, and to
      remember that the CPU port of sja1105 uses a pvid of 4095.
      
      Now that we drop untagged traffic on the CPU port, non-aggressive user
      space applications like udhcpcd stop bothering us, and sja1105 effectively
      becomes just as vulnerable to the aggressive kind of user space programs
      as other DSA switches are (ok, users can also create 8021q uppers on top
      of the DSA master in the case of sja1105, but in future patches we can
      easily deny that, but it still doesn't change the fact that VLAN-tagged
      packets can still be injected over raw sockets).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0b8c67e
    • V
      net: dsa: sja1105: prevent tag_8021q VLANs from being received on user ports · 73ceab83
      Vladimir Oltean 提交于
      Currently it is possible for an attacker to craft packets with a fake
      DSA tag and send them to us, and our user ports will accept them and
      preserve that VLAN when transmitting towards the CPU. Then the tagger
      will be misled into thinking that the packets came on a different port
      than they really came on.
      
      Up until recently there wasn't a good option to prevent this from
      happening. In SJA1105P and later, the MAC Configuration Table introduced
      two options called:
      - DRPSITAG: Drop Single Inner Tagged Frames
      - DRPSOTAG: Drop Single Outer Tagged Frames
      
      Because the sja1105 driver classifies all VLANs as "outer VLANs" (S-Tags),
      it would be in principle possible to enable the DRPSOTAG bit on ports
      using tag_8021q, and drop on ingress all packets which have a VLAN tag.
      When the switch is VLAN-unaware, this works, because it uses a custom
      TPID of 0xdadb, so any "tagged" packets received on a user port are
      probably a spoofing attempt. But when the switch overall is VLAN-aware,
      and some ports are standalone (therefore they use tag_8021q), the TPID
      is 0x8100, and the port can receive a mix of untagged and VLAN-tagged
      packets. The untagged ones will be classified to the tag_8021q pvid, and
      the tagged ones to the VLAN ID from the packet header. Yes, it is true
      that since commit 4fbc08bd ("net: dsa: sja1105: deny 8021q uppers on
      ports") we no longer support this mixed mode, but that is a temporary
      limitation which will eventually be lifted. It would be nice to not
      introduce one more restriction via DRPSOTAG, which would make the
      standalone ports of a VLAN-aware switch drop genuinely VLAN-tagged
      packets.
      
      Also, the DRPSOTAG bit is not available on the first generation of
      switches (SJA1105E, SJA1105T). So since one of the key features of this
      driver is compatibility across switch generations, this makes it an even
      less desirable approach.
      
      The breakthrough comes from commit bef0746c ("net: dsa: sja1105:
      make sure untagged packets are dropped on ingress ports with no pvid"),
      where it became obvious that untagged packets are not dropped even if
      the ingress port is not in the VMEMB_PORT vector of that port's pvid.
      However, VLAN-tagged packets are subject to VLAN ingress
      checking/dropping. This means that instead of using the catch-all
      DRPSOTAG bit introduced in SJA1105P, we can drop tagged packets on a
      per-VLAN basis, and this is already compatible with SJA1105E/T.
      
      This patch adds an "allowed_ingress" argument to sja1105_vlan_add(), and
      we call it with "false" for tag_8021q VLANs on user ports. The tag_8021q
      VLANs still need to be allowed, of course, on ingress to DSA ports and
      CPU ports.
      
      We also need to refine the drop_untagged check in sja1105_commit_pvid to
      make it not freak out about this new configuration. Currently it will
      try to keep the configuration consistent between untagged and pvid-tagged
      packets, so if the pvid of a port is 1 but VLAN 1 is not in VMEMB_PORT,
      packets tagged with VID 1 will behave the same as untagged packets, and
      be dropped. This behavior is what we want for ports under a VLAN-aware
      bridge, but for the ports with a tag_8021q pvid, we want untagged
      packets to be accepted, but packets tagged with a header recognized by
      the switch as a tag_8021q VLAN to be dropped. So only restrict the
      drop_untagged check to apply to the bridge_pvid, not to the tag_8021q_pvid.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73ceab83
  14. 18 8月, 2021 1 次提交
    • V
      net: dsa: tag_sja1105: be dsa_loop-safe · 994d2cbb
      Vladimir Oltean 提交于
      Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
      sure that every time we dereference dp->priv, we check the switch's
      dsa_switch_ops (otherwise we access a struct sja1105_port structure that
      is in fact something else).
      
      This adds an unconditional build-time dependency between sja1105 being
      built as module => tag_sja1105 must also be built as module. This was
      there only for PTP before.
      
      Some sane defaults must also take place when not running on sja1105
      hardware. These are:
      
      - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
        depending on VLAN awareness and switch revision (when an encapsulated
        VLAN must be sent). Default to 0x8100.
      
      - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
        metadata timestamp frames. When running on non-sja1105 hardware, don't
        do that and accept all frames unmodified.
      
      - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
        which writes a management route over SPI. When not running on sja1105
        hardware, bypass the SPI write and send the frame as-is.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      994d2cbb
  15. 16 8月, 2021 1 次提交
    • V
      net: dsa: sja1105: reorganize probe, remove, setup and teardown ordering · 022522ac
      Vladimir Oltean 提交于
      The sja1105 driver's initialization and teardown sequence is a chaotic
      mess that has gathered a lot of cruft over time. It works because there
      is no strict dependency between the functions, but it could be improved.
      
      The basic principle that teardown should be the exact reverse of setup
      is obviously not held. We have initialization steps (sja1105_tas_setup,
      sja1105_flower_setup) in the probe method that are torn down in the DSA
      .teardown method instead of driver unbind time.
      
      We also have code after the dsa_register_switch() call, which implicitly
      means after the .setup() method has finished, which is pretty unusual.
      
      Also, sja1105_teardown() has calls set up in a different order than the
      error path of sja1105_setup(): see the reversed ordering between
      sja1105_ptp_clock_unregister and sja1105_mdiobus_unregister.
      
      Also, sja1105_static_config_load() is called towards the end of
      sja1105_setup(), but sja1105_static_config_free() is also towards the
      end of the error path and teardown path. The static_config_load() call
      should be earlier.
      
      Also, making and breaking the connections between struct sja1105_port
      and struct dsa_port could be refactored into dedicated functions, makes
      the code easier to follow.
      
      We move some code from the DSA .setup() method into the probe method,
      like the device tree parsing, and we move some code from the probe
      method into the DSA .setup() method to be symmetric with its placement
      in the DSA .teardown() method, which is nice because the unbind function
      has a single call to dsa_unregister_switch(). Example of the latter type
      of code movement are the connections between ports mentioned above, they
      are now in the .setup() method.
      
      Finally, due to fact that the kthread_init_worker() call is no longer
      in sja1105_probe() - located towards the bottom of the file - but in
      sja1105_setup() - located much higher - there is an inverse ordering
      with the worker function declaration, sja1105_port_deferred_xmit. To
      avoid that, the entire sja1105_setup() and sja1105_teardown() functions
      are moved towards the bottom of the file.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      022522ac
  16. 12 8月, 2021 1 次提交
  17. 10 8月, 2021 1 次提交
    • V
      net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 21b52fed
      Vladimir Oltean 提交于
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21b52fed
  18. 09 8月, 2021 1 次提交