1. 25 10月, 2021 2 次提交
  2. 24 10月, 2021 1 次提交
  3. 21 10月, 2021 4 次提交
    • V
      net: mscc: ocelot: track the port pvid using a pointer · d4004422
      Vladimir Oltean 提交于
      Now that we have a list of struct ocelot_bridge_vlan entries, we can
      rewrite the pvid logic to simply point to one of those structures,
      instead of having a separate structure with a "bool valid".
      The NULL pointer will represent the lack of a bridge pvid (not to be
      confused with the lack of a hardware pvid on the port, that is present
      at all times).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4004422
    • V
      net: mscc: ocelot: allow a config where all bridge VLANs are egress-untagged · 0da1a1c4
      Vladimir Oltean 提交于
      At present, the ocelot driver accepts a single egress-untagged bridge
      VLAN, meaning that this sequence of operations:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp0 master br0
      bridge vlan add dev swp0 vid 2 pvid untagged
      
      fails because the bridge automatically installs VID 1 as a pvid & untagged
      VLAN, and vid 2 would be the second untagged VLAN on this port. It is
      necessary to delete VID 1 before proceeding to add VID 2.
      
      This limitation comes from the fact that we operate the port tag, when
      it has an egress-untagged VID, in the OCELOT_PORT_TAG_NATIVE mode.
      The ocelot switches do not have full flexibility and can either have one
      single VID as egress-untagged, or all of them.
      
      There are use cases for having all VLANs as egress-untagged as well, and
      this patch adds support for that.
      
      The change rewrites ocelot_port_set_native_vlan() into a more generic
      ocelot_port_manage_port_tag() function. Because the software bridge's
      state, transmitted to us via switchdev, can become very complex, we
      don't attempt to track all possible state transitions, but instead take
      a more declarative approach and just make ocelot_port_manage_port_tag()
      figure out which more to operate in:
      
      - port is VLAN-unaware: the classified VLAN (internal, unrelated to the
                              802.1Q header) is not inserted into packets on egress
      - port is VLAN-aware:
        - port has tagged VLANs:
          -> port has no untagged VLAN: set up as pure trunk
          -> port has one untagged VLAN: set up as trunk port + native VLAN
          -> port has more than one untagged VLAN: this is an invalid config
             which is rejected by ocelot_vlan_prepare
        - port has no tagged VLANs
          -> set up as pure egress-untagged port
      
      We don't keep the number of tagged and untagged VLANs, we just count the
      structures we keep.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0da1a1c4
    • V
      net: mscc: ocelot: convert the VLAN masks to a list · 90e0aa8d
      Vladimir Oltean 提交于
      First and foremost, the driver currently allocates a constant sized
      4K * u32 (16KB memory) array for the VLAN masks. However, a typical
      application might not need so many VLANs, so if we dynamically allocate
      the memory as needed, we might actually save some space.
      
      Secondly, we'll need to keep more advanced bookkeeping of the VLANs we
      have, notably we'll have to check how many untagged and how many tagged
      VLANs we have. This will have to stay in a structure, and allocating
      another 16 KB array for that is again a bit too much.
      
      So refactor the bridge VLANs in a linked list of structures.
      
      The hook points inside the driver are ocelot_vlan_member_add() and
      ocelot_vlan_member_del(), which previously used to operate on the
      ocelot->vlan_mask[vid] array element.
      
      ocelot_vlan_member_add() and ocelot_vlan_member_del() used to call
      ocelot_vlan_member_set() to commit to the ocelot->vlan_mask.
      Additionally, we had two calls to ocelot_vlan_member_set() from outside
      those callers, and those were directly from ocelot_vlan_init().
      Those calls do not set up bridging service VLANs, instead they:
      
      - clear the VLAN table on reset
      - set the port pvid to the value used by this driver for VLAN-unaware
        standalone port operation (VID 0)
      
      So now, when we have a structure which represents actual bridge VLANs,
      VID 0 doesn't belong in that structure, since it is not part of the
      bridging layer.
      
      So delete the middle man, ocelot_vlan_member_set(), and let
      ocelot_vlan_init() call directly ocelot_vlant_set_mask() which forgoes
      any data structure and writes directly to hardware, which is all that we
      need.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90e0aa8d
    • V
      net: mscc: ocelot: add a type definition for REW_TAG_CFG_TAG_CFG · 62a22bcb
      Vladimir Oltean 提交于
      This is a cosmetic patch which clarifies what are the port tagging
      options for Ocelot switches.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62a22bcb
  4. 13 10月, 2021 4 次提交
    • V
      net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib · 49f885b2
      Vladimir Oltean 提交于
      Michael reported that when using the "ocelot-8021q" tagging protocol,
      the switch driver module must be manually loaded before the tagging
      protocol can be loaded/is available.
      
      This appears to be the same problem described here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      where due to the fact that DSA tagging protocols make use of symbols
      exported by the switch drivers, circular dependencies appear and this
      breaks module autoloading.
      
      The ocelot_8021q driver needs the ocelot_can_inject() and
      ocelot_port_inject_frame() functions from the switch library. Previously
      the wrong approach was taken to solve that dependency: shims were
      provided for the case where the ocelot switch library was compiled out,
      but that turns out to be insufficient, because the dependency when the
      switch lib _is_ compiled is problematic too.
      
      We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
      static inline functions, because these access I/O functions like
      __ocelot_write_ix() which is called by ocelot_write_rix(). Making those
      static inline basically means exposing the whole guts of the ocelot
      switch library, not ideal...
      
      We already have one tagging protocol driver which calls into the switch
      driver during xmit but not using any exported symbol: sja1105_defer_xmit.
      We can do the same thing here: create a kthread worker and one work item
      per skb, and let the switch driver itself do the register accesses to
      send the skb, and then consume it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Reported-by: NMichael Walle <michael@walle.cc>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      49f885b2
    • V
      net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver · deab6b1c
      Vladimir Oltean 提交于
      As explained here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocol drivers cannot depend on symbols exported by switch
      drivers, because this creates a circular dependency that breaks module
      autoloading.
      
      The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
      exported by the common ocelot switch lib. This function looks at
      OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
      DSA tag, for PTP timestamping (the command: one-step/two-step, and the
      TX timestamp identifier).
      
      None of that requires deep insight into the driver, it is quite
      stateless, as it only depends upon the skb->cb. So let's make it a
      static inline function and put it in include/linux/dsa/ocelot.h, a
      file that despite its name is used by the ocelot switch driver for
      populating the injection header too - since commit 40d3f295 ("net:
      mscc: ocelot: use common tag parsing code with DSA").
      
      With that function declared as static inline, its body is expanded
      inside each call site, so the dependency is broken and the DSA tagger
      can be built without the switch library, upon which the felix driver
      depends.
      
      Fixes: 39e5308b ("net: mscc: ocelot: support PTP Sync one-step timestamping")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      deab6b1c
    • V
      net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header · ebb4c6a9
      Vladimir Oltean 提交于
      The sad reality is that when a PTP frame with a TX timestamping request
      is transmitted, it isn't guaranteed that it will make it all the way to
      the wire (due to congestion inside the switch), and that a timestamp
      will be taken by the hardware and placed in the timestamp FIFO where an
      IRQ will be raised for it.
      
      The implication is that if enough PTP frames are silently dropped by the
      hardware such that the timestamp ID has rolled over, it is possible to
      match a timestamp to an old skb.
      
      Furthermore, nobody will match on the real skb corresponding to this
      timestamp, since we stupidly matched on a previous one that was stale in
      the queue, and stopped there.
      
      So PTP timestamping will be broken and there will be no way to recover.
      
      It looks like the hardware parses the sequenceID from the PTP header,
      and also provides that metadata for each timestamp. The driver currently
      ignores this, but it shouldn't.
      
      As an extra resiliency measure, do the following:
      
      - check whether the PTP sequenceID also matches between the skb and the
        timestamp, treat the skb as stale otherwise and free it
      
      - if we see a stale skb, don't stop there and try to match an skb one
        more time, chances are there's one more skb in the queue with the same
        timestamp ID, otherwise we wouldn't have ever found the stale one (it
        is by timestamp ID that we matched it).
      
      While this does not prevent PTP packet drops, it at least prevents
      the catastrophic consequences of incorrect timestamp matching.
      
      Since we already call ptp_classify_raw in the TX path, save the result
      in the skb->cb of the clone, and just use that result in the interrupt
      code path.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ebb4c6a9
    • V
      net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO · 52849bcf
      Vladimir Oltean 提交于
      PTP packets with 2-step TX timestamp requests are matched to packets
      based on the egress port number and a 6-bit timestamp identifier.
      All PTP timestamps are held in a common FIFO that is 128 entry deep.
      
      This patch ensures that back-to-back timestamping requests cannot exceed
      the hardware FIFO capacity. If that happens, simply send the packets
      without requesting a TX timestamp to be taken (in the case of felix,
      since the DSA API has a void return code in ds->ops->port_txtstamp) or
      drop them (in the case of ocelot).
      
      I've moved the ts_id_lock from a per-port basis to a per-switch basis,
      because we need separate accounting for both numbers of PTP frames in
      flight. And since we need locking to inc/dec the per-switch counter,
      that also offers protection for the per-port counter and hence there is
      no reason to have a per-port counter anymore.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      52849bcf
  5. 20 8月, 2021 2 次提交
  6. 16 8月, 2021 2 次提交
    • V
      net: mscc: ocelot: convert to phylink · e6e12df6
      Vladimir Oltean 提交于
      The felix DSA driver, which is a wrapper over the same hardware class as
      ocelot, is integrated with phylink, but ocelot is using the plain PHY
      library. It makes sense to bring together the two implementations, which
      is what this patch achieves.
      
      This is a large patch and hard to break up, but it does the following:
      
      The existing ocelot_adjust_link writes some registers, and
      felix_phylink_mac_link_up writes some registers, some of them are
      common, but both functions write to some registers to which the other
      doesn't.
      
      The main reasons for this are:
      - Felix switches so far have used an NXP PCS so they had no need to
        write the PCS1G registers that ocelot_adjust_link writes
      - Felix switches have the MAC fixed at 1G, so some of the MAC speed
        changes actually break the link and must be avoided.
      
      The naming conventions for the functions introduced in this patch are:
      - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot
        instantiations and placed in ocelot_net.c which is built only for the
        ocelot switchdev driver.
      - ocelot_phylink_mac_link_{up,down} are shared between the ocelot
        switchdev driver and the felix DSA driver (they are put in the common
        lib).
      
      One by one, the registers written by ocelot_adjust_link are:
      
      DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this
                         register since its out-of-reset value was fine and
                         did not need changing. The write is moved to the
                         common ocelot_phylink_mac_link_up and on felix it is
                         guarded by a quirk bit that makes the written value
                         identical with the out-of-reset one
      DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config
      PCS1G_MODE_CFG - same as above
      PCS1G_SD_CFG - same as above
      PCS1G_CFG - same as above
      PCS1G_ANEG_CFG - same as above
      PCS1G_LB_CFG - same as above
      DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable
                        touched this. felix_phylink_mac_link_{up,down} also
                        do. We go with what felix does and put it in
                        ocelot_phylink_mac_link_up.
      DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both
                      write this, but to different values. Move to the common
                      ocelot_phylink_mac_link_up and make sure via the quirk
                      that the old values are preserved for both.
      ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up
                        did not. Runtime invariant, speed does not matter since
                        PFC is disabled via the RX_PFC_ENA bits which are cleared.
                        Move to vsc7514_phylink_mac_config.
      QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and
                                       felix_phylink_mac_link_{up,down} wrote
                                       this. Ocelot also wrote this register
                                       from ocelot_port_disable. Keep what
                                       felix did, move in ocelot_phylink_mac_link_{up,down}
                                       and delete ocelot_port_disable.
      ANA_POL_FLOWC - same as above
      SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas
                       ocelot always enabled RX and TX flow control, felix
                       listened to phylink (for the most part, at least - see
                       the 2500base-X comment).
      
      The registers which only felix_phylink_mac_link_up wrote are:
      
      SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control
                                worked on ocelot. Not it should, since the
                                code is shared with felix where it does.
      ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink
                          should be the one touching them, deleted.
      
      Other changes:
      
      - The old phylib registration code was in mscc_ocelot_init_ports. It is
        hard to work with 2 levels of indentation already in, and with hard to
        follow teardown logic. The new phylink registration code was moved
        inside ocelot_probe_port(), right between alloc_etherdev() and
        register_netdev(). It could not be done before (=> outside of)
        ocelot_probe_port() because ocelot_probe_port() allocates the struct
        ocelot_port which we then use to assign ocelot_port->phy_mode to. It
        is more preferable to me to have all PHY handling logic inside the
        same function.
      - On the same topic: struct ocelot_port_private :: serdes is only used
        in ocelot_port_open to set the SERDES protocol to Ethernet. This is
        logically a runtime invariant and can be done just once, when the port
        registers with phylink. We therefore don't even need to keep the
        serdes reference inside struct ocelot_port_private, or to use the devm
        variant of of_phy_get().
      - Phylink needs a valid phy-mode for phylink_create() to succeed, and
        the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts
        don't define one for the internal PHY ports. So we patch
        PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL.
      - There was a strategically placed:
      
      	switch (priv->phy_mode) {
      	case PHY_INTERFACE_MODE_NA:
      	        continue;
      
        which made the code skip the serdes initialization for the internal
        PHY ports. Frankly that is not all that obvious, so now we explicitly
        initialize the serdes under an "if" condition and not rely on code
        jumps, so everything is clearer.
      - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII
        ports. Since that is in fact the default value for the register field
        DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear
        the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out
        of reset, which does match the comment. I don't even want to know why
        this code is placed there, but if there is indeed an issue that all
        ports that share a QSGMII lane must all be up, then this logic is
        already buggy, since mscc_ocelot_init_ports iterates using
        for_each_available_child_of_node, so nobody prevents the user from
        putting a 'status = "disabled";' for some QSGMII ports which would
        break the driver's assumption.
        In any case, in the eventuality that I'm right, we would have yet
        another issue if ocelot_phylink_mac_link_down would reset those ports
        and that would be forbidden, so since the ocelot_adjust_link logic did
        not do that (maybe for a reason), add another quirk to preserve the
        old logic.
      
      The ocelot driver teardown goes through all ports in one fell swoop.
      When initialization of one port fails, the ocelot->ports[port] pointer
      for that is reset to NULL, and teardown is done only for non-NULL ports,
      so there is no reason to do partial teardowns, let the central
      mscc_ocelot_release_ports() do its job.
      
      Tested bind, unbind, rebind, link up, link down, speed change on mock-up
      hardware (modified the driver to probe on Felix VSC9959). Also
      regression tested the felix DSA driver. Could not test the Ocelot
      specific bits (PCS1G, SERDES, device tree bindings).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6e12df6
    • V
      net: dsa: felix: stop calling ocelot_port_{enable,disable} · 46efe4ef
      Vladimir Oltean 提交于
      ocelot_port_enable touches ANA_PORT_PORT_CFG, which has the following
      fields:
      
      - LOCKED_PORTMOVE_CPU, LEARNDROP, LEARNCPU, LEARNAUTO, RECV_ENA, all of
        which are written with their hardware default values, also runtime
        invariants. So it makes no sense to write these during every .ndo_open.
      
      - PORTID_VAL: this field has an out-of-reset value of zero for all ports
        and must be initialized by software. Additionally, the
        ocelot_setup_logical_port_ids() code path sets up different logical
        port IDs for the ports in a hardware LAG, and we absolutely don't want
        .ndo_open to interfere there and reset those values.
      
      So in fact the write from ocelot_port_enable can better be moved to
      ocelot_init_port, and the .ndo_open hook deleted.
      
      ocelot_port_disable touches DEV_MAC_ENA_CFG and QSYS_SWITCH_PORT_MODE_PORT_ENA,
      in an attempt to undo what ocelot_adjust_link did. But since .ndo_stop
      does not get called each time the link falls (i.e. this isn't a
      substitute for .phylink_mac_link_down), felix already does better at
      this by writing those registers already in felix_phylink_mac_link_down.
      
      So keep ocelot_port_disable (for now, until ocelot is converted to
      phylink too), and just delete the felix call to it, which is not
      necessary.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46efe4ef
  7. 28 4月, 2021 3 次提交
  8. 24 3月, 2021 1 次提交
    • V
      net: ocelot: replay switchdev events when joining bridge · e4bd44e8
      Vladimir Oltean 提交于
      The premise of this change is that the switchdev port attributes and
      objects offloaded by ocelot might have been missed when we are joining
      an already existing bridge port, such as a bonding interface.
      
      The patch pulls these switchdev attributes and objects from the bridge,
      on behalf of the 'bridge port' net device which might be either the
      ocelot switch interface, or the bonding upper interface.
      
      The ocelot_net.c belongs strictly to the switchdev ocelot driver, while
      ocelot.c is part of a library shared with the DSA felix driver.
      The ocelot_port_bridge_leave function (part of the common library) used
      to call ocelot_port_vlan_filtering(false), something which is not
      necessary for DSA, since the framework deals with that already there.
      So we move this function to ocelot_switchdev_unsync, which is specific
      to the switchdev driver.
      
      The code movement described above makes ocelot_port_bridge_leave no
      longer return an error code, so we change its type from int to void.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4bd44e8
  9. 19 3月, 2021 1 次提交
    • V
      net: ocelot: support multiple bridges · df291e54
      Vladimir Oltean 提交于
      The ocelot switches are a bit odd in that they do not have an STP state
      to put the ports into. Instead, the forwarding configuration is delayed
      from the typical port_bridge_join into stp_state_set, when the port enters
      the BR_STATE_FORWARDING state.
      
      I can only guess that the implementation of this quirk is the reason that
      led to the simplification of the driver such that only one bridge could
      be offloaded at a time.
      
      We can simplify the data structures somewhat, and introduce a per-port
      bridge device pointer and STP state, similar to how the LAG offload
      works now (there we have a per-port bonding device pointer and TX
      enabled state). This allows offloading multiple bridges with relative
      ease, while still keeping in place the quirk to delay the programming of
      the PGIDs.
      
      We actually need this change now because we need to remove the bogus
      restriction from ocelot_bridge_stp_state_set that ocelot->bridge_mask
      needs to contain BIT(port), otherwise that function is a no-op.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df291e54
  10. 17 3月, 2021 2 次提交
  11. 17 2月, 2021 1 次提交
    • H
      net: mscc: ocelot: Add support for MRP · d8ea7ff3
      Horatiu Vultur 提交于
      Add basic support for MRP. The HW will just trap all MRP frames on the
      ring ports to CPU and allow the SW to process them. In this way it is
      possible to for this node to behave both as MRM and MRC.
      
      Current limitations are:
      - it doesn't support Interconnect roles.
      - it supports only a single ring.
      - the HW should be able to do forwarding of MRP Test frames so the SW
        will not need to do this. So it would be able to have the role MRC
        without SW support.
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8ea7ff3
  12. 15 2月, 2021 4 次提交
    • V
      net: dsa: tag_ocelot_8021q: add support for PTP timestamping · 0a6f17c6
      Vladimir Oltean 提交于
      For TX timestamping, we use the felix_txtstamp method which is common
      with the regular (non-8021q) ocelot tagger. This method says that skb
      deferral is needed, prepares a timestamp request ID, and puts a clone of
      the skb in a queue waiting for the timestamp IRQ.
      
      felix_txtstamp is called by dsa_skb_tx_timestamp() just before the
      tagger's xmit method. In the tagger xmit, we divert the packets
      classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based
      injection registers, and we declare them as dead towards dsa_slave_xmit.
      If not PTP, we proceed with normal tag_8021q stuff.
      
      Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is
      matched to the TX timestamp retrieved from the switch's FIFO based on
      the timestamp request ID, and the clone is delivered to the stack.
      
      On RX, thanks to the VCAP IS2 rule that redirects the frames with an
      EtherType for 1588 towards two destinations:
      - the CPU port module (for MMIO based extraction) and
      - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port
      the relevant data path processing starts in the ptp_classify_raw BPF
      classifier installed by DSA in the RX data path (post tagger, which is
      completely unaware that it saw a PTP packet).
      
      This time we can't reuse the same implementation of .port_rxtstamp that
      also works with the default ocelot tagger. That is because felix_rxtstamp
      is given an skb with a freshly stripped DSA header, and it says "I don't
      need deferral for its RX timestamp, it's right in it, let me show you";
      and it just points to the header right behind skb->data, from where it
      unpacks the timestamp and annotates the skb with it.
      
      The same thing cannot happen with tag_ocelot_8021q, because for one
      thing, the skb did not have an extraction frame header in the first
      place, but a VLAN tag with no timestamp information. So the code paths
      in felix_rxtstamp for the regular and 8021q tagger are completely
      independent. With tag_8021q, the timestamp must come from the packet's
      duplicate delivered to the CPU port module, but there is potentially
      complex logic to be handled [ and prone to reordering ] if we were to
      just start reading packets from the CPU port module, and try to match
      them to the one we received over Ethernet and which needs an RX
      timestamp. So we do something simple: we tell DSA "give me some time to
      think" (we request skb deferral by returning false from .port_rxtstamp)
      and we just drop the frame we got over Ethernet with no attempt to match
      it to anything - we just treat it as a notification that there's data to
      be processed from the CPU port module's queues. Then we proceed to read
      the packets from those, one by one, which we deliver up the stack,
      timestamped, using netif_rx - the same function that any driver would
      use anyway if it needed RX timestamp deferral. So the assumption is that
      we'll come across the PTP packet that triggered the CPU extraction
      notification eventually, but we don't know when exactly. Thanks to the
      VCAP IS2 trap/redirect rule and the exclusion of the CPU port module
      from the flooding replicators, only PTP frames should be present in the
      CPU port module's RX queues anyway.
      
      There is just one conflict between the VCAP IS2 trapping rule and the
      semantics of the BPF classifier. Namely, ptp_classify_raw() deems
      general messages as non-timestampable, but still, those are trapped to
      the CPU port module since they have an EtherType of ETH_P_1588. So, if
      the "no XTR IRQ" workaround is in place, we need to run another BPF
      classifier on the frames extracted over MMIO, to avoid duplicates being
      sent to the stack (once over Ethernet, once over MMIO). It doesn't look
      like it's possible to install VCAP IS2 rules based on keys extracted
      from the 1588 frame headers.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a6f17c6
    • V
      net: mscc: ocelot: refactor ocelot_xtr_irq_handler into ocelot_xtr_poll · 924ee317
      Vladimir Oltean 提交于
      Since the felix DSA driver will need to poll the CPU port module for
      extracted frames as well, let's create some common functions that read
      an Extraction Frame Header, and then an skb, from a CPU extraction
      group.
      
      We abuse the struct ocelot_ops :: port_to_netdev function a little bit,
      in order to retrieve the DSA port net_device or the ocelot switchdev
      net_device based on the source port information from the Extraction
      Frame Header, but it's all in the benefit of code simplification -
      netdev_alloc_skb needs it. Originally, the port_to_netdev method was
      intended for parsing act->dev from tc flower offload code.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      924ee317
    • V
      net: mscc: ocelot: use common tag parsing code with DSA · 40d3f295
      Vladimir Oltean 提交于
      The Injection Frame Header and Extraction Frame Header that the switch
      prepends to frames over the NPI port is also prepended to frames
      delivered over the CPU port module's queues.
      
      Let's unify the handling of the frame headers by making the ocelot
      driver call some helpers exported by the DSA tagger. Among other things,
      this allows us to get rid of the strange cpu_to_be32 when transmitting
      the Injection Frame Header on ocelot, since the packing API uses
      network byte order natively (when "quirks" is 0).
      
      The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and
      the cpu extraction queue mask to something, but the code doesn't do it,
      so we don't do it either.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40d3f295
    • V
      net: mscc: ocelot: refactor ocelot_port_inject_frame out of ocelot_port_xmit · 137ffbc4
      Vladimir Oltean 提交于
      The felix DSA driver will inject some frames through register MMIO, same
      as ocelot switchdev currently does. So we need to be able to reuse the
      common code.
      
      Also create some shim definitions, since the DSA tagger can be compiled
      without support for the switch driver.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      137ffbc4
  13. 13 2月, 2021 2 次提交
  14. 10 2月, 2021 1 次提交
    • V
      net: dsa: felix: implement port flushing on .phylink_mac_link_down · eb4733d7
      Vladimir Oltean 提交于
      There are several issues which may be seen when the link goes down while
      forwarding traffic, all of which can be attributed to the fact that the
      port flushing procedure from the reference manual was not closely
      followed.
      
      With flow control enabled on both the ingress port and the egress port,
      it may happen when a link goes down that Ethernet packets are in flight.
      In flow control mode, frames are held back and not dropped. When there
      is enough traffic in flight (example: iperf3 TCP), then the ingress port
      might enter congestion and never exit that state. This is a problem,
      because it is the egress port's link that went down, and that has caused
      the inability of the ingress port to send packets to any other port.
      This is solved by flushing the egress port's queues when it goes down.
      
      There is also a problem when performing stream splitting for
      IEEE 802.1CB traffic (not yet upstream, but a sort of multicast,
      basically). There, if one port from the destination ports mask goes
      down, splitting the stream towards the other destinations will no longer
      be performed. This can be traced down to this line:
      
      	ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG);
      
      which should have been instead, as per the reference manual:
      
      	ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA,
      			 DEV_MAC_ENA_CFG);
      
      Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not
      DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is
      the case, but apparently multicasting to several ports will cause issues
      if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set.
      
      I am not sure what the state of the Ocelot VSC7514 driver is, but
      probably not as bad as Felix/Seville, since VSC7514 uses phylib and has
      the following in ocelot_adjust_link:
      
      	if (!phydev->link)
      		return;
      
      therefore the port is not really put down when the link is lost, unlike
      the DSA drivers which use .phylink_mac_link_down for that.
      
      Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it
      needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h
      which are not exported in include/soc/mscc/ and a bugfix patch should
      probably not move headers around.
      
      Fixes: bdeced75 ("net: dsa: felix: Add PCS operations for PHYLINK")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb4733d7
  15. 07 2月, 2021 4 次提交
  16. 30 1月, 2021 2 次提交
    • V
      net: dsa: felix: perform switch setup for tag_8021q · e21268ef
      Vladimir Oltean 提交于
      Unlike sja1105, the only other user of the software-defined tag_8021q.c
      tagger format, the implementation we choose for the Felix DSA switch
      driver preserves full functionality under a vlan_filtering bridge
      (i.e. IP termination works through the DSA user ports under all
      circumstances).
      
      The tag_8021q protocol just wants:
      - Identifying the ingress switch port based on the RX VLAN ID, as seen
        by the CPU. We achieve this by using the TCAM engines (which are also
        used for tc-flower offload) to push the RX VLAN as a second, outer
        tag, on egress towards the CPU port.
      - Steering traffic injected into the switch from the network stack
        towards the correct front port based on the TX VLAN, and consuming
        (popping) that header on the switch's egress.
      
      A tc-flower pseudocode of the static configuration done by the driver
      would look like this:
      
      $ tc qdisc add dev <cpu-port> clsact
      $ for eth in swp0 swp1 swp2 swp3; do \
      	tc filter add dev <cpu-port> egress flower indev ${eth} \
      		action vlan push id <rxvlan> protocol 802.1ad; \
      	tc filter add dev <cpu-port> ingress protocol 802.1Q flower
      		vlan_id <txvlan> action vlan pop \
      		action mirred egress redirect dev ${eth}; \
      done
      
      but of course since DSA does not register network interfaces for the CPU
      port, this configuration would be impossible for the user to do. Also,
      due to the same reason, it is impossible for the user to inadvertently
      delete these rules using tc. These rules do not collide in any way with
      tc-flower, they just consume some TCAM space, which is something we can
      live with.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e21268ef
    • V
      net: mscc: ocelot: don't use NPI tag prefix for the CPU port module · cacea62f
      Vladimir Oltean 提交于
      Context: Ocelot switches put the injection/extraction frame header in
      front of the Ethernet header. When used in NPI mode, a DSA master would
      see junk instead of the destination MAC address, and it would most
      likely drop the packets. So the Ocelot frame header can have an optional
      prefix, which is just "ff:ff:ff:ff:ff:fe > ff:ff:ff:ff:ff:ff" padding
      put before the actual tag (still before the real Ethernet header) such
      that the DSA master thinks it's looking at a broadcast frame with a
      strange EtherType.
      
      Unfortunately, a lesson learned in commit 69df578c ("net: mscc:
      ocelot: eliminate confusion between CPU and NPI port") seems to have
      been forgotten in the meanwhile.
      
      The CPU port module and the NPI port have independent settings for the
      length of the tag prefix. However, the driver is using the same variable
      to program both of them.
      
      There is no reason really to use any tag prefix with the CPU port
      module, since that is not connected to any Ethernet port. So this patch
      makes the inj_prefix and xtr_prefix variables apply only to the NPI
      port (which the switchdev ocelot_vsc7514 driver does not use).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      cacea62f
  17. 16 1月, 2021 4 次提交
    • V
      net: mscc: ocelot: configure watermarks using devlink-sb · f59fd9ca
      Vladimir Oltean 提交于
      Using devlink-sb, we can configure 12/16 (the important 75%) of the
      switch's controlling watermarks for congestion drops, and we can monitor
      50% of the watermark occupancies (we can monitor the reservation
      watermarks, but not the sharing watermarks, which are exposed as pool
      sizes).
      
      The following definitions can be made:
      
      SB_BUF=0 # The devlink-sb for frame buffers
      SB_REF=1 # The devlink-sb for frame references
      POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances
                 # have one of these.
      POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances
                 # have one of these.
      
      Editing the hardware watermarks is done in the following way:
      BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING
      REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING
      BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR
      REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR
      
      Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly
      by modifying the corresponding pool size. By default, the pool size has
      maximum size, so this can be skipped.
      
      devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \
      	size 129840 thtype static
      
      Since by default there is no buffer reservation, the above command has
      maxed out BUF_COL_SHR_I(dp=0).
      
      Configuring the per-port reservation watermark (P_RSRV) is done in the
      following way:
      
      devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \
      	pool $POOL_ING th 1000
      
      The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this
      command, the sharing watermarks are internally reconfigured with 1000
      bytes less, i.e. from 129840 bytes to 128840 bytes.
      
      Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in
      the following way:
      
      for tc in {0..7}; do
      	devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \
      		type ingress pool $POOL_ING \
      		th 3000
      done
      
      The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes.
      The sharing watermarks are again reconfigured with 24000 bytes less.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f59fd9ca
    • V
      net: mscc: ocelot: register devlink ports · 6c30384e
      Vladimir Oltean 提交于
      Add devlink integration into the mscc_ocelot switchdev driver. All
      physical ports (i.e. the unused ones as well) except the CPU port module
      at ocelot->num_phys_ports are registered with devlink, and that requires
      keeping the devlink_port structure outside struct ocelot_port_private,
      since the latter has a 1:1 mapping with a struct net_device (which does
      not exist for unused ports).
      
      Since we use devlink_port_type_eth_set to link the devlink port to the
      net_device, we can as well remove the .ndo_get_phys_port_name and
      .ndo_get_port_parent_id implementations, since devlink takes care of
      retrieving the port name and number automatically, once
      .ndo_get_devlink_port is implemented.
      
      Note that the felix DSA driver is already integrated with devlink by
      default, since that is a thing that the DSA core takes care of. This is
      the reason why these devlink stubs were put in ocelot_net.c and not in
      the common library. It is also the reason why ocelot::devlink is a
      pointer and not a full structure embedded inside struct ocelot: because
      the mscc_ocelot driver allocates that by itself (as the container of
      struct ocelot, in fact), but in the case of felix, it is DSA who
      allocates the devlink, and felix just propagates the pointer towards
      struct ocelot.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      6c30384e
    • V
      net: mscc: ocelot: export NUM_TC constant from felix to common switch lib · 70d39a6e
      Vladimir Oltean 提交于
      We should be moving anything that isn't DSA-specific or SoC-specific out
      of the felix DSA driver, and into the common mscc_ocelot switch library.
      
      The number of traffic classes is one of the aspects that is common
      between all ocelot switches, so it belongs in the library.
      
      This patch also makes seville use 8 TX queues, and therefore enables
      prioritization via the QOS_CLASS field in the NPI injection header.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      70d39a6e
    • V
      net: mscc: ocelot: add ops for decoding watermark threshold and occupancy · 703b7621
      Vladimir Oltean 提交于
      We'll need to read back the watermark thresholds and occupancy from
      hardware (for devlink-sb integration), not only to write them as we did
      so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct
      ocelot_ops, similar to wm_enc, and implement them for the 3 supported
      mscc_ocelot switches.
      
      Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT
      register, because that doesn't scale with the number of switches that
      mscc_ocelot supports now. They have different bit widths for the
      watermarks, and we need function pointers to abstract that difference
      away.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      703b7621