1. 18 8月, 2022 3 次提交
    • V
      net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset · d4c36765
      Vladimir Oltean 提交于
      With so many counter addresses recently discovered as being wrong, it is
      desirable to at least have a central database of information, rather
      than two: one through the SYS_COUNT_* registers (used for
      ndo_get_stats64), and the other through the offset field of struct
      ocelot_stat_layout elements (used for ethtool -S).
      
      The strategy will be to keep the SYS_COUNT_* definitions as the single
      source of truth, but for that we need to expand our current definitions
      to cover all registers. Then we need to convert the ocelot region
      creation logic, and stats worker, to the read semantics imposed by going
      through SYS_COUNT_* absolute register addresses, rather than offsets
      of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been
      SYS_CNT, by the way).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d4c36765
    • V
      net: mscc: ocelot: make struct ocelot_stat_layout array indexable · 91904600
      Vladimir Oltean 提交于
      The ocelot counters are 32-bit and require periodic reading, every 2
      seconds, by ocelot_port_update_stats(), so that wraparounds are
      detected.
      
      Currently, the counters reported by ocelot_get_stats64() come from the
      32-bit hardware counters directly, rather than from the 64-bit
      accumulated ocelot->stats, and this is a problem for their integrity.
      
      The strategy is to make ocelot_get_stats64() able to cherry-pick
      individual stats from ocelot->stats the way in which it currently reads
      them out from SYS_COUNT_* registers. But currently it can't, because
      ocelot->stats is an opaque u64 array that's used only to feed data into
      ethtool -S.
      
      To solve that problem, we need to make ocelot->stats indexable, and
      associate each element with an element of struct ocelot_stat_layout used
      by ethtool -S.
      
      This makes ocelot_stat_layout a fat (and possibly sparse) array, so we
      need to change the way in which we access it. We no longer need
      OCELOT_STAT_END as a sentinel, because we know the array's size
      (OCELOT_NUM_STATS). We just need to skip the array elements that were
      left unpopulated for the switch revision (ocelot, felix, seville).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      91904600
    • V
      net: mscc: ocelot: turn stats_lock into a spinlock · 22d842e3
      Vladimir Oltean 提交于
      ocelot_get_stats64() currently runs unlocked and therefore may collide
      with ocelot_port_update_stats() which indirectly accesses the same
      counters. However, ocelot_get_stats64() runs in atomic context, and we
      cannot simply take the sleepable ocelot->stats_lock mutex. We need to
      convert it to an atomic spinlock first. Do that as a preparatory change.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      22d842e3
  2. 19 6月, 2022 1 次提交
    • X
      net: dsa: felix: update base time of time-aware shaper when adjusting PTP time · 8670dc33
      Xiaoliang Yang 提交于
      When adjusting the PTP clock, the base time of the TAS configuration
      will become unreliable. We need reset the TAS configuration by using a
      new base time.
      
      For example, if the driver gets a base time 0 of Qbv configuration from
      user, and current time is 20000. The driver will set the TAS base time
      to be 20000. After the PTP clock adjustment, the current time becomes
      10000. If the TAS base time is still 20000, it will be a future time,
      and TAS entry list will stop running. Another example, if the current
      time becomes to be 10000000 after PTP clock adjust, a large time offset
      can cause the hardware to hang.
      
      This patch introduces a tas_clock_adjust() function to reset the TAS
      module by using a new base time after the PTP clock adjustment. This can
      avoid issues above.
      
      Due to PTP clock adjustment can occur at any time, it may conflict with
      the TAS configuration. We introduce a new TAS lock to serialize the
      access to the TAS registers.
      Signed-off-by: NXiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8670dc33
  3. 23 5月, 2022 4 次提交
  4. 13 5月, 2022 3 次提交
  5. 07 5月, 2022 1 次提交
  6. 06 5月, 2022 1 次提交
    • V
      net: mscc: ocelot: mark traps with a bool instead of keeping them in a list · e1846cff
      Vladimir Oltean 提交于
      Since the blamed commit, VCAP filters can appear on more than one list.
      If their action is "trap", they are chained on ocelot->traps via
      filter->trap_list. This is in addition to their normal placement on the
      VCAP block->rules list head.
      
      Therefore, when we free a VCAP filter, we must remove it from all lists
      it is a member of, including ocelot->traps.
      
      There are at least 2 bugs which are direct consequences of this design
      decision.
      
      First is the incorrect usage of list_empty(), meant to denote whether
      "filter" is chained into ocelot->traps via filter->trap_list.
      This does not do the correct thing, because list_empty() checks whether
      "head->next == head", but in our case, head->next == head->prev == NULL.
      So we dereference NULL pointers and die when we call list_del().
      
      Second is the fact that not all places that should remove the filter
      from ocelot->traps do so. One example is ocelot_vcap_block_remove_filter(),
      which is where we have the main kfree(filter). By keeping freed filters
      in ocelot->traps we end up in a use-after-free in
      felix_update_trapping_destinations().
      
      Attempting to fix all the buggy patterns is a whack-a-mole game which
      makes the driver unmaintainable. Actually this is what the previous
      patch version attempted to do:
      https://patchwork.kernel.org/project/netdevbpf/patch/20220503115728.834457-3-vladimir.oltean@nxp.com/
      
      but it introduced another set of bugs, because there are other places in
      which create VCAP filters, not just ocelot_vcap_filter_create():
      
      - ocelot_trap_add()
      - felix_tag_8021q_vlan_add_rx()
      - felix_tag_8021q_vlan_add_tx()
      
      Relying on the convention that all those code paths must call
      INIT_LIST_HEAD(&filter->trap_list) is not going to scale.
      
      So let's do what should have been done in the first place and keep a
      bool in struct ocelot_vcap_filter which denotes whether we are looking
      at a trapping rule or not. Iterating now happens over the main VCAP IS2
      block->rules. The advantage is that we no longer risk having stale
      references to a freed filter, since it is only present in that list.
      
      Fixes: e42bd4ed ("net: mscc: ocelot: keep traps in a list")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e1846cff
  7. 30 4月, 2022 1 次提交
    • C
      net: ethernet: ocelot: remove the need for num_stats initializer · 2f187bfa
      Colin Foster 提交于
      There is a desire to share the oclot_stats_layout struct outside of the
      current vsc7514 driver. In order to do so, the length of the array needs to
      be known at compile time, and defined in the struct ocelot and struct
      felix_info.
      
      Since the array is defined in a .c file and would be declared in the header
      file via:
      extern struct ocelot_stat_layout[];
      the size of the array will not be known at compile time to outside modules.
      
      To fix this, remove the need for defining the number of stats at compile
      time and allow this number to be determined at initialization.
      Signed-off-by: NColin Foster <colin.foster@in-advantage.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f187bfa
  8. 25 4月, 2022 2 次提交
    • V
      net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware bridge · 1fcb8fb3
      Vladimir Oltean 提交于
      DSA, through dsa_port_bridge_leave(), first notifies the port of the
      fact that it left a bridge, then, if that bridge was VLAN-aware, it
      notifies the port of the change in VLAN awareness state, towards
      VLAN-unaware mode.
      
      So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge
      is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct
      ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on
      that port.
      
      In a way this structure correctly reflects the reality, but by design,
      VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge
      VLAN list of the driver, but managed separately.
      
      Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on
      several sanity checks that did not expect to have this VID there.
      For example, after we leave a VLAN-aware bridge and we re-join it, we
      can no longer program egress-tagged VLANs to hardware:
      
       # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
       # ip link set swp0 master br0
       # ip link set swp0 nomaster
       # ip link set swp0 master br0
       # bridge vlan add dev swp0 vid 100
      Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.
      
      But this configuration is in fact supported by the hardware, since we
      could use OCELOT_PORT_TAG_NATIVE. According to its comment:
      
      /* all VLANs except the native VLAN and VID 0 are egress-tagged */
      
      yet when assessing the eligibility for this mode, we do not check for
      VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that
      ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0
      doesn't have a bridge VLAN structure.
      
      The way I identify the problem is that ocelot_port_vlan_filtering(false)
      only means to call ocelot_add_vlan_unaware_pvid() when we dynamically
      turn off VLAN awareness for a bridge we are under, and the PVID changes
      from the bridge PVID to a reserved PVID based on the bridge number.
      
      Since OCELOT_STANDALONE_PVID is statically added to the VLAN table
      during ocelot_vlan_init() and never removed afterwards, calling
      ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve
      any purpose.
      
      Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0)
      when we're resetting VLAN awareness after leaving the bridge, to become
      a standalone port.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fcb8fb3
    • V
      net: mscc: ocelot: ignore VID 0 added by 8021q module · 9323ac36
      Vladimir Oltean 提交于
      Both the felix DSA driver and ocelot switchdev driver declare
      dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*,
      so the 8021q module will add VID 0 to our RX filter when the port goes
      up, to ensure 802.1p traffic is not dropped.
      
      We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which
      deliberately does not have a struct ocelot_bridge_vlan associated with
      it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init().
      
      If we allow external calls to modify VID 0, we reach the following
      situation:
      
       # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
       # ip link set swp0 master br0
       # ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false
      bridge vlan
      port              vlan-id
      swp0              1 PVID Egress Untagged # the bridge also adds VID 1
      br0               1 PVID Egress Untagged
       # bridge vlan add dev swp0 vid 100 untagged
      Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN.
      
      This configuration should have been accepted, because
      ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE.
      Yet it isn't, because we have an entry in ocelot->vlans which says
      VID 0 should be egress-tagged, something the hardware can't do.
      
      Fix this by suppressing additions/deletions on VID 0 and managing this
      VLAN exclusively using OCELOT_STANDALONE_PVID.
      
      *DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware
      bridge. Ocelot declares it unconditionally for some reason.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9323ac36
  9. 19 4月, 2022 1 次提交
  10. 18 3月, 2022 2 次提交
    • V
      net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2 · f2a0e216
      Vladimir Oltean 提交于
      Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an
      offload for tc-flower) is done by setting the MIRROR_ENA bit from the
      action vector of the filter. The packet is mirrored to the port mask
      configured in the ANA:ANA:MIRRORPORTS register (the same port mask as
      the destinations for port-based mirroring).
      
      Functionality was tested with:
      
      tc qdisc add dev swp3 clsact
      tc filter add dev swp3 ingress protocol ip \
      	flower skip_sw ip_proto icmp \
      	action mirred egress mirror dev swp1
      
      and pinging through swp3, while seeing that the ICMP replies are
      mirrored towards swp1.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f2a0e216
    • V
      net: mscc: ocelot: add port mirroring support using tc-matchall · ccb6ed42
      Vladimir Oltean 提交于
      Ocelot switches perform port-based ingress mirroring if
      ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if
      the port is in ANA:ANA:EMIRRORPORTS.
      
      Both ingress-mirrored and egress-mirrored frames are copied to the port
      mask from ANA:ANA:MIRRORPORTS.
      
      So the choice of limiting to a single mirror port via ocelot_mirror_get()
      and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't
      map very well to the user space model. If the user wants to mirror the
      ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd
      have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would
      make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there
      are no tc-matchall rules to describe those actions.
      
      Now, we could offload a matchall rule with multiple mirred actions, one
      per desired mirror port, and force the user to stick to the multi-action
      rule format for subsequent matchall filters. But both DSA and ocelot
      have the flow_offload_has_one_action() check for the matchall offload,
      plus the fact that it will get cumbersome to cross-check matchall
      mirrors with flower mirrors (which will be added in the next patch).
      
      As a result, we limit the configuration to a single mirror port, with
      the possibility of lifting the restriction in the future.
      
      Frames injected from the CPU don't get egress-mirrored, since they are
      sent with the BYPASS bit in the injection frame header, and this
      bypasses the analyzer module (effectively also the mirroring logic).
      I don't know what to do/say about this.
      
      Functionality was tested with:
      
      tc qdisc add dev swp3 clsact
      tc filter add dev swp3 ingress \
      	matchall skip_sw \
      	action mirred egress mirror dev swp1
      
      and pinging through swp3, while seeing that the ICMP replies are
      mirrored towards swp1.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ccb6ed42
  11. 16 3月, 2022 1 次提交
  12. 14 3月, 2022 1 次提交
    • V
      net: dsa: felix: configure default-prio and dscp priorities · 978777d0
      Vladimir Oltean 提交于
      Follow the established programming model for this driver and provide
      shims in the felix DSA driver which call the implementations from the
      ocelot switch lib. The ocelot switchdev driver wasn't integrated with
      dcbnl due to lack of hardware availability.
      
      The switch doesn't have any fancy QoS classification enabled by default.
      The provided getters will create a default-prio app table entry of 0,
      and no dscp entry. However, the getters have been made to actually
      retrieve the hardware configuration rather than static values, to be
      future proof in case DSA will need this information from more call paths.
      
      For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG,
      called QOS_DEFAULT_VAL.
      
      DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG
      (field QOS_DSCP_ENA), and individual DSCP values are configured as
      trusted or not through register ANA_DSCP_CFG (replicated 64 times).
      An untrusted DSCP value falls back to other QoS classification methods.
      If trusted, the selected ANA_DSCP_CFG register also holds the QoS class
      in the QOS_DSCP_VAL field.
      
      The hardware also supports DSCP remapping (DSCP value X is translated to
      DSCP value Y before the QoS class is determined based on the app table
      entry for Y) and DSCP packet rewriting. The dcbnl framework, for being
      so flexible in other useless areas, doesn't appear to support this.
      So this functionality has been left out.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      978777d0
  13. 03 3月, 2022 1 次提交
  14. 27 2月, 2022 1 次提交
    • V
      net: mscc: ocelot: enforce FDB isolation when VLAN-unaware · 54c31984
      Vladimir Oltean 提交于
      Currently ocelot uses a pvid of 0 for standalone ports and ports under a
      VLAN-unaware bridge, and the pvid of the bridge for ports under a
      VLAN-aware bridge. Standalone ports do not perform learning, but packets
      received on them are still subject to FDB lookups. So if the MAC DA that
      a standalone port receives has been also learned on a VLAN-unaware
      bridge port, ocelot will attempt to forward to that port, even though it
      can't, so it will drop packets.
      
      So there is a desire to avoid that, and isolate the FDBs of different
      bridges from one another, and from standalone ports.
      
      The ocelot switch library has two distinct entry points: the felix DSA
      driver and the ocelot switchdev driver.
      
      We need to code up a minimal bridge_num allocation in the ocelot
      switchdev driver too, this is copied from DSA with the exception that
      ocelot does not care about DSA trees, cross-chip bridging etc. So it
      only looks at its own ports that are already in the same bridge.
      
      The ocelot switchdev driver uses the bridge_num it has allocated itself,
      while the felix driver uses the bridge_num allocated by DSA. They are
      both stored inside ocelot_port->bridge_num by the common function
      ocelot_port_bridge_join() which receives the bridge_num passed by value.
      
      Once we have a bridge_num, we can only use it to enforce isolation
      between VLAN-unaware bridges. As far as I can see, ocelot does not have
      anything like a FID that further makes VLAN 100 from a port be different
      to VLAN 100 from another port with regard to FDB lookup. So we simply
      deny multiple VLAN-aware bridges.
      
      For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we
      allocate a VLAN for each bridge_num. This will be used as the pvid of
      each port that is under that VLAN-unaware bridge, for as long as that
      bridge is VLAN-unaware.
      
      VID 0 remains only for standalone ports. It is okay if all standalone
      ports use the same VID 0, since they perform no address learning, the
      FDB will contain no entry in VLAN 0, so the packets will always be
      flooded to the only possible destination, the CPU port.
      
      The CPU port module doesn't need to be member of the VLANs to receive
      packets, but if we use the DSA tag_8021q protocol, those packets are
      part of the data plane as far as ocelot is concerned, so there it needs
      to. Just ensure that the DSA tag_8021q CPU port is a member of all
      reserved VLANs when it is created, and is removed when it is deleted.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54c31984
  15. 25 2月, 2022 1 次提交
    • V
      net: dsa: felix: support FDB entries on offloaded LAG interfaces · 961d8b69
      Vladimir Oltean 提交于
      This adds the logic in the Felix DSA driver and Ocelot switch library.
      For Ocelot switches, the DEST_IDX that is the output of the MAC table
      lookup is a logical port (equal to physical port, if no LAG is used, or
      a dynamically allocated number otherwise). The allocation we have in
      place for LAG IDs is different from DSA's, so we can't use that:
      - DSA allocates a continuous range of LAG IDs starting from 1
      - Ocelot appears to require that physical ports and LAG IDs are in the
        same space of [0, num_phys_ports), and additionally, ports that aren't
        in a LAG must have physical port id == logical port id
      
      The implication is that an FDB entry towards a LAG might need to be
      deleted and reinstalled when the LAG ID changes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      961d8b69
  16. 17 2月, 2022 4 次提交
    • V
      net: mscc: ocelot: annotate which traps need PTP timestamping · 9d75b881
      Vladimir Oltean 提交于
      The ocelot switch library does not need this information, but the felix
      DSA driver does.
      
      As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line
      for packet extraction, so to be notified that a PTP packet needs to be
      dequeued, it receives that packet also over Ethernet, by setting up a
      packet trap. The Felix driver needs to install special kinds of traps
      for packets in need of RX timestamps, such that the packets are
      replicated both over Ethernet and over the CPU port module.
      
      But the Ocelot switch library sets up more than one trap for PTP event
      messages; it also traps PTP general messages, MRP control messages etc.
      Those packets don't need PTP timestamps, so there's no reason for the
      Felix driver to send them to the CPU port module.
      
      By knowing which traps need PTP timestamps, the Felix driver can
      adjust the traps installed using ocelot_trap_add() such that only those
      will actually get delivered to the CPU port module.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d75b881
    • V
      net: mscc: ocelot: keep traps in a list · e42bd4ed
      Vladimir Oltean 提交于
      When using the ocelot-8021q tagging protocol, the CPU port isn't
      configured as an NPI port, but is a regular port. So a "trap to CPU"
      operation is actually a "redirect" operation. So DSA needs to set up the
      trapping action one way or another, depending on the tagging protocol in
      use.
      
      To ease DSA's work of modifying the action, keep all currently installed
      traps in a list, so that DSA can live-patch them when the tagging
      protocol changes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e42bd4ed
    • V
      net: mscc: ocelot: use a single VCAP filter for all MRP traps · b9bace6e
      Vladimir Oltean 提交于
      The MRP assist code installs a VCAP IS2 trapping rule for each port, but
      since the key and the action is the same, just the ingress port mask
      differs, there isn't any need to do this. We can save some space in the
      TCAM by using a single filter and adjusting the ingress port mask.
      
      Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this
      purpose.
      
      Now that the cookies are no longer per port, we need to change the
      allocation scheme such that MRP traps use a fixed number.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9bace6e
    • V
      net: mscc: ocelot: consolidate cookie allocation for private VCAP rules · c518afec
      Vladimir Oltean 提交于
      Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP,
      PTP traps) has hardcoded filter identifiers that worked well enough for
      that use case alone. But when two or more of those use cases would be
      used together, some of those identifiers would overlap, leading to
      breakage.
      
      Add definitions for each cookie and centralize them in ocelot_vcap.h,
      such that the overlaps are more obvious.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c518afec
  17. 15 2月, 2022 1 次提交
  18. 14 2月, 2022 2 次提交
  19. 11 2月, 2022 1 次提交
  20. 05 2月, 2022 1 次提交
    • V
      net: mscc: ocelot: fix all IP traffic getting trapped to CPU with PTP over IP · 59085208
      Vladimir Oltean 提交于
      The filters for the PTP trap keys are incorrectly configured, in the
      sense that is2_entry_set() only looks at trap->key.ipv4.dport or
      trap->key.ipv6.dport if trap->key.ipv4.proto or trap->key.ipv6.proto is
      set to IPPROTO_TCP or IPPROTO_UDP.
      
      But we don't do that, so is2_entry_set() goes through the "else" branch
      of the IP protocol check, and ends up installing a rule for "Any IP
      protocol match" (because msk is also 0). The UDP port is ignored.
      
      This means that when we run "ptp4l -i swp0 -4", all IP traffic is
      trapped to the CPU, which hinders bridging.
      
      Fix this by specifying the IP protocol in the VCAP IS2 filters for PTP
      over UDP.
      
      Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59085208
  21. 13 1月, 2022 1 次提交
    • V
      net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI port · 33cb0ff3
      Vladimir Oltean 提交于
      Since commit b3964807 ("net: mscc: ocelot: disable flow control on
      NPI interface"), flow control should be disabled on the DSA CPU port
      when used in NPI mode.
      
      However, the commit blamed in the Fixes: tag below broke this, because
      it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA
      for the DSA CPU port.
      
      This issue became noticeable since the device tree update from commit
      8fcea7be ("arm64: dts: ls1028a: mark internal links between Felix
      and ENETC as capable of flow control").
      
      The solution is to check whether this is the currently configured NPI
      port from ocelot_phylink_mac_link_up(), and to not modify the statically
      disabled PAUSE frame transmission if it is.
      
      When the port is configured for lossless mode as opposed to tail drop
      mode, but the link partner (DSA master) doesn't observe the transmitted
      PAUSE frames, the switch termination throughput is much worse, as can be
      seen below.
      
      Before:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  28.4 MBytes   238 Mbits/sec  357   22.6 KBytes
      [  5]   1.00-2.00   sec  33.6 MBytes   282 Mbits/sec  426   19.8 KBytes
      [  5]   2.00-3.00   sec  34.0 MBytes   285 Mbits/sec  343   21.2 KBytes
      [  5]   3.00-4.00   sec  32.9 MBytes   276 Mbits/sec  354   22.6 KBytes
      [  5]   4.00-5.00   sec  32.3 MBytes   271 Mbits/sec  297   18.4 KBytes
      ^C[  5]   5.00-5.06   sec  2.05 MBytes   270 Mbits/sec   45   19.8 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-5.06   sec   163 MBytes   271 Mbits/sec  1822             sender
      [  5]   0.00-5.06   sec  0.00 Bytes  0.00 bits/sec                  receiver
      
      After:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec  259    143 KBytes
      [  5]   1.00-2.00   sec   110 MBytes   920 Mbits/sec  329    144 KBytes
      [  5]   2.00-3.00   sec   112 MBytes   936 Mbits/sec  255    144 KBytes
      [  5]   3.00-4.00   sec   110 MBytes   927 Mbits/sec  355    105 KBytes
      [  5]   4.00-5.00   sec   110 MBytes   926 Mbits/sec  350    156 KBytes
      [  5]   5.00-6.00   sec   110 MBytes   925 Mbits/sec  305    148 KBytes
      [  5]   6.00-7.00   sec   110 MBytes   924 Mbits/sec  320    143 KBytes
      [  5]   7.00-8.00   sec   110 MBytes   925 Mbits/sec  273   97.6 KBytes
      [  5]   8.00-9.00   sec   109 MBytes   913 Mbits/sec  299    141 KBytes
      [  5]   9.00-10.00  sec   110 MBytes   922 Mbits/sec  287    146 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  1.08 GBytes   926 Mbits/sec  3032             sender
      [  5]   0.00-10.00  sec  1.08 GBytes   925 Mbits/sec                  receiver
      
      Fixes: de274be3 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution")
      Reported-by: NXiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33cb0ff3
  22. 08 1月, 2022 2 次提交
    • V
      net: dsa: felix: add port fast age support · 5cad43a5
      Vladimir Oltean 提交于
      Add support for flushing the MAC table on a given port in the ocelot
      switch library, and use this functionality in the felix DSA driver.
      
      This operation is needed when a port leaves a bridge to become
      standalone, and when the learning is disabled, and when the STP state
      changes to a state where no FDB entry should be present.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      5cad43a5
    • V
      net: mscc: ocelot: fix incorrect balancing with down LAG ports · a14e6b69
      Vladimir Oltean 提交于
      Assuming the test setup described here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
      (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0)
      
      it can be seen that when swp1 goes down (on either board A or B), then
      traffic that should go through that port isn't forwarded anywhere.
      
      A dump of the PGID table shows the following:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1
      PGID_DST[2] = ports 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      Whereas a "good" PGID configuration for that setup should have looked
      like this:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1, 2
      PGID_DST[2] = ports 1, 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      In other words, in the "bad" configuration, the attempt is to remove the
      inactive swp1 from the destination ports via PGID_DST. But when a MAC
      table entry is learned, it is learned towards PGID_DST 1, because that
      is the logical port id of the LAG itself (it is equal to the lowest
      numbered member port). So when swp1 becomes inactive, if we set
      PGID_DST[1] to contain just swp1 and not swp2, the packet will not have
      any chance to reach the destination via swp2.
      
      The "correct" way to remove swp1 as a destination is via PGID_AGGR
      (remove swp1 from the aggregation port groups for all aggregation
      codes). This means that PGID_DST[1] and PGID_DST[2] must still contain
      both swp1 and swp2. This makes the MAC table still treat packets
      destined towards the single-port LAG as "multicast", and the inactive
      ports are removed via the aggregation code tables.
      
      The change presented here is a design one: the ocelot_get_bond_mask()
      function used to take an "only_active_ports" argument. We don't need
      that. The only call site that specifies only_active_ports=true,
      ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because
      it must program that into PGID_DST. Additionally, it must also clear the
      inactive ports from the bond mask here, which it can't do if bond_mask
      just contains the active ports:
      
      	ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
      	ac &= ~bond_mask;  <---- here
      	/* Don't do division by zero if there was no active
      	 * port. Just make all aggregation codes zero.
      	 */
      	if (num_active_ports)
      		ac |= BIT(aggr_idx[i % num_active_ports]);
      	ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
      
      So it becomes the responsibility of ocelot_set_aggr_pgids() to take
      ocelot_port->lag_tx_active into consideration when populating the
      aggr_idx array.
      
      Fixes: 23ca3b72 ("net: mscc: ocelot: rebalance LAGs on link up/down events")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      a14e6b69
  23. 14 12月, 2021 1 次提交
    • H
      net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX · 9c9211a3
      Hangbin Liu 提交于
      Since commit 94dd016a ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP
      ioctl to active device") the user could get bond active interface's
      PHC index directly. But when there is a failover, the bond active
      interface will change, thus the PHC index is also changed. This may
      break the user's program if they did not update the PHC timely.
      
      This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
      When the user wants to get the bond active interface's PHC, they need to
      add this flag and be aware the PHC index may be changed.
      
      With the new flag. All flag checks in current drivers are removed. Only
      the checking in net_hwtstamp_validate() is kept.
      Suggested-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c9211a3
  24. 11 12月, 2021 2 次提交
  25. 30 11月, 2021 1 次提交