1. 07 Sep, 2022: 1 commit
    • net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet · 11afdc65
      Committed by Vladimir Oltean
      The blamed commit broke tc-taprio schedules such as this one:
      
      tc qdisc replace dev $swp1 root taprio \
              num_tc 8 \
              map 0 1 2 3 4 5 6 7 \
              queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
              base-time 0 \
              sched-entry S 0x7f 990000 \
              sched-entry S 0x80  10000 \
              flags 0x2
      
      because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
      band added earlier than its 'gate close' event, such that packet
      overruns won't occur in the worst case of the largest packet possible.
      
      Since guard bands are statically determined based on the per-tc
      QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
      we need to discuss what happens with TC 7 depending on kernel version,
      since the driver, prior to commit 55a515b1 ("net: dsa: felix: drop
      oversized frames with tc-taprio instead of hanging the port"), did not
      touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
      
      1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
        1518, and at gigabit this introduces a static guard band (independent
        of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
        time of 20 octets => 160 ns). But this is larger than the time window
        itself, of 10000 ns. So, the queue system never considers a frame with
        TC 7 as eligible for transmission, since the gate practically never
        opens, and these frames are forever stuck in the TX queues and hang
        the port.
      
      2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
        enabling oversized frame dropping, we make an effort to set
        QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
        one more role, which we did not take into account: per-tc static guard
        band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
        There is a discrepancy between what the driver thinks (that there is
        no guard band, and 100% of min_gate_len[tc] is available for egress
        scheduling) and what the hardware actually does (crops the equivalent
        of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
        means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
      
      In both cases, even minimum sized Ethernet frames are stuck on egress
      rather than being considered for scheduling on TC 7, even if they would
      fit given a proper configuration. Considering the current situation,
      with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
      octets in size are not eligible for oversized dropping (because they are
      smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
      for scheduling either, because the min_gate_len[7] (10000 ns) minus the
      guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
      octet == 9840 ns) minus the guard band auto-added for L1 overhead by
      QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 ns)
      leaves 0 ns for scheduling in the queue system proper.
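
      The arithmetic above can be checked with a small sketch. It assumes
      gigabit speed (8 ns per octet), as the text does; the function name is
      hypothetical and is not taken from the driver.

```python
# Sketch of the guard band arithmetic from the text, at gigabit speed.
NS_PER_OCTET = 8        # one octet takes 8 ns on the wire at 1 Gbps
FRM_ADJ_OCTETS = 20     # L1 overhead auto-added via QSYS::HSCH_MISC_CFG.FRM_ADJ


def remaining_gate_ns(min_gate_len_ns: int, max_sdu_octets: int) -> int:
    """Time left for scheduling once the static guard band is subtracted.

    Hypothetical helper: a negative result means the guard band is larger
    than the gate window itself.
    """
    guard_band_ns = (max_sdu_octets + FRM_ADJ_OCTETS) * NS_PER_OCTET
    return min_gate_len_ns - guard_band_ns


# Case 2 in the text: QSYS_QMAXSDU_CFG_7 = 1230 octets, 10000 ns window.
case2 = remaining_gate_ns(10_000, 1230)
# Case 1 in the text: fallback on QSYS_PORT_MAX_SDU = 1518 octets.
case1 = remaining_gate_ns(10_000, 1518)
print(case2, case1)
```

      In case 2 the window is consumed exactly, and in case 1 the guard band
      exceeds the window, which matches the description of a gate that
      practically never opens.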
      
      Investigating the hardware behavior, it becomes apparent that the queue
      system needs precisely 33 ns of 'gate open' time in order to consider a
      frame as eligible for scheduling to a tc. So the solution to this
      problem is to amend vsc9959_tas_guard_bands_update(), by giving the
      per-tc guard bands less space by exactly 33 ns, just enough for one
      frame to be scheduled in that interval. This allows the queue system to
      make forward progress for that port-tc, and prevents it from hanging.
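
      One way to picture the adjustment is the sketch below, which reserves
      33 ns of the window before sizing the guard band. This is only an
      illustration under the same gigabit assumption; the helper names and
      the rounding are hypothetical and may differ from what
      vsc9959_tas_guard_bands_update() actually does.

```python
NS_PER_OCTET = 8         # 1 Gbps, as in the text
FRM_ADJ_OCTETS = 20      # L1 overhead added by QSYS::HSCH_MISC_CFG.FRM_ADJ
MIN_GATE_OPEN_NS = 33    # 'gate open' time the queue system needs (from the text)


def max_sdu_for_gate(min_gate_len_ns: int) -> int:
    """Largest max SDU (in octets) whose guard band still leaves 33 ns open.

    Hypothetical helper. The floor of 0 is only a convenience for this
    sketch; a real driver would have to treat very short gates specially.
    """
    budget_ns = min_gate_len_ns - MIN_GATE_OPEN_NS
    octets = budget_ns // NS_PER_OCTET - FRM_ADJ_OCTETS
    return max(octets, 0)


def remaining_ns(min_gate_len_ns: int, max_sdu_octets: int) -> int:
    """Gate time left after the guard band for max_sdu_octets is removed."""
    return min_gate_len_ns - (max_sdu_octets + FRM_ADJ_OCTETS) * NS_PER_OCTET


sdu = max_sdu_for_gate(10_000)
left = remaining_ns(10_000, sdu)
print(sdu, left)
```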
      
      Fixes: 297c4de6 ("net: dsa: felix: re-enable TAS guard band mode")
      Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 18 Aug, 2022: 5 commits
    • net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset · d4c36765
      Committed by Vladimir Oltean
      With so many counter addresses recently discovered as being wrong, it is
      desirable to at least have a central database of information, rather
      than two: one through the SYS_COUNT_* registers (used for
      ndo_get_stats64), and the other through the offset field of struct
      ocelot_stat_layout elements (used for ethtool -S).
      
      The strategy will be to keep the SYS_COUNT_* definitions as the single
      source of truth, but for that we need to expand our current definitions
      to cover all registers. Then we need to convert the ocelot region
      creation logic, and stats worker, to the read semantics imposed by going
      through SYS_COUNT_* absolute register addresses, rather than offsets
      of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been
      SYS_CNT, by the way).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: make struct ocelot_stat_layout array indexable · 91904600
      Committed by Vladimir Oltean
      The ocelot counters are 32-bit and require periodic reading, every 2
      seconds, by ocelot_port_update_stats(), so that wraparounds are
      detected.
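
      The need for periodic reads can be illustrated with a minimal sketch of
      extending a wrapping 32-bit counter to 64 bits. The class and method
      names are hypothetical, and the driver's own bookkeeping may be
      structured differently. The scheme is only correct if the counter wraps
      at most once between two reads, which is why the polling interval
      matters.

```python
U32_MASK = 0xFFFFFFFF


class Counter64:
    """Accumulates a wrapping 32-bit hardware counter into a 64-bit total."""

    def __init__(self) -> None:
        self.total = 0      # software-maintained accumulated value
        self.last_raw = 0   # last 32-bit reading (assumed to start at 0)

    def update(self, raw: int) -> int:
        # Modulo-2^32 subtraction yields the right delta even across a
        # single wraparound of the hardware counter.
        delta = (raw - self.last_raw) & U32_MASK
        self.total += delta
        self.last_raw = raw
        return self.total


c = Counter64()
first = c.update(0xFFFFFFF0)   # close to wrapping
second = c.update(0x00000010)  # hardware counter wrapped in between
print(hex(first), hex(second))
```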
      
      Currently, the counters reported by ocelot_get_stats64() come from the
      32-bit hardware counters directly, rather than from the 64-bit
      accumulated ocelot->stats, and this is a problem for their integrity.
      
      The strategy is to make ocelot_get_stats64() able to cherry-pick
      individual stats from ocelot->stats the way in which it currently reads
      them out from SYS_COUNT_* registers. But currently it can't, because
      ocelot->stats is an opaque u64 array that's used only to feed data into
      ethtool -S.
      
      To solve that problem, we need to make ocelot->stats indexable, and
      associate each element with an element of struct ocelot_stat_layout used
      by ethtool -S.
      
      This makes ocelot_stat_layout a fat (and possibly sparse) array, so we
      need to change the way in which we access it. We no longer need
      OCELOT_STAT_END as a sentinel, because we know the array's size
      (OCELOT_NUM_STATS). We just need to skip the array elements that were
      left unpopulated for the switch revision (ocelot, felix, seville).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: turn stats_lock into a spinlock · 22d842e3
      Committed by Vladimir Oltean
      ocelot_get_stats64() currently runs unlocked and therefore may collide
      with ocelot_port_update_stats() which indirectly accesses the same
      counters. However, ocelot_get_stats64() runs in atomic context, and we
      cannot simply take the sleepable ocelot->stats_lock mutex. We need to
      convert it to an atomic spinlock first. Do that as a preparatory change.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters · 5152de7b
      Committed by Vladimir Oltean
      Reading stats using the SYS_COUNT_* register definitions is only done by
      ocelot_get_stats64() from the ocelot switchdev driver. However, the
      bucket definitions are currently incorrect.
      
      Separately, on both RX and TX, we have the following problems:
      - a 256-1023 bucket which actually tracks the 256-511 packets
      - the 1024-1526 bucket actually tracks the 512-1023 packets
      - the 1527-max bucket actually tracks the 1024-1526 packets
      
      => nobody tracks the packets from the real 1527-max bucket
      
      Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS
      all track the wrong thing. However this doesn't seem to have any
      consequence, since ocelot_get_stats64() doesn't use these.
      
      Even though this problem only manifests itself for the switchdev driver,
      we cannot split the fix for ocelot and for DSA, since it requires fixing
      the bucket definitions from enum ocelot_reg, which makes us necessarily
      adapt the structures from felix and seville as well.
      
      Fixes: 84705fc1 ("net: dsa: felix: introduce support for Seville VSC9953 switch")
      Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
      Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters · 40d21c45
      Committed by Vladimir Oltean
      What the driver actually reports as 256-511 is in fact 512-1023, and the
      TX packets in the 256-511 bucket are not reported. Fix that.
      
      Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 09 Aug, 2022: 1 commit
    • net: dsa: felix: fix min gate len calculation for tc when its first gate is closed · 7e4babff
      Committed by Vladimir Oltean
      min_gate_len[tc] is supposed to track the shortest interval of
      continuously open gates for a traffic class. For example, in the
      following case:
      
      TC 76543210
      
      t0 00000001b 200000 ns
      t1 00000010b 200000 ns
      
      min_gate_len[0] and min_gate_len[1] should be 200000, while
      min_gate_len[2-7] should be 0.
      
      However what happens is that min_gate_len[0] is 200000, but
      min_gate_len[1] ends up being 0 (despite gate_len[1] being 200000 at the
      point where the logic detects the gate close event for TC 1).
      
      The problem is that the code considers a "gate close" event whenever it
      sees that there is a 0 for that TC (essentially it's level rather than
      edge triggered). By doing that, any time a gate is seen as closed
      without having been open prior, gate_len, which is 0, will be written
      into min_gate_len. Once min_gate_len becomes 0, it's impossible for it
      to track anything higher than that (the length of actually open
      intervals).
      
      To fix this, we make the writing to min_gate_len[tc] be edge-triggered,
      which avoids writes for gates that are closed in consecutive intervals.
      However, this means that we now need to special-case the
      permanently closed gates at the end.
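
      The difference between the level-triggered and edge-triggered logic can
      be reproduced with the sketch below. It is an illustration rather than
      the driver code: the function name is hypothetical, and walking the
      schedule twice (so that the gate close after a wraparound is seen) is
      an assumption of this sketch.

```python
U64_MAX = 2**64 - 1   # stands for "gate never closes"
NUM_TC = 8


def min_gate_lengths(entries, edge_triggered=True):
    """entries: list of (gate_mask, interval_ns) tuples, one per schedule entry."""
    min_gate_len = [U64_MAX] * NUM_TC
    gate_len = [0] * NUM_TC
    ever_opened = [False] * NUM_TC
    n = len(entries)

    # Assumption: walk the list twice so open intervals that end after the
    # schedule wraps around still produce a "gate close" event.
    for i in range(2 * n):
        gate_mask, interval = entries[i % n]
        for tc in range(NUM_TC):
            if gate_mask & (1 << tc):
                gate_len[tc] += interval
                ever_opened[tc] = True
            else:
                # Level-triggered: record on every closed entry (the bug).
                # Edge-triggered: record only if the gate was open before.
                if not edge_triggered or gate_len[tc] > 0:
                    min_gate_len[tc] = min(min_gate_len[tc], gate_len[tc])
                gate_len[tc] = 0

    if edge_triggered:
        # Special case: gates that never opened are permanently closed.
        for tc in range(NUM_TC):
            if not ever_opened[tc]:
                min_gate_len[tc] = 0
    return min_gate_len


schedule = [(0b00000001, 200000), (0b00000010, 200000)]
buggy = min_gate_lengths(schedule, edge_triggered=False)
fixed = min_gate_lengths(schedule, edge_triggered=True)
print(buggy)
print(fixed)
```

      With the schedule from the example, the level-triggered variant leaves
      min_gate_len[1] at 0, while the edge-triggered variant reports 200000
      for both TC 0 and TC 1.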
      
      Fixes: 55a515b1 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220804202817.1677572-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 01 Jul, 2022: 5 commits
    • time64.h: consolidate uses of PSEC_PER_NSEC · 837ced3a
      Committed by Vladimir Oltean
      Time-sensitive networking code needs to work with PTP times expressed in
      nanoseconds, and with packet transmission times expressed in
      picoseconds, since those would be fractional at higher than gigabit
      speed when expressed in nanoseconds.
      
      Convert the existing uses in tc-taprio and the ocelot/felix DSA driver
      to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h
      as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h
      because the vDSO library does not (yet) need/use it.
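
      A short sketch shows why picoseconds are needed. The helper name is
      hypothetical; only PSEC_PER_NSEC comes from the commit.

```python
PSEC_PER_NSEC = 1000


def picos_per_octet(speed_mbps: int) -> int:
    """Transmission time of one octet, in picoseconds.

    At 1 Mbps one bit takes 10**6 ps, so one octet takes 8 * 10**6 ps,
    and the time scales inversely with the link speed.
    """
    return 8 * 1_000_000 // speed_mbps


for speed in (1000, 2500, 10000):
    ps = picos_per_octet(speed)
    # Whole nanoseconds plus the fractional remainder in picoseconds.
    print(speed, ps, ps // PSEC_PER_NSEC, ps % PSEC_PER_NSEC)
```

      At 1000 Mbps an octet takes exactly 8 ns, but at 2500 Mbps it takes
      3200 ps (3.2 ns), which cannot be represented as a whole number of
      nanoseconds.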
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port · 55a515b1
      Committed by Vladimir Oltean
      Currently, sending a packet into a time gate too small for it (or always
      closed) causes the queue system to hold the frame forever. Even worse,
      this frame isn't subject to aging either, because for that to happen, it
      needs to be scheduled for transmission in the first place. But the frame
      will consume buffer memory and frame references while it is forever held
      in the queue system.
      
      Before commit a4ae997a ("net: mscc: ocelot: initialize watermarks to
      sane defaults"), this behavior was somewhat subtle, as the switch had a
      more intricately tuned default watermark configuration out of reset,
      which did not allow any single port and tc to consume the entire switch
      buffer space. Nonetheless, the held frames are still there, and they
      reduce the total backplane capacity of the switch.
      
      However, after the aforementioned commit, the behavior can be very
      clearly seen, since we deliberately allow each {port, tc} to consume the
      entire shared buffer of the switch minus the reservations (and we
      disable all reservations by default). That is to say, we allow a
      permanently closed tc-taprio gate to hang the entire switch.
      
      A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
      per-port-tc registers serve 2 purposes: one is for guard band calculation
      (when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
      enable oversized frame dropping (when non-zero).
      
      Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
      dropping is disabled. The goal of the change is to enable it seamlessly.
      For that, we need to hook into the MTU change, tc-taprio change, and
      port link speed change procedures, since we depend on these variables.
      
      Frames are not dropped on egress due to a queue system oversize
      condition, instead that egress port is simply excluded from the mask of
      valid destination ports for the packet. If there are no destination
      ports at all, the ingress counter that increments is the generic
      "drop_tail" in ethtool -S.
      
      The issue exists in various forms since the tc-taprio offload was introduced.
      
      Fixes: de143c0e ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
      Reported-by: Richie Pearn <richard.pearn@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: keep QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) out of rmw · d68a373b
      Committed by Vladimir Oltean
      In vsc9959_tas_clock_adjust(), the INIT_GATE_STATE field is not changed,
      only the ENABLE field. Similarly for the disabling of the time-aware
      shaper in vsc9959_qos_port_tas_set().
      
      To reflect this, keep the QSYS_TAG_CONFIG_INIT_GATE_STATE_M mask out of
      the read-modify-write procedure to make it clearer what is the intention
      of the code.
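
      The point about the mask can be illustrated with a generic
      read-modify-write sketch. The bit positions below are hypothetical and
      are not the real QSYS_TAG_CONFIG layout; rmw() is a stand-in for the
      usual "(old & ~mask) | (val & mask)" pattern.

```python
def rmw(reg: int, val: int, mask: int) -> int:
    """Generic read-modify-write: only bits set in mask are taken from val."""
    return (reg & ~mask) | (val & mask)


# Hypothetical field layout, for illustration only.
ENABLE = 1 << 0
INIT_GATE_STATE_M = 0xFF << 8


def init_gate_state(x: int) -> int:
    return (x << 8) & INIT_GATE_STATE_M


reg = init_gate_state(0xFF) | ENABLE

# Wide mask: rewrites INIT_GATE_STATE with the value it already has.
wide = rmw(reg, init_gate_state(0xFF), ENABLE | INIT_GATE_STATE_M)
# Narrow mask: touches only ENABLE, which states the intent directly.
narrow = rmw(reg, 0, ENABLE)
print(hex(wide), hex(narrow))
```

      Both calls produce the same register value here; the narrow mask simply
      makes it obvious that only the ENABLE field is meant to change.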
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: keep reference on entire tc-taprio config · 1c9017e4
      Committed by Vladimir Oltean
      In a future change we will need to remember the entire tc-taprio config
      on all ports rather than just the base time, so use the
      taprio_offload_get() helper function to replace ocelot_port->base_time
      with ocelot_port->taprio.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: fix race between reading PSFP stats and port stats · 58bf4db6
      Committed by Vladimir Oltean
      Both PSFP stats and the port stats read by ocelot_check_stats_work() are
      indirectly read through the same mechanism - write to STAT_CFG:STAT_VIEW,
      read from SYS:STAT:CNT[n].
      
      It's just that for port stats, we write STAT_VIEW with the index of the
      port, and for PSFP stats, we write STAT_VIEW with the filter index.
      
      So if we allow them to run concurrently, ocelot_check_stats_work() may
      change the view from vsc9959_psfp_counters_get(), and vice versa.
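
      The shape of the fix can be sketched as follows: the "select view, then
      read" sequence has to be one critical section for both readers. This is
      a toy model with hypothetical names and counter values, not the driver
      code, and it does not say which lock the driver uses.

```python
import threading


class StatsWindow:
    """Toy model of an indirect counter window shared by two readers."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.view = None  # models STAT_CFG:STAT_VIEW
        # Hypothetical backing counters, keyed by (kind, index).
        self.backing = {("port", 0): 111, ("psfp", 5): 555}

    def _read(self, kind: str, index: int) -> int:
        # Selecting the view and reading through it must not be interleaved
        # with another reader, otherwise the wrong counter set is returned.
        with self.lock:
            self.view = (kind, index)
            return self.backing[self.view]

    def read_port(self, port: int) -> int:
        return self._read("port", port)

    def read_psfp(self, filter_index: int) -> int:
        return self._read("psfp", filter_index)


hw = StatsWindow()
port_val = hw.read_port(0)
psfp_val = hw.read_psfp(5)
print(port_val, psfp_val)
```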
      
      Fixes: 7d4b564d ("net: dsa: felix: support psfp filter on vsc9959")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220629183007.3808130-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 19 Jun, 2022: 1 commit
    • net: dsa: felix: update base time of time-aware shaper when adjusting PTP time · 8670dc33
      Committed by Xiaoliang Yang
      When adjusting the PTP clock, the base time of the TAS configuration
      will become unreliable. We need to reset the TAS configuration by using a
      new base time.
      
      For example, suppose the driver gets a base time of 0 for the Qbv
      configuration from the user, and the current time is 20000. The driver
      will set the TAS base time to 20000. After the PTP clock adjustment,
      the current time becomes 10000. If the TAS base time is still 20000, it
      will be a future time, and the TAS entry list will stop running. As
      another example, if the current time becomes 10000000 after the PTP
      clock adjustment, the large time offset can cause the hardware to hang.
      
      This patch introduces a tas_clock_adjust() function to reset the TAS
      module by using a new base time after the PTP clock adjustment. This
      avoids the issues above.
      
      Because a PTP clock adjustment can occur at any time, it may conflict with
      the TAS configuration. We introduce a new TAS lock to serialize the
      access to the TAS registers.
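
      The base time reset described above can be sketched as moving a stale
      base time forward by whole cycles until it is no longer in the past.
      The function name, the cycle_time parameter and the rounding are
      assumptions made for this illustration; the driver's exact computation
      may differ.

```python
def new_base_time(base_time: int, current_time: int, cycle_time: int) -> int:
    """Advance base_time by whole cycles so it is not before current_time."""
    if base_time >= current_time:
        return base_time
    elapsed = current_time - base_time
    cycles = -(-elapsed // cycle_time)  # ceiling division on integers
    return base_time + cycles * cycle_time


# Hypothetical cycle time of 10000 time units.
before_adjust = new_base_time(0, 20000, 10000)  # clock reads 20000
after_adjust = new_base_time(0, 10000, 10000)   # clock stepped back to 10000
print(before_adjust, after_adjust)
```

      Recomputing the base time after every clock adjustment keeps it close
      to the current time, so it is neither left in the future nor far in
      the past.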
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 23 May, 2022: 1 commit
    • net: mscc: ocelot: switch from {,un}set to {,un}assign for tag_8021q CPU ports · c295f983
      Committed by Vladimir Oltean
      There is a desire for the felix driver to gain support for multiple
      tag_8021q CPU ports, but the current model prevents it.
      
      This is because ocelot_apply_bridge_fwd_mask() only takes into
      consideration whether a port is a tag_8021q CPU port, but not whose CPU
      port it is.
      
      We need a model where we can have a direct affinity between an ocelot
      port and a tag_8021q CPU port. This serves as the basis for multiple CPU
      ports.
      
      Declare a "dsa_8021q_cpu" backpointer in struct ocelot_port which
      encodes that affinity. Repurpose the "ocelot_set_dsa_8021q_cpu" API to
      "ocelot_assign_dsa_8021q_cpu" to express the change of paradigm.
      
      Note that this change makes the first practical use of the new
      ocelot_port->index field in ocelot_port_unassign_dsa_8021q_cpu(), where
      we need to remove the old tag_8021q CPU port from the reserved VLAN range.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 12 May, 2022: 1 commit
  8. 30 Apr, 2022: 1 commit
    • net: ethernet: ocelot: remove the need for num_stats initializer · 2f187bfa
      Committed by Colin Foster
      There is a desire to share the ocelot_stats_layout struct outside of the
      current vsc7514 driver. In order to do so, the length of the array needs to
      be known at compile time, and defined in the struct ocelot and struct
      felix_info.
      
      Since the array is defined in a .c file and would be declared in the header
      file via:
      extern struct ocelot_stat_layout ocelot_stats_layout[];
      the size of the array will not be known at compile time to outside modules.
      
      To fix this, remove the need for defining the number of stats at compile
      time and allow this number to be determined at initialization.
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 09 Apr, 2022: 1 commit
  10. 31 Mar, 2022: 1 commit
  11. 22 Mar, 2022: 1 commit
  12. 28 Feb, 2022: 1 commit
  13. 26 Feb, 2022: 1 commit
  14. 09 Feb, 2022: 1 commit
    • net: dsa: felix: don't use devres for mdiobus · 209bdb7e
      Committed by Vladimir Oltean
      As explained in commits:
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      mdiobus_free() will panic when called from devm_mdiobus_free() <-
      devres_release_all() <- __device_release_driver(), and that mdiobus was
      not previously unregistered.
      
      The Felix VSC9959 switch is a PCI device, so the initial set of
      constraints that I thought would cause this (I2C or SPI buses which call
      ->remove on ->shutdown) do not apply. But there is one more which
      applies here.
      
      If the DSA master itself is on a bus that calls ->remove from ->shutdown
      (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
      between the switch and the DSA master, and device_links_unbind_consumers()
      will unbind the felix switch driver on shutdown.
      
      So the same treatment must be applied to all DSA switch drivers, which
      is: either use devres for both the mdiobus allocation and registration,
      or don't use devres at all.
      
      The felix driver has the code structure in place for orderly mdiobus
      removal, so just replace devm_mdiobus_alloc_size() with the non-devres
      variant, and add manual free where necessary, to ensure that we don't
      let devres free a still-registered bus.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  15. 03 Jan, 2022: 2 commits
  16. 19 Dec, 2021: 1 commit
  17. 08 Dec, 2021: 3 commits
  18. 29 Nov, 2021: 1 commit
  19. 26 Nov, 2021: 1 commit
    • net: dsa: felix: enable cut-through forwarding between ports by default · 8abe1970
      Committed by Vladimir Oltean
      The VSC9959 switch embedded within NXP LS1028A (and that version of
      Ocelot switches only) supports cut-through forwarding - meaning it can
      start the process of looking up the destination ports for a packet, and
      forward towards those ports, before the entire packet has been received
      (as opposed to the store-and-forward mode).
      
      The up side is having lower forwarding latency for large packets. The
      down side is that frames with FCS errors are forwarded instead of being
      dropped. However, erroneous frames do not result in incorrect updates of
      the FDB or incorrect policer updates, since these processes are deferred
      inside the switch to the end of frame. Since the switch starts the
      cut-through forwarding process after all packet headers (including IP,
      if any) have been processed, packets with large headers and small
      payload do not see the benefit of lower forwarding latency.
      
      There are two cases that need special attention.
      
      The first is when a packet is multicast (or flooded) to multiple
      destinations, one of which doesn't have cut-through forwarding enabled.
      The switch deals with this automatically by disabling cut-through
      forwarding for the frame towards all destination ports.
      
      The second is when a packet is forwarded from a port of lower link speed
      towards a port of higher link speed. This is not handled by the hardware
      and needs software intervention.
      
      Since we practically need to update the cut-through forwarding domain
      from paths that aren't serialized by the rtnl_mutex (phylink
      mac_link_down/mac_link_up ops), this means we need to serialize physical
      link events with user space updates of bonding/bridging domains.
      
      Enabling cut-through forwarding is done per {egress port, traffic class}.
      I don't see any reason why this would be a configurable option as long
      as it works without issues, and there doesn't appear to be any user
      space configuration tool to toggle this on/off, so this patch enables
      cut-through forwarding on all eligible ports and traffic classes.
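
      The link speed constraint from the second special case can be expressed
      as a small eligibility check. This is a sketch under the assumption
      that an egress port is eligible only when no other port in its
      forwarding domain runs at a lower link speed; the function name and
      data layout are hypothetical.

```python
def cut_through_eligible(egress_port: int, domain_speeds: dict) -> bool:
    """domain_speeds maps port -> link speed in Mbps for ports with link up
    in the same forwarding domain, including the egress port itself."""
    egress_speed = domain_speeds[egress_port]
    # A slower ingress port cannot feed a faster egress port quickly enough
    # once transmission has started, so such a pairing rules out cut-through.
    return all(
        speed >= egress_speed
        for port, speed in domain_speeds.items()
        if port != egress_port
    )


speeds = {0: 1000, 1: 1000, 2: 100}
eligible = {port: cut_through_eligible(port, speeds) for port in speeds}
print(eligible)
```

      Because the result depends on the link speed of every port in the
      domain, it has to be re-evaluated on link up/down events as well as on
      bridging or bonding changes, which is where the serialization
      requirement mentioned above comes from.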
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  20. 18 Nov, 2021: 5 commits
  21. 24 Oct, 2021: 1 commit
    • net: convert users of bitmap_foo() to linkmode_foo() · 4973056c
      Committed by Sean Anderson
      This converts instances of
      	bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS)
      to
      	linkmode_foo(args...)
      
      I manually fixed up some lines to prevent them from being excessively
      long. Otherwise, this change was generated with the following semantic
      patch:
      
      // Generated with
      // echo linux/linkmode.h > includes
      // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \
      // | sort | uniq | tee new_includes | wc -l && mv new_includes includes
      // and repeating until the number stopped going up
      @i@
      @@
      
      (
       #include <linux/acpi_mdio.h>
      |
       #include <linux/brcmphy.h>
      |
       #include <linux/dsa/loop.h>
      |
       #include <linux/dsa/sja1105.h>
      |
       #include <linux/ethtool.h>
      |
       #include <linux/ethtool_netlink.h>
      |
       #include <linux/fec.h>
      |
       #include <linux/fs_enet_pd.h>
      |
       #include <linux/fsl/enetc_mdio.h>
      |
       #include <linux/fwnode_mdio.h>
      |
       #include <linux/linkmode.h>
      |
       #include <linux/lsm_audit.h>
      |
       #include <linux/mdio-bitbang.h>
      |
       #include <linux/mdio.h>
      |
       #include <linux/mdio-mux.h>
      |
       #include <linux/mii.h>
      |
       #include <linux/mii_timestamper.h>
      |
       #include <linux/mlx5/accel.h>
      |
       #include <linux/mlx5/cq.h>
      |
       #include <linux/mlx5/device.h>
      |
       #include <linux/mlx5/driver.h>
      |
       #include <linux/mlx5/eswitch.h>
      |
       #include <linux/mlx5/fs.h>
      |
       #include <linux/mlx5/port.h>
      |
       #include <linux/mlx5/qp.h>
      |
       #include <linux/mlx5/rsc_dump.h>
      |
       #include <linux/mlx5/transobj.h>
      |
       #include <linux/mlx5/vport.h>
      |
       #include <linux/of_mdio.h>
      |
       #include <linux/of_net.h>
      |
       #include <linux/pcs-lynx.h>
      |
       #include <linux/pcs/pcs-xpcs.h>
      |
       #include <linux/phy.h>
      |
       #include <linux/phy_led_triggers.h>
      |
       #include <linux/phylink.h>
      |
       #include <linux/platform_data/bcmgenet.h>
      |
       #include <linux/platform_data/xilinx-ll-temac.h>
      |
       #include <linux/pxa168_eth.h>
      |
       #include <linux/qed/qed_eth_if.h>
      |
       #include <linux/qed/qed_fcoe_if.h>
      |
       #include <linux/qed/qed_if.h>
      |
       #include <linux/qed/qed_iov_if.h>
      |
       #include <linux/qed/qed_iscsi_if.h>
      |
       #include <linux/qed/qed_ll2_if.h>
      |
       #include <linux/qed/qed_nvmetcp_if.h>
      |
       #include <linux/qed/qed_rdma_if.h>
      |
       #include <linux/sfp.h>
      |
       #include <linux/sh_eth.h>
      |
       #include <linux/smsc911x.h>
      |
       #include <linux/soc/nxp/lpc32xx-misc.h>
      |
       #include <linux/stmmac.h>
      |
       #include <linux/sunrpc/svc_rdma.h>
      |
       #include <linux/sxgbe_platform.h>
      |
       #include <net/cfg80211.h>
      |
       #include <net/dsa.h>
      |
       #include <net/mac80211.h>
      |
       #include <net/selftests.h>
      |
       #include <rdma/ib_addr.h>
      |
       #include <rdma/ib_cache.h>
      |
       #include <rdma/ib_cm.h>
      |
       #include <rdma/ib_hdrs.h>
      |
       #include <rdma/ib_mad.h>
      |
       #include <rdma/ib_marshall.h>
      |
       #include <rdma/ib_pack.h>
      |
       #include <rdma/ib_pma.h>
      |
       #include <rdma/ib_sa.h>
      |
       #include <rdma/ib_smi.h>
      |
       #include <rdma/ib_umem.h>
      |
       #include <rdma/ib_umem_odp.h>
      |
       #include <rdma/ib_verbs.h>
      |
       #include <rdma/iw_cm.h>
      |
       #include <rdma/mr_pool.h>
      |
       #include <rdma/opa_addr.h>
      |
       #include <rdma/opa_port_info.h>
      |
       #include <rdma/opa_smi.h>
      |
       #include <rdma/opa_vnic.h>
      |
       #include <rdma/rdma_cm.h>
      |
       #include <rdma/rdma_cm_ib.h>
      |
       #include <rdma/rdmavt_cq.h>
      |
       #include <rdma/rdma_vt.h>
      |
       #include <rdma/rdmavt_qp.h>
      |
       #include <rdma/rw.h>
      |
       #include <rdma/tid_rdma_defs.h>
      |
       #include <rdma/uverbs_ioctl.h>
      |
       #include <rdma/uverbs_named_ioctl.h>
      |
       #include <rdma/uverbs_std_types.h>
      |
       #include <rdma/uverbs_types.h>
      |
       #include <soc/mscc/ocelot.h>
      |
       #include <soc/mscc/ocelot_ptp.h>
      |
       #include <soc/mscc/ocelot_vcap.h>
      |
       #include <trace/events/ib_mad.h>
      |
       #include <trace/events/rdma_core.h>
      |
       #include <trace/events/rdma.h>
      |
       #include <trace/events/rpcrdma.h>
      |
       #include <uapi/linux/ethtool.h>
      |
       #include <uapi/linux/ethtool_netlink.h>
      |
       #include <uapi/linux/mdio.h>
      |
       #include <uapi/linux/mii.h>
      )
      
      @depends on i@
      expression list args;
      @@
      
      (
      - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_zero(args)
      |
      - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_copy(args)
      |
      - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_and(args)
      |
      - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_or(args)
      |
      - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_empty(args)
      |
      - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_andnot(args)
      |
      - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_equal(args)
      |
      - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_intersects(args)
      |
      - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
      + linkmode_subset(args)
      )
      
      Add missing linux/mii.h include to mellanox. -DaveM
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 19 Sep, 2021: 1 commit
    • net: dsa: be compatible with masters which unregister on shutdown · 0650bf52
      Committed by Vladimir Oltean
      Lino reports that on his system with bcmgenet as DSA master and KSZ9897
      as a switch, rebooting or shutting down never works properly.
      
      What is special about the bcmgenet driver that triggers this, when other
      DSA masters do not? It has an implementation of ->shutdown which simply
      calls its ->remove implementation. In other words, it unregisters its
      network interface on shutdown.
      
      This message can be seen in a loop, and it hangs the reboot process there:
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 3
      
      So why 3?
      
      A usage count of 1 is normal for a registered network interface, and any
      virtual interface which links itself as an upper of that will increment
      it via dev_hold. In the case of DSA, this is the call path:
      
      dsa_slave_create
      -> netdev_upper_dev_link
         -> __netdev_upper_dev_link
            -> __netdev_adjacent_dev_insert
               -> dev_hold
      
      So a DSA switch with 3 interfaces will result in a usage count elevated
      by two, and netdev_wait_allrefs will wait until they have gone away.
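
The reference counting described above can be illustrated with a toy model. This is only a sketch, not kernel code: the `NetDevice` class and the slave names `lan1` and `lan2` are hypothetical, and it assumes two user ports linked as uppers of the master, which is what a usage count of 3 suggests.

```python
class NetDevice:
    """Toy stand-in for a net_device; it only tracks the usage count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # a registered interface normally has a count of 1
        self.uppers = []

    def link_upper(self, upper_name):
        # Models netdev_upper_dev_link() ending in dev_hold() on this device.
        self.uppers.append(upper_name)
        self.refcount += 1


master = NetDevice("eth0")
for slave in ("lan1", "lan2"):  # hypothetical DSA slave interfaces
    master.link_upper(slave)

print(f"waiting for {master.name} to become free. Usage count = {master.refcount}")
```

As long as the uppers keep their references, the count never drops back to the value at which the unregistration can complete.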
      
      Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
      delete themselves, but DSA cannot just vanish and go poof, at most it
      can unbind itself from the switch devices, but that must happen strictly
      earlier compared to when the DSA master unregisters its net_device, so
      reacting on the NETDEV_UNREGISTER event is way too late.
      
      It seems that it is a pretty established pattern to have a driver's
      ->shutdown hook redirect to its ->remove hook, so the same code is
      executed regardless of whether the driver is unbound from the device, or
      the system is just shutting down. As Florian puts it, it is quite a big
      hammer for bcmgenet to unregister its net_device during shutdown, but
      having a common code path with the driver unbind helps ensure it is well
      tested.
      
      So DSA, for better or for worse, has to live with that and engage in an
      arms race of implementing the ->shutdown hook too, from all individual
      drivers, and do something sane when paired with masters that unregister
      their net_device there. The only sane thing to do, of course, is to
      unlink from the master.
      
      However, complications arise really quickly.
      
      The pattern of redirecting ->shutdown to ->remove is not unique to
      bcmgenet or even to net_device drivers. In fact, SPI controllers do it
      too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
      and MDIO controllers do it too (this is something I have not researched
      too deeply, but even if this is not the case today, it is certainly
      plausible to happen in the future, and must be taken into consideration).
      
      Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
      insane implication is that for the exact same DSA switch device, we
      might have both ->shutdown and ->remove getting called.
      
      So we need to do something with that insane environment. The pattern
      I've come up with is "if this, then not that", so if either ->shutdown
      or ->remove gets called, we set the device's drvdata to NULL, and in the
      other hook, we check whether the drvdata is NULL and just do nothing.
      This is probably not necessary for platform devices, just for devices on
      buses, but I would really insist for consistency among drivers, because
      when code is copy-pasted, it is not always copy-pasted from the best
      sources.
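
A minimal sketch of the "if this, then not that" pattern, written in Python purely for illustration. The `Device` and `ToySwitchDriver` classes are hypothetical stand-ins, not the actual DSA driver code; the `drvdata` attribute plays the role of the device's driver data pointer, with `None` standing in for NULL.

```python
class Device:
    """Toy device that carries an opaque driver data pointer."""

    def __init__(self):
        self.drvdata = None


class ToySwitchDriver:
    """Hypothetical driver where ->remove and ->shutdown exclude each other."""

    def __init__(self):
        self.log = []

    def probe(self, dev):
        dev.drvdata = {"name": "toy-switch"}

    def remove(self, dev):
        if dev.drvdata is None:
            return  # ->shutdown already ran, do nothing
        self.log.append("remove: full teardown")
        dev.drvdata = None

    def shutdown(self, dev):
        if dev.drvdata is None:
            return  # ->remove already ran, do nothing
        self.log.append("shutdown: unlink from master")
        dev.drvdata = None


# Whichever hook runs first does the work; the other one becomes a no-op.
drv, dev = ToySwitchDriver(), Device()
drv.probe(dev)
drv.shutdown(dev)
drv.remove(dev)
print(drv.log)
```

The same guard works in the opposite order, so the driver behaves sanely regardless of which hook the bus calls first.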
      
      So depending on whether the DSA switch's ->remove or ->shutdown will get
      called first, we cannot really guarantee even for the same driver if
      rebooting will result in the same code path on all platforms. But
      nonetheless, we need to do something minimally reasonable on ->shutdown
      too to fix the bug. Of course, the ->remove will do more (a full
      teardown of the tree, with all data structures freed, and this is why
      the bug was not caught for so long). The new ->shutdown method is kept
      separate from dsa_unregister_switch not because we couldn't have
      unregistered the switch, but simply in the interest of doing something
      quick and to the point.
      
      The big question is: does the DSA switch's ->shutdown get called earlier
      than the DSA master's ->shutdown? If not, there is still a risk that we
      might still trigger the WARN_ON in unregister_netdevice that says we are
      attempting to unregister a net_device which has uppers. That's no good.
      Although the reference to the master net_device won't physically go away
      even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
      on it.
      
      The answer to that question lies in this comment above device_link_add:
      
       * A side effect of the link creation is re-ordering of dpm_list and the
       * devices_kset list by moving the consumer device and all devices depending
       * on it to the ends of these lists (that does not happen to devices that have
       * not been registered when this function is called).
      
      so the fact that DSA uses device_link_add towards its master is not
      exactly for nothing. device_shutdown() walks devices_kset from the back,
      so this is our guarantee that DSA's shutdown happens before the master's
      shutdown.
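
The ordering argument can be sketched with a simplified model. This is an assumption-laden illustration, not the driver core: the list stands in for devices_kset, the device names are hypothetical, and the helper only moves the consumer itself (the real code also moves every device depending on it).

```python
def device_link_add(devices, consumer, supplier):
    """Toy model: linking reorders the list so the consumer moves to the end."""
    if consumer in devices and supplier in devices:
        devices.remove(consumer)
        devices.append(consumer)


def shutdown_order(devices):
    """Toy model of device_shutdown(): walk the list from the back."""
    return list(reversed(devices))


# Hypothetical registration order in which the switch precedes its master.
devices = ["switch", "eth0-master"]
device_link_add(devices, consumer="switch", supplier="eth0-master")

order = shutdown_order(devices)
print(order)
```

Because the consumer ends up behind its supplier in the list, a walk from the back reaches the switch before the master.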
      
      Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
      Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
      Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0650bf52
  23. 17 Sep, 2021 1 commit
    • V
      net: update NXP copyright text · 3c9cfb52
      Vladimir Oltean authored
      NXP Legal insists that the following are not fine:
      
      - Saying "NXP Semiconductors" instead of "NXP", since the company's
        registered name is "NXP"
      
      - Putting a "(c)" sign in the copyright string
      
      - Putting a comma in the copyright string
      
      The only accepted copyright string format is "Copyright <year-range> NXP".
      
      This patch changes the copyright headers in the networking files that
      were sent by me, or derived from code sent by me.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c9cfb52
  24. 11 May, 2021 1 commit
  25. 21 Apr, 2021 1 commit
    • X
      net: dsa: felix: disable always guard band bit for TAS config · 316bcffe
      Xiaoliang Yang authored
      The ALWAYS_GUARD_BAND_SCH_Q bit in the TAS config register is described
      as follows:
      	0: Guard band is implemented for nonschedule queues to schedule
      	   queues transition.
      	1: Guard band is implemented for any queue to schedule queue
      	   transition.
      
      The driver previously configured the guard band to be implemented for
      any queue to schedule queue transition, which makes each GCL time slot
      reserve a guard band long enough to pass a max SDU frame. Because the
      guard band time cannot currently be set through tc-taprio, it takes
      about 12000 ns to pass a 1500-byte max SDU. This limits each GCL time
      interval to more than 12000 ns.

      This patch changes the guard band to be implemented only for nonschedule
      queues to schedule queues transitions, so that there is no need to
      reserve a guard band on each GCL entry. Users can manually add guard band
      time for each schedule queue in their configuration if they want.
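
The figure of about 12000 ns follows from the bit time at gigabit speed. Below is a rough sketch of that arithmetic; the helper name `tx_time_ns` is hypothetical, a 1000 Mbps link is assumed, and preamble, inter-frame gap and other overhead are ignored.

```python
def tx_time_ns(frame_bytes, link_speed_mbps=1000):
    """Time on the wire for a frame, in nanoseconds (payload bits only)."""
    bits = frame_bytes * 8
    # One bit takes 1000 / link_speed_mbps nanoseconds.
    return bits * 1000 // link_speed_mbps


guard_band_ns = tx_time_ns(1500)
print(guard_band_ns)

# A gate interval shorter than the guard band leaves no time to transmit.
interval_ns = 10000
print(interval_ns > guard_band_ns)
```

With these assumptions, any GCL interval at or below the guard band duration is consumed entirely by the guard band.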
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      316bcffe