1. 20 Apr 2021, 1 commit
    • net: enetc: create a common enetc_pf_to_port helper · 87614b93
      Vladimir Oltean committed
      Even though ENETC interfaces are exposed as individual PCIe PFs with
      their own driver instances, the ENETC is still fundamentally a
      multi-port Ethernet controller, and some parts of the IP take a port
      number (as can be seen in the PSFP implementation).
      
      Create a common helper that can be used outside of the TSN code for
      retrieving the ENETC port number based on the PF number. This is only
      correct for LS1028A, the only Linux-capable instantiation of ENETC thus
      far.
      
      Note that ENETC port 3 is PF 6. The TSN code did not care about this
      because ENETC port 3 does not support TSN, so the wrong mapping done by
      enetc_get_port for PF 6 could have never been hit.
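      For illustration, a minimal sketch of such a mapping helper, keyed off
      the PF's PCIe function number as described above (the name and exact
      signature are assumptions, not the driver's verbatim code):

      #include <linux/pci.h>

      static inline int sketch_pf_to_port(struct pci_dev *pf_pdev)
      {
          switch (pf_pdev->devfn) {
          case 0:
              return 0;
          case 1:
              return 1;
          case 2:
              return 2;
          case 6:
              return 3;   /* ENETC port 3 is PF 6 on LS1028A */
          default:
              return -1;  /* not an ENETC port */
          }
      }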
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87614b93
  2. 17 Apr 2021, 2 commits
    • net: enetc: use dedicated TX rings for XDP · 7eab503b
      Vladimir Oltean committed
      It is possible for one CPU to perform TX hashing (see netdev_pick_tx)
      between the 8 ENETC TX rings, and the TX hashing to select TX queue 1.
      
      At the same time, it is possible for the other CPU to already use TX
      ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual
      exclusion between XDP and the network stack, we run into an issue
      because the ENETC TX procedure is not reentrant.
      
      The obvious approach would be to just make XDP take the lock of the
      network stack's TX queue corresponding to the ring it's about to enqueue
      in.
      
      For XDP_REDIRECT, this is quite straightforward, a lock at the beginning
      and end of enetc_xdp_xmit() should do the trick.
      
      But for XDP_TX, it's a bit more complicated. For one, we do TX batching
      all by ourselves for frames with the XDP_TX verdict. This is something
      we would like to keep the way it is, for performance reasons. But
      batching means that the network stack's lock should be kept from the
      first enqueued XDP_TX frame and until we ring the doorbell. That is
      mostly fine, except for cases when in the same NAPI loop we have mixed
      XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while
      we are holding the lock from the RX NAPI, then bam, deadlock. The naive
      answer could be 'just flush the XDP_TX frames first, then release the
      network stack's TX queue lock, then call xdp_do_flush_map()'. But even
      xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT
      frames, so unless we unlock/relock the TX queue around xdp_do_redirect(),
      there simply isn't any clean way to protect XDP_TX from concurrent
      network stack .ndo_start_xmit() on another CPU.
      
      So we need to take a different approach, and that is to reserve two
      rings for the sole use of XDP. We leave TX rings
      0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we
      pick them from the end of the priv->tx_ring array.
      
      We make an effort to keep the mapping done by enetc_alloc_msix() which
      decides which CPU handles the TX completions of which TX ring in its
      NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the
      XDP TX ring of CPU 1 is handled by TX ring 7.
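      A sketch of the resulting ring selection (field names as used in the
      text above; with 8 TX rings and 2 CPUs this picks ring 6 on CPU 0 and
      ring 7 on CPU 1, matching the enetc_alloc_msix() affinity):

      static struct enetc_bdr *sketch_xdp_tx_ring(struct enetc_ndev_priv *priv)
      {
          /* the last num_possible_cpus() rings are reserved for XDP */
          int index = priv->num_tx_rings - num_possible_cpus() +
                      smp_processor_id();

          return priv->tx_ring[index];
      }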
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7eab503b
    • net: enetc: increase TX ring size · ee3e875f
      Vladimir Oltean committed
      Now that commit d6a2829e ("net: enetc: increase RX ring default
      size") has increased the RX ring size, it is quite easy to congest the
      TX rings when the traffic is predominantly XDP_TX, as the RX ring is
      quite a bit larger than the TX one.
      
      Since we bit the bullet and did the expensive thing already (larger RX
      rings consume more memory pages), it seems quite foolish to keep the TX
      rings small. So make them equally sized with RX.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee3e875f
  3. 13 Apr 2021, 2 commits
    • enetc: support PTP Sync packet one-step timestamping · 7294380c
      Yangbo Lu committed
      Add support for PTP Sync packet one-step timestamping.
      Since the ENETC single-step register has to be configured dynamically
      per packet (for the correctionField offset and UDP checksum update),
      the current one-step timestamping packet can only be sent after the
      previous one has completed transmission in hardware. So, on TX, this
      patch handles one-step timestamping packets as follows:
      
      - Transmit the packet immediately if no other one is in flight, or
        queue it to an skb queue if one is already in flight.
        test_and_set_bit_lock() is used here to check and lock the state
        (see the sketch below).
      - Start a work item when the transfer completes in hardware, to
        release the bit lock and to send the next skb from the queue, if any.
      
      The configuration for one-step timestamping on ENETC before
      transmitting is:
      
      - Set the one-step timestamping flag in the extension BD.
      - Write the 30-bit current timestamp into the tstamp field of the
        extension BD.
      - Update the PTP Sync packet originTimestamp field with the current
        timestamp.
      - Configure the single-step register for the correctionField offset
        and UDP checksum update.
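      A minimal sketch of the transmit-side serialization described above
      (the structure, flag bit and queue names are assumptions for
      illustration):

      struct sketch_priv {
          unsigned long flags;          /* bit 0: one-step packet in flight */
          struct sk_buff_head tx_skbs;  /* deferred one-step Sync packets */
      };
      #define SKETCH_ONESTEP_IN_FLIGHT  0

      static bool sketch_defer_onestep(struct sketch_priv *priv,
                                       struct sk_buff *skb)
      {
          if (test_and_set_bit_lock(SKETCH_ONESTEP_IN_FLIGHT, &priv->flags)) {
              /* another one-step Sync packet is in flight: queue this one;
               * the TX completion work later calls clear_bit_unlock() and
               * dequeues the next deferred skb, if any
               */
              skb_queue_tail(&priv->tx_skbs, skb);
              return true;
          }
          return false;   /* caller transmits immediately */
      }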
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7294380c
    • enetc: mark TX timestamp type per skb · f768e751
      Yangbo Lu committed
      Mark the TX timestamp type per skb, in skb->cb[0], instead of using a
      global variable for all skbs. This is a preparation for one-step
      timestamp support.
      
      When one-step timestamping is enabled, there will be both one-step
      and two-step PTP messages to transfer, and an skb queue is needed for
      one-step PTP messages to make sure the current message is sent only
      after the previous one has completed in hardware (the ENETC
      single-step register has to be dynamically configured per message).
      So, marking the TX timestamp type per skb is required.
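      A sketch of the per-skb marking (the flag values and helper name are
      illustrative assumptions):

      enum sketch_tx_tstamp_type {
          SKETCH_F_TX_TSTAMP         = BIT(0),  /* two-step timestamping */
          SKETCH_F_TX_ONESTEP_TSTAMP = BIT(1),  /* one-step, PTP Sync only */
      };

      /* at xmit time: record the request in the skb control buffer instead
       * of a driver-global variable
       */
      static void sketch_mark_tx_tstamp(struct sk_buff *skb, u8 type)
      {
          skb->cb[0] = type;
      }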
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f768e751
  4. 01 Apr 2021, 5 commits
    • net: enetc: add support for XDP_REDIRECT · 9d2b68cc
      Vladimir Oltean committed
      The driver implementation of the XDP_REDIRECT action reuses parts from
      XDP_TX, most notably the enetc_xdp_tx function which transmits an array
      of TX software BDs. Only this time the buffers don't have DMA
      mappings; we need to create them.
      
      When a BPF program reaches the XDP_REDIRECT verdict for a frame, we can
      employ the same buffer reuse strategy as for the normal processing path
      and for XDP_PASS: we can flip to the other page half and seed that to
      the RX ring.
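      A structural sketch of such a redirect transmit path (names and error
      handling are illustrative, not the driver's exact code):

      #include <linux/netdevice.h>
      #include <linux/dma-mapping.h>
      #include <net/xdp.h>

      static int sketch_xdp_xmit(struct net_device *ndev, int num_frames,
                                 struct xdp_frame **frames, u32 flags)
      {
          struct device *dev = ndev->dev.parent;
          int i;

          for (i = 0; i < num_frames; i++) {
              struct xdp_frame *xdpf = frames[i];
              dma_addr_t dma;

              /* unlike XDP_TX, redirected frames have no DMA mapping yet */
              dma = dma_map_single(dev, xdpf->data, xdpf->len, DMA_TO_DEVICE);
              if (dma_mapping_error(dev, dma))
                  break;

              /* fill one TX software BD per frame here, then advance the
               * producer index and ring the doorbell once after the loop
               */
          }

          return i;   /* number of frames accepted */
      }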
      
      Note that scatter/gather support is there, but disabled due to lack of
      multi-buffer support in XDP (which is added by this series):
      https://patchwork.kernel.org/project/netdevbpf/cover/cover.1616179034.git.lorenzo@kernel.org/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d2b68cc
    • net: enetc: increase RX ring default size · d6a2829e
      Vladimir Oltean committed
      As explained in the XDP_TX patch, when receiving a burst of frames with
      the XDP_TX verdict, there is a momentary dip in the number of available
      RX buffers. The system will eventually recover as TX completions will
      start kicking in and refilling our RX BD ring again. But until that
      happens, we need to survive with as few out-of-buffer discards as
      possible.
      
      This increases the memory footprint of the driver in order to avoid
      discards at 2.5 Gbps line rate with 64 B packets, the maximum speed
      available for testing on one port of the NXP LS1028A.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6a2829e
    • net: enetc: add support for XDP_TX · 7ed2bc80
      Vladimir Oltean committed
      For reflecting packets back into the interface they came from, we create
      an array of TX software BDs derived from the RX software BDs. Therefore,
      we need to extend the TX software BD structure to contain most of the
      stuff that's already present in the RX software BD structure, for
      reasons that will become evident in a moment.
      
      For a frame with the XDP_TX verdict, we don't reuse any buffer right
      away as we do for XDP_DROP (the same page half) or XDP_PASS (the other
      page half, same as the skb code path).
      
      Because the buffer transfers ownership from the RX ring to the TX ring,
      reusing any page half right away is very dangerous. So what we can do is
      we can recycle the same page half as soon as TX is complete.
      
      The code path is:
      enetc_poll
      -> enetc_clean_rx_ring_xdp
         -> enetc_xdp_tx
         -> enetc_refill_rx_ring
      (time passes, another MSI interrupt is raised)
      enetc_poll
      -> enetc_clean_tx_ring
         -> enetc_recycle_xdp_tx_buff
      
      But that creates a problem, because there is a potentially large time
      window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, period in
      which we'll have less and less RX buffers.
      
      Basically, when the ship starts sinking, the knee-jerk reaction is to
      let enetc_refill_rx_ring do what it does for the standard skb code path
      (refill every 16 consumed buffers), but that turns out to be very
      inefficient. The problem is that we have no rx_swbd->page at our
      disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would
      have to call enetc_new_page for every buffer that we refill (if we
      choose to refill at this early stage). Very inefficient, it only makes
      the problem worse, because page allocation is an expensive process, and
      CPU time is exactly what we're lacking.
      
      Additionally, there is an even bigger problem: if we let
      enetc_refill_rx_ring top up the ring's buffers again from the RX path,
      remember that the buffers sent to transmission haven't disappeared
      anywhere. They will be eventually sent, and processed in
      enetc_clean_tx_ring, and an attempt will be made to recycle them.
      But surprise, the RX ring is already full of new buffers, because we
      were premature in deciding that we should refill. So not only did we
      take the expensive decision of allocating new pages, but now we must
      throw away perfectly good and reusable buffers.
      
      So what we do is we implement an elastic refill mechanism, which keeps
      track of the number of in-flight XDP_TX buffer descriptors. We top up
      the RX ring only up to the total ring capacity minus the number of BDs
      that are in flight (because we know that those BDs will return to us
      eventually).
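      A sketch of that refill budget rule (struct and field names are
      assumptions for illustration):

      struct sketch_rx_ring {
          int bd_count;          /* total ring capacity */
          int xdp_tx_in_flight;  /* BDs currently owned by the XDP_TX path */
      };

      static int sketch_rx_refill_budget(const struct sketch_rx_ring *ring,
                                         int buffers_in_ring)
      {
          /* never refill past (capacity - in-flight XDP_TX BDs): those
           * buffers come back for recycling at TX confirmation time
           */
          return ring->bd_count - ring->xdp_tx_in_flight - buffers_in_ring;
      }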
      
      The enetc driver manages 1 RX ring per CPU, and the default TX ring
      management is the same. So we do XDP_TX towards the TX ring of the same
      index, because it is affined to the same CPU. This will probably not
      produce great results when we have a tc-taprio/tc-mqprio qdisc on the
      interface, because in that case, the number of TX rings might be
      greater, but I didn't add any checks for that yet (mostly because I
      didn't know what checks to add).
      
      It should also be noted that we need to change the DMA mapping direction
      for RX buffers, since they may now be reflected into the TX ring of the
      same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and
      remapping as DMA_TO_DEVICE, because performance is better this way.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7ed2bc80
    • net: enetc: add support for XDP_DROP and XDP_PASS · d1b15102
      Vladimir Oltean committed
      For the RX ring, enetc uses an allocation scheme based on pages split
      into two buffers, which is already very efficient in terms of preventing
      reallocations / maximizing reuse, so I see no reason why I would change
      that.
      
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half B | half B | half B | half B | half B | half B | half B |
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half A | half A | half A | half A | half A | half A | half A | RX ring
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
           ^                                                     ^
           |                                                     |
       next_to_clean                                       next_to_alloc
                                                            next_to_use
      
                         +--------+--------+--------+--------+--------+
                         |        |        |        |        |        |
                         | half B | half B | half B | half B | half B |
                         |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half B | half B | half A | half A | half A | half A | half A | RX ring
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |   ^                                   ^
       | half A | half A |   |                                   |
       |        |        | next_to_clean                   next_to_use
       +--------+--------+
                    ^
                    |
               next_to_alloc
      
      Then, when enetc_refill_rx_ring is called, whose purpose is to advance
      next_to_use, it sees that it can take buffers up to next_to_alloc, and
      it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate
      one!".
      
      The only problem is that for default PAGE_SIZE values of 4096, buffer
      sizes are 2048 bytes. While this is enough for normal skb allocations at
      an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256
      bytes, and including skb_shared_info and alignment, we end up being able
      to make use of only 1472 bytes, which is insufficient for the default
      MTU.
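      For reference, this is where the 1472 bytes come from on a 4K-page
      system (the aligned skb_shared_info size shown is the typical value,
      not a guarantee; the helper name is hypothetical):

      #include <linux/bpf.h>      /* XDP_PACKET_HEADROOM */
      #include <linux/skbuff.h>

      static int sketch_xdp_usable_buf_size(void)
      {
          /* half a 4K page, minus the XDP headroom, minus the tailroom
           * reserved for struct skb_shared_info (~320 B when aligned)
           */
          return 2048 - XDP_PACKET_HEADROOM -
                 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
          /* == 1472, less than a 1500 byte MTU frame */
      }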
      
      To solve that problem, we implement scatter/gather processing in the
      driver, because we would really like to keep the existing allocation
      scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and
      another one of 28 bytes.
      
      Because the headroom required by XDP is different (and much larger) than
      the one required by the network stack, whenever a BPF program is added
      or deleted on the port, we drain the existing RX buffers and seed new
      ones with the required headroom. We also keep the required headroom in
      rx_ring->buffer_offset.
      
      The simplest way to implement XDP_PASS, where an skb must be created, is
      to create an xdp_buff based on the next_to_clean RX BDs, but not clear
      those BDs from the RX ring yet, just keep the original index at which
      the BDs for this frame started. Then, if the verdict is XDP_PASS,
      instead of converting the xdp_buff to an skb, we replay a call to
      enetc_build_skb (just as in the normal enetc_clean_rx_ring case),
      starting from the original BD index.
      
      We would also like to be minimally invasive to the regular RX data path,
      and not check whether there is a BPF program attached to the ring on
      every packet. So we create a separate RX ring processing function for
      XDP.
      
      Because we only install/remove the BPF program while the interface is
      down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there
      shouldn't be any circumstance in which we are processing packets and
      there is a potentially freed BPF program attached to the RX ring.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1b15102
    • net: enetc: add a dedicated is_eof bit in the TX software BD · d504498d
      Vladimir Oltean committed
      In the transmit path, if we have a scatter/gather frame, it is put into
      multiple software buffer descriptors, the last of which has the skb
      pointer populated (which is necessary for rearming the TX MSI vector and
      for collecting the two-step TX timestamp from the TX confirmation path).
      
      At the moment, this is sufficient, but with XDP_TX, we'll need to
      service TX software buffer descriptors that don't have an skb pointer,
      however they might be final nonetheless. So add a dedicated bit for
      final software BDs that we populate and check explicitly. Also, we keep
      looking just for an skb when doing TX timestamping, because we don't
      want/need that for XDP.
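      A sketch of what such a TX software BD could look like (surrounding
      fields are assumptions for illustration):

      #include <linux/skbuff.h>

      struct sketch_tx_swbd {
          struct sk_buff *skb;  /* may be NULL for XDP-originated BDs */
          dma_addr_t dma;
          u16 len;
          u8 is_eof:1;          /* last software BD of the frame, skb or not */
      };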
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d504498d
  5. 11 Mar 2021, 7 commits
  6. 02 Mar 2021, 2 commits
    • net: enetc: initialize RFS/RSS memories for unused ports too · 3222b5b6
      Vladimir Oltean committed
      Michael reports that since linux-next-20210211, the AER messages for ECC
      errors have started reappearing, and this time they can be reliably
      reproduced with the first ping on one of his LS1028A boards.
      
      $ ping 1[   33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
      72.16.0.1
      PING [   33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
      172.16.0.1 (172.16.0.1): 56 data bytes
      64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
      64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms
      
      $ devmem 0x1f8010e10 32
      0xC0000006
      
      It isn't clear why this is necessary, but it seems that for the errors
      to go away, we must clear the entire RFS and RSS memory, not just for
      the ports in use.
      
      Sadly the code is structured in such a way that we can't have unified
      logic for the used and unused ports. For the minimal initialization
      of an unused port, we just need to enable and ioremap the PF memory
      space and set up a control buffer descriptor ring. Unused ports must
      then free the CBDR because the driver will exit, but used ports
      cannot pick up from where that code path left off, since the CBDR
      API does not reinitialize a ring when setting it up, so its producer
      and consumer indices are out of sync between the software and
      hardware state. So a separate enetc_init_unused_port function was
      created, and it gets called right after the PF memory space is
      enabled.
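      A sketch of that minimal bring-up sequence (signatures simplified and
      assumed; the real helpers take additional arguments):

      static void sketch_init_unused_port(struct enetc_si *si)
      {
          /* the PF memory space is already enabled and ioremapped here */
          enetc_setup_cbdr(si);            /* one control BD ring */
          enetc_init_port_rfs_memory(si);  /* zero all RFS entries */
          enetc_init_port_rss_memory(si);  /* zero the RSS table */
          enetc_free_cbdr(si);             /* the driver exits for unused ports */
      }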
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Reported-by: Michael Walle <michael@walle.cc>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Michael Walle <michael@walle.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3222b5b6
    • net: enetc: don't overwrite the RSS indirection table when initializing · c646d10d
      Vladimir Oltean committed
      After the blamed patch, all RX traffic gets hashed to CPU 0 because the
      hashing indirection table set up in:
      
      enetc_pf_probe
      -> enetc_alloc_si_resources
         -> enetc_configure_si
            -> enetc_setup_default_rss_table
      
      is overwritten later in:
      
      enetc_pf_probe
      -> enetc_init_port_rss_memory
      
      which zero-initializes the entire port RSS table in order to avoid ECC errors.
      
      The trouble is that enetc_init_port_rss_memory really needs
      enetc_alloc_si_resources to be called first, because it depends upon
      enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
      thing could have been better thought out, it has nothing to do in a
      function called "alloc_si_resources", especially since its counterpart,
      "free_si_resources", does nothing to unwind the configuration of the SI.
      
      The point is, we need to pull enetc_configure_si out of
      enetc_alloc_si_resources and move it after enetc_init_port_rss_memory.
      This allows us to set up the default RSS indirection table after
      initializing the memory.
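      A sketch of the reordered probe sequence (error handling trimmed;
      function names taken from the call traces above, exact signatures
      assumed):

      static int sketch_pf_probe_rss_setup(struct enetc_ndev_priv *priv,
                                           struct enetc_si *si)
      {
          int err;

          err = enetc_alloc_si_resources(priv);  /* no longer configures the SI */
          if (err)
              return err;

          err = enetc_init_port_rss_memory(si);  /* zero the port RSS table */
          if (err)
              return err;

          return enetc_configure_si(priv);       /* default RSS table set up last */
      }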
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c646d10d
  7. 05 Nov 2020, 1 commit
  8. 12 Oct 2020, 1 commit
    • enetc: Migrate to PHYLINK and PCS_LYNX · 71b77a7a
      Claudiu Manoil committed
      This is a methodical transition of the driver from phylib
      to phylink, following the guidelines from sfp-phylink.rst.
      The MAC register configurations based on interface mode
      were moved from the probing path to the mac_config() hook.
      MAC enable and disable commands (enabling Rx and Tx paths
      at MAC level) were also extracted and assigned to their
      corresponding phylink hooks.
      As part of the migration to phylink, the serdes configuration
      from the driver was offloaded to the PCS_LYNX module,
      introduced in commit 0da4c3d3 ("net: phy: add Lynx PCS module"),
      the PCS_LYNX module being a mandatory component required to
      make the enetc driver work with phylink.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      71b77a7a
  9. 22 Jul 2020, 4 commits
    • enetc: Add adaptive interrupt coalescing · ae0e6a5d
      Claudiu Manoil committed
      Use the generic dynamic interrupt moderation (dim)
      framework to implement adaptive interrupt coalescing
      on Rx.  With the per-packet interrupt scheme, a high
      interrupt rate has been noted for moderate traffic flows,
      leading to high CPU utilization.  The 'dim' scheme
      implemented by the current patch addresses this issue,
      improving CPU utilization while using minimal coalescing
      time thresholds in order to preserve good latency.
      On the Tx side use an optimal time threshold value by
      default.  This value has been optimized for Tx TCP
      streams at a rate of around 85kpps on a 1G link,
      at which rate half of the Tx ring size (128) gets filled
      in 1500 usecs.  Scaling this down to 2.5G links yields
      the current value of 600 usecs, which is conservative
      and gives good enough results for 1G links too (see
      next).
      
      Below are some measurement results for before and after
      this patch (and related dependencies), for a system with
      2 ARM Cortex-A72 CPUs @ 1.3GHz (32 KB L1 data cache),
      using 60 sec long netperf TCP stream tests @ 1Gbit link
      (maximum throughput):
      
      1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI
      thread on the same CPU:
      	CPU utilization		int rate (ints/sec)
      Before:	50%-60% (over 50%)		92k
      After:  13%-22%				3.5k-12k
      Comment:  Major CPU utilization improvement for a single
      	  Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a single
      	  CPU. Usually settles under 16% for longer tests.
      
      2) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency):
      	Total CPU utilization	Total int rate (ints/sec)
      Before:	~80% (spikes to 90%)		~100k
      After:   60% (more steady)		  ~4k
      Comment:  Important improvement for this load test, while the
      	  ping test outcome does not show any notable
      	  difference compared to before.
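      A minimal sketch of feeding the generic 'dim' framework from a ring's
      RX interrupt vector, as used by this approach (variable and function
      names are assumptions; only the library calls are real):

      #include <linux/dim.h>

      /* called at the end of each NAPI poll with the per-vector counters */
      static void sketch_rx_net_dim(struct dim *dim, u16 event_ctr,
                                    u64 packets, u64 bytes)
      {
          struct dim_sample sample;

          dim_update_sample(event_ctr, packets, bytes, &sample);
          net_dim(dim, sample);
      }

      /* dim's work handler picks the new profile; the driver then writes
       * the suggested moderation time to the RX interrupt coalescing register
       */
      static void sketch_rx_dim_work(struct work_struct *w)
      {
          struct dim *dim = container_of(w, struct dim, work);
          struct dim_cq_moder moder =
              net_dim_get_rx_moderation(dim->mode, dim->profile_ix);

          pr_debug("new RX coalescing time: %u us\n", moder.usec);
          /* program moder.usec into hardware here */
          dim->state = DIM_START_MEASURE;
      }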
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae0e6a5d
    • enetc: Add interrupt coalescing support · 91571081
      Claudiu Manoil committed
      Enable programming of the interrupt coalescing registers
      and allow manual configuration of the coalescing time
      thresholds via ethtool.  Packet thresholds have been fixed
      to predetermined values as there's no point in making them
      run-time configurable, also anticipating the dynamic interrupt
      moderation (DIM) algorithm which uses fixed packet thresholds
      as well.  If the interface is up when the operation mode of
      traffic interrupt events is changed by the user (i.e. switching
      from default per-packet interrupts to coalesced interrupts),
      the traffic needs to be paused in the process.
      This patch also prepares the ground for introducing DIM on Rx.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91571081
    • enetc: Drop redundant ____cacheline_aligned_in_smp · 058d9cfa
      Claudiu Manoil committed
      'struct enetc_bdr' is already '____cacheline_aligned_in_smp'.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      058d9cfa
    • enetc: Refine buffer descriptor ring sizes · 02293dd4
      Claudiu Manoil committed
      It's time to differentiate between Rx and Tx ring sizes.
      Not only are Tx rings processed differently from Rx rings,
      but their default number also differs - i.e. up to 8 Tx rings
      per device (8 traffic classes) vs. 2 Rx rings (one per CPU).
      So let's set the Tx ring sizes to half the size of the Rx rings
      for now, to be conservative.
      The default ring sizes were decreased as well (to the next
      lower power of 2), to reduce the memory footprint, buffering
      etc., since the measurements I've made so far show that the
      rings are very unlikely to get full.
      This change also anticipates the introduction of the
      dynamic interrupt moderation (dim) algorithm which operates
      on maximum packet thresholds of 256 packets for Rx and 128
      packets for Tx.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02293dd4
  10. 02 May 2020, 2 commits
    • net: enetc: add tc flower psfp offload driver · 888ae5a3
      Po Liu committed
      Add tc flower offload for the enetc IEEE 802.1Qci (PSFP) function.
      Four main feature blocks implement the flow policing and filtering
      for ingress flows with IEEE 802.1Qci features: stream identification
      (strictly speaking defined in P802.1CB, but needed for 802.1Qci),
      stream filtering, stream gating and flow metering. Each function
      block includes many entries, addressed by index, to assign parameters.
      A frame is filtered by the stream identification block first, then
      flows into the stream filter block via the handle shared between
      stream identification and stream filtering. It then flows into the
      stream gate control assigned by the stream filtering entry, where it
      is policed by the gate and optionally limited by the max SDU set in
      the filter block. Finally, it is policed by the flow metering block,
      whose index is chosen in the filtering block.
      Each entry of a block may therefore be linked to by many upper
      entries, since assigning the same index means that multiple streams
      share the same feature in the stream filtering, stream gating or
      flow metering blocks.
      To implement these features, each stream filtered by
      source/destination MAC address, and possibly also the VLAN ID, is
      treated as one flow chain, identified by the chain_index that already
      exists in the tc filter concept. The driver maintains this chain
      together with the gate modules. A stream filter entry is created from
      the gate index, the (optional) flow meter entry id and a priority
      value. Offloading only transfers the gate action and the flow
      filtering parameters: the driver creates (or looks up an entry with
      the same gate id, flow meter id and priority) one stream filter entry
      and programs it into the hardware, so stream filtering does not need
      to be transferred via action offloading. This architecture mirrors
      the relationship between tc filters and actions: tc filters maintain
      the list for each flow keyed by match keys, and actions are
      maintained in the action list.
      
      Below is an example of the tc commands:
      > tc qdisc add dev eth0 ingress
      > ip link set eth0 address 10:00:80:00:00:00
      > tc filter add dev eth0 parent ffff: protocol ip chain 11 \
      	flower skip_sw dst_mac 10:00:80:00:00:00 \
      	action gate index 10 \
      	sched-entry open 200000000 1 8000000 \
      	sched-entry close 100000000 -1 -1
      
      These commands assign dst_mac 10:00:80:00:00:00 to index 11 of the
      stream identification module and set up gate index 10 of the stream
      gate module. The gate is kept open for 200ms with the traffic volume
      limited to 8MB in this sched-entry, and the frames are directed to
      ingress queue 1.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      888ae5a3
    • net: enetc: add hw tc hw offload features for PSPF capability · 79e49982
      Po Liu committed
      Let ethtool enable/disable the tc flower offload features. The ENETC
      hardware has the PSFP feature for per-stream policing. When the tc hw
      offloading feature is enabled, the driver enables the IEEE 802.1Qci
      feature. This only sets the register enable bit for the feature and
      reads back how many capabilities each block has; it does not enable
      any entry of the per-stream filtering, stream gate or stream
      identification blocks.
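      For illustration, a minimal sketch of how such a feature toggle is
      usually wired up (the PSFP helper is hypothetical, not the driver's
      actual function):

      /* hypothetical helper: set/clear the PSFP enable bits and read back
       * the per-block capability registers
       */
      static void sketch_enable_psfp(struct net_device *ndev, bool en);

      static int sketch_set_features(struct net_device *ndev,
                                     netdev_features_t features)
      {
          netdev_features_t changed = ndev->features ^ features;

          if (changed & NETIF_F_HW_TC)
              sketch_enable_psfp(ndev, !!(features & NETIF_F_HW_TC));

          return 0;
      }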
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79e49982
  11. 11 Mar 2020, 2 commits
    • enetc: Add dynamic allocation of extended Rx BD rings · 434cebab
      Claudiu Manoil committed
      Hardware timestamping support (PTP) on Rx requires extended
      buffer descriptors, double the size of normal Rx descriptors.
      On the current controller revision only the timestamping offload
      requires extended Rx descriptors.
      Since Rx timestamping can be turned on/off at runtime, make Rx ring
      allocation configurable at runtime too. As a result, the static
      config option FSL_ENETC_HW_TIMESTAMPING can be dropped and the
      extended descriptors can be used only when Rx timestamping gets
      activated.
      The extension has the same size as the base descriptor, making
      the descriptor iterators easy to update for the extended case.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      434cebab
    • enetc: Clean up Rx BD iteration · 714239ac
      Claudiu Manoil committed
      Improve maintainability of the code iterating the Rx buffer
      descriptors to prepare it to support iterating extended Rx BD
      descriptors as well.
      Don't increment by one the h/w descriptor pointers explicitly,
      provide an iterator that takes care of the h/w details.
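      A sketch of such an iterator (types as in the driver, field names
      assumed), hiding the wrap-around from the callers:

      static union enetc_rx_bd *sketch_rxbd_next(struct enetc_bdr *rx_ring,
                                                 union enetc_rx_bd *rxbd, int *i)
      {
          if (unlikely(++*i == rx_ring->bd_count)) {
              *i = 0;
              return rx_ring->bd_base;  /* wrap to the start of the ring */
          }
          return ++rxbd;
      }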
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      714239ac
  12. 25 Feb 2020, 1 commit
  13. 03 Jan 2020, 1 commit
    • enetc: add support time specific departure base on the qos etf · 0d08c9ec
      Po Liu committed
      ENETC implements a time-specific departure capability, which enables
      the user to specify when a frame can be transmitted. When this
      capability is enabled, the device delays the transmission of the
      frame so that it is transmitted at the precisely specified time. The
      departure time can be up to 0.5 seconds in the future. If the
      departure time in the transmit BD has not yet been reached, based on
      the current time, the packet is not transmitted.
      
      This feature is driven by the qos ETF qdisc and is configured through
      tc. Here are the example commands:
      
      tc qdisc add dev eth0 root handle 1: mqprio \
      	   num_tc 8 map 0 1 2 3 4 5 6 7 hw 1
      tc qdisc replace dev eth0 parent 1:8 etf \
      	   clockid CLOCK_TAI delta 30000  offload
      
      These examples set up the queue mapping first and then configure
      queue 7 with a 30us ahead dequeue time.
      
      The user should then set the SO_TXTIME option on the socket when
      sending test frames.
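      A userspace sketch of opting a socket into time-based transmission
      (the clockid must match the one given to the etf qdisc; the
      per-packet departure time is then passed via an SCM_TXTIME cmsg):

      #include <sys/socket.h>
      #include <linux/net_tstamp.h>
      #include <time.h>

      static int sketch_enable_txtime(int fd)
      {
          struct sock_txtime cfg = {
              .clockid = CLOCK_TAI,
              .flags   = 0,
          };

          return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
      }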
      
      There are also some limitations for this feature in hardware:
      - Transmit checksum offloads and time specific departure operation
      are mutually exclusive.
      - Time Aware Shaper feature (Qbv) offload and time specific departure
      operation are mutually exclusive.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d08c9ec
  14. 26 Nov 2019, 1 commit
  15. 17 Nov 2019, 2 commits
  16. 29 May 2019, 1 commit
  17. 25 May 2019, 2 commits
  18. 25 Jan 2019, 3 commits
    • enetc: Add RFS and RSS support · d382563f
      Claudiu Manoil committed
      A ternary match table is used for RFS. If multiple entries in the table
      match, the entry with the lowest numerical index is chosen as the
      matching entry.  Entries in the table are identified using an index
      which takes a value from 0 to PRFSCAPR[NUM_RFS]-1 when accessed by the
      PSI (PF).
      Portions of the RFS table can be assigned to each SI by the PSI (PF)
      driver in PSIaRFSCFGR.  Assignments are cumulative, the entries assigned
      to SIn start after those assigned to SIn-1.  The total assignments to
      all SIs must be equal to or less than the number available to the port
      as found in PRFSCAPR.
      
      For RSS, the Toeplitz hash function used requires two inputs, a 40B
      random secret key that is supplied through the PRSSKR0-9 registers as well
      as the relevant pieces of the packet header (n-tuple).  The 6 LSBs of
      the hash function result will then be used as a pointer to obtain the
      tag referenced in the 64-entry indirection table.  The result will provide a
      winning group which will be used to help route the received packet.
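      A sketch of filling the 64-entry indirection table so the hash groups
      spread evenly across the RX rings (names and default policy are
      illustrative assumptions):

      #include <linux/types.h>

      static void sketch_setup_default_rss_table(u32 *rss_table,
                                                 int table_entries,
                                                 int num_rx_rings)
      {
          int i;

          /* the 6 LSBs of the Toeplitz hash select one of these entries */
          for (i = 0; i < table_entries; i++)
              rss_table[i] = i % num_rx_rings;
      }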
      Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d382563f
    • enetc: Add vf to pf messaging support · beb74ac8
      Claudiu Manoil committed
      VSIs (VFs) may send a message to the PSI (PF) for general notification
      or to gain access to hardware resources which requires host inspection.
      These messages may vary in size and are handled as a partition copy
      between two memory regions owned by the respective participants.
      The PSI will respond with fail or success and a 16-bit message code.
      The patch implements the vf to pf messaging mechanism above and, as the
      first application making use of this support, it enables the VF to
      configure its own primary MAC address.
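      A sketch of the message layout implied above (fields are assumptions;
      the real command set and sizes are hardware/driver defined):

      #include <linux/types.h>

      struct sketch_msg_cmd_header {
          u16 type;   /* message class, e.g. primary MAC address config */
          u16 id;     /* command id within the class */
          /* a variable-size payload follows, copied between the VSI- and
           * PSI-owned memory regions; the PSI replies with success/fail
           * plus a 16-bit message code
           */
      };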
      Signed-off-by: Catalin Horghidan <catalin.horghidan@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      beb74ac8
    • enetc: Introduce basic PF and VF ENETC ethernet drivers · d4fd0404
      Claudiu Manoil committed
      ENETC is a multi-port virtualized Ethernet controller supporting GbE
      designs and Time-Sensitive Networking (TSN) functionality.
      ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
      Endpoint (RCIE).  As such, it contains multiple physical (PF) and
      virtual (VF) PCIe functions, discoverable by standard PCI Express.
      
      Introduce basic PF and VF ENETC ethernet drivers.  The PF has access to
      the ENETC Port registers and resources and makes the required privileged
      configurations for the underlying VF devices.  Common functionality is
      controlled through so called System Interface (SI) register blocks, PFs
      and VFs own a SI each.  Though SI register blocks are almost identical,
      there are a few privileged SI level controls that are accessible only to
      PFs, and so the distinction is made between PF SIs (PSI) and VF SIs (VSI).
      As such, the bulk of the code, including datapath processing, basic h/w
      offload support and generic pci related configuration, is shared between
      the 2 drivers and is factored out in common source files (i.e. enetc.c).
      
      Major functionalities included (for both drivers):
      MSI-X support for Rx and Tx processing, assignment of Rx/Tx BD ring pairs
      to MSI-X entries, multi-queue support, Rx S/G (Rx frame fragmentation) and
      jumbo frame (up to 9600B) support, Rx paged allocation and reuse, Tx S/G
      support (NETIF_F_SG), Rx and Tx checksum offload, PF MAC filtering and
      initial control ring support, VLAN extraction/insertion, PF Rx VLAN
      CTAG filtering, VF mac address config support, VF VLAN isolation support,
      etc.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4fd0404