1. 03 Apr 2021, 3 commits
  2. 01 Apr 2021, 9 commits
    • net: enetc: add support for XDP_REDIRECT · 9d2b68cc
      Vladimir Oltean authored
      The driver implementation of the XDP_REDIRECT action reuses parts from
      XDP_TX, most notably the enetc_xdp_tx function which transmits an array
      of TX software BDs. Only this time, the buffers don't have DMA
      mappings, so we need to create them.
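
      A minimal sketch of that mapping step, assuming a hypothetical helper
      name and reusing the tx_swbd fields introduced elsewhere in this set
      (dma, len, is_xdp_redirect):

          static int enetc_xdp_frame_map_sketch(struct enetc_bdr *tx_ring,
                                                struct xdp_frame *xdp_frame,
                                                struct enetc_tx_swbd *tx_swbd)
          {
                  /* Redirected frames arrive with no DMA mapping, so
                   * create one here before handing off to enetc_xdp_tx
                   */
                  dma_addr_t dma = dma_map_single(tx_ring->dev,
                                                  xdp_frame->data,
                                                  xdp_frame->len,
                                                  DMA_TO_DEVICE);

                  if (unlikely(dma_mapping_error(tx_ring->dev, dma)))
                          return -ENOMEM;

                  tx_swbd->dma = dma;
                  tx_swbd->len = xdp_frame->len;
                  tx_swbd->is_xdp_redirect = true;

                  return 0;
          }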
      
      When a BPF program reaches the XDP_REDIRECT verdict for a frame, we can
      employ the same buffer reuse strategy as for the normal processing path
      and for XDP_PASS: we can flip to the other page half and seed that to
      the RX ring.
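
      A sketch of that flip, assuming the RX software BD tracks the active
      half through a page_offset field:

          static void enetc_flip_rx_buff_sketch(struct enetc_rx_swbd *rx_swbd)
          {
                  /* Point the software BD at the other half of the page;
                   * the half that leaves with the redirected frame holds
                   * its own reference and keeps the page alive.
                   */
                  rx_swbd->page_offset ^= PAGE_SIZE / 2;
                  page_ref_inc(rx_swbd->page);
          }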
      
      Note that scatter/gather support is there, but disabled due to lack of
      multi-buffer support in XDP (which is being added by the series linked
      below):
      https://patchwork.kernel.org/project/netdevbpf/cover/cover.1616179034.git.lorenzo@kernel.org/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d2b68cc
    • net: enetc: increase RX ring default size · d6a2829e
      Vladimir Oltean authored
      As explained in the XDP_TX patch, when receiving a burst of frames with
      the XDP_TX verdict, there is a momentary dip in the number of available
      RX buffers. The system will eventually recover as TX completions will
      start kicking in and refilling our RX BD ring again. But until that
      happens, we need to survive with as few out-of-buffer discards as
      possible.
      
      This increases the memory footprint of the driver in order to avoid
      discards at 2.5 Gbps line rate with 64-byte packets, the maximum speed
      available for testing on one port of the NXP LS1028A.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6a2829e
    • net: enetc: add support for XDP_TX · 7ed2bc80
      Vladimir Oltean authored
      For reflecting packets back into the interface they came from, we create
      an array of TX software BDs derived from the RX software BDs. Therefore,
      we need to extend the TX software BD structure to contain most of the
      stuff that's already present in the RX software BD structure, for
      reasons that will become evident in a moment.
      
      For a frame with the XDP_TX verdict, we don't reuse any buffer right
      away as we do for XDP_DROP (the same page half) or XDP_PASS (the other
      page half, same as the skb code path).
      
      Because the buffer transfers ownership from the RX ring to the TX ring,
      reusing any page half right away is very dangerous. What we can do
      instead is recycle the same page half as soon as TX is complete (see
      the sketch after the call path below).
      
      The code path is:
      enetc_poll
      -> enetc_clean_rx_ring_xdp
         -> enetc_xdp_tx
         -> enetc_refill_rx_ring
      (time passes, another MSI interrupt is raised)
      enetc_poll
      -> enetc_clean_tx_ring
         -> enetc_recycle_xdp_tx_buff
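
      In sketch form, the recycle step could look like this (reusing the
      enetc_page_reusable/enetc_reuse_page helpers that an earlier patch in
      this set moves up; the tx_swbd fields mirror their RX counterparts):

          static void enetc_recycle_xdp_tx_buff_sketch(struct enetc_bdr *rx_ring,
                                                       struct enetc_tx_swbd *tx_swbd)
          {
                  struct enetc_rx_swbd rx_swbd = {
                          .page        = tx_swbd->page,
                          .page_offset = tx_swbd->page_offset,
                          .dma         = tx_swbd->dma,
                  };

                  if (likely(enetc_page_reusable(rx_swbd.page)))
                          enetc_reuse_page(rx_ring, &rx_swbd); /* back to RX */
                  else
                          put_page(rx_swbd.page); /* still in use elsewhere */
          }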
      
      But that creates a problem, because there is a potentially large time
      window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, a period in
      which we'll have fewer and fewer RX buffers.
      
      Basically, when the ship starts sinking, the knee-jerk reaction is to
      let enetc_refill_rx_ring do what it does for the standard skb code path
      (refill every 16 consumed buffers), but that turns out to be very
      inefficient. The problem is that we have no rx_swbd->page at our
      disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would
      have to call enetc_new_page for every buffer that we refill (if we
      choose to refill at this early stage). That only makes the problem
      worse, because page allocation is an expensive process, and CPU time is
      exactly what we're lacking.
      
      Additionally, there is an even bigger problem: if we let
      enetc_refill_rx_ring top up the ring's buffers again from the RX path,
      remember that the buffers sent to transmission haven't disappeared
      anywhere. They will be eventually sent, and processed in
      enetc_clean_tx_ring, and an attempt will be made to recycle them.
      But surprise, the RX ring is already full of new buffers, because we
      were premature in deciding that we should refill. So not only did we
      take the expensive decision of allocating new pages, but now we must
      throw away perfectly good and reusable buffers.
      
      So what we do is we implement an elastic refill mechanism, which keeps
      track of the number of in-flight XDP_TX buffer descriptors. We top up
      the RX ring only up to the total ring capacity minus the number of BDs
      that are in flight (because we know that those BDs will return to us
      eventually).
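
      A sketch of that budget calculation (the in-flight counter name is an
      assumption):

          static void enetc_refill_rx_ring_elastic(struct enetc_bdr *rx_ring)
          {
                  /* In-flight XDP_TX buffers will come back on their own,
                   * so don't allocate fresh pages to stand in for them.
                   */
                  int missing = enetc_bd_unused(rx_ring) -
                                rx_ring->xdp_tx_in_flight;

                  if (missing > 0)
                          enetc_refill_rx_ring(rx_ring, missing);
          }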
      
      The enetc driver manages 1 RX ring per CPU, and the default TX ring
      management is the same. So we do XDP_TX towards the TX ring of the same
      index, because it is affined to the same CPU. This will probably not
      produce great results when we have a tc-taprio/tc-mqprio qdisc on the
      interface, because in that case, the number of TX rings might be
      greater, but I didn't add any checks for that yet (mostly because I
      didn't know what checks to add).
      
      It should also be noted that we need to change the DMA mapping direction
      for RX buffers, since they may now be reflected into the TX ring of the
      same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and
      remapping as DMA_TO_DEVICE, because performance is better this way.
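
      Concretely, the page allocation path maps each page once, roughly like
      this sketch (the real enetc_new_page may differ in detail):

          static bool enetc_new_page_sketch(struct enetc_bdr *rx_ring,
                                            struct enetc_rx_swbd *rx_swbd)
          {
                  struct page *page = dev_alloc_page();

                  if (unlikely(!page))
                          return false;

                  /* one bidirectional mapping serves both RX and XDP_TX */
                  rx_swbd->dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE,
                                              DMA_BIDIRECTIONAL);
                  if (unlikely(dma_mapping_error(rx_ring->dev, rx_swbd->dma))) {
                          __free_page(page);
                          return false;
                  }

                  rx_swbd->page = page;
                  rx_swbd->page_offset = rx_ring->buffer_offset;

                  return true;
          }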
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7ed2bc80
    • net: enetc: add support for XDP_DROP and XDP_PASS · d1b15102
      Vladimir Oltean authored
      For the RX ring, enetc uses an allocation scheme based on pages split
      into two buffers, which is already very efficient in terms of preventing
      reallocations / maximizing reuse, so I see no reason why I would change
      that.
      
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half B | half B | half B | half B | half B | half B | half B |
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half A | half A | half A | half A | half A | half A | half A | RX ring
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
           ^                                                     ^
           |                                                     |
       next_to_clean                                       next_to_alloc
                                                            next_to_use
      
                         +--------+--------+--------+--------+--------+
                         |        |        |        |        |        |
                         | half B | half B | half B | half B | half B |
                         |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |        |        |        |        |        |
       | half B | half B | half A | half A | half A | half A | half A | RX ring
       |        |        |        |        |        |        |        |
       +--------+--------+--------+--------+--------+--------+--------+
       |        |        |   ^                                   ^
       | half A | half A |   |                                   |
       |        |        | next_to_clean                   next_to_use
       +--------+--------+
                    ^
                    |
               next_to_alloc
      
      Then, when enetc_refill_rx_ring is called, whose purpose is to advance
      next_to_use, it sees that it can take buffers up to next_to_alloc, and
      it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate
      one!".
      
      The only problem is that for default PAGE_SIZE values of 4096, buffer
      sizes are 2048 bytes. While this is enough for normal skb allocations at
      an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256
      bytes, and including skb_shared_info and alignment, we end up being able
      to make use of only 1472 bytes, which is insufficient for the default
      MTU.
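
      The arithmetic behind the 1472-byte figure, with a skb_shared_info
      size that is typical for 64-bit builds (the exact value is
      architecture-dependent):

          half-page buffer                                    2048
          - XDP headroom (XDP_PACKET_HEADROOM)               -  256
          - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))   -  320
          ----------------------------------------------------------
          usable frame data                                    1472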
      
      To solve that problem, we implement scatter/gather processing in the
      driver, because we would really like to keep the existing allocation
      scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and
      another one of 28 bytes.
      
      Because the headroom required by XDP is different (and much larger) than
      the one required by the network stack, whenever a BPF program is added
      or deleted on the port, we drain the existing RX buffers and seed new
      ones with the required headroom. We also keep the required headroom in
      rx_ring->buffer_offset.
      
      The simplest way to implement XDP_PASS, where an skb must be created, is
      to create an xdp_buff based on the next_to_clean RX BDs, but not clear
      those BDs from the RX ring yet, just keep the original index at which
      the BDs for this frame started. Then, if the verdict is XDP_PASS,
      instead of converting the xdp_buff to an skb, we replay a call to
      enetc_build_skb (just as in the normal enetc_clean_rx_ring case),
      starting from the original BD index.
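
      In sketch form, with orig_i saved before the xdp_buff was assembled
      (locals and the buffer-size argument are assumed):

          case XDP_PASS:
                  /* The BDs from orig_i onwards were left untouched, so
                   * skb construction can be replayed from the frame's
                   * first BD, just like the regular RX path does.
                   */
                  skb = enetc_build_skb(rx_ring, bd_status, &rxbd, &orig_i,
                                        &cleaned_cnt, ENETC_RXB_DMA_SIZE_XDP);
                  if (unlikely(!skb))
                          break;

                  napi_gro_receive(napi, skb);
                  break;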
      
      We would also like to be minimally invasive to the regular RX data path,
      and not check whether there is a BPF program attached to the ring on
      every packet. So we create a separate RX ring processing function for
      XDP.
      
      Because we only install/remove the BPF program while the interface is
      down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there
      shouldn't be any circumstance in which we are processing packets and
      there is a potentially freed BPF program attached to the RX ring.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1b15102
    • net: enetc: move up enetc_reuse_page and enetc_page_reusable · 65d0cbb4
      Vladimir Oltean authored
      For XDP_TX, we need to call enetc_reuse_page from enetc_clean_tx_ring,
      so we need to avoid a forward declaration.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      65d0cbb4
    • net: enetc: clean the TX software BD on the TX confirmation path · 1ee8d6f3
      Vladimir Oltean authored
      With the future introduction of some new fields into enetc_tx_swbd, such
      as is_xdp_tx, is_xdp_redirect etc., we not only need to set these bits
      to true from the XDP_TX/XDP_REDIRECT code paths, but also to reset them
      to false from the old code paths.
      
      This is because TX software buffer descriptors are kept in a ring that
      shadows the hardware TX ring, so these structures keep getting
      reused, and there is always the possibility that when a software BD is
      reused (after we ran a full circle through the TX ring), the old user of
      the tx_swbd had set is_xdp_tx = true, and now we are sending a regular
      skb, which would need to set is_xdp_tx = false.
      
      To be minimally invasive to the old code paths, let's just scrub the
      software TX BD in the TX confirmation path (enetc_clean_tx_ring), once
      we know that nobody uses this software TX BD (tx_ring->next_to_clean
      hasn't yet been updated, and the TX paths check enetc_bd_unused which
      tells them if there's any more space in the TX ring for a new enqueue).
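
      The scrub itself can be as simple as this fragment at the end of the
      per-BD loop in enetc_clean_tx_ring:

          /* tx_ring->next_to_clean hasn't moved past this entry yet, so
           * no TX path can have claimed it; clear every stale field,
           * including is_xdp_tx and is_xdp_redirect, in one go.
           */
          memset(tx_swbd, 0, sizeof(*tx_swbd));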
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ee8d6f3
    • net: enetc: add a dedicated is_eof bit in the TX software BD · d504498d
      Vladimir Oltean authored
      In the transmit path, if we have a scatter/gather frame, it is put into
      multiple software buffer descriptors, the last of which has the skb
      pointer populated (which is necessary for rearming the TX MSI vector and
      for collecting the two-step TX timestamp from the TX confirmation path).
      
      At the moment, this is sufficient, but with XDP_TX, we'll need to
      service TX software buffer descriptors that don't have an skb pointer
      but may nonetheless be final. So add a dedicated bit for
      final software BDs that we populate and check explicitly. Also, we keep
      looking just for an skb when doing TX timestamping, because we don't
      want/need that for XDP.
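
      A sketch of the resulting shape (field layout abridged, the timestamp
      helper name is hypothetical):

          struct enetc_tx_swbd {
                  struct sk_buff *skb;    /* NULL for XDP_TX frames */
                  /* ... DMA address, length, flags ... */
                  u8 is_eof:1;            /* last software BD of its frame */
          };

          /* TX confirmation path */
          if (tx_swbd->skb)
                  enetc_collect_tx_tstamp(tx_swbd);  /* skb-only work */
          if (tx_swbd->is_eof)
                  frames_cleaned++;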
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d504498d
    • net: enetc: move skb creation into enetc_build_skb · a800abd3
      Vladimir Oltean authored
      We need to build an skb from two code paths now: from the plain RX data
      path and from the XDP data path when the verdict is XDP_PASS.
      
      Create a new enetc_build_skb function which contains the essential steps
      for building an skb based on the first and last positions of buffer
      descriptors within the RX ring.
      
      We also squash the enetc_process_skb function into enetc_build_skb,
      because what that function did wasn't very meaningful on its own.
      
      The "rx_frm_cnt++" instruction has been moved around napi_gro_receive
      for cosmetic reasons, to be in the same spot as rx_byte_cnt++, which
      itself must be before napi_gro_receive, because that's when we lose
      ownership of the skb.
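
      The call site then looks roughly like this (locals and the constant
      name are assumed), which is what forces that ordering:

          skb = enetc_build_skb(rx_ring, bd_status, &rxbd, &i,
                                &cleaned_cnt, ENETC_RXB_DMA_SIZE);
          if (!skb)
                  break;

          rx_byte_cnt += skb->len;
          rx_frm_cnt++;

          /* ownership of the skb is lost past this point */
          napi_gro_receive(napi, skb);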
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a800abd3
    • net: enetc: consume the error RX buffer descriptors in a dedicated function · 2fa423f5
      Vladimir Oltean authored
      We can and should check the RX BD errors before starting to build the
      skb. The only apparent reason why things are done in this backwards
      order is to spare one call to enetc_rxbd_next.
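
      A sketch of the dedicated function (helper and flag names assumed from
      the driver): it bails out early for healthy BDs, and otherwise drops
      the whole frame's buffers, stopping after the BD flagged as final.

          static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
                                                        u32 bd_status,
                                                        union enetc_rx_bd **rxbd,
                                                        int *i)
          {
                  if (likely(!(bd_status & ENETC_RXBD_LSTATUS(ENETC_RXBD_ERR_MASK))))
                          return false;   /* healthy BD: go build the skb */

                  while (!(bd_status & ENETC_RXBD_LSTATUS_F)) {
                          enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
                          enetc_rxbd_next(rx_ring, rxbd, i);
                          dma_rmb();
                          bd_status = le32_to_cpu((*rxbd)->r.lstatus);
                  }

                  /* release the final BD of the frame and step past it */
                  enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
                  enetc_rxbd_next(rx_ring, rxbd, i);

                  return true;
          }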
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fa423f5
  3. 31 Mar 2021, 5 commits
  4. 25 Mar 2021, 2 commits
    • net: enetc: don't depend on system endianness in enetc_set_mac_ht_flt · e366a392
      Vladimir Oltean authored
      When enetc runs out of exact match entries for unicast address
      filtering, it switches to an approach based on hash tables, where
      multiple MAC addresses might end up in the same bucket.
      
      However, the enetc_set_mac_ht_flt function currently depends on the
      system endianness, because it interprets the 64-bit hash value as an
      array of two u32 elements. Modify this to use lower_32_bits and
      upper_32_bits.
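
      With the hash kept as a u64, the write boils down to this sketch
      (register macros and the write helper are assumed from the driver):

          u64 hash = si->mac_ht_filter;   /* 64-bit hash table bitmap */

          /* endianness-independent split across the two registers */
          enetc_port_wr(hw, ENETC_PSIUMHFR0(si_idx), lower_32_bits(hash));
          enetc_port_wr(hw, ENETC_PSIUMHFR1(si_idx), upper_32_bits(hash));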
      
      Tested by forcing enetc to go into hash table mode by creating two
      macvlan upper interfaces:
      
      ip link add link eno0 address 00:01:02:03:00:00 eno0.0 type macvlan && ip link set eno0.0 up
      ip link add link eno0 address 00:01:02:03:00:01 eno0.1 type macvlan && ip link set eno0.1 up
      
      and verified that the same bit values are written to the registers
      before and after:
      
      enetc_sync_mac_filters: addr 00:00:80:00:40:10 exact match 0
      enetc_sync_mac_filters: addr 00:00:00:00:80:00 exact match 0
      enetc_set_mac_ht_flt: hash 0x80008000000000 UMHFR0 0x0 UMHFR1 0x800080
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e366a392
    • net: enetc: don't depend on system endianness in enetc_set_vlan_ht_filter · 110eccdb
      Vladimir Oltean authored
      ENETC has a 64-entry hash table for VLAN RX filtering per Station
      Interface, which is accessed through two 32-bit registers: VHFR0 holding
      the low portion, and VHFR1 holding the high portion.
      
      The enetc_set_vlan_ht_filter function looks at the pf->vlan_ht_filter
      bitmap, which is fundamentally an unsigned long variable, and casts it
      to a u32 array of two elements. It puts the first u32 element into VHFR0
      and the second u32 element into VHFR1.
      
      It is easy to imagine that this will not work on big endian systems
      (although, yes, we have bigger problems, because currently enetc assumes
      that the CPU endianness is equal to the controller endianness, aka
      little endian - but let's assume that we could add a cpu_to_le32 in
      enetc_wd_reg and a le32_to_cpu in enetc_rd_reg).
      
      Let's use lower_32_bits and upper_32_bits which are designed to work
      regardless of endianness.
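
      Side by side, the old cast and the new helpers, i.e. the "method 1"
      and "method 2" instrumented in the test below (register macros
      assumed from the driver):

          unsigned long hash = pf->vlan_ht_filter;

          /* method 1 (old): array cast, byte-order dependent */
          u32 *v = (u32 *)&hash;
          enetc_port_wr(hw, ENETC_PSIVHFR0(si_idx), v[0]);
          enetc_port_wr(hw, ENETC_PSIVHFR1(si_idx), v[1]);

          /* method 2 (new): explicit, endianness-independent split */
          enetc_port_wr(hw, ENETC_PSIVHFR0(si_idx), lower_32_bits(hash));
          enetc_port_wr(hw, ENETC_PSIVHFR1(si_idx), upper_32_bits(hash));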
      
      Tested that both the old and the new method produce the same results:
      
      $ ethtool -K eth1 rx-vlan-filter on
      $ ip link add link eth1 name eth1.100 type vlan id 100
      enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x20
      enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x20
      $ ip link add link eth1 name eth1.101 type vlan id 101
      enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x30
      enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x30
      $ ip link add link eth1 name eth1.34 type vlan id 34
      enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x34
      enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x34
      $ ip link add link eth1 name eth1.1024 type vlan id 1024
      enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x1 VHFR1 0x34
      enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x1 VHFR1 0x34
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      110eccdb
  5. 23 Mar 2021, 6 commits
  6. 20 Mar 2021, 1 commit
    • net: enetc: teardown CBDR during PF/VF unbind · c54f042d
      Vladimir Oltean authored
      Michael reports that after the blamed patch, unbinding a VF leaves the
      control BD ring's DMA allocations pending, which triggers warnings with
      the DMA API debug:
      
      $ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs
      pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001
      fsl_enetc_vf 0000:00:01.0: Adding to iommu group 19
      fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002)
      fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0
      
      $ echo 0 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs
      DMA-API: pci 0000:00:01.0: device driver has pending DMA allocations while released from device [count=1]
      One of leaked entries details: [size=2048 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as coherent]
      WARNING: CPU: 0 PID: 2547 at kernel/dma/debug.c:853 dma_debug_device_change+0x174/0x1c8
      (...)
      Call trace:
       dma_debug_device_change+0x174/0x1c8
       blocking_notifier_call_chain+0x74/0xa8
       device_release_driver_internal+0x18c/0x1f0
       device_release_driver+0x20/0x30
       pci_stop_bus_device+0x8c/0xe8
       pci_stop_and_remove_bus_device+0x20/0x38
       pci_iov_remove_virtfn+0xb8/0x128
       sriov_disable+0x3c/0x110
       pci_disable_sriov+0x24/0x30
       enetc_sriov_configure+0x4c/0x108
       sriov_numvfs_store+0x11c/0x198
      (...)
      DMA-API: Mapped at:
       dma_entry_alloc+0xa4/0x130
       debug_dma_alloc_coherent+0xbc/0x138
       dma_alloc_attrs+0xa4/0x108
       enetc_setup_cbdr+0x4c/0x1d0
       enetc_vf_probe+0x11c/0x250
      pci 0000:00:01.0: Removing from iommu group 19
      
      This happens because stupid me moved enetc_teardown_cbdr outside of
      enetc_free_si_resources, but did not bother to keep calling
      enetc_teardown_cbdr from all the places where enetc_free_si_resources
      was called. In particular, now it is no longer called from the main
      unbind function, just from the probe error path.
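
      The fix, in sketch form, is to put the teardown call back on the VF
      unbind path (exact ordering in the real remove path may differ):

          static void enetc_vf_remove(struct pci_dev *pdev)
          {
                  struct net_device *ndev = pci_get_drvdata(pdev);
                  struct enetc_ndev_priv *priv = netdev_priv(ndev);

                  unregister_netdev(ndev);
                  enetc_free_si_resources(priv);

                  /* the call that went missing: free the control BD
                   * ring's coherent DMA area before the device goes away
                   */
                  enetc_teardown_cbdr(&priv->si->cbd_ring);

                  enetc_pci_remove(pdev);
          }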
      
      Fixes: 4b47c0b8 ("net: enetc: don't initialize unused ports from a separate code path")
      Reported-by: Michael Walle <michael@walle.cc>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Michael Walle <michael@walle.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c54f042d
  7. 18 Mar 2021, 1 commit
  8. 17 Mar 2021, 5 commits
  9. 14 Mar 2021, 1 commit
  10. 11 Mar 2021, 7 commits