1. 30 Oct 2021, 1 commit
  2. 15 Oct 2021, 2 commits
    • ice: propagate xdp_ring onto rx_ring · eb087cd8
      Authored by Maciej Fijalkowski
      With rings being split, it is now convenient to introduce a pointer to
      XDP ring within the Rx ring. For XDP_TX workloads this means that
      xdp_rings array access will be skipped, which was executed per each
      processed frame.
      
      Also, read the XDP prog once per NAPI and if prog is present, set up the
      local xdp_ring pointer. Reading prog a single time was discussed in [1]
      with some concern raised by Toke around dispatcher handling and having
      the need for going through the RCU grace period in the ndo_bpf driver
      callback, but ice currently is tearing down NAPI instances regardless of
      the prog presence on VSI.
      
      Although the pointer to the XDP ring introduced to the Rx ring makes
      things a lot slimmer/simpler, I still feel that a single prog read
      per NAPI lifetime is beneficial.
      
      A further patch that introduces the fallback path will also profit
      from this, as the xdp_ring pointer will be set during XDP ring setup.
      
      [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      eb087cd8
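      A minimal C sketch of the idea, using hypothetical stand-in types
      (rx_ring_sk and friends, not the real ice structs): the prog is read
      once per poll and the XDP Tx ring pointer is cached, instead of both
      being looked up for every processed frame.

        #include <stddef.h>

        struct bpf_prog;                          /* opaque for the sketch */
        struct tx_ring_sk { int tail; };

        struct rx_ring_sk {
            struct bpf_prog *xdp_prog;
            struct tx_ring_sk *xdp_ring;          /* new: direct pointer to the XDP ring */
        };

        static int napi_poll_sk(struct rx_ring_sk *rx_ring, int budget)
        {
            /* one prog read per NAPI poll instead of one per frame */
            struct bpf_prog *xdp_prog = rx_ring->xdp_prog;
            struct tx_ring_sk *xdp_ring = NULL;
            int done = 0;

            if (xdp_prog)
                xdp_ring = rx_ring->xdp_ring;     /* skips the xdp_rings[] lookup */

            for (; done < budget; done++) {
                /* process one frame; an XDP_TX verdict would enqueue
                 * onto the cached xdp_ring here */
                (void)xdp_ring;
            }
            return done;
        }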
    • ice: split ice_ring onto Tx/Rx separate structs · e72bba21
      Authored by Maciej Fijalkowski
      While it was convenient to have a generic ring structure that served
      both Tx and Rx sides, next commits are going to introduce several
      Tx-specific fields, so in order to avoid hurting the Rx side, let's
      pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.
      
      The Rx ring could have been handled by the old ice_ring, which would
      reduce the code churn within this patch, but this would make things
      asymmetric.
      
      Make the union out of the ring container within ice_q_vector so that it
      is possible to iterate over newly introduced ice_tx_ring.
      
      Remove the @size member, as it's only accessed from the control path
      and can be calculated easily.
      
      Change definitions of ice_update_ring_stats and
      ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
      used for both Rx and Tx rings.
      
      Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
      Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
      difference now.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      e72bba21
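      A compact sketch of the resulting layout, with assumed, abridged
      field lists (tx_ring_sk/rx_ring_sk are hypothetical names, not the
      real ice definitions): Tx-only fields can now grow without inflating
      the Rx ring, and the per-vector container unions the two pointer
      types so iteration code can pick either.

        struct tx_ring_sk {
            void *desc;                           /* descriptor ring memory */
            unsigned short next_to_use;
            unsigned short next_to_clean;
            /* future Tx-specific fields land here */
        };

        struct rx_ring_sk {
            void *desc;
            unsigned short next_to_use;
            unsigned short next_to_clean;
            /* Rx-specific state, e.g. the XDP Rx queue info */
        };

        /* per-vector ring container: a union lets the same member be
         * walked as either ring type */
        struct ring_container_sk {
            union {
                struct tx_ring_sk *tx_ring;
                struct rx_ring_sk *rx_ring;
            };
        };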
  3. 28 Sep 2021, 2 commits
    • ice: Use the xsk batched rx allocation interface · db804cfc
      Authored by Magnus Karlsson
      Use the new xsk batched rx allocation interface for the zero-copy data
      path. As the array of struct xdp_buff pointers kept by the driver is
      really a ring that wraps, the allocation routine is modified to detect
      a wrap and in that case call the allocation function twice. The
      allocation function cannot deal with wrapped rings, only arrays. As we
      now know exactly how many buffers we get and that there is no
      wrapping, the allocation function can be simplified even more as all
      if-statements in the allocation loop can be removed, improving
      performance.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210922075613.12186-5-magnus.karlsson@gmail.com
      db804cfc
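      A self-contained sketch of the wrap handling described above, with a
      stubbed stand-in for the batch allocator (alloc_batch_stub is
      hypothetical; the real interface is xsk_buff_alloc_batch): a refill
      that would cross the end of the SW ring is split into two contiguous
      calls, since the allocator only handles plain arrays.

        /* stand-in for the batch allocator: fills up to max consecutive
         * slots and reports how many it actually filled */
        static unsigned int alloc_batch_stub(void **slots, unsigned int max)
        {
            for (unsigned int i = 0; i < max; i++)
                slots[i] = (void *)1;             /* pretend each allocation succeeded */
            return max;
        }

        static unsigned int refill_with_wrap(void **ring, unsigned int ring_size,
                                             unsigned int next_to_use,
                                             unsigned int count)
        {
            unsigned int until_end = ring_size - next_to_use;
            unsigned int done;

            if (count <= until_end)               /* no wrap: one contiguous call */
                return alloc_batch_stub(&ring[next_to_use], count);

            /* wrap detected: fill to the end of the ring, then restart
             * from slot 0 with the remainder */
            done = alloc_batch_stub(&ring[next_to_use], until_end);
            if (done == until_end)
                done += alloc_batch_stub(&ring[0], count - until_end);
            return done;
        }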
    • ice: Use xdp_buf instead of rx_buf for xsk zero-copy · 57f7f8b6
      Authored by Magnus Karlsson
      In order to use the new xsk batched buffer allocation interface, a
      pointer to an array of struct xdp_buff pointers needs to be provided so
      that the function can put the result of the allocation there. In the
      ice driver, we already have a ring that stores pointers to
      xdp_buffs. This is only used for the xsk zero-copy driver and is a
      union with the structure that is used for the regular non zero-copy
      path. Unfortunately, that structure is larger than the xdp_buffs
      pointers which mean that there will be a stride (of 20 bytes) between
      each xdp_buff pointer. And feeding this into the xsk_buff_alloc_batch
      interface will not work since it assumes a regular array of xdp_buff
      pointers (each 8 bytes with 0 bytes in-between them on a 64-bit
      system).
      
      To fix this, remove the xdp_buff pointer from the rx_buf union and
      move it one step higher to the union above, which only has pointers to
      arrays in it. This solves the problem and we can directly feed the SW
      ring of xdp_buff pointers straight into the allocation function in the
      next patch when that interface is used. This will improve performance.
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210922075613.12186-4-magnus.karlsson@gmail.com
      57f7f8b6
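      A small sketch illustrating the stride problem, with hypothetical
      stand-in structs (not the real ice layouts): embedded in a larger
      per-descriptor struct, consecutive xdp_buff pointers sit
      sizeof(struct) bytes apart; as a plain pointer array they sit 8 bytes
      apart on a 64-bit system, which is what the batch allocator assumes.

        #include <stdio.h>

        struct xdp_buff_sk { void *data; };

        /* before: the pointer lived inside a larger per-descriptor struct,
         * so consecutive pointers were sizeof(struct rx_buf_sk) apart */
        struct rx_buf_sk {
            struct xdp_buff_sk *xdp;
            unsigned int page_offset;
            unsigned short pagecnt_bias;
        };

        int main(void)
        {
            printf("stride before: %zu bytes\n", sizeof(struct rx_buf_sk));
            /* after: a plain array of pointers, 8 bytes apart on 64-bit,
             * exactly what the batch allocator assumes */
            printf("stride after:  %zu bytes\n", sizeof(struct xdp_buff_sk *));
            return 0;
        }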
  4. 25 Jun 2021, 1 commit
    • intel: Remove rcu_read_lock() around XDP program invocation · 49589b23
      Authored by Toke Høiland-Jørgensen
      The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around
      XDP program invocations. However, the actual lifetime of the objects
      referred to by the XDP program invocation is longer, all the way through
      the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
      small. This turns out to be harmless because it all happens in a single
      NAPI poll cycle (and thus under local_bh_disable()), but it makes the
      rcu_read_lock() misleading.
      
      Rather than extend the scope of the rcu_read_lock(), just get rid of it
      entirely. With the addition of RCU annotations to the XDP_REDIRECT map
      types that take bh execution into account, lockdep even understands this to
      be safe, so there's really no reason to keep it around.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
      Cc: intel-wired-lan@lists.osuosl.org
      Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com
      49589b23
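      A stubbed sketch contrasting the two patterns (all helper names are
      hypothetical stand-ins): the explicit lock pair covered too little of
      the objects' lifetime, and with BH disabled for the whole NAPI poll it
      is unnecessary.

        static void rcu_read_lock_stub(void)   {}
        static void rcu_read_unlock_stub(void) {}
        static int  run_xdp_stub(void)         { return 0; }

        static int invoke_xdp_before(void)
        {
            int act;

            rcu_read_lock_stub();                 /* read side formally ends below... */
            act = run_xdp_stub();
            rcu_read_unlock_stub();               /* ...but the objects stay in use
                                                   * until xdp_do_flush(), so the
                                                   * scope was misleadingly small */
            return act;
        }

        static int invoke_xdp_after(void)
        {
            /* no explicit pair: the whole NAPI poll runs under
             * local_bh_disable(), which the RCU annotations on the
             * redirect map types now accept as a valid read side */
            return run_xdp_stub();
        }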
  5. 18 Jun 2021, 1 commit
  6. 07 Jun 2021, 1 commit
  7. 03 Jun 2021, 2 commits
    • ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Authored by Maciej Fijalkowski
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP in copy
      mode. xsk_pool on ice_ring is set only based on the existence of the XDP
      prog on the VSI, which in turn causes ice_clean_tx_irq_zc to be
      executed. That is not something that should happen in copy mode,
      which should use the regular data path, ice_clean_tx_irq.
      
      This results in the following splat when xdpsock is run in txonly or l2fwd
      scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing a bitmap of zero-copy enabled queues, where
      each bit, corresponding to a queue id that an xsk pool is being
      configured on, is set/cleared within ice_xsk_pool_{en,dis}able and
      checked within ice_xsk_pool(), the function used to decide which
      napi poll routine is executed.
      The idea is taken from our other drivers such as i40e and ixgbe.
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      e102db78
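      A minimal user-space sketch of the bitmap idea with hypothetical names
      (the driver itself would use the kernel bitmap API on the VSI): one
      bit per queue id, set/cleared when the pool is enabled/disabled and
      tested when picking the clean routine.

        #include <limits.h>

        enum { MAX_QUEUES = 256 };
        #define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

        static unsigned long zc_qps[MAX_QUEUES / WORD_BITS];

        static void zc_enable(unsigned int qid)   /* ice_xsk_pool_enable()-like */
        {
            zc_qps[qid / WORD_BITS] |= 1UL << (qid % WORD_BITS);
        }

        static void zc_disable(unsigned int qid)  /* ice_xsk_pool_disable()-like */
        {
            zc_qps[qid / WORD_BITS] &= ~(1UL << (qid % WORD_BITS));
        }

        /* ice_xsk_pool()-like check: only a queue with its bit set takes
         * the zero-copy routines; copy mode keeps ice_clean_tx_irq */
        static int zc_enabled(unsigned int qid)
        {
            return !!(zc_qps[qid / WORD_BITS] & (1UL << (qid % WORD_BITS)));
        }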
    • ice: add correct exception tracing for XDP · 89d65df0
      Authored by Magnus Karlsson
      Add missing exception tracing to XDP for a number of different
      errors that can occur. The support was only partial; several errors
      were not logged, which would leave the user quite confused, not
      knowing where and why the packets disappeared.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      89d65df0
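      A stubbed sketch of the tracing pattern (enum values and helpers are
      hypothetical stand-ins; the kernel's tracepoint is
      trace_xdp_exception): every failing verdict path reports before the
      frame is dropped, while an intentional XDP_DROP stays untraced.

        enum act_sk { ACT_ABORTED, ACT_DROP, ACT_PASS, ACT_TX, ACT_REDIRECT };

        static void trace_xdp_exception_stub(enum act_sk act) { (void)act; }
        static int  xmit_stub(void)     { return -1; }  /* pretend the xmit failed */
        static int  redirect_stub(void) { return -1; }  /* pretend redirect failed */

        static int handle_verdict_sk(enum act_sk act)
        {
            switch (act) {
            case ACT_PASS:
                return 0;
            case ACT_DROP:
                return -1;                        /* an intentional drop is not traced */
            case ACT_TX:
                if (xmit_stub() == 0)
                    return 0;
                break;                            /* xmit failure: trace below */
            case ACT_REDIRECT:
                if (redirect_stub() == 0)
                    return 0;
                break;                            /* redirect failure: trace below */
            case ACT_ABORTED:
            default:
                break;
            }
            /* the fix: every error path reports through the exception
             * tracepoint before the frame is dropped, so the user can see
             * where and why packets disappeared */
            trace_xdp_exception_stub(act);
            return -1;
        }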
  8. 15 Apr 2021, 2 commits
  9. 16 Mar 2021, 1 commit
  10. 12 Mar 2021, 1 commit
    • ice: fix napi work done reporting in xsk path · ed0907e3
      Authored by Magnus Karlsson
      Fix the wrong napi work done reporting in the xsk path of the ice
      driver. The code in the main Rx processing loop was written to assume
      that the buffer allocation code returns true if all allocations were
      successful and false if not. In contrast with all other Intel NIC xsk
      drivers, ice_alloc_rx_bufs_zc() has the inverted logic, messing up
      the work done reporting in the napi loop.
      
      This can be fixed either by inverting the return value from
      ice_alloc_rx_bufs_zc() in the function that uses this in an incorrect
      way, or by changing the return value of ice_alloc_rx_bufs_zc(). We
      chose the latter as it makes all the xsk allocation functions for
      Intel NICs behave in the same way. My guess is that it was this
      unexpected discrepancy that gave rise to this bug in the first place.
      
      Fixes: 5bb0c4b5 ("ice, xsk: Move Rx allocation out of while-loop")
      Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ed0907e3
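      A sketch of the corrected reporting under assumed names
      (alloc_rx_bufs_zc_stub and clean_rx_irq_zc_sk are stand-ins): with
      the allocator returning true on full success, the poll loop can
      accumulate failures and claim the whole budget so NAPI keeps polling.

        #include <stdbool.h>

        /* after the fix: returns true only when every requested buffer
         * was allocated, matching the other Intel xsk allocators */
        static bool alloc_rx_bufs_zc_stub(unsigned int count)
        {
            (void)count;
            return true;
        }

        static int clean_rx_irq_zc_sk(int budget)
        {
            bool failure = false;
            int done = 0;

            while (done < budget) {
                /* periodic refill; note the non-inverted sense */
                failure |= !alloc_rx_bufs_zc_stub(64);
                done++;
            }

            /* on allocation failure claim the full budget so NAPI keeps
             * polling instead of wrongly reporting the work as done */
            return failure ? budget : done;
        }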
  11. 13 Feb 2021, 1 commit
  12. 09 Feb 2021, 1 commit
  13. 17 Dec 2020, 1 commit
    • ice, xsk: clear the status bits for the next_to_use descriptor · 8d14768a
      Authored by Björn Töpel
      On the Rx side, the next_to_use index points to the next item in the
      HW ring to be refilled/allocated, and next_to_clean points to the next
      item to potentially be processed.
      
      When the HW Rx ring is fully refilled, i.e. no packets have been
      processed, next_to_use will be next_to_clean - 1. When the ring is
      fully processed, next_to_clean will be equal to next_to_use. The
      latter case is where a bug is triggered.
      
      If the next_to_use bits are not cleared, and the "fully processed"
      state is entered, a stale descriptor can be processed.
      
      The skb path correctly clears the status bit for the next_to_use
      descriptor, but the AF_XDP zero-copy path did not do that.
      
      This change adds the status bits clearing of the next_to_use
      descriptor.
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8d14768a
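      A sketch of the refill-side fix with a hypothetical descriptor layout
      (not the real ice descriptor): zeroing the writeback status at
      next_to_use guarantees the cleaning loop sees DD == 0 when it catches
      up, instead of consuming a stale entry.

        struct rx_desc_sk {
            unsigned long long pkt_addr;          /* buffer DMA address */
            unsigned short status_error0;         /* writeback status; holds the DD bit */
        };

        static void refill_one_sk(struct rx_desc_sk *ring, unsigned int ring_size,
                                  unsigned int *next_to_use, unsigned long long dma)
        {
            struct rx_desc_sk *desc = &ring[*next_to_use];

            desc->pkt_addr = dma;
            /* the fix: zero the status so that when next_to_clean catches
             * up to next_to_use the cleaning loop sees DD == 0 and stops,
             * rather than consuming a stale descriptor */
            desc->status_error0 = 0;

            if (++(*next_to_use) == ring_size)
                *next_to_use = 0;
        }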
  14. 15 Dec 2020, 1 commit
  15. 01 Sep 2020, 3 commits
  16. 01 Aug 2020, 1 commit
  17. 29 Jul 2020, 1 commit
    • ice: need_wakeup flag might not be set for Tx · 682dfedc
      Authored by Krzysztof Kazimierczak
      This is a port of i40e commit 70563957 ("i40e: need_wakeup flag might
      not be set for Tx").
      
      Quoting the original commit message:
      
      "The need_wakeup flag for Tx might not be set for AF_XDP sockets that
      are only used to send packets. This happens if there is at least one
      outstanding packet that has not been completed by the hardware and we
      get that corresponding completion (which will not generate an interrupt
      since interrupts are disabled in the napi poll loop) between the time we
      stopped processing the Tx completions and interrupts are enabled again.
      In this case, the need_wakeup flag will have been cleared at the end of
      the Tx completion processing as we believe we will get an interrupt from
      the outstanding completion at a later point in time. But if this
      completion interrupt occurs before interrupts are enabled, we lose it and
      should at that point really have set the need_wakeup flag since there
      are no more outstanding completions that can generate an interrupt to
      continue the processing. When this happens, user space will see a Tx
      queue need_wakeup of 0 and skip issuing a syscall, which means we will
      never get into the Tx processing again and we have a deadlock."
      
      As a result, packet processing stops. This patch introduces a fix for
      this issue by always setting the need_wakeup flag at the end of
      interrupt processing. This ensures that the deadlock will not happen.
      Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      682dfedc
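      A stubbed sketch of the fix (helper names are hypothetical stand-ins;
      the real API is xsk_set_tx_need_wakeup): the flag is set
      unconditionally at the end of Tx completion processing, closing the
      window in which a lost completion interrupt could deadlock user space.

        static int tx_need_wakeup;                /* stand-in for the pool flag */

        static void xsk_set_tx_need_wakeup_stub(void) { tx_need_wakeup = 1; }

        static void clean_tx_irq_zc_end_sk(void)
        {
            /* old behavior: the flag could be left cleared here while a
             * completion raced with interrupt re-enablement, so user space
             * saw need_wakeup == 0 and never kicked Tx again */

            /* the fix: always request a wakeup at the end of interrupt
             * processing, so there is no state in which neither an
             * interrupt nor a syscall can resume Tx processing */
            xsk_set_tx_need_wakeup_stub();
        }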
  18. 02 Jul 2020, 1 commit
  19. 22 May 2020, 3 commits
  20. 15 May 2020, 1 commit
  21. 20 Feb 2020, 2 commits
  22. 16 Feb 2020, 4 commits
  23. 13 Feb 2020, 1 commit
  24. 18 Jan 2020, 1 commit
  25. 04 Jan 2020, 1 commit
  26. 21 Dec 2019, 1 commit
  27. 23 Nov 2019, 1 commit
  28. 05 Nov 2019, 1 commit