1. 15 Apr, 2020 1 commit
  2. 07 Apr, 2020 1 commit
  3. 29 Feb, 2020 1 commit
  4. 11 Feb, 2020 1 commit
    • xsk: Publish global consumer pointers when NAPI is finished · 30744a68
      Magnus Karlsson committed
      The commit 4b638f13 ("xsk: Eliminate the RX batch size")
      introduced a much lazier way of updating the global consumer
      pointers from the kernel side, by only doing so when running out of
      entries in the fill or Tx rings (the rings consumed by the
      kernel). This can result in a deadlock with the user application if
      the kernel requires more than one entry to proceed and the application
      cannot put these entries in the fill ring because the kernel has not
      updated the global consumer pointer since the ring is not empty.
      
      Fix this by publishing the local kernel side consumer pointer whenever
      we have completed Rx or Tx processing in the kernel. This way, user
      space will have an up-to-date view of the consumer pointers whenever it
      gets to execute in the one core case (application and driver on the
      same core), or after a certain number of packets have been processed
      in the two core case (application and driver on different cores).
      
      A side effect of this patch is that the one core case gets better
      performance, but the two core case gets worse. The reason that the one
      core case improves is that updating the global consumer pointer is
      relatively cheap since the application by definition is not running
      when the kernel is (they are on the same core) and it is beneficial
      for the application, once it gets to run, to have pointers that are
      as up to date as possible since it then can operate on more packets
      and buffers. In the two core case, the most important performance
      aspect is to minimize the number of accesses to the global pointers
      since they are shared between two cores and bounce between the caches
      of those cores. This patch results in more updates to global state,
      which means lower performance in the two core case.
      
      Fixes: 4b638f13 ("xsk: Eliminate the RX batch size")
      Reported-by: Ryan Goodfellow <rgoodfel@isi.edu>
      Reported-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Link: https://lore.kernel.org/bpf/1581348432-6747-1-git-send-email-magnus.karlsson@intel.com
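The deadlock described in this commit message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: `struct ring`, `RING_SIZE`, and every function name below are hypothetical and are not taken from the kernel sources, and the memory barriers that a real shared ring needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4u /* hypothetical ring size for the illustration */

struct ring {
    uint32_t producer;    /* advanced by the application */
    uint32_t consumer;    /* global consumer pointer, visible to the application */
    uint32_t cached_cons; /* kernel-local consumer pointer */
};

/* Free slots as the application sees them through the global pointers. */
static uint32_t app_free_slots(const struct ring *r)
{
    return RING_SIZE - (r->producer - r->consumer);
}

/* Entries the kernel can still consume, based on its local pointer. */
static uint32_t kernel_avail(const struct ring *r)
{
    return r->producer - r->cached_cons;
}

/* Application side: enqueue one entry if the global view shows room. */
static bool app_produce(struct ring *r)
{
    if (app_free_slots(r) == 0)
        return false;
    r->producer++;
    return true;
}

/* Kernel side: consume n entries, advancing only the local pointer. */
static void kernel_consume(struct ring *r, uint32_t n)
{
    assert(n <= kernel_avail(r));
    r->cached_cons += n;
}

/* Lazy policy: publish the local pointer only once the ring has run empty. */
static void publish_lazy(struct ring *r)
{
    if (kernel_avail(r) == 0)
        r->consumer = r->cached_cons;
}

/* Policy described by the fix: publish whenever a processing pass ends. */
static void publish_on_completion(struct ring *r)
{
    r->consumer = r->cached_cons;
}

/*
 * The application fills the ring and the kernel consumes 3 of 4 entries.
 * One entry is left, so a kernel that needs two entries cannot proceed.
 * Returns true if the application is able to add another entry.
 */
static bool app_can_refill(bool lazy)
{
    struct ring r = { 0, 0, 0 };

    for (uint32_t i = 0; i < RING_SIZE; i++)
        app_produce(&r);
    kernel_consume(&r, 3);
    if (lazy)
        publish_lazy(&r);
    else
        publish_on_completion(&r);
    return app_produce(&r);
}
```

In this model, `app_can_refill(true)` fails because the ring is not empty and the global pointer is therefore never refreshed, while `app_can_refill(false)` succeeds because the application sees the three slots the kernel has already consumed.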
  5. 01 Feb, 2020 2 commits
    • mm, tree-wide: rename put_user_page*() to unpin_user_page*() · f1f6a7dd
      John Hubbard committed
      Rename put_user_page*() to unpin_user_page*() in order to provide a
      clearer, more symmetric API for pinning and unpinning DMA pages.  This
      way, pin_user_pages*() calls match up with unpin_user_pages*() calls,
      and the API is a lot closer to being self-explanatory.
      
      Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net/xdp: set FOLL_PIN via pin_user_pages() · fb48b474
      John Hubbard committed
      Convert net/xdp to use the new pin_user_pages() call, which sets
      FOLL_PIN.  Setting FOLL_PIN is now required for code that requires
      tracking of pinned pages.
      
      In partial anticipation of this work, the net/xdp code was already calling
      put_user_page() instead of put_page().  Therefore, in order to convert
      from the get_user_pages()/put_page() model to the
      pin_user_pages()/put_user_page() model, the only change required here is
      to change get_user_pages() to pin_user_pages().
      
      Link: http://lkml.kernel.org/r/20200107224558.2362728-18-jhubbard@nvidia.com
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 22 Jan, 2020 1 commit
  7. 16 Jan, 2020 1 commit
  8. 21 Dec, 2019 12 commits
  9. 20 Dec, 2019 1 commit
  10. 19 Dec, 2019 1 commit
    • xsk: Add rcu_read_lock around the XSK wakeup · 06870682
      Maxim Mikityanskiy committed
      The XSK wakeup callback in drivers makes some sanity checks before
      triggering NAPI. However, some configuration changes may occur during
      this function that affect the result of those checks. For example, the
      interface can go down, and all the resources will be destroyed after the
      checks in the wakeup function, but before it attempts to use these
      resources. Wrap this callback in rcu_read_lock to allow the driver to
      call synchronize_rcu before actually destroying the resources.
      
      xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
      wrapped into the RCU lock. After this commit, xsk_poll starts using
      xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide
      whether ndo_xsk_wakeup should be called. It also fixes a bug introduced with the
      need_wakeup feature: a non-zero-copy socket may be used with a driver
      supporting zero-copy, and in this case ndo_xsk_wakeup should not be
      called, so the xs->zc check is the correct one.
      
      Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com
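The control flow described in this commit message can be sketched in user space with stand-ins for the kernel pieces. Everything below is an assumption made for illustration: the `mock_` types and functions are hypothetical, and the mock read-side lock only counts nesting depth instead of providing real RCU semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for rcu_read_lock()/rcu_read_unlock(); they only track depth. */
static int rcu_depth;
static void mock_rcu_read_lock(void)   { rcu_depth++; }
static void mock_rcu_read_unlock(void) { rcu_depth--; }

struct mock_dev {
    /* Plays the role of the ndo_xsk_wakeup callback. */
    int (*wakeup)(struct mock_dev *dev, unsigned int queue_id,
                  unsigned int flags);
    int wakeups;    /* how many times the callback ran */
    int depth_seen; /* read-side nesting depth observed inside the callback */
};

struct mock_sock {
    struct mock_dev *dev;
    unsigned int queue_id;
    bool zc; /* true only if this socket is bound in zero-copy mode */
};

/* A driver callback that records that it ran inside the read-side section. */
static int mock_driver_wakeup(struct mock_dev *dev, unsigned int queue_id,
                              unsigned int flags)
{
    (void)queue_id;
    (void)flags;
    dev->wakeups++;
    dev->depth_seen = rcu_depth;
    return 0;
}

/* Wrapper in the spirit of xsk_wakeup: call the driver under the read lock. */
static int wakeup_sketch(struct mock_sock *xs, unsigned int flags)
{
    int err;

    assert(xs->dev != NULL && xs->dev->wakeup != NULL);
    mock_rcu_read_lock();
    err = xs->dev->wakeup(xs->dev, xs->queue_id, flags);
    mock_rcu_read_unlock();
    return err;
}

/* Poll path: decide on the socket's own mode, not on the callback's presence. */
static void poll_sketch(struct mock_sock *xs, unsigned int flags)
{
    if (xs->zc)
        wakeup_sketch(xs, flags);
}
```

With this model, a copy-mode socket attached to a device that provides the callback never triggers it, which mirrors the bug fix mentioned in the message.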
  11. 25 Nov, 2019 1 commit
  12. 02 Nov, 2019 1 commit
  13. 24 Oct, 2019 1 commit
  14. 03 Oct, 2019 1 commit
  15. 25 Sep, 2019 2 commits
  16. 19 Sep, 2019 1 commit
  17. 05 Sep, 2019 4 commits
  18. 31 Aug, 2019 1 commit
  19. 21 Aug, 2019 1 commit
  20. 20 Aug, 2019 1 commit
  21. 18 Aug, 2019 3 commits
    • xsk: remove AF_XDP socket from map when the socket is released · 0402acd6
      Björn Töpel committed
      When an AF_XDP socket is released/closed the XSKMAP still holds a
      reference to the socket in a "released" state. The socket will still
      use the netdev queue resource, and block newly created sockets from
      attaching to that queue, but no user application can access the
      fill/complete/rx/tx queues. As a result, every application has to
      explicitly clear the map entry of the old "zombie state" socket. This
      should be done automatically.
      
      In this patch, the socket tracks, and holds a reference to, the maps
      it resides in. When the socket is released, it removes itself from
      all maps.
      
      Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
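The bookkeeping idea in this commit message, a socket remembering which map slots point at it so it can clear them on release, can be sketched as follows. This is a simplified model rather than the kernel's data structure: all names, the fixed-size arrays, and the limits `MAP_SLOTS` and `MAX_MAP_REFS` are hypothetical, and locking and reference counting are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SLOTS    8 /* hypothetical map capacity */
#define MAX_MAP_REFS 4 /* hypothetical limit on slots per socket */

struct sock_sketch;

struct map_sketch {
    struct sock_sketch *slots[MAP_SLOTS];
};

struct sock_sketch {
    /* Addresses of the map slots that currently point at this socket. */
    struct sock_sketch **where[MAX_MAP_REFS];
    size_t n_refs;
};

/* Insert the socket into a map slot and remember where it was placed. */
static int map_insert(struct map_sketch *m, size_t idx, struct sock_sketch *s)
{
    if (idx >= MAP_SLOTS || m->slots[idx] != NULL || s->n_refs == MAX_MAP_REFS)
        return -1;
    m->slots[idx] = s;
    s->where[s->n_refs++] = &m->slots[idx];
    return 0;
}

/* On release, the socket clears every slot that still refers to it. */
static void sock_release(struct sock_sketch *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->n_refs; i++)
        *s->where[i] = NULL;
    s->n_refs = 0;
}
```

After `sock_release()`, no map holds a stale "zombie" entry, so the application does not have to clear the slots by hand.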
    • xsk: add support for need_wakeup flag in AF_XDP rings · 77cd0d7b
      Magnus Karlsson committed
      This commit adds support for a new flag called need_wakeup in the
      AF_XDP Tx and fill rings. When this flag is set, it means that the
      application has to explicitly wake up the kernel Rx (for the bit in
      the fill ring) or kernel Tx (for the bit in the Tx ring) processing by
      issuing a syscall. poll() can wake up both, depending on the flags
      submitted, and sendto() will wake up Tx processing only.
      
      The main reason for introducing this new flag is to be able to
      efficiently support the case when the application and driver are executing
      on the same core. Previously, the driver was just busy-spinning on the
      fill ring if it ran out of buffers in the HW and there were none on
      the fill ring. This approach works when the application is running on
      another core as it can replenish the fill ring while the driver is
      busy-spinning. However, this is a lousy approach if both of them are
      running on the same core as the probability of the fill ring getting
      more entries when the driver is busy-spinning is zero. With this new
      feature the driver now sets the need_wakeup flag and returns to the
      application. The application can then replenish the fill queue and
      then explicitly wake up the Rx processing in the kernel using the
      syscall poll(). For Tx, the flag is only set to one if the driver has
      no outstanding Tx completion interrupts. If it has some, the flag is
      zero as it will be woken up by a completion interrupt anyway.
      
      As a nice side effect, this new flag also improves the performance of
      the case where application and driver are running on two different
      cores as it reduces the number of syscalls to the kernel. The kernel
      tells user space if it needs to be woken up by a syscall, and this
      eliminates many of the syscalls.
      
      This flag needs some simple driver support. If the driver does not
      support this, the Rx flag is always zero and the Tx flag is always
      one. This makes any application relying on this feature default to the
      old behaviour of not requiring any syscalls in the Rx path and always
      having to call sendto() in the Tx path.
      
      For backwards compatibility reasons, this feature has to be explicitly
      turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
      that you always turn it on, as it has so far always had a positive
      performance impact.
      
      The name and inspiration of the flag have been taken from io_uring by
      Jens Axboe. Details about this feature in io_uring can be found in
      http://kernel.dk/io_uring.pdf, section 8.3.
      
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup · 9116e5e2
      Magnus Karlsson committed
      This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
      ndo provides the same functionality as before but with the addition of
      a new flags field that is used to specify if Rx, Tx or both should be
      woken up. The previous ndo only woke up Tx, as implied by the
      name. The i40e and ixgbe drivers (which are all the supported ones)
      are updated with this new interface.
      
      This new ndo will be used by the new need_wakeup functionality of XDP
      sockets that need to be able to wake up both Rx and Tx driver
      processing.
      
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
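The difference between a Tx-only callback and one that takes a flags field can be sketched with a mock queue. This is an illustration under assumptions: `WAKEUP_RX`, `WAKEUP_TX`, and the `mock_` names are hypothetical stand-ins, their values are assumed, and a real driver may simply schedule NAPI for either flag instead of keeping separate counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Rx and Tx wakeup flags; values assumed. */
#define WAKEUP_RX (1u << 0)
#define WAKEUP_TX (1u << 1)

struct mock_queue {
    int rx_kicks; /* times Rx processing was woken up */
    int tx_kicks; /* times Tx processing was woken up */
};

/* New-style callback: the caller states which direction(s) to wake up. */
static int mock_wakeup(struct mock_queue *q, unsigned int flags)
{
    assert(q != NULL);
    if (flags & ~(WAKEUP_RX | WAKEUP_TX))
        return -1; /* unknown flag bits are rejected in this sketch */
    if (flags & WAKEUP_RX)
        q->rx_kicks++;
    if (flags & WAKEUP_TX)
        q->tx_kicks++;
    return 0;
}

/* Old-style behaviour expressed through the new callback: Tx only. */
static int mock_async_xmit(struct mock_queue *q)
{
    return mock_wakeup(q, WAKEUP_TX);
}
```

Expressing the old call as `mock_wakeup(q, WAKEUP_TX)` shows why the new interface is a superset of the previous one.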
  22. 10 Aug, 2019 1 commit