1. 25 Nov, 2020 (1 commit)
  2. 20 Nov, 2020 (1 commit)
  3. 29 Oct, 2020 (1 commit)
    • xsk: Fix possible memory leak at socket close · e5e1a4bc
      Committed by Magnus Karlsson
      Fix a possible memory leak at xsk socket close that is caused by the
      refcounting of the umem object being wrong. The reference count of the
      umem was decremented only after the pool had been freed. Note that if
      the buffer pool is destroyed, it is important that the umem is
      destroyed after the pool, otherwise the umem would disappear while the
      driver is still running. And as the buffer pool needs to be destroyed
      in a work queue, the umem is also (if its refcount reaches zero)
      destroyed after the buffer pool in that same work queue.
      
      What was missing is that the refcount also needs to be decremented
      when the pool is not freed and when the pool has not even been
      created. The first case happens when the refcount of the pool is
      higher than 1, i.e. it is still being used by some other socket using
      the same device and queue id. In this case, it is safe to decrement
      the refcount of the umem outside of the work queue as the umem will
      never be freed because the refcount of the umem is always greater than
      or equal to the refcount of the buffer pool. The second case is if the
      buffer pool has not been created yet, i.e. the socket was closed
      before it was bound but after the umem was created. In this case, it
      is safe to destroy the umem outside of the work queue, since there is
      no pool that can use it by definition.
      
      Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
      Reported-by: syzbot+eb71df123dc2be2c1456@syzkaller.appspotmail.com
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/1603801921-2712-1-git-send-email-magnus.karlsson@gmail.com
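The three release paths described above can be modeled in a small userspace C sketch. This is an illustration only, not the actual net/xdp/xsk.c code: the struct and function names are simplified stand-ins, and the work queue is reduced to a direct call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the xsk teardown ordering. */
struct xsk_umem { int refcount; bool freed; };
struct xsk_buff_pool { int refcount; struct xsk_umem *umem; bool freed; };

/* Drop one umem reference; free it when the count hits zero. */
static void umem_put(struct xsk_umem *umem)
{
	if (--umem->refcount == 0)
		umem->freed = true;
}

/* Models the work-queue path: the pool is destroyed first, and only
 * then is the umem reference dropped, so the umem can never vanish
 * while the driver is still using the pool. */
static void pool_destroy_deferred(struct xsk_buff_pool *pool)
{
	pool->freed = true;
	umem_put(pool->umem);
}

/* Socket close: the fix is that the umem reference is dropped on every
 * path, not only when the pool itself is freed. */
static void socket_release(struct xsk_buff_pool *pool, struct xsk_umem *umem)
{
	if (!pool) {
		/* Closed before bind: no pool exists, drop umem now. */
		umem_put(umem);
		return;
	}
	if (--pool->refcount > 0) {
		/* Pool still shared by another socket. Since
		 * umem->refcount >= pool->refcount always holds, this
		 * put cannot free the umem. */
		umem_put(umem);
		return;
	}
	pool_destroy_deferred(pool);	/* last user: pool first, then umem */
}
```

With two sockets sharing one pool, the umem outlives the pool and is only freed by the deferred path; a socket closed before bind drops its umem reference directly.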
  4. 29 Sep, 2020 (1 commit)
    • xsk: Fix possible crash in socket_release when out-of-memory · 1fd17c8c
      Committed by Magnus Karlsson
      Fix possible crash in socket_release when an out-of-memory error has
      occurred in the bind call. If a socket using the XDP_SHARED_UMEM flag
      encountered an error in xp_create_and_assign_umem, the bind code
      jumped to the exit routine but forgot to set the err value before
      jumping. This meant that the exit routine thought the setup went
      well and set the state of the socket to XSK_BOUND. At application
      exit, the xsk socket release code would then treat this as a
      properly set up socket, when it is not, and crash because many
      fields in the socket were never initialized. Fix this by setting
      the err variable in xsk_bind, so that the socket is never set to
      XSK_BOUND and the XSK_BOUND-only clean-up path in xsk_release is
      not triggered for the half-initialized socket.
      
      Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
      Reported-by: syzbot+ddc7b4944bc61da19b81@syzkaller.appspotmail.com
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/1601112373-10595-1-git-send-email-magnus.karlsson@gmail.com
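The error-path bug can be illustrated with a hedged C sketch of the bind flow. This is not the real xsk_bind; xp_create_and_assign_umem is a stub here so the out-of-memory case can be forced, and the exit routine is reduced to the one state decision that matters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum xsk_state { XSK_READY, XSK_BOUND };
struct xsk_sock { enum xsk_state state; void *pool; };

/* Stub: returns NULL to simulate the out-of-memory failure. */
static void *xp_create_and_assign_umem(int force_oom)
{
	return force_oom ? NULL : (void *)0x1;
}

static int xsk_bind(struct xsk_sock *xs, int force_oom)
{
	int err = 0;

	xs->pool = xp_create_and_assign_umem(force_oom);
	if (!xs->pool) {
		err = -ENOMEM;	/* the fix: set err before jumping out */
		goto out;
	}
	/* ... rest of the bind setup ... */
out:
	if (!err)		/* exit routine: only mark bound on success */
		xs->state = XSK_BOUND;
	return err;
}
```

Without the `err = -ENOMEM` assignment, the jump to `out` would leave err at 0 and the exit routine would mark the half-initialized socket XSK_BOUND, which is exactly the bug being fixed.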
  5. 17 Sep, 2020 (1 commit)
  6. 03 Sep, 2020 (1 commit)
  7. 02 Sep, 2020 (1 commit)
    • xsk: Fix possible segfault at xskmap entry insertion · 968be23c
      Committed by Magnus Karlsson
      Fix possible segfault when entry is inserted into xskmap. This can
      happen if the socket is in a state where the umem has been set up, the
      Rx ring created but it has yet to be bound to a device. In this case
      the pool has not yet been created and we cannot reference it for the
      existence of the fill ring. Fix this by removing the whole
      xsk_is_setup_for_bpf_map function. Once upon a time, it was used to
      make sure that the Rx and fill rings were set up before the driver
      could call xsk_rcv, since there are no tests for the existence of
      these rings in the data path. But these days, we have a state
      variable that we test instead. When it is XSK_BOUND, everything has
      been set up correctly and the socket has been bound, so there is no
      reason to keep the xsk_is_setup_for_bpf_map function anymore.
      
      Fixes: 7361f9c3 ("xsk: Move fill and completion rings to buffer pool")
      Reported-by: syzbot+febe51d44243fbc564ee@syzkaller.appspotmail.com
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/1599037569-26690-1-git-send-email-magnus.karlsson@intel.com
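A minimal sketch of the fixed check (the types are simplified stand-ins; the real check sits in the xskmap update path, and the old, crash-prone version would have dereferenced the not-yet-existing pool to look for the fill ring):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum xsk_state { XSK_READY, XSK_BOUND, XSK_UNBOUND };
struct xsk_sock { enum xsk_state state; void *pool; };

/* Before the fix, map insertion probed xs->pool for the fill ring,
 * but the pool does not exist until bind, so that dereference could
 * fault. After the fix, only the state variable is consulted: when it
 * is XSK_BOUND, the pool and all rings are guaranteed to exist. */
static bool xsk_map_insert_ok(const struct xsk_sock *xs)
{
	return xs->state == XSK_BOUND;
}
```

The dangerous window is exactly the one the commit describes: umem and Rx ring created, pool still NULL, socket not yet bound.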
  8. 01 Sep, 2020 (7 commits)
  9. 28 Jul, 2020 (1 commit)
  10. 25 Jul, 2020 (1 commit)
  11. 14 Jul, 2020 (1 commit)
  12. 12 Jun, 2020 (1 commit)
  13. 22 May, 2020 (4 commits)
  14. 05 May, 2020 (1 commit)
  15. 27 Apr, 2020 (1 commit)
  16. 07 Apr, 2020 (1 commit)
  17. 11 Feb, 2020 (1 commit)
    • xsk: Publish global consumer pointers when NAPI is finished · 30744a68
      Committed by Magnus Karlsson
      Commit 4b638f13 ("xsk: Eliminate the RX batch size")
      introduced a much more lazy way of updating the global consumer
      pointers from the kernel side, by only doing so when running out of
      entries in the fill or Tx rings (the rings consumed by the
      kernel). This can result in a deadlock with the user application if
      the kernel requires more than one entry to proceed and the application
      cannot put these entries in the fill ring because the kernel has not
      updated the global consumer pointer since the ring is not empty.
      
      Fix this by publishing the local kernel side consumer pointer whenever
      we have completed Rx or Tx processing in the kernel. This way, user
      space will have an up-to-date view of the consumer pointers whenever it
      gets to execute in the one core case (application and driver on the
      same core), or after a certain number of packets have been processed
      in the two core case (application and driver on different cores).
      
      A side effect of this patch is that the one core case gets better
      performance, but the two core case gets worse. The reason that the one
      core case improves is that updating the global consumer pointer is
      relatively cheap since the application by definition is not running
      when the kernel is (they are on the same core) and it is beneficial
      for the application, once it gets to run, to have pointers that are
      as up to date as possible since it then can operate on more packets
      and buffers. In the two core case, the most important performance
      aspect is to minimize the number of accesses to the global pointers
      since they are shared between two cores and bounces between the caches
      of those cores. This patch results in more updates to global state,
      which means lower performance in the two core case.
      
      Fixes: 4b638f13 ("xsk: Eliminate the RX batch size")
      Reported-by: Ryan Goodfellow <rgoodfel@isi.edu>
      Reported-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Link: https://lore.kernel.org/bpf/1581348432-6747-1-git-send-email-magnus.karlsson@intel.com
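The lazy-versus-eager publishing can be sketched with a single-producer/single-consumer ring model. The field names loosely follow the kernel's cached-consumer scheme, but this is an illustration of the idea, not the actual xsk ring code:

```c
#include <assert.h>
#include <stdint.h>

struct ring {
	uint32_t producer;	/* advanced by user space */
	uint32_t consumer;	/* global pointer that user space reads */
	uint32_t cached_cons;	/* kernel-private consumer position */
};

/* Consume up to budget entries, touching only the private position. */
static uint32_t ring_consume(struct ring *r, uint32_t budget)
{
	uint32_t avail = r->producer - r->cached_cons;
	uint32_t n = avail < budget ? avail : budget;

	r->cached_cons += n;
	return n;
}

/* The fix: publish the private position whenever Rx or Tx processing
 * completes, so user space never deadlocks waiting on a consumer
 * pointer that the kernel is holding back. */
static void ring_publish_consumer(struct ring *r)
{
	r->consumer = r->cached_cons;
}
```

Before the fix, ring_publish_consumer was only called when the ring ran dry; if the kernel needed more entries while the ring looked non-empty to user space, neither side could make progress.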
  18. 22 Jan, 2020 (1 commit)
  19. 21 Dec, 2019 (5 commits)
  20. 20 Dec, 2019 (1 commit)
  21. 19 Dec, 2019 (1 commit)
    • xsk: Add rcu_read_lock around the XSK wakeup · 06870682
      Committed by Maxim Mikityanskiy
      The XSK wakeup callback in drivers makes some sanity checks before
      triggering NAPI. However, some configuration changes may occur during
      this function that affect the result of those checks. For example, the
      interface can go down, and all the resources will be destroyed after the
      checks in the wakeup function, but before it attempts to use these
      resources. Wrap this callback in rcu_read_lock to allow the driver
      to synchronize_rcu before actually destroying the resources.
      
      xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
      wrapped in the RCU lock. After this commit, xsk_poll starts using
      xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to
      decide whether ndo_xsk_wakeup should be called. It also fixes a bug
      introduced with the
      need_wakeup feature: a non-zero-copy socket may be used with a driver
      supporting zero-copy, and in this case ndo_xsk_wakeup should not be
      called, so the xs->zc check is the correct one.
      
      Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com
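The shape of the wrapper can be sketched as follows. rcu_read_lock/rcu_read_unlock are stubbed with a plain counter here; in the kernel they are the real RCU read-side primitives that pair with the driver's synchronize_rcu before resource teardown.

```c
#include <assert.h>
#include <stdbool.h>

static int rcu_depth;	/* stand-in for the RCU read-side critical section */
static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { rcu_depth--; }

struct xsk_sock { bool zc; int wakeups; };

/* Driver callback; must only run inside the read-side section so the
 * driver can wait out in-flight callers before freeing resources. */
static void ndo_xsk_wakeup(struct xsk_sock *xs)
{
	assert(rcu_depth > 0);
	xs->wakeups++;
}

/* After the fix: xs->zc gates the call (a copy-mode socket on a
 * zero-copy-capable driver must not wake the driver), and the call is
 * wrapped in the RCU lock. */
static void xsk_wakeup(struct xsk_sock *xs)
{
	if (!xs->zc)
		return;
	rcu_read_lock();
	ndo_xsk_wakeup(xs);
	rcu_read_unlock();
}
```

The xs->zc gate is the second fix from the commit message: a non-NULL ndo_xsk_wakeup only says the driver supports zero-copy, not that this particular socket is running in zero-copy mode.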
  22. 25 Nov, 2019 (1 commit)
  23. 02 Nov, 2019 (1 commit)
  24. 03 Oct, 2019 (1 commit)
  25. 25 Sep, 2019 (1 commit)
  26. 05 Sep, 2019 (2 commits)
    • xsk: use state member for socket synchronization · 42fddcc7
      Committed by Björn Töpel
      Before the state variable was introduced by Ilya, the dev member
      was used to determine whether the socket was bound or not. However,
      when dev was read, proper SMP barriers and READ_ONCE were missing.
      In order to address the missing barriers and READ_ONCE, we start
      using the state variable as a point of synchronization. The state
      member read/write is paired with proper SMP barriers, and from this
      it follows that the members described above do not need READ_ONCE
      if used in conjunction with the state check.
      
      In all syscalls and the xsk_rcv path we check if state is
      XSK_BOUND. If that is the case we do an SMP read barrier, and this
      implies that the dev, umem and all rings are correctly set up. Note
      that no READ_ONCE is needed for these variables if they are used
      when state is XSK_BOUND (plus the read barrier).
      
      To summarize: the struct xdp_sock members dev, queue_id, umem, fq,
      cq, tx, rx, and state were read lock-less, with incorrect barriers
      and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and
      state are read lock-less. When these members are updated,
      WRITE_ONCE is used. When read, READ_ONCE is only used when reading
      outside the control mutex (e.g. in mmap) or when not synchronized
      with the state member (XSK_BOUND plus smp_rmb()).
      
      Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE,
      due to the introduced state synchronization (XSK_BOUND plus
      smp_rmb()).
      
      Introducing the state check also fixes a race, found by syzkaller,
      in xsk_poll() where umem could be accessed when stale.
      Suggested-by: Hillf Danton <hdanton@sina.com>
      Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
      Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
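The publication pattern can be expressed with userspace C11 atomics: a release store stands in for the kernel's plain state write paired with smp_wmb(), and an acquire load for the state read paired with smp_rmb(). Member names are simplified; this is a model of the ordering contract, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { XSK_READY, XSK_BOUND };

struct xsk_sock {
	void *dev, *umem, *rx;	/* plain members, guarded by state */
	_Atomic int state;
};

/* Bind side: fully initialize the members, then publish the state with
 * release ordering so the earlier writes become visible first. */
static void xsk_mark_bound(struct xsk_sock *xs, void *dev, void *umem,
			   void *rx)
{
	xs->dev = dev;
	xs->umem = umem;
	xs->rx = rx;
	atomic_store_explicit(&xs->state, XSK_BOUND, memory_order_release);
}

/* Reader side (syscalls, xsk_rcv): an acquire load that observes
 * XSK_BOUND guarantees dev, umem and the rings are set up, with no
 * READ_ONCE needed on those members themselves. */
static bool xsk_is_bound(struct xsk_sock *xs)
{
	return atomic_load_explicit(&xs->state,
				    memory_order_acquire) == XSK_BOUND;
}
```

This is the same publish/subscribe shape the commit describes: the state member is the single synchronization point, and everything written before it was published may be read plainly afterwards.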
    • xsk: avoid store-tearing when assigning umem · 9764f4b3
      Committed by Björn Töpel
      The umem member of struct xdp_sock is read outside of the control
      mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
      potential store-tearing.
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
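WRITE_ONCE and READ_ONCE can be approximated in portable C with relaxed atomics. A sketch of the pattern (not the kernel macros themselves, which are compiler-barrier-based volatile accesses):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* xs->umem is read locklessly in the mmap path, so the assignment must
 * be one untorn store; a relaxed atomic store/load models
 * WRITE_ONCE/READ_ONCE. */
struct xsk_sock { void *_Atomic umem; };

static void xsk_assign_umem(struct xsk_sock *xs, void *umem)
{
	/* WRITE_ONCE(xs->umem, umem): a single store, never torn */
	atomic_store_explicit(&xs->umem, umem, memory_order_relaxed);
}

static void *xsk_read_umem(struct xsk_sock *xs)
{
	/* READ_ONCE(xs->umem) on the lockless mmap path */
	return atomic_load_explicit(&xs->umem, memory_order_relaxed);
}
```

Without the annotation, the compiler is free to split the pointer assignment into multiple partial stores, and a concurrent lockless reader could observe a half-written pointer.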