- 01 9月, 2020 3 次提交
-
-
由 Magnus Karlsson 提交于
Create and free the buffer pool independently from the umem. Move these operations that are performed on the buffer pool from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also erradicate the back pointer from the umem to the buffer pool as this will not work when we introduce the possibility to have multiple buffer pools per umem. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly it is about replacing the umem name from the function names with xsk_buff and also have them take the a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h. This so that it will be clearer what they do and also for consistency. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
-
- 10 6月, 2020 1 次提交
-
-
由 Michel Lespinasse 提交于
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 6月, 2020 1 次提交
-
-
由 Pavel Machek 提交于
64bit division is kind of expensive, and shift should do the job here. Signed-off-by: NPavel Machek (CIP) <pavel@denx.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 5月, 2020 1 次提交
-
-
由 Björn Töpel 提交于
The npgs member of struct xdp_umem is an u32 entity, and stores the number of pages the UMEM consumes. The calculation of npgs npgs = size / PAGE_SIZE can overflow. To avoid overflow scenarios, the division is now first stored in a u64, and the result is verified to fit into 32b. An alternative would be storing the npgs as a u64, however, this wastes memory and is an unrealisticly large packet area. Fixes: c0c77d8f ("xsk: add user memory registration support sockopt") Reported-by: N"Minh Bùi Quang" <minhquangbui99@gmail.com> Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/CACtPs=GGvV-_Yj6rbpzTVnopgi5nhMoCcTkSkYrJHGQHJWFZMQ@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20200525080400.13195-1-bjorn.topel@gmail.com
-
- 22 5月, 2020 2 次提交
-
-
由 Björn Töpel 提交于
There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding code, including the "handle" member of struct xdp_buff. rfc->v1: Fixed spelling in commit message. (Björn) Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com
-
由 Björn Töpel 提交于
In order to simplify AF_XDP zero-copy enablement for NIC driver developers, a new AF_XDP buffer allocation API is added. The implementation is based on a single core (single producer/consumer) buffer pool for the AF_XDP UMEM. A buffer is allocated using the xsk_buff_alloc() function, and returned using xsk_buff_free(). If a buffer is disassociated with the pool, e.g. when a buffer is passed to an AF_XDP socket, a buffer is said to be released. Currently, the release function is only used by the AF_XDP internals and not visible to the driver. Drivers using this API should register the XDP memory model with the new MEM_TYPE_XSK_BUFF_POOL type. The API is defined in net/xdp_sock_drv.h. The buffer type is struct xdp_buff, and follows the lifetime of regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to a NAPI context. In other words, the API is not replacing xdp_frames. In addition to introducing the API and implementations, the AF_XDP core is migrated to use the new APIs. rfc->v1: Fixed build errors/warnings for m68k and riscv. (kbuild test robot) Added headroom/chunk size getter. (Maxim/Björn) v1->v2: Swapped SoBs. (Maxim) v2->v3: Initialize struct xdp_buff member frame_sz. (Björn) Add API to query the DMA address of a frame. (Maxim) Do DMA sync for CPU till the end of the frame to handle possible growth (frame_sz). (Maxim) Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-6-bjorn.topel@gmail.com
-
- 05 5月, 2020 2 次提交
-
-
由 Magnus Karlsson 提交于
Remove the unnecessary member of address in struct xdp_umem as it is only used during the umem registration. No need to carry this around as it is not used during run-time nor when unregistering the umem. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1588599232-24897-3-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Change two variables names so that it is clearer what they represent. The first one is xsk_list that in fact only contains the list of AF_XDP sockets with a Tx component. Change this to xsk_tx_list for improved clarity. The second variable is size in the ring structure. One might think that this is the size of the ring, but it is in fact the size of the umem, copied into the ring structure to improve performance. Rename this variable umem_size to avoid any confusion. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1588599232-24897-2-git-send-email-magnus.karlsson@intel.com
-
- 15 4月, 2020 1 次提交
-
-
由 Magnus Karlsson 提交于
Add a check that the headroom cannot be larger than the available space in the chunk. In the current code, a malicious user can set the headroom to a value larger than the chunk size minus the fixed XDP headroom. That way packets with a length larger than the supported size in the umem could get accepted and result in an out-of-bounds write. Fixes: c0c77d8f ("xsk: add user memory registration support sockopt") Reported-by: NBui Quang Minh <minhquangbui99@gmail.com> Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://bugzilla.kernel.org/show_bug.cgi?id=207225 Link: https://lore.kernel.org/bpf/1586849715-23490-1-git-send-email-magnus.karlsson@intel.com
-
- 01 2月, 2020 2 次提交
-
-
由 John Hubbard 提交于
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.comSigned-off-by: NJohn Hubbard <jhubbard@nvidia.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 John Hubbard 提交于
Convert net/xdp to use the new pin_longterm_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. In partial anticipation of this work, the net/xdp code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Link: http://lkml.kernel.org/r/20200107224558.2362728-18-jhubbard@nvidia.comSigned-off-by: NJohn Hubbard <jhubbard@nvidia.com> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 1月, 2020 1 次提交
-
-
由 Magnus Karlsson 提交于
When registering a umem area that is sufficiently large (>1G on an x86), kmalloc cannot be used to allocate one of the internal data structures, as the size requested gets too large. Use kvmalloc instead that falls back on vmalloc if the allocation is too large for kmalloc. Also add accounting for this structure as it is triggered by a user space action (the XDP_UMEM_REG setsockopt) and it is by far the largest structure of kernel allocated memory in xsk. Reported-by: NRyan Goodfellow <rgoodfel@isi.edu> Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1578995365-7050-1-git-send-email-magnus.karlsson@intel.com
-
- 24 10月, 2019 1 次提交
-
-
由 Magnus Karlsson 提交于
Having Rx-only AF_XDP sockets can potentially lead to a crash in the system by a NULL pointer dereference in xsk_umem_consume_tx(). This function iterates through a list of all sockets tied to a umem and checks if there are any packets to send on the Tx ring. Rx-only sockets do not have a Tx ring, so this will cause a NULL pointer dereference. This will happen if you have registered one or more Rx-only sockets to a umem and the driver is checking the Tx ring even on Rx, or if the XDP_SHARED_UMEM mode is used and there is a mix of Rx-only and other sockets tied to the same umem. Fixed by only putting sockets with a Tx component on the list that xsk_umem_consume_tx() iterates over. Fixes: ac98d8aa ("xsk: wire upp Tx zero-copy functions") Reported-by: NKal Cutter Conley <kal.conley@dectris.com> Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1571645818-16244-1-git-send-email-magnus.karlsson@intel.com
-
- 25 9月, 2019 1 次提交
-
-
由 John Hubbard 提交于
For pages that were retained via get_user_pages*(), release those pages via the new put_user_page*() routines, instead of via put_page() or release_pages(). This is part a tree-wide conversion, as described in fc1d8e7c ("mm: introduce put_user_page*(), placeholder versions"). Link: http://lkml.kernel.org/r/20190724044537.10458-4-jhubbard@nvidia.comSigned-off-by: NJohn Hubbard <jhubbard@nvidia.com> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 9月, 2019 1 次提交
-
-
由 Björn Töpel 提交于
This patch removes the 64B alignment of the UMEM headroom. There is really no reason for it, and having a headroom less than 64B should be valid. Fixes: c0c77d8f ("xsk: add user memory registration support sockopt") Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 31 8月, 2019 1 次提交
-
-
由 Kevin Laatz 提交于
Currently, addresses are chunk size aligned. This means, we are very restricted in terms of where we can place chunk within the umem. For example, if we have a chunk size of 2k, then our chunks can only be placed at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0). This patch introduces the ability to use unaligned chunks. With these changes, we are no longer bound to having to place chunks at a 2k (or whatever your chunk size is) interval. Since we are no longer dealing with aligned chunks, they can now cross page boundaries. Checks for page contiguity have been added in order to keep track of which pages are followed by a physically contiguous page. Signed-off-by: NKevin Laatz <kevin.laatz@intel.com> Signed-off-by: NCiara Loftus <ciara.loftus@intel.com> Signed-off-by: NBruce Richardson <bruce.richardson@intel.com> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 21 8月, 2019 1 次提交
-
-
由 Ivan Khoronzhuk 提交于
For 64-bit there is no reason to use vmap/vunmap, so use page_address as it was initially. For 32 bits, in some apps, like in samples xdpsock_user.c when number of pgs in use is quite big, the kmap memory can be not enough, despite on this, kmap looks like is deprecated in such cases as it can block and should be used rather for dynamic mm. Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 20 8月, 2019 1 次提交
-
-
由 Ivan Khoronzhuk 提交于
Fix mem leak caused by missed unpin routine for umem pages. Fixes: 8aef7340 ("xsk: introduce xdp_umem_page") Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 18 8月, 2019 2 次提交
-
-
由 Magnus Karlsson 提交于
This commit adds support for a new flag called need_wakeup in the AF_XDP Tx and fill rings. When this flag is set, it means that the application has to explicitly wake up the kernel Rx (for the bit in the fill ring) or kernel Tx (for bit in the Tx ring) processing by issuing a syscall. Poll() can wake up both depending on the flags submitted and sendto() will wake up tx processing only. The main reason for introducing this new flag is to be able to efficiently support the case when application and driver is executing on the same core. Previously, the driver was just busy-spinning on the fill ring if it ran out of buffers in the HW and there were none on the fill ring. This approach works when the application is running on another core as it can replenish the fill ring while the driver is busy-spinning. Though, this is a lousy approach if both of them are running on the same core as the probability of the fill ring getting more entries when the driver is busy-spinning is zero. With this new feature the driver now sets the need_wakeup flag and returns to the application. The application can then replenish the fill queue and then explicitly wake up the Rx processing in the kernel using the syscall poll(). For Tx, the flag is only set to one if the driver has no outstanding Tx completion interrupts. If it has some, the flag is zero as it will be woken up by a completion interrupt anyway. As a nice side effect, this new flag also improves the performance of the case where application and driver are running on two different cores as it reduces the number of syscalls to the kernel. The kernel tells user space if it needs to be woken up by a syscall, and this eliminates many of the syscalls. This flag needs some simple driver support. If the driver does not support this, the Rx flag is always zero and the Tx flag is always one. This makes any application relying on this feature default to the old behaviour of not requiring any syscalls in the Rx path and always having to call sendto() in the Tx path. For backwards compatibility reasons, this feature has to be explicitly turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend that you always turn it on as it so far always have had a positive performance impact. The name and inspiration of the flag has been taken from io_uring by Jens Axboe. Details about this feature in io_uring can be found in http://kernel.dk/io_uring.pdf, section 8.3. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Magnus Karlsson 提交于
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new ndo provides the same functionality as before but with the addition of a new flags field that is used to specifiy if Rx, Tx or both should be woken up. The previous ndo only woke up Tx, as implied by the name. The i40e and ixgbe drivers (which are all the supported ones) are updated with this new interface. This new ndo will be used by the new need_wakeup functionality of XDP sockets that need to be able to wake up both Rx and Tx driver processing. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 10 8月, 2019 1 次提交
-
-
由 Ivan Khoronzhuk 提交于
Use kmap instead of page_address as it's not always in low memory. Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 12 7月, 2019 1 次提交
-
-
由 Ilya Maximets 提交于
There are 2 call chains: a) xsk_bind --> xdp_umem_assign_dev b) unregister_netdevice_queue --> xsk_notifier with the following locking order: a) xs->mutex --> rtnl_lock b) rtnl_lock --> xdp.lock --> xs->mutex Different order of taking 'xs->mutex' and 'rtnl_lock' could produce a deadlock here. Fix that by moving the 'rtnl_lock' before 'xs->lock' in the bind call chain (a). Reported-by: syzbot+bf64ec93de836d7f4c2c@syzkaller.appspotmail.com Fixes: 455302d1 ("xdp: fix hang while unregistering device bound to xdp socket") Signed-off-by: NIlya Maximets <i.maximets@samsung.com> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 03 7月, 2019 2 次提交
-
-
由 Ilya Maximets 提交于
Device that bound to XDP socket will not have zero refcount until the userspace application will not close it. This leads to hang inside 'netdev_wait_allrefs()' if device unregistering requested: # ip link del p1 < hang on recvmsg on netlink socket > # ps -x | grep ip 5126 pts/0 D+ 0:00 ip link del p1 # journalctl -b Jun 05 07:19:16 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1 Jun 05 07:19:27 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1 ... Fix that by implementing NETDEV_UNREGISTER event notification handler to properly clean up all the resources and unref device. This should also allow socket killing via ss(8) utility. Fixes: 965a9909 ("xsk: add support for bind for Rx") Signed-off-by: NIlya Maximets <i.maximets@samsung.com> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Ilya Maximets 提交于
Device pointer stored in umem regardless of zero-copy mode, so we heed to hold the device in all cases. Fixes: c9b47cc1 ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Signed-off-by: NIlya Maximets <i.maximets@samsung.com> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 12 6月, 2019 1 次提交
-
-
由 Ilya Maximets 提交于
We should not call 'ndo_bpf()' or 'dev_put()' with NULL argument. Fixes: c9b47cc1 ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Signed-off-by: NIlya Maximets <i.maximets@samsung.com> Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 15 5月, 2019 1 次提交
-
-
由 Ira Weiny 提交于
Pach series "Add FOLL_LONGTERM to GUP fast and use it". HFI1, qib, and mthca, use get_user_pages_fast() due to its performance advantages. These pages can be held for a significant time. But get_user_pages_fast() does not protect against mapping FS DAX pages. Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which retains the performance while also adding the FS DAX checks. XDP has also shown interest in using this functionality.[1] In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and remove the specialized get_user_pages_longterm call. [1] https://lkml.org/lkml/2019/3/19/939 "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Secondly, it depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use *_fast. There are patches submitted to the RDMA list which would allow the use of *_fast (they reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use *_fast. As an aside, Jasons pointed out in my previous submission that *_fast and *_unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. This patch (of 7): This patch starts a series which aims to support FOLL_LONGTERM in get_user_pages_fast(). Some callers who would like to do a longterm (user controlled pin) of pages with the fast variant of GUP for performance purposes. Rather than have a separate get_user_pages_longterm() call, introduce FOLL_LONGTERM and change the longterm callers to use it. This patch does not change any functionality. In the short term "longterm" or user controlled pins are unsafe for Filesystems and FS DAX in particular has been blocked. However, callers of get_user_pages_fast() were not "protected". FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it requires vmas to determine if DAX is in use. NOTE: In merging with the CMA changes we opt to change the get_user_pages() call in check_and_migrate_cma_pages() to a call of __get_user_pages_locked() on the newly migrated pages. This makes the code read better in that we are calling __get_user_pages_locked() on the pages before and after a potential migration. As a side affect some of the interfaces are cleaned up but this is not the primary purpose of the series. In review[1] it was asked: <quote> > This I don't get - if you do lock down long term mappings performance > of the actual get_user_pages call shouldn't matter to start with. > > What do I miss? A couple of points. First "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Second, It depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use *_fast. There are patches submitted to the RDMA list which would allow the use of *_fast (they reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use *_fast. As an asside, Jasons pointed out in my previous submission that *_fast and *_unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. </quote> [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965 [ira.weiny@intel.com: v3] Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 3月, 2019 1 次提交
-
-
由 Björn Töpel 提交于
When the umem is cleaned up, the task that created it might already be gone. If the task was gone, the xdp_umem_release function did not free the pages member of struct xdp_umem. It turned out that the task lookup was not needed at all; The code was a left-over when we moved from task accounting to user accounting [1]. This patch fixes the memory leak by removing the task lookup logic completely. [1] https://lore.kernel.org/netdev/20180131135356.19134-3-bjorn.topel@gmail.com/ Link: https://lore.kernel.org/netdev/c1cb2ca8-6a14-3980-8672-f3de0bb38dfd@suse.cz/ Fixes: c0c77d8f ("xsk: add user memory registration support sockopt") Reported-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 13 2月, 2019 1 次提交
-
-
由 Björn Töpel 提交于
Commit c9b47cc1 ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") stores the umem into the netdev._rx struct. However, the patch incorrectly removed the umem from the netdev._rx struct when user-space passed "best-effort" mode (i.e. select the fastest possible option available), and zero-copy mode was not available. This commit fixes that. Fixes: c9b47cc1 ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 12 2月, 2019 1 次提交
-
-
由 Davidlohr Bueso 提交于
Holding mmap_sem exclusively for a gup() is an overkill. Lets share the lock and replace the gup call for gup_longterm(), as it is better suited for the lifetime of the pinning. Fixes: c0c77d8f ("xsk: add user memory registration support sockopt") Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Bjorn Topel <bjorn.topel@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> CC: netdev@vger.kernel.org Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 25 1月, 2019 1 次提交
-
-
由 Björn Töpel 提交于
This commit adds an id to the umem structure. The id uniquely identifies a umem instance, and will be exposed to user-space via the socket monitoring interface. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 22 1月, 2019 1 次提交
-
-
由 Jan Sokolowski 提交于
Export xdp_get_umem_from_qid for other modules to use. Signed-off-by: NJan Sokolowski <jan.sokolowski@intel.com> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
- 16 1月, 2019 1 次提交
-
-
由 Krzysztof Kazimierczak 提交于
In the xdp_umem_assign_dev() path, the xsk code does not check if a queue for which umem is to be created exists. It leads to a situation where umem is not assigned to any Tx/Rx queue of a netdevice, without notifying the stack about an error. This affects both XDP_SKB and XDP_DRV modes - in case of XDP_DRV_ZC, queue index is checked by the driver. This patch fixes xsk code, so that in both XDP_SKB and XDP_DRV mode of AF_XDP, an error is returned when requested queue index exceedes an existing maximum. Fixes: c9b47cc1 ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") Reported-by: NJakub Spizewski <jakub.spizewski@intel.com> Signed-off-by: NKrzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 08 10月, 2018 1 次提交
-
-
由 Björn Töpel 提交于
The AF_XDP socket struct can exist in three different, implicit states: setup, bound and released. Setup is prior the socket has been bound to a device. Bound is when the socket is active for receive and send. Released is when the process/userspace side of the socket is released, but the sock object is still lingering, e.g. when there is a reference to the socket in an XSKMAP after process termination. The Rx fast-path code uses the "dev" member of struct xdp_sock to check whether a socket is bound or relased, and the Tx code uses the struct xdp_umem "xsk_list" member in conjunction with "dev" to determine the state of a socket. However, the transition from bound to released did not tear the socket down in correct order. On the Rx side "dev" was cleared after synchronize_net() making the synchronization useless. On the Tx side, the internal queues were destroyed prior removing them from the "xsk_list". This commit corrects the cleanup order, and by doing so xdp_del_sk_umem() can be simplified and one synchronize_net() can be removed. Fixes: 965a9909 ("xsk: add support for bind for Rx") Fixes: ac98d8aa ("xsk: wire upp Tx zero-copy functions") Reported-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 05 10月, 2018 3 次提交
-
-
由 Magnus Karlsson 提交于
As we now do not allow ethtool to deactivate the queue id we are running an AF_XDP socket on, we can simplify the implementation of xdp_clear_umem_at_qid(). Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
We already check the RSS indirection table does not use queues which would be disabled by channel reconfiguration. Make sure user does not try to disable queues which have a UMEM and zero-copy AF_XDP socket installed. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Magnus Karlsson 提交于
Previously, the xsk code did not record which umem was bound to a specific queue id. This was not required if all drivers were zero-copy enabled as this had to be recorded in the driver anyway. So if a user tried to bind two umems to the same queue, the driver would say no. But if copy-mode was first enabled and then zero-copy mode (or the reverse order), we mistakenly enabled both of them on the same umem leading to buggy behavior. The main culprit for this is that we did not store the association of umem to queue id in the copy case and only relied on the driver reporting this. As this relation was not stored in the driver for copy mode (it does not rely on the AF_XDP NDOs), this obviously could not work. This patch fixes the problem by always recording the umem to queue id relationship in the netdev_queue and netdev_rx_queue structs. This way we always know what kind of umem has been bound to a queue id and can act appropriately at bind time. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 26 9月, 2018 1 次提交
-
-
由 Jakub Kicinski 提交于
XSK UMEM is strongly single producer single consumer so reuse of frames is challenging. Add a simple "stash" of FILL packets to reuse for drivers to optionally make use of. This is useful when driver has to free (ndo_stop) or resize a ring with an active AF_XDP ZC socket. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
- 01 9月, 2018 1 次提交
-
-
由 Magnus Karlsson 提交于
This commit gets rid of the structure xdp_umem_props. It was there to be able to break a dependency at one point, but this is no longer needed. The values in the struct are instead stored directly in the xdp_umem structure. This simplifies the xsk code as well as af_xdp zero-copy drivers and as a bonus gets rid of one internal header file. The i40e driver is also adapted to the new interface in this commit. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-