- 04 Jul 2022, 1 commit

Submitted by Matthew Wilcox (Oracle)
Saves 11 bytes of text by removing a check of PageTail.

Link: https://lkml.kernel.org/r/20220617175020.717127-16-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

- 13 May 2022, 1 commit

Submitted by Jie Wang
Currently, when page pool allocation stats are used to analyze an RX performance degradation problem, the stats only count pages allocated via page_pool_alloc_pages(). But NIC drivers such as hns3 use page_pool_dev_alloc_frag() to allocate pages, so allocations made through this API should be counted as well.

Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 15 Apr 2022, 1 commit

Submitted by Lorenzo Bianconi
Introduce page_pool APIs to report stats through ethtool and reduce duplicated code in each driver.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
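
A sketch of how a driver might wire these helpers into its ethtool callbacks. The page_pool helper names follow this stats API; `struct drv_priv` and its `page_pool` field are illustrative assumptions:

```c
#include <linux/ethtool.h>
#include <net/page_pool.h>

struct drv_priv { struct page_pool *page_pool; };	/* hypothetical */

static int drv_get_sset_count(struct net_device *dev, int sset)
{
	return sset == ETH_SS_STATS ? page_pool_ethtool_stats_get_count() : 0;
}

static void drv_get_strings(struct net_device *dev, u32 sset, u8 *data)
{
	if (sset == ETH_SS_STATS)
		data = page_pool_ethtool_stats_get_strings(data);
}

static void drv_get_ethtool_stats(struct net_device *dev,
				  struct ethtool_stats *stats, u64 *data)
{
	struct drv_priv *priv = netdev_priv(dev);
	struct page_pool_stats pp_stats = { };

	if (page_pool_get_stats(priv->page_pool, &pp_stats))
		data = page_pool_ethtool_stats_get(data, &pp_stats);
}
```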

- 12 Apr 2022, 1 commit

Submitted by Lorenzo Bianconi
Add missing recycle stats to the page_pool_put_page_bulk routine.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/3712178b51c007cfaed910ea80e68f00c916b1fa.1649685634.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

- 03 Mar 2022, 3 commits

Submitted by Joe Damato
Adds a function page_pool_get_stats() which can be used by drivers to obtain stats for a specified page_pool.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
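
A minimal sketch of the call, assuming `pool` is a valid page_pool pointer and the stats config option is enabled:

```c
struct page_pool_stats stats = { };

/* Fills "stats" with the pool's allocation and recycling counters. */
if (page_pool_get_stats(pool, &stats))
	pr_info("fast-path allocs: %llu, recycled to cache: %llu\n",
		(unsigned long long)stats.alloc_stats.fast,
		(unsigned long long)stats.recycle_stats.cached);
```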

Submitted by Joe Damato
Add per-cpu stats tracking page pool recycling events:

- cached: recycling placed page in the page pool cache
- cache_full: page pool cache was full
- ring: page placed into the ptr ring
- ring_full: page released from page pool because the ptr ring was full
- released_refcnt: page released (and not recycled) because refcnt > 1

Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
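
The counters map onto a per-cpu structure along these lines (a sketch mirroring the list above, not a verbatim header excerpt):

```c
struct page_pool_recycle_stats {
	u64 cached;		/* page placed into the pool cache */
	u64 cache_full;		/* page pool cache was full */
	u64 ring;		/* page placed into the ptr ring */
	u64 ring_full;		/* released because the ptr ring was full */
	u64 released_refcnt;	/* released, not recycled: refcnt > 1 */
};
```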

Submitted by Joe Damato
Add per-pool statistics counters for the allocation path of a page pool. These stats are incremented in softirq context, so no locking or per-cpu variables are needed. This code is disabled by default and a kernel config option is provided for users who wish to enable them. The statistics added are:

- fast: successful fast path allocations
- slow: slow path order-0 allocations
- slow_high_order: slow path high order allocations
- empty: ptr ring is empty, so a slow path allocation was forced
- refill: an allocation which triggered a refill of the cache
- waive: pages obtained from the ptr ring that cannot be added to the cache due to a NUMA mismatch

Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
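
A sketch of the corresponding per-pool counter structure, mirroring the list above:

```c
struct page_pool_alloc_stats {
	u64 fast;		/* successful fast-path allocations */
	u64 slow;		/* slow-path order-0 allocations */
	u64 slow_high_order;	/* slow-path high-order allocations */
	u64 empty;		/* ptr ring empty, slow path forced */
	u64 refill;		/* allocation triggered a cache refill */
	u64 waive;		/* ptr-ring page waived: NUMA mismatch */
};
```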

- 03 Feb 2022, 1 commit

Submitted by Alexander Duyck
This change is meant to permit a driver to perform "fragmenting" of the page from within the driver instead of the current model, which requires pre-partitioning the page. The main motivation behind this is to support use cases where the page will be split up by the driver after DMA instead of before.

With this change it becomes possible to start using page pool to replace some of the existing use cases where multiple references were being used for a single page, but the number needed was unknown as the size could be dynamic.

For example, with this code it would be possible to do something like the following to handle allocation:

    page = page_pool_alloc_pages();
    if (!page)
        return NULL;
    page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX);
    rx_buf->page = page;
    rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX;

Then we would process a received buffer by handling it with:

    rx_buf->pagecnt_bias--;

Once the page has been fully consumed we could then flush the remaining instances with:

    if (page_pool_defrag_page(page, rx_buf->pagecnt_bias))
        continue;
    page_pool_put_defragged_page(pool, page, -1, !!budget);

The general idea is that we want to have the ability to allocate a page with excess fragment count and then trim off the unneeded fragments.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
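
For reference, the helpers used in the example above have roughly these shapes (a sketch inferred from the description, not a verbatim header excerpt):

```c
/* Reserve "nr" fragment references on a freshly allocated page. */
void page_pool_fragment_page(struct page *page, long nr);

/* Drop "nr" fragment references; returns the remaining count. */
long page_pool_defrag_page(struct page *page, long nr);

/* Return a page whose fragment references were already dropped. */
void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
				  unsigned int dma_sync_size,
				  bool allow_direct);
```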

- 10 Jan 2022, 1 commit

Submitted by Yunsheng Lin
page_pool_refill_alloc_cache() is only called by __page_pool_get_cached(), which assumes non-concurrent access (as suggested by the comment in __page_pool_get_cached()), and ptr_ring already allows concurrent access between consumer and producer, so remove the spinlock in page_pool_refill_alloc_cache().

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20220107090042.13605-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

- 06 Jan 2022, 2 commits

Submitted by Toke Høiland-Jørgensen
Store the XDP mem ID inside the page_pool struct so it can be retrieved later for use in bpf_prog_run().

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20220103150812.87914-4-toke@redhat.com

Submitted by Toke Høiland-Jørgensen
Add a new callback function to page_pool that, if set, will be called every time a new page is allocated. This will be used from bpf_test_run() to initialise the page data with the data provided by userspace when running XDP programs with redirect turned on.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20220103150812.87914-3-toke@redhat.com
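
A sketch of a pool using the new callback; `fill_page`, `struct fill_ctx` and `ctx` are illustrative, while `init_callback`/`init_arg` are the fields this change adds:

```c
static void fill_page(struct page *page, void *arg)
{
	struct fill_ctx *ctx = arg;	/* hypothetical context type */

	memcpy(page_address(page), ctx->data, ctx->len);
}

struct page_pool_params pp_params = {
	.order		= 0,
	.pool_size	= 256,
	.init_callback	= fill_page,	/* run for every newly allocated page */
	.init_arg	= ctx,
};
```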

- 18 Nov 2021, 1 commit

Submitted by Yunsheng Lin
This reverts commit d00e60ee.

As reported by Guillaume in [1]: enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT on 32-bit systems, which breaks the bootup process when an ethernet driver uses page pool with the PP_FLAG_DMA_MAP flag. We had hoped there were no active consumers of such systems when we removed the DMA mapping support, but LPAE seems to be a common feature on 32-bit systems, so revert it.

1. https://www.spinics.net/lists/netdev/msg779890.html

Fixes: d00e60ee ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: "kernelci.org bot" <bot@kernelci.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 15 Oct 2021, 1 commit

Submitted by Yunsheng Lin
As 32-bit architectures with 64-bit DMA seem to be rare these days, and supporting them would add a lot of code and complexity to page pool for systems that may not even exist, disable DMA mapping support for such systems. If drivers really want to work on such systems, they have to implement their own DMA-mapping fallback tracking outside page_pool.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
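
Roughly, the guard this adds to pool creation looks like the following (a sketch; the exact placement inside page_pool_init() is an assumption):

```c
/* DMA mapping cannot be supported when dma_addr_t does not fit in
 * an unsigned long, i.e. a 32-bit arch with 64-bit DMA.
 */
if ((pool->p.flags & PP_FLAG_DMA_MAP) &&
    sizeof(dma_addr_t) > sizeof(unsigned long))
	return -EOPNOTSUPP;
```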

- 24 Aug 2021, 1 commit

Submitted by Yunsheng Lin
There is no need to synchronize the accounting update, so use relaxed atomics to avoid memory barriers in the data path.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
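
A sketch of the release-side accounting with the relaxed variant; no ordering is required between the counter update and the surrounding code:

```c
u32 release_cnt;

/* Relaxed: only the count matters, not its ordering vs. other stores. */
release_cnt = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
trace_page_pool_state_release(pool, page, release_cnt);
```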

- 10 Aug 2021, 3 commits

Submitted by Yunsheng Lin
Currently page pool only supports page recycling when there is a single user of the page, and the split-page reuse implemented in most drivers cannot use the page pool, as the ping-pong way of reusing requires multi-user support in page pool.

Those reuse and recycling schemes have the following limitations:

1. A page from page pool can only be used by one user in order for the page recycling to happen.
2. The ping-pong way of reusing in most drivers does not support multiple descriptors using different parts of the same page in order to save memory.

So add multi-user support and frag page recycling in page pool to overcome the above limitations.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
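
A sketch of RX buffer setup with the frag API; the pool is assumed to be created with PP_FLAG_PAGE_FRAG, and `rx_buf`/`RX_BUF_SIZE` are illustrative:

```c
unsigned int offset;
struct page *page;

/* Carve an RX_BUF_SIZE fragment out of a (possibly shared) pool page. */
page = page_pool_dev_alloc_frag(pool, &offset, RX_BUF_SIZE);
if (!page)
	return -ENOMEM;

rx_buf->page = page;
rx_buf->page_offset = offset;
rx_buf->dma = page_pool_get_dma_addr(page) + offset;
```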

Submitted by Yunsheng Lin
For 32-bit systems with 64-bit DMA, dma_addr[1] is used to store the upper 32 bits of the DMA address, but such systems should be rare these days. On normal systems, dma_addr[1] in 'struct page' is not used, so we can reuse dma_addr[1] for storing the frag count, which means how many frags this page might be split into.

In order to simplify the page frag support in the page pool, the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to indicate 32-bit systems with 64-bit DMA, and the page frag support in page pool is disabled for such systems.

The newly added page_pool_set_frag_count() is called to reserve the maximum frag count before any page frag is passed to the user. page_pool_atomic_sub_frag_count_return() is called when the user is done with the page frag.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Submitted by Yunsheng Lin
Currently, page->pp is cleared and set every time the page is recycled, which is unnecessary. So only set page->pp when the page is added to the page pool, and only clear it when the page is released from the page pool. This is also a preparation for supporting frag page allocation in page pool.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

- 09 Aug 2021, 1 commit

Submitted by Yunsheng Lin
As mentioned in commit c07aea3e ("mm: add a signature in struct page"):

"The page->signature field is aliased to page->lru.next and page->compound_head."

And as the comment in page_is_pfmemalloc() says:

"lru.next has bit 1 set if the page is allocated from the pfmemalloc reserves. Callers may simply overwrite it if they do not need to preserve that information."

page->signature is OR'ed with PP_SIGNATURE when a page is allocated in page pool, see __page_pool_alloc_pages_slow(), and page->signature is compared directly with PP_SIGNATURE in page_pool_return_skb_page(), which might cause a resource leak for a page from page pool if bit 1 of lru.next is set for a pfmemalloc page. What happens here is that the original pp->signature is OR'ed with PP_SIGNATURE after the allocation in order to preserve any existing bits (such as bit 1, used to indicate a pfmemalloc page), so when those bits are present, those pages are not considered to be from page pool and their DMA mapping is left stale.

As bit 0 is for page->compound_head, mask both bits 0 and 1 before the check in page_pool_return_skb_page(). And we will return those pfmemalloc pages back to the page allocator after cleaning up the DMA mapping.

Fixes: 6a5bcd84 ("page_pool: Allow drivers to hint on SKB recycling")
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
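
The resulting check masks both low bits before comparing, roughly:

```c
/* In page_pool_return_skb_page(): bit 0 may be set by compound_head,
 * bit 1 by a pfmemalloc allocation, so mask both before comparing.
 */
if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
	return false;
```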

- 08 Jun 2021, 2 commits

Submitted by Ilias Apalodimas
Up to now several high speed NICs have custom mechanisms for recycling the allocated memory they use for their payloads. Our page_pool API already has recycling capabilities that are always used when we are running in 'XDP mode'. So let's tweak the API and the kernel network stack slightly and allow the recycling to happen even during standard operation. The API doesn't take into account the 'split page' policies used by those drivers currently, but can be extended once we have users for that.

The idea is to be able to intercept the packet at skb_release_data(). If it's a buffer coming from our page_pool API, recycle it back to the pool for further usage; otherwise just release the packet entirely.

To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and a field in struct page (page->pp) to store the page_pool pointer. Storing the information in page->pp allows us to recycle both SKBs and their fragments. We could have skipped the skb bit entirely, since identical information can be derived from struct page. However, in an effort to affect the free path as little as possible, reading a single bit in the skb, which is already in cache, is better than trying to derive identical information from the page-stored data. The driver or page_pool has to take care of the sync operations on its own during buffer recycling, since the buffer is, after opting in to recycling, never unmapped.

Since the gain for drivers depends on the architecture, we are not enabling recycling by default if the page_pool API is used by a driver. In order to enable recycling, the driver must call skb_mark_for_recycle() to store the information we need for recycling in page->pp and enable the recycling bit, or page_pool_store_mem_info() for a fragment.

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
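
A sketch of a driver opting an RX skb into recycling. The three-argument form follows the description above, but the exact signature of skb_mark_for_recycle() has changed across kernel versions, so treat this as illustrative:

```c
skb = build_skb(page_address(page), PAGE_SIZE);
if (skb)
	/* Sets skb->pp_recycle and records the pool in page->pp. */
	skb_mark_for_recycle(skb, page, pool);
```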

Submitted by Matteo Croce
This is needed by the page_pool to avoid recycling a page not allocated via page_pool. The page->signature field is aliased to page->lru.next and page->compound_head, but it can't be set by mistake because the signature value is a bad pointer, and can't trigger a false positive in PageTail() because the last bit is 0.

Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 15 May 2021, 1 commit

Submitted by Matthew Wilcox (Oracle)
32-bit architectures which expect 8-byte alignment for 8-byte integers and need 64-bit DMA addresses (arm, mips, ppc) had their struct page inadvertently expanded in 2019. When the dma_addr_t was added, it forced the alignment of the union to 8 bytes, which inserted a 4 byte gap between 'flags' and the union.

Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. This restores the alignment to that of an unsigned long. We always store the low bits in the first word to prevent the PageTail bit from being inadvertently set on a big endian platform. If that happened, get_user_pages_fast() racing against a page which was freed and reallocated to the page_pool could dereference a bogus compound_head(), which would be hard to trace back to this cause.

Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org
Fixes: c25fff71 ("mm: add dma_addr_t to struct page")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
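
The accessors that come with this fix look roughly like the following; the `<< 16 << 16` split avoids an invalid shift-count warning when dma_addr_t is only 32 bits wide:

```c
static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
{
	dma_addr_t ret = page->dma_addr[0];

	if (sizeof(dma_addr_t) > sizeof(unsigned long))
		ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
	return ret;
}

static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
{
	page->dma_addr[0] = addr;
	if (sizeof(dma_addr_t) > sizeof(unsigned long))
		page->dma_addr[1] = upper_32_bits(addr);
}
```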

- 01 May 2021, 2 commits

Submitted by Jesper Dangaard Brouer
There are cases where the page_pool needs to refill with pages from the page allocator. Some workloads cause the page_pool to release pages instead of recycling them. For these workloads it can improve performance to bulk alloc pages from the page allocator to refill the alloc cache.

One such case is an XDP-redirect workload with a 100G mlx5 driver (which uses page_pool), redirecting xdp_frame packets into a veth that does XDP_PASS to create an SKB from the xdp_frame; the SKB then cannot return the page to the page_pool. Performance results are under the GitHub xdp-project[1]:

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org

Mel: The patch "net: page_pool: convert to use alloc_pages_bulk_array variant" was squashed with this patch. From the test page, the array variant was superior, with one of the test results as follows:

    Kernel      XDP stats       CPU     pps         Delta
    Baseline    XDP-RX CPU      total   3,771,046     n/a
    List        XDP-RX CPU      total   3,940,242   +4.49%
    Array       XDP-RX CPU      total   4,249,224  +12.68%

Link: https://lkml.kernel.org/r/20210325114228.27719-10-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
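
A sketch of the refill using the array-based bulk allocator; the DMA-map and page-init steps of the real slow path are omitted:

```c
unsigned int bulk = PP_ALLOC_CACHE_REFILL;
unsigned long nr_pages;

/* Fill the (empty) alloc cache directly from the page allocator. */
nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
pool->alloc.count = nr_pages;
```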

Submitted by Jesper Dangaard Brouer
In preparation for the next patch, move the dma mapping into its own function, as this will make it easier to follow the changes.

[ilias.apalodimas: make page_pool_dma_map return boolean]

Link: https://lkml.kernel.org/r/20210325114228.27719-9-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 05 Feb 2021, 1 commit

Submitted by Alexander Lobakin
pool_page_reusable() is a leftover from pre-NUMA-aware times. For now, this function is just a redundant wrapper over page_is_pfmemalloc(), so inline it into its sole call site.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

- 14 Nov 2020, 1 commit

Submitted by Lorenzo Bianconi
Introduce the capability to batch the page_pool ptr_ring refill, since it is usually run inside the driver NAPI tx completion loop.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
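
A sketch of the bulk-return entry point as used from a TX completion loop; `drv_next_completed_page()` and the batch size are illustrative:

```c
void *batch[16];
int count = 0;
struct page *page;

/* Collect completed pages, then hand them back in one locked pass. */
while (count < 16 && (page = drv_next_completed_page()))	/* hypothetical */
	batch[count++] = page;

if (count)
	page_pool_put_page_bulk(pool, batch, count);
```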

- 30 Mar 2020, 1 commit

Submitted by Denis Kirjanov
The page pool API can be useful for non-DMA cases like the xen-netfront driver, so allow passing zero to the page pool flags.

v2: check DMA direction only if PP_FLAG_DMA_MAP is set

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
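
A sketch of such a non-DMA pool; the sizes are illustrative:

```c
struct page_pool_params pp_params = {
	.flags		= 0,	/* no PP_FLAG_DMA_MAP: pool does no DMA */
	.order		= 0,
	.pool_size	= 256,
	.nid		= NUMA_NO_NODE,
};
struct page_pool *pool = page_pool_create(&pp_params);

if (IS_ERR(pool))
	return PTR_ERR(pool);
```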

- 21 Feb 2020, 1 commit

Submitted by Ilias Apalodimas
Functions starting with __ usually indicate those which are exported, but should not be called directly. Update some of those declared in the API and make it more readable.

page_pool_unmap_page() and page_pool_release_page() were doing exactly the same thing, calling __page_pool_clean_page(). Rename __page_pool_clean_page() to page_pool_release_page() and export it in order to show up on perf logs, and get rid of page_pool_unmap_page(). Finally, rename __page_pool_put_page() to page_pool_put_page(), since we can now directly call it from drivers, and rename the existing page_pool_put_page() to page_pool_put_full_page(), since they do the same thing but the latter syncs the full DMA area.

This patch also updates the netsec, mvneta and stmmac drivers which use those functions.

Suggested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 14 Feb 2020, 1 commit

Submitted by Li RongQing
"do {} while" in page_pool_refill_alloc_cache will always refill page once whether refill is true or false, and whether alloc.count of pool is less than PP_ALLOC_CACHE_REFILL or not this is wrong, and will cause overflow of pool->alloc.cache the caller of __page_pool_get_cached should provide guarantee that pool->alloc.cache is safe to access, so in_serving_softirq should be removed as suggested by Jesper Dangaard Brouer in https://patchwork.ozlabs.org/patch/1233713/ so fix this issue by calling page_pool_refill_alloc_cache() only when pool->alloc.count is zero Fixes: 44768dec ("page_pool: handle page recycle for NUMA_NO_NODE condition") Signed-off-by: NLi RongQing <lirongqing@baidu.com> Suggested-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>

- 03 Jan 2020, 2 commits

Submitted by Jesper Dangaard Brouer
When the kernel is compiled without NUMA support, the page_pool NUMA config setting (pool->p.nid) doesn't make any practical sense, but the compiler cannot see that it can remove the code paths.

This patch avoids reading the pool->p.nid setting in case of !CONFIG_NUMA, in the allocation and NUMA-check code, which helps the compiler see the optimisation potential. It leaves the update code intact to keep the API the same.

    $ ./scripts/bloat-o-meter net/core/page_pool.o-numa-enabled \
                              net/core/page_pool.o-numa-disabled
    add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-113 (-113)
    Function                        old     new   delta
    page_pool_create                401     398      -3
    __page_pool_alloc_pages_slow    439     426     -13
    page_pool_refill_alloc_cache    425     328     -97
    Total: Before=3611, After=3498, chg -3.13%

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Submitted by Jesper Dangaard Brouer
The check in pool_page_reusable() (page_to_nid(page) == pool->p.nid) is not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.

The goal of the NUMA changes in commit d5394610 ("page_pool: Don't recycle non-reusable pages") was to have RX-pages that belong to the same NUMA node as the CPU processing RX-packets during softirq/NAPI, as illustrated by the performance measurements.

This patch moves the NUMA checks out of the fast path, and at the same time solves the NUMA_NO_NODE issue.

First realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE will look up the current CPU nid (NUMA ID) via numa_mem_id(), which is used as the preferred nid. It is only in rare situations, where e.g. a NUMA zone runs dry, that a page doesn't get allocated from the preferred nid. The page_pool API allows drivers to control the nid themselves via controlling pool->p.nid.

This patch moves the NUMA check to when the alloc cache is refilled, via dequeuing/consuming pages from the ptr_ring. Thus, we can allow placing pages from remote NUMA nodes into the ptr_ring, as the dequeue/consume step will check the NUMA node. All current drivers using page_pool will alloc/refill the RX-ring from the same CPU running the softirq/NAPI process.

Drivers that control the nid explicitly also use page_pool_update_nid() when changing the nid at runtime. To speed up the transition to a new nid, the alloc cache is now flushed on nid changes. This forces pages to come from the ptr_ring, which does the appropriate nid check.

For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA node, we accept that transitioning the alloc cache doesn't happen immediately. The preferred nid changes at runtime via consulting numa_mem_id() based on the CPU processing RX-packets.

Notice, to avoid stressing the page buddy allocator and to avoid doing too much work under softirq with preempt disabled, the NUMA check at ptr_ring dequeue will break the refill cycle when detecting a NUMA mismatch. This will cause a slower transition, but it's done on purpose.

Fixes: d5394610 ("page_pool: Don't recycle non-reusable pages")
Reported-by: Li RongQing <lirongqing@baidu.com>
Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
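
A sketch of the dequeue-time NUMA check described above; a mismatch deliberately breaks the refill cycle:

```c
/* Prefer the configured node, or the node of the running CPU. */
pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;

do {
	page = __ptr_ring_consume(r);
	if (!page)
		break;

	if (likely(page_to_nid(page) == pref_nid)) {
		pool->alloc.cache[pool->alloc.count++] = page;
	} else {
		/* Remote NUMA page: release it and stop refilling. */
		page_pool_return_page(pool, page);
		page = NULL;
		break;
	}
} while (pool->alloc.count < PP_ALLOC_CACHE_REFILL);
```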

- 21 Nov 2019, 3 commits

Submitted by Lorenzo Bianconi
Introduce the following parameters in order to add the possibility to sync DMA memory for device before putting allocated pages in the page_pool caches:

- PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that the driver gets from page_pool will be DMA-synced-for-device according to the length provided by the device driver. Please note DMA-sync-for-CPU is still the device driver's responsibility.
- offset: DMA address offset where the DMA engine starts copying rx data
- max_len: maximum DMA memory size page_pool is allowed to flush. This is currently used in the __page_pool_alloc_pages_slow routine when pages are allocated from the page allocator.

These parameters are supposed to be set by device drivers.

This optimization reduces the length of the DMA-sync-for-device. The optimization is valid because pages are initially DMA-synced-for-device as defined via max_len. At RX time, the driver will perform a DMA-sync-for-CPU on the memory for the packet length. What is important is the memory occupied by the packet payload, because this is the area the CPU is allowed to read and modify. As we don't track cache lines written into by the CPU, simply use the packet payload length as dma_sync_size at page_pool recycle time. This also takes into account any tail extend.

Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
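
A sketch of driver-side parameters enabling the new sync-for-device handling; the headroom and length values are illustrative:

```c
struct page_pool_params pp_params = {
	.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
	.order		= 0,
	.pool_size	= 1024,
	.nid		= NUMA_NO_NODE,
	.dev		= &pdev->dev,
	.dma_dir	= DMA_FROM_DEVICE,
	.offset		= XDP_PACKET_HEADROOM,	/* where HW starts writing */
	.max_len	= PAGE_SIZE - XDP_PACKET_HEADROOM, /* max area to sync */
};
```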

Submitted by Saeed Mahameed
A page is NOT reusable when at least one of the following is true:

1) It was allocated when the system was under some pressure (page_is_pfmemalloc).
2) It belongs to a different NUMA node than pool->p.nid.

To update pool->p.nid users should call page_pool_update_nid().

Holding on to such pages in the pool will hurt consumer performance when the pool migrates to a different NUMA node.

Performance testing: XDP drop/tx rate and TCP single/multi stream, on the mlx5 driver, while migrating the rx ring IRQ from a close to a far NUMA node. The mlx5 internal page cache was locally disabled to get pure page pool results.

CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)

XDP Drop/TX single core:

    NUMA  | XDP  | Before    | After
    ---------------------------------------
    Close | Drop | 11   Mpps | 10.9 Mpps
    Far   | Drop | 4.4  Mpps | 5.8  Mpps
    Close | TX   | 6.5  Mpps | 6.5  Mpps
    Far   | TX   | 3.5  Mpps | 4    Mpps

Improvement is about 30% drop packet rate and 15% tx packet rate for the far NUMA test. No degradation for the close NUMA tests.

TCP single/multi cpu/stream:

    NUMA  | #cpu | Before  | After
    --------------------------------------
    Close | 1    | 18 Gbps | 18 Gbps
    Far   | 1    | 15 Gbps | 18 Gbps
    Close | 12   | 80 Gbps | 80 Gbps
    Far   | 12   | 68 Gbps | 80 Gbps

In all test cases we see improvement for the far NUMA case, and no impact on the close NUMA case.

The impact of adding a check per page is very negligible, and shows no performance degradation whatsoever. Functionality-wise it seems more correct and more robust for page pool to verify when pages should be recycled, since page pool can't guarantee where pages are coming from.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Submitted by Saeed Mahameed
Add page_pool_update_nid() to be called by page pool consumers when they detect NUMA node changes. It will update the page pool nid value to start allocating from the new effective NUMA node.

This is to mitigate the page pool allocating pages from the wrong NUMA node, where the pool was originally allocated, and holding on to pages that belong to a different NUMA node, which causes performance degradation.

For pages that are already being consumed and could be returned to the pool by the consumer, the next patch will add a check per page to avoid recycling them back to the pool and return them to the page allocator instead.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
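
A sketch of the consumer-side call, e.g. from a NAPI poll routine that noticed it now runs on another node:

```c
/* Re-target allocations at the node now serving interrupts. */
if (pool->p.nid != numa_mem_id())
	page_pool_update_nid(pool, numa_mem_id());
```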

- 19 Nov 2019, 1 commit

Submitted by Jesper Dangaard Brouer
When Jonathan changed the page_pool to become responsible for its own shutdown via a deferred work queue, the disconnect_cnt counter was removed from the xdp memory model tracepoint.

This patch changes the page_pool_inflight tracepoint name to page_pool_release, because that reflects the new responsibility better. And it reintroduces a counter that reflects the number of times page_pool_release has been tried.

The counter is also used by the code, to only empty the alloc cache once. With a stuck work queue running every second and the counter being 64-bit, it will overrun in approx 584 billion years. For comparison, Earth's life expectancy is 7.5 billion years, before the Sun will engulf, and destroy, the Earth.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 17 Nov 2019, 1 commit

Submitted by Jonathan Lemon
The page pool keeps track of the number of pages in flight, and it isn't safe to remove the pool until all pages are returned.

Disallow removing the pool until all pages are back, so the pool is always available for page producers.

Make the page pool responsible for its own delayed destruction instead of relying on XDP, so the page pool can be used without the xdp memory model. When all pages are returned, free the pool and notify xdp if the pool is registered with the xdp memory system. Have the callback perform a table walk since some drivers (cpsw) may share the pool among multiple xdp_rxq_info.

Note that the increment of pages_state_release_cnt may result in inflight == 0, resulting in the pool being released.

Fixes: d956a048 ("xdp: force mem allocator removal and periodic warning")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 16 Aug 2019, 2 commits

Submitted by Jonathan Lemon
__page_pool_get_cached() will return NULL when the ring is empty, even if there are pages present in the lookaside cache. It is also possible to refill the cache and then return a NULL page. Restructure the logic to eliminate both cases.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Submitted by Yunsheng Lin
Remove variable initializations in functions that are followed by assignments before use.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 09 Jul 2019, 1 commit

Submitted by Ivan Khoronzhuk
Jesper recently removed page_pool_destroy() (from driver invocation) and moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in order to handle in-flight packets/pages. This created an asymmetry in drivers' create/destroy pairs.

This patch reintroduces page_pool_destroy() and adds a page_pool user refcnt. This serves the purpose of simplifying drivers' error handling, as drivers now always call page_pool_destroy() and don't need to track whether xdp_rxq_info_reg_mem_model() was unsuccessful.

This could be used for special cases where a single RX-queue (with a single page_pool) provides packets for two net_devices, and thus needs to register the same page_pool twice with two xdp_rxq_info structures.

This patch is primarily to ease API usage for drivers. The recently merged netsec driver actually has a bug in this area, which is solved by this API change.

This patch is a modified version of Ivan Khoronzhuk's original patch.

Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0e ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
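
With the refcounted API, driver setup and teardown pair up symmetrically; a sketch (the rxq layout is illustrative):

```c
pool = page_pool_create(&pp_params);
if (IS_ERR(pool))
	return PTR_ERR(pool);

err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL, pool);
if (err) {
	/* Always safe now, whether or not mem-model registration worked. */
	page_pool_destroy(pool);
	return err;
}
```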

- 19 Jun 2019, 2 commits

Submitted by Jesper Dangaard Brouer
For the DMA mapping use-case the page_pool keeps a pointer to the struct device, which is used in DMA map/unmap calls.

For our in-flight handling, we also need to make sure that the struct device has not disappeared. This is assured via using the get_device/put_device API.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
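
A sketch of the pairing this describes:

```c
/* On pool creation: pin the DMA device for the pool's lifetime. */
if (pool->p.flags & PP_FLAG_DMA_MAP)
	get_device(pool->p.dev);

/* On final pool teardown: drop the reference again. */
if (pool->p.flags & PP_FLAG_DMA_MAP)
	put_device(pool->p.dev);
```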

Submitted by Jesper Dangaard Brouer
The xdp tracepoints for mem id disconnect don't carry information about why it was not safe_to_remove. The page_pool:page_pool_inflight tracepoint in this patch can be used to extract this info for further debugging.

This patchset also adds tracepoints for the pages_state_* release/hold transitions, including a pointer to the page. This can be used for stats about in-flight pages, or to debug page leakage by keeping track of the page pointer and combining this with a kprobe on __put_page().

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>