1. 04 Jul 2022, 1 commit
  2. 13 May 2022, 1 commit
  3. 15 Apr 2022, 1 commit
  4. 12 Apr 2022, 1 commit
  5. 03 Mar 2022, 3 commits
  6. 03 Feb 2022, 1 commit
    • page_pool: Refactor page_pool to enable fragmenting after allocation · 52cc6ffc
      Alexander Duyck committed
      This change is meant to permit a driver to perform "fragmenting" of the
      page from within the driver instead of the current model which requires
      pre-partitioning the page. The main motivation behind this is to support
      use cases where the page will be split up by the driver after DMA instead
      of before.
      
      With this change it becomes possible to start using page pool to replace
      some of the existing use cases where multiple references were being used
      for a single page, but the number needed was unknown as the size could be
      dynamic.
      
      For example, with this code it would be possible to do something like
      the following to handle allocation:
        page = page_pool_alloc_pages(pool, gfp);
        if (!page)
          return NULL;
        page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX);
        rx_buf->page = page;
        rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX;
      
      Then we would process a received buffer by handling it with:
        rx_buf->pagecnt_bias--;
      
      Once the page has been fully consumed we could then flush the remaining
      instances with:
        if (page_pool_defrag_page(page, rx_buf->pagecnt_bias))
          continue;
        page_pool_put_defragged_page(pool, page, -1, !!budget);
      
      The general idea is that we want to have the ability to allocate a page
      with excess fragment count and then trim off the unneeded fragments.
      Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 10 Jan 2022, 1 commit
  8. 06 Jan 2022, 2 commits
  9. 18 Nov 2021, 1 commit
  10. 15 Oct 2021, 1 commit
  11. 24 Aug 2021, 1 commit
  12. 10 Aug 2021, 3 commits
    • page_pool: add frag page recycling support in page pool · 53e0961d
      Yunsheng Lin committed
      Currently page pool only supports page recycling when there
      is a single user of the page, and the split-page reuse
      implemented in most drivers cannot use the page pool, as the
      ping-pong way of reusing requires multi-user support in
      page pool.
      
      These reuse and recycling schemes have the following limitations:
      1. A page from page pool can only be used by one user in order
         for the page recycling to happen.
      2. The ping-pong way of reusing in most drivers does not allow
         multiple descriptors to use different parts of the same page
         in order to save memory.
      
      So add multi-user support and frag page recycling in page
      pool to overcome the above limitations.
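      
      As a rough usage sketch (assuming a driver context; dev, rx_buf_len
      and rx_buf are hypothetical names), a driver would create the pool
      with PP_FLAG_PAGE_FRAG and take fragments via page_pool_alloc_frag():
        struct page_pool_params pp_params = {
                .flags     = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG,
                .order     = 0,
                .pool_size = 1024,
                .nid       = NUMA_NO_NODE,
                .dev       = dev,
                .dma_dir   = DMA_FROM_DEVICE,
        };
        struct page_pool *pool = page_pool_create(&pp_params);
        unsigned int offset;
        struct page *page = page_pool_alloc_frag(pool, &offset,
                                                 rx_buf_len, GFP_ATOMIC);
        /* frags share the page; the pool recycles it once all frags
         * have been returned */
        if (page)
                rx_buf->dma = page_pool_get_dma_addr(page) + offset;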
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • page_pool: add interface to manipulate frag count in page pool · 0e9d2a0a
      Yunsheng Lin committed
      On 32-bit systems with 64-bit DMA, dma_addr[1] is used to store
      the upper 32 bits of the DMA address; such systems should be rare
      these days.
      
      On a normal system, dma_addr[1] in 'struct page' is not used,
      so we can reuse it for storing the frag count, which means how
      many frags this page might be split into.
      
      In order to simplify the page frag support in the page pool,
      the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to identify
      32-bit systems with 64-bit DMA, and the page frag support in
      page pool is disabled for such systems.
      
      The newly added page_pool_set_frag_count() is called to reserve
      the maximum frag count before any page frag is passed to the
      user, and page_pool_atomic_sub_frag_count_return() is called
      when the user is done with the page frag.
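      
      A simplified sketch of these helpers (the in-tree version also
      short-circuits the atomic op for the last remaining user;
      pp_frag_count aliases dma_addr[1] in 'struct page'):
        #define PAGE_POOL_DMA_USE_PP_FRAG_COUNT \
                (sizeof(dma_addr_t) > sizeof(unsigned long))
        
        static inline void page_pool_set_frag_count(struct page *page, long nr)
        {
                /* reserve nr references up front, one per future frag user */
                atomic_long_set(&page->pp_frag_count, nr);
        }
        
        static inline long
        page_pool_atomic_sub_frag_count_return(struct page *page, long nr)
        {
                long ret = atomic_long_sub_return(nr, &page->pp_frag_count);
        
                WARN_ON(ret < 0);
                return ret;  /* 0 => last user gone, page can be recycled */
        }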
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • page_pool: keep pp info as long as page pool owns the page · 57f05bc2
      Yunsheng Lin committed
      Currently, page->pp is cleared and set every time the page
      is recycled, which is unnecessary.
      
      So set page->pp only when the page is added to the page pool,
      and clear it only when the page is released from the page pool.
      
      This is also preparation for supporting frag page allocation
      in page pool.
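      
      Conceptually, the pp info handling collapses into a pair of helpers
      along these lines (a sketch of the shape of the change, not a
      verbatim copy):
        static void page_pool_set_pp_info(struct page_pool *pool,
                                          struct page *page)
        {
                /* runs once, when the pool takes ownership of the page */
                page->pp = pool;
                page->pp_magic |= PP_SIGNATURE;
        }
        
        static void page_pool_clear_pp_info(struct page *page)
        {
                /* runs once, when the page leaves the pool for good */
                page->pp_magic = 0;
                page->pp = NULL;
        }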
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  13. 09 Aug 2021, 1 commit
    • page_pool: mask the page->signature before the checking · 0fa32ca4
      Yunsheng Lin committed
      As mentioned in commit c07aea3e ("mm: add a signature in
      struct page"):
      "The page->signature field is aliased to page->lru.next and
      page->compound_head."
      
      And as the comment in page_is_pfmemalloc():
      "lru.next has bit 1 set if the page is allocated from the
      pfmemalloc reserves. Callers may simply overwrite it if they
      do not need to preserve that information."
      
      The page->signature is OR'ed with PP_SIGNATURE when a page is
      allocated in page pool, see __page_pool_alloc_pages_slow(),
      and page->signature is checked directly against PP_SIGNATURE in
      page_pool_return_skb_page(), which might cause a resource-leak
      problem for a page from page pool if bit 1 of lru.next is set
      for a pfmemalloc page. What happens here is that the original
      page->signature is OR'ed with PP_SIGNATURE after the allocation
      in order to preserve any existing bits (such as bit 1, used
      to indicate a pfmemalloc page), so when those bits are present,
      the page is not considered to be from page pool and the DMA
      mapping of such pages is left stale.
      
      As bit 0 is for page->compound_head, mask out both bits 0 and 1
      before the check in page_pool_return_skb_page(), and return
      pfmemalloc pages back to the page allocator after cleaning
      up the DMA mapping.
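      
      The fix amounts to masking the low two bits before the comparison,
      roughly:
        /* in page_pool_return_skb_page(): bits 0/1 of pp_magic may carry
         * compound_head/pfmemalloc state, so ignore them when checking
         * whether the page came from a page pool */
        if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
                return false;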
      
      Fixes: 6a5bcd84 ("page_pool: Allow drivers to hint on SKB recycling")
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 08 Jun 2021, 2 commits
    • page_pool: Allow drivers to hint on SKB recycling · 6a5bcd84
      Ilias Apalodimas committed
      Up to now several high speed NICs have custom mechanisms of recycling
      the allocated memory they use for their payloads.
      Our page_pool API already has recycling capabilities that are always
      used when we are running in 'XDP mode'. So let's tweak the API and the
      kernel network stack slightly and allow the recycling to happen even
      during the standard operation.
      The API doesn't take into account 'split page' policies used by those
      drivers currently, but can be extended once we have users for that.
      
      The idea is to be able to intercept the packet on skb_release_data().
      If it's a buffer coming from our page_pool API recycle it back to the
      pool for further usage or just release the packet entirely.
      
      To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
      a field in struct page (page->pp) to store the page_pool pointer.
      Storing the information in page->pp allows us to recycle both SKBs and
      their fragments.
      We could have skipped the skb bit entirely, since identical information
      can be derived from struct page. However, in an effort to affect the
      free path as little as possible, reading a single bit in the skb, which
      is already in cache, is better than trying to derive the same
      information from the data stored in the page.
      
      The driver or page_pool has to take care of the sync operations on its
      own during buffer recycling, since the buffer is never unmapped after
      opting in to recycling.
      
      Since the gain on the drivers depends on the architecture, we are not
      enabling recycling by default merely because a driver uses the
      page_pool API. In order to enable recycling, the driver must call
      skb_mark_for_recycle() to store the information we need for recycling
      in page->pp and set the recycling bit, or page_pool_store_mem_info()
      for a fragment.
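      
      A sketch of the opt-in on a driver's rx path (skb, page, pool,
      headroom and pkt_len are hypothetical driver variables; the
      three-argument form of skb_mark_for_recycle() matches this commit,
      later kernels reduced it to a single argument):
        skb = build_skb(page_address(page), PAGE_SIZE);
        if (skb) {
                skb_reserve(skb, headroom);
                skb_put(skb, pkt_len);
                /* sets skb->pp_recycle and records the pool in page->pp */
                skb_mark_for_recycle(skb, page, pool);
        }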
      Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Co-developed-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mm: add a signature in struct page · c07aea3e
      Matteo Croce committed
      This is needed by the page_pool to avoid recycling a page not allocated
      via page_pool.
      
      The page->signature field is aliased to page->lru.next and
      page->compound_head, but it can't be set by mistake because the
      signature value is a bad pointer, and can't trigger a false positive
      in PageTail() because the last bit is 0.
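      
      For reference, the signature introduced by this commit is built on a
      poison pointer, so a valid kernel pointer can never match it and the
      low bit stays 0:
        #define PP_SIGNATURE    (0x40 + POISON_POINTER_DELTA)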
      Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 15 May 2021, 1 commit
  16. 01 May 2021, 2 commits
  17. 05 Feb 2021, 1 commit
  18. 14 Nov 2020, 1 commit
  19. 30 Mar 2020, 1 commit
  20. 21 Feb 2020, 1 commit
  21. 14 Feb 2020, 1 commit
  22. 03 Jan 2020, 2 commits
    • page_pool: help compiler remove code in case CONFIG_NUMA=n · f13fc107
      Jesper Dangaard Brouer committed
      When the kernel is compiled without NUMA support, the page_pool NUMA
      config setting (pool->p.nid) doesn't make any practical sense, but
      the compiler cannot see that it can remove the code paths.
      
      This patch avoids reading the pool->p.nid setting in case of
      !CONFIG_NUMA, in the allocation and NUMA-check code, which helps the
      compiler see the optimisation potential. The update code is left
      intact to keep the API the same.
      
       $ ./scripts/bloat-o-meter net/core/page_pool.o-numa-enabled \
                                 net/core/page_pool.o-numa-disabled
       add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-113 (-113)
       Function                                     old     new   delta
       page_pool_create                             401     398      -3
       __page_pool_alloc_pages_slow                 439     426     -13
       page_pool_refill_alloc_cache                 425     328     -97
       Total: Before=3611, After=3498, chg -3.13%
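      
       The pattern in the allocation path looks roughly like this (mirroring
       what the patch does in __page_pool_alloc_pages_slow()):
        #ifdef CONFIG_NUMA
                page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
        #else
                /* with CONFIG_NUMA=n the nid read disappears entirely */
                page = alloc_pages(gfp, pool->p.order);
        #endif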
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: handle page recycle for NUMA_NO_NODE condition · 44768dec
      Jesper Dangaard Brouer committed
      The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is
      not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.
      
      The goal of the NUMA changes in commit d5394610 ("page_pool: Don't
      recycle non-reusable pages") was to have RX pages that belong to the
      same NUMA node as the CPU processing RX packets during softirq/NAPI,
      as illustrated by the performance measurements.
      
      This patch moves the NAPI checks out of the fast path, and at the
      same time solves the NUMA_NO_NODE issue.
      
      First, realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE
      will look up the current CPU nid (NUMA id) via numa_mem_id(), which is
      used as the preferred nid. It is only in rare situations, where
      e.g. a NUMA zone runs dry, that a page doesn't get allocated from the
      preferred nid. The page_pool API allows drivers to control the nid
      themselves via controlling pool->p.nid.
      
      This patch moves the NAPI check to when the alloc cache is refilled,
      via dequeuing/consuming pages from the ptr_ring. Thus, we can allow
      placing pages from remote NUMA nodes into the ptr_ring, as the
      dequeue/consume step will check the NUMA node. All current drivers
      using page_pool will alloc/refill the RX ring from the same CPU
      running the softirq/NAPI process.
      
      Drivers that control the nid explicitly also use page_pool_update_nid()
      when changing the nid at runtime. To speed up the transition to a new
      nid, the alloc cache is now flushed on nid changes. This forces pages
      to come from the ptr_ring, which does the appropriate nid check.
      
      For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA
      node, we accept that transitioning the alloc cache doesn't happen
      immediately. The preferred nid changes at runtime by consulting
      numa_mem_id() based on the CPU processing RX packets.
      
      Notice, to avoid stressing the page buddy allocator and to avoid doing
      too much work under softirq with preempt disabled, the NUMA check at
      ptr_ring dequeue will break the refill cycle when detecting a NUMA
      mismatch. This causes a slower transition, but it is done on purpose.
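      
      A condensed sketch of the refill-time check (locking elided; the
      in-tree page_pool_refill_alloc_cache() has this same shape):
        pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id()
                                                 : pool->p.nid;
        
        do {
                page = __ptr_ring_consume(r);
                if (unlikely(!page))
                        break;
        
                if (likely(page_to_nid(page) == pref_nid)) {
                        pool->alloc.cache[pool->alloc.count++] = page;
                } else {
                        /* NUMA mismatch: release the page and break out,
                         * deliberately slowing the transition */
                        __page_pool_return_page(pool, page);
                        page = NULL;
                        break;
                }
        } while (pool->alloc.count < PP_ALLOC_CACHE_REFILL);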
      
      Fixes: d5394610 ("page_pool: Don't recycle non-reusable pages")
      Reported-by: Li RongQing <lirongqing@baidu.com>
      Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 21 Nov 2019, 3 commits
    • net: page_pool: add the possibility to sync DMA memory for device · e68bc756
      Lorenzo Bianconi committed
      Introduce the following parameters in order to add the possibility to sync
      DMA memory for device before putting allocated pages in the page_pool
      caches:
      - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
        the driver gets from page_pool will be DMA-synced-for-device according
        to the length provided by the device driver. Please note that
        DMA-sync-for-CPU is still the device driver's responsibility
      - offset: DMA address offset where the DMA engine starts copying rx data
      - max_len: maximum DMA memory size page_pool is allowed to flush. This
        is currently used in the __page_pool_alloc_pages_slow routine when
        pages are allocated from the page allocator
      These parameters are supposed to be set by device drivers.
      
      This optimization reduces the length of the DMA-sync-for-device.
      The optimization is valid because pages are initially
      DMA-synced-for-device, as defined via max_len. At RX time, the driver
      will perform a DMA-sync-for-CPU on the memory for the packet length.
      What is important is the memory occupied by the packet payload,
      because this is the area the CPU is allowed to read and modify. As we
      don't track cache lines written into by the CPU, we simply use the
      packet payload length as dma_sync_size at page_pool recycle time. This
      also takes into account any tail extension.
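      
      A hypothetical driver setup using these parameters (dev and
      rx_headroom are placeholder names):
        struct page_pool_params pp_params = {
                .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                .order     = 0,
                .pool_size = 256,
                .nid       = NUMA_NO_NODE,
                .dev       = dev,
                .dma_dir   = DMA_FROM_DEVICE,
                /* sync starts at offset and covers at most max_len bytes */
                .offset    = rx_headroom,
                .max_len   = PAGE_SIZE - rx_headroom,
        };
        struct page_pool *pool = page_pool_create(&pp_params);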
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Don't recycle non-reusable pages · d5394610
      Saeed Mahameed committed
      A page is NOT reusable when at least one of the following is true:
      1) it was allocated when the system was under some pressure
         (page_is_pfmemalloc);
      2) it belongs to a different NUMA node than pool->p.nid.
      
      To update pool->p.nid, users should call page_pool_update_nid().
      
      Holding on to such pages in the pool will hurt the consumer's
      performance when the pool migrates to a different NUMA node.
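      
      These two rules condense into a single predicate, matching the shape
      of the pool_page_reusable() check this commit adds:
        static bool pool_page_reusable(struct page_pool *pool,
                                       struct page *page)
        {
                return !page_is_pfmemalloc(page) &&
                       page_to_nid(page) == pool->p.nid;
        }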
      
      Performance testing:
      XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
      while migrating rx ring irq from close to far numa:
      
      mlx5 internal page cache was locally disabled to get pure page pool
      results.
      
      CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
      NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
      
      XDP Drop/TX single core:
      NUMA  | XDP  | Before    | After
      ---------------------------------------
      Close | Drop | 11   Mpps | 10.9 Mpps
      Far   | Drop | 4.4  Mpps | 5.8  Mpps
      
      Close | TX   | 6.5 Mpps  | 6.5 Mpps
      Far   | TX   | 3.5 Mpps  | 4  Mpps
      
      The improvement is about 30% in drop packet rate and 15% in tx packet
      rate for the far-NUMA test.
      No degradation for the close-NUMA tests.
      
      TCP single/multi cpu/stream:
      NUMA  | #cpu | Before  | After
      --------------------------------------
      Close | 1    | 18 Gbps | 18 Gbps
      Far   | 1    | 15 Gbps | 18 Gbps
      Close | 12   | 80 Gbps | 80 Gbps
      Far   | 12   | 68 Gbps | 80 Gbps
      
      In all test cases we see improvement for the far-NUMA case, and no
      impact on the close-NUMA case.
      
      The impact of adding a check per page is negligible and shows no
      performance degradation whatsoever. Functionality-wise, it also seems
      more correct and more robust for page pool to verify when pages should
      be recycled, since page pool can't guarantee where pages are coming
      from.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Add API to update numa node · bc836748
      Saeed Mahameed committed
      Add page_pool_update_nid() to be called by page pool consumers when
      they detect NUMA node changes.
      
      It will update the page pool nid value to start allocating from the
      new effective NUMA node.
      
      This mitigates the page pool allocating pages from the wrong NUMA node
      (the one where the pool was originally allocated) and holding on to
      pages that belong to a different NUMA node, which causes performance
      degradation.
      
      For pages that are already being consumed and could be returned to the
      pool by the consumer, the next patch will add a per-page check to
      avoid recycling them back to the pool, returning them to the page
      allocator instead.
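      
      A sketch of a consumer-side call site (rq and pool_nid are
      hypothetical names; the check runs in the softirq context that
      processes RX):
        /* in a NAPI poll loop: follow the CPU's current NUMA node */
        int cur_nid = numa_mem_id();
        
        if (unlikely(rq->pool_nid != cur_nid)) {
                page_pool_update_nid(rq->page_pool, cur_nid);
                rq->pool_nid = cur_nid;
        }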
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 19 Nov 2019, 1 commit
    • page_pool: add destroy attempts counter and rename tracepoint · 7c9e6942
      Jesper Dangaard Brouer committed
      When Jonathan changed the page_pool to become responsible for its
      own shutdown via a deferred work queue, the disconnect_cnt
      counter was removed from the xdp memory model tracepoint.
      
      This patch changes the page_pool_inflight tracepoint name to
      page_pool_release, because it reflects the new responsibility
      better, and it reintroduces a counter that reflects the number of
      times page_pool_release has been tried.
      
      The counter is also used by the code to only empty the alloc
      cache once. With a stuck work queue running every second and the
      counter being 64-bit, it would overrun in approx 584 billion
      years. For comparison, the Earth's life expectancy is 7.5 billion
      years, before the Sun will engulf, and destroy, the Earth.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 17 Nov 2019, 1 commit
  26. 16 Aug 2019, 2 commits
  27. 09 Jul 2019, 1 commit
  28. 19 Jun 2019, 2 commits