- 14 Jul 2021, 40 commits
-
Submitted by Rasmus Villemoes

mainline inclusion
from mainline-5.13
commit b08e50dd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b08e50dd64489e3997029d204f761cb57a3762d2

-------------------------------------------------

In the event that somebody would call this with an already fully populated page_array, the last loop iteration would do an access beyond the end of page_array.

It's of course extremely unlikely that would ever be done, but this triggers my internal static analyzer. Also, if it really is not supposed to be invoked this way (i.e., with no NULL entries in page_array), the nr_populated < nr_pages check could simply be removed instead.

Link: https://lkml.kernel.org/r/20210507064504.1712559-1-linux@rasmusvillemoes.dk
Fixes: 0f87d9d3 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b08e50dd)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
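Editor's note: a minimal userspace model of the boundary condition this fix addresses (plain pointer array and names are placeholders, not the kernel source). If every slot is already populated, the scan must stop at the array end before checking for a hole:

  #include <stddef.h>

  /* Model of the loop that skips already-populated slots. */
  static size_t skip_populated(void **page_array, size_t nr_pages)
  {
          size_t nr_populated = 0;

          /* Bounds check first, then the NULL check, so a fully
           * populated array never reads past its last element. */
          while (nr_populated < nr_pages && page_array[nr_populated])
                  nr_populated++;

          return nr_populated;
  }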
-
Submitted by Jesper Dangaard Brouer

mainline inclusion
from mainline-5.13-rc1
commit be5dba25
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be5dba25b4b27f262626ddc9079d4858a75462fd

-------------------------------------------------

There are cases where the page_pool needs to refill with pages from the page allocator. Some workloads cause the page_pool to release pages instead of recycling these pages. For these workloads it can improve performance to bulk alloc pages from the page allocator to refill the alloc cache.

For XDP-redirect workload with 100G mlx5 driver (that use page_pool) redirecting xdp_frame packets into a veth, that does XDP_PASS to create an SKB from the xdp_frame, which then cannot return the page to the page_pool.

Performance results under GitHub xdp-project[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org

Mel: The patch "net: page_pool: convert to use alloc_pages_bulk_array variant" was squashed with this patch. From the test page, the array variant was superior with one of the test results as follows.

  Kernel    XDP stats   CPU    pps          Delta
  Baseline  XDP-RX CPU  total  3,771,046    n/a
  List      XDP-RX CPU  total  3,940,242    +4.49%
  Array     XDP-RX CPU  total  4,249,224    +12.68%

Link: https://lkml.kernel.org/r/20210325114228.27719-10-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit be5dba25)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
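Editor's note: a hedged sketch of the refill pattern described above, assuming the array-based helper added elsewhere in this series (alloc_pages_bulk_array()); the cache structure and size here are illustrative, not the page_pool internals:

  #include <linux/errno.h>
  #include <linux/gfp.h>
  #include <linux/mm.h>

  #define CACHE_SIZE 64

  struct alloc_cache {
          unsigned int count;
          struct page *pages[CACHE_SIZE];
  };

  /* Refill an empty software cache with one bulk call instead of
   * CACHE_SIZE separate trips to the page allocator. */
  static int cache_refill(struct alloc_cache *cache, gfp_t gfp)
  {
          /* The helper reports how many array slots are populated;
           * it may be fewer than requested. */
          cache->count = alloc_pages_bulk_array(gfp, CACHE_SIZE, cache->pages);

          return cache->count ? 0 : -ENOMEM;
  }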
-
Submitted by Jesper Dangaard Brouer

mainline inclusion
from mainline-5.13-rc1
commit dfa59717
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dfa59717b97d4203e6b44ee82874d4f758d93542

-------------------------------------------------

In preparation for the next patch, move the dma mapping into its own function, as this will make it easier to follow the changes.

[ilias.apalodimas: make page_pool_dma_map return boolean]

Link: https://lkml.kernel.org/r/20210325114228.27719-9-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit dfa59717)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
(fix conflicts: use page_pool_set_dma_addr() alternatively)
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chuck Lever

mainline inclusion
from mainline-5.13-rc1
commit f6e70aab
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f6e70aab9dfe0c2f79cf7dbcb1e80fa71dc60b09

-------------------------------------------------

Reduce the rate at which nfsd threads hammer on the page allocator. This improves throughput scalability by enabling the threads to run more independently of each other.

[mgorman: Update interpretation of alloc_pages_bulk return value]

Link: https://lkml.kernel.org/r/20210325114228.27719-8-mgorman@techsingularity.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f6e70aab)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chuck Lever

mainline inclusion
from mainline-5.13-rc1
commit ab836264
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab8362645fba90fa44ec1991ad05544e307dd02f

-------------------------------------------------

Patch series "SUNRPC consumer for the bulk page allocator"

This patch set and the measurements below are based on yesterday's bulk allocator series:

  git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9

The patches change SUNRPC to invoke the array-based bulk allocator instead of alloc_page().

The micro-benchmark results are promising. I ran a mixture of 256KB reads and writes over NFSv3. The server's kernel is built with KASAN enabled, so the comparison is exaggerated but I believe it is still valid.

I instrumented svc_recv() to measure the latency of each call to svc_alloc_arg() and report it via a trace point. The following results are averages across the trace events.

  Single page: 25.007 us per call over 532,571 calls
  Bulk list:    6.258 us per call over 517,034 calls
  Bulk array:   4.590 us per call over 517,442 calls

This patch (of 2):

Refactor: I'm about to use the loop variable @i for something else.

As far as the "i++" is concerned, that is a post-increment. The value of @i is not used subsequently, so the increment operator is unnecessary and can be removed.

Also note that nfsd_read_actor() was renamed nfsd_splice_actor() by commit cf8208d0 ("sendfile: convert nfsd to splice_direct_to_actor()").

Link: https://lkml.kernel.org/r/20210325114228.27719-7-mgorman@techsingularity.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ab836264)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Jesper Dangaard Brouer

mainline inclusion
from mainline-5.13-rc1
commit 3b822017
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b822017b636bf4261a644c16b01eb3900f2a9a0

-------------------------------------------------

When __alloc_pages_bulk() got introduced, two callers of __rmqueue_pcplist exist and the compiler chooses to not inline this function.

  ./scripts/bloat-o-meter vmlinux-before vmlinux-inline__rmqueue_pcplist
  add/remove: 0/1 grow/shrink: 2/0 up/down: 164/-125 (39)
  Function                old     new   delta
  rmqueue                2197    2296     +99
  __alloc_pages_bulk     1921    1986     +65
  __rmqueue_pcplist       125       -    -125
  Total: Before=19374127, After=19374166, chg +0.00%

  modprobe page_bench04_bulk loops=$((10**7))

  Type:time_bulk_page_alloc_free_array
   - Per elem: 106 cycles(tsc) 29.595 ns (step:64)
   - (measurement period time:0.295955434 sec time_interval:295955434)
   - (invoke count:10000000 tsc_interval:1065447105)

  Before:
   - Per elem: 110 cycles(tsc) 30.633 ns (step:64)

Link: https://lkml.kernel.org/r/20210325114228.27719-6-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3b822017)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Jesper Dangaard Brouer

mainline inclusion
from mainline-5.13-rc1
commit ce76f9a1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce76f9a1d9a21c2633dcd2a5605f923286e16e1d

-------------------------------------------------

Looking at perf-report and ASM-code for __alloc_pages_bulk() it is clear that the code activated is suboptimal. The compiler guesses wrong and places unlikely code at the beginning. Due to the use of the WARN_ON_ONCE() macro the UD2 asm instruction is added to the code, which confuses the I-cache prefetcher in the CPU.

[mgorman@techsingularity.net: minor changes and rebasing]

Link: https://lkml.kernel.org/r/20210325114228.27719-5-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Acked-By: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ce76f9a1)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
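Editor's note: the reordering leans on compiler branch hints. A small standalone model of the same idea (the kernel's likely()/unlikely() macros expand to __builtin_expect(); the function here is purely illustrative):

  #include <stdio.h>

  /* Userspace stand-ins for the kernel's branch-prediction hints. */
  #define likely(x)   __builtin_expect(!!(x), 1)
  #define unlikely(x) __builtin_expect(!!(x), 0)

  static int checked_div(int a, int b)
  {
          /* Marking the error path unlikely lets the compiler keep the
           * hot path straight-line and push this branch out of the way. */
          if (unlikely(b == 0)) {
                  fprintf(stderr, "division by zero\n");
                  return 0;
          }
          return a / b;
  }

  int main(void)
  {
          printf("%d\n", checked_div(10, 2));
          return 0;
  }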
-
Submitted by Mel Gorman

mainline inclusion
from mainline-5.13-rc1
commit 0f87d9d3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f87d9d30f21390dd71114f30e63038980e6eb3f

-------------------------------------------------

The proposed callers for the bulk allocator store pages from the bulk allocator in an array. This patch adds an array-based interface to the API to avoid multiple list iterations. The page list interface is preserved to avoid requiring all users of the bulk API to allocate and manage enough storage to store the pages.

[akpm@linux-foundation.org: remove now unused local `allocated']

Link: https://lkml.kernel.org/r/20210325114228.27719-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0f87d9d3)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Mel Gorman

mainline inclusion
from mainline-5.13-rc1
commit 387ba26f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=387ba26fb1cb9be9e35dc14a6d97188e916eda05

-------------------------------------------------

This patch adds a new page allocator interface via alloc_pages_bulk, and __alloc_pages_bulk_nodemask. A caller requests a number of pages to be allocated and added to a list.

The API is not guaranteed to return the requested number of pages and may fail if the preferred allocation zone has limited free memory, the cpuset changes during the allocation or page debugging decides to fail an allocation. It's up to the caller to request more pages in batch if necessary.

Note that this implementation is not very efficient and could be improved but it would require refactoring. The intent is to make it available early to determine what semantics are required by different callers. Once the full semantics are nailed down, it can be refactored.

[mgorman@techsingularity.net: fix alloc_pages_bulk() return type, per Matthew]
  Link: https://lkml.kernel.org/r/20210325123713.GQ3697@techsingularity.net
[mgorman@techsingularity.net: fix uninit var warning]
  Link: https://lkml.kernel.org/r/20210330114847.GX3697@techsingularity.net
[mgorman@techsingularity.net: fix comment, per Vlastimil]
  Link: https://lkml.kernel.org/r/20210412110255.GV3697@techsingularity.net

Link: https://lkml.kernel.org/r/20210325114228.27719-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Tested-by: Colin Ian King <colin.king@canonical.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 387ba26f)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
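Editor's note: a hedged sketch of the list-based calling pattern described above, assuming the mainline wrapper name alloc_pages_bulk_list() and that returned pages are linked through page->lru as in mainline; the exact names in this backport may differ:

  #include <linux/gfp.h>
  #include <linux/list.h>
  #include <linux/mm.h>
  #include <linux/printk.h>

  static void demo_bulk_list(void)
  {
          LIST_HEAD(pages);
          struct page *page, *next;
          unsigned long nr;

          /* Ask for 32 order-0 pages; the allocator may hand back fewer,
           * so the caller has to cope with a short result. */
          nr = alloc_pages_bulk_list(GFP_KERNEL, 32, &pages);
          pr_info("bulk alloc returned %lu pages\n", nr);

          list_for_each_entry_safe(page, next, &pages, lru) {
                  list_del(&page->lru);
                  __free_page(page);
          }
  }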
-
Submitted by Mel Gorman

mainline inclusion
from mainline-5.13-rc1
commit cb66bede
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cb66bede617581309883432e9a633e8cade2a36e

-------------------------------------------------

Patch series "Introduce a bulk order-0 page allocator with two in-tree users", v6.

This series introduces a bulk order-0 page allocator with sunrpc and the network page pool being the first users. The implementation is not efficient as semantics needed to be ironed out first. If no other semantic changes are needed, it can be made more efficient. Despite that, this is a performance-related series for users that require multiple pages for an operation without multiple round-trips to the page allocator.

Quoting the last patch for the high-speed networking use-case:

  Kernel    XDP stats   CPU    pps          Delta
  Baseline  XDP-RX CPU  total  3,771,046    n/a
  List      XDP-RX CPU  total  3,940,242    +4.49%
  Array     XDP-RX CPU  total  4,249,224    +12.68%

Via the SUNRPC traces of svc_alloc_arg():

  Single page: 25.007 us per call over 532,571 calls
  Bulk list:    6.258 us per call over 517,034 calls
  Bulk array:   4.590 us per call over 517,442 calls

Both potential users in this series are corner cases (NFS and high-speed networks) so it is unlikely that most users will see any benefit in the short term. Other potential users are batch allocations for page cache readahead, fault around and SLUB allocations when high-order pages are unavailable. It's unknown how much benefit would be seen by converting multiple page allocation calls to a single batch or what difference it may make to headline performance.

Light testing of my own running dbench over NFS passed. Chuck and Jesper conducted their own tests and details are included in the changelogs.

  Patch 1 renames a variable name that is particularly unpopular
  Patch 2 adds a bulk page allocator
  Patch 3 adds an array-based version of the bulk allocator
  Patches 4-5 add micro-optimisations to the implementation
  Patches 6-7 SUNRPC user
  Patches 8-9 Network page_pool user

This patch (of 9):

Review feedback of the bulk allocator twice found problems with "alloced" being a counter for pages allocated. The naming was based on the API name "alloc" and was based on the idea that verbal communication about malloc tends to use the fake word "malloced" instead of the fake word mallocated. To be consistent, this preparation patch renames alloced to allocated in rmqueue_bulk so the bulk allocator and per-cpu allocator use similar names when the bulk allocator is introduced.

Link: https://lkml.kernel.org/r/20210325114228.27719-1-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20210325114228.27719-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cb66bede)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Matthew Wilcox (Oracle)

mainline inclusion
from mainline-5.13-rc1
commit 5f076944
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f076944f06988391a6dbd458fc6485a71088e57

-------------------------------------------------

Sphinx interprets the Return section as a list and complains about it. Turn it into a sentence and move it to the end of the kernel-doc to fit the kernel-doc style.

Link: https://lkml.kernel.org/r/20210225150642.2582252-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5f076944)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
(fix conflicts: add "kernel-doc:: mm/mempolicy.c" at the end of mm-api.rst)
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Matthew Wilcox (Oracle)

mainline inclusion
from mainline-5.13-rc1
commit eb350739
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb35073960510762dee417574589b7a8971c68ab

-------------------------------------------------

The current formatting doesn't quite work with kernel-doc.

Link: https://lkml.kernel.org/r/20210225150642.2582252-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb350739)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Matthew Wilcox (Oracle)

mainline inclusion
from mainline-5.13-rc1
commit 6421ec76
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6421ec764a62c51f810c5dc40cd45eeb15801ad9

-------------------------------------------------

Document alloc_pages() for both NUMA and non-NUMA cases as kernel-doc doesn't care.

Link: https://lkml.kernel.org/r/20210225150642.2582252-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6421ec76)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
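Editor's note: for readers unfamiliar with kernel-doc, a hedged, illustrative example of the comment format in question (hypothetical helper, not the actual alloc_pages() documentation), with the Return section written as a sentence at the end, as the related patches in this set prefer:

  /**
   * demo_alloc_pages - allocate 2^@order contiguous pages (illustrative)
   * @gfp: GFP flags describing how the allocation may behave.
   * @order: power-of-two number of pages to allocate.
   *
   * Behaves the same whether or not NUMA is configured; without NUMA the
   * preferred node is simply the current node.
   *
   * Return: a struct page pointer on success, or %NULL on failure.
   */
  struct page *demo_alloc_pages(gfp_t gfp, unsigned int order);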
-
Submitted by Matthew Wilcox (Oracle)

mainline inclusion
from mainline-5.13-rc1
commit d7f946d0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d7f946d0faf90014547ee5d090e9d05018278c7a

-------------------------------------------------

When CONFIG_NUMA is enabled, alloc_pages() is a wrapper around alloc_pages_current(). This is pointless, just implement alloc_pages() directly.

Link: https://lkml.kernel.org/r/20210225150642.2582252-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d7f946d0)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Matthew Wilcox (Oracle)

mainline inclusion
from mainline-5.13-rc1
commit 84172f4b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84172f4bb752424415756351a40f8da5714e1554

-------------------------------------------------

There are only two callers of __alloc_pages() so prune the thicket of alloc_page variants by combining the two functions together. Current callers of __alloc_pages() simply add an extra 'NULL' parameter and current callers of __alloc_pages_nodemask() call __alloc_pages() instead.

Link: https://lkml.kernel.org/r/20210225150642.2582252-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 84172f4b)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
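Editor's note: a hedged sketch of what a converted call site looks like after this consolidation (signature as in the mainline patch; the backport may differ slightly). Former __alloc_pages() callers pass a NULL nodemask, and former __alloc_pages_nodemask() callers simply call __alloc_pages():

  #include <linux/gfp.h>
  #include <linux/mm.h>

  static struct page *demo_alloc(void)
  {
          /* One entry point: gfp flags, order, preferred node, nodemask. */
          return __alloc_pages(GFP_KERNEL, 0, numa_node_id(), NULL);
  }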
-
Submitted by Matthew Wilcox (Oracle)

mainline inclusion
from mainline-5.13-rc1
commit 6e5e0f28
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e5e0f286eb0ecf12afaa3e73c321bc5bf599abb

-------------------------------------------------

Shorten some overly-long lines by renaming this identifier.

Link: https://lkml.kernel.org/r/20210225150642.2582252-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6e5e0f28)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Matthew Wilcox (Oracle)

mainline inclusion
from mainline-5.13-rc1
commit 8e6a930b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZVL2
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e6a930bb3ea6aa4b623eececc25465d09ee7b13

-------------------------------------------------

Patch series "Rationalise __alloc_pages wrappers", v3.

I was poking around the __alloc_pages variants trying to understand why they each exist, and couldn't really find a good justification for keeping __alloc_pages and __alloc_pages_nodemask as separate functions. That led to getting rid of alloc_pages_current() and then I noticed the documentation was bad, and then I noticed the mempolicy documentation wasn't included.

Anyway, this is all cleanups & doc fixes.

This patch (of 7):

We have two masks involved -- the nodemask and the gfp mask, so alloc_mask is an unclear name.

Link: https://lkml.kernel.org/r/20210225150642.2582252-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8e6a930b)
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Kent Overstreet

mainline inclusion
from mainline-v5.11-rc1
commit 3644e2d2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I40B5X
CVE: NA

-------------------------------------------------

If iter->count is 0 and iocb->ki_pos is page aligned, this causes nr_pages to be 0.

Then in generic_file_buffered_read_get_pages() find_get_pages_contig() returns 0 - because we asked for 0 pages, so we call generic_file_buffered_read_no_cached_page() which attempts to add a page to the page cache, which fails with -EEXIST, and then we loop. Oops...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
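Editor's note: a userspace model of the guard this kind of fix needs (names are hypothetical, not the kernel code): a zero-byte request has to return before any page-count arithmetic rounds down to zero pages and the lookup loop spins.

  #include <stddef.h>
  #include <sys/types.h>

  /* Model: bail out on empty reads before computing how many pages to fetch. */
  static ssize_t buffered_read(char *buf, size_t count, off_t pos)
  {
          if (!count)
                  return 0;       /* nothing to do; never ask for 0 pages */

          /* ... look up pages covering [pos, pos + count) and copy them out ... */
          (void)buf;
          (void)pos;
          return (ssize_t)count;
  }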
-
Submitted by Kent Overstreet

mainline inclusion
from mainline-v5.11-rc1
commit 06c04442
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I40B5X
CVE: NA

-------------------------------------------------

Convert generic_file_buffered_read() to get pages to read from in batches, and then copy data to userspace from many pages at once - in particular, we now don't touch any cachelines that might be contended while we're in the loop to copy data to userspace.

This is a performance improvement on workloads that do buffered reads with large blocksizes, and a very large performance improvement if that file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance improvement (1%, within the margin of error).

akpm: kernel test robot found a 32% speedup on one test:
  https://lkml.kernel.org/r/20201030081456.GY31092@shao2-debian

Link: https://lkml.kernel.org/r/20201025212949.602194-3-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Kent Overstreet

mainline inclusion
from mainline-v5.11-rc1
commit 723ef24b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I40B5X
CVE: NA

-------------------------------------------------

Patch series "generic_file_buffered_read() improvements", v2.

generic_file_buffered_read() has turned into a real monstrosity to work with. And it's a major performance improvement, for both small random and large sequential reads. On my test box, 4k buffered random reads go from ~150k to ~250k iops, and the improvements to big sequential reads are even bigger.

This incorporates the fix for IOCB_WAITQ handling that Jens just posted as well, also factors out lock_page_for_iocb() to improve handling of the various iocb flags.

This patch (of 2):

This is prep work for changing generic_file_buffered_read() to use find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and changes as little of the flow control as possible. More refactoring could be done, this patch is intended to be relatively minimal.

Link: https://lkml.kernel.org/r/20201025212949.602194-1-kent.overstreet@gmail.com
Link: https://lkml.kernel.org/r/20201025212949.602194-2-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit feac00aa
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

mremap HAVE_MOVE_PMD/PUD optimization time comparison for a 1GB region:

  1GB mremap - Source PTE-aligned, Destination PTE-aligned
    mremap time: 2292772ns
  1GB mremap - Source PMD-aligned, Destination PMD-aligned
    mremap time: 1158928ns
  1GB mremap - Source PUD-aligned, Destination PUD-aligned
    mremap time: 63886ns

Link: https://lkml.kernel.org/r/20210616045735.374532-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit cec6515a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

flush_tlb_range is special in that we don't specify the page size used for the translation. Hence when flushing TLB we flush the translation cache for all possible page sizes. The kernel also uses the same interface when moving page tables around. Such a move requires us to flush the page walk cache.

Instead of adding another interface to force page walk cache flush, update flush_tlb_range to flush page walk cache if the range flushed is more than the PMD range. A page table move will always involve an invalidate range more than PMD_SIZE.

Running microbenchmark with mprotect and parallel memory access didn't show any observable performance impact.

Link: https://lkml.kernel.org/r/20210616045735.374532-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit 3bbda69c
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

Patch series "Speedup mremap on ppc64", v8.

This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires the platform to support updating higher-level page tables without updating page table entries. This also needs to invalidate the Page Walk Cache on architecture supporting the same.

This patch (of 3):

Architectures like ppc64 support faster mremap only with radix translation. Hence allow a runtime check w.r.t support for fast mremap.

Link: https://lkml.kernel.org/r/20210616045239.370802-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210616045239.370802-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Nicholas Piggin

mainline inclusion
from mainline-v5.12-rc1
commit 26418b36
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

The logic to decide what kind of TLB flush is required (local, global, or IPI) is spread multiple times over the several kinds of TLB flushes.

Move it all into a single function which may issue IPIs if necessary, and also returns a flush type that is to be used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-3-npiggin@gmail.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit 97113eb3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

To avoid a race between rmap walk and mremap, mremap does take_rmap_locks(). The lock was taken to ensure that rmap walk don't miss a page table entry due to PTE moves via move_pagetables(). The kernel does further optimization of this lock such that if we are going to find the newly added vma after the old vma, the rmap lock is not taken. This is because rmap walk would find the vmas in the same order and if we don't find the page table attached to older vma we would find it with the new vma which we would iterate later.

As explained in commit eb66ae03 ("mremap: properly flush TLB before releasing the page") mremap is special in that it doesn't take ownership of the page. The optimized version for PUD/PMD aligned mremap also doesn't hold the ptl lock. This can result in stale TLB entries as shown below.

This patch updates the rmap locking requirement in mremap to handle the race condition explained below with optimized mremap::

  Optimized PMD move

  CPU 1                           CPU 2                             CPU 3

  mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

  mmap_write_lock_killable()
                                  addr = old_addr
                                  lock(pte_ptl)
  lock(pmd_ptl)
  pmd = *old_pmd
  pmd_clear(old_pmd)
  flush_tlb_range(old_addr)

  *new_pmd = pmd
                                                                    *new_addr = 10; and fills
                                                                    TLB with new addr
                                                                    and old pfn

  unlock(pmd_ptl)
                                  ptep_clear_flush()
                                  old pfn is free.
                                                                    Stale TLB entry

Optimized PUD move also suffers from a similar race. Both the above race condition can be fixed if we force mremap path to take rmap lock.

Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd340 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit 0881ace2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

pmd/pud_populate is the right interface to be used to set the respective page table entries. Some architectures like ppc64 do assume that set_pmd/pud_at can only be used to set a hugepage PTE. Since we are not setting up a hugepage PTE here, use the pmd/pud_populate interface.

Link: https://lkml.kernel.org/r/20210616045239.370802-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit d6655dff
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

With a two-level page table, don't enable move_normal_pud.

Link: https://lkml.kernel.org/r/20210616045239.370802-5-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit 7d846db7
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

With TRANSPARENT_HUGEPAGE_PUD enabled the kernel can find huge PUD entries. Add a helper to move huge PUD entries on mremap(). This will be used by a later patch to optimize mremap of PUD_SIZE aligned level 4 PTE mapped addresses. This also makes sure we support mremap on huge PUD entries even with CONFIG_HAVE_MOVE_PUD disabled.

[aneesh.kumar@linux.ibm.com: fix build failure with clang-10]
  Link: https://lore.kernel.org/lkml/YMuOSnJsL9qkxweY@archlinux-ax161
  Link: https://lkml.kernel.org/r/20210619134310.89098-1-aneesh.kumar@linux.ibm.com

Link: https://lkml.kernel.org/r/20210616045239.370802-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit a9cc9c34
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

With a large mmap map size, we can overlap with the text area and using MAP_FIXED results in unmapping that area. Switch to MAP_FIXED_NOREPLACE and handle the EEXIST error.

Link: https://lkml.kernel.org/r/20210616045239.370802-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
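Editor's note: a small standalone illustration of the flag change described here, not the selftest itself. Unlike MAP_FIXED, MAP_FIXED_NOREPLACE refuses to clobber an existing mapping and fails with EEXIST, so the caller can detect the collision and pick another address (the hint address below is arbitrary):

  #define _GNU_SOURCE
  #include <errno.h>
  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
          void *hint = (void *)0x400000000000UL;  /* illustrative hint */
          size_t len = 1UL << 30;

          void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                         -1, 0);

          if (p == MAP_FAILED && errno == EEXIST) {
                  fprintf(stderr, "address range already in use, pick another\n");
                  return 1;
          }
          if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }
          printf("mapped 1GB at %p\n", p);
          munmap(p, len);
          return 0;
  }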
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit f27a5c93
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

Patch series "mremap fixes", v2.

This patch (of 6):

Instead of hardcoding 4K page size, fetch it using sysconf(). For the performance measurements, the test still assumes 2M and 1G are hugepage sizes.

Link: https://lkml.kernel.org/r/20210616045239.370802-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210616045239.370802-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
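Editor's note: a standalone example of the portable way to fetch the base page size at run time, which is what the test is switched to do; 64K-page systems would otherwise break a hardcoded 4K assumption:

  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          /* Ask the kernel instead of assuming 4K. */
          long page_size = sysconf(_SC_PAGESIZE);

          if (page_size < 0) {
                  perror("sysconf");
                  return 1;
          }
          printf("page size: %ld bytes\n", page_size);
          return 0;
  }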
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit dc4875f0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

No functional change in this patch.

[aneesh.kumar@linux.ibm.com: m68k build error reported by kernel robot]
  Link: https://lkml.kernel.org/r/87tulxnb2v.fsf@linux.ibm.com

Link: https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.kumar@linux.ibm.com
Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aneesh Kumar K.V

mainline inclusion
from mainline-v5.14-rc1
commit 9cf6fa24
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

No functional change in this patch.

[aneesh.kumar@linux.ibm.com: fix]
  Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com
[sfr@canb.auug.org.au: another fix]
  Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com

Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com
Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Kalesh Singh

mainline inclusion
from mainline-v5.11-rc2
commit e05986ee
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

When `next < old_addr`, `next - old_addr` arithmetic underflows causing `extent` to be incorrect. Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`.

Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com
Fixes: c49dd340 ("mm: speedup mremap on 1GB or larger regions")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
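Editor's note: a minimal userspace model of the arithmetic this fixes; with unsigned addresses, next - old_addr wraps around when next < old_addr, so the extent must be clamped to the smaller of the two distances (values below are made up for the demonstration):

  #include <stdio.h>

  static unsigned long min_ul(unsigned long a, unsigned long b)
  {
          return a < b ? a : b;
  }

  int main(void)
  {
          unsigned long old_addr = 0x2000, old_end = 0x3000, next = 0x1000;

          /* Buggy form: wraps to a huge value when next < old_addr. */
          unsigned long bad_extent = next - old_addr;

          /* Fixed form: take the smaller of the two candidate extents. */
          unsigned long extent = min_ul(next - old_addr, old_end - old_addr);

          printf("bad extent: %#lx, clamped extent: %#lx\n", bad_extent, extent);
          return 0;
  }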
-
Submitted by Kalesh Singh

mainline inclusion
from mainline-v5.11-rc1
commit f5308c89
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

HAVE_MOVE_PUD enables remapping pages at the PUD level if both the source and destination addresses are PUD-aligned.

With HAVE_MOVE_PUD enabled it can be inferred that there is approximately a 19x improvement in performance on arm64. (See data below.)

------- Test Results ---------

The following results were obtained using a 5.4 kernel, by remapping a PUD-aligned, 1GB sized region to a PUD-aligned destination. The results from 10 iterations of the test are given below:

Total mremap times for 1GB data on arm64. All times are in nanoseconds.

  Control      HAVE_MOVE_PUD
  1247761      74271
  1219896      46771
  1094792      59687
  1227760      48385
  1043698      76666
  1101771      50365
  1159896      52500
  1143594      75261
  1025833      61354
  1078125      48697

  1134312.6    59395.7    <-- Mean time in nanoseconds

A 1GB mremap completion time drops from ~1.1 milliseconds to ~59 microseconds on arm64. (~19x speed up).

Link: https://lkml.kernel.org/r/20201014005320.2233162-5-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Kalesh Singh

mainline inclusion
from mainline-v5.11-rc1
commit be37c98d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI
CVE: NA

-------------------------------------------------

HAVE_MOVE_PUD enables remapping pages at the PUD level if both the source and destination addresses are PUD-aligned.

With HAVE_MOVE_PUD enabled it can be inferred that there is approximately a 13x improvement in performance on x86. (See data below.)

------- Test Results ---------

The following results were obtained using a 5.4 kernel, by remapping a PUD-aligned, 1GB sized region to a PUD-aligned destination. The results from 10 iterations of the test are given below:

Total mremap times for 1GB data on x86. All times are in nanoseconds.

  Control      HAVE_MOVE_PUD
  180394       15089
  235728       14056
  238931       25741
  187330       13838
  241742       14187
  177925       14778
  182758       14728
  160872       14418
  205813       15107
  245722       13998

  205721.5     15594      <-- Mean time in nanoseconds

A 1GB mremap completion time drops from ~205 microseconds to ~15 microseconds on x86. (~13x speed up).

Link: https://lkml.kernel.org/r/20201014005320.2233162-6-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: Jia He <justin.he@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Kalesh Singh
mainline inclusion from mainline-v5.11-rc1 commit c49dd340 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI CVE: NA

-------------------------------------------------

Android needs to move large memory regions for garbage collection. The GC requires moving physical pages of a multi-gigabyte heap using mremap. During this move, the application threads have to be paused for correctness. It is critical to keep this pause as short as possible to avoid jitters during user interaction.

Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if the source and destination addresses are PUD-aligned. For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD entries, since the PUD entry is “folded back” onto the PGD entry.

Add HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't supported/tested can turn this off by not selecting the config.

Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com
Signed-off-by: NKalesh Singh <kaleshsingh@google.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Nkernel test robot <lkp@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NChen Wandun <chenwandun@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
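For a userspace allocator (such as the GC heap described above) to actually hit this path, its mappings must start on PUD boundaries. A hedged sketch of the usual over-allocate-and-trim trick for obtaining a 1GB-aligned anonymous mapping; mmap_aligned() is a hypothetical helper, and the 1GB figure assumes 4K pages with 4-level page tables:

    /* Sketch: return a 'len'-byte anonymous mapping aligned to 'align'
     * (align must be a power of two and a multiple of the page size).
     * Error handling is minimal; illustrative only. */
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    static void *mmap_aligned(size_t len, size_t align)
    {
        size_t span = len + align;
        uintptr_t base, aligned;
        void *p;

        p = mmap(NULL, span, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;

        base = (uintptr_t)p;
        aligned = (base + align - 1) & ~(align - 1);

        if (aligned > base)                     /* drop the unaligned head */
            munmap((void *)base, aligned - base);
        if (base + span > aligned + len)        /* drop the excess tail */
            munmap((void *)(aligned + len), base + span - (aligned + len));

        return (void *)aligned;
    }

    /* e.g. void *heap = mmap_aligned(1UL << 30, 1UL << 30); */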
-
Submitted by Kalesh Singh
mainline inclusion from mainline-v5.11-rc1 commit 7df66625 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZFUI CVE: NA

-------------------------------------------------

Patch series "Speed up mremap on large regions", v4.

mremap time can be optimized by moving entries at the PMD/PUD level if the source and destination addresses are PMD/PUD-aligned and PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and x86. Other architectures where this type of move is supported and known to be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD and HAVE_MOVE_PUD.

Observed Performance Improvements for remapping a PUD-aligned 1GB-sized region on x86 and arm64:

- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up

Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD give a total of ~150x speed up on arm64.

This patch (of 4):

Test mremap on regions of various sizes and alignments and validate data after remapping. Also provide the total time for remapping the region, which is useful for performance comparison of the mremap optimizations that move pages at the PMD/PUD levels if HAVE_MOVE_PMD and/or HAVE_MOVE_PUD are enabled.

Link: https://lkml.kernel.org/r/20201014005320.2233162-1-kaleshsingh@google.com
Link: https://lkml.kernel.org/r/20201014005320.2233162-2-kaleshsingh@google.com
Signed-off-by: NKalesh Singh <kaleshsingh@google.com> Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Jia He <justin.he@arm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: SeongJae Park <sjpark@amazon.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NChen Wandun <chenwandun@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
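The core of the data validation mentioned above can be sketched as follows: fill the source with a position-dependent pattern, mremap it, then verify the pattern at the new address. The pattern and region size here are hypothetical choices; the real test lives under tools/testing/selftests/vm/ and also reports the remap time.

    /* Hedged sketch of the fill/remap/verify idea, not the selftest itself. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stddef.h>
    #include <sys/mman.h>

    #define REGION (64UL << 20)                 /* 64MB keeps the sketch quick */

    int main(void)
    {
        char *src = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (src == MAP_FAILED)
            return 1;

        for (size_t i = 0; i < REGION; i++)     /* position-dependent pattern */
            src[i] = (char)(i ^ (i >> 8));

        char *dst = mremap(src, REGION, REGION, MREMAP_MAYMOVE);
        if (dst == MAP_FAILED)
            return 1;

        for (size_t i = 0; i < REGION; i++) {   /* data must survive the move */
            if (dst[i] != (char)(i ^ (i >> 8))) {
                fprintf(stderr, "mismatch at offset %zu\n", i);
                return 1;
            }
        }
        puts("remapped data verified");
        munmap(dst, REGION);
        return 0;
    }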
-
Submitted by Mel Gorman
mainline inclusion from mainline-5.11-rc1 commit 23e6082a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40C8N CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23e6082a522e32232f7377540b4d42d8304253b8

---------------------------

At fork time currently, a local node can be allowed to fill completely and allow the periodic load balancer to fix the problem. This can be problematic in cases where a task creates lots of threads that idle until woken as part of a worker pool, causing a memory bandwidth problem.

However, a "real" workload suffers badly from this behaviour. The workload in question is mostly NUMA aware but spawns large numbers of threads that act as a worker pool that can be called from anywhere. These need to spread early to get reasonable behaviour.

This patch limits how much a local node can fill before spilling over to another node, and it will not be a universal win. Specifically, very short-lived workloads that fit within a NUMA node would prefer the memory bandwidth.

As I cannot describe the "real" workload, the best proxy measure I found for illustration was a page fault microbenchmark. It's not representative of the workload but demonstrates the hazard of the current behaviour.

pft timings
                              5.10.0-rc2             5.10.0-rc2
                       imbalancefloat-v2          forkspread-v2
Amean     elapsed-1    46.37 (   0.00%)       46.05 *   0.69%*
Amean     elapsed-4    12.43 (   0.00%)       12.49 *  -0.47%*
Amean     elapsed-7     7.61 (   0.00%)        7.55 *   0.81%*
Amean     elapsed-12    4.79 (   0.00%)        4.80 (  -0.17%)
Amean     elapsed-21    3.13 (   0.00%)        2.89 *   7.74%*
Amean     elapsed-30    3.65 (   0.00%)        2.27 *  37.62%*
Amean     elapsed-48    3.08 (   0.00%)        2.13 *  30.69%*
Amean     elapsed-79    2.00 (   0.00%)        1.90 *   4.95%*
Amean     elapsed-80    2.00 (   0.00%)        1.90 *   4.70%*

This is showing the time to fault regions belonging to threads. The target machine has 80 logical CPUs and two nodes. Note the ~30% gain when the machine is approximately at the point where one node becomes fully utilised. The slower results are borderline noise.

Kernel building shows similar benefits around the same balance point. Generally performance was either neutral or better in the tests conducted. The main consideration with this patch is the point where fork stops spreading a task, so some workloads may benefit from different balance points, but it would be a risky tuning parameter.

Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20201120090630.3286-5-mgorman@techsingularity.net
Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
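As a rough illustration of that proxy workload (not the pft tool itself): each thread maps and touches its own private region, so the elapsed time depends heavily on where the scheduler placed the threads relative to the available memory bandwidth. Thread count and region size are arbitrary choices for the sketch.

    /* Hedged sketch of a page-fault microbenchmark; compile with -pthread. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <time.h>

    #define PER_THREAD (256UL << 20)            /* 256MB per thread */

    static void *fault_region(void *arg)
    {
        (void)arg;
        char *buf = mmap(NULL, PER_THREAD, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return NULL;
        for (size_t off = 0; off < PER_THREAD; off += 4096)
            buf[off] = 1;                       /* one write => one fault per page */
        munmap(buf, PER_THREAD);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int nthreads = argc > 1 ? atoi(argv[1]) : 4;
        pthread_t tid[nthreads];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < nthreads; i++)
            pthread_create(&tid[i], NULL, fault_region, NULL);
        for (int i = 0; i < nthreads; i++)
            pthread_join(tid[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%d threads, elapsed %.3f s\n", nthreads,
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        return 0;
    }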
-
Submitted by Mel Gorman
mainline inclusion from mainline-5.11-rc1 commit 7d2b5dd0 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40C8N CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7d2b5dd0bcc48095651f1b85f751eef610b3e034

---------------------------

Currently, an imbalance is only allowed when a destination node is almost completely idle. This solved one basic class of problems and was the cautious approach.

This patch revisits the possibility that NUMA nodes can be imbalanced until 25% of the CPUs are occupied. The reasoning behind 25% is somewhat superficial -- it's half the cores when HT is enabled. At higher utilisations, balancing should continue as normal and keep things even until scheduler domains are fully busy or over utilised.

Note that this is not expected to be a universal win. Any benchmark that prefers spreading as wide as possible with limited communication will favour the old behaviour, as there is more memory bandwidth. Workloads that communicate heavily in pairs, such as netperf or tbench, benefit. For the tests I ran, the vast majority of workloads saw a benefit, so it seems to be a worthwhile trade-off.

Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20201120090630.3286-4-mgorman@techsingularity.net
Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
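The 25% rule itself is just a threshold comparison; a simplified restatement for clarity (this is not the kernel's exact helper, whose naming and rounding details differ):

    /* Tolerate an imbalance while fewer than a quarter of the destination
     * node's CPUs are busy; beyond that, balance as normal. Illustrative. */
    #include <stdbool.h>

    static bool allow_numa_imbalance(int dst_running, int dst_cpus)
    {
        return dst_running < dst_cpus / 4;
    }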
-
Submitted by Mel Gorman
mainline inclusion from mainline-5.11-rc1 commit 5c339005 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40C8N CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5c339005f854fa75aa46078ad640919425658b3e

---------------------------

In find_idlest_group(), the load imbalance is only relevant when the group is either overloaded or fully busy, but it is calculated unconditionally. This patch moves the imbalance calculation to the context where it is required. Technically, it is a micro-optimisation, but really the benefit is avoiding confusing one type of imbalance with another depending on the group_type in the next patch.

No functional change.

Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20201120090630.3286-3-mgorman@techsingularity.net
Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com> Reviewed-by: NChen Hui <judy.chenhui@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
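The refactor is the familiar pattern of computing a value only in the branch that consumes it. A self-contained, purely illustrative sketch (the struct, names and placeholder formula are hypothetical, not the scheduler's code):

    #include <stdbool.h>

    enum group_type { GROUP_HAS_SPARE, GROUP_FULLY_BUSY, GROUP_OVERLOADED };

    /* Hypothetical stand-in for the scheduler's per-group statistics. */
    struct grp_stats { unsigned long load; enum group_type type; };

    static long calc_imbalance(const struct grp_stats *local,
                               const struct grp_stats *idlest)
    {
        return ((long)local->load - (long)idlest->load) / 2;   /* placeholder */
    }

    /* The imbalance is only computed in the states that actually use it. */
    static bool prefer_idlest(const struct grp_stats *local,
                              const struct grp_stats *idlest)
    {
        if (local->type == GROUP_OVERLOADED || local->type == GROUP_FULLY_BUSY) {
            long imbalance = calc_imbalance(local, idlest);
            return (long)idlest->load + imbalance < (long)local->load;
        }
        /* the has-spare case never needed the value at all */
        return false;
    }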
-