1. 16 Aug 2023, 1 commit
  2. 17 Feb 2023, 2 commits
  3. 19 Jan 2023, 1 commit
  4. 22 Nov 2022, 1 commit
  5. 21 Nov 2022, 1 commit
  6. 15 Nov 2022, 3 commits
  7. 11 Nov 2022, 5 commits
  8. 10 Nov 2022, 1 commit
  9. 17 Aug 2022, 2 commits
  10. 26 Jul 2022, 1 commit
  11. 19 Jul 2022, 1 commit
  12. 06 Jul 2022, 1 commit
    • mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node · ea2a0e2a
      Alistair Popple authored
      stable inclusion
      from stable-v5.10.110
      commit 7188e7c96f39ae40b8f8d6a807d3f338fb1927ac
      bugzilla: https://gitee.com/openeuler/kernel/issues/I574AL
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7188e7c96f39ae40b8f8d6a807d3f338fb1927ac
      
      --------------------------------
      
      commit ddbc84f3 upstream.
      
      ZONE_MOVABLE uses the remaining memory in each node.  Its starting pfn
      is also aligned to MAX_ORDER_NR_PAGES.  It is possible for the remaining
      memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
      not enough room for ZONE_MOVABLE on that node.
      
      Unfortunately this condition is not checked for.  This leads to
      zone_movable_pfn[] getting set to a pfn greater than the last pfn in a
      node.
      
      calculate_node_totalpages() then sets zone->present_pages to be greater
      than zone->spanned_pages which is invalid, as spanned_pages represents
      the maximum number of pages in a zone assuming no holes.
      
      Subsequently it is possible free_area_init_core() will observe a zone of
      size zero with present pages.  In this case it will skip setting up the
      zone, including the initialisation of free_lists[].
      
      However populated_zone() checks zone->present_pages to see if a zone has
      memory available.  This is used by iterators such as
      walk_zones_in_node().  pagetypeinfo_showfree() uses this to walk the
      free_list of each zone in each node, which are assumed to be initialised
      due to the zone not being empty.
      
      As free_area_init_core() never initialised the free_lists[] this results
      in the following kernel crash when trying to read /proc/pagetypeinfo:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
        CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
        RIP: 0010:pagetypeinfo_show+0x163/0x460
        Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82
        RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003
        RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001
        RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b
        RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e
        R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00
        R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000
        FS:  00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0
        Call Trace:
         seq_read_iter+0x128/0x460
         proc_reg_read_iter+0x51/0x80
         new_sync_read+0x113/0x1a0
         vfs_read+0x136/0x1d0
         ksys_read+0x70/0xf0
         __x64_sys_read+0x1a/0x20
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fix this by checking that the aligned zone_movable_pfn[] does not exceed
      the end of the node, and if it does skip creating a movable zone on this
      node.
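
      The check itself is small.  A sketch of the idea, in the style of
      find_zone_movable_pfns_for_nodes() in mm/page_alloc.c (treat it as an
      illustration of the fix described above rather than the exact upstream
      diff):

        /*
         * Align the start of ZONE_MOVABLE on each node to MAX_ORDER_NR_PAGES,
         * then make sure it still lies inside the node: if the aligned pfn is
         * at or past the node's last pfn there is no room left, so do not
         * create a movable zone on this node.
         */
        zone_movable_pfn[nid] = roundup(zone_movable_pfn[nid],
                                        MAX_ORDER_NR_PAGES);
        get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
        if (zone_movable_pfn[nid] >= end_pfn)
                zone_movable_pfn[nid] = 0;      /* no ZONE_MOVABLE on this node */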
      
      Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com
      Fixes: 2a1e274a ("Create the ZONE_MOVABLE zone")
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yu Liao <liaoyu15@huawei.com>
      Reviewed-by: Wei Li <liwei391@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      ea2a0e2a
  13. 27 Apr 2022, 2 commits
    • mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages · 56692753
      Baoquan He authored
      stable inclusion
      from stable-v5.10.94
      commit 6c6f86bb618b73007dc2bc8d4b4003f80ba1efeb
      bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6c6f86bb618b73007dc2bc8d4b4003f80ba1efeb
      
      --------------------------------
      
      commit c4dc63f0 upstream.
      
      In kdump kernel of x86_64, page allocation failure is observed:
      
       kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
       CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
       Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
       Workqueue: events_unbound async_run_entry_fn
       Call Trace:
        <TASK>
        dump_stack_lvl+0x48/0x5e
        warn_alloc.cold+0x72/0xd6
        __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
        __alloc_pages+0x1df/0x210
        new_slab+0x389/0x4d0
        ___slab_alloc+0x58f/0x770
        __slab_alloc.constprop.0+0x4a/0x80
        kmem_cache_alloc_trace+0x24b/0x2c0
        sr_probe+0x1db/0x620
        ......
        device_add+0x405/0x920
        ......
        __scsi_add_device+0xe5/0x100
        ata_scsi_scan_host+0x97/0x1d0
        async_run_entry_fn+0x30/0x130
        process_one_work+0x1e8/0x3c0
        worker_thread+0x50/0x3b0
        ? rescuer_thread+0x350/0x350
        kthread+0x16b/0x190
        ? set_kthread_struct+0x40/0x40
        ret_from_fork+0x22/0x30
        </TASK>
       Mem-Info:
       ......
      
      The above failure happened when kmalloc() was called to allocate a
      buffer with GFP_DMA.  The request tries to allocate a slab page from
      the DMA zone, which has no managed pages at all.
      
       sr_probe()
       --> get_capabilities()
           --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
      
      In the current kernel, the dma-kmalloc caches are created whenever
      CONFIG_ZONE_DMA is enabled.  However, the x86_64 kdump kernel has no
      managed pages in the DMA zone since commit 6f599d84 ("x86/kdump: Always
      reserve the low 1M when the crashkernel option is specified"), so the
      failure is always reproducible.
      
      For now, mute the allocation-failure warning when pages are requested
      from a DMA zone that has no managed pages.
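
      A sketch of what the muting amounts to in warn_alloc() in
      mm/page_alloc.c; the early-return condition simply gains one more
      clause (the rest of the function is omitted here):

        /*
         * Stay silent for __GFP_NOWARN, for rate-limited callers, and for
         * GFP_DMA requests when the DMA zone has no managed pages at all:
         * such a request can never succeed, so warning about it is noise.
         */
        if ((gfp_mask & __GFP_NOWARN) ||
            !__ratelimit(&nopage_rs) ||
            ((gfp_mask & __GFP_DMA) && !has_managed_dma()))
                return;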
      
      [akpm@linux-foundation.org: fix warning]
      
      Link: https://lkml.kernel.org/r/20211223094435.248523-4-bhe@redhat.com
      Fixes: 6f599d84 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Acked-by: John Donnelly <john.p.donnelly@oracle.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      56692753
    • mm_zone: add function to check if managed dma zone exists · d4add921
      Baoquan He authored
      stable inclusion
      from stable-v5.10.94
      commit d2e572411738a5aad67901caef8e083fb9df29fd
      bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d2e572411738a5aad67901caef8e083fb9df29fd
      
      --------------------------------
      
      commit 62b31070 upstream.
      
      Patch series "Handle warning of allocation failure on DMA zone w/o
      managed pages", v4.
      
      **Problem observed:
      On x86_64, when a crash is triggered and the system enters the kdump
      kernel, a page allocation failure can always be seen.
      
       ---------------------------------
       DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
       swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
       CPU: 0 PID: 1 Comm: swapper/0
       Call Trace:
        dump_stack+0x7f/0xa1
        warn_alloc.cold+0x72/0xd6
        ......
        __alloc_pages+0x24d/0x2c0
        ......
        dma_atomic_pool_init+0xdb/0x176
        do_one_initcall+0x67/0x320
        ? rcu_read_lock_sched_held+0x3f/0x80
        kernel_init_freeable+0x290/0x2dc
        ? rest_init+0x24f/0x24f
        kernel_init+0xa/0x111
        ret_from_fork+0x22/0x30
       Mem-Info:
       ------------------------------------
      
      **Root cause:
      The current kernel assumes that the DMA zone must have managed pages
      and tries to request pages from it whenever CONFIG_ZONE_DMA is enabled,
      but this is not always true.  E.g. in the x86_64 kdump kernel, only the
      low 1M is present and it is locked down at a very early stage of boot,
      so it is never added to the buddy allocator and the DMA zone ends up
      with no managed pages.  Any page allocation from the DMA zone then
      fails.
      
      **Investigation:
      This failure has happened since the commits below were merged into
      Linus's tree.
        1a6a9044 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
        23721c8e x86/crash: Remove crash_reserve_low_1M()
        f1d4d47c x86/setup: Always reserve the first 1M of RAM
        7c321eb2 x86/kdump: Remove the backup region handling
        6f599d84 x86/kdump: Always reserve the low 1M when the crashkernel option is specified
      
      Before those commits, on x86_64 the low 640K area was reused by the
      kdump kernel: its contents were copied into a backup region for dumping
      before jumping into kdump, and then, apart from the firmware-reserved
      regions in [0, 640K], the remaining area was added to the buddy
      allocator and became managed pages of the DMA zone.
      
      However, after the commits above, the low 1M in the x86_64 kdump kernel
      is reserved by memblock and never released to the buddy allocator, so
      any later page allocation from the DMA zone fails.
      
      Initially, the low 1M had to be locked down whenever a crashkernel was
      reserved, because AMD SME encrypts memory and makes the old
      backup-region mechanism impossible when switching into the kdump
      kernel.
      
      Later, it was also observed that some BIOSes corrupt memory under 1M.
      To solve this, commit f1d4d47c always reserves the entire low 1M region
      after the real mode trampoline is allocated.
      
      Besides, Intel engineers recently mentioned that TDX (Trusted Domain
      Extensions), which is under development in the kernel, also needs to
      lock down the low 1M.  So we cannot simply revert the commits above to
      fix the page allocation failure from the DMA zone, as was suggested.
      
      **Solution:
      Currently, only the DMA atomic pool and dma-kmalloc initialize and
      request page allocations with GFP_DMA during boot.
      
      So only initialize the DMA atomic pool when the DMA zone has managed
      pages available, otherwise skip the initialization.
      
      For dma-kmalloc, for the time being, mute the allocation-failure
      warning when pages are requested from a DMA zone that has no managed
      pages.  Meanwhile, change callers to use the dma_alloc_xx/dma_map_xx
      APIs instead of kmalloc(GFP_DMA), or drop GFP_DMA from kmalloc() calls
      where it is not necessary.  Christoph is posting patches to fix the
      callers under drivers/scsi/.  Eventually, the need for dma-kmalloc can
      be removed entirely, as people have suggested.
      
      This patch (of 3):
      
      Some places in the current kernel assume that the DMA zone must have
      managed pages if CONFIG_ZONE_DMA is enabled, but this is not always
      true.  E.g. in the x86_64 kdump kernel, only the low 1M is present and
      it is locked down at a very early stage of boot, so there are no
      managed pages at all in the DMA zone.  Any page allocation from the DMA
      zone then fails.
      
      Add the function has_managed_dma() and the relevant helpers to check
      whether there is a DMA zone with managed pages.  It will be used in
      later patches.
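
      Roughly, the helper walks every online node and reports whether any
      DMA zone has managed pages.  A sketch of that shape (an illustration of
      the idea rather than the exact upstream code):

        #ifdef CONFIG_ZONE_DMA
        bool has_managed_dma(void)
        {
                struct pglist_data *pgdat;

                for_each_online_pgdat(pgdat) {
                        struct zone *zone = &pgdat->node_zones[ZONE_DMA];

                        /* managed_zone(): zone has pages in the buddy allocator */
                        if (managed_zone(zone))
                                return true;
                }
                return false;
        }
        #endif /* CONFIG_ZONE_DMA; a !CONFIG_ZONE_DMA stub would just return false */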
      
      Link: https://lkml.kernel.org/r/20211223094435.248523-1-bhe@redhat.com
      Link: https://lkml.kernel.org/r/20211223094435.248523-2-bhe@redhat.com
      Fixes: 6f599d84 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: John Donnelly <john.p.donnelly@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
       Conflicts:
      	mm/page_alloc.c
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
      d4add921
  14. 20 Mar 2022, 1 commit
  15. 23 Feb 2022, 3 commits
    • mm: Introduce reliable flag for user task · 8ee6e050
      Peng Wu authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4PM01
      CVE: NA
      
      ------------------------------------------
      
      Add a reliable flag for user tasks.  A user task with the reliable flag
      set can only allocate memory from the mirrored region.  PF_RELIABLE is
      added to represent the task's reliable flag.
      
      - The init task is regarded as a special task and allocates memory from
        the mirrored region.
      
      - For normal user tasks, the reliable flag can be set via the procfs
        interface shown below and is inherited across fork().
      
      User can change a user task's reliable flag by
      
      	$ echo [0/1] > /proc/<pid>/reliable
      
      and check a user task's reliable flag by
      
      	$ cat /proc/<pid>/reliable
      
      Note that the global init task's reliable file cannot be accessed.
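
      A hypothetical sketch of the write side of /proc/<pid>/reliable; the
      handler name, locking and permission checks in the real openEuler
      implementation may differ, this only illustrates flipping PF_RELIABLE:

        /* "1" sets PF_RELIABLE on the target task, "0" clears it. */
        static ssize_t reliable_write(struct file *file, const char __user *buf,
                                      size_t count, loff_t *ppos)
        {
                struct task_struct *task = get_proc_task(file_inode(file));
                char val;

                if (!task)
                        return -ESRCH;
                if (count >= 1 && !get_user(val, buf)) {
                        if (val == '1')
                                task->flags |= PF_RELIABLE;
                        else if (val == '0')
                                task->flags &= ~PF_RELIABLE;
                }
                put_task_struct(task);
                return count;
        }
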
      Signed-off-by: Peng Wu <wupeng58@huawei.com>
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      8ee6e050
    • mm: Introduce memory reliable · 6c59ddf2
      Ma Wupeng authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4PM01
      CVE: NA
      
      --------------------------------
      
      Introduction
      ============
      
      The memory reliable feature is a memory tiering mechanism.  It is based
      on the kernel mirror feature, which splits memory into two separate
      regions: a mirrored (reliable) region and a non-mirrored (non-reliable)
      region.
      
      For the kernel mirror feature:
      
      - kernel memory is allocated from the mirrored region by default
      - user memory is allocated from the non-mirrored region by default
      
      The non-mirrored region is arranged into ZONE_MOVABLE.
      
      The memory reliable feature adds the following on top:
      
      - normal user tasks never allocate memory from the mirrored region via
        userspace APIs (malloc, mmap, etc.)
      - special user tasks allocate memory from the mirrored region by default
      - tmpfs/pagecache allocate memory from the mirrored region by default
      - an upper limit on the mirrored region allocated for user tasks, tmpfs
        and pagecache
      
      A reliable fallback mechanism is supported, which allows special user
      tasks, tmpfs and pagecache to fall back to allocating from the
      non-mirrored region; this is the default setting.
      
      In order to fulfil this goal:
      
      - a ___GFP_RELIABLE flag is added for allocating memory from the
        mirrored region.
      
      - the high_zoneidx for special user tasks/tmpfs/pagecache is set to
        ZONE_NORMAL.
      
      - normal user tasks can only allocate from ZONE_MOVABLE.
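
      A hedged sketch of the zone steering listed above; the helper name and
      its call sites are assumptions, not the actual openEuler code:

        /*
         * Requests tagged ___GFP_RELIABLE (special user tasks, tmpfs,
         * pagecache) are capped at ZONE_NORMAL, i.e. the mirrored region;
         * ordinary user allocations keep ZONE_MOVABLE (the non-mirrored
         * region) as their highest zone.
         */
        static inline enum zone_type reliable_high_zoneidx(gfp_t gfp_mask)
        {
                if (gfp_mask & ___GFP_RELIABLE)
                        return ZONE_NORMAL;
                return gfp_zone(gfp_mask);      /* ZONE_MOVABLE for normal user memory */
        }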
      
      This patch is just the main framework; memory reliable support for
      special user tasks, pagecache and tmpfs has its own patches.
      
      To enable this feature, mirrored (reliable) memory is needed and
      "kernelcore=reliable" must be added to the kernel parameters.
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      6c59ddf2
    • efi: Disable mirror feature if kernelcore is not specified · 856090e5
      Ma Wupeng authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4PM01
      CVE: NA
      
      --------------------------------
      
      With this patch, the kernel checks mirrored_kernelcore before calling
      efi_find_mirror(), which enables the basic mirror feature (see the
      sketch below).
      
      If the system has some mirrored memory but the mirror feature is not
      requested on the boot command line, enabling the basic mirror feature
      anyway would lead to the following situations:
      
      - memblock memory allocation prefers the mirrored region.  This may
        have an unexpected influence on NUMA affinity.
      
      - contiguous memory is split into several parts if parts of it are
        mirrored memory, via memblock_mark_mirror().
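
      A sketch of the gating described above; the exact call site in the
      openEuler tree may differ, but the idea is a one-line guard:

        /*
         * Only probe the EFI memory map for mirrored ranges when the user
         * asked for the mirror feature via kernelcore= on the command line.
         * Otherwise, leave the mirror attribute alone so memblock does not
         * start preferring (and splitting around) mirrored regions.
         */
        if (mirrored_kernelcore)
                efi_find_mirror();
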
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      856090e5
  16. 11 Feb 2022, 2 commits
    • mm/page_alloc: use accumulated load when building node fallback list · f532b284
      Krupa Ramakrishnan authored
      mainline inclusion
      from mainline-v5.16-rc1
      commit 54d032ce
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4T0ML
      CVE: NA
      
      -----------------------------------------------
      
      In build_zonelists(), when the fallback list is built for the nodes,
      the node load gets reinitialized during each iteration.  This results
      in nodes with the same distance occupying the same slot in different
      node fallback lists rather than appearing in the intended round-robin
      manner, so one node gets picked for allocation more often than other
      nodes at the same distance.
      
      As an example, consider a 4 node system with the following distance
      matrix.
      
        Node 0  1  2  3
        ----------------
        0    10 12 32 32
        1    12 10 32 32
        2    32 32 10 12
        3    32 32 12 10
      
      For this case, the node fallback list gets built like this:
      
        Node  Fallback list
        ---------------------
        0     0 1 2 3
        1     1 0 3 2
        2     2 3 0 1
        3     3 2 0 1 <-- Unexpected fallback order
      
      In the fallback list for nodes 2 and 3, the nodes 0 and 1 appear in the
      same order which results in more allocations getting satisfied from node
      0 compared to node 1.
      
      The effect of this on remote memory bandwidth as seen by stream
      benchmark is shown below:
      
        Case 1: Bandwidth from cores on nodes 2 & 3 to memory on nodes 0 & 1
      	(numactl -m 0,1 ./stream_lowOverhead ... --cores <from 2, 3>)
        Case 2: Bandwidth from cores on nodes 0 & 1 to memory on nodes 2 & 3
      	(numactl -m 2,3 ./stream_lowOverhead ... --cores <from 0, 1>)
      
        ----------------------------------------
      		BANDWIDTH (MB/s)
            TEST	Case 1		Case 2
        ----------------------------------------
            COPY	57479.6		110791.8
           SCALE	55372.9		105685.9
             ADD	50460.6		96734.2
          TRIADD	50397.6		97119.1
        ----------------------------------------
      
      The bandwidth drop in Case 1 occurs because most of the allocations get
      satisfied by node 0 as it appears first in the fallback order for both
      nodes 2 and 3.
      
      This can be fixed by accumulating the node load in build_zonelists()
      rather than reinitializing it during each iteration.  With this the
      nodes with the same distance rightly get assigned in the round robin
      manner.
      
      In fact this was how it was originally until commit f0c0b2b8
      ("change zonelist order: zonelist order selection logic") dropped the
      load accumulation and resorted to initializing the load during each
      iteration.
      
      While zonelist ordering was removed by commit c9bff3ee ("mm,
      page_alloc: rip out ZONELIST_ORDER_ZONE"), the change to the node load
      accumulation in build_zonelists() remained.  So essentially this patch
      reverts back to the accumulated node load logic.
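
      A sketch of the relevant loop in build_zonelists() (5.10-era shape,
      slightly abbreviated); the only functional change is the accumulation:

        while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
                /*
                 * Penalize the first node in each distance group so that
                 * nodes at the same distance rotate through the fallback
                 * lists (round-robin) instead of always taking the same slot.
                 */
                if (node_distance(local_node, node) !=
                    node_distance(local_node, prev_node))
                        node_load[node] += load;        /* was: node_load[node] = load; */

                node_order[nr_nodes++] = node;
                prev_node = node;
                load--;
        }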
      
      After this fix, the fallback order gets built like this:
      
        Node Fallback list
        ------------------
        0    0 1 2 3
        1    1 0 3 2
        2    2 3 0 1
        3    3 2 1 0 <-- Note the change here
      
      The bandwidth in Case 1 improves and matches Case 2 as shown below.
      
        ----------------------------------------
      		BANDWIDTH (MB/s)
            TEST	Case 1		Case 2
        ----------------------------------------
            COPY	110438.9	110107.2
           SCALE	105930.5	105817.5
             ADD	97005.1		96159.8
          TRIADD	97441.5		96757.1
        ----------------------------------------
      
      The correctness of the fallback list generation has been verified for
      the above node configuration, where node 3 starts as a memory-less node
      and comes up online only during memory hotplug.
      
      [bharata@amd.com: Added changelog, review, test validation]
      
      Link: https://lkml.kernel.org/r/20210830121603.1081-3-bharata@amd.com
      Fixes: f0c0b2b8 ("change zonelist order: zonelist order selection logic")
      Signed-off-by: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
      Co-developed-by: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
      Signed-off-by: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
      Signed-off-by: Bharata B Rao <bharata@amd.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peng Liu <liupeng256@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f532b284
    • mm/page_alloc: print node fallback order · 262a7219
      Bharata B Rao authored
      mainline inclusion
      from mainline-v5.16-rc1
      commit 6cf25392
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4T0ML
      CVE: NA
      
      -------------------------------------------------
      
      Patch series "Fix NUMA nodes fallback list ordering".
      
      For a NUMA system that has multiple nodes at the same distance from
      other nodes, the fallback list generation prefers the same node order
      for them instead of round-robin, thereby penalizing one node over the
      others.  This series fixes it.
      
      More description of the problem and the fix is present in the patch
      description.
      
      This patch (of 2):
      
      Print an informational message about the allocation fallback order for
      each NUMA node during boot.
      
      No functional changes here.  This makes it easier to illustrate the
      problem in the node fallback list generation, which the next patch
      fixes.
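
      The message is printed from build_zonelists() once the fallback order
      for a node has been computed; roughly:

        /* One line per node, e.g. "Fallback order for Node 0: 0 1 2 3" */
        pr_info("Fallback order for Node %d: ", local_node);
        for (node = 0; node < nr_nodes; node++)
                pr_cont("%d ", node_order[node]);
        pr_cont("\n");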
      
      Link: https://lkml.kernel.org/r/20210830121603.1081-1-bharata@amd.com
      Link: https://lkml.kernel.org/r/20210830121603.1081-2-bharata@amd.com
      Signed-off-by: Bharata B Rao <bharata@amd.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>
      Cc: Sadagopan Srinivasan <Sadagopan.Srinivasan@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peng Liu <liupeng256@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      262a7219
  17. 10 Feb 2022, 1 commit
  18. 19 Jan 2022, 3 commits
  19. 14 Jan 2022, 1 commit
    • hugetlb: address ref count racing in prep_compound_gigantic_page · df906dae
      Mike Kravetz authored
      mainline inclusion
      from mainline-v5.14-rc1
      commit 7118fc29
      category: bugfix
      bugzilla: 171843
      
      -----------------------------------------------
      
      In [1], Jann Horn points out a possible race between
      prep_compound_gigantic_page and __page_cache_add_speculative.  The root
      cause of the possible race is prep_compound_gigantic_page unconditionally
      setting the ref count of pages to zero.  It does this because
      prep_compound_gigantic_page is handed a 'group' of pages from an allocator
      and needs to convert that group of pages to a compound page.  The ref
      count of each page in this 'group' is one as set by the allocator.
      However, the ref count of compound page tail pages must be zero.
      
      The potential race comes about when ref counted pages are returned from
      the allocator.  When this happens, other mm code could also take a
      reference on the page.  __page_cache_add_speculative is one such example.
      Therefore, prep_compound_gigantic_page can not just set the ref count of
      pages to zero as it does today.  Doing so would lose the reference taken
      by any other code.  This would lead to BUGs in code checking ref counts
      and could possibly even lead to memory corruption.
      
      There are two possible ways to address this issue.
      
      1) Make all allocators of gigantic groups of pages be able to return a
         properly constructed compound page.
      
      2) Make prep_compound_gigantic_page be more careful when constructing a
         compound page.
      
      This patch takes approach 2.
      
      In prep_compound_gigantic_page, use cmpxchg to only set the ref count to
      zero if it is one.  If the cmpxchg fails, call synchronize_rcu() in the
      hope that the extra ref count will be dropped during an RCU grace
      period.  This is not a performance-critical code path and the wait
      should be acceptable.  If the ref count is still inflated after the
      grace period, then undo any modifications made and return an error.
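
      A condensed sketch of the tail-page loop described above (based on this
      description; the real function does more, e.g. clearing PageReserved,
      and unwinds already-prepared tail pages on error):

        for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
                /* Atomically move the ref count 1 -> 0; respect extra refs. */
                if (!page_ref_freeze(p, 1)) {
                        /* Give a speculative reference one grace period to go away. */
                        synchronize_rcu();
                        if (!page_ref_freeze(p, 1))
                                goto out_error;         /* still inflated: undo and fail */
                }
                set_compound_head(p, page);
        }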
      
      Currently prep_compound_gigantic_page is type void and does not return
      errors.  Modify the two callers to check for and handle error returns.  On
      error, the caller must free the 'group' of pages as they can not be used
      to form a gigantic page.  After freeing pages, the runtime caller
      (alloc_fresh_huge_page) will retry the allocation once.  Boot time
      allocations can not be retried.
      
      The routine prep_compound_page also unconditionally sets the ref count of
      compound page tail pages to zero.  However, in this case the buddy
      allocator is constructing a compound page from freshly allocated pages.
      The ref count on those freshly allocated pages is already zero, so the
      set_page_count(p, 0) is unnecessary and could lead to confusion.  Just
      remove it.
      
      [1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/
      
      Link: https://lkml.kernel.org/r/20210622021423.154662-3-mike.kravetz@oracle.com
      Fixes: 58a84aa9 ("thp: set compound tail page _count to zero")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Jann Horn <jannh@google.com>
      Cc: Youquan Song <youquan.song@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Chen Huang <chenhuang5@huawei.com>
      Reviewed-by: Chen Wandun <chenwandun@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      df906dae
  20. 08 Jan 2022, 1 commit
  21. 29 Dec 2021, 6 commits