Commits · ea80e1d901b80ac7d9c498b2f7c6a602fabcb24a · openeuler / Kernel

09 Feb, 2022 27 commits

mm: Modify sharepool sp_mmap() page_offset · ea80e1d9

Committed by Guo Mengqi on Feb 09, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SPNL
CVE: NA

-----------------------------------

In sp_mmap(), if the offset is computed as va - MMAP_BASE (or
va - DVPP_BASE), a normal sp_alloc pgoff may have the same value as a
DVPP pgoff, so DVPP and sp_alloc mappings unexpectedly cover overlapping
parts of the file.

To fix the problem, pass the VA itself as the mmap offset: within one
task's address space, no two VAs are the same.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ea80e1d9
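
A minimal Python sketch of the collision described in the sp_mmap() fix
above. The base addresses, the page shift and the helper names are
assumptions made for illustration; they are not taken from the kernel source.

```python
# Hypothetical constants, chosen only to illustrate the collision.
PAGE_SHIFT = 12
MMAP_BASE = 0xE000_0000_0000  # assumed start of the normal share-pool range
DVPP_BASE = 0xF000_0000_0000  # assumed start of the DVPP range


def pgoff_relative(va, base):
    """Old scheme: page offset relative to the base of the VA range."""
    return (va - base) >> PAGE_SHIFT


def pgoff_absolute(va):
    """Fixed scheme: the VA itself is used as the mmap offset."""
    return va >> PAGE_SHIFT


# Two mappings at the same distance from their respective bases.
normal_va = MMAP_BASE + 0x2000
dvpp_va = DVPP_BASE + 0x2000
```

With the relative scheme both mappings get the same pgoff, so they would
share file pages; with the absolute scheme the offsets differ because the
VAs differ.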

share_pool: Accept device_id in k2u flags · 1db43c8a

Committed by Wang Wensheng on Feb 09, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SON8
CVE: NA

-------------------------------------------------

We use device_id to select the correct DVPP vspace range when the SP_DVPP
flag is specified.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

1db43c8a

share_pool: Clear the usage of node_id and device_id · 5fe50a03

Committed by Wang Wensheng on Feb 09, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SON8
CVE: NA

-------------------------------------------------

device_id is used by DVPP to select the correct virtual address space,
and node_id specifies the node from which physical memory is allocated.
In theory the two do not have to be the same.

In practice, a process always runs on the NUMA nodes corresponding to the
device it uses, and the node with the same id as the device always
belongs to that device. So using device_id as node_id to allocate memory
works.

However, a device does not always have exactly one NUMA node, and with
this scheme the other NUMA nodes of the device cannot be used.

Here we introduce a new flag, SP_SPEC_NODE_ID, and add a bit-region in
sp_flags for callers who want to use the other nodes that belong to a
device. That is, to specify the node_id, both the new flag and the
node_id must be added to sp_flags when calling sp_alloc() or
sp_make_share_k2u(); otherwise the node with the same id as the device
is used.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

5fe50a03
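
The node selection rule described above can be sketched in Python as
follows. The bit positions, the mask width and the helper names are
hypothetical; the commit message does not give the real sp_flags layout.

```python
# Hypothetical layout: one flag bit plus an 8-bit node_id region.
SP_SPEC_NODE_ID = 1 << 3
NODE_ID_SHIFT = 20
NODE_ID_MASK = 0xFF


def with_node_id(sp_flags, node_id):
    """Caller side: set the flag and store node_id in the bit-region."""
    return sp_flags | SP_SPEC_NODE_ID | ((node_id & NODE_ID_MASK) << NODE_ID_SHIFT)


def select_node(sp_flags, device_id):
    """Allocator side: use the explicit node_id only if the flag is set."""
    if sp_flags & SP_SPEC_NODE_ID:
        return (sp_flags >> NODE_ID_SHIFT) & NODE_ID_MASK
    # Default: the node with the same id as the device.
    return device_id
```

A caller that never sets SP_SPEC_NODE_ID keeps the old behaviour, which is
why the flag and the node_id have to be passed together.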

share_pool: Make multi-device support extendable · 38f74cfb

Committed by Wang Wensheng on Feb 09, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SON8
CVE: NA

-------------------------------------------------

The maximum number of devices supported in share_pool is static. Here we
make it extendable for later use.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

38f74cfb

mm: add sysctl to clear free list pages · 8b805498

Committed by Yu Liao on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

This patch adds a sysctl to clear the pages in the free lists of each
NUMA node. For each NUMA node, every page in the free list is cleared;
this work is scheduled on a random CPU of that NUMA node.

When KASAN is enabled and the pages are free, the shadow memory is
filled with 0xFF, so writing to these free pages would be reported as a
use-after-free (UAF). So just disable KASAN for clear freelist.

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8b805498

mm:vmscan: add the missing check of page_cache_over_limit · 92cd2e7f

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

shrink_shepherd queues work on each CPU to shrink the page cache and is
called periodically. Without a page_cache_over_limit check before
shrinking, memory is reclaimed periodically even when the amount of page
cache is below the limit. So add a basic check before shrinking the page
cache.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

92cd2e7f
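
A rough Python model of the added check. The function signature and the
"strictly greater than the limit" comparison are assumptions; only the idea
of skipping the per-CPU work when the page cache is under the limit comes
from the commit message.

```python
def shrink_shepherd(nr_page_cache, limit, cpus):
    """Return the CPUs on which shrink work would be queued this period."""
    # Added check: below the limit there is nothing to reclaim.
    if nr_page_cache <= limit:
        return []
    # Over the limit: queue one work item per CPU, as before.
    return list(cpus)
```

Without the early return, every periodic call would queue work on all CPUs
regardless of how much page cache is in use.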

mm/vmscan: dont do shrink_slab in reclaim page cache · 25d463b5

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

If the page cache is over the limit, page cache reclamation is triggered.
Only page cache should be reclaimed, but shrink_node also reclaims slab
by default, so disable shrink_slab by adding a control parameter to
scan_control.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

25d463b5

mm/vmscan: dont reclaim anon page when shrink page cache · 94ee86c7

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

If CONFIG_MEMORY_RELIABLE is enabled, the amount of page cache should be
kept within a limit, so only page cache, rather than both file and anon
pages, should be reclaimed during page cache reclamation.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

94ee86c7

filemap: dont shrink_page_cache in add_to_page_cache · 04ec8364

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

The reasons for disabling shrink_page_cache in add_to_page_cache are:

1. Synchronous memory reclamation will affect performance.
2. add_to_page_cache does not increase the LRU size in the HugeTLB
   case, so shrink_page_cache will not be triggered.

Now add_to_page_cache in mm/filemap.c and include/linux/pagemap.h are
the same. Don't delete add_to_page_cache in mm/filemap.c; keep the
interface for KABI.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

04ec8364

mm/vmscan: fix unexpected shrinking page cache with vm_cache_reclaim_enable disable · 76569c77

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

-------------------------------------------------------

cache_limit_ratio_sysctl_handler and cache_limit_mbytes_sysctl_handler
shrink the page cache even if vm_cache_reclaim_enable is false, which is
unexpected.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

76569c77

mm/vmscan: fix frequent call of shrink_page_cache_work · 18849ae6

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

If vmcache_reclaim_s > 120, shrink_page_cache_work is called again after
120 seconds when shrinking is hard. That is shorter than
vmcache_reclaim_s, which deviates from the original intention of
extending the interval.

To solve this, shrink_page_cache_work should be called after
vmcache_reclaim_s + 120 seconds.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

18849ae6
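
The interval arithmetic above can be checked with a small Python sketch.
The function names and the "hard" parameter are illustrative assumptions;
only the 120-second constant and vmcache_reclaim_s come from the commit
message.

```python
BACKOFF_S = 120  # extra delay applied when shrinking is hard


def next_delay_old(vmcache_reclaim_s, hard):
    """Old behaviour as described: a fixed 120 s delay when shrinking is hard."""
    return BACKOFF_S if hard else vmcache_reclaim_s


def next_delay_new(vmcache_reclaim_s, hard):
    """Fixed behaviour: the back-off is added on top of the normal period."""
    return vmcache_reclaim_s + BACKOFF_S if hard else vmcache_reclaim_s
```

For vmcache_reclaim_s = 300, the old rule re-runs the work after 120
seconds, sooner than the normal period, while the new rule waits 420 seconds.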

proc/meminfo: add "FileCache" item in /proc/meminfo · c05f56c6

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

The "FileCache" item in /proc/meminfo shows the amount of page cache on
the LRU (active + inactive).

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c05f56c6

mm: add page cache fallback statistic · 925368d8

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add a page cache fallback statistic. The counter will overflow after a
period of use; it then simply wraps to zero, with no negative effect.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

925368d8

mm: add cmdline for the reliable memory usage of page cache · f5c69190

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add a cmdline option for the reliable memory usage of page cache.
Page cache will not use reliable memory when option "P" is passed to
reliable_debug on the cmdline.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f5c69190

mm: make page cache use reliable memory by default · c0019109

Committed by Chen Wandun on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

__page_cache_alloc is used to allocate page cache in most file systems,
such as ext4 and f2fs, so add the ___GFP_RELIABILITY flag there to
support CONFIG_MEMORY_RELIABLE when allocating pages.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c0019109

shmem: Show reliable shmem info · fc2c1dc8

Committed by Zhou Guanghui on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

------------------------------------------

Add ReliableShmem in /proc/meminfo to show reliable memory info
used by shmem.

- ReliableShmem: reliable memory used by shmem

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

fc2c1dc8

shmem: Introduce shmem reliable · 3a3a1f75

Committed by Zhou Guanghui on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

------------------------------------------

This feature depends on the overall memory reliable feature.
When the shared memory reliable feature is enabled, the pages
used by shared memory are allocated from the mirrored region
by default. If the mirrored region is insufficient, pages can
be allocated from the non-mirrored region.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3a3a1f75

mm: Introduce fallback mechanism for memory reliable · 3023a4b3

Committed by Ma Wupeng on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Introduce a fallback mechanism for memory reliable. The following will
fall back to the non-mirrored region if their allocation from the
mirrored region fails:

- User tasks with reliable flag
- thp collapse pages
- init tasks
- pagecache
- tmpfs

To achieve this goal, the buddy system will fall back to the non-mirrored
region in the following situations:

- if __GFP_THISNODE is set in gfp_mask and dest nodes do not have any zones
  available

- high_zoneidx will be set to ZONE_MOVABLE to alloc memory before oom

This mechanism is enabled by default and can be disabled by adding
"reliable_debug=F" to the kernel parameters. It relies on
CONFIG_MEMORY_RELIABLE and needs "kernelcore=reliable" in the kernel
parameters.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3023a4b3
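
A simplified Python model of the fallback order described above. It ignores
zones, nodes and gfp flags entirely; the function name, its parameters and
the returned strings are assumptions made for illustration.

```python
def alloc_page(mirrored_free, non_mirrored_free, want_reliable, fallback=True):
    """Return the region a page would come from, or None if allocation fails."""
    if want_reliable:
        if mirrored_free > 0:
            return "mirrored"
        # Fallback path: only taken when the mechanism is enabled
        # (modelled on the default; "reliable_debug=F" would disable it).
        if fallback and non_mirrored_free > 0:
            return "non-mirrored"
        return None
    # Ordinary user allocations never use the mirrored region here.
    return "non-mirrored" if non_mirrored_free > 0 else None
```

With fallback disabled, a reliable allocation fails as soon as the mirrored
region is exhausted, even if non-mirrored memory is still free.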

mm: Add reliable memory use limit for user tasks · 1845e7ad

Committed by Peng Wu on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

----------------------------------------------

There is an upper limit on memory allocation by special user tasks.
A special user task is a user task with the reliable flag.

Init tasks will allocate memory from the non-mirrored region if their
allocation hits the limit.

The limit can be set or read via /proc/sys/vm/task_reliable_limit.

The default value of this limit is ULONG_MAX.

Signed-off-by: Peng Wu <wupeng58@huawei.com>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

1845e7ad

mm: thp: Add memory reliable support for hugepaged collapse · ff0fb9e8

Committed by Ma Wupeng on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

When hugepaged collapses pages into a huge page, the huge page uses the
same kind of memory region as the pages it replaces. Before collapsing,
hugepaged checks whether the area to be collapsed contains any reliable
pages. If it does, hugepaged allocates memory from the mirrored region;
otherwise it allocates memory from the non-mirrored region.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ff0fb9e8
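
The decision rule for the collapse target can be written as a short Python
sketch. The function name and the list-of-booleans input are hypothetical
simplifications of the per-page check.

```python
def collapse_target_region(page_is_reliable):
    """Pick the region for the new huge page from the pages being collapsed.

    page_is_reliable: one boolean per small page in the area.
    """
    # A single reliable page is enough to require a mirrored huge page.
    return "mirrored" if any(page_is_reliable) else "non-mirrored"
```

Choosing the mirrored region whenever any source page is reliable means a
collapse never moves reliable data into non-mirrored memory.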

proc: Count reliable memory usage of reliable tasks · 094eaabb

Committed by Peng Wu on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

----------------------------------------------

Count the reliable memory allocated by reliable user tasks.

The policy for counting reliable memory usage is based on RSS statistics.
Anywhere the mm counters are updated, reliable pages need to be counted
too. A reliable page, as checked by page_reliable(), must update the
reliable page counter by calling reliable_page_counter().

Updating the reliable page counter should be considered wherever the
following logic is added:
- add_mm_counter
- dec_mm_counter
- inc_mm_counter_fast
- dec_mm_counter_fast
- rss[mm_counter(page)]

Signed-off-by: Peng Wu <wupeng58@huawei.com>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

094eaabb

mm: Introduce reliable flag for user task · c7731567

Committed by Peng Wu on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

------------------------------------------

Add a reliable flag for user tasks. A user task with the reliable flag
can only allocate memory from the mirrored region. PF_RELIABLE is added
to represent the task's reliable flag.

- The init task is regarded as a special task which allocates memory
  from the mirrored region.

- For normal user tasks, the reliable flag can be set via the procfs
  interface shown below and is inherited via fork().

A user can change a user task's reliable flag by

	$ echo [0/1] > /proc/<pid>/reliable

and check a user task's reliable flag by

	$ cat /proc/<pid>/reliable

Note: the global init task's reliable file cannot be accessed.

Signed-off-by: Peng Wu <wupeng58@huawei.com>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c7731567

meminfo: Show reliable memory info · b1f317c6

Committed by Ma Wupeng on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add ReliableTotal & ReliableUsed in /proc/meminfo to show memory info
about reliable memory.

- ReliableTotal: total reliable RAM

- ReliableUsed: the used amount of reliable memory

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b1f317c6

mm: Introduce memory reliable · 33d1f46a

Committed by Ma Wupeng on Feb 09, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Introduction
============

The memory reliable feature is a memory tiering mechanism. It is based on
the kernel mirror feature, which splits memory into two separate regions:
a mirrored (reliable) region and a non-mirrored (non-reliable) region.

The kernel mirror feature:

- allocates kernel memory from the mirrored region by default
- allocates user memory from the non-mirrored region by default

The non-mirrored region is arranged into ZONE_MOVABLE.

The memory reliable feature adds the following:

- normal user tasks never allocate memory from the mirrored region through
  userspace APIs (malloc, mmap, etc.)
- special user tasks allocate memory from the mirrored region by default
- tmpfs/pagecache allocate memory from the mirrored region by default
- there is an upper limit on the mirrored memory allocated for user tasks,
  tmpfs and pagecache

A reliable fallback mechanism is supported, which allows special user
tasks, tmpfs and pagecache to fall back to allocating from the
non-mirrored region. This is the default setting.

To achieve this:

- the ___GFP_RELIABILITY flag is added for allocating memory from the
  mirrored region.

- the high_zoneidx for special user tasks/tmpfs/pagecache is set to
  ZONE_NORMAL.

- normal user tasks can only allocate from ZONE_MOVABLE.

This patch is just the main framework; memory reliable support for special
user tasks, pagecache and tmpfs comes in separate patches.

To enable this function, mirrored (reliable) memory is needed and
"kernelcore=reliable" should be added to the kernel parameters.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

33d1f46a

mm/memory_hotplug: allow to specify a default online_type · c5031cab

Committed by David Hildenbrand on Feb 09, 2022

mainline inclusion
from linux-5.7-rc1
commit 5f47adf7
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

For now, distributions implement advanced udev rules to essentially
- Don't online any hotplugged memory (s390x)
- Online all memory to ZONE_NORMAL (e.g., most virt environments like
  hyperv)
- Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
  care of (e.g., bare metal, special virt environments)

In summary: All memory is usually onlined the same way, however, the
kernel always has to ask user space to come up with the same answer.
E.g., Hyper-V always waits for a memory block to get onlined before
continuing, otherwise it might end up adding memory faster than
onlining it, which can result in strange OOM situations.  This waiting
slows down adding of a bigger amount of memory.

Let's allow to specify a default online_type, not just "online" and
"offline".  This allows distributions to configure the default online_type
when booting up and be done with it.

We can now specify "offline", "online", "online_movable" and
"online_kernel" via
- "memhp_default_state=" on the kernel cmdline
- /sys/devices/system/memory/auto_online_blocks
just like we are able to specify for a single memory block via
/sys/devices/system/memory/memoryX/state

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Yumei Huang <yuhuang@redhat.com>
Link: http://lkml.kernel.org/r/20200317104942.11178-9-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c5031cab
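
A small Python sketch of how the four accepted strings could be validated.
This is not the kernel's parser: the function name, the "offline" starting
default and the choice to keep the current value on invalid input are
assumptions for illustration.

```python
# The four values named in the commit message.
VALID_ONLINE_TYPES = ("offline", "online", "online_kernel", "online_movable")


def parse_default_online_type(value, current="offline"):
    """Return the new default online_type, or `current` if value is unknown."""
    value = value.strip()  # tolerate the trailing newline from `echo`
    return value if value in VALID_ONLINE_TYPES else current
```

The same set of strings is accepted on the kernel cmdline
("memhp_default_state=") and in auto_online_blocks, so one validation
routine can serve both entry points.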

mm/memory_hotplug: convert memhp_auto_online to store an online_type · b094d7ef

Committed by David Hildenbrand on Feb 09, 2022

mainline inclusion
from linux-5.7-rc1
commit 862919e5
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

...  and rename it to memhp_default_online_type.  This is a preparation
for more detailed default online behavior.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Yumei Huang <yuhuang@redhat.com>
Link: http://lkml.kernel.org/r/20200317104942.11178-8-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Wupeng: keep memhp_auto_online for kabi]
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b094d7ef

mm/memory_hotplug: drop "online" parameter from add_memory_resource() · 64b1cf47

Committed by David Hildenbrand on Feb 09, 2022

mainline inclusion
from linux-5.0-rc1
commit f29d8e9c
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Userspace should always be in charge of how to online memory and if memory
should be onlined automatically in the kernel.  Let's drop the parameter
to overwrite this - XEN passes memhp_auto_online, just like add_memory(),
so we can directly use that instead internally.

Link: http://lkml.kernel.org/r/20181123123740.27652-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

64b1cf47

08 Feb, 2022 1 commit

mm/memcg_memfs_info: show files that having pages charged in mem_cgroup · add8fd4e

Committed by Liu Shixin on Feb 08, 2022

hulk inclusion
category: feature
bugzilla: 186182, https://gitee.com/openeuler/kernel/issues/I4SBQX
CVE: NA

--------------------------------

Support printing rootfs files and tmpfs files that have pages charged
in a given memory cgroup. The file information can be printed through
the interface "memory.memfs_files_info" or when OOM is triggered.

To avoid flooding the logs, we limit the maximum number of files printed
on OOM through the interface "max_print_files_in_oom". To filter out
small files, we limit the minimum size of files that can be printed
through the interface "size_threshold".

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

add8fd4e

29 Jan, 2022 1 commit

mm: bdi: initialize bdi_min_ratio when bdi is unregistered · ed5d42e8

Committed by Manjong Lee on Jan 27, 2022

stable inclusion
from linux-4.19.221
commit 505039dd0fe6493565e27b55700e46f43f20580d

--------------------------------

commit 3c376dfa upstream.

Initialize min_ratio if it is set during bdi unregistration.  This can
prevent problems that may occur when a bdi is removed without resetting
min_ratio.

For example:
1) insert external sdcard
2) set external sdcard's min_ratio 70
3) remove external sdcard without setting min_ratio 0
4) insert external sdcard
5) set external sdcard's min_ratio 70 << error occurs (can't set)

Because when an sdcard is removed, the present bdi_min_ratio value will
remain.  Currently, the only way to reset bdi_min_ratio is to reboot.

[akpm@linux-foundation.org: tweak comment and coding style]

Link: https://lkml.kernel.org/r/20211021161942.5983-1-mj0123.lee@samsung.com
Signed-off-by: Manjong Lee <mj0123.lee@samsung.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Changheun Lee <nanich.lee@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <seunghwan.hyun@samsung.com>
Cc: <sookwan7.kim@samsung.com>
Cc: <yt0928.kim@samsung.com>
Cc: <junho89.kim@samsung.com>
Cc: <jisoo2146.oh@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
  mm/backing-dev.c
[yyl: adjust context]
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>

ed5d42e8
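
The sdcard scenario above can be replayed with a toy Python model of the
global min_ratio sum. The class, its method names and the dictionary used
for a bdi are illustrative assumptions; the "sum must stay below 100" rule
is modelled on the kernel's check, not copied from it.

```python
class BdiAccounting:
    """Toy model of the global sum of per-device min_ratio values."""

    def __init__(self, reset_on_unregister):
        self.total_min_ratio = 0
        self.reset_on_unregister = reset_on_unregister

    def register(self):
        # A newly inserted device starts with min_ratio 0.
        return {"min_ratio": 0}

    def set_min_ratio(self, bdi, ratio):
        delta = ratio - bdi["min_ratio"]
        if self.total_min_ratio + delta >= 100:
            return False  # rejected: the global sum would reach 100
        self.total_min_ratio += delta
        bdi["min_ratio"] = ratio
        return True

    def unregister(self, bdi):
        if self.reset_on_unregister:  # the behaviour added by the fix
            self.total_min_ratio -= bdi["min_ratio"]
            bdi["min_ratio"] = 0


def replay(reset_on_unregister):
    """Steps 1) to 5) from the commit message; returns the result of step 5."""
    acct = BdiAccounting(reset_on_unregister)
    card = acct.register()
    acct.set_min_ratio(card, 70)
    acct.unregister(card)
    card = acct.register()
    return acct.set_min_ratio(card, 70)
```

Without the reset, the stale 70 stays in the global sum and the second
request for 70 is rejected; with the reset it succeeds.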

26 Jan, 2022 3 commits

etmem: Add a scan flag to support specified page swap-out · 353db299

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------

etmem is the memory vertical expansion technology.

By default, the existing memory expansion tool etmem swaps out all pages
of the process that can be swapped out, unless a page is marked with the
lock flag.

This patch adds the ability to swap out specified pages only. The process
sets the VM_SWAPFLAG flag on the pages to be swapped out, and etmem adds
a filter to the scanning module so that only these pages are swapped out.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

353db299
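
A rough Python sketch of the scan filter. The bit value of VM_SWAPFLAG, the
(vm_flags, pages) tuples and the function name are hypothetical; the sketch
only models "swap out everything" versus "swap out flagged areas only".

```python
VM_SWAPFLAG = 0x1  # hypothetical bit value, for illustration only


def pages_to_swap(vmas, only_flagged):
    """Select candidate pages from a list of (vm_flags, [page ids]) tuples."""
    selected = []
    for vm_flags, pages in vmas:
        # With the filter on, skip areas not marked for swap-out.
        if only_flagged and not (vm_flags & VM_SWAPFLAG):
            continue
        selected.extend(pages)
    return selected
```

With the filter off the function returns every page, which corresponds to
the default behaviour before this change.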

etmem: add swapcache reclaim to etmem · d2869c60

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------

etmem is the memory vertical expansion technology.

In the current etmem process, memory page swapping is implemented by
invoking shrink_page_list. When this interface is invoked for the first
time, pages are added to the swap cache and written to disk. The swap
cache page is reclaimed only when this interface is invoked for the
second time and no process accesses the page. However, in the etmem
process, user mode scans pages that have been accessed, and migration is
not issued for pages that are not accessed by processes. Therefore, the
swap cache may stay occupied indefinitely.

To solve this problem, add logic for actively reclaiming the swap cache.
When the swap cache occupies a large amount of memory, the system
proactively scans the LRU list and reclaims swap cache pages to keep
memory usage within the specified range.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d2869c60

etmem: add original kernel swap enabled options · 44983705

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------

etmem, the memory vertical expansion technology, uses DRAM and new
high-performance storage media to form multi-level memory storage.
By grading the stored data, etmem migrates the data classified as cold
from the storage medium to the high-performance storage medium, so as
to expand memory capacity and reduce memory cost.

When etmem is running, the kernel's native swap function needs to be
disabled in certain scenarios to avoid interference from kernel swap.

This feature provides that capability.

The /sys/kernel/mm/swap/ directory provides the kernel_swap_enable
sys interface to enable or disable the native swap function
of the kernel.

The default value of /sys/kernel/mm/swap/kernel_swap_enable is true,
that is, kernel swap is enabled by default.

Turn on kernel swap:
	echo true > /sys/kernel/mm/swap/kernel_swap_enable

Turn off kernel swap:
	echo false > /sys/kernel/mm/swap/kernel_swap_enable

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

44983705

17 Jan, 2022 4 commits

hugetlbfs: fix issue of preallocation of gigantic pages can't work · d168c42d

Committed by Zhenguo Yao on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc5
commit 4178158e
category: bugfix
bugzilla: 186043
CVE: NA

--------------------------------

Preallocation of gigantic pages can't work because of commit
b5389086 ("hugetlbfs: extend the definition of hugepages parameter
to support node allocation").  When nid is NUMA_NO_NODE(-1),
alloc_bootmem_huge_page will always return without doing allocation.
Fix this by adding an extra check.

Link: https://lkml.kernel.org/r/20211129133803.15653-1-yaozhenguo1@gmail.com
Fixes: b5389086 ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d168c42d

hugetlbfs: extend the definition of hugepages parameter to support node allocation · b0750f70

Committed by Zhenguo Yao on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc1
commit b5389086
category: feature
bugzilla: 186043
CVE: NA

--------------------------------

We can specify the number of hugepages to allocate at boot.  But the
hugepages are currently balanced across all nodes.  In some scenarios, we
only need hugepages on one node.  For example, DPDK needs hugepages
that are on the same node as the NIC.

If DPDK needs four 1G hugepages on node1 and the system has 16 NUMA
nodes, we must reserve 64 hugepages on the kernel cmdline.  But only four
hugepages are used.  The others should be freed after boot.  If the
system memory is low (for example, 64G), this is impossible.

So extend the hugepages parameter to support specifying hugepages on a
specific node.  For example, add the following parameter:

  hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage in node0 and 3 hugepages in node1.
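
The per-node syntax and the arithmetic behind the 64-page reservation can be
sketched in a few lines of Python. This is only an illustration of the
"node:count" format shown above, not the kernel's parser; the function names
parse_per_node_hugepages and balanced_total_needed are hypothetical.

```python
def parse_per_node_hugepages(value):
    """Parse a per-node value such as "0:1,1:3" into {node_id: page_count}.

    Hypothetical helper: it only mirrors the syntax shown in the commit
    message and does not reproduce the kernel's validation rules.
    """
    per_node = {}
    for item in value.split(","):
        node, count = item.split(":")
        per_node[int(node)] = int(count)
    return per_node


def balanced_total_needed(pages_wanted_on_one_node, num_nodes):
    """Pages to reserve when the total is spread evenly over all nodes."""
    return pages_wanted_on_one_node * num_nodes


if __name__ == "__main__":
    # hugepagesz=1G hugepages=0:1,1:3
    print(parse_per_node_hugepages("0:1,1:3"))
    # Four pages wanted on one node of a 16-node system, balanced allocation.
    print(balanced_total_needed(4, 16))
```

With the per-node form, only the pages that are actually needed are reserved,
instead of multiplying the request by the number of nodes.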

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	Documentation/admin-guide/kernel-parameters.txt
	Documentation/admin-guide/mm/hugetlbpage.rst
	arch/powerpc/mm/hugetlbpage.c
	include/linux/hugetlb.h
	mm/hugetlb.c
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b0750f70

mm: remove sharepool sp_unshare_uva current->mm NULL check · 4ec99782

Committed by Guo Mengqi on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ODJ6
CVE: NA

---------------------------

Remove the unnecessary current->mm NULL check in sp_unshare_uva, and
allow a process to unshare kernel-mapped addresses in do_exit().
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4ec99782

share pool: use rwsem to protect sp group exit · 3aa4f0a7

Committed by Guo Mengqi on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ODMN
CVE: NA

-------------------------------------------------
Fix the following situation:

The last process in a group exits while a second process tries to join
this group.

The second process may get an invalid spg. However, the group's
use_count is still increased by 1, so the first process fails to
free the group when it exits. The second process then calls
sp_group_drop --> free_sp_group, which causes a double request of the rwsem.
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3aa4f0a7

31 Dec, 2021 1 commit

mm: export collect_procs() · bb784b81

Committed by Zhang Jian on Dec 31, 2021

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OXH9
CVE: NA

-------------------------------------------------

Collect the processes that have the page mapped via collect_procs().

@page: if the page is part of a hugepage/compound page, we must
use compound_head() to find its head page to prevent a kernel panic,
and the page must be locked.

@to_kill: the function returns a linked list; once we have used
this list, we must kfree the list.

@force_early: to find all processes, this must be true; if
it is false, the function only returns the processes that have the
PF_MCE_PROCESS or PF_MCE_EARLY mark.

Limits: if force_early is true, sysctl_memory_failure_early_kill has no
effect. If it is false, no process has the PF_MCE_PROCESS and PF_MCE_EARLY
flags, and sysctl_memory_failure_early_kill is enabled, the function returns
all tasks regardless of whether they have the PF_MCE_PROCESS and
PF_MCE_EARLY flags.
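
The interaction between force_early and sysctl_memory_failure_early_kill
described above can be modelled roughly as follows. This is a simplified
Python sketch based only on this description, not the kernel code: the Task
class, the single "marked" field (standing in for the PF_MCE_PROCESS /
PF_MCE_EARLY marking, whose exact combination is not modelled) and the
function select_tasks are all hypothetical. The case where some tasks are
marked and the sysctl is also enabled is not covered by the text; the sketch
assumes every task is returned then.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    marked: bool = False  # stand-in for the PF_MCE_PROCESS / PF_MCE_EARLY mark


def select_tasks(tasks, force_early, sysctl_early_kill):
    """Return the tasks that would be collected under the rules above."""
    if force_early:
        # force_early wins: the sysctl value is irrelevant.
        return list(tasks)
    # Otherwise a task is collected if it is marked, or if the sysctl
    # asks for early kill for every task (assumption for the mixed case).
    return [t for t in tasks if t.marked or sysctl_early_kill]


if __name__ == "__main__":
    tasks = [Task("a", marked=True), Task("b")]
    for force, sysctl in [(True, False), (False, False), (False, True)]:
        names = [t.name for t in select_tasks(tasks, force, sysctl)]
        print(force, sysctl, names)
```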
Signed-off-by: Zhang Jian <zhangjian210@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bb784b81

30 Dec, 2021 2 commits

mm/page_alloc: Use cmdline to disable "place pages to tail" · baeaf1da

Committed by Peng Liu on Dec 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EG0R
CVE: NA

-----------------------------------------------

Add a cmdline option to disable "place pages to tail" when onlining memory.
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

baeaf1da

uaccess: Add strict non-pagefault kernel-space read function · 22058d2b

Committed by Daniel Borkmann on Dec 30, 2021

mainline inclusion
from mainline-v5.5-rc1
commit 75a1a607
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI6R
CVE: NA

--------------------------------

Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict()
helpers which by default alias to the __probe_kernel_read() and the
__strncpy_from_unsafe(), respectively, but can be overridden by archs
which have non-overlapping address ranges for kernel space and user
space in order to bail out with -EFAULT when attempting to probe user
memory including non-canonical user access addresses [0]:

  4-level page tables:
    user-space mem: 0x0000000000000000 - 0x00007fffffffffff
    non-canonical:  0x0000800000000000 - 0xffff7fffffffffff

  5-level page tables:
    user-space mem: 0x0000000000000000 - 0x00ffffffffffffff
    non-canonical:  0x0100000000000000 - 0xfeffffffffffffff

The idea is that these helpers are complementary to the probe_user_read()
and strncpy_from_unsafe_user() which probe user-only memory. Both added
helpers here do the same, but for kernel-only addresses.
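
The address split listed above can be illustrated with a small Python model.
It is a sketch only: the range boundaries come from the table in this message,
the assumption that everything above the non-canonical hole is kernel space is
the usual x86-64 layout, and the names classify_address and
strict_kernel_read_allowed are hypothetical (they are not kernel functions).

```python
# Upper bounds of the user and non-canonical ranges, per page-table depth.
RANGES = {
    4: {"user_end": 0x00007FFFFFFFFFFF, "noncanon_end": 0xFFFF7FFFFFFFFFFF},
    5: {"user_end": 0x00FFFFFFFFFFFFFF, "noncanon_end": 0xFEFFFFFFFFFFFFFF},
}


def classify_address(addr, levels=4):
    """Classify a 64-bit address as "user", "non-canonical" or "kernel"."""
    if not 0 <= addr <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("address does not fit in 64 bits")
    bounds = RANGES[levels]
    if addr <= bounds["user_end"]:
        return "user"
    if addr <= bounds["noncanon_end"]:
        return "non-canonical"
    return "kernel"  # assumption: the upper half is kernel space


def strict_kernel_read_allowed(addr, levels=4):
    """Model of the strict variant: only kernel addresses may be probed.

    User and non-canonical addresses correspond to the -EFAULT case.
    """
    return classify_address(addr, levels) == "kernel"


if __name__ == "__main__":
    for addr in (0x00007FFFFFFFFFFF, 0x0000800000000000, 0xFFFF800000000000):
        print(hex(addr), classify_address(addr), strict_kernel_read_allowed(addr))
```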

Both sets of helpers are going to be used for BPF tracing. They also
explicitly avoid throwing the splat for non-canonical user addresses from
00c42373 ("x86-64: add warning for non-canonical user access address
dereferences").

For compat, the current probe_kernel_read() and strncpy_from_unsafe() are
left as-is.

  [0] Documentation/x86/x86_64/mm.txt
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: x86@kernel.org
Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

22058d2b

28 Dec, 2021 1 commit

mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag · ea5974f9

Committed by Rustam Kovhaev on Dec 27, 2021

stable inclusion
from linux-4.19.218
commit c52cef42f912ad74cea9e643edef9aec952b23cf

--------------------------------

commit 34dbc3aa upstream.

When kmemleak is enabled for SLOB, system does not boot and does not
print anything to the console.  At the very early stage in the boot
process we hit infinite recursion from kmemleak_init() and eventually
kernel crashes.

kmemleak_init() specifies SLAB_NOLEAKTRACE for KMEM_CACHE(), but
kmem_cache_create_usercopy() removes it because CACHE_CREATE_MASK is not
valid for SLOB.

Let's fix CACHE_CREATE_MASK and make kmemleak work with SLOB
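
The masking effect described above can be shown with a toy Python model. The
bit values and the OTHER_ALLOWED_FLAG name are made up for illustration and
are not the kernel's constants; filter_cache_flags is a hypothetical stand-in
for the flag filtering done at cache creation.

```python
# Illustrative bit values only, not the real kernel definitions.
OTHER_ALLOWED_FLAG = 0x1  # hypothetical flag already present in the mask
SLAB_NOLEAKTRACE = 0x2    # flag requested by kmemleak_init()


def filter_cache_flags(requested, cache_create_mask):
    """Keep only the requested flags that the mask allows."""
    return requested & cache_create_mask


mask_before_fix = OTHER_ALLOWED_FLAG
mask_after_fix = OTHER_ALLOWED_FLAG | SLAB_NOLEAKTRACE

if __name__ == "__main__":
    # Before the fix the flag is silently dropped, so the cache is traced.
    print(bool(filter_cache_flags(SLAB_NOLEAKTRACE, mask_before_fix) & SLAB_NOLEAKTRACE))
    # After the fix the flag survives the filtering.
    print(bool(filter_cache_flags(SLAB_NOLEAKTRACE, mask_after_fix) & SLAB_NOLEAKTRACE))
```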

Link: https://lkml.kernel.org/r/20211115020850.3154366-1-rkovhaev@gmail.com
Fixes: d8843922 ("slab: Ignore internal flags in cache creation")
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Glauber Costa <glommer@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ea5974f9

openeuler / Kernel synced successfully about 2 years ago