Commits · 32be46d7e9d4a83468b5de1d40b3e6540fc628cd · openeuler / Kernel

11 Nov, 2022 16 commits

shmem: Count and show reliable shmem info · 32be46d7

Committed by Zhou Guanghui on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Count reliable shmem usage based on NR_SHMEM.
Add ReliableShmem in /proc/meminfo to show reliable memory info
used by shmem.

- ReliableShmem: reliable memory used by shmem
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

32be46d7

mm: Introduce fallback mechanism for memory reliable · 5f0b48de

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Introduce a fallback mechanism for memory reliable. Memory allocation will
fall back to the non-mirrored region if the zone's low watermark is reached,
and kswapd will be woken up at that time.

This mechanism is enabled by default and can be disabled by adding
"reliable_debug=F" to the kernel parameters. This mechanism relies on
CONFIG_MEMORY_RELIABLE and needs "kernelcore=reliable" in the kernel
parameters.
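
The fallback decision can be sketched in user space roughly as follows. This is only an illustration, not the kernel code: `struct zone_sketch`, `enum region` and `pick_region()` are hypothetical names, and the sketch assumes "reached" means free pages at or below the low watermark.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a zone; not the kernel's struct zone. */
struct zone_sketch {
    long free_pages;     /* pages currently free in the mirrored zone */
    long low_watermark;  /* low watermark of that zone */
};

enum region { REGION_MIRRORED, REGION_NON_MIRRORED };

/*
 * Pick the region for a reliable allocation. When fallback is enabled and
 * the mirrored zone is at or below its low watermark, fall back to the
 * non-mirrored region and ask the caller to wake kswapd.
 */
static enum region pick_region(const struct zone_sketch *mirrored,
                               bool fallback_enabled, bool *wake_kswapd)
{
    *wake_kswapd = false;

    if (!fallback_enabled || mirrored->free_pages > mirrored->low_watermark)
        return REGION_MIRRORED;

    *wake_kswapd = true;
    return REGION_NON_MIRRORED;
}
```

Passing `fallback_enabled = false` models booting with "reliable_debug=F"; the sketch does not try to model what the allocator does in that case beyond staying in the mirrored region.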
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

5f0b48de

mm: Add reliable memory use limit for user tasks · 8968270e

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

There is an upper limit for all memory allocations if the following
conditions are met:
- gfp_zone(gfp & ~GFP_RELIABLE) == ZONE_MOVABLE
- gfp & GFP_RELIABLE is true

Init tasks will allocate memory from the non-mirrored region if their
allocation triggers the limit.

The limit can be set or read via /proc/sys/vm/task_reliable_limit.

The default value of this limit is ULONG_MAX. Users can update this value to
anything between the reliable memory currently used by user tasks and the
total reliable memory size.
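
The two conditions and the limit check can be sketched as below. The flag values, `sketch_gfp_zone()`, `limit_applies()` and `would_exceed_limit()` are hypothetical stand-ins chosen for illustration; the real GFP flags and gfp_zone() are defined by the kernel and look different.

```c
#include <limits.h>
#include <stdbool.h>

/* Illustrative flag and zone values; the real kernel definitions differ. */
#define SKETCH_GFP_MOVABLE  0x1u
#define SKETCH_GFP_RELIABLE 0x2u

enum sketch_zone { SKETCH_ZONE_NORMAL, SKETCH_ZONE_MOVABLE };

/* Simplified stand-in for gfp_zone(): movable requests map to ZONE_MOVABLE. */
static enum sketch_zone sketch_gfp_zone(unsigned int gfp)
{
    return (gfp & SKETCH_GFP_MOVABLE) ? SKETCH_ZONE_MOVABLE
                                      : SKETCH_ZONE_NORMAL;
}

/* The limit applies only when both conditions from the commit message hold. */
static bool limit_applies(unsigned int gfp)
{
    return sketch_gfp_zone(gfp & ~SKETCH_GFP_RELIABLE) == SKETCH_ZONE_MOVABLE &&
           (gfp & SKETCH_GFP_RELIABLE) != 0;
}

/* True if the request would push reliable usage by user tasks past the limit. */
static bool would_exceed_limit(unsigned int gfp, unsigned long used,
                               unsigned long request, unsigned long limit)
{
    if (!limit_applies(gfp))
        return false;
    /* Written this way to avoid overflow when limit is ULONG_MAX. */
    return used > limit || request > limit - used;
}
```

With the default limit of ULONG_MAX the check never trips, which matches the feature being effectively unlimited until a smaller value is written.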
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

8968270e

mm: thp: Add memory reliable support for hugepaged collapse · 62639947

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

When hugepaged collapses pages into a huge page, the huge page uses the same
memory region as the pages being collapsed. Before collapsing, hugepaged
checks whether there are any reliable pages in the area to be collapsed. If
the area contains any reliable pages, hugepaged allocates memory from the
mirrored region. Otherwise it allocates memory from the non-mirrored region.
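
A minimal sketch of that selection rule, assuming a hypothetical per-page flag array `page_is_reliable[]` and a hypothetical helper `choose_collapse_region()` (neither is taken from the patch):

```c
#include <stdbool.h>
#include <stddef.h>

enum collapse_region { COLLAPSE_MIRRORED, COLLAPSE_NON_MIRRORED };

/*
 * page_is_reliable[i] is a hypothetical flag for the i-th small page in the
 * area to be collapsed. One reliable page is enough to pick the mirrored
 * region for the resulting huge page.
 */
static enum collapse_region choose_collapse_region(const bool *page_is_reliable,
                                                   size_t nr_pages)
{
    for (size_t i = 0; i < nr_pages; i++) {
        if (page_is_reliable[i])
            return COLLAPSE_MIRRORED;
    }
    return COLLAPSE_NON_MIRRORED;
}
```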
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

62639947

mm: Add support for limiting the usage of reliable memory in pagecache · 4021a0d5

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add the interface /proc/sys/vm/reliable_pagecache_max_bytes to set the
maximum size of the reliable page cache. The maximum size cannot exceed the
total reliable RAM.

The whole reliable memory feature depends on kernelcore=mirror, which in
turn depends on NUMA, so remove the redundant code for UMA.
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

4021a0d5

mm: add "ReliableFileCache" item in /proc/meminfo · 6e6cf0d7

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add statistics for the usage of the reliable page cache. The item
"ReliableFileCache" in /proc/meminfo shows the usage of the reliable page
cache.
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

6e6cf0d7

proc/meminfo: Add "FileCache" item in /proc/meminfo · ccad5e7a

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

The item "FileCache" in /proc/meminfo shows the amount of page cache
on the LRU (active + inactive).
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

ccad5e7a

mm: Add cmdline for the reliable memory usage of page cache · 33c4a18f

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add cmdline for the reliable memory usage of page cache.
Page cache will not use reliable memory when passing option
"P" to reliable_debug in cmdline.
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

33c4a18f

mm: Add kernel param for memory reliable · be9e2144

Committed by Peng Wu on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add the kernel parameter reliable_debug in preparation for controlling the
memory reliable features.
Signed-off-by: Peng Wu <wupeng58@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

be9e2144

mm: Clear GFP_RELIABLE if the conditions are not met · a1bdc2e2

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Memory reliable only handles memory allocations from the movable zone.
GFP_RELIABLE will be cleared if the conditions are not met.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

a1bdc2e2

mm: Disable memory reliable when kdump is in progress · 7023dd3c

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Kdump has only limited memory, so the memory reliable features would
misbehave if memory reliable were enabled. Therefore, disable memory
reliable when kdump is in progress.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

7023dd3c

mm: Count reliable memory info based on zone info · 4623cc86

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Count reliable memory info based on zone info. Any zone below
ZONE_MOVABLE is seen as a reliable zone, and the pages there are summed.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

4623cc86

mm: Refactor code in reliable_report_meminfo() · 5baa7afd

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Use show_val_kb() to format meminfo.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

5baa7afd

mm: Export mem_reliable_status() for checking memory reliable status · 64f874c8

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Export the mem_reliable_status(), so it can be used by others to check
memory reliable's status.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

64f874c8

mm: Export static key mem_reliable · b3571129

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

The static key mem_reliable is used to check whether memory reliable is
enabled in the kernel's inline functions. These inline functions rely on it,
but drivers cannot use them because the symbol is not exported.
To solve this problem, export the symbol in preparation for drivers to
use memory reliable's inline functions.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

b3571129

mm: Drop shmem reliable related log during startup · 55ac3d06

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Message "shmem reliable disabled." will be printed if memory reliable is
disabled. This is not necessary so drop it.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

55ac3d06

10 Nov, 2022 2 commits

page_alloc: fix invalid watermark check on a negative value · 48bc7241

Committed by Jaewon Kim on Nov 10, 2022

stable inclusion
from stable-v5.10.135
commit 2670f76a563124478d0d14e603b38b73b99c389c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2670f76a563124478d0d14e603b38b73b99c389c

--------------------------------

commit 9282012f upstream.

There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.

This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e1 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can happen when reserved_highatomic pageblock is bigger than the
actual free.

If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.

Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd do not make free pages quickly.

Handle the negative case by using MIN.
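
A simplified user-space sketch of the clamping idea follows. It is not the kernel function: `min_long()` and `watermark_fast_ok()` are hypothetical names, all quantities are plain `long` page counts, and `reserved` stands in for the highatomic reserve that may overestimate what is actually free.

```c
#include <stdbool.h>

/* Smaller of two signed page counts. */
static long min_long(long a, long b)
{
    return a < b ? a : b;
}

/*
 * Simplified order-0 fast watermark check. The reserve is subtracted from
 * the free count, but never more than what is free, so the usable count
 * cannot become negative.
 */
static bool watermark_fast_ok(long free_pages, long reserved, long mark,
                              long lowmem_reserve)
{
    long usable_free = free_pages;

    usable_free -= min_long(usable_free, reserved);

    return usable_free > mark + lowmem_reserve;
}
```

With `free_pages = 10` and `reserved = 50`, the usable count is clamped to 0 and the check fails, so the caller would go on to the slower path instead of treating the watermark as satisfied.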

Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes: f27ce0e1 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

48bc7241

mm/mempolicy: fix uninit-value in mpol_rebind_policy() · 7057a3c7

Committed by Wang Cheng on Nov 10, 2022

stable inclusion
from stable-v5.10.134
commit ddb3f0b68863bd1c5f43177eea476bce316d4993
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ddb3f0b68863bd1c5f43177eea476bce316d4993

--------------------------------

commit 018160ad upstream.

mpol_set_nodemask()(mm/mempolicy.c) does not set up nodemask when
pol->mode is MPOL_LOCAL.  Check pol->mode before accessing
pol->w.cpuset_mems_allowed in mpol_rebind_policy()(mm/mempolicy.c).

BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:352 [inline]
BUG: KMSAN: uninit-value in mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
 mpol_rebind_policy mm/mempolicy.c:352 [inline]
 mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
 cpuset_change_task_nodemask kernel/cgroup/cpuset.c:1711 [inline]
 cpuset_attach+0x787/0x15e0 kernel/cgroup/cpuset.c:2278
 cgroup_migrate_execute+0x1023/0x1d20 kernel/cgroup/cgroup.c:2515
 cgroup_migrate kernel/cgroup/cgroup.c:2771 [inline]
 cgroup_attach_task+0x540/0x8b0 kernel/cgroup/cgroup.c:2804
 __cgroup1_procs_write+0x5cc/0x7a0 kernel/cgroup/cgroup-v1.c:520
 cgroup1_tasks_write+0x94/0xb0 kernel/cgroup/cgroup-v1.c:539
 cgroup_file_write+0x4c2/0x9e0 kernel/cgroup/cgroup.c:3852
 kernfs_fop_write_iter+0x66a/0x9f0 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:2162 [inline]
 new_sync_write fs/read_write.c:503 [inline]
 vfs_write+0x1318/0x2030 fs/read_write.c:590
 ksys_write+0x28b/0x510 fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __x64_sys_write+0xdb/0x120 fs/read_write.c:652
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:524 [inline]
 slab_alloc_node mm/slub.c:3251 [inline]
 slab_alloc mm/slub.c:3259 [inline]
 kmem_cache_alloc+0x902/0x11c0 mm/slub.c:3264
 mpol_new mm/mempolicy.c:293 [inline]
 do_set_mempolicy+0x421/0xb70 mm/mempolicy.c:853
 kernel_set_mempolicy mm/mempolicy.c:1504 [inline]
 __do_sys_set_mempolicy mm/mempolicy.c:1510 [inline]
 __se_sys_set_mempolicy+0x44c/0xb60 mm/mempolicy.c:1507
 __x64_sys_set_mempolicy+0xd8/0x110 mm/mempolicy.c:1507
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

KMSAN: uninit-value in mpol_rebind_task (2)
https://syzkaller.appspot.com/bug?id=d6eb90f952c2a5de9ea718a1b873c55cb13b59dc

This patch seems to fix the bug below too.
KMSAN: uninit-value in mpol_rebind_mm (2)
https://syzkaller.appspot.com/bug?id=f2fecd0d7013f54ec4162f60743a2b28df40926b

The uninit-value is pol->w.cpuset_mems_allowed in mpol_rebind_policy().
When syzkaller reproducer runs to the beginning of mpol_new(),

	    mpol_new() mm/mempolicy.c
	  do_mbind() mm/mempolicy.c
	kernel_mbind() mm/mempolicy.c

`mode` is 1(MPOL_PREFERRED), nodes_empty(*nodes) is `true` and `flags`
is 0. Then

	mode = MPOL_LOCAL;
	...
	policy->mode = mode;
	policy->flags = flags;

will be executed. So in mpol_set_nodemask(),

	    mpol_set_nodemask() mm/mempolicy.c
	  do_mbind()
	kernel_mbind()

pol->mode is 4 (MPOL_LOCAL), so `nodemask` in `pol` is not initialized,
yet it will still be accessed in mpol_rebind_policy().
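
The guard described above can be sketched in user space as follows. `struct sketch_policy` and `sketch_rebind()` are hypothetical simplifications, and the nodemask is reduced to a single `unsigned long`; only the mode values 1 (preferred) and 4 (local) are taken from the text above, the others are filled in for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mode numbering chosen so that PREFERRED is 1 and LOCAL is 4, as quoted above. */
enum sketch_mode {
    SKETCH_MPOL_DEFAULT,
    SKETCH_MPOL_PREFERRED,
    SKETCH_MPOL_BIND,
    SKETCH_MPOL_INTERLEAVE,
    SKETCH_MPOL_LOCAL,
};

struct sketch_policy {
    enum sketch_mode mode;
    unsigned long cpuset_mems_allowed; /* never set up for SKETCH_MPOL_LOCAL */
};

/* Returns true if the policy was rebound, false if the rebind was skipped. */
static bool sketch_rebind(struct sketch_policy *pol, unsigned long new_mask)
{
    /* Check the mode first: a local policy never had its nodemask set up. */
    if (pol == NULL || pol->mode == SKETCH_MPOL_LOCAL)
        return false;

    if (pol->cpuset_mems_allowed == new_mask)
        return false; /* nothing changed */

    pol->cpuset_mems_allowed = new_mask;
    return true;
}
```

The point of the ordering is that the mode test comes before any read of `cpuset_mems_allowed`, so the uninitialized field is never touched for a local policy.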

Link: https://lkml.kernel.org/r/20220512123428.fq3wofedp6oiotd4@ppc.localdomain
Signed-off-by: Wang Cheng <wanngchenng@gmail.com>
Reported-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
Tested-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

7057a3c7

09 Nov, 2022 22 commits

mm/sharepool: fix the incorrect judgement of the addr range · b1e17d35

Committed by Zhou Guanghui on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

--------------------------------

The address range of dvpp is the half-open interval [start, start + size),
so the value start + size itself lies outside the address range.
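
A half-open range check of this kind can be sketched as below. `addr_in_range()` and `region_in_range()` are hypothetical helpers written for illustration, not the functions touched by the patch.

```c
#include <stdbool.h>

/* True if addr lies in the half-open range [start, start + size). */
static bool addr_in_range(unsigned long addr, unsigned long start,
                          unsigned long size)
{
    /* addr - start cannot wrap once addr >= start is known. */
    return addr >= start && addr - start < size;
}

/* True if [addr, addr + len) lies fully inside [start, start + size). */
static bool region_in_range(unsigned long addr, unsigned long len,
                            unsigned long start, unsigned long size)
{
    if (len == 0 || len > size || addr < start)
        return false;
    /* Compare offsets instead of end addresses to avoid overflow. */
    return addr - start <= size - len;
}
```

Note that a region is allowed to end exactly at start + size, while a single address equal to start + size is rejected.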
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

b1e17d35

mm/sharepool: Fix sharepool hugepage cgroup uncount error. · 107e2b7c

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

--------------------------------

If PF_MEMALLOC is set in current->flags, the memory cgroup will not check
current's allocations against the memory usage limit, which can cause the
system to run out of memory.

According to
https://lkml.indiana.edu/hypermail/linux/kernel/0911.2/00576.html,
PF_MEMALLOC shall only be used when more memory is sure to be freed as a
result of the allocation.

Do not use PF_MEMALLOC; instead, remove __GFP_RECLAIM from gfp_mask to
ensure no reclaim.
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

107e2b7c

mm/sharepool: Rebind the numa node when fallback to normal pages · 1343dd93

Committed by Wang Wensheng on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

--------------------------------

When we allocate memory using SP_HUGEPAGE, we try normal pages when
there are not enough hugepages. The specified NUMA node information
gets lost when we fall back to normal pages. As a result, we could
allocate memory from a NUMA node other than the one specified.

The solution is to rebind the node before retrying.
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>

1343dd93

mm/sharepool: Remove the leading double underlines for function name · 95618625

Committed by Zhang Zekun on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

----------------------------------------------

Rename __insert_sp_area to insert_sp_area.
Rename __find_sp_area_locked to find_sp_area_locked.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>

95618625

mm/sharepool: Fix code-style warnings · c3c8461e

Committed by Zhang Zekun on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

-----------------------------------------

1. Remove the inline specifier before sp_mapping_find().
2. Do not declare or define reserved identifiers.
3. Add braces in if, else/else if statements.
4. A pointer asterisk (*) must have a space on at least one side.
5. Use parentheses to specify the order of evaluation in
   sp_remap_kva_to_vma(), sp_node_id(), init_local_group().
6. Besides, rename __find_sp_area() to get_sp_area() to
   indicate that this function need not be called with the lock held
   and to imply that it increases the use_count.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>

c3c8461e

mm/sharepool: fix hugepage_rsvd count increase error · bec70574

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RO2H
CVE: NA

--------------------------------

When nr_hugepages is configured, sharepool allocates hugepages first
from the hugetlb pool, then from the buddy system if the pool has been used
up. The current page release function treats the buddy system hugepages as
hugetlb pages, which causes HugePages_Rsvd to increase improperly.

Add a check in the page release function:
    if the page is temporary, do not call hugetlb_unreserve_pages.
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

bec70574

mm/sharepool: check size=0 in mg_sp_make_share_k2u() · 564272e8

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QQPG
CVE: NA

--------------------------------

Add a size-0-check in mg_sp_make_share_k2u() to avoid passing 0-size spa
to __insert_sp_area().
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

564272e8

mm/sharepool: fix potential AA deadlock · d9fb53bf

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5R0X9
CVE: NA

--------------------------------

Fix an AA deadlock caused by nested locking in mg_sp_group_add_task().

Deadlock path:

mg_sp_group_add_task()

    down_write(sp_group_sem)
    find_or_alloc_sp_group()
	!spg_valid()
	sp_group_drop()
	    free_sp_group() -> down_write(sp_group_sem)
    ---> AA deadlock
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

d9fb53bf

mm/sharepool: delete unused codes · 872ebaa0

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QETC
CVE: NA

--------------------------------

sp_make_share_k2u only supports vmalloc addresses now. Therefore, delete a
backup handling case.

Also, master is guaranteed not to be freed until master->node_list is emptied.
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

872ebaa0

mm/sharepool: bugfix for 2M U2K · e6a23a8d

Committed by Zhou Guanghui on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PZDX
CVE: NA

--------------------------------

We could determine if a userspace map is huge-mapped after walking its
pagetable. So the uva_align should be calculated again after walking
the pagetable if it is huge-mapped.
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

e6a23a8d

mm/sharepool: Support alloc ro mapping · d9687e45

Committed by Chen Jun on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

1. Split the sharepool normal area (8T) into a sharepool readonly area (64G)
   and a sharepool normal area (8T - 64G).
2. User programs cannot write to addresses in the sharepool readonly
   area.
3. Add SP_PROT_FOCUS for sp_alloc.
4. sp_alloc with SP_PROT_RO | SP_PROT_FOCUS returns a virtual address
   within the sharepool readonly area.
5. Other user programs that are added with write prot cannot write to
   addresses in the sharepool readonly area.
Signed-off-by: Chen Jun <chenjun102@huawei.com>

d9687e45

mm/sharepool: Extract sp_mapping_find · 60d69023

Committed by Chen Jun on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

Extract code logic of obtaining sp_mapping by address into a function
sp_mapping_find.
Signed-off-by: Chen Jun <chenjun102@huawei.com>

60d69023

mm/sharepool: replace spg->{dvpp|normal} with spg->mapping[SP_MAPPING_{DVPP|NORMAL}] · 91bc1d52

Committed by Chen Jun on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

spg->dvpp and spg->normal can be combined into one array.
Signed-off-by: Chen Jun <chenjun102@huawei.com>

91bc1d52

mm/sharepool: Rename sp_mapping.flag to sp_mapping.type · ef12ea35

Committed by Chen Jun on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

Now sp_mapping.flag is only used to distinguish sp_mapping types,
so 'type' is a more suitable name.
Signed-off-by: Chen Jun <chenjun102@huawei.com>

ef12ea35

mm/sharepool: Make the definitions of MMAP_SHARE_POOL_{START|16G_START} more readable · 14cd3fb0

Committed by Chen Jun on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

"TASK_SIZE - MMAP_SHARE_POOL_DVPP_SIZE" is puzzling.

MMAP_SHARE_POOL_START = MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_SIZE and
MMAP_SHARE_POOL_16G_START = MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_DVPP_SIZE
make the memory layout more intuitive.
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>

14cd3fb0

mm/sharepool: Avoid UAF on mm · a151f824

Committed by Zhou Guanghui on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA6
CVE: NA

--------------------------------

Use get_task_mm to prevent the mm from being released while the
information in mm_struct is in use.
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

a151f824

mm/sharepool: Check the maximum value of spg_id · 99b7756c

Committed by Zhou Guanghui on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA4
CVE: NA

--------------------------------

The maximum value of spg_id is checked to ensure that spg_id is
within the valid range:
SPG_ID_DEFAULT or [SPG_ID_MIN, SPG_ID_AUTO)
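
The range rule can be sketched as a small predicate. The numeric values below are placeholders picked only so the example compiles; the kernel's actual SPG_ID_* constants may differ, and `spg_id_valid()` is a hypothetical name.

```c
#include <stdbool.h>

/* Placeholder values for illustration; the kernel's constants may differ. */
#define SKETCH_SPG_ID_DEFAULT 0
#define SKETCH_SPG_ID_MIN     1
#define SKETCH_SPG_ID_AUTO    100000

/* Valid ids are SPG_ID_DEFAULT or anything in [SPG_ID_MIN, SPG_ID_AUTO). */
static bool spg_id_valid(int spg_id)
{
    return spg_id == SKETCH_SPG_ID_DEFAULT ||
           (spg_id >= SKETCH_SPG_ID_MIN && spg_id < SKETCH_SPG_ID_AUTO);
}
```

The upper bound is exclusive, so `SKETCH_SPG_ID_AUTO` itself is rejected while `SKETCH_SPG_ID_AUTO - 1` is accepted.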
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

99b7756c

mm/sharepool: Avoid UAF on spa · 27d0e771

Committed by Zhou Guanghui on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA0
CVE: NA

--------------------------------

The spa is used during update_mem_usage. Under concurrency
(mg_sp_unshare), the spa may already have been released at that point.
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

27d0e771

mm/sharepool: delete unnecessary judgment · 142bfed2

Committed by Zhou Guanghui on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA2
CVE: NA

--------------------------------

When a process is added to a group, mm->mm_users increases by one.
When a process is deleted from a group, mm->mm_users decreases by
one. The count cannot drop to 0 because this function is
preceded by get_task_mm.
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

142bfed2

mm/sharepool: Fix UAF reported by KASAN · 19896d2c

Committed by Wang Wensheng on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PD4P
CVE: NA

--------------------------------

[ 2058.802818][  T290] BUG: KASAN: use-after-free in get_process_sp_res+0x70/0x134
[ 2058.810194][  T290] Read of size 8 at addr ffff00088dc6ab28 by task test_debug_loop/290
[ 2058.820520][  T290] CPU: 5 PID: 290 Comm: test_debug_loop Tainted: G        W  OE     5.10.0+ #2
[ 2058.829377][  T290] Hardware name: EVB(EP) (DT)
[ 2058.833982][  T290] Call trace:
[ 2058.837217][  T290]  dump_backtrace+0x0/0x30c
[ 2058.841660][  T290]  show_stack+0x20/0x30
[ 2058.845758][  T290]  dump_stack+0x120/0x1b0
[ 2058.850028][  T290]  print_address_description.constprop.0+0x2c/0x1fc
[ 2058.856555][  T290]  __kasan_report+0xfc/0x160
[ 2058.861086][  T290]  kasan_report+0x44/0xb0
[ 2058.865356][  T290]  __asan_load8+0x94/0xd0
[ 2058.869623][  T290]  get_process_sp_res+0x70/0x134
[ 2058.874501][  T290]  proc_usage_show+0x1ac/0x304
[ 2058.879208][  T290]  seq_read_iter+0x254/0x750
[ 2058.883728][  T290]  proc_reg_read_iter+0x100/0x140
[ 2058.888689][  T290]  new_sync_read+0x1cc/0x2c0
[ 2058.893215][  T290]  vfs_read+0x1f4/0x250
[ 2058.897304][  T290]  ksys_read+0xcc/0x170
[ 2058.901399][  T290]  __arm64_sys_read+0x4c/0x60
[ 2058.906016][  T290]  el0_svc_common.constprop.0+0xb4/0x2a0
[ 2058.911584][  T290]  do_el0_svc+0x8c/0xb0
[ 2058.915677][  T290]  el0_svc+0x20/0x30
[ 2058.919503][  T290]  el0_sync_handler+0xb0/0xbc
[ 2058.924114][  T290]  el0_sync+0x180/0x1c0
[ 2058.928190][  T290]
[ 2058.930444][  T290] Allocated by task 2176:
[ 2058.934714][  T290]  kasan_save_stack+0x28/0x60
[ 2058.939328][  T290]  __kasan_kmalloc.constprop.0+0xc8/0xf0
[ 2058.944909][  T290]  kasan_kmalloc+0x10/0x20
[ 2058.949268][  T290]  kmem_cache_alloc_trace+0x128/0xabc
[ 2058.954577][  T290]  create_spg_node+0x58/0x214
[ 2058.959188][  T290]  local_group_add_task+0x30/0x14c
[ 2058.964231][  T290]  init_local_group+0xd0/0x1a0
[ 2058.968936][  T290]  sp_init_group_master_locked.part.0+0x19c/0x290
[ 2058.975298][  T290]  mg_sp_group_add_task+0x73c/0xdb0
[ 2058.980456][  T290]  dev_sp_add_group+0x124/0x2dc [sharepool_dev]
[ 2058.986647][  T290]  dev_ioctl+0x21c/0x2ec [sharepool_dev]
[ 2058.992222][  T290]  __arm64_sys_ioctl+0xd8/0x120
[ 2058.997010][  T290]  el0_svc_common.constprop.0+0xb4/0x2a0
[ 2059.002572][  T290]  do_el0_svc+0x8c/0xb0
[ 2059.006662][  T290]  el0_svc+0x20/0x30
[ 2059.010489][  T290]  el0_sync_handler+0xb0/0xbc
[ 2059.015101][  T290]  el0_sync+0x180/0x1c0
[ 2059.019176][  T290]
[ 2059.021427][  T290] Freed by task 4125:
[ 2059.025343][  T290]  kasan_save_stack+0x28/0x60
[ 2059.029949][  T290]  kasan_set_track+0x28/0x40
[ 2059.034476][  T290]  kasan_set_free_info+0x24/0x50
[ 2059.039347][  T290]  __kasan_slab_free+0x104/0x1ac
[ 2059.044227][  T290]  kasan_slab_free+0x14/0x20
[ 2059.048744][  T290]  kfree+0x164/0xb94
[ 2059.052576][  T290]  sp_group_post_exit+0xf0/0x980
[ 2059.057448][  T290]  mmput.part.0+0xb4/0x220
[ 2059.061790][  T290]  mmput+0x2c/0x40
[ 2059.065450][  T290]  exit_mm+0x27c/0x3a0
[ 2059.069450][  T290]  do_exit+0x2a0/0x790
[ 2059.073448][  T290]  do_group_exit+0x64/0x100
[ 2059.077884][  T290]  get_signal+0x1fc/0x9fc
[ 2059.082144][  T290]  do_signal+0x110/0x2cc
[ 2059.086320][  T290]  do_notify_resume+0x158/0x2b0
[ 2059.091108][  T290]  work_pending+0xc/0x6d4
[ 2059.095358][  T290]
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>

19896d2c

mm/sharepool: fix deadlock in sp_check_mmap_addr · 78c82ea5

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OE1J
CVE: NA

--------------------------------

Fix a deadlock indicated below:

[  171.669844] Chain exists of:
[  171.669844]   &mm->mmap_lock --> sp_group_sem --> &spg->rw_lock
[  171.669844]
[  171.671469]  Possible unsafe locking scenario:
[  171.671469]
[  171.672121]        CPU0                    CPU1
[  171.672415]        ----                    ----
[  171.672706]   lock(&spg->rw_lock);
[  171.673114]                                lock(sp_group_sem);
[  171.673706]                                lock(&spg->rw_lock);
[  171.674208]   lock(&mm->mmap_lock);
[  171.674863]
[  171.674863]  *** DEADLOCK ***

sharepool takes locks in this order:
sp_group_sem --> &spg->rw_lock --> mm->mmap_lock
However, sp_check_mmap_addr() requests sp_group_sem while mm->mmap_lock is
held, which gives the order mm->mmap_lock --> sp_group_sem.
This causes an ABBA problem.
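
One way to picture the ordering rule is to give each lock a rank and only allow acquisitions in increasing rank. The sketch below is purely illustrative: `enum lock_rank` and `acquire_respects_order()` are hypothetical and do not exist in sharepool or lockdep.

```c
#include <stdbool.h>

/* Ranks follow the order stated above: a lower rank must be taken first. */
enum lock_rank {
    RANK_SP_GROUP_SEM = 1,
    RANK_SPG_RW_LOCK  = 2,
    RANK_MMAP_LOCK    = 3,
};

/*
 * Returns true if acquiring "next" while the highest-ranked held lock is
 * "held_max" respects the order. held_max == 0 means no lock is held.
 */
static bool acquire_respects_order(int held_max, enum lock_rank next)
{
    return (int)next > held_max;
}
```

In these terms, the problematic path asks for `RANK_SP_GROUP_SEM` while `RANK_MMAP_LOCK` is held, which the predicate rejects.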

This happens in:

[  171.642687] the existing dependency chain (in reverse order) is:
[  171.643745]
[  171.643745] -> #2 (&spg->rw_lock){++++}-{3:3}:
[  171.644639]        __lock_acquire+0x6f4/0xc40
[  171.645189]        lock_acquire+0x2f0/0x3c8
[  171.645631]        down_read+0x64/0x2d8
[  171.646075]        proc_usage_by_group+0x50/0x258 (spg->rw_lock)
[  171.646542]        idr_for_each+0x6c/0xf0
[  171.647011]        proc_group_usage_show+0x140/0x178
[  171.647629]        seq_read_iter+0xe4/0x498
[  171.648217]        proc_reg_read_iter+0xa8/0xe0
[  171.648776]        new_sync_read+0xfc/0x1a0
[  171.649002]        vfs_read+0x1ac/0x1c8
[  171.649217]        ksys_read+0x74/0xf8
[  171.649596]        __arm64_sys_read+0x24/0x30
[  171.649934]        el0_svc_common.constprop.0+0x8c/0x270
[  171.650528]        do_el0_svc+0x34/0xb8
[  171.651069]        el0_svc+0x1c/0x28
[  171.651278]        el0_sync_handler+0x8c/0xb0
[  171.651636]        el0_sync+0x168/0x180
[  171.652118]
[  171.652118] -> #1 (sp_group_sem){++++}-{3:3}:
[  171.652692]        __lock_acquire+0x6f4/0xc40
[  171.653059]        lock_acquire+0x2f0/0x3c8
[  171.653303]        down_read+0x64/0x2d8
[  171.653704]        mg_is_sharepool_addr+0x184/0x340 (&sp_group_sem)
[  171.654085]        sp_check_mmap_addr+0x64/0x108
[  171.654668]        arch_get_unmapped_area_topdown+0x9c/0x528
[  171.655370]        thp_get_unmapped_area+0x54/0x68
[  171.656170]        get_unmapped_area+0x94/0x160
[  171.656415]        __do_mmap_mm+0xd4/0x540
[  171.656629]        do_mmap+0x98/0x648
[  171.656838]        vm_mmap_pgoff+0xc0/0x188
[  171.657129]        vm_mmap+0x6c/0x98
[  171.657619]        elf_map+0xe0/0x118
[  171.657835]        load_elf_binary+0x4ec/0xfd8
[  171.658103]        bprm_execve.part.9+0x3ec/0x840
[  171.658448]        bprm_execve+0x7c/0xb0
[  171.658919]        kernel_execve+0x18c/0x198
[  171.659500]        run_init_process+0xf0/0x108
[  171.660073]        try_to_run_init_process+0x20/0x58
[  171.660558]        kernel_init+0xcc/0x120
[  171.660862]        ret_from_fork+0x10/0x18
[  171.661273]
[  171.661273] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[  171.661885]        check_prev_add+0xa4/0xbd8
[  171.662229]        validate_chain+0xf54/0x14b8
[  171.662705]        __lock_acquire+0x6f4/0xc40
[  171.663310]        lock_acquire+0x2f0/0x3c8
[  171.663658]        down_write+0x60/0x208
[  171.664179]        mg_sp_alloc+0x24c/0x1150 (mm->mmap_lock)
[  171.665245]        dev_ioctl+0x1128/0x1fb8 [sharepool_dev]
[  171.665688]        __arm64_sys_ioctl+0xb0/0xe8
[  171.666250]        el0_svc_common.constprop.0+0x8c/0x270
[  171.667255]        do_el0_svc+0x34/0xb8
[  171.667806]        el0_svc+0x1c/0x28
[  171.668249]        el0_sync_handler+0x8c/0xb0
[  171.668661]        el0_sync+0x168/0x180
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

78c82ea5

mm/sharepool: fix deadlock in spa_stat_of_mapping_show · 608669b7

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OE1J
CVE: NA

--------------------------------

The mutex protecting spm_dvpp_list has an ABBA deadlock with
spg->rw_lock. Adding a process to a sharepool group while reading
/proc/sharepool/spa_stat with cat reproduces the problem.

Remove spg->rw_lock to avoid this.

[ 1101.013480]INFO: task test:3567 blocked for more than 30 seconds.
[ 1101.014378]      Tainted: G           OE     5.10.0+ #45
[ 1101.015707]task:test state:D stack:    0 pid: 3567
[ 1101.016464]Call trace:
[ 1101.016736] __switch_to+0xc0/0x128
[ 1101.017082] __schedule+0x3fc/0x898
[ 1101.017626] schedule+0x48/0xd8
[ 1101.017981] schedule_preempt_disabled+0x14/0x20
[ 1101.018519] __mutex_lock.isra.1+0x160/0x638
[ 1101.018899] __mutex_lock_slowpath+0x24/0x30
[ 1101.019291] mutex_lock+0x5c/0x68
[ 1101.019607] sp_mapping_create+0x118/0x1b0
[ 1101.019963] sp_init_group_master_locked.part.9+0x10c/0x288
[ 1101.020356] mg_sp_group_add_task.part.16+0x7dc/0xcd0
[ 1101.020750] mg_sp_group_add_task+0x54/0xd0
[ 1101.021120] dev_ioctl+0x360/0x1e20 [sharepool_dev]
[ 1101.022171] __arm64_sys_ioctl+0xb0/0xe8
[ 1101.022695] el0_svc_common.constprop.0+0x88/0x268
[ 1101.023143] do_el0_svc+0x34/0xb8
[ 1101.023487] el0_svc+0x1c/0x28
[ 1101.023775] el0_sync_handler+0x8c/0xb0
[ 1101.024120] el0_sync+0x168/0x180
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

608669b7
