Commits · 7086bdbae575d1be401f9d72767b88cf555ae014 · openeuler / Kernel

15 Nov, 2022 · 4 commits

mm/sharepool: Fix add group failed with errno 28 · 7086bdba

Committed by Xu Qiang on Nov 14, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
CVE: NA

--------------------------------

We increase task->mm->mm_users by one when we add the task to a
sharepool group. Correspondingly we should drop the mm_users count when
the task exits. Currently we hijack the mmput function: if it is not
called from a task's exit path, it returns early after decreasing
mm->mm_users by one (just as mmput would do); otherwise it decreases
mm->mm_users by the number of groups the task was added to. This has two
problems:
1. It makes mmput and sp_group_exit hard to understand.
2. Judging whether the task (and its mm) is exiting and decreasing its
   mm_users count are two separate steps, not one atomic operation. We
   use this condition:
     mm->mm_users == master->count + MM_WOULD_FREE(1)
   If someone else changes mm->mm_users between those two steps,
   mm->mm_users ends up wrong and the mm_struct can never be released.

Suppose the following process:

        proc1                                        proc2

1)      mmput
          |
          V
2)  enter sp_group_exit and
    'mm->mm_users == master->count + 1' is true
3)        |                                         mmget
          V
4)  decrease mm->mm_users by master->count
          |
          V
5)  enter __mmput and release mm_struct
    if mm->mm_users == 1
6)                                                  mmput
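
The interleaving above can be replayed in a small userspace model. This is
only a sketch: `struct mm_model`, `replay_exit()` and the other names are
made up for illustration and are not the kernel code. It covers steps 2 to 4
and returns the mm_users value that step 5 would see.

```c
#include <stdbool.h>

/* Toy stand-ins for mm->mm_users and master->count; not the kernel types. */
struct mm_model {
	int mm_users;
	int group_count;
};

/* Step 2: the exit test quoted in the commit message. */
static bool looks_like_exit(const struct mm_model *mm)
{
	return mm->mm_users == mm->group_count + 1;
}

/* Step 4: drop the references taken when the task joined its groups. */
static void drop_group_refs(struct mm_model *mm)
{
	mm->mm_users -= mm->group_count;
}

/*
 * Replays steps 2-4. When racing_mmget is true, proc2's mmget (step 3)
 * lands between the test and the subtraction. Returns the mm_users value
 * that step 5 would observe.
 */
static int replay_exit(int group_count, bool racing_mmget)
{
	struct mm_model mm = {
		.mm_users = group_count + 1,
		.group_count = group_count,
	};

	if (!looks_like_exit(&mm))
		return mm.mm_users;
	if (racing_mmget)
		mm.mm_users++;		/* step 3: proc2 calls mmget */
	drop_group_refs(&mm);		/* step 4 */
	return mm.mm_users;
}
```

Without the racing mmget, `replay_exit(2, false)` yields 1, the value step 5
expects. With it, `replay_exit(2, true)` yields 2, so the decision taken at
step 2 is already stale when step 5 runs.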

The statistical structure that has the same id as the task is leaked
together with the mm_struct, so the next time we try to create a statistical
structure with the same id, we get a failure.

We fix this by moving sp_group_exit to do_exit(), where the task is actually
exiting. We no longer need to judge whether the task is exiting when someone
calls mmput, so there is no chance of changing mm_users wrongly.

Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>

mm: sharepool: Fix static check warning · fab907d0

Committed by Zhang Zekun on Nov 14, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
CVE: NA

--------------------------------

Fix the following static check warning:
Use parentheses to specify the sequence of expressions, instead of using
the default priority. Parentheses should be used with bitwise operators.

Fix this by adding parentheses to the expression.
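
As an illustration of why the checker asks for this (the expression actually
touched by the patch is not shown here, so the functions below are
hypothetical): in C, `==` binds tighter than `&`, so `v & mask == want` is
parsed as `v & (mask == want)`.

```c
#include <stdbool.h>

/* What C does with `v & mask == want` under the default precedence. */
static bool parsed_by_default(unsigned int v, unsigned int mask,
			      unsigned int want)
{
	return (v & (unsigned int)(mask == want)) != 0;
}

/* What the author usually means: mask first, then compare. */
static bool intended(unsigned int v, unsigned int mask, unsigned int want)
{
	return (v & mask) == want;
}
```

For v = 0x6, mask = 0x6, want = 0x6 the two functions disagree: the default
parse tests bit 0 of v, while the parenthesized form compares the masked
value.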

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>

mm/sharepool: Use "tgid" instead of "pid" to find a task · 32c81f1b

Committed by Zhang Zekun on Nov 14, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
CVE: NA

--------------------------------

To support container scenarios, use tgid instead of pid to find a
specific task. In normal cases, "tgid" represents a process in init_pid_ns,
so this patch should not introduce problems to existing code.

Rename the input parameter "int pid" to "int tgid" in the following
exported interfaces:
1. mg_sp_group_id_by_pid()
2. mg_sp_group_add_task()
3. mg_sp_group_del_task()
4. mg_sp_make_share_k2u()
5. mg_sp_make_share_u2k()
6. mg_sp_config_dvpp_range()

Also rename it in these static functions:
1. __sp_find_spg_locked()
2. __sp_find_spg()

The following functions use "current->pid" to find the spg; change
"current->pid" to "current->tgid":
1. find_or_alloc_sp_group()
2. sp_alloc_prepare()
3. mg_sp_make_share_k2u()
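
The pid/tgid distinction can be sketched with a toy lookup table. Everything
here (`struct task_model`, `struct group_entry`, the lookup helpers) is
hypothetical and only models the fact that all threads of a process share
one tgid; it does not model pid namespaces or the real sharepool structures.

```c
#include <stddef.h>

/* Toy task: only the two ids matter for this sketch. */
struct task_model {
	int pid;	/* per-thread id */
	int tgid;	/* id shared by every thread of the process */
};

/* Hypothetical registry entry: a group recorded under some id. */
struct group_entry {
	int key;
	int spg_id;
};

/* Linear lookup; returns the group id, or -1 when no entry matches. */
static int find_group(const struct group_entry *tbl, size_t n, int key)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].key == key)
			return tbl[i].spg_id;
	return -1;
}

/* Lookup keyed by the calling thread's own pid. */
static int lookup_by_pid(const struct group_entry *tbl, size_t n,
			 const struct task_model *cur)
{
	return find_group(tbl, n, cur->pid);
}

/* Lookup keyed by the thread group id. */
static int lookup_by_tgid(const struct group_entry *tbl, size_t n,
			  const struct task_model *cur)
{
	return find_group(tbl, n, cur->tgid);
}
```

With a group registered under id 100, a worker thread with pid 101 and
tgid 100 misses it through `lookup_by_pid()` but finds it through
`lookup_by_tgid()`.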

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>

ascend/arm64: Add ascend_enable_all kernel parameter · 66ae8ddd

Committed by Wang Wensheng on Nov 14, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
CVE: NA

--------------------------------

This kernel parameter is used in Ascend scenarios and enables all the
needed options at once.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>

11 Nov, 2022 · 25 commits

mm: Add sysctl to clear free list pages · 426b9efe

Committed by Yu Liao on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

This patch adds a sysctl to clear pages in the free lists of each NUMA node.
For each NUMA node, every page in the free list is cleared; this work is
scheduled on a random CPU of the NUMA node.

When KASAN is enabled and the pages are free, the shadow memory is
filled with 0xFF, so writing to these free pages will cause a UAF. KASAN is
therefore simply disabled for clear freelist.

With large memory, clear freelist holds the zone lock for a long time.
As a result, processes may be blocked until the clear freelist thread
exits, causing the system to be reset by the watchdog.

Provide a mechanism to stop the clear freelist threads when the elapsed time
exceeds cfp_timeout, which can be set by module_param().
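
The stop-on-timeout idea can be sketched in userspace as follows. This is
an assumption-laden model, not the patch: the injected clock callback,
`clear_with_timeout()` and the use of an int array as "pages" are all
invented for illustration; only the rule "stop once elapsed time exceeds the
timeout" comes from the text above.

```c
#include <stddef.h>

/* Hypothetical clock callback so the sketch can be driven deterministically. */
typedef unsigned long (*now_fn)(void *ctx);

/*
 * Clears up to nr_pages entries (here: sets ints to 0) and stops as soon as
 * the elapsed time exceeds timeout. Returns how many entries were cleared.
 */
static size_t clear_with_timeout(int *pages, size_t nr_pages,
				 unsigned long timeout, now_fn now, void *ctx)
{
	unsigned long start = now(ctx);
	size_t done = 0;

	while (done < nr_pages) {
		if (now(ctx) - start > timeout)
			break;
		pages[done++] = 0;
	}
	return done;
}

/* Example clock: returns the current tick, then advances by one. */
static unsigned long ticking_clock(void *ctx)
{
	unsigned long *t = ctx;

	return (*t)++;
}
```

Checking the clock once per entry keeps the worst-case overrun to a single
clear operation; a real implementation would more likely check per batch.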

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm/hugetlb: Hugetlb use non-mirrored memory if memory reliable is enabled · 74bfdf15

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Previously, memory allocation in memblock for hugetlb could use mirrored or
non-mirrored memory, depending on the system's memory status. However, this
is not suitable if a hugetlb user wants to allocate from non-mirrored memory
when memory reliable is enabled.

To solve this problem, hugetlb uses the MEMBLOCK_NOMIRROR flag to allocate
memory from the non-mirrored region without falling back to the mirrored
region.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm/memblock: Introduce ability to alloc memory from specify memory reigon · 8525dfb2

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

With the mirrored feature enabled, memblock prefers to allocate memory from
mirrored memory in all cases. Since the mirrored and non-mirrored regions
may differ in capacity or bandwidth, a memblock user may want to choose which
region to allocate from rather than get the mirrored one by default.

To solve this problem, the flag MEMBLOCK_NOMIRROR is introduced to allocate
memory from the non-mirrored region. The function
memblock_alloc_range_nid_flags() is introduced to allocate memory with the
specified flag and without fallback.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Update reliable flag in memory allocaion for reliable task only in task context · b9b3aaad

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

An interrupt may run in a reliable task's context, in which case
current->flags has PF_RELIABLE set and the interrupt's memory allocation
is redirected to the mirrored region.

To solve this problem, update the gfp flags for a reliable task only
in normal task context, by checking in_task().

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

proc: Count reliable memory usage of reliable tasks · d81e9624

Committed by Peng Wu on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Count the reliable memory allocated by reliable user tasks.

The policy for counting reliable memory usage is based on RSS statistics.
Anywhere an mm counter is updated, reliable pages need to be counted too. A
reliable page, as checked by page_reliable(), needs to update the reliable
page counter by calling reliable_page_counter().

Updating the reliable page count should be considered wherever the following
logic is added:
- add_mm_counter
- dec_mm_counter
- inc_mm_counter_fast
- dec_mm_counter_fast
- rss[mm_counter(page)]

Signed-off-by: Peng Wu <wupeng58@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Show debug info about memory reliable if oom occurs · bfdc680c

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Show debug info about memory reliable if oom occurs.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Introduce proc interface to disable memory reliable features · 2525d04c

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

reliable_debug is used to disable memory reliable features.

Four bits are used to represent the following features:
- bit 0: memory reliable feature
- bit 1: reliable fallback feature
- bit 2: tmpfs use reliable memory feature
- bit 3: pagecache use reliable memory feature

Bits 1-3 are valid if and only if bit 0 is 1. If bit 0 is 0, all
other features are disabled regardless of the other bits' status.

For example, you can disable all features with:

	$ echo 0 > /proc/sys/vm/reliable_debug
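
The gating rule for the four bits can be written down as a small decoder.
This is a sketch only: the enum names and both helper functions are invented
here, and just the bit positions and the "bit 0 gates bits 1-3" rule come
from the description above.

```c
#include <stdbool.h>

/* Bit positions as listed above; the names are made up for this sketch. */
enum {
	RD_MEM_RELIABLE = 1u << 0,
	RD_FALLBACK     = 1u << 1,
	RD_TMPFS        = 1u << 2,
	RD_PAGECACHE    = 1u << 3,
	RD_ALL          = 0xFu,
};

/* Returns the set of features that are actually on for a written value. */
static unsigned int effective_features(unsigned int value)
{
	if (!(value & RD_MEM_RELIABLE))
		return 0;	/* bit 0 clear: everything is off */
	return value & RD_ALL;
}

/* True when a single feature bit survives the gating rule. */
static bool feature_on(unsigned int value, unsigned int bit)
{
	return (effective_features(value) & bit) != 0;
}
```

Under this reading, a value of 0xE (bits 1-3 set, bit 0 clear) enables
nothing, while 0x5 keeps memory reliable and the tmpfs feature on and turns
fallback and pagecache off.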

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Introduce reliable_debug=S to control shmem use mirrored memory · 3f5ebb1c

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Introduce reliable_debug=S to control shmem use mirrored memory.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Introduce shmem mirrored memory limit for memory reliable · f30c7817

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

This limit restricts the amount of mirrored memory used by shmem.
When the limit is reached, a memory allocation returns no memory if reliable
fallback is off, or falls back to the non-mirrored region if reliable
fallback is on.

This limit can be set or read via
/proc/sys/vm/shmem_reliable_bytes_limit.
The default value of this limit is LONG_MAX. It can be set to any value
from 0 to the total size of mirrored memory.
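
The behaviour described above amounts to a three-way decision. The sketch
below assumes the check is "current usage plus request must not exceed the
limit"; that comparison, the enum and the function name are assumptions made
for illustration, not taken from the patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Default limit stated in the commit message. */
static const long default_shmem_limit = LONG_MAX;

enum alloc_result { ALLOC_MIRRORED, ALLOC_NON_MIRRORED, ALLOC_FAILED };

/*
 * Decides where a shmem allocation of request_bytes would land, given the
 * mirrored bytes already used by shmem. Assumes all arguments are >= 0.
 */
static enum alloc_result shmem_alloc_decision(long used_bytes,
					      long request_bytes,
					      long limit_bytes,
					      bool fallback_on)
{
	/* Subtract instead of add so LONG_MAX limits cannot overflow. */
	if (request_bytes <= limit_bytes - used_bytes)
		return ALLOC_MIRRORED;
	return fallback_on ? ALLOC_NON_MIRRORED : ALLOC_FAILED;
}
```

With the default limit every request stays in the mirrored region; with a
limit of 4096 bytes and 4096 bytes already used, the next request either
falls back or fails depending on the fallback setting.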

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

shmem: Count and show reliable shmem info · 32be46d7

Committed by Zhou Guanghui on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Count reliable shmem usage based on NR_SHMEM.
Add ReliableShmem in /proc/meminfo to show reliable memory info
used by shmem.

- ReliableShmem: reliable memory used by shmem

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Introduce fallback mechanism for memory reliable · 5f0b48de

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Introduce a fallback mechanism for memory reliable. Memory allocation will
fall back to the non-mirrored region if the zone's low watermark is reached,
and kswapd will be woken at that time.

This mechanism is enabled by default and can be disabled by adding
"reliable_debug=F" to the kernel parameters. This mechanism relies on
CONFIG_MEMORY_RELIABLE and needs "kernelcore=reliable" in the kernel
parameters.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Add reliable memory use limit for user tasks · 8968270e

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

There is an upper limit for all memory allocations that meet the following
conditions:
- gfp_zone(gfp & ~ GFP_RELIABLE) == ZONE_MOVABLE
- gfp & GFP_RELIABLE is true

Init tasks will allocate memory from the non-mirrored region if their
allocation triggers the limit.

The limit can be set or read via /proc/sys/vm/task_reliable_limit.

The default value of this limit is ULONG_MAX. Users can set it to a value
between the reliable memory currently used by user tasks and the total
reliable memory size.
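
The two conditions that select which allocations are subject to the limit
can be modelled as below. The flag values, the two-zone enum and
`model_gfp_zone()` are hypothetical simplifications; the real GFP encoding
and gfp_zone() are more involved.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real GFP encoding differs. */
#define MODEL_GFP_MOVABLE  0x1u
#define MODEL_GFP_RELIABLE 0x2u

enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE };

/* Simplified stand-in for gfp_zone(): only distinguishes movable or not. */
static enum model_zone model_gfp_zone(unsigned int gfp)
{
	return (gfp & MODEL_GFP_MOVABLE) ? MODEL_ZONE_MOVABLE
					 : MODEL_ZONE_NORMAL;
}

/* Both conditions from the commit message must hold. */
static bool task_limit_applies(unsigned int gfp)
{
	return model_gfp_zone(gfp & ~MODEL_GFP_RELIABLE) == MODEL_ZONE_MOVABLE &&
	       (gfp & MODEL_GFP_RELIABLE) != 0;
}
```

Masking out the reliable bit before the zone lookup asks "would this have
been a movable-zone allocation without the reliable redirection?", which is
how the first condition reads.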

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: thp: Add memory reliable support for hugepaged collapse · 62639947

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

When hugepaged collapses pages into a huge page, the huge page uses the same
memory region as the original pages. Hugepaged checks whether there are any
reliable pages in the area to be collapsed. If the area contains any reliable
pages, hugepaged allocates memory from the mirrored region. Otherwise it
allocates memory from the non-mirrored region.
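
The selection rule reduces to an "any page is reliable" scan. In this
sketch the per-page state is a plain bool array and the function name is
invented; the kernel code inspects real pages instead.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when the collapsed huge page should come from the mirrored
 * region, i.e. when at least one page in the area is reliable.
 */
static bool collapse_needs_mirrored(const bool *page_is_reliable,
				    size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		if (page_is_reliable[i])
			return true;
	return false;
}
```

A single reliable page is enough to pull the whole huge page into the
mirrored region, so the scan can stop at the first hit.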

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Add support for limiting the usage of reliable memory in pagecache · 4021a0d5

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add the interface /proc/sys/vm/reliable_pagecache_max_bytes to set the
max size of the reliable page cache. The max size cannot exceed the total
reliable RAM.

The whole reliable memory feature depends on kernelcore=mirror,
which in turn depends on NUMA, so remove the redundant code for UMA.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: add "ReliableFileCache" item in /proc/meminfo · 6e6cf0d7

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add statistics for the usage of the reliable page cache. The item
"ReliableFileCache" in /proc/meminfo shows the usage of the reliable page
cache.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

proc/meminfo: Add "FileCache" item in /proc/meminfo · ccad5e7a

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

The item "FileCache" in /proc/meminfo shows the amount of page cache
on the LRU lists (active + inactive).

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Add cmdline for the reliable memory usage of page cache · 33c4a18f

Committed by Chen Wandun on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add cmdline for the reliable memory usage of page cache.
Page cache will not use reliable memory when passing option
"P" to reliable_debug in cmdline.

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Add kernel param for memory reliable · be9e2144

Committed by Peng Wu on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Add the kernel param reliable_debug in preparation for controlling memory
reliable features.

Signed-off-by: Peng Wu <wupeng58@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Clear GFP_RELIABLE if the conditions are not met · a1bdc2e2

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Memory reliable only handles memory allocation from the movable zone.
GFP_RELIABLE will be removed if the conditions are not met.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Disable memory reliable when kdump is in progress · 7023dd3c

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Kdump has only limited memory, which leads to buggy memory reliable
features if memory reliable is enabled. So disable memory reliable if kdump
is in progress.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Count reliable memory info based on zone info · 4623cc86

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Count reliable memory info based on zone info. Any zone below
ZONE_MOVABLE is seen as a reliable zone, and the pages there are summed.
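
A minimal model of the summation, assuming a zone ordering in which every
reliable zone has a smaller index than the movable zone. The enum and its
members are hypothetical, and which per-zone page count the patch actually
sums (present, managed or free) is not stated here.

```c
/* Hypothetical zone order; only "below MODEL_Z_MOVABLE" matters. */
enum model_zone_idx {
	MODEL_Z_DMA,
	MODEL_Z_DMA32,
	MODEL_Z_NORMAL,
	MODEL_Z_MOVABLE,
	MODEL_Z_COUNT,
};

/* Sums the page counts of every zone below the movable zone. */
static unsigned long reliable_pages(const unsigned long pages[MODEL_Z_COUNT])
{
	unsigned long sum = 0;

	for (int z = 0; z < MODEL_Z_MOVABLE; z++)
		sum += pages[z];
	return sum;
}
```

Pages in the movable zone itself are never counted, whatever their number.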

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Refactor code in reliable_report_meminfo() · 5baa7afd

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Use show_val_kb() to format meminfo.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Export mem_reliable_status() for checking memory reliable status · 64f874c8

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Export the mem_reliable_status(), so it can be used by others to check
memory reliable's status.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Export static key mem_reliable · b3571129

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

The static key mem_reliable is used to check memory reliable's status
in the kernel's inline functions. These inline functions rely on it, but
drivers cannot use them because the symbol is not exported.
To solve this problem, export the symbol in preparation for drivers
using memory reliable's inline functions.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

mm: Drop shmem reliable related log during startup · 55ac3d06

Committed by Ma Wupeng on Nov 11, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SK3S
CVE: NA

--------------------------------

Message "shmem reliable disabled." will be printed if memory reliable is
disabled. This is not necessary so drop it.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

10 Nov, 2022 · 2 commits

page_alloc: fix invalid watermark check on a negative value · 48bc7241

Committed by Jaewon Kim on Nov 10, 2022

stable inclusion
from stable-v5.10.135
commit 2670f76a563124478d0d14e603b38b73b99c389c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2670f76a563124478d0d14e603b38b73b99c389c

--------------------------------

commit 9282012f upstream.

There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.

This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e1 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle the negative-value case, which
can happen when the reserved_highatomic pageblock is bigger than the
actual free.

If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.

Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.

Handle the negative case by using MIN.
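
The effect of the clamp can be seen in a reduced userspace model. It
assumes the watermark is an unsigned long, so that a negative free count is
converted to a huge unsigned value by C's usual arithmetic conversions; the
function names and the reduced parameter list are invented for this sketch
and the lowmem reserve is left out.

```c
#include <stdbool.h>

/* Before the fix: the subtraction may go negative. */
static bool watermark_fast_old(long free_pages, long unusable_free,
			       unsigned long mark)
{
	long usable_free = free_pages - unusable_free;

	/* The cast spells out the implicit signed-to-unsigned conversion. */
	return (unsigned long)usable_free > mark;
}

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* After the fix: never subtract more than what is actually free. */
static bool watermark_fast_fixed(long free_pages, long unusable_free,
				 unsigned long mark)
{
	long usable_free = free_pages;

	usable_free -= min_long(usable_free, unusable_free);
	return (unsigned long)usable_free > mark;
}
```

With 10 free pages, 100 pages counted as unusable and a mark of 50, the old
check passes although almost nothing is free, while the clamped version
correctly reports the watermark as not met.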

Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes: f27ce0e1 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

mm/mempolicy: fix uninit-value in mpol_rebind_policy() · 7057a3c7

Committed by Wang Cheng on Nov 10, 2022

stable inclusion
from stable-v5.10.134
commit ddb3f0b68863bd1c5f43177eea476bce316d4993
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ddb3f0b68863bd1c5f43177eea476bce316d4993

--------------------------------

commit 018160ad upstream.

mpol_set_nodemask() (mm/mempolicy.c) does not set up the nodemask when
pol->mode is MPOL_LOCAL. Check pol->mode before accessing
pol->w.cpuset_mems_allowed in mpol_rebind_policy() (mm/mempolicy.c).

BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:352 [inline]
BUG: KMSAN: uninit-value in mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
 mpol_rebind_policy mm/mempolicy.c:352 [inline]
 mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
 cpuset_change_task_nodemask kernel/cgroup/cpuset.c:1711 [inline]
 cpuset_attach+0x787/0x15e0 kernel/cgroup/cpuset.c:2278
 cgroup_migrate_execute+0x1023/0x1d20 kernel/cgroup/cgroup.c:2515
 cgroup_migrate kernel/cgroup/cgroup.c:2771 [inline]
 cgroup_attach_task+0x540/0x8b0 kernel/cgroup/cgroup.c:2804
 __cgroup1_procs_write+0x5cc/0x7a0 kernel/cgroup/cgroup-v1.c:520
 cgroup1_tasks_write+0x94/0xb0 kernel/cgroup/cgroup-v1.c:539
 cgroup_file_write+0x4c2/0x9e0 kernel/cgroup/cgroup.c:3852
 kernfs_fop_write_iter+0x66a/0x9f0 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:2162 [inline]
 new_sync_write fs/read_write.c:503 [inline]
 vfs_write+0x1318/0x2030 fs/read_write.c:590
 ksys_write+0x28b/0x510 fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __x64_sys_write+0xdb/0x120 fs/read_write.c:652
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:524 [inline]
 slab_alloc_node mm/slub.c:3251 [inline]
 slab_alloc mm/slub.c:3259 [inline]
 kmem_cache_alloc+0x902/0x11c0 mm/slub.c:3264
 mpol_new mm/mempolicy.c:293 [inline]
 do_set_mempolicy+0x421/0xb70 mm/mempolicy.c:853
 kernel_set_mempolicy mm/mempolicy.c:1504 [inline]
 __do_sys_set_mempolicy mm/mempolicy.c:1510 [inline]
 __se_sys_set_mempolicy+0x44c/0xb60 mm/mempolicy.c:1507
 __x64_sys_set_mempolicy+0xd8/0x110 mm/mempolicy.c:1507
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x44/0xae

KMSAN: uninit-value in mpol_rebind_task (2)
https://syzkaller.appspot.com/bug?id=d6eb90f952c2a5de9ea718a1b873c55cb13b59dc

This patch seems to fix below bug too.
KMSAN: uninit-value in mpol_rebind_mm (2)
https://syzkaller.appspot.com/bug?id=f2fecd0d7013f54ec4162f60743a2b28df40926b

The uninit-value is pol->w.cpuset_mems_allowed in mpol_rebind_policy().
When syzkaller reproducer runs to the beginning of mpol_new(),

	    mpol_new() mm/mempolicy.c
	  do_mbind() mm/mempolicy.c
	kernel_mbind() mm/mempolicy.c

`mode` is 1(MPOL_PREFERRED), nodes_empty(*nodes) is `true` and `flags`
is 0. Then

	mode = MPOL_LOCAL;
	...
	policy->mode = mode;
	policy->flags = flags;

will be executed. So in mpol_set_nodemask(),

	    mpol_set_nodemask() mm/mempolicy.c
	  do_mbind()
	kernel_mbind()

pol->mode is 4 (MPOL_LOCAL), that `nodemask` in `pol` is not initialized,
which will be accessed in mpol_rebind_policy().

Link: https://lkml.kernel.org/r/20220512123428.fq3wofedp6oiotd4@ppc.localdomain
Signed-off-by: Wang Cheng <wanngchenng@gmail.com>
Reported-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
Tested-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

09 Nov, 2022 · 9 commits

mm/sharepool: fix the incorrect judgement of the addr range · b1e17d35

Committed by Zhou Guanghui on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

--------------------------------

The address range of dvpp is [start, start + size), so the value
start + size itself may lie outside the address range.
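
A half-open range check that never computes start + size can be sketched as
follows. Both helpers are hypothetical and are not the functions changed by
the patch; they only show one way to test membership when the end value is
exclusive and may not even be representable.

```c
#include <stdbool.h>

/* True when addr lies in [start, start + size). */
static bool addr_in_range(unsigned long addr, unsigned long start,
			  unsigned long size)
{
	return addr >= start && addr - start < size;
}

/* True when [addr, addr + len) lies entirely in [start, start + size). */
static bool span_in_range(unsigned long addr, unsigned long len,
			  unsigned long start, unsigned long size)
{
	unsigned long off;

	if (addr < start)
		return false;
	off = addr - start;
	/* Compare lengths, not end addresses, so nothing can wrap. */
	return off <= size && len <= size - off;
}
```

A span whose end equals start + size is accepted, while the address
start + size itself is rejected. The checks also stay correct when the range
ends at the very top of the address space, where start + size would wrap
to 0.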

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>

mm/sharepool: Fix sharepool hugepage cgroup uncount error. · 107e2b7c

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

--------------------------------

If PF_MEMALLOC is set in current->flags, memcgroup does not check
current's allocations against the memory use limit, which can cause the
system to run out of memory.

According to
https://lkml.indiana.edu/hypermail/linux/kernel/0911.2/00576.html,
PF_MEMALLOC shall only be used when more memory is sure to be freed as a
result of the allocation.

Do not use PF_MEMALLOC; instead, remove __GFP_RECLAIM from gfp_mask to
ensure no reclaim.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

mm/sharepool: Rebind the numa node when fallback to normal pages · 1343dd93

Committed by Wang Wensheng on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

--------------------------------

When we allocate memory using SP_HUGEPAGE, we try normal pages when
there are not enough hugepages. The specified NUMA node information is
lost when we fall back to normal pages. As a result, we could
allocate memory from a NUMA node other than the one we specified.

The solution is to rebind the node before retrying.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>

mm/sharepool: Remove the leading double underlines for function name · 95618625

Committed by Zhang Zekun on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

----------------------------------------------

Rename __insert_sp_area to insert_sp_area.
Rename __find_sp_area_locked to find_sp_area_locked.

Fix this by renaming __insert_sp_area to insert_sp_area.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>

mm/sharepool: Fix code-style warnings · c3c8461e

Committed by Zhang Zekun on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

-----------------------------------------

1. Remove the inline specifier before sp_mapping_find().
2. Do not declare or define reserved identifiers.
3. Add brackets in if and else/else if statements.
4. The pointer (*) must have a space either before or after it.
5. Use parentheses to specify the sequence of expressions in
   sp_remap_kva_to_vma(), sp_node_id(), init_local_group().
6. Besides, change the name of __find_sp_area() to get_sp_area() to
   indicate that this function need not be called with the lock held
   and to imply that it increases the use_count.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>

mm/sharepool: fix hugepage_rsvd count increase error · bec70574

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5RO2H
CVE: NA

--------------------------------

When nr_hugepages is configured, sharepool allocates hugepages first
from the hugetlb pool, then from the buddy system once the pool has been
used up. The current page release function treats the buddy system
hugepages as hugetlb pages, which causes HugePages_Rsvd to increase
improperly.

Add a check in the page release function:
    if the page is temporary, do not call hugetlb_unreserve_pages.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

mm/sharepool: check size=0 in mg_sp_make_share_k2u() · 564272e8

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QQPG
CVE: NA

--------------------------------

Add a size-0-check in mg_sp_make_share_k2u() to avoid passing 0-size spa
to __insert_sp_area().

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

mm/sharepool: fix potential AA deadlock · d9fb53bf

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5R0X9
CVE: NA

--------------------------------

Fix an AA deadlock caused by nested locking in mg_sp_group_add_task().

Deadlock path:

mg_sp_group_add_task()

    down_write(sp_group_sem)
    find_or_alloc_sp_group()
	!spg_valid()
	sp_group_drop()
	    free_sp_group() -> down_write(sp_group_sem)
    ---> AA deadlock

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>

mm/sharepool: delete unused codes · 872ebaa0

Committed by Guo Mengqi on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QETC
CVE: NA

--------------------------------

sp_make_share_k2u only supports vmalloc addresses now. Therefore, delete the
backup handling case.

Also, master is guaranteed not to be freed until master->node_list is
emptied.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
