- 09 Nov 2022, 25 commits
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

1. Split the sharepool normal area (8T) into a sharepool read-only area (64G) and a sharepool normal area (8T - 64G).
2. User programs cannot write to addresses in the sharepool read-only area.
3. Add SP_PROT_FOCUS for sp_alloc.
4. sp_alloc with SP_PROT_RO | SP_PROT_FOCUS returns a virtual address within the sharepool read-only area.
5. Other user programs that are added to the task with write protection cannot write to addresses in the sharepool read-only area.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
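A minimal usage sketch of the new flag combination, assuming the sp_alloc(size, sp_flags, spg_id) calling convention and ERR_PTR-style error returns used by the share pool interfaces; the size and group id below are illustrative only:

    #include <linux/share_pool.h>

    /* Ask for memory in the read-only focus area added by this patch. */
    void *va = sp_alloc(SZ_2M, SP_PROT_RO | SP_PROT_FOCUS, spg_id);
    if (IS_ERR(va))
        return PTR_ERR(va);     /* allocation failed */
    /* va lies in the 64G read-only area: reads work, user writes fault. */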
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

Extract the logic for obtaining an sp_mapping by address into a new function, sp_mapping_find.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

spg->dvpp and spg->normal can be combined into one array.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
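A sketch of what the consolidation could look like; the enum constants and array name here are illustrative stand-ins, not the patch's actual identifiers:

    enum sp_mapping_type {
        SP_MAPPING_DVPP,
        SP_MAPPING_NORMAL,
        SP_MAPPING_NR,
    };

    struct sp_group {
        /* ... */
        /* replaces the separate spg->dvpp and spg->normal pointers */
        struct sp_mapping *mapping[SP_MAPPING_NR];
    };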
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

Now sp_mapping.flag is only used to distinguish sp_mapping types, so 'type' is the more suitable name.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

"TASK_SIZE - MMAP_SHARE_POOL_DVPP_SIZE" is puzzling. Defining

  MMAP_SHARE_POOL_START = MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_SIZE

and

  MMAP_SHARE_POOL_16G_START = MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_DVPP_SIZE

makes the memory layout intuitive.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
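In macro form, both regions are anchored to the same end address (a restatement of the two definitions above, not the verbatim header):

    #define MMAP_SHARE_POOL_START \
            (MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_SIZE)
    #define MMAP_SHARE_POOL_16G_START \
            (MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_DVPP_SIZE)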
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA6
CVE: NA

--------------------------------

Use get_task_mm to prevent the mm from being released while the information in mm_struct is in use.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
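The pattern the fix relies on, sketched with the real get_task_mm()/mmput() API; the consumer function is hypothetical:

    #include <linux/sched/mm.h>

    struct mm_struct *mm = get_task_mm(tsk);
    if (!mm)
        return -ESRCH;      /* task is exiting or has no mm */

    /* mm_users is elevated: mm_struct fields stay valid here */
    read_mm_info(mm);       /* hypothetical consumer */

    mmput(mm);              /* drop the reference taken above */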
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA4
CVE: NA

--------------------------------

Check the maximum value of spg_id to ensure it lies within the valid range: SPG_ID_DEFAULT or [SPG_ID_MIN, SPG_ID_AUTO).

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
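A sketch of the check that range implies (illustrative, not the literal patch hunk):

    if (spg_id != SPG_ID_DEFAULT &&
        (spg_id < SPG_ID_MIN || spg_id >= SPG_ID_AUTO))
        return -EINVAL;     /* reject ids outside [SPG_ID_MIN, SPG_ID_AUTO) */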
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA0
CVE: NA

--------------------------------

The spa is used during update_mem_usage(). Under concurrency (mg_sp_unshare), the spa may already have been released at that point.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA2
CVE: NA

--------------------------------

When a process is added to a group, mm->mm_users is incremented by one; when a process is removed from a group, it is decremented by one. It cannot drop to zero here, because this function is preceded by get_task_mm.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
-
Submitted by Wang Wensheng

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PD4P
CVE: NA

--------------------------------

[ 2058.802818][ T290] BUG: KASAN: use-after-free in get_process_sp_res+0x70/0x134
[ 2058.810194][ T290] Read of size 8 at addr ffff00088dc6ab28 by task test_debug_loop/290
[ 2058.820520][ T290] CPU: 5 PID: 290 Comm: test_debug_loop Tainted: G W OE 5.10.0+ #2
[ 2058.829377][ T290] Hardware name: EVB(EP) (DT)
[ 2058.833982][ T290] Call trace:
[ 2058.837217][ T290]  dump_backtrace+0x0/0x30c
[ 2058.841660][ T290]  show_stack+0x20/0x30
[ 2058.845758][ T290]  dump_stack+0x120/0x1b0
[ 2058.850028][ T290]  print_address_description.constprop.0+0x2c/0x1fc
[ 2058.856555][ T290]  __kasan_report+0xfc/0x160
[ 2058.861086][ T290]  kasan_report+0x44/0xb0
[ 2058.865356][ T290]  __asan_load8+0x94/0xd0
[ 2058.869623][ T290]  get_process_sp_res+0x70/0x134
[ 2058.874501][ T290]  proc_usage_show+0x1ac/0x304
[ 2058.879208][ T290]  seq_read_iter+0x254/0x750
[ 2058.883728][ T290]  proc_reg_read_iter+0x100/0x140
[ 2058.888689][ T290]  new_sync_read+0x1cc/0x2c0
[ 2058.893215][ T290]  vfs_read+0x1f4/0x250
[ 2058.897304][ T290]  ksys_read+0xcc/0x170
[ 2058.901399][ T290]  __arm64_sys_read+0x4c/0x60
[ 2058.906016][ T290]  el0_svc_common.constprop.0+0xb4/0x2a0
[ 2058.911584][ T290]  do_el0_svc+0x8c/0xb0
[ 2058.915677][ T290]  el0_svc+0x20/0x30
[ 2058.919503][ T290]  el0_sync_handler+0xb0/0xbc
[ 2058.924114][ T290]  el0_sync+0x180/0x1c0
[ 2058.928190][ T290]
[ 2058.930444][ T290] Allocated by task 2176:
[ 2058.934714][ T290]  kasan_save_stack+0x28/0x60
[ 2058.939328][ T290]  __kasan_kmalloc.constprop.0+0xc8/0xf0
[ 2058.944909][ T290]  kasan_kmalloc+0x10/0x20
[ 2058.949268][ T290]  kmem_cache_alloc_trace+0x128/0xabc
[ 2058.954577][ T290]  create_spg_node+0x58/0x214
[ 2058.959188][ T290]  local_group_add_task+0x30/0x14c
[ 2058.964231][ T290]  init_local_group+0xd0/0x1a0
[ 2058.968936][ T290]  sp_init_group_master_locked.part.0+0x19c/0x290
[ 2058.975298][ T290]  mg_sp_group_add_task+0x73c/0xdb0
[ 2058.980456][ T290]  dev_sp_add_group+0x124/0x2dc [sharepool_dev]
[ 2058.986647][ T290]  dev_ioctl+0x21c/0x2ec [sharepool_dev]
[ 2058.992222][ T290]  __arm64_sys_ioctl+0xd8/0x120
[ 2058.997010][ T290]  el0_svc_common.constprop.0+0xb4/0x2a0
[ 2059.002572][ T290]  do_el0_svc+0x8c/0xb0
[ 2059.006662][ T290]  el0_svc+0x20/0x30
[ 2059.010489][ T290]  el0_sync_handler+0xb0/0xbc
[ 2059.015101][ T290]  el0_sync+0x180/0x1c0
[ 2059.019176][ T290]
[ 2059.021427][ T290] Freed by task 4125:
[ 2059.025343][ T290]  kasan_save_stack+0x28/0x60
[ 2059.029949][ T290]  kasan_set_track+0x28/0x40
[ 2059.034476][ T290]  kasan_set_free_info+0x24/0x50
[ 2059.039347][ T290]  __kasan_slab_free+0x104/0x1ac
[ 2059.044227][ T290]  kasan_slab_free+0x14/0x20
[ 2059.048744][ T290]  kfree+0x164/0xb94
[ 2059.052576][ T290]  sp_group_post_exit+0xf0/0x980
[ 2059.057448][ T290]  mmput.part.0+0xb4/0x220
[ 2059.061790][ T290]  mmput+0x2c/0x40
[ 2059.065450][ T290]  exit_mm+0x27c/0x3a0
[ 2059.069450][ T290]  do_exit+0x2a0/0x790
[ 2059.073448][ T290]  do_group_exit+0x64/0x100
[ 2059.077884][ T290]  get_signal+0x1fc/0x9fc
[ 2059.082144][ T290]  do_signal+0x110/0x2cc
[ 2059.086320][ T290]  do_notify_resume+0x158/0x2b0
[ 2059.091108][ T290]  work_pending+0xc/0x6d4
[ 2059.095358][ T290]

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OE1J
CVE: NA

--------------------------------

Fix the deadlock indicated below:

[ 171.669844] Chain exists of:
[ 171.669844]   &mm->mmap_lock --> sp_group_sem --> &spg->rw_lock
[ 171.669844]
[ 171.671469]  Possible unsafe locking scenario:
[ 171.671469]
[ 171.672121]        CPU0                    CPU1
[ 171.672415]        ----                    ----
[ 171.672706]   lock(&spg->rw_lock);
[ 171.673114]                                lock(sp_group_sem);
[ 171.673706]                                lock(&spg->rw_lock);
[ 171.674208]   lock(&mm->mmap_lock);
[ 171.674863]
[ 171.674863]  *** DEADLOCK ***

Sharepool takes these locks in the order:

  sp_group_sem --> &spg->rw_lock --> mm->mmap_lock

However, in sp_check_mmap_addr(), sp_group_sem was requested while mm->mmap_lock was already held, which is mm->mmap_lock --> sp_group_sem. This causes an ABBA deadlock. It happens in:

[ 171.642687] the existing dependency chain (in reverse order) is:
[ 171.643745]
[ 171.643745] -> #2 (&spg->rw_lock){++++}-{3:3}:
[ 171.644639]        __lock_acquire+0x6f4/0xc40
[ 171.645189]        lock_acquire+0x2f0/0x3c8
[ 171.645631]        down_read+0x64/0x2d8
[ 171.646075]        proc_usage_by_group+0x50/0x258 (spg->rw_lock)
[ 171.646542]        idr_for_each+0x6c/0xf0
[ 171.647011]        proc_group_usage_show+0x140/0x178
[ 171.647629]        seq_read_iter+0xe4/0x498
[ 171.648217]        proc_reg_read_iter+0xa8/0xe0
[ 171.648776]        new_sync_read+0xfc/0x1a0
[ 171.649002]        vfs_read+0x1ac/0x1c8
[ 171.649217]        ksys_read+0x74/0xf8
[ 171.649596]        __arm64_sys_read+0x24/0x30
[ 171.649934]        el0_svc_common.constprop.0+0x8c/0x270
[ 171.650528]        do_el0_svc+0x34/0xb8
[ 171.651069]        el0_svc+0x1c/0x28
[ 171.651278]        el0_sync_handler+0x8c/0xb0
[ 171.651636]        el0_sync+0x168/0x180
[ 171.652118]
[ 171.652118] -> #1 (sp_group_sem){++++}-{3:3}:
[ 171.652692]        __lock_acquire+0x6f4/0xc40
[ 171.653059]        lock_acquire+0x2f0/0x3c8
[ 171.653303]        down_read+0x64/0x2d8
[ 171.653704]        mg_is_sharepool_addr+0x184/0x340 (&sp_group_sem)
[ 171.654085]        sp_check_mmap_addr+0x64/0x108
[ 171.654668]        arch_get_unmapped_area_topdown+0x9c/0x528
[ 171.655370]        thp_get_unmapped_area+0x54/0x68
[ 171.656170]        get_unmapped_area+0x94/0x160
[ 171.656415]        __do_mmap_mm+0xd4/0x540
[ 171.656629]        do_mmap+0x98/0x648
[ 171.656838]        vm_mmap_pgoff+0xc0/0x188
[ 171.657129]        vm_mmap+0x6c/0x98
[ 171.657619]        elf_map+0xe0/0x118
[ 171.657835]        load_elf_binary+0x4ec/0xfd8
[ 171.658103]        bprm_execve.part.9+0x3ec/0x840
[ 171.658448]        bprm_execve+0x7c/0xb0
[ 171.658919]        kernel_execve+0x18c/0x198
[ 171.659500]        run_init_process+0xf0/0x108
[ 171.660073]        try_to_run_init_process+0x20/0x58
[ 171.660558]        kernel_init+0xcc/0x120
[ 171.660862]        ret_from_fork+0x10/0x18
[ 171.661273]
[ 171.661273] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 171.661885]        check_prev_add+0xa4/0xbd8
[ 171.662229]        validate_chain+0xf54/0x14b8
[ 171.662705]        __lock_acquire+0x6f4/0xc40
[ 171.663310]        lock_acquire+0x2f0/0x3c8
[ 171.663658]        down_write+0x60/0x208
[ 171.664179]        mg_sp_alloc+0x24c/0x1150 (mm->mmap_lock)
[ 171.665245]        dev_ioctl+0x1128/0x1fb8 [sharepool_dev]
[ 171.665688]        __arm64_sys_ioctl+0xb0/0xe8
[ 171.666250]        el0_svc_common.constprop.0+0x8c/0x270
[ 171.667255]        do_el0_svc+0x34/0xb8
[ 171.667806]        el0_svc+0x1c/0x28
[ 171.668249]        el0_sync_handler+0x8c/0xb0
[ 171.668661]        el0_sync+0x168/0x180

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
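The locking invariant the fix restores, as a sketch (lock names are taken from the report above; down_read() and mmap_read_lock() are the real 5.10 APIs, the sequence is illustrative):

    /* Correct sharepool order: coarse to fine, mmap_lock last. */
    down_read(&sp_group_sem);      /* 1: group registry        */
    down_read(&spg->rw_lock);      /* 2: per-group state       */
    mmap_read_lock(mm);            /* 3: mm->mmap_lock, always last */

    /* ... critical section ... */

    mmap_read_unlock(mm);
    up_read(&spg->rw_lock);
    up_read(&sp_group_sem);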
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OE1J
CVE: NA

--------------------------------

The mutex protecting spm_dvpp_list has an ABBA deadlock with spg->rw_lock. Adding a process to a sharepool group while running cat /proc/sharepool/spa_stat reproduces the problem. Remove spg->rw_lock here to avoid this.

[ 1101.013480] INFO: task test:3567 blocked for more than 30 seconds.
[ 1101.014378]       Tainted: G OE 5.10.0+ #45
[ 1101.015707] task:test state:D stack: 0 pid: 3567
[ 1101.016464] Call trace:
[ 1101.016736]  __switch_to+0xc0/0x128
[ 1101.017082]  __schedule+0x3fc/0x898
[ 1101.017626]  schedule+0x48/0xd8
[ 1101.017981]  schedule_preempt_disabled+0x14/0x20
[ 1101.018519]  __mutex_lock.isra.1+0x160/0x638
[ 1101.018899]  __mutex_lock_slowpath+0x24/0x30
[ 1101.019291]  mutex_lock+0x5c/0x68
[ 1101.019607]  sp_mapping_create+0x118/0x1b0
[ 1101.019963]  sp_init_group_master_locked.part.9+0x10c/0x288
[ 1101.020356]  mg_sp_group_add_task.part.16+0x7dc/0xcd0
[ 1101.020750]  mg_sp_group_add_task+0x54/0xd0
[ 1101.021120]  dev_ioctl+0x360/0x1e20 [sharepool_dev]
[ 1101.022171]  __arm64_sys_ioctl+0xb0/0xe8
[ 1101.022695]  el0_svc_common.constprop.0+0x88/0x268
[ 1101.023143]  do_el0_svc+0x34/0xb8
[ 1101.023487]  el0_svc+0x1c/0x28
[ 1101.023775]  el0_sync_handler+0x8c/0xb0
[ 1101.024120]  el0_sync+0x168/0x180

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ODCT
CVE: NA

--------------------------------

When there are a large number of groups in the system, or a large number of processes in each group, "cat /proc/sharepool/proc_stat" hits a soft lockup before all the printing finishes, because the callback function loops too many times. Remove one of the loops to reduce the time cost, and add a cond_resched() to avoid the lockup.

root@buildroot:~/install# cat /proc/sharepool/proc_stat
[ 1250.647469] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cat:309]
[ 1250.648610] Modules linked in: sharepool_dev(OE)
[ 1250.650795] CPU: 0 PID: 309 Comm: cat Tainted: G OE 5.10.0+ #43
[ 1250.651216] Hardware name: linux,dummy-virt (DT)
[ 1250.651721] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 1250.652426] pc : get_process_sp_res+0x40/0x90
[ 1250.652747] lr : proc_usage_by_group+0x158/0x218
...
[ 1250.657903] Call trace:
[ 1250.658376]  get_process_sp_res+0x40/0x90
[ 1250.658602]  proc_usage_by_group+0x158/0x218
[ 1250.658838]  idr_for_each+0x6c/0xf0
[ 1250.659027]  proc_group_usage_show+0x104/0x120
[ 1250.659263]  seq_read_iter+0xe0/0x498
[ 1250.659462]  proc_reg_read_iter+0xa8/0xe0
[ 1250.659660]  generic_file_splice_read+0xf0/0x1b0
[ 1250.659865]  do_splice_to+0x7c/0xd0
[ 1250.660029]  splice_direct_to_actor+0xe0/0x2a8
[ 1250.660353]  do_splice_direct+0xa4/0xf8
[ 1250.660902]  do_sendfile+0x1bc/0x420
[ 1250.661079]  __arm64_sys_sendfile64+0x170/0x178
[ 1250.661298]  el0_svc_common.constprop.0+0x88/0x268
[ 1250.661505]  do_el0_svc+0x34/0xb8
[ 1250.661686]  el0_svc+0x1c/0x28
[ 1250.661836]  el0_sync_handler+0x8c/0xb0
[ 1250.662033]  el0_sync+0x168/0x180

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
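The mitigation pattern in miniature; cond_resched() is the real API, while the list and member names stand in for the actual proc_stat traversal:

    list_for_each_entry(spg_node, &spg->proc_head, proc_node) {
        /* accumulate and print one process's share pool usage ... */
        cond_resched();   /* yield so long traversals can't soft-lock a CPU */
    }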
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5O5RQ
CVE: NA

--------------------------------

Notice that in sp_unshare_uva(), comparing current->tgid against spa->applier is sufficient for the authentication check; there is no need to also check current->mm against spa->mm.

Other redundant cases:
- find_spg_node_by_spg() never returns NULL in its current use context;
- spg_info_show() never comes across a group with id 0.

Therefore, delete these redundant paths.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

-----------------------------------------------

In get_process_sp_res(), spg_node can be freed by another process, and the access to spg_node->spg can then cause a kernel panic. Add a read-lock/unlock pair to fix this problem. Fix the same problem in proc_sp_group_state().

Fixes: 3d37f8717287 ("[Huawei] mm: sharepool: use built-in-statistics")
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
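The shape of the fix, sketched; the specific lock, list, and field names below are illustrative stand-ins for whichever read lock the patch actually takes around the traversal:

    down_read(&sp_group_sem);               /* illustrative group read lock */
    list_for_each_entry(spg_node, &master->node_list, group_node) {
        /* spg_node->spg cannot be freed while the read lock is held */
        total += spg_node->spg->proc_num;   /* illustrative access */
    }
    up_read(&sp_group_sem);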
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5MS48
CVE: NA

--------------------------------

Fix two bugs revealed by static checking:
- Release mm->mmap_lock when mm->sp_group_master has not been initialized.
- Do not add the mm to the master list if adding the process to the group failed.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5M3PS
CVE: NA

--------------------------------

- Fix an incorrect SP_RES value.
- Fix an incorrect SP_RES_T value.
- Fix an uninitialized pid field in the pass-through scenario.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY2R
CVE: NA

-------------------------------------------

Remove the meaningless comment in mg_sp_free() and fix the bug in the mg_sp_group_id_by_pid() parameter-check path.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY51
CVE: NA

----------------------------------------------

The variable enable_mdc_default_group has been deprecated, so remove it and the corresponding code. The definition of is_process_in_group() can be ambiguous, so change its return value type.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY4H
CVE: NA

-----------------------------------------

Remove sp_device_number; there is no need to detect the device number at runtime. Use the macro 'MAX_DEVID' in its place.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY5K
CVE: NA

-----------------------------------

Remove the unused sp_dev_va_start and sp_dev_va_size along with the related code. Add a dvpp_addr check in mg_is_sharepool_addr() for the current process.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Wang Wensheng

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LHGZ
CVE: NA

--------------------------------

Delete unused sysctl interfaces in the sharepool feature.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KSDH
CVE: NA

--------------------------------

Fix redundant /proc/sharepool/spa_stat prints when multiple groups are attached to the same sp_mapping: traverse all dvpp mappings rather than all groups.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5K3MH
CVE: NA

--------------------------------

After the refactoring, cat /proc/pid_xx/sp_group causes a kernel panic. Fix this error.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KC7C
CVE: NA

--------------------------------

Most interfaces starting with "sp_" are deprecated; remove them.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
- 03 Nov 2022, 3 commits
-
Submitted by Liu Shixin

maillist inclusion
category: bugfix
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5NX1S
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220823&id=f5606044e659f8fa754fa692e2fa5aea1ec7f2f6

--------------------------------

The vmemmap pages are marked by kmemleak when allocated from memblock. Remove them from kmemleak when freeing the pages; otherwise, when a page is reused, kmemleak may report an error such as the following and then stop working:

kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff98fb6be00000 (size 335544320):
kmemleak:   comm "swapper", pid 0, jiffies 4294892296
kmemleak:   min_count = 0
kmemleak:   count = 0
kmemleak:   flags = 0x1
kmemleak:   checksum = 0
kmemleak:   backtrace:

Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com
Fixes: f41f2ed4 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
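The essence of the fix, sketched: inform kmemleak before a memblock-allocated vmemmap page is handed back to the page allocator. kmemleak_free_part() is the real kmemleak API; the exact call site only approximates the referenced commit:

    #include <linux/kmemleak.h>

    /* before freeing a memblock-backed vmemmap page: */
    kmemleak_free_part(page_to_virt(page), PAGE_SIZE);
    free_reserved_page(page);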
-
Submitted by Baolin Wang

mainline inclusion
from mainline-v6.1-rc1
commit fac35ba7
category: bugfix
bugzilla: 187864, https://gitee.com/src-openeuler/kernel/issues/I5X1Z9
CVE: CVE-2022-3623
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=fac35ba763ed07ba93154c95ffc0c4a55023707f

--------------------------------

Some architectures (like ARM64) can support CONT-PTE/PMD size hugetlb, meaning they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size (64K and 32M) if a 4K page size is specified.

So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However, this pte entry lock is incorrect for CONT-PTE size hugetlb: we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock.

That means the pte entry of the CONT-PTE size hugetlb under the current pte lock is unstable in follow_page_pte(): we can continue to migrate or poison the pte entry of the CONT-PTE size hugetlb, which can cause potential race issues, even though they are under the 'pte lock'. For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page via the move_pages() syscall under the lock, while another thread B migrates the CONT-PTE hugetlb page at the same time. Thread A then gets an incorrect page, and if thread A also wants to do page migration, a data inconsistency error occurs.

Moreover, we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd().

To fix the above issues, rename follow_huge_pmd() to follow_huge_pmd_pte() to handle both PMD and PTE level size hugetlb, using huge_pte_lock() to get the correct pte entry lock and make the pte entry stable.

Mike said:

Support for CONT_PMD/_PTE was added with bb9dd3df ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages"). I would go with 5480280d for the Fixes: target.

Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/hugetlb.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
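The locking change at the heart of the fix, sketched: serialize CONT-PTE/PMD hugetlb entries through huge_pte_lock() rather than pte_offset_map_lock(). huge_pte_lock() and huge_ptep_get() are the real hugetlb APIs; the surrounding context is abridged and illustrative:

    spinlock_t *ptl;
    pte_t entry;

    /* takes mm->page_table_lock for CONT-PTE/PMD size hugetlb */
    ptl = huge_pte_lock(hstate_vma(vma), mm, ptep);
    entry = huge_ptep_get(ptep);
    if (pte_present(entry))
        page = pte_page(entry);     /* entry is stable under ptl */
    spin_unlock(ptl);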
-
Submitted by Alistair Popple

mainline inclusion
from mainline-v6.1-rc1
commit 16ce101d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5VZ0L
CVE: CVE-2022-3523
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16ce101db85db694a91380aa4c89b25530871d33

--------------------------------

Patch series "Fix several device private page reference counting issues", v2

This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed, or from accessing fields within the struct page which are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems. However, without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory.

This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those, so any help there would be appreciated. The changes mimic what is done for both Nouveau and hmm-tests though, so I doubt they will cause problems.

This patch (of 8):

When the CPU tries to access a device private page, the migrate_to_ram() callback associated with the pgmap for the page is called. However, no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field, because the page may have been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately, the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure, pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see whether it is expected or not.

[mpe@ellerman.id.au: fix build]
Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	arch/powerpc/kvm/book3s_hv_uvmem.c
	include/linux/migrate.h
	lib/test_hmm.c
	mm/migrate.c

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
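The race closure in miniature: pin the page before dropping the page-table lock, so a concurrent migration cannot free the page or its pgmap. get_page()/put_page() are the real APIs; the fault-path context is a sketch:

    spin_lock(ptl);
    /* ... revalidate that the entry still points at this
     *     device-private page ... */
    get_page(page);             /* hold a reference across the callback */
    spin_unlock(ptl);

    ret = page->pgmap->ops->migrate_to_ram(vmf);
    put_page(page);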
-
- 02 Nov 2022, 1 commit
-
Submitted by Gowans, James

stable inclusion
from stable-v5.10.132
commit 931dbcc2e02f0409c095b11e35490cade9ac14af
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=931dbcc2e02f0409c095b11e35490cade9ac14af

--------------------------------

commit 14c99d65 upstream.

Currently the implementation will split the PUD when a fallback is taken inside the create_huge_pud function. This isn't where it should be done: the splitting should be done in wp_huge_pud, just like it's done for PMDs. The reason is that if a fallback is taken during create, there is no PUD yet, so there is nothing to split, whereas if a fallback is taken when encountering a write-protection fault, there is something to split.

It looks like this was the original intention in the commit where the splitting was introduced, but somehow it got moved to the wrong place between v1 and v2 of the patch series. A rebase mistake, perhaps.

Link: https://lkml.kernel.org/r/6f48d622eb8bce1ae5dd75327b0b73894a2ec407.camel@amazon.com
Fixes: 327e9fd4 ("mm: Split huge pages on write-notify or COW")
Signed-off-by: James Gowans <jgowans@amazon.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 19 Oct 2022, 6 commits
-
Submitted by Jann Horn

stable inclusion
from stable-v5.10.141
commit 98f401d36396134c0c86e9e3bd00b6b6b028b521
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5USOP
CVE: CVE-2022-42703
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=98f401d36396134c0c86e9e3bd00b6b6b028b521

--------------------------------

commit 2555283e upstream.

anon_vma->degree tracks the combined number of child anon_vmas and VMAs that use the anon_vma as their ->anon_vma.

anon_vma_clone() then assumes that for any anon_vma attached to src->anon_vma_chain other than src->anon_vma, it is impossible for it to be a leaf node of the VMA tree, meaning that for such VMAs ->degree is elevated by 1 because of a child anon_vma, and that if ->degree equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.

This assumption is wrong because the ->degree optimization leads to leaf nodes being abandoned on anon_vma_clone(): an existing anon_vma is reused and no new parent-child relationship is created. So it is possible to reuse an anon_vma for one VMA while it is still tied to another VMA.

This is an issue because is_mergeable_anon_vma() and its callers assume that if two VMAs have the same ->anon_vma, the list of anon_vmas attached to the VMAs is guaranteed to be the same. When this assumption is violated, vma_merge() can merge pages into a VMA that is not attached to the corresponding anon_vma, leading to dangling page->mapping pointers that will be dereferenced during rmap walks.

Fix it by separately tracking the number of child anon_vmas and the number of VMAs using the anon_vma as their ->anon_vma.

Fixes: 7a3ef208 ("mm: prevent endless growth of anon_vma hierarchy")
Cc: stable@kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
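The fix replaces the overloaded counter with two explicit ones. Sketched against the struct described above (the field names follow the upstream fix, but treat this as illustrative rather than the verbatim patch):

    struct anon_vma {
        /* ... */
        unsigned long num_children;     /* child anon_vmas */
        unsigned long num_active_vmas;  /* VMAs whose ->anon_vma points here */
    };

    /* An anon_vma may be reused for a new VMA only when it has no
     * active users and at most one child: */
    if (anon_vma->num_children < 2 && anon_vma->num_active_vmas == 0)
        /* safe to adopt as dst->anon_vma */;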
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v6.1-rc1
commit ca77f290
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca77f290cff1dfa095d71ae16cc7cda8ee6df495

--------------------------------

Patch series "kasan: switch tag-based modes to stack ring from per-object metadata", v3.

This series makes the tag-based KASAN modes use a ring buffer for storing stack depot handles for alloc/free stack traces for slab objects instead of per-object metadata. This ring buffer is referred to as the stack ring.

On each alloc/free of a slab object, the tagged address of the object and the current stack trace are recorded in the stack ring.

On each bug report, if the accessed address belongs to a slab object, the stack ring is scanned for matching entries. The newest entries are used to print the alloc/free stack traces in the report: one entry for alloc and one for free.

The advantages of this approach over storing stack trace handles in per-object metadata with the tag-based KASAN modes:

- Allows to find relevant stack traces for use-after-free bugs without using quarantine for freed memory. (Currently, if the object was reallocated multiple times, the report contains the latest alloc/free stack traces, not necessarily the ones relevant to the buggy allocation.)
- Allows to better identify and mark use-after-free bugs, effectively making the CONFIG_KASAN_TAGS_IDENTIFY functionality always-on.
- Has fixed memory overhead.

The disadvantage:

- If the affected object was allocated/freed long before the bug happened and the stack trace events were purged from the stack ring, the report will have no stack traces.

Discussion
==========

The proposed implementation of the stack ring uses a single ring buffer for the whole kernel. This might lead to contention due to atomic accesses to the ring buffer index on multicore systems.

At this point, it is unknown whether the performance impact from this contention would be significant compared to the slowdown introduced by collecting stack traces due to the planned changes to the latter part; see the section below.

For now, the proposed implementation is deemed to be good enough, but this might need to be revisited once the stack collection becomes faster.

A considered alternative is to keep a separate ring buffer for each CPU and then iterate over all of them when printing a bug report. This approach requires somehow figuring out which of the stack rings has the freshest stack traces for an object if multiple stack rings have them.

Further plans
=============

This series is a part of an effort to make KASAN stack trace collection suitable for production. This requires stack trace collection to be fast and memory-bounded.

The planned steps are:

1. Speed up stack trace collection (potentially, by using SCS; patches on hold until steps #2 and #3 are completed).
2. Keep stack trace handles in the stack ring (this series).
3. Add a memory-bounded mode to stack depot or provide an alternative memory-bounded stack storage.
4. Potentially, implement stack trace collection sampling to minimize the performance impact.

This patch (of 34):

__kasan_metadata_size() calculates the size of the redzone for objects in a slab cache. When accounting for the presence of kasan_free_meta in the redzone, this function only compares free_meta_offset with 0. But free_meta_offset could also be equal to KASAN_NO_FREE_META, which indicates that kasan_free_meta is not present at all. Add a comparison with KASAN_NO_FREE_META into __kasan_metadata_size().

Link: https://lkml.kernel.org/r/cover.1662411799.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
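An abridged sketch of the corrected size accounting; KASAN_NO_FREE_META and the kasan_info offsets are identifiers from this series, and the function body is simplified relative to the real one:

    size_t __kasan_metadata_size(struct kmem_cache *cache)
    {
        return (cache->kasan_info.alloc_meta_offset ?
                sizeof(struct kasan_alloc_meta) : 0) +
               ((cache->kasan_info.free_meta_offset &&
                 cache->kasan_info.free_meta_offset != KASAN_NO_FREE_META) ?
                sizeof(struct kasan_free_meta) : 0);
    }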
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit 97593cad
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=97593cad003c668e2532cb2939a24a031f8de52d

--------------------------------

KASAN marks caches that are sanitized with the SLAB_KASAN cache flag. Currently, if the metadata that is appended after the object (which stores e.g. stack trace ids) doesn't fit into KMALLOC_MAX_SIZE (this can only happen with SLAB; see the comment in the patch), KASAN turns off sanitization completely.

With this change, sanitization of the object data is always enabled. However, the metadata is only stored when it fits. Instead of checking for the SLAB_KASAN flag across the code to find out whether the metadata is there, use cache->kasan_info.alloc/free_meta_offset. As 0 can be a valid value for free_meta_offset, introduce KASAN_NO_FREE_META as an indicator that the free metadata is missing.

Without this change, all sanitized KASAN objects would be put into quarantine with generic KASAN. With this change, only the objects that have metadata (i.e. when it fits) are put into quarantine; the rest are freed right away.

Along the way, rework __kasan_cache_create() and add clarifying comments.

Link: https://lkml.kernel.org/r/aee34b87a5e4afe586c2ac6a0b32db8dc4dcc2dc.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Icd947e2bea054cb5cfbdc6cf6652227d97032dcb
Co-developed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/kasan/common.c
	mm/kasan/hw_tags.c
	mm/kasan/kasan.h
	mm/kasan/quarantine.c
	mm/kasan/report.c
	mm/kasan/report_sw_tags.c
	mm/kasan/sw_tags.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit 8bb0009b
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8bb0009b19465da5a0cd394b5a6ccc2eaf418f23

--------------------------------

Add a set_alloc_info() helper and move kasan_set_track() into it. This will simplify the code for one of the upcoming changes.

No functional changes.

Link: https://lkml.kernel.org/r/b2393e8f1e311a70fc3aaa2196461b6acdee7d21.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I0316193cbb4ecc9b87b7c2eee0dd79f8ec908c1a
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit 6476792f
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6476792f1015a356e6864076c210b328b64d08cc

--------------------------------

Rename get_alloc_info() and get_free_info() to kasan_get_alloc_meta() and kasan_get_free_meta() to better reflect what they do and to avoid confusion with kasan_set_free_info().

No functional changes.

Link: https://lkml.kernel.org/r/27b7c036b754af15a2839e945f6d8bfce32b4c2f.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ib6e4ba61c8b12112b403d3479a9799ac8fff8de1
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/kasan/generic.c
	mm/kasan/quarantine.c
	mm/kasan/report.c
	mm/kasan/report_sw_tags.c
	mm/kasan/sw_tags.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit c696de9f
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c696de9f12b7ddeddc05d378fc4dc0f66e9a8c95

--------------------------------

Patch series "kasan: boot parameters for hardware tag-based mode", v4.

=== Overview

Hardware tag-based KASAN mode [1] is intended to eventually be used in production as a security mitigation. Therefore there's a need for finer control over KASAN features and for the existence of a kill switch.

This patchset adds a few boot parameters for hardware tag-based KASAN that allow to disable or otherwise control particular KASAN features, and also provides some initial optimizations for running KASAN in production.

There's another planned patchset that will further optimize hardware tag-based KASAN, provide proper benchmarking and tests, and fully enable tag-based KASAN for production use.

Hardware tag-based KASAN relies on the arm64 Memory Tagging Extension (MTE) [2] to perform memory and pointer tagging. Please see [3] and [4] for detailed analysis of how MTE helps to fight memory safety problems.

The features that can be controlled are:

1. Whether KASAN is enabled at all.
2. Whether KASAN collects and saves alloc/free stacks.
3. Whether KASAN panics on a detected bug or not.

The patch titled "kasan: add and integrate kasan boot parameters" of this series adds a few new boot parameters.

kasan.mode allows to choose one of three main modes:

- kasan.mode=off - KASAN is disabled, no tag checks are performed
- kasan.mode=prod - only essential production features are enabled
- kasan.mode=full - all KASAN features are enabled

The chosen mode provides default control values for the features mentioned above. However, it's also possible to override the default values by providing:

- kasan.stacktrace=off/on - enable stack collection (default: on for mode=full, otherwise off)
- kasan.fault=report/panic - only report tag faults or also panic (default: report)

If the kasan.mode parameter is not provided, it defaults to full when CONFIG_DEBUG_KERNEL is enabled, and to prod otherwise.

It is essential that switching between these modes doesn't require rebuilding the kernel with different configs, as this is required by the Android GKI (Generic Kernel Image) initiative [5].

=== Benchmarks

For now I've only performed a few simple benchmarks, such as measuring kernel boot time and slab memory usage after boot. There's an upcoming patchset which will optimize KASAN further and include more detailed benchmarking results.

The benchmarks were performed in QEMU, and the results below exclude the slowdown caused by QEMU memory tagging emulation (as it's different from the slowdown that will be introduced by hardware and is therefore irrelevant).

KASAN_HW_TAGS=y + kasan.mode=off introduces no performance or memory impact compared to KASAN_HW_TAGS=n.

kasan.mode=prod (manually excluding tagging) introduces 3% of performance and no memory impact (except memory used by hardware to store tags) compared to kasan.mode=off.

kasan.mode=full has about 40% performance and 30% memory impact over kasan.mode=prod. Both come from alloc/free stack collection.

=== Notes

This patchset is available here: https://github.com/xairy/linux/tree/up-boot-mte-v4

This patchset is based on v11 of the "kasan: add hardware tag-based mode for arm64" patchset [1].

For testing in QEMU, hardware tag-based KASAN requires:

1. QEMU built from master [6] (use "-machine virt,mte=on -cpu max" arguments to run).
2. GCC version 10.

[1] https://lore.kernel.org/linux-arm-kernel/cover.1606161801.git.andreyknvl@google.com/T/#t
[2] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[3] https://arxiv.org/pdf/1802.09517.pdf
[4] https://github.com/microsoft/MSRC-Security-Research/blob/master/papers/2020/Security%20analysis%20of%20memory%20tagging.pdf
[5] https://source.android.com/devices/architecture/kernel/generic-kernel-image
[6] https://github.com/qemu/qemu

=== Tags

Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>

This patch (of 19):

Move the get_free_info() call into quarantine_put() to simplify the call site.

No functional changes.

Link: https://lkml.kernel.org/r/cover.1606162397.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/312d0a3ef92cc6dc4fa5452cbc1714f9393ca239.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Iab0f04e7ebf8d83247024b7190c67c3c34c7940f
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
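For reference, the parameters described above are passed on the kernel command line. Some example combinations, using only the names and values from the text above:

    kasan.mode=prod
    kasan.mode=full kasan.fault=panic
    kasan.mode=prod kasan.stacktrace=on kasan.fault=report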
-
- 29 Sep 2022, 3 commits
-
Submitted by Jann Horn

stable inclusion
from stable-v5.10.142
commit 895428ee124ad70b9763259308354877b725c31d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PE9S
CVE: CVE-2022-39188
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=895428ee124ad70b9763259308354877b725c31d

--------------------------------

commit b67fbebd upstream.

Some drivers rely on having all VMAs through which a PFN might be accessible listed in the rmap for correctness. However, on X86 it was possible for a VMA with stale TLB entries to not be listed in the rmap.

This was fixed in mainline with commit b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"), but that commit relies on preceding refactoring in commit 18ba064e ("mmu_gather: Let there be one tlb_{start,end}_vma() implementation") and commit 1e9fdf21 ("mmu_gather: Remove per arch tlb_{start,end}_vma()").

This patch provides equivalent protection without needing that refactoring, by forcing a TLB flush between removing PTEs in unmap_vmas() and the call to unlink_file_vma() in free_pgtables().

[This is a stable-specific rewrite of the upstream commit!]

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ze zuo <zuoze1@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
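The ordering the stable-specific fix enforces, sketched; unmap_vmas(), tlb_flush_mmu() and free_pgtables() are real mm internals, with the exact arguments elided:

    unmap_vmas(&tlb, vma, start, end);
    /* Stable-only fix: make sure stale TLB entries for VM_PFNMAP
     * VMAs are gone before the VMAs are unlinked from the rmap. */
    tlb_flush_mmu(&tlb);
    free_pgtables(&tlb, vma, floor, ceiling);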
-
Submitted by Mike Kravetz

stable inclusion
from stable-v5.10.121
commit 63758dd9595f87c7e7b5f826fd2dcf53d6aff0cf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=63758dd9595f87c7e7b5f826fd2dcf53d6aff0cf

--------------------------------

commit 48381273 upstream.

The routine huge_pmd_unshare() is passed a pointer to an address associated with an area which may be unshared. If the unshare is successful, this address is updated to 'optimize' callers iterating over huge page addresses. For the optimization to work correctly, the address should be updated to the last huge page in the unmapped/unshared area. However, in the common case where the passed address is PUD_SIZE aligned, the address is incorrectly updated to the address of the preceding huge page. That wastes CPU cycles, as the unmapped/unshared range is scanned twice.

Link: https://lkml.kernel.org/r/20220524205003.126184-1-mike.kravetz@oracle.com
Fixes: 39dde65c ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Rei Yamamoto

stable inclusion
from stable-v5.10.121
commit 7994d890123a6cad033f2842ff0177a9bda1cb23
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7994d890123a6cad033f2842ff0177a9bda1cb23

--------------------------------

commit bbe832b9 upstream.

At present, pages not in the target zone are added to the cc->migratepages list in isolate_migratepages_block(). As a result, pages may migrate between nodes unintentionally.

This would be a serious problem for older kernels without commit a984226f ("mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec"), because it can corrupt the lru list by handling pages in the list without holding the proper lru_lock.

Avoid returning a pfn outside the target zone in the case that it is not aligned with a pageblock boundary. Otherwise isolate_migratepages_block() will handle pages not in the target zone.

Link: https://lkml.kernel.org/r/20220511044300.4069-1-yamamoto.rei@jp.fujitsu.com
Fixes: 70b44595 ("mm, compaction: use free lists to quickly locate a migration source")
Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 20 Sep 2022, 2 commits
-
Submitted by liubo

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5DC4A
CVE: NA

-------------------------------------------------

etmem is the memory vertical expansion technology. By default, the existing etmem memory expansion tool swaps out all of a process's pages that can be swapped out, unless a page is marked with a lock flag.

This patch adds the ability to swap out only specified pages: the process marks the regions to be swapped out with the VM_SWAPFLAG flag, and etmem adds a filter to its scanning module so that only those pages are swapped out.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
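A sketch of what such a filter looks like in a page-walk based scanner; VM_SWAPFLAG is the flag named above, while the function name and walk context are illustrative:

    /* Skip VMAs the process did not opt in for swap-out. */
    static int etmem_test_walk(unsigned long start, unsigned long end,
                               struct mm_walk *walk)
    {
        struct vm_area_struct *vma = walk->vma;

        if (!(vma->vm_flags & VM_SWAPFLAG))
            return 1;   /* positive return from test_walk: skip this VMA */
        return 0;       /* walk it and consider its pages for swap-out */
    }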
-
Submitted by liubo

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5DC4A
CVE: NA

-------------------------------------------------

etmem is the memory vertical expansion technology. In the current etmem flow, memory page swapping is implemented by invoking shrink_page_list. When this interface is invoked the first time, pages are added to the swap cache and written to disk. A swap-cache page is reclaimed only when this interface is invoked a second time and no process has accessed the page in the meantime. However, in the etmem flow, user mode rescans only pages that have been accessed, and no migration is issued for pages that no process accesses. As a result, the swap cache may stay occupied indefinitely.

To solve this problem, add logic for actively reclaiming the swap cache: when the swap cache occupies a large amount of memory, the system proactively scans the LRU linked list and reclaims swap-cache pages to keep usage within a specified range.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-