- 09 Nov 2022, 25 commits
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

1. Split the sharepool normal area (8T) into a sharepool read-only area (64G) and a sharepool normal area (8T - 64G).
2. User programs cannot write to addresses in the sharepool read-only area.
3. Add SP_PROT_FOCUS for sp_alloc.
4. sp_alloc with SP_PROT_RO | SP_PROT_FOCUS returns a virtual address within the sharepool read-only area.
5. Other user programs that are added to the task with write protection cannot write to addresses in the sharepool read-only area.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
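A minimal usage sketch of the new flag combination, assuming the sp_alloc(size, sp_flags, spg_id) calling convention and ERR_PTR-style error returns used by the share pool interfaces; the size and group id below are illustrative only:

    #include <linux/share_pool.h>

    /* Ask for memory in the read-only focus area added by this patch. */
    void *va = sp_alloc(SZ_2M, SP_PROT_RO | SP_PROT_FOCUS, spg_id);
    if (IS_ERR(va))
        return PTR_ERR(va);     /* allocation failed */
    /* va lies in the 64G read-only area: reads work, user writes fault. */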
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

Extract the logic for obtaining an sp_mapping by address into a new function, sp_mapping_find.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

spg->dvpp and spg->normal can be combined into one array.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
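A sketch of what the consolidation could look like; the enum constants and array name here are illustrative stand-ins, not the patch's actual identifiers:

    enum sp_mapping_type {
        SP_MAPPING_DVPP,
        SP_MAPPING_NORMAL,
        SP_MAPPING_NR,
    };

    struct sp_group {
        /* ... */
        /* replaces the separate spg->dvpp and spg->normal pointers */
        struct sp_mapping *mapping[SP_MAPPING_NR];
    };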
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

Now sp_mapping.flag is only used to distinguish sp_mapping types, so 'type' is the more suitable name.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
-
Submitted by Chen Jun

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I72Q
CVE: NA

--------------------------------

"TASK_SIZE - MMAP_SHARE_POOL_DVPP_SIZE" is puzzling. Defining

  MMAP_SHARE_POOL_START = MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_SIZE

and

  MMAP_SHARE_POOL_16G_START = MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_DVPP_SIZE

makes the memory layout intuitive.

Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
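In macro form, both regions are anchored to the same end address (a restatement of the two definitions above, not the verbatim header):

    #define MMAP_SHARE_POOL_START \
            (MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_SIZE)
    #define MMAP_SHARE_POOL_16G_START \
            (MMAP_SHARE_POOL_END - MMAP_SHARE_POOL_DVPP_SIZE)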
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA6
CVE: NA

--------------------------------

Use get_task_mm to prevent the mm from being released while the information in mm_struct is in use.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
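The pattern the fix relies on, sketched with the real get_task_mm()/mmput() API; the consumer function is hypothetical:

    #include <linux/sched/mm.h>

    struct mm_struct *mm = get_task_mm(tsk);
    if (!mm)
        return -ESRCH;      /* task is exiting or has no mm */

    /* mm_users is elevated: mm_struct fields stay valid here */
    read_mm_info(mm);       /* hypothetical consumer */

    mmput(mm);              /* drop the reference taken above */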
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA4
CVE: NA

--------------------------------

Check the maximum value of spg_id to ensure it lies within the valid range: SPG_ID_DEFAULT or [SPG_ID_MIN, SPG_ID_AUTO).

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
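A sketch of the check that range implies (illustrative, not the literal patch hunk):

    if (spg_id != SPG_ID_DEFAULT &&
        (spg_id < SPG_ID_MIN || spg_id >= SPG_ID_AUTO))
        return -EINVAL;     /* reject ids outside [SPG_ID_MIN, SPG_ID_AUTO) */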
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA0
CVE: NA

--------------------------------

The spa is used during update_mem_usage(). Under concurrency (mg_sp_unshare), the spa may already have been released at that point.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
-
Submitted by Zhou Guanghui

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PIA2
CVE: NA

--------------------------------

When a process is added to a group, mm->mm_users is incremented by one; when a process is removed from a group, it is decremented by one. It cannot drop to zero here, because this function is preceded by get_task_mm.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
-
Submitted by Wang Wensheng

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PD4P
CVE: NA

--------------------------------

[ 2058.802818][ T290] BUG: KASAN: use-after-free in get_process_sp_res+0x70/0x134
[ 2058.810194][ T290] Read of size 8 at addr ffff00088dc6ab28 by task test_debug_loop/290
[ 2058.820520][ T290] CPU: 5 PID: 290 Comm: test_debug_loop Tainted: G W OE 5.10.0+ #2
[ 2058.829377][ T290] Hardware name: EVB(EP) (DT)
[ 2058.833982][ T290] Call trace:
[ 2058.837217][ T290]  dump_backtrace+0x0/0x30c
[ 2058.841660][ T290]  show_stack+0x20/0x30
[ 2058.845758][ T290]  dump_stack+0x120/0x1b0
[ 2058.850028][ T290]  print_address_description.constprop.0+0x2c/0x1fc
[ 2058.856555][ T290]  __kasan_report+0xfc/0x160
[ 2058.861086][ T290]  kasan_report+0x44/0xb0
[ 2058.865356][ T290]  __asan_load8+0x94/0xd0
[ 2058.869623][ T290]  get_process_sp_res+0x70/0x134
[ 2058.874501][ T290]  proc_usage_show+0x1ac/0x304
[ 2058.879208][ T290]  seq_read_iter+0x254/0x750
[ 2058.883728][ T290]  proc_reg_read_iter+0x100/0x140
[ 2058.888689][ T290]  new_sync_read+0x1cc/0x2c0
[ 2058.893215][ T290]  vfs_read+0x1f4/0x250
[ 2058.897304][ T290]  ksys_read+0xcc/0x170
[ 2058.901399][ T290]  __arm64_sys_read+0x4c/0x60
[ 2058.906016][ T290]  el0_svc_common.constprop.0+0xb4/0x2a0
[ 2058.911584][ T290]  do_el0_svc+0x8c/0xb0
[ 2058.915677][ T290]  el0_svc+0x20/0x30
[ 2058.919503][ T290]  el0_sync_handler+0xb0/0xbc
[ 2058.924114][ T290]  el0_sync+0x180/0x1c0
[ 2058.928190][ T290]
[ 2058.930444][ T290] Allocated by task 2176:
[ 2058.934714][ T290]  kasan_save_stack+0x28/0x60
[ 2058.939328][ T290]  __kasan_kmalloc.constprop.0+0xc8/0xf0
[ 2058.944909][ T290]  kasan_kmalloc+0x10/0x20
[ 2058.949268][ T290]  kmem_cache_alloc_trace+0x128/0xabc
[ 2058.954577][ T290]  create_spg_node+0x58/0x214
[ 2058.959188][ T290]  local_group_add_task+0x30/0x14c
[ 2058.964231][ T290]  init_local_group+0xd0/0x1a0
[ 2058.968936][ T290]  sp_init_group_master_locked.part.0+0x19c/0x290
[ 2058.975298][ T290]  mg_sp_group_add_task+0x73c/0xdb0
[ 2058.980456][ T290]  dev_sp_add_group+0x124/0x2dc [sharepool_dev]
[ 2058.986647][ T290]  dev_ioctl+0x21c/0x2ec [sharepool_dev]
[ 2058.992222][ T290]  __arm64_sys_ioctl+0xd8/0x120
[ 2058.997010][ T290]  el0_svc_common.constprop.0+0xb4/0x2a0
[ 2059.002572][ T290]  do_el0_svc+0x8c/0xb0
[ 2059.006662][ T290]  el0_svc+0x20/0x30
[ 2059.010489][ T290]  el0_sync_handler+0xb0/0xbc
[ 2059.015101][ T290]  el0_sync+0x180/0x1c0
[ 2059.019176][ T290]
[ 2059.021427][ T290] Freed by task 4125:
[ 2059.025343][ T290]  kasan_save_stack+0x28/0x60
[ 2059.029949][ T290]  kasan_set_track+0x28/0x40
[ 2059.034476][ T290]  kasan_set_free_info+0x24/0x50
[ 2059.039347][ T290]  __kasan_slab_free+0x104/0x1ac
[ 2059.044227][ T290]  kasan_slab_free+0x14/0x20
[ 2059.048744][ T290]  kfree+0x164/0xb94
[ 2059.052576][ T290]  sp_group_post_exit+0xf0/0x980
[ 2059.057448][ T290]  mmput.part.0+0xb4/0x220
[ 2059.061790][ T290]  mmput+0x2c/0x40
[ 2059.065450][ T290]  exit_mm+0x27c/0x3a0
[ 2059.069450][ T290]  do_exit+0x2a0/0x790
[ 2059.073448][ T290]  do_group_exit+0x64/0x100
[ 2059.077884][ T290]  get_signal+0x1fc/0x9fc
[ 2059.082144][ T290]  do_signal+0x110/0x2cc
[ 2059.086320][ T290]  do_notify_resume+0x158/0x2b0
[ 2059.091108][ T290]  work_pending+0xc/0x6d4
[ 2059.095358][ T290]

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OE1J
CVE: NA

--------------------------------

Fix the deadlock indicated below:

[ 171.669844] Chain exists of:
[ 171.669844]   &mm->mmap_lock --> sp_group_sem --> &spg->rw_lock
[ 171.669844]
[ 171.671469]  Possible unsafe locking scenario:
[ 171.671469]
[ 171.672121]        CPU0                    CPU1
[ 171.672415]        ----                    ----
[ 171.672706]   lock(&spg->rw_lock);
[ 171.673114]                                lock(sp_group_sem);
[ 171.673706]                                lock(&spg->rw_lock);
[ 171.674208]   lock(&mm->mmap_lock);
[ 171.674863]
[ 171.674863]  *** DEADLOCK ***

Sharepool takes these locks in the order:

  sp_group_sem --> &spg->rw_lock --> mm->mmap_lock

However, in sp_check_mmap_addr(), sp_group_sem was requested while mm->mmap_lock was already held, which is mm->mmap_lock --> sp_group_sem. This causes an ABBA deadlock. It happens in:

[ 171.642687] the existing dependency chain (in reverse order) is:
[ 171.643745]
[ 171.643745] -> #2 (&spg->rw_lock){++++}-{3:3}:
[ 171.644639]        __lock_acquire+0x6f4/0xc40
[ 171.645189]        lock_acquire+0x2f0/0x3c8
[ 171.645631]        down_read+0x64/0x2d8
[ 171.646075]        proc_usage_by_group+0x50/0x258 (spg->rw_lock)
[ 171.646542]        idr_for_each+0x6c/0xf0
[ 171.647011]        proc_group_usage_show+0x140/0x178
[ 171.647629]        seq_read_iter+0xe4/0x498
[ 171.648217]        proc_reg_read_iter+0xa8/0xe0
[ 171.648776]        new_sync_read+0xfc/0x1a0
[ 171.649002]        vfs_read+0x1ac/0x1c8
[ 171.649217]        ksys_read+0x74/0xf8
[ 171.649596]        __arm64_sys_read+0x24/0x30
[ 171.649934]        el0_svc_common.constprop.0+0x8c/0x270
[ 171.650528]        do_el0_svc+0x34/0xb8
[ 171.651069]        el0_svc+0x1c/0x28
[ 171.651278]        el0_sync_handler+0x8c/0xb0
[ 171.651636]        el0_sync+0x168/0x180
[ 171.652118]
[ 171.652118] -> #1 (sp_group_sem){++++}-{3:3}:
[ 171.652692]        __lock_acquire+0x6f4/0xc40
[ 171.653059]        lock_acquire+0x2f0/0x3c8
[ 171.653303]        down_read+0x64/0x2d8
[ 171.653704]        mg_is_sharepool_addr+0x184/0x340 (&sp_group_sem)
[ 171.654085]        sp_check_mmap_addr+0x64/0x108
[ 171.654668]        arch_get_unmapped_area_topdown+0x9c/0x528
[ 171.655370]        thp_get_unmapped_area+0x54/0x68
[ 171.656170]        get_unmapped_area+0x94/0x160
[ 171.656415]        __do_mmap_mm+0xd4/0x540
[ 171.656629]        do_mmap+0x98/0x648
[ 171.656838]        vm_mmap_pgoff+0xc0/0x188
[ 171.657129]        vm_mmap+0x6c/0x98
[ 171.657619]        elf_map+0xe0/0x118
[ 171.657835]        load_elf_binary+0x4ec/0xfd8
[ 171.658103]        bprm_execve.part.9+0x3ec/0x840
[ 171.658448]        bprm_execve+0x7c/0xb0
[ 171.658919]        kernel_execve+0x18c/0x198
[ 171.659500]        run_init_process+0xf0/0x108
[ 171.660073]        try_to_run_init_process+0x20/0x58
[ 171.660558]        kernel_init+0xcc/0x120
[ 171.660862]        ret_from_fork+0x10/0x18
[ 171.661273]
[ 171.661273] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 171.661885]        check_prev_add+0xa4/0xbd8
[ 171.662229]        validate_chain+0xf54/0x14b8
[ 171.662705]        __lock_acquire+0x6f4/0xc40
[ 171.663310]        lock_acquire+0x2f0/0x3c8
[ 171.663658]        down_write+0x60/0x208
[ 171.664179]        mg_sp_alloc+0x24c/0x1150 (mm->mmap_lock)
[ 171.665245]        dev_ioctl+0x1128/0x1fb8 [sharepool_dev]
[ 171.665688]        __arm64_sys_ioctl+0xb0/0xe8
[ 171.666250]        el0_svc_common.constprop.0+0x8c/0x270
[ 171.667255]        do_el0_svc+0x34/0xb8
[ 171.667806]        el0_svc+0x1c/0x28
[ 171.668249]        el0_sync_handler+0x8c/0xb0
[ 171.668661]        el0_sync+0x168/0x180

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
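The locking invariant the fix restores, as a sketch (lock names are taken from the report above; down_read() and mmap_read_lock() are the real 5.10 APIs, the sequence is illustrative):

    /* Correct sharepool order: coarse to fine, mmap_lock last. */
    down_read(&sp_group_sem);      /* 1: group registry        */
    down_read(&spg->rw_lock);      /* 2: per-group state       */
    mmap_read_lock(mm);            /* 3: mm->mmap_lock, always last */

    /* ... critical section ... */

    mmap_read_unlock(mm);
    up_read(&spg->rw_lock);
    up_read(&sp_group_sem);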
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OE1J
CVE: NA

--------------------------------

The mutex protecting spm_dvpp_list has an ABBA deadlock with spg->rw_lock. Adding a process to a sharepool group while running cat /proc/sharepool/spa_stat reproduces the problem. Remove spg->rw_lock here to avoid this.

[ 1101.013480] INFO: task test:3567 blocked for more than 30 seconds.
[ 1101.014378]       Tainted: G OE 5.10.0+ #45
[ 1101.015707] task:test state:D stack: 0 pid: 3567
[ 1101.016464] Call trace:
[ 1101.016736]  __switch_to+0xc0/0x128
[ 1101.017082]  __schedule+0x3fc/0x898
[ 1101.017626]  schedule+0x48/0xd8
[ 1101.017981]  schedule_preempt_disabled+0x14/0x20
[ 1101.018519]  __mutex_lock.isra.1+0x160/0x638
[ 1101.018899]  __mutex_lock_slowpath+0x24/0x30
[ 1101.019291]  mutex_lock+0x5c/0x68
[ 1101.019607]  sp_mapping_create+0x118/0x1b0
[ 1101.019963]  sp_init_group_master_locked.part.9+0x10c/0x288
[ 1101.020356]  mg_sp_group_add_task.part.16+0x7dc/0xcd0
[ 1101.020750]  mg_sp_group_add_task+0x54/0xd0
[ 1101.021120]  dev_ioctl+0x360/0x1e20 [sharepool_dev]
[ 1101.022171]  __arm64_sys_ioctl+0xb0/0xe8
[ 1101.022695]  el0_svc_common.constprop.0+0x88/0x268
[ 1101.023143]  do_el0_svc+0x34/0xb8
[ 1101.023487]  el0_svc+0x1c/0x28
[ 1101.023775]  el0_sync_handler+0x8c/0xb0
[ 1101.024120]  el0_sync+0x168/0x180

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ODCT
CVE: NA

--------------------------------

When there are a large number of groups in the system, or a large number of processes in each group, "cat /proc/sharepool/proc_stat" hits a soft lockup before all the printing finishes, because the callback function loops too many times. Remove one of the loops to reduce the time cost, and add a cond_resched() to avoid the lockup.

root@buildroot:~/install# cat /proc/sharepool/proc_stat
[ 1250.647469] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cat:309]
[ 1250.648610] Modules linked in: sharepool_dev(OE)
[ 1250.650795] CPU: 0 PID: 309 Comm: cat Tainted: G OE 5.10.0+ #43
[ 1250.651216] Hardware name: linux,dummy-virt (DT)
[ 1250.651721] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 1250.652426] pc : get_process_sp_res+0x40/0x90
[ 1250.652747] lr : proc_usage_by_group+0x158/0x218
...
[ 1250.657903] Call trace:
[ 1250.658376]  get_process_sp_res+0x40/0x90
[ 1250.658602]  proc_usage_by_group+0x158/0x218
[ 1250.658838]  idr_for_each+0x6c/0xf0
[ 1250.659027]  proc_group_usage_show+0x104/0x120
[ 1250.659263]  seq_read_iter+0xe0/0x498
[ 1250.659462]  proc_reg_read_iter+0xa8/0xe0
[ 1250.659660]  generic_file_splice_read+0xf0/0x1b0
[ 1250.659865]  do_splice_to+0x7c/0xd0
[ 1250.660029]  splice_direct_to_actor+0xe0/0x2a8
[ 1250.660353]  do_splice_direct+0xa4/0xf8
[ 1250.660902]  do_sendfile+0x1bc/0x420
[ 1250.661079]  __arm64_sys_sendfile64+0x170/0x178
[ 1250.661298]  el0_svc_common.constprop.0+0x88/0x268
[ 1250.661505]  do_el0_svc+0x34/0xb8
[ 1250.661686]  el0_svc+0x1c/0x28
[ 1250.661836]  el0_sync_handler+0x8c/0xb0
[ 1250.662033]  el0_sync+0x168/0x180

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
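The mitigation pattern in miniature; cond_resched() is the real API, while the list and member names stand in for the actual proc_stat traversal:

    list_for_each_entry(spg_node, &spg->proc_head, proc_node) {
        /* accumulate and print one process's share pool usage ... */
        cond_resched();   /* yield so long traversals can't soft-lock a CPU */
    }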
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5O5RQ
CVE: NA

--------------------------------

Notice that in sp_unshare_uva(), comparing current->tgid against spa->applier is sufficient for the authentication check; there is no need to also check current->mm against spa->mm.

Other redundant cases:
- find_spg_node_by_spg() never returns NULL in its current use context;
- spg_info_show() never comes across a group with id 0.

Therefore, delete these redundant paths.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XQS4
CVE: NA

-----------------------------------------------

In get_process_sp_res(), spg_node can be freed by another process, and the access to spg_node->spg can then cause a kernel panic. Add a read-lock/unlock pair to fix this problem. Fix the same problem in proc_sp_group_state().

Fixes: 3d37f8717287 ("[Huawei] mm: sharepool: use built-in-statistics")
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
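The shape of the fix, sketched; the specific lock, list, and field names below are illustrative stand-ins for whichever read lock the patch actually takes around the traversal:

    down_read(&sp_group_sem);               /* illustrative group read lock */
    list_for_each_entry(spg_node, &master->node_list, group_node) {
        /* spg_node->spg cannot be freed while the read lock is held */
        total += spg_node->spg->proc_num;   /* illustrative access */
    }
    up_read(&sp_group_sem);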
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5MS48
CVE: NA

--------------------------------

Fix two bugs revealed by static checking:
- Release mm->mmap_lock when mm->sp_group_master has not been initialized.
- Do not add the mm to the master list if adding the process to the group failed.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5M3PS
CVE: NA

--------------------------------

- Fix an incorrect SP_RES value.
- Fix an incorrect SP_RES_T value.
- Fix an uninitialized pid field in the pass-through scenario.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY2R
CVE: NA

-------------------------------------------

Remove the meaningless comment in mg_sp_free() and fix the bug in the mg_sp_group_id_by_pid() parameter-check path.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY51
CVE: NA

----------------------------------------------

The variable enable_mdc_default_group has been deprecated, so remove it and the corresponding code. The definition of is_process_in_group() can be ambiguous, so change its return value type.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY4H
CVE: NA

-----------------------------------------

Remove sp_device_number; there is no need to detect the device number at runtime. Use the macro 'MAX_DEVID' in its place.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Zhang Zekun

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LY5K
CVE: NA

-----------------------------------

Remove the unused sp_dev_va_start and sp_dev_va_size along with the related code. Add a dvpp_addr check in mg_is_sharepool_addr() for the current process.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
-
Submitted by Wang Wensheng

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5LHGZ
CVE: NA

--------------------------------

Delete unused sysctl interfaces in the sharepool feature.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KSDH
CVE: NA

--------------------------------

Fix redundant /proc/sharepool/spa_stat prints when multiple groups are attached to the same sp_mapping: traverse all dvpp mappings rather than all groups.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5K3MH
CVE: NA

--------------------------------

After the refactoring, cat /proc/pid_xx/sp_group causes a kernel panic. Fix this error.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
Submitted by Guo Mengqi

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KC7C
CVE: NA

--------------------------------

Most interfaces starting with "sp_" are deprecated; remove them.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
-
- 03 Nov 2022, 3 commits
-
Submitted by Liu Shixin

maillist inclusion
category: bugfix
bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5NX1S
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220823&id=f5606044e659f8fa754fa692e2fa5aea1ec7f2f6

--------------------------------

The vmemmap pages are marked by kmemleak when allocated from memblock. Remove them from kmemleak when freeing the pages; otherwise, when a page is reused, kmemleak may report an error such as the following and then stop working:

kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff98fb6be00000 (size 335544320):
kmemleak:   comm "swapper", pid 0, jiffies 4294892296
kmemleak:   min_count = 0
kmemleak:   count = 0
kmemleak:   flags = 0x1
kmemleak:   checksum = 0
kmemleak:   backtrace:

Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com
Fixes: f41f2ed4 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
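The essence of the fix, sketched: inform kmemleak before a memblock-allocated vmemmap page is handed back to the page allocator. kmemleak_free_part() is the real kmemleak API; the exact call site only approximates the referenced commit:

    #include <linux/kmemleak.h>

    /* before freeing a memblock-backed vmemmap page: */
    kmemleak_free_part(page_to_virt(page), PAGE_SIZE);
    free_reserved_page(page);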
-
Submitted by Baolin Wang

mainline inclusion
from mainline-v6.1-rc1
commit fac35ba7
category: bugfix
bugzilla: 187864, https://gitee.com/src-openeuler/kernel/issues/I5X1Z9
CVE: CVE-2022-3623
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=fac35ba763ed07ba93154c95ffc0c4a55023707f

--------------------------------

Some architectures (like ARM64) can support CONT-PTE/PMD size hugetlb, meaning they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size (64K and 32M) if a 4K page size is specified.

So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However, this pte entry lock is incorrect for CONT-PTE size hugetlb: we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock.

That means the pte entry of the CONT-PTE size hugetlb under the current pte lock is unstable in follow_page_pte(): we can continue to migrate or poison the pte entry of the CONT-PTE size hugetlb, which can cause potential race issues, even though they are under the 'pte lock'. For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page via the move_pages() syscall under the lock, while another thread B migrates the CONT-PTE hugetlb page at the same time. Thread A then gets an incorrect page, and if thread A also wants to do page migration, a data inconsistency error occurs.

Moreover, we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd().

To fix the above issues, rename follow_huge_pmd() to follow_huge_pmd_pte() to handle both PMD and PTE level size hugetlb, using huge_pte_lock() to get the correct pte entry lock and make the pte entry stable.

Mike said:

Support for CONT_PMD/_PTE was added with bb9dd3df ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages"). I would go with 5480280d for the Fixes: target.

Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/hugetlb.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
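The locking change at the heart of the fix, sketched: serialize CONT-PTE/PMD hugetlb entries through huge_pte_lock() rather than pte_offset_map_lock(). huge_pte_lock() and huge_ptep_get() are the real hugetlb APIs; the surrounding context is abridged and illustrative:

    spinlock_t *ptl;
    pte_t entry;

    /* takes mm->page_table_lock for CONT-PTE/PMD size hugetlb */
    ptl = huge_pte_lock(hstate_vma(vma), mm, ptep);
    entry = huge_ptep_get(ptep);
    if (pte_present(entry))
        page = pte_page(entry);     /* entry is stable under ptl */
    spin_unlock(ptl);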
-
Submitted by Alistair Popple

mainline inclusion
from mainline-v6.1-rc1
commit 16ce101d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5VZ0L
CVE: CVE-2022-3523
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16ce101db85db694a91380aa4c89b25530871d33

--------------------------------

Patch series "Fix several device private page reference counting issues", v2

This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed, or from accessing fields within the struct page which are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems. However, without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory.

This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those, so any help there would be appreciated. The changes mimic what is done for both Nouveau and hmm-tests though, so I doubt they will cause problems.

This patch (of 8):

When the CPU tries to access a device private page, the migrate_to_ram() callback associated with the pgmap for the page is called. However, no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field, because the page may have been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately, the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure, pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see whether it is expected or not.

[mpe@ellerman.id.au: fix build]
Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	arch/powerpc/kvm/book3s_hv_uvmem.c
	include/linux/migrate.h
	lib/test_hmm.c
	mm/migrate.c

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
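The race closure in miniature: pin the page before dropping the page-table lock, so a concurrent migration cannot free the page or its pgmap. get_page()/put_page() are the real APIs; the fault-path context is a sketch:

    spin_lock(ptl);
    /* ... revalidate that the entry still points at this
     *     device-private page ... */
    get_page(page);             /* hold a reference across the callback */
    spin_unlock(ptl);

    ret = page->pgmap->ops->migrate_to_ram(vmf);
    put_page(page);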
-
- 02 Nov 2022, 1 commit
-
Submitted by Gowans, James

stable inclusion
from stable-v5.10.132
commit 931dbcc2e02f0409c095b11e35490cade9ac14af
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=931dbcc2e02f0409c095b11e35490cade9ac14af

--------------------------------

commit 14c99d65 upstream.

Currently the implementation will split the PUD when a fallback is taken inside the create_huge_pud function. This isn't where it should be done: the splitting should be done in wp_huge_pud, just like it's done for PMDs. The reason is that if a fallback is taken during create, there is no PUD yet, so there is nothing to split, whereas if a fallback is taken when encountering a write-protection fault, there is something to split.

It looks like this was the original intention in the commit where the splitting was introduced, but somehow it got moved to the wrong place between v1 and v2 of the patch series. A rebase mistake, perhaps.

Link: https://lkml.kernel.org/r/6f48d622eb8bce1ae5dd75327b0b73894a2ec407.camel@amazon.com
Fixes: 327e9fd4 ("mm: Split huge pages on write-notify or COW")
Signed-off-by: James Gowans <jgowans@amazon.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 19 Oct 2022, 6 commits
-
Submitted by Jann Horn

stable inclusion
from stable-v5.10.141
commit 98f401d36396134c0c86e9e3bd00b6b6b028b521
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5USOP
CVE: CVE-2022-42703
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=98f401d36396134c0c86e9e3bd00b6b6b028b521

--------------------------------

commit 2555283e upstream.

anon_vma->degree tracks the combined number of child anon_vmas and VMAs that use the anon_vma as their ->anon_vma.

anon_vma_clone() then assumes that for any anon_vma attached to src->anon_vma_chain other than src->anon_vma, it is impossible for it to be a leaf node of the VMA tree, meaning that for such VMAs ->degree is elevated by 1 because of a child anon_vma, and that if ->degree equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.

This assumption is wrong because the ->degree optimization leads to leaf nodes being abandoned on anon_vma_clone(): an existing anon_vma is reused and no new parent-child relationship is created. So it is possible to reuse an anon_vma for one VMA while it is still tied to another VMA.

This is an issue because is_mergeable_anon_vma() and its callers assume that if two VMAs have the same ->anon_vma, the list of anon_vmas attached to the VMAs is guaranteed to be the same. When this assumption is violated, vma_merge() can merge pages into a VMA that is not attached to the corresponding anon_vma, leading to dangling page->mapping pointers that will be dereferenced during rmap walks.

Fix it by separately tracking the number of child anon_vmas and the number of VMAs using the anon_vma as their ->anon_vma.

Fixes: 7a3ef208 ("mm: prevent endless growth of anon_vma hierarchy")
Cc: stable@kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
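The fix replaces the overloaded counter with two explicit ones. Sketched against the struct described above (the field names follow the upstream fix, but treat this as illustrative rather than the verbatim patch):

    struct anon_vma {
        /* ... */
        unsigned long num_children;     /* child anon_vmas */
        unsigned long num_active_vmas;  /* VMAs whose ->anon_vma points here */
    };

    /* An anon_vma may be reused for a new VMA only when it has no
     * active users and at most one child: */
    if (anon_vma->num_children < 2 && anon_vma->num_active_vmas == 0)
        /* safe to adopt as dst->anon_vma */;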
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v6.1-rc1
commit ca77f290
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca77f290cff1dfa095d71ae16cc7cda8ee6df495

--------------------------------

Patch series "kasan: switch tag-based modes to stack ring from per-object metadata", v3.

This series makes the tag-based KASAN modes use a ring buffer for storing stack depot handles for alloc/free stack traces for slab objects instead of per-object metadata. This ring buffer is referred to as the stack ring.

On each alloc/free of a slab object, the tagged address of the object and the current stack trace are recorded in the stack ring.

On each bug report, if the accessed address belongs to a slab object, the stack ring is scanned for matching entries. The newest entries are used to print the alloc/free stack traces in the report: one entry for alloc and one for free.

The advantages of this approach over storing stack trace handles in per-object metadata with the tag-based KASAN modes:

- Allows to find relevant stack traces for use-after-free bugs without using quarantine for freed memory. (Currently, if the object was reallocated multiple times, the report contains the latest alloc/free stack traces, not necessarily the ones relevant to the buggy allocation.)
- Allows to better identify and mark use-after-free bugs, effectively making the CONFIG_KASAN_TAGS_IDENTIFY functionality always-on.
- Has fixed memory overhead.

The disadvantage:

- If the affected object was allocated/freed long before the bug happened and the stack trace events were purged from the stack ring, the report will have no stack traces.

Discussion
==========

The proposed implementation of the stack ring uses a single ring buffer for the whole kernel. This might lead to contention due to atomic accesses to the ring buffer index on multicore systems.

At this point, it is unknown whether the performance impact from this contention would be significant compared to the slowdown introduced by collecting stack traces due to the planned changes to the latter part; see the section below.

For now, the proposed implementation is deemed to be good enough, but this might need to be revisited once the stack collection becomes faster.

A considered alternative is to keep a separate ring buffer for each CPU and then iterate over all of them when printing a bug report. This approach requires somehow figuring out which of the stack rings has the freshest stack traces for an object if multiple stack rings have them.

Further plans
=============

This series is a part of an effort to make KASAN stack trace collection suitable for production. This requires stack trace collection to be fast and memory-bounded.

The planned steps are:

1. Speed up stack trace collection (potentially, by using SCS; patches on hold until steps #2 and #3 are completed).
2. Keep stack trace handles in the stack ring (this series).
3. Add a memory-bounded mode to stack depot or provide an alternative memory-bounded stack storage.
4. Potentially, implement stack trace collection sampling to minimize the performance impact.

This patch (of 34):

__kasan_metadata_size() calculates the size of the redzone for objects in a slab cache. When accounting for the presence of kasan_free_meta in the redzone, this function only compares free_meta_offset with 0. But free_meta_offset could also be equal to KASAN_NO_FREE_META, which indicates that kasan_free_meta is not present at all. Add a comparison with KASAN_NO_FREE_META into __kasan_metadata_size().

Link: https://lkml.kernel.org/r/cover.1662411799.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
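An abridged sketch of the corrected size accounting; KASAN_NO_FREE_META and the kasan_info offsets are identifiers from this series, and the function body is simplified relative to the real one:

    size_t __kasan_metadata_size(struct kmem_cache *cache)
    {
        return (cache->kasan_info.alloc_meta_offset ?
                sizeof(struct kasan_alloc_meta) : 0) +
               ((cache->kasan_info.free_meta_offset &&
                 cache->kasan_info.free_meta_offset != KASAN_NO_FREE_META) ?
                sizeof(struct kasan_free_meta) : 0);
    }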
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit 97593cad
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=97593cad003c668e2532cb2939a24a031f8de52d

--------------------------------

KASAN marks caches that are sanitized with the SLAB_KASAN cache flag. Currently, if the metadata that is appended after the object (which stores e.g. stack trace ids) doesn't fit into KMALLOC_MAX_SIZE (this can only happen with SLAB; see the comment in the patch), KASAN turns off sanitization completely.

With this change, sanitization of the object data is always enabled. However, the metadata is only stored when it fits. Instead of checking for the SLAB_KASAN flag across the code to find out whether the metadata is there, use cache->kasan_info.alloc/free_meta_offset. As 0 can be a valid value for free_meta_offset, introduce KASAN_NO_FREE_META as an indicator that the free metadata is missing.

Without this change, all sanitized KASAN objects would be put into quarantine with generic KASAN. With this change, only the objects that have metadata (i.e. when it fits) are put into quarantine; the rest are freed right away.

Along the way, rework __kasan_cache_create() and add clarifying comments.

Link: https://lkml.kernel.org/r/aee34b87a5e4afe586c2ac6a0b32db8dc4dcc2dc.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Icd947e2bea054cb5cfbdc6cf6652227d97032dcb
Co-developed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/kasan/common.c
	mm/kasan/hw_tags.c
	mm/kasan/kasan.h
	mm/kasan/quarantine.c
	mm/kasan/report.c
	mm/kasan/report_sw_tags.c
	mm/kasan/sw_tags.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit 8bb0009b
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8bb0009b19465da5a0cd394b5a6ccc2eaf418f23

--------------------------------

Add a set_alloc_info() helper and move kasan_set_track() into it. This will simplify the code for one of the upcoming changes.

No functional changes.

Link: https://lkml.kernel.org/r/b2393e8f1e311a70fc3aaa2196461b6acdee7d21.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I0316193cbb4ecc9b87b7c2eee0dd79f8ec908c1a
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit 6476792f
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6476792f1015a356e6864076c210b328b64d08cc

--------------------------------

Rename get_alloc_info() and get_free_info() to kasan_get_alloc_meta() and kasan_get_free_meta() to better reflect what they do and to avoid confusion with kasan_set_free_info().

No functional changes.

Link: https://lkml.kernel.org/r/27b7c036b754af15a2839e945f6d8bfce32b4c2f.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ib6e4ba61c8b12112b403d3479a9799ac8fff8de1
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/kasan/generic.c
	mm/kasan/quarantine.c
	mm/kasan/report.c
	mm/kasan/report_sw_tags.c
	mm/kasan/sw_tags.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Andrey Konovalov

mainline inclusion
from mainline-v5.11-rc1
commit c696de9f
category: bugfix
bugzilla: 187796, https://gitee.com/openeuler/kernel/issues/I5W6YV
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c696de9f12b7ddeddc05d378fc4dc0f66e9a8c95

--------------------------------

Patch series "kasan: boot parameters for hardware tag-based mode", v4.

=== Overview

Hardware tag-based KASAN mode [1] is intended to eventually be used in production as a security mitigation. Therefore there's a need for finer control over KASAN features and for the existence of a kill switch.

This patchset adds a few boot parameters for hardware tag-based KASAN that allow to disable or otherwise control particular KASAN features, and also provides some initial optimizations for running KASAN in production.

There's another planned patchset that will further optimize hardware tag-based KASAN, provide proper benchmarking and tests, and fully enable tag-based KASAN for production use.

Hardware tag-based KASAN relies on the arm64 Memory Tagging Extension (MTE) [2] to perform memory and pointer tagging. Please see [3] and [4] for detailed analysis of how MTE helps to fight memory safety problems.

The features that can be controlled are:

1. Whether KASAN is enabled at all.
2. Whether KASAN collects and saves alloc/free stacks.
3. Whether KASAN panics on a detected bug or not.

The patch titled "kasan: add and integrate kasan boot parameters" of this series adds a few new boot parameters.

kasan.mode allows to choose one of three main modes:

- kasan.mode=off - KASAN is disabled, no tag checks are performed
- kasan.mode=prod - only essential production features are enabled
- kasan.mode=full - all KASAN features are enabled

The chosen mode provides default control values for the features mentioned above. However, it's also possible to override the default values by providing:

- kasan.stacktrace=off/on - enable stack collection (default: on for mode=full, otherwise off)
- kasan.fault=report/panic - only report tag faults or also panic (default: report)

If the kasan.mode parameter is not provided, it defaults to full when CONFIG_DEBUG_KERNEL is enabled, and to prod otherwise.

It is essential that switching between these modes doesn't require rebuilding the kernel with different configs, as this is required by the Android GKI (Generic Kernel Image) initiative [5].

=== Benchmarks

For now I've only performed a few simple benchmarks, such as measuring kernel boot time and slab memory usage after boot. There's an upcoming patchset which will optimize KASAN further and include more detailed benchmarking results.

The benchmarks were performed in QEMU, and the results below exclude the slowdown caused by QEMU memory tagging emulation (as it's different from the slowdown that will be introduced by hardware and is therefore irrelevant).

KASAN_HW_TAGS=y + kasan.mode=off introduces no performance or memory impact compared to KASAN_HW_TAGS=n.

kasan.mode=prod (manually excluding tagging) introduces 3% of performance and no memory impact (except memory used by hardware to store tags) compared to kasan.mode=off.

kasan.mode=full has about 40% performance and 30% memory impact over kasan.mode=prod. Both come from alloc/free stack collection.

=== Notes

This patchset is available here: https://github.com/xairy/linux/tree/up-boot-mte-v4

This patchset is based on v11 of the "kasan: add hardware tag-based mode for arm64" patchset [1].

For testing in QEMU, hardware tag-based KASAN requires:

1. QEMU built from master [6] (use "-machine virt,mte=on -cpu max" arguments to run).
2. GCC version 10.

[1] https://lore.kernel.org/linux-arm-kernel/cover.1606161801.git.andreyknvl@google.com/T/#t
[2] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[3] https://arxiv.org/pdf/1802.09517.pdf
[4] https://github.com/microsoft/MSRC-Security-Research/blob/master/papers/2020/Security%20analysis%20of%20memory%20tagging.pdf
[5] https://source.android.com/devices/architecture/kernel/generic-kernel-image
[6] https://github.com/qemu/qemu

=== Tags

Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>

This patch (of 19):

Move the get_free_info() call into quarantine_put() to simplify the call site.

No functional changes.

Link: https://lkml.kernel.org/r/cover.1606162397.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/312d0a3ef92cc6dc4fa5452cbc1714f9393ca239.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Iab0f04e7ebf8d83247024b7190c67c3c34c7940f
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
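For reference, the parameters described above are passed on the kernel command line. Some example combinations, using only the names and values from the text above:

    kasan.mode=prod
    kasan.mode=full kasan.fault=panic
    kasan.mode=prod kasan.stacktrace=on kasan.fault=report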
-
- 29 Sep 2022, 3 commits
-
Submitted by Jann Horn

stable inclusion
from stable-v5.10.142
commit 895428ee124ad70b9763259308354877b725c31d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PE9S
CVE: CVE-2022-39188
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=895428ee124ad70b9763259308354877b725c31d

--------------------------------

commit b67fbebd upstream.

Some drivers rely on having all VMAs through which a PFN might be accessible listed in the rmap for correctness. However, on X86 it was possible for a VMA with stale TLB entries to not be listed in the rmap.

This was fixed in mainline with commit b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"), but that commit relies on preceding refactoring in commit 18ba064e ("mmu_gather: Let there be one tlb_{start,end}_vma() implementation") and commit 1e9fdf21 ("mmu_gather: Remove per arch tlb_{start,end}_vma()").

This patch provides equivalent protection without needing that refactoring, by forcing a TLB flush between removing PTEs in unmap_vmas() and the call to unlink_file_vma() in free_pgtables().

[This is a stable-specific rewrite of the upstream commit!]

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ze zuo <zuoze1@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
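The ordering the stable-specific fix enforces, sketched; unmap_vmas(), tlb_flush_mmu() and free_pgtables() are real mm internals, with the exact arguments elided:

    unmap_vmas(&tlb, vma, start, end);
    /* Stable-only fix: make sure stale TLB entries for VM_PFNMAP
     * VMAs are gone before the VMAs are unlinked from the rmap. */
    tlb_flush_mmu(&tlb);
    free_pgtables(&tlb, vma, floor, ceiling);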
-
Submitted by Mike Kravetz

stable inclusion
from stable-v5.10.121
commit 63758dd9595f87c7e7b5f826fd2dcf53d6aff0cf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=63758dd9595f87c7e7b5f826fd2dcf53d6aff0cf

--------------------------------

commit 48381273 upstream.

The routine huge_pmd_unshare() is passed a pointer to an address associated with an area which may be unshared. If the unshare is successful, this address is updated to 'optimize' callers iterating over huge page addresses. For the optimization to work correctly, the address should be updated to the last huge page in the unmapped/unshared area. However, in the common case where the passed address is PUD_SIZE aligned, the address is incorrectly updated to the address of the preceding huge page. That wastes CPU cycles, as the unmapped/unshared range is scanned twice.

Link: https://lkml.kernel.org/r/20220524205003.126184-1-mike.kravetz@oracle.com
Fixes: 39dde65c ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Rei Yamamoto

stable inclusion
from stable-v5.10.121
commit 7994d890123a6cad033f2842ff0177a9bda1cb23
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7994d890123a6cad033f2842ff0177a9bda1cb23

--------------------------------

commit bbe832b9 upstream.

At present, pages not in the target zone are added to the cc->migratepages list in isolate_migratepages_block(). As a result, pages may migrate between nodes unintentionally.

This would be a serious problem for older kernels without commit a984226f ("mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec"), because it can corrupt the lru list by handling pages in the list without holding the proper lru_lock.

Avoid returning a pfn outside the target zone in the case that it is not aligned with a pageblock boundary. Otherwise isolate_migratepages_block() will handle pages not in the target zone.

Link: https://lkml.kernel.org/r/20220511044300.4069-1-yamamoto.rei@jp.fujitsu.com
Fixes: 70b44595 ("mm, compaction: use free lists to quickly locate a migration source")
Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 20 Sep 2022, 2 commits
-
Submitted by liubo

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5DC4A
CVE: NA

-------------------------------------------------

etmem is the memory vertical expansion technology. By default, the existing etmem memory expansion tool swaps out all of a process's pages that can be swapped out, unless a page is marked with a lock flag.

This patch adds the ability to swap out only specified pages: the process marks the regions to be swapped out with the VM_SWAPFLAG flag, and etmem adds a filter to its scanning module so that only those pages are swapped out.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
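A sketch of what such a filter looks like in a page-walk based scanner; VM_SWAPFLAG is the flag named above, while the function name and walk context are illustrative:

    /* Skip VMAs the process did not opt in for swap-out. */
    static int etmem_test_walk(unsigned long start, unsigned long end,
                               struct mm_walk *walk)
    {
        struct vm_area_struct *vma = walk->vma;

        if (!(vma->vm_flags & VM_SWAPFLAG))
            return 1;   /* positive return from test_walk: skip this VMA */
        return 0;       /* walk it and consider its pages for swap-out */
    }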
-
Submitted by liubo

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5DC4A
CVE: NA

-------------------------------------------------

etmem is the memory vertical expansion technology. In the current etmem flow, memory page swapping is implemented by invoking shrink_page_list. When this interface is invoked the first time, pages are added to the swap cache and written to disk. A swap-cache page is reclaimed only when this interface is invoked a second time and no process has accessed the page in the meantime. However, in the etmem flow, user mode rescans only pages that have been accessed, and no migration is issued for pages that no process accesses. As a result, the swap cache may stay occupied indefinitely.

To solve this problem, add logic for actively reclaiming the swap cache: when the swap cache occupies a large amount of memory, the system proactively scans the LRU linked list and reclaims swap-cache pages to keep usage within a specified range.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-