- 08 Feb 2021, 40 commits
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

Make the functions reserve_crashkernel[_low]() generic. Arm64 will
use these to reimplement crashkernel=X.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

We will make the function reserve_crashkernel() generic. The
xen_pv_domain() check in reserve_crashkernel() is relevant only to
x86, as is insert_resource() in reserve_crashkernel[_low](). So move
the xen_pv_domain() check and insert_resource() to setup_arch() to
keep them in x86.

Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

To make the function reserve_crashkernel() generic, replace some
hard-coded numbers with the macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

The lower bounds of crash kernel reservation and crash kernel low
reservation are different; use the consistent value CRASH_ALIGN.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

Move CRASH_ALIGN to the header asm/kexec.h for later use. Besides,
the alignment of crash kernel regions on x86 is 16M (CRASH_ALIGN),
but the function reserve_crashkernel() also used a 1M alignment. So
just replace the hard-coded 1M alignment with the macro CRASH_ALIGN.

Suggested-by: Dave Young <dyoung@redhat.com>
Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
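For reference, rounding a base address up to the 16M CRASH_ALIGN
boundary works as in this small user-space sketch (ALIGN_UP and the
sample value are illustrative, not the kernel's macros):

    #include <stdio.h>
    #include <stdint.h>

    #define SZ_1M        0x00100000UL
    #define CRASH_ALIGN  (16 * SZ_1M)   /* 16M, as the commit states */

    /* round x up to the next multiple of the power-of-two a */
    #define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

    int main(void)
    {
            uint64_t base = 0x01234567;

            printf("0x%llx aligns up to 0x%llx\n",
                   (unsigned long long)base,
                   (unsigned long long)ALIGN_UP(base, CRASH_ALIGN));
            return 0;
    }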
-
Committed by Liu Hua

hulk inclusion
category: bugfix
bugzilla: 47259
CVE: N/A

----------------------------------------

With CONFIG_ARM_LPAE=y, memory in 32-bit ARM systems can exceed 4G.
So if we use kdump in such systems, the capture kernel should parse
the 64-bit ELF header (parse_crash_elf64_headers), but this cannot
proceed because ARM Linux does not supply the related check function.
This patch adds the check functions related to the elf64 header.

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Signed-off-by: Yufen Wang <wangyufen@huawei.com>
Reviewed-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Conflicts:
	arch/arm/include/asm/elf.h

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yufen Wang

hulk inclusion
category: bugfix
bugzilla: 47258
CVE: N/A

-------------------------------------------------

Kexec boots a capture kernel when the kernel panics. But the boot
fails if the kernel panics in the handler function of a PPI. The
reason is that the PPI has not been EOIed, so other interrupts cannot
be handled when booting the capture kernel. This patch fixes this
bug.

Signed-off-by: Yufen Wang <wangyufen@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Conflicts:
	arch/arm/kernel/machine_kexec.c

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by liujian

hulk inclusion
category: bugfix
bugzilla: 47459
CVE: NA

---------------------------

syzkaller triggered a UBSAN warning:

[  196.188950] UBSAN: Undefined behaviour in drivers/input/input.c:62:23
[  196.188958] signed integer overflow:
[  196.188964] -2147483647 - 104 cannot be represented in type 'int [2]'
[  196.188973] CPU: 7 PID: 4763 Comm: syz-executor Not tainted 4.19.0-514.55.6.9.x86_64+ #7
[  196.188977] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  196.188979] Call Trace:
[  196.189001]  dump_stack+0x91/0xeb
[  196.189014]  ubsan_epilogue+0x9/0x7c
[  196.189020]  handle_overflow+0x1d7/0x22c
[  196.189028]  ? __ubsan_handle_negate_overflow+0x18f/0x18f
[  196.189038]  ? __mutex_lock+0x213/0x13f0
[  196.189053]  ? drop_futex_key_refs+0xa0/0xa0
[  196.189070]  ? __might_fault+0xef/0x1b0
[  196.189096]  input_handle_event+0xe1b/0x1290
[  196.189108]  input_inject_event+0x1d7/0x27e
[  196.189119]  evdev_write+0x2cf/0x3f0
[  196.189129]  ? evdev_pass_values+0xd40/0xd40
[  196.189157]  ? mark_held_locks+0x160/0x160
[  196.189171]  ? __vfs_write+0xe0/0x6c0
[  196.189175]  ? evdev_pass_values+0xd40/0xd40
[  196.189179]  __vfs_write+0xe0/0x6c0
[  196.189186]  ? kernel_read+0x130/0x130
[  196.189204]  ? _cond_resched+0x15/0x30
[  196.189214]  ? __inode_security_revalidate+0xb8/0xe0
[  196.189222]  ? selinux_file_permission+0x354/0x430
[  196.189233]  vfs_write+0x160/0x440
[  196.189242]  ksys_write+0xc1/0x190
[  196.189248]  ? __ia32_sys_read+0xb0/0xb0
[  196.189259]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  196.189267]  ? do_syscall_64+0x22/0x4a0
[  196.189276]  do_syscall_64+0xa5/0x4a0
[  196.189287]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  196.189293] RIP: 0033:0x44e7c9
[  196.189299] Code: fc ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00

The syzkaller reproducer script (it cannot reproduce the issue every
time):

r0 = syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
write$binfmt_elf64(r0, &(0x7f0000000240)={{0x7f, 0x45, 0x4c, 0x46, 0x40, 0x2, 0x2, 0xffffffff, 0xffffffffffff374c, 0x3, 0x0, 0x80000001, 0x103, 0x40, 0x22e, 0x26, 0x1, 0x38, 0x2, 0xa23, 0x1, 0x2}, [{0x6474e557, 0x5, 0x6, 0x2, 0x9, 0x9, 0x6c3, 0x1ff}], "", [[], [], [], []]}, 0x478)
ioctl$EVIOCGSW(0xffffffffffffffff, 0x8040451b, &(0x7f0000000040)=""/7)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
r1 = syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
openat$smack_task_current(0xffffffffffffff9c, &(0x7f0000000040)='/proc/self/attr/current\x00', 0x2, 0x0)
ioctl$EVIOCSABS0(r1, 0x401845c0, &(0x7f0000000000)={0x4, 0x10000, 0x4, 0xd1, 0x81, 0x3})
eventfd(0x1ff)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x200)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)

Typecast int to long to fix the issue.

Signed-off-by: liujian <liujian56@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
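For illustration, a minimal user-space sketch of this overflow class
and the widening fix (names and logic are illustrative, not the
driver's exact code; the cast only helps where long is wider than
int, i.e. on 64-bit targets):

    #include <limits.h>
    #include <stdio.h>

    /* value - old_val can overflow int when the operands are far apart,
     * e.g. (INT_MIN + 1) - 104, which is undefined behaviour */
    static long diff_bad(int value, int old_val)
    {
            return value - old_val;
    }

    static long diff_fixed(int value, int old_val)
    {
            return (long)value - old_val;   /* widen before subtracting */
    }

    int main(void)
    {
            (void)diff_bad;   /* shown for contrast; do not feed it extremes */
            printf("%ld\n", diff_fixed(INT_MIN + 1, 104));
            return 0;
    }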
-
Committed by Yang Yingliang

hulk inclusion
category: feature
bugzilla: 47984
CVE: NA

--------------------------------------------------

If a cntvct workaround is enabled, read CNTVCT_EL0 twice in the VDSO
to avoid the clock bug.

Without this patch on Kunpeng916:

  ./gettimeofday -E -C 200 -L -S -W -N "gettimeofday"
  Running: gettimeofday# ./gettimeofday -E -C 200 -L -S -W -N gettimeofday
               prc thr usecs/call samples errors cnt/samp
  gettimeofday   1   1    0.31753     198      0    20000

With this patch on Kunpeng916:

  ./gettimeofday -E -C 200 -L -S -W -N "gettimeofday"
  Running: gettimeofday# ./gettimeofday -E -C 200 -L -S -W -N gettimeofday
               prc thr usecs/call samples errors cnt/samp
  gettimeofday   1   1    0.05244     198      0    20000

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yang Yingliang

hulk inclusion
category: feature
bugzilla: 47984
CVE: NA

--------------------------------------------------

Reading CNTVCT_EL0 costs a lot of time when a cntvct workaround and
the CNTVCT_EL0 trap are enabled. To decrease the read time, disable
the CNTVCT_EL0 trap and introduce vdso_fix and vdso_shift for doing
the cntvct workaround in the VDSO.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Lu Jialin

hulk inclusion
category: bugfix
bugzilla: 47951
CVE: NA

--------------------------

When echoing a Z (zombie) process into tasks, it should return -ESRCH
instead of 0.

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit 10fce53c
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10fce53c0ef8f6e79115c3d9e0d7ea1338c3fa37

-------------------------------------------------

The early ATAGS/DT mapping code uses SECTION_SHIFT to mask low order
bits of R2, and decides that no ATAGS/DTB were provided if the
resulting value is 0x0. This means that on systems where DRAM starts
at 0x0 (such as Raspberry Pi), no explicit mapping of the DT will be
created if R2 points into the first 1 MB section of memory.

This was not a problem before, because the decompressed kernel is
loaded at the base of DRAM and mapped using sections as well, and so
as long as the DT is referenced via a virtual address that uses the
same translation (the linear map, in this case), things work fine.

However, commit 7a1be318 ("9012/1: move device tree mapping out of
linear region") changes this, and now the DT is referenced via a
virtual address that is disjoint from the linear mapping of DRAM, and
so we need the early code to create the DT mapping unconditionally.

So let's create the early DT mapping for any value of R2 != 0x0.

Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 10fce53c)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit 4d576cab
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d576cab16f57e1f87978f6997a725179398341e

-------------------------------------------------

KASAN uses the routines in stacktrace.c to capture the call stack
each time memory gets allocated or freed. Some of these routines are
also used to log CPU and memory context when exceptions are taken,
and so in some cases, memory accesses may be made that are not
strictly in line with the KASAN constraints, and may therefore
trigger false KASAN positives.

So follow the example set by other architectures, and simply disable
KASAN instrumentation for these routines.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 4d576cab)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Fangrui Song

mainline inclusion
from mainline-5.11-rc1
commit 735e8d93
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=735e8d93dc2b107f7891a9c2b1c4cfbea1fcbbbc

-------------------------------------------------

Commit d6d51a96 ("ARM: 9014/2: Replace string mem* functions for
KASan") added .weak directives to memcpy/memmove/memset to avoid
collision with KASAN interceptors. This does not work with LLVM's
integrated assembler (the assembly snippet `.weak memcpy ...
.globl memcpy` produces a STB_GLOBAL memcpy, while GNU as produces a
STB_WEAK memcpy). LLVM 12 (since https://reviews.llvm.org/D90108)
will error on such an overridden symbol binding.

Use the appropriate WEAK macro instead.

Link: https://github.com/ClangBuiltLinux/linux/issues/1190
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit fc2933c1
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fc2933c133744305236793025b00c2f7d258b687

-------------------------------------------------

Commit 149a3ffe62b9dbc3 ("9012/1: move device tree mapping out of
linear region") created a permanent, read-only section mapping of the
device tree blob provided by the firmware, and added a set of macros
to get the base and size of the virtually mapped FDT based on the
physical address. However, while the mapping code uses the
SECTION_SIZE macro correctly, the macros use PMD_SIZE instead, which
means something entirely different on ARM when using short
descriptors, and is therefore not the right quantity to use here.

So replace PMD_SIZE with SECTION_SIZE. While at it, change the names
of the macro and its parameter to clarify that it returns the virtual
address of the start of the FDT, based on the physical address in
memory.

Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit fc2933c1)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit 42101571
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=421015713b306e47af95d4d61cdfbd96d462e4cb

-------------------------------------------------

This patch enables the kernel address sanitizer for ARM. XIP_KERNEL
has not been tested and is therefore not allowed for now.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 42101571)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit 5615f69b
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5615f69bc2097452ecc954f5264d784e158d6801

-------------------------------------------------

This patch initializes the KASan shadow region's page table and
memory. There are two stages to KASan initialization:

1. At the early boot stage the whole shadow region is mapped to just
   one physical page (kasan_zero_page). This is finished by the
   function kasan_early_init(), which is called by __mmap_switched
   (arch/arm/kernel/head-common.S).

2. After the call to paging_init(), we use kasan_zero_page as the
   zero shadow for some memory that KASan does not need to track, and
   we allocate a new shadow space for the other memory that KASan
   needs to track. This is finished by the function kasan_init(),
   which is called by setup_arch().

When using KASan we also need to increase THREAD_SIZE_ORDER from 1 to
2, as the extra calls for shadow memory use quite a bit of stack.

As we need to make a temporary copy of the PGD when setting up shadow
memory, we create a helpful PGD_SIZE definition for both LPAE and
non-LPAE setups. The KASan core code unconditionally calls
pud_populate(), so this needs to be changed from BUG() to
do {} while (0) when building with KASan enabled.

After the initial development by Andrey Ryabinin, several
modifications have been made to this code:

Abbott Liu <liuwenliang@huawei.com>
- Add support for ARM LPAE: if LPAE is enabled, the KASan shadow
  region's mapping table needs to be copied in the pgd_alloc()
  function.
- Change kasan_pte_populate, kasan_pmd_populate, kasan_pud_populate,
  kasan_pgd_populate from the .meminit.text section to the .init.text
  section. Reported by Florian Fainelli <f.fainelli@gmail.com>

Linus Walleij <linus.walleij@linaro.org>:
- Drop the custom manipulation of TTBR0 and just use cpu_switch_mm()
  to switch the pgd table.
- Adapt to handle 4th level page table folding.
- Rewrite the entire page directory and page entry initialization
  sequence to be recursive based on ARM64's kasan_init.c.

Ard Biesheuvel <ardb@kernel.org>:
- Necessary underlying fixes.
- Crucial bug fixes to the memory set-up code.

Co-developed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Co-developed-by: Abbott Liu <liuwenliang@huawei.com>
Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 5615f69b)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit c12366ba
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c12366ba441da2f6f2b915410aca2b5b39c16514

-------------------------------------------------

Define KASAN_SHADOW_OFFSET, KASAN_SHADOW_START and KASAN_SHADOW_END
for the Arm kernel address sanitizer. We are "stealing" lowmem (the
4GB addressable by a 32bit architecture) out of the virtual address
space to use as shadow memory for KASan as follows:

 +----+ 0xffffffff
 |    |\
 |    | |-> Static kernel image (vmlinux) BSS and page table
 |    |/
 +----+ PAGE_OFFSET
 |    |\
 |    | |-> Loadable kernel modules virtual address space area
 |    |/
 +----+ MODULES_VADDR = KASAN_SHADOW_END
 |    |\
 |    | |-> The shadow area of kernel virtual address.
 |    |/
 +----+-> TASK_SIZE (start of kernel space) = KASAN_SHADOW_START the
 |    |\  shadow address of MODULES_VADDR
 |    | |
 |    | |
 |    | |-> The user space area in lowmem. The kernel address
 |    | |   sanitizer does not use this space, nor does it map it.
 |    | |
 |    | |
 |    | |
 |    |/
 ------ 0

0 .. TASK_SIZE is the memory that can be used by shared
userspace/kernelspace. It is used for userspace processes and for
passing parameters and memory buffers in system calls etc. We do not
need to shadow this area.

KASAN_SHADOW_START:
 This value begins with the MODULE_VADDR's shadow address. It is the
 start of kernel virtual space. Since we have modules to load, we
 need to cover also that area with shadow memory so we can find
 memory bugs in modules.

KASAN_SHADOW_END:
 This value is the 0x100000000's shadow address: the mapping that
 would be after the end of the kernel memory at 0xffffffff. It is the
 end of the kernel address sanitizer shadow area. It is also the
 start of the module area.

KASAN_SHADOW_OFFSET:
 This value is used to map an address to the corresponding shadow
 address by the following formula:

   shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;

 As you would expect, >> 3 is equal to dividing by 8, meaning each
 byte in the shadow memory covers 8 bytes of kernel memory, so one
 bit of shadow memory per byte of kernel memory is used.

 The KASAN_SHADOW_OFFSET is provided in a Kconfig option depending on
 the VMSPLIT layout of the system: the kernel and userspace can split
 up lowmem in different ways according to needs, so we calculate the
 shadow offset depending on this.

When kasan is enabled, the definition of TASK_SIZE is not an 8-bit
rotated constant, so we need to modify the TASK_SIZE access code in
the *.s files.

The kernel and modules may use different amounts of memory, according
to the VMSPLIT configuration, which in turn determines the
PAGE_OFFSET.

We use the following KASAN_SHADOW_OFFSETs depending on how the
virtual memory is split up:

- 0x1f000000 if we have 1G userspace / 3G kernelspace split:
 - The kernel address space is 3G (0xc0000000)
 - PAGE_OFFSET is then set to 0x40000000 so the kernel static image
   (vmlinux) uses addresses 0x40000000 .. 0xffffffff
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0x3f000000 so the modules use addresses 0x3f000000 .. 0x3fffffff
 - So the addresses 0x3f000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0xc1000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x18200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0x26e00000, to KASAN_SHADOW_END at
   0x3effffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0x3f000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0x26e00000 = (0x3f000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0x26e00000 - (0x3f000000 >> 3)
   KASAN_SHADOW_OFFSET = 0x26e00000 - 0x07e00000
   KASAN_SHADOW_OFFSET = 0x1f000000

- 0x5f000000 if we have 2G userspace / 2G kernelspace split:
 - The kernel space is 2G (0x80000000)
 - PAGE_OFFSET is set to 0x80000000 so the kernel static image uses
   0x80000000 .. 0xffffffff.
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0x7f000000 so the modules use addresses 0x7f000000 .. 0x7fffffff
 - So the addresses 0x7f000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0x81000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x10200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0x6ee00000, to KASAN_SHADOW_END at
   0x7effffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0x7f000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0x6ee00000 = (0x7f000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0x6ee00000 - (0x7f000000 >> 3)
   KASAN_SHADOW_OFFSET = 0x6ee00000 - 0x0fe00000
   KASAN_SHADOW_OFFSET = 0x5f000000

- 0x9f000000 if we have 3G userspace / 1G kernelspace split, and this
  is the default split for ARM:
 - The kernel address space is 1GB (0x40000000)
 - PAGE_OFFSET is set to 0xc0000000 so the kernel static image uses
   0xc0000000 .. 0xffffffff.
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0xbf000000 so the modules use addresses 0xbf000000 .. 0xbfffffff
 - So the addresses 0xbf000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0x41000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x08200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0xb6e00000, to KASAN_SHADOW_END at
   0xbfffffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0xbf000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0xb6e00000 = (0xbf000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0xb6e00000 - (0xbf000000 >> 3)
   KASAN_SHADOW_OFFSET = 0xb6e00000 - 0x17e00000
   KASAN_SHADOW_OFFSET = 0x9f000000

- 0x8f000000 if we have 3G userspace / 1G kernelspace with full 1 GB
  low memory (VMSPLIT_3G_OPT):
 - The kernel address space is 1GB (0x40000000)
 - PAGE_OFFSET is set to 0xb0000000 so the kernel static image uses
   0xb0000000 .. 0xffffffff.
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0xaf000000 so the modules use addresses 0xaf000000 .. 0xafffffff
 - So the addresses 0xaf000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0x51000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x0a200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0xa4e00000, to KASAN_SHADOW_END at
   0xaeffffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0xaf000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0xa4e00000 = (0xaf000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0xa4e00000 - (0xaf000000 >> 3)
   KASAN_SHADOW_OFFSET = 0xa4e00000 - 0x15e00000
   KASAN_SHADOW_OFFSET = 0x8f000000

- The default value of 0xffffffff for KASAN_SHADOW_OFFSET is an error
  value. We should always match one of the above shadow offsets.

When we do this, TASK_SIZE will sometimes get somewhat odd values
that will not fit into immediate mov assembly instructions. To
account for this, we need to rewrite some assembly using TASK_SIZE
like this:

-       mov     r1, #TASK_SIZE
+       ldr     r1, =TASK_SIZE

or

-       cmp     r4, #TASK_SIZE
+       ldr     r0, =TASK_SIZE
+       cmp     r4, r0

This is done to avoid an immediate #TASK_SIZE that needs to fit into
a limited number of bits.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit c12366ba)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
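The offset arithmetic above can be spot-checked with a few lines of
plain C (a user-space sketch; the constants are the default 3G/1G
split values from the text, not taken from kernel headers):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* default 3G/1G split, values from the commit text */
            uint32_t modules_vaddr = 0xbf000000;  /* first address to shadow */
            uint32_t shadow_start  = 0xb6e00000;  /* KASAN_SHADOW_START */

            /* shadow_addr = (addr >> 3) + offset => offset = start - (addr >> 3) */
            uint32_t offset = shadow_start - (modules_vaddr >> 3);
            printf("KASAN_SHADOW_OFFSET = 0x%x\n", offset);  /* 0x9f000000 */

            /* the last shadowed byte lands just below KASAN_SHADOW_END */
            printf("shadow(0xffffffff)  = 0x%x\n", (0xffffffffu >> 3) + offset);
            return 0;
    }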
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit d6d51a96
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6d51a96c7d63b7450860a3037f2d62388286a52

-------------------------------------------------

Functions like memset()/memmove()/memcpy() do a lot of memory
accesses. If a bad pointer is passed to one of these functions it is
important to catch this. Compiler instrumentation cannot do this
since these functions are written in assembly.

KASan replaces these memory functions with instrumented variants. The
original functions are declared as weak symbols so that the strong
definitions in mm/kasan/kasan.c can replace them. The original
functions have aliases with a '__' prefix in their name, so we can
call the non-instrumented variant if needed.

We must use __memcpy()/__memset() in place of memcpy()/memset() when
we copy .data to RAM and when we clear .bss, because kasan_early_init
cannot be called before the initialization of .data and .bss.

For the kernel compression and EFI libstub's custom string libraries
we need a special quirk: even if these are built without KASan
enabled, they rely on the global headers for their custom string
libraries, which means that e.g. memcpy() will be defined to
__memcpy() and we get link failures. Since these implementations are
written in C rather than assembly we use e.g. __alias(memcpy) to
redirect any users back to the local implementation.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit d6d51a96)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
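A minimal user-space sketch of the weak-symbol pattern described here
(illustrative names; the kernel applies this to its assembly mem*
routines, and it compiles with GCC/Clang on ELF targets):

    #include <stddef.h>
    #include <stdio.h>

    /* the raw, non-instrumented implementation ('__'-prefixed alias) */
    void *__my_memcpy(void *dst, const void *src, size_t n)
    {
            char *d = dst;
            const char *s = src;

            while (n--)
                    *d++ = *s++;
            return dst;
    }

    /* weak alias: a strong instrumented definition elsewhere (as in
     * mm/kasan/) would silently take over this name at link time */
    void *my_memcpy(void *dst, const void *src, size_t n)
            __attribute__((weak, alias("__my_memcpy")));

    int main(void)
    {
            char buf[6];

            my_memcpy(buf, "kasan", 6);
            puts(buf);
            return 0;
    }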
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit d5d44e7e
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5d44e7e3507b0ad868f68e0c5bca6a57afa1b8b

-------------------------------------------------

Disable instrumentation for arch/arm/boot/compressed/* since that
code is executed before the kernel has even set up its mappings and
is definitely out of scope for KASan.

Disable instrumentation of arch/arm/vdso/* because that code is not
linked with the kernel image, so the KASan management code would fail
to link.

Disable instrumentation of arch/arm/mm/physaddr.c. See commit
ec6d06ef ("arm64: Add support for CONFIG_DEBUG_VIRTUAL") for more
details.

Disable the kasan check in the function unwind_pop_register because
it does not matter if kasan checks fail when unwind_pop_register()
reads the stack memory of a task.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit d5d44e7e)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit 7a1be318
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a1be318f5795cb66fa0dc86b3ace427fe68057f

-------------------------------------------------

On ARM, setting up the linear region is tricky, given the constraints
around placement and alignment of the memblocks, and how the kernel
itself as well as the DT are placed in physical memory.

Let's simplify matters a bit, by moving the device tree mapping to
the top of the address space, right between the end of the vmalloc
region and the start of the fixmap region, and create a read-only
mapping for it that is independent of the size of the linear region,
and how it is organized.

Since this region was formerly used as a guard region, which will now
be populated fully on LPAE builds by this read-only mapping (which
will still be able to function as a guard region for stray writes),
bump the start of the [underutilized] fixmap region by 512 KB as
well, to ensure that there is always a proper guard region here.
Doing so still leaves ample room for the fixmap space, even with
NR_CPUS set to its maximum value of 32.

Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 7a1be318)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit e9a2f8b5
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a2f8b599d0bc22a1b13e69527246ac39c697b4

-------------------------------------------------

Before moving the DT mapping out of the linear region, let's prepare
for this change by removing all the phys-to-virt translations of the
__atags_pointer variable, and perform this translation only once at
setup time.

Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit e9a2f8b5)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yang Yingliang

hulk inclusion
category: bugfix
bugzilla: 47877
CVE: CVE-2019-16230

-------------------------------------------------

Check the alloc_workqueue() return value in radeon_crtc_init() to
avoid a null-ptr-deref.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by John Ogness

stable inclusion
from stable-5.10.12
commit d5ac8304e18025a522b5d1d87629e926064ce134
bugzilla: 47876

--------------------------------

commit 08d60e59 upstream.

Commit f0e386ee ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.

Use the correct base pointer when adding the terminator.

Fixes: f0e386ee ("printk: fix buffer overflow potential for print_text()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210124202728.4718-1-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by John Ogness

stable inclusion
from stable-5.10.12
commit 861c2e349a36868f9c19a82844b2eb0abf20939b
bugzilla: 47876

--------------------------------

commit f0e386ee upstream.

Before the commit 896fbe20 ("printk: use the lockless ringbuffer"),
msg_print_text() would only write up to size-1 bytes into the
provided buffer. Some callers expect this behavior and append a
terminator to the returned string. In particular:

  arch/powerpc/xmon/xmon.c:dump_log_buf()
  arch/um/kernel/kmsg_dump.c:kmsg_dumper_stdout()

msg_print_text() has been replaced by record_print_text(), which
currently fills the full size of the buffer. This causes a buffer
overflow for the above callers.

Change record_print_text() so that it will only use size-1 bytes for
text data. Also, for paranoia's sake, add a terminator after the text
data. And finally, document this behavior so that it is clear that
only size-1 bytes are used and a terminator is added.

Fixes: 896fbe20 ("printk: use the lockless ringbuffer")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210114170412.4819-1-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
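The contract this fix establishes can be illustrated with a few lines
of plain C (a sketch, not the printk code itself): fill at most
size-1 bytes of text and always NUL-terminate within the buffer.

    #include <stdio.h>
    #include <string.h>

    /* copy msg into buf, using at most size-1 bytes for text data */
    static size_t copy_text(char *buf, size_t size, const char *msg)
    {
            size_t len = strlen(msg);

            if (len > size - 1)       /* leave room for the terminator */
                    len = size - 1;
            memcpy(buf, msg, len);
            buf[len] = '\0';          /* terminator stays inside buf */
            return len;
    }

    int main(void)
    {
            char buf[8];
            size_t n = copy_text(buf, sizeof(buf), "printk ringbuffer");

            printf("%zu \"%s\"\n", n, buf);   /* 7 "printk " */
            return 0;
    }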
-
Committed by Jean-Philippe Brucker

stable inclusion
from stable-5.10.12
commit cb14bbbb7bbfdb9da25d24cf14f52ef54eee1109
bugzilla: 47876

--------------------------------

commit c8a950d0 upstream.

Several Makefiles in tools/ need to define the host toolchain
variables. Move their definition to tools/scripts/Makefile.include.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/bpf/20201110164310.2600671-2-jean-philippe@linaro.org
Cc: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Zhaoyang Huang

stable inclusion
from stable-5.10.12
commit f472a59aa182d5aac2927633f390514cf7b614b4
bugzilla: 47876

--------------------------------

commit b50da6e9 upstream.

The scenario in which "Free swap = -4kB" happens on my system: several
get_swap_pages() calls race with each other while
show_swap_cache_info() happens simultaneously. No need to add a lock
on get_swap_page_of_type() as we remove "Presub/PosAdd" here.

ProcessA                  ProcessB                  ProcessC
ngoals = 1                ngoals = 1
avail = nr_swap_pages(1)
                          avail = nr_swap_pages(1)
nr_swap_pages(1) -= ngoals
                          nr_swap_pages(0) -= ngoals
                                                    nr_swap_pages = -1

Link: https://lkml.kernel.org/r/1607050340-4535-1-git-send-email-zhaoyang.huang@unisoc.com
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
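The pre-subtract pattern can be reproduced in a few lines of
user-space C (an illustration of the race shape, not the kernel code;
compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    static long nr_swap_pages = 1;          /* one free page left */

    /* the removed "Presub/PosAdd" style: subtract first, undo on failure */
    static void *claim(void *unused)
    {
            long left;

            (void)unused;
            left = __atomic_sub_fetch(&nr_swap_pages, 1, __ATOMIC_SEQ_CST);
            if (left < 0)   /* not enough pages: add the goal back */
                    __atomic_add_fetch(&nr_swap_pages, 1, __ATOMIC_SEQ_CST);
            return NULL;
    }

    int main(void)
    {
            pthread_t a, b;

            pthread_create(&a, NULL, claim, NULL);
            pthread_create(&b, NULL, claim, NULL);

            /* a concurrent reader here may observe -1, the counterpart
             * of the "Free swap = -4kB" report above */
            printf("observed: %ld\n",
                   __atomic_load_n(&nr_swap_pages, __ATOMIC_SEQ_CST));

            pthread_join(a, NULL);
            pthread_join(b, NULL);
            return 0;
    }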
-
Committed by Hailong liu

stable inclusion
from stable-5.10.12
commit c11f7749f1fc9bad6b1f0e073de08fa996f21cc3
bugzilla: 47876

--------------------------------

commit ce8f86ee upstream.

The tracepoint *trace_mm_page_alloc_zone_locked()* in __rmqueue()
does not currently cover all branches. Add the missing tracepoint and
check the page before doing that.

[akpm@linux-foundation.org: use IS_ENABLED() to suppress warning]

Link: https://lkml.kernel.org/r/20201228132901.41523-1-carver4lio@163.com
Signed-off-by: Hailong liu <liu.hailong6@zte.com.cn>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Josh Poimboeuf

stable inclusion
from stable-5.10.12
commit c6fd968f58439398b765300aecd7758d501ee49c
bugzilla: 47876

--------------------------------

commit 1d489151 upstream.

Thanks to a recent binutils change which doesn't generate unused
symbols, it's now possible for thunk_64.o to be completely empty
without CONFIG_PREEMPTION: no text, no data, no symbols.

We could edit the Makefile to only build that file when
CONFIG_PREEMPTION is enabled, but that will likely create confusion
if/when the thunks end up getting used by some other code again.

Just ignore it and move on.

Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1254
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit d92d00861e98db178bd721876d0a06d1e8d5ff1a
bugzilla: 47876

--------------------------------

[ Upstream commit 9d5c8190 ]

[   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
[   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
[   27.633220] 1 lock held by io_wqe_worker-0/1012:
[   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
[   27.649249] Call Trace:
[   27.649874]  dump_stack+0xac/0xe3
[   27.650666]  ___might_sleep+0x284/0x2c0
[   27.651566]  put_files_struct+0xb8/0x120
[   27.652481]  __io_clean_op+0x10c/0x2a0
[   27.653362]  __io_cqring_fill_event+0x2c1/0x350
[   27.654399]  __io_req_complete.part.102+0x41/0x70
[   27.655464]  io_openat2+0x151/0x300
[   27.656297]  io_issue_sqe+0x6c/0x14e0
[   27.660991]  io_wq_submit_work+0x7f/0x240
[   27.662890]  io_worker_handle_work+0x501/0x8a0
[   27.664836]  io_wqe_worker+0x158/0x520
[   27.667726]  kthread+0x134/0x180
[   27.669641]  ret_from_fork+0x1f/0x30

Instead of cleaning files on overflow, return back overflow
cancellation into io_uring_cancel_files(). Previously it was racy to
clean the REQ_F_OVERFLOW flag, but we got rid of it, and can do it
through repetitive attempts targeting all matching requests.

Cc: stable@vger.kernel.org # 5.9+
Reported-by: Abaci <abaci@linux.alibaba.com>
Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 7bccd1c19128140b9fefaa43808924c6932bef5b
bugzilla: 47876

--------------------------------

[ Upstream commit 4aa84f2f ]

CPU0                                CPU1
----                                ----
lock(&new->fa_lock);
                                    local_irq_disable();
                                    lock(&ctx->completion_lock);
                                    lock(&new->fa_lock);
<Interrupt>
lock(&ctx->completion_lock);

 *** DEADLOCK ***

Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
so it doesn't hold completion_lock while doing it. That saves us from
the reported deadlock, and it's just nice to shorten the locking time
and untangle nested locks (compl_lock -> wq_head::lock).

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 186725a80c4e931b6fe31b94d66c989d5f2354c1
bugzilla: 47876

--------------------------------

[ Upstream commit 0b5cd6c3 ]

If there are no requests at the time __io_uring_task_cancel() is
called, tctx_inflight() returns zero and it terminates, not getting a
chance to go through __io_uring_files_cancel() and do
io_disable_sqo_submit(). And we absolutely want them disabled by the
time cancellation ends.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 54b4c4f9aba9e5d1ef6877f42a57895b189107c9
bugzilla: 47876

--------------------------------

[ Upstream commit 4325cb49 ]

WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096 io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
Call Trace:
 filp_close+0xb4/0x170 fs/open.c:1280
 close_files fs/file.c:401 [inline]
 put_files_struct fs/file.c:416 [inline]
 put_files_struct+0x1cc/0x350 fs/file.c:413
 exit_files+0x7e/0xa0 fs/file.c:433
 do_exit+0xc22/0x2ae0 kernel/exit.c:820
 do_group_exit+0x125/0x310 kernel/exit.c:922
 get_signal+0x3e9/0x20a0 kernel/signal.c:2770
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

An SQPOLL ring creator task may have gotten rid of its file note
during exit and called io_disable_sqo_submit(), but the io_uring is
still left referenced through fdtable, which will be put during
close_files() and cause a false positive warning.

First split the warning into two for more clarity when it is hit, and
then add a sqo_dead check to handle the described case.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 0682759126bc761c325325ca809ae99c93fda2a0
bugzilla: 47876

--------------------------------

[ Upstream commit 6b393a1f ]

WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884 io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
Call Trace:
 io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
 filp_close+0xb4/0x170 fs/open.c:1280
 close_fd+0x5c/0x80 fs/file.c:626
 __do_sys_close fs/open.c:1299 [inline]
 __se_sys_close fs/open.c:1297 [inline]
 __x64_sys_close+0x2f/0xa0 fs/open.c:1297
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_uring's final close() may be triggered by any task, not only the
creator. It's well handled by io_uring_flush() including the SQPOLL
case, though a warning in io_disable_sqo_submit() will fallaciously
fire. Fix that by moving this warning out to the only call site that
matters.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 8cb6f4da831bc51145aee3a923f03114121dea6b
bugzilla: 47876

--------------------------------

[ Upstream commit 06585c49 ]

WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717 io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717
Call Trace:
 io_uring_release+0x3e/0x50 fs/io_uring.c:8759
 __fput+0x283/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x190 kernel/task_work.c:140
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
 exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

A failed io_uring_install_fd() is a special case: we don't do
io_ring_ctx_wait_and_kill() directly but defer it to fput, though we
still need to io_disable_sqo_submit() before that.

Note: it doesn't fix any real problem, just a warning. That's because
the sqring won't be available to userspace in this case and so SQPOLL
won't submit anything.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 0e3562e3b2aeb4a6aa4615185a8f59c51cade61b
bugzilla: 47876

--------------------------------

[ Upstream commit b4411616 ]

general protection fault, probably for non-canonical address 0xdffffc0000000022: 0000 [#1]
KASAN: null-ptr-deref in range [0x0000000000000110-0x0000000000000117]
RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline]
RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891
Call Trace:
 io_uring_create fs/io_uring.c:9711 [inline]
 io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_disable_sqo_submit() might be called before the user rings were
allocated; don't do io_ring_set_wakeup_flag() in those cases.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit a63d9157571b52f7339d6db4c2ab7bc3bfe527c0
bugzilla: 47876

--------------------------------

[ Upstream commit d9d05217 ]

When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't
want its internals like ->files and ->mm to be poked by the SQPOLL
task; it has never been nice and recently got racy. That can happen
when the owner undergoes destruction and the SQPOLL task tries to
submit new requests in parallel, and so calls
io_sq_thread_acquire*().

This patch halts SQPOLL submissions when sqo_task dies by introducing
an sqo_dead flag. Once set, the SQPOLL task must not do any
submission, which is synchronised by uring_lock as well as the new
flag.

The tricky part is to make sure that disabling always happens, which
means either the ring is discovered by the creator's do_exit() ->
cancel, or the final close() happens before it's done by the creator.
The latter is guaranteed by the fact that for SQPOLL the creator task
and only it holds exactly one file note, so either it pins up to
do_exit() or it is removed by the creator on the final put in flush
(see comments in uring_flush() around file->f_count == 2).

One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.

Note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.

Note 2: if the final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.

Cc: stable@vger.kernel.org # 5.5+
Fixes: b1b6b5a3 ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit da67631a33c342528245817cc61e36dd945665b0
bugzilla: 47876

--------------------------------

[ Upstream commit 6b5733eb ]

files_cancel() should cancel all relevant requests and drop file
notes, so we should never have file notes after that, including
on-exit fput and flush. Add a WARN_ONCE to be sure.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 18f31594ee52ed1f364e376767fb839935fd899c
bugzilla: 47876

--------------------------------

[ Upstream commit 4f793dc4 ]

A simple preparation change inlining io_uring_attempt_task_drop()
into io_uring_flush().

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 7bf3fb6243a3b153ab1854b331ec19d67a4878bb
bugzilla: 47876

--------------------------------

[ Upstream commit b1b6b5a3 ]

To cancel io_uring requests, we either need to be able to run the
currently enqueued task_works, or to have task_work shut down by that
moment. Otherwise io_uring_cancel_files() may be waiting for requests
that won't ever complete.

Go with the first way and do cancellations before setting PF_EXITING,
and so before putting the task_work infrastructure into a transition
state where task_work_run() would better not be called.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-