- 08 Feb 2021, 40 commits
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

Make the functions reserve_crashkernel[_low]() generic. Arm64 will
use these to reimplement crashkernel=X.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

We will make the function reserve_crashkernel() generic. The
xen_pv_domain() check in reserve_crashkernel() is relevant only to
x86, as is insert_resource() in reserve_crashkernel[_low](). So move
the xen_pv_domain() check and insert_resource() to setup_arch() to
keep them in x86.

Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

To make the function reserve_crashkernel() generic, replace some
hard-coded numbers with the macro CRASH_ADDR_LOW_MAX.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

The lower bounds of crash kernel reservation and crash kernel low
reservation are different; use the consistent value CRASH_ALIGN.

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Zhou

maillist inclusion
category: feature
bugzilla: 47954
Reference: https://lkml.org/lkml/2021/1/30/53

-------------------------------------------------

Move CRASH_ALIGN to the header asm/kexec.h for later use. Besides,
the alignment of crash kernel regions on x86 is 16M (CRASH_ALIGN),
but the function reserve_crashkernel() also used a 1M alignment. So
just replace the hard-coded 1M alignment with the macro CRASH_ALIGN.

Suggested-by: Dave Young <dyoung@redhat.com>
Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
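For reference, rounding a base address up to the 16M CRASH_ALIGN
boundary works as in this small user-space sketch (ALIGN_UP and the
sample value are illustrative, not the kernel's macros):

    #include <stdio.h>
    #include <stdint.h>

    #define SZ_1M        0x00100000UL
    #define CRASH_ALIGN  (16 * SZ_1M)   /* 16M, as the commit states */

    /* round x up to the next multiple of the power-of-two a */
    #define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

    int main(void)
    {
            uint64_t base = 0x01234567;

            printf("0x%llx aligns up to 0x%llx\n",
                   (unsigned long long)base,
                   (unsigned long long)ALIGN_UP(base, CRASH_ALIGN));
            return 0;
    }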
-
Committed by Liu Hua

hulk inclusion
category: bugfix
bugzilla: 47259
CVE: N/A

----------------------------------------

With CONFIG_ARM_LPAE=y, memory in 32-bit ARM systems can exceed 4G.
So if we use kdump in such systems, the capture kernel should parse
the 64-bit ELF header (parse_crash_elf64_headers), but this cannot
proceed because ARM Linux does not supply the related check function.
This patch adds the check functions related to the elf64 header.

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Signed-off-by: Yufen Wang <wangyufen@huawei.com>
Reviewed-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Conflicts:
	arch/arm/include/asm/elf.h

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yufen Wang

hulk inclusion
category: bugfix
bugzilla: 47258
CVE: N/A

-------------------------------------------------

Kexec boots a capture kernel when the kernel panics. But the boot
fails if the kernel panics in the handler function of a PPI. The
reason is that the PPI has not been EOIed, so other interrupts cannot
be handled when booting the capture kernel. This patch fixes this
bug.

Signed-off-by: Yufen Wang <wangyufen@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Conflicts:
	arch/arm/kernel/machine_kexec.c

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by liujian

hulk inclusion
category: bugfix
bugzilla: 47459
CVE: NA

---------------------------

syzkaller triggered a UBSAN warning:

[  196.188950] UBSAN: Undefined behaviour in drivers/input/input.c:62:23
[  196.188958] signed integer overflow:
[  196.188964] -2147483647 - 104 cannot be represented in type 'int [2]'
[  196.188973] CPU: 7 PID: 4763 Comm: syz-executor Not tainted 4.19.0-514.55.6.9.x86_64+ #7
[  196.188977] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  196.188979] Call Trace:
[  196.189001]  dump_stack+0x91/0xeb
[  196.189014]  ubsan_epilogue+0x9/0x7c
[  196.189020]  handle_overflow+0x1d7/0x22c
[  196.189028]  ? __ubsan_handle_negate_overflow+0x18f/0x18f
[  196.189038]  ? __mutex_lock+0x213/0x13f0
[  196.189053]  ? drop_futex_key_refs+0xa0/0xa0
[  196.189070]  ? __might_fault+0xef/0x1b0
[  196.189096]  input_handle_event+0xe1b/0x1290
[  196.189108]  input_inject_event+0x1d7/0x27e
[  196.189119]  evdev_write+0x2cf/0x3f0
[  196.189129]  ? evdev_pass_values+0xd40/0xd40
[  196.189157]  ? mark_held_locks+0x160/0x160
[  196.189171]  ? __vfs_write+0xe0/0x6c0
[  196.189175]  ? evdev_pass_values+0xd40/0xd40
[  196.189179]  __vfs_write+0xe0/0x6c0
[  196.189186]  ? kernel_read+0x130/0x130
[  196.189204]  ? _cond_resched+0x15/0x30
[  196.189214]  ? __inode_security_revalidate+0xb8/0xe0
[  196.189222]  ? selinux_file_permission+0x354/0x430
[  196.189233]  vfs_write+0x160/0x440
[  196.189242]  ksys_write+0xc1/0x190
[  196.189248]  ? __ia32_sys_read+0xb0/0xb0
[  196.189259]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  196.189267]  ? do_syscall_64+0x22/0x4a0
[  196.189276]  do_syscall_64+0xa5/0x4a0
[  196.189287]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  196.189293] RIP: 0033:0x44e7c9
[  196.189299] Code: fc ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00

The syzkaller reproducer script (it cannot reproduce the issue every
time):

r0 = syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
write$binfmt_elf64(r0, &(0x7f0000000240)={{0x7f, 0x45, 0x4c, 0x46, 0x40, 0x2, 0x2, 0xffffffff, 0xffffffffffff374c, 0x3, 0x0, 0x80000001, 0x103, 0x40, 0x22e, 0x26, 0x1, 0x38, 0x2, 0xa23, 0x1, 0x2}, [{0x6474e557, 0x5, 0x6, 0x2, 0x9, 0x9, 0x6c3, 0x1ff}], "", [[], [], [], []]}, 0x478)
ioctl$EVIOCGSW(0xffffffffffffffff, 0x8040451b, &(0x7f0000000040)=""/7)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
r1 = syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
openat$smack_task_current(0xffffffffffffff9c, &(0x7f0000000040)='/proc/self/attr/current\x00', 0x2, 0x0)
ioctl$EVIOCSABS0(r1, 0x401845c0, &(0x7f0000000000)={0x4, 0x10000, 0x4, 0xd1, 0x81, 0x3})
eventfd(0x1ff)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x200)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1)

Typecast int to long to fix the issue.

Signed-off-by: liujian <liujian56@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
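For illustration, a minimal user-space sketch of this overflow class
and the widening fix (names and logic are illustrative, not the
driver's exact code; the cast only helps where long is wider than
int, i.e. on 64-bit targets):

    #include <limits.h>
    #include <stdio.h>

    /* value - old_val can overflow int when the operands are far apart,
     * e.g. (INT_MIN + 1) - 104, which is undefined behaviour */
    static long diff_bad(int value, int old_val)
    {
            return value - old_val;
    }

    static long diff_fixed(int value, int old_val)
    {
            return (long)value - old_val;   /* widen before subtracting */
    }

    int main(void)
    {
            (void)diff_bad;   /* shown for contrast; do not feed it extremes */
            printf("%ld\n", diff_fixed(INT_MIN + 1, 104));
            return 0;
    }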
-
Committed by Yang Yingliang

hulk inclusion
category: feature
bugzilla: 47984
CVE: NA

--------------------------------------------------

If a cntvct workaround is enabled, read CNTVCT_EL0 twice in the VDSO
to avoid the clock bug.

Without this patch on Kunpeng916:

  ./gettimeofday -E -C 200 -L -S -W -N "gettimeofday"
  Running: gettimeofday# ./gettimeofday -E -C 200 -L -S -W -N gettimeofday
               prc thr usecs/call samples errors cnt/samp
  gettimeofday   1   1    0.31753     198      0    20000

With this patch on Kunpeng916:

  ./gettimeofday -E -C 200 -L -S -W -N "gettimeofday"
  Running: gettimeofday# ./gettimeofday -E -C 200 -L -S -W -N gettimeofday
               prc thr usecs/call samples errors cnt/samp
  gettimeofday   1   1    0.05244     198      0    20000

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yang Yingliang

hulk inclusion
category: feature
bugzilla: 47984
CVE: NA

--------------------------------------------------

Reading CNTVCT_EL0 costs a lot of time when a cntvct workaround and
the CNTVCT_EL0 trap are enabled. To decrease the read time, disable
the CNTVCT_EL0 trap and introduce vdso_fix and vdso_shift for doing
the cntvct workaround in the VDSO.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Lu Jialin

hulk inclusion
category: bugfix
bugzilla: 47951
CVE: NA

--------------------------

When echoing a Z (zombie) process into tasks, it should return -ESRCH
instead of 0.

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit 10fce53c
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10fce53c0ef8f6e79115c3d9e0d7ea1338c3fa37

-------------------------------------------------

The early ATAGS/DT mapping code uses SECTION_SHIFT to mask low order
bits of R2, and decides that no ATAGS/DTB were provided if the
resulting value is 0x0. This means that on systems where DRAM starts
at 0x0 (such as Raspberry Pi), no explicit mapping of the DT will be
created if R2 points into the first 1 MB section of memory.

This was not a problem before, because the decompressed kernel is
loaded at the base of DRAM and mapped using sections as well, and so
as long as the DT is referenced via a virtual address that uses the
same translation (the linear map, in this case), things work fine.

However, commit 7a1be318 ("9012/1: move device tree mapping out of
linear region") changes this, and now the DT is referenced via a
virtual address that is disjoint from the linear mapping of DRAM, and
so we need the early code to create the DT mapping unconditionally.

So let's create the early DT mapping for any value of R2 != 0x0.

Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 10fce53c)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit 4d576cab
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d576cab16f57e1f87978f6997a725179398341e

-------------------------------------------------

KASAN uses the routines in stacktrace.c to capture the call stack
each time memory gets allocated or freed. Some of these routines are
also used to log CPU and memory context when exceptions are taken,
and so in some cases, memory accesses may be made that are not
strictly in line with the KASAN constraints, and may therefore
trigger false KASAN positives.

So follow the example set by other architectures, and simply disable
KASAN instrumentation for these routines.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 4d576cab)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Fangrui Song

mainline inclusion
from mainline-5.11-rc1
commit 735e8d93
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=735e8d93dc2b107f7891a9c2b1c4cfbea1fcbbbc

-------------------------------------------------

Commit d6d51a96 ("ARM: 9014/2: Replace string mem* functions for
KASan") added .weak directives to memcpy/memmove/memset to avoid
collision with KASAN interceptors. This does not work with LLVM's
integrated assembler (the assembly snippet `.weak memcpy ...
.globl memcpy` produces a STB_GLOBAL memcpy, while GNU as produces a
STB_WEAK memcpy). LLVM 12 (since https://reviews.llvm.org/D90108)
will error on such an overridden symbol binding.

Use the appropriate WEAK macro instead.

Link: https://github.com/ClangBuiltLinux/linux/issues/1190
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit fc2933c1
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fc2933c133744305236793025b00c2f7d258b687

-------------------------------------------------

Commit 149a3ffe62b9dbc3 ("9012/1: move device tree mapping out of
linear region") created a permanent, read-only section mapping of the
device tree blob provided by the firmware, and added a set of macros
to get the base and size of the virtually mapped FDT based on the
physical address. However, while the mapping code uses the
SECTION_SIZE macro correctly, the macros use PMD_SIZE instead, which
means something entirely different on ARM when using short
descriptors, and is therefore not the right quantity to use here.

So replace PMD_SIZE with SECTION_SIZE. While at it, change the names
of the macro and its parameter to clarify that it returns the virtual
address of the start of the FDT, based on the physical address in
memory.

Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit fc2933c1)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit 42101571
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=421015713b306e47af95d4d61cdfbd96d462e4cb

-------------------------------------------------

This patch enables the kernel address sanitizer for ARM. XIP_KERNEL
has not been tested and is therefore not allowed for now.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 42101571)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit 5615f69b
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5615f69bc2097452ecc954f5264d784e158d6801

-------------------------------------------------

This patch initializes the KASan shadow region's page table and
memory. There are two stages to KASan initialization:

1. At the early boot stage the whole shadow region is mapped to just
   one physical page (kasan_zero_page). This is finished by the
   function kasan_early_init(), which is called by __mmap_switched
   (arch/arm/kernel/head-common.S).

2. After the call to paging_init(), we use kasan_zero_page as the
   zero shadow for some memory that KASan does not need to track, and
   we allocate a new shadow space for the other memory that KASan
   needs to track. This is finished by the function kasan_init(),
   which is called by setup_arch().

When using KASan we also need to increase THREAD_SIZE_ORDER from 1 to
2, as the extra calls for shadow memory use quite a bit of stack.

As we need to make a temporary copy of the PGD when setting up shadow
memory, we create a helpful PGD_SIZE definition for both LPAE and
non-LPAE setups. The KASan core code unconditionally calls
pud_populate(), so this needs to be changed from BUG() to
do {} while (0) when building with KASan enabled.

After the initial development by Andrey Ryabinin, several
modifications have been made to this code:

Abbott Liu <liuwenliang@huawei.com>
- Add support for ARM LPAE: if LPAE is enabled, the KASan shadow
  region's mapping table needs to be copied in the pgd_alloc()
  function.
- Change kasan_pte_populate, kasan_pmd_populate, kasan_pud_populate,
  kasan_pgd_populate from the .meminit.text section to the .init.text
  section. Reported by Florian Fainelli <f.fainelli@gmail.com>

Linus Walleij <linus.walleij@linaro.org>:
- Drop the custom manipulation of TTBR0 and just use cpu_switch_mm()
  to switch the pgd table.
- Adapt to handle 4th level page table folding.
- Rewrite the entire page directory and page entry initialization
  sequence to be recursive based on ARM64's kasan_init.c.

Ard Biesheuvel <ardb@kernel.org>:
- Necessary underlying fixes.
- Crucial bug fixes to the memory set-up code.

Co-developed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Co-developed-by: Abbott Liu <liuwenliang@huawei.com>
Co-developed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 5615f69b)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit c12366ba
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c12366ba441da2f6f2b915410aca2b5b39c16514

-------------------------------------------------

Define KASAN_SHADOW_OFFSET, KASAN_SHADOW_START and KASAN_SHADOW_END
for the Arm kernel address sanitizer. We are "stealing" lowmem (the
4GB addressable by a 32bit architecture) out of the virtual address
space to use as shadow memory for KASan as follows:

 +----+ 0xffffffff
 |    |\
 |    | |-> Static kernel image (vmlinux) BSS and page table
 |    |/
 +----+ PAGE_OFFSET
 |    |\
 |    | |-> Loadable kernel modules virtual address space area
 |    |/
 +----+ MODULES_VADDR = KASAN_SHADOW_END
 |    |\
 |    | |-> The shadow area of kernel virtual address.
 |    |/
 +----+-> TASK_SIZE (start of kernel space) = KASAN_SHADOW_START the
 |    |\  shadow address of MODULES_VADDR
 |    | |
 |    | |
 |    | |-> The user space area in lowmem. The kernel address
 |    | |   sanitizer does not use this space, nor does it map it.
 |    | |
 |    | |
 |    | |
 |    |/
 ------ 0

0 .. TASK_SIZE is the memory that can be used by shared
userspace/kernelspace. It is used for userspace processes and for
passing parameters and memory buffers in system calls etc. We do not
need to shadow this area.

KASAN_SHADOW_START:
 This value begins with the MODULE_VADDR's shadow address. It is the
 start of kernel virtual space. Since we have modules to load, we
 need to cover also that area with shadow memory so we can find
 memory bugs in modules.

KASAN_SHADOW_END:
 This value is the 0x100000000's shadow address: the mapping that
 would be after the end of the kernel memory at 0xffffffff. It is the
 end of the kernel address sanitizer shadow area. It is also the
 start of the module area.

KASAN_SHADOW_OFFSET:
 This value is used to map an address to the corresponding shadow
 address by the following formula:

   shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;

 As you would expect, >> 3 is equal to dividing by 8, meaning each
 byte in the shadow memory covers 8 bytes of kernel memory, so one
 bit of shadow memory per byte of kernel memory is used.

 The KASAN_SHADOW_OFFSET is provided in a Kconfig option depending on
 the VMSPLIT layout of the system: the kernel and userspace can split
 up lowmem in different ways according to needs, so we calculate the
 shadow offset depending on this.

When kasan is enabled, the definition of TASK_SIZE is not an 8-bit
rotated constant, so we need to modify the TASK_SIZE access code in
the *.s files.

The kernel and modules may use different amounts of memory, according
to the VMSPLIT configuration, which in turn determines the
PAGE_OFFSET.

We use the following KASAN_SHADOW_OFFSETs depending on how the
virtual memory is split up:

- 0x1f000000 if we have 1G userspace / 3G kernelspace split:
 - The kernel address space is 3G (0xc0000000)
 - PAGE_OFFSET is then set to 0x40000000 so the kernel static image
   (vmlinux) uses addresses 0x40000000 .. 0xffffffff
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0x3f000000 so the modules use addresses 0x3f000000 .. 0x3fffffff
 - So the addresses 0x3f000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0xc1000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x18200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0x26e00000, to KASAN_SHADOW_END at
   0x3effffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0x3f000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0x26e00000 = (0x3f000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0x26e00000 - (0x3f000000 >> 3)
   KASAN_SHADOW_OFFSET = 0x26e00000 - 0x07e00000
   KASAN_SHADOW_OFFSET = 0x1f000000

- 0x5f000000 if we have 2G userspace / 2G kernelspace split:
 - The kernel space is 2G (0x80000000)
 - PAGE_OFFSET is set to 0x80000000 so the kernel static image uses
   0x80000000 .. 0xffffffff.
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0x7f000000 so the modules use addresses 0x7f000000 .. 0x7fffffff
 - So the addresses 0x7f000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0x81000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x10200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0x6ee00000, to KASAN_SHADOW_END at
   0x7effffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0x7f000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0x6ee00000 = (0x7f000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0x6ee00000 - (0x7f000000 >> 3)
   KASAN_SHADOW_OFFSET = 0x6ee00000 - 0x0fe00000
   KASAN_SHADOW_OFFSET = 0x5f000000

- 0x9f000000 if we have 3G userspace / 1G kernelspace split, and this
  is the default split for ARM:
 - The kernel address space is 1GB (0x40000000)
 - PAGE_OFFSET is set to 0xc0000000 so the kernel static image uses
   0xc0000000 .. 0xffffffff.
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0xbf000000 so the modules use addresses 0xbf000000 .. 0xbfffffff
 - So the addresses 0xbf000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0x41000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x08200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0xb6e00000, to KASAN_SHADOW_END at
   0xbfffffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0xbf000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0xb6e00000 = (0xbf000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0xb6e00000 - (0xbf000000 >> 3)
   KASAN_SHADOW_OFFSET = 0xb6e00000 - 0x17e00000
   KASAN_SHADOW_OFFSET = 0x9f000000

- 0x8f000000 if we have 3G userspace / 1G kernelspace with full 1 GB
  low memory (VMSPLIT_3G_OPT):
 - The kernel address space is 1GB (0x40000000)
 - PAGE_OFFSET is set to 0xb0000000 so the kernel static image uses
   0xb0000000 .. 0xffffffff.
 - On top of that we have the MODULES_VADDR which under the worst
   case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) =
   0xaf000000 so the modules use addresses 0xaf000000 .. 0xafffffff
 - So the addresses 0xaf000000 .. 0xffffffff need to be covered with
   shadow memory. That is 0x51000000 bytes of memory.
 - 1/8 of that is needed for its shadow memory, so 0x0a200000 bytes
   of shadow memory is needed. We "steal" that from the remaining
   lowmem.
 - The KASAN_SHADOW_START becomes 0xa4e00000, to KASAN_SHADOW_END at
   0xaeffffff.
 - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel
   address as 0xaf000000 needs to map to the first byte of shadow
   memory and 0xffffffff needs to map to the last byte of shadow
   memory. Since:

   SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET
   0xa4e00000 = (0xaf000000 >> 3) + KASAN_SHADOW_OFFSET
   KASAN_SHADOW_OFFSET = 0xa4e00000 - (0xaf000000 >> 3)
   KASAN_SHADOW_OFFSET = 0xa4e00000 - 0x15e00000
   KASAN_SHADOW_OFFSET = 0x8f000000

- The default value of 0xffffffff for KASAN_SHADOW_OFFSET is an error
  value. We should always match one of the above shadow offsets.

When we do this, TASK_SIZE will sometimes get somewhat odd values
that will not fit into immediate mov assembly instructions. To
account for this, we need to rewrite some assembly using TASK_SIZE
like this:

-       mov     r1, #TASK_SIZE
+       ldr     r1, =TASK_SIZE

or

-       cmp     r4, #TASK_SIZE
+       ldr     r0, =TASK_SIZE
+       cmp     r4, r0

This is done to avoid an immediate #TASK_SIZE that needs to fit into
a limited number of bits.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit c12366ba)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
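The offset arithmetic above can be spot-checked with a few lines of
plain C (a user-space sketch; the constants are the default 3G/1G
split values from the text, not taken from kernel headers):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* default 3G/1G split, values from the commit text */
            uint32_t modules_vaddr = 0xbf000000;  /* first address to shadow */
            uint32_t shadow_start  = 0xb6e00000;  /* KASAN_SHADOW_START */

            /* shadow_addr = (addr >> 3) + offset => offset = start - (addr >> 3) */
            uint32_t offset = shadow_start - (modules_vaddr >> 3);
            printf("KASAN_SHADOW_OFFSET = 0x%x\n", offset);  /* 0x9f000000 */

            /* the last shadowed byte lands just below KASAN_SHADOW_END */
            printf("shadow(0xffffffff)  = 0x%x\n", (0xffffffffu >> 3) + offset);
            return 0;
    }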
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit d6d51a96
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6d51a96c7d63b7450860a3037f2d62388286a52

-------------------------------------------------

Functions like memset()/memmove()/memcpy() do a lot of memory
accesses. If a bad pointer is passed to one of these functions it is
important to catch this. Compiler instrumentation cannot do this
since these functions are written in assembly.

KASan replaces these memory functions with instrumented variants. The
original functions are declared as weak symbols so that the strong
definitions in mm/kasan/kasan.c can replace them. The original
functions have aliases with a '__' prefix in their name, so we can
call the non-instrumented variant if needed.

We must use __memcpy()/__memset() in place of memcpy()/memset() when
we copy .data to RAM and when we clear .bss, because kasan_early_init
cannot be called before the initialization of .data and .bss.

For the kernel compression and EFI libstub's custom string libraries
we need a special quirk: even if these are built without KASan
enabled, they rely on the global headers for their custom string
libraries, which means that e.g. memcpy() will be defined to
__memcpy() and we get link failures. Since these implementations are
written in C rather than assembly we use e.g. __alias(memcpy) to
redirect any users back to the local implementation.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit d6d51a96)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
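A minimal user-space sketch of the weak-symbol pattern described here
(illustrative names; the kernel applies this to its assembly mem*
routines, and it compiles with GCC/Clang on ELF targets):

    #include <stddef.h>
    #include <stdio.h>

    /* the raw, non-instrumented implementation ('__'-prefixed alias) */
    void *__my_memcpy(void *dst, const void *src, size_t n)
    {
            char *d = dst;
            const char *s = src;

            while (n--)
                    *d++ = *s++;
            return dst;
    }

    /* weak alias: a strong instrumented definition elsewhere (as in
     * mm/kasan/) would silently take over this name at link time */
    void *my_memcpy(void *dst, const void *src, size_t n)
            __attribute__((weak, alias("__my_memcpy")));

    int main(void)
    {
            char buf[6];

            my_memcpy(buf, "kasan", 6);
            puts(buf);
            return 0;
    }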
-
Committed by Linus Walleij

mainline inclusion
from mainline-5.11-rc1
commit d5d44e7e
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5d44e7e3507b0ad868f68e0c5bca6a57afa1b8b

-------------------------------------------------

Disable instrumentation for arch/arm/boot/compressed/* since that
code is executed before the kernel has even set up its mappings and
is definitely out of scope for KASan.

Disable instrumentation of arch/arm/vdso/* because that code is not
linked with the kernel image, so the KASan management code would fail
to link.

Disable instrumentation of arch/arm/mm/physaddr.c. See commit
ec6d06ef ("arm64: Add support for CONFIG_DEBUG_VIRTUAL") for more
details.

Disable the kasan check in the function unwind_pop_register because
it does not matter if kasan checks fail when unwind_pop_register()
reads the stack memory of a task.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit d5d44e7e)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit 7a1be318
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a1be318f5795cb66fa0dc86b3ace427fe68057f

-------------------------------------------------

On ARM, setting up the linear region is tricky, given the constraints
around placement and alignment of the memblocks, and how the kernel
itself as well as the DT are placed in physical memory.

Let's simplify matters a bit, by moving the device tree mapping to
the top of the address space, right between the end of the vmalloc
region and the start of the fixmap region, and create a read-only
mapping for it that is independent of the size of the linear region,
and how it is organized.

Since this region was formerly used as a guard region, which will now
be populated fully on LPAE builds by this read-only mapping (which
will still be able to function as a guard region for stray writes),
bump the start of the [underutilized] fixmap region by 512 KB as
well, to ensure that there is always a proper guard region here.
Doing so still leaves ample room for the fixmap space, even with
NR_CPUS set to its maximum value of 32.

Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit 7a1be318)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Ard Biesheuvel

mainline inclusion
from mainline-5.11-rc1
commit e9a2f8b5
category: feature
feature: ARM KASAN support
bugzilla: 46872
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a2f8b599d0bc22a1b13e69527246ac39c697b4

-------------------------------------------------

Before moving the DT mapping out of the linear region, let's prepare
for this change by removing all the phys-to-virt translations of the
__atags_pointer variable, and perform this translation only once at
setup time.

Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
(cherry picked from commit e9a2f8b5)
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yang Yingliang

hulk inclusion
category: bugfix
bugzilla: 47877
CVE: CVE-2019-16230

-------------------------------------------------

Check the alloc_workqueue() return value in radeon_crtc_init() to
avoid a null-ptr-deref.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by John Ogness

stable inclusion
from stable-5.10.12
commit d5ac8304e18025a522b5d1d87629e926064ce134
bugzilla: 47876

--------------------------------

commit 08d60e59 upstream.

Commit f0e386ee ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.

Use the correct base pointer when adding the terminator.

Fixes: f0e386ee ("printk: fix buffer overflow potential for print_text()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210124202728.4718-1-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by John Ogness

stable inclusion
from stable-5.10.12
commit 861c2e349a36868f9c19a82844b2eb0abf20939b
bugzilla: 47876

--------------------------------

commit f0e386ee upstream.

Before the commit 896fbe20 ("printk: use the lockless ringbuffer"),
msg_print_text() would only write up to size-1 bytes into the
provided buffer. Some callers expect this behavior and append a
terminator to the returned string. In particular:

  arch/powerpc/xmon/xmon.c:dump_log_buf()
  arch/um/kernel/kmsg_dump.c:kmsg_dumper_stdout()

msg_print_text() has been replaced by record_print_text(), which
currently fills the full size of the buffer. This causes a buffer
overflow for the above callers.

Change record_print_text() so that it will only use size-1 bytes for
text data. Also, for paranoia's sake, add a terminator after the text
data. And finally, document this behavior so that it is clear that
only size-1 bytes are used and a terminator is added.

Fixes: 896fbe20 ("printk: use the lockless ringbuffer")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210114170412.4819-1-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
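The contract this fix establishes can be illustrated with a few lines
of plain C (a sketch, not the printk code itself): fill at most
size-1 bytes of text and always NUL-terminate within the buffer.

    #include <stdio.h>
    #include <string.h>

    /* copy msg into buf, using at most size-1 bytes for text data */
    static size_t copy_text(char *buf, size_t size, const char *msg)
    {
            size_t len = strlen(msg);

            if (len > size - 1)       /* leave room for the terminator */
                    len = size - 1;
            memcpy(buf, msg, len);
            buf[len] = '\0';          /* terminator stays inside buf */
            return len;
    }

    int main(void)
    {
            char buf[8];
            size_t n = copy_text(buf, sizeof(buf), "printk ringbuffer");

            printf("%zu \"%s\"\n", n, buf);   /* 7 "printk " */
            return 0;
    }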
-
Committed by Jean-Philippe Brucker

stable inclusion
from stable-5.10.12
commit cb14bbbb7bbfdb9da25d24cf14f52ef54eee1109
bugzilla: 47876

--------------------------------

commit c8a950d0 upstream.

Several Makefiles in tools/ need to define the host toolchain
variables. Move their definition to tools/scripts/Makefile.include.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/bpf/20201110164310.2600671-2-jean-philippe@linaro.org
Cc: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Zhaoyang Huang

stable inclusion
from stable-5.10.12
commit f472a59aa182d5aac2927633f390514cf7b614b4
bugzilla: 47876

--------------------------------

commit b50da6e9 upstream.

The scenario in which "Free swap = -4kB" happens on my system: several
get_swap_pages() calls race with each other while
show_swap_cache_info() happens simultaneously. No need to add a lock
on get_swap_page_of_type() as we remove "Presub/PosAdd" here.

ProcessA                  ProcessB                  ProcessC
ngoals = 1                ngoals = 1
avail = nr_swap_pages(1)
                          avail = nr_swap_pages(1)
nr_swap_pages(1) -= ngoals
                          nr_swap_pages(0) -= ngoals
                                                    nr_swap_pages = -1

Link: https://lkml.kernel.org/r/1607050340-4535-1-git-send-email-zhaoyang.huang@unisoc.com
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
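The pre-subtract pattern can be reproduced in a few lines of
user-space C (an illustration of the race shape, not the kernel code;
compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    static long nr_swap_pages = 1;          /* one free page left */

    /* the removed "Presub/PosAdd" style: subtract first, undo on failure */
    static void *claim(void *unused)
    {
            long left;

            (void)unused;
            left = __atomic_sub_fetch(&nr_swap_pages, 1, __ATOMIC_SEQ_CST);
            if (left < 0)   /* not enough pages: add the goal back */
                    __atomic_add_fetch(&nr_swap_pages, 1, __ATOMIC_SEQ_CST);
            return NULL;
    }

    int main(void)
    {
            pthread_t a, b;

            pthread_create(&a, NULL, claim, NULL);
            pthread_create(&b, NULL, claim, NULL);

            /* a concurrent reader here may observe -1, the counterpart
             * of the "Free swap = -4kB" report above */
            printf("observed: %ld\n",
                   __atomic_load_n(&nr_swap_pages, __ATOMIC_SEQ_CST));

            pthread_join(a, NULL);
            pthread_join(b, NULL);
            return 0;
    }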
-
Committed by Hailong liu

stable inclusion
from stable-5.10.12
commit c11f7749f1fc9bad6b1f0e073de08fa996f21cc3
bugzilla: 47876

--------------------------------

commit ce8f86ee upstream.

The tracepoint *trace_mm_page_alloc_zone_locked()* in __rmqueue()
does not currently cover all branches. Add the missing tracepoint and
check the page before doing that.

[akpm@linux-foundation.org: use IS_ENABLED() to suppress warning]

Link: https://lkml.kernel.org/r/20201228132901.41523-1-carver4lio@163.com
Signed-off-by: Hailong liu <liu.hailong6@zte.com.cn>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Josh Poimboeuf

stable inclusion
from stable-5.10.12
commit c6fd968f58439398b765300aecd7758d501ee49c
bugzilla: 47876

--------------------------------

commit 1d489151 upstream.

Thanks to a recent binutils change which doesn't generate unused
symbols, it's now possible for thunk_64.o to be completely empty
without CONFIG_PREEMPTION: no text, no data, no symbols.

We could edit the Makefile to only build that file when
CONFIG_PREEMPTION is enabled, but that will likely create confusion
if/when the thunks end up getting used by some other code again.

Just ignore it and move on.

Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1254
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit d92d00861e98db178bd721876d0a06d1e8d5ff1a
bugzilla: 47876

--------------------------------

[ Upstream commit 9d5c8190 ]

[   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
[   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
[   27.633220] 1 lock held by io_wqe_worker-0/1012:
[   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
[   27.649249] Call Trace:
[   27.649874]  dump_stack+0xac/0xe3
[   27.650666]  ___might_sleep+0x284/0x2c0
[   27.651566]  put_files_struct+0xb8/0x120
[   27.652481]  __io_clean_op+0x10c/0x2a0
[   27.653362]  __io_cqring_fill_event+0x2c1/0x350
[   27.654399]  __io_req_complete.part.102+0x41/0x70
[   27.655464]  io_openat2+0x151/0x300
[   27.656297]  io_issue_sqe+0x6c/0x14e0
[   27.660991]  io_wq_submit_work+0x7f/0x240
[   27.662890]  io_worker_handle_work+0x501/0x8a0
[   27.664836]  io_wqe_worker+0x158/0x520
[   27.667726]  kthread+0x134/0x180
[   27.669641]  ret_from_fork+0x1f/0x30

Instead of cleaning files on overflow, return back overflow
cancellation into io_uring_cancel_files(). Previously it was racy to
clean the REQ_F_OVERFLOW flag, but we got rid of it, and can do it
through repetitive attempts targeting all matching requests.

Cc: stable@vger.kernel.org # 5.9+
Reported-by: Abaci <abaci@linux.alibaba.com>
Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 7bccd1c19128140b9fefaa43808924c6932bef5b
bugzilla: 47876

--------------------------------

[ Upstream commit 4aa84f2f ]

CPU0                                CPU1
----                                ----
lock(&new->fa_lock);
                                    local_irq_disable();
                                    lock(&ctx->completion_lock);
                                    lock(&new->fa_lock);
<Interrupt>
lock(&ctx->completion_lock);

 *** DEADLOCK ***

Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
so it doesn't hold completion_lock while doing it. That saves us from
the reported deadlock, and it's just nice to shorten the locking time
and untangle nested locks (compl_lock -> wq_head::lock).

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 186725a80c4e931b6fe31b94d66c989d5f2354c1
bugzilla: 47876

--------------------------------

[ Upstream commit 0b5cd6c3 ]

If there are no requests at the time __io_uring_task_cancel() is
called, tctx_inflight() returns zero and it terminates, not getting a
chance to go through __io_uring_files_cancel() and do
io_disable_sqo_submit(). And we absolutely want them disabled by the
time cancellation ends.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 54b4c4f9aba9e5d1ef6877f42a57895b189107c9
bugzilla: 47876

--------------------------------

[ Upstream commit 4325cb49 ]

WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096 io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
Call Trace:
 filp_close+0xb4/0x170 fs/open.c:1280
 close_files fs/file.c:401 [inline]
 put_files_struct fs/file.c:416 [inline]
 put_files_struct+0x1cc/0x350 fs/file.c:413
 exit_files+0x7e/0xa0 fs/file.c:433
 do_exit+0xc22/0x2ae0 kernel/exit.c:820
 do_group_exit+0x125/0x310 kernel/exit.c:922
 get_signal+0x3e9/0x20a0 kernel/signal.c:2770
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

An SQPOLL ring creator task may have gotten rid of its file note
during exit and called io_disable_sqo_submit(), but the io_uring is
still left referenced through fdtable, which will be put during
close_files() and cause a false positive warning.

First split the warning into two for more clarity when it is hit, and
then add a sqo_dead check to handle the described case.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 0682759126bc761c325325ca809ae99c93fda2a0
bugzilla: 47876

--------------------------------

[ Upstream commit 6b393a1f ]

WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884 io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
Call Trace:
 io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
 filp_close+0xb4/0x170 fs/open.c:1280
 close_fd+0x5c/0x80 fs/file.c:626
 __do_sys_close fs/open.c:1299 [inline]
 __se_sys_close fs/open.c:1297 [inline]
 __x64_sys_close+0x2f/0xa0 fs/open.c:1297
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_uring's final close() may be triggered by any task, not only the
creator. It's well handled by io_uring_flush() including the SQPOLL
case, though a warning in io_disable_sqo_submit() will fallaciously
fire. Fix that by moving this warning out to the only call site that
matters.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 8cb6f4da831bc51145aee3a923f03114121dea6b
bugzilla: 47876

--------------------------------

[ Upstream commit 06585c49 ]

WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717 io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717
Call Trace:
 io_uring_release+0x3e/0x50 fs/io_uring.c:8759
 __fput+0x283/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x190 kernel/task_work.c:140
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
 exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

A failed io_uring_install_fd() is a special case: we don't do
io_ring_ctx_wait_and_kill() directly but defer it to fput, though we
still need to io_disable_sqo_submit() before that.

Note: it doesn't fix any real problem, just a warning. That's because
the sqring won't be available to userspace in this case and so SQPOLL
won't submit anything.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 0e3562e3b2aeb4a6aa4615185a8f59c51cade61b
bugzilla: 47876

--------------------------------

[ Upstream commit b4411616 ]

general protection fault, probably for non-canonical address 0xdffffc0000000022: 0000 [#1]
KASAN: null-ptr-deref in range [0x0000000000000110-0x0000000000000117]
RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline]
RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891
Call Trace:
 io_uring_create fs/io_uring.c:9711 [inline]
 io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_disable_sqo_submit() might be called before the user rings were
allocated; don't do io_ring_set_wakeup_flag() in those cases.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit a63d9157571b52f7339d6db4c2ab7bc3bfe527c0
bugzilla: 47876

--------------------------------

[ Upstream commit d9d05217 ]

When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't
want its internals like ->files and ->mm to be poked by the SQPOLL
task; it has never been nice and recently got racy. That can happen
when the owner undergoes destruction and the SQPOLL task tries to
submit new requests in parallel, and so calls
io_sq_thread_acquire*().

This patch halts SQPOLL submissions when sqo_task dies by introducing
an sqo_dead flag. Once set, the SQPOLL task must not do any
submission, which is synchronised by uring_lock as well as the new
flag.

The tricky part is to make sure that disabling always happens, which
means either the ring is discovered by the creator's do_exit() ->
cancel, or the final close() happens before it's done by the creator.
The latter is guaranteed by the fact that for SQPOLL the creator task
and only it holds exactly one file note, so either it pins up to
do_exit() or it is removed by the creator on the final put in flush
(see comments in uring_flush() around file->f_count == 2).

One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.

Note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.

Note 2: if the final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.

Cc: stable@vger.kernel.org # 5.5+
Fixes: b1b6b5a3 ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit da67631a33c342528245817cc61e36dd945665b0
bugzilla: 47876

--------------------------------

[ Upstream commit 6b5733eb ]

files_cancel() should cancel all relevant requests and drop file
notes, so we should never have file notes after that, including
on-exit fput and flush. Add a WARN_ONCE to be sure.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 18f31594ee52ed1f364e376767fb839935fd899c
bugzilla: 47876

--------------------------------

[ Upstream commit 4f793dc4 ]

A simple preparation change inlining io_uring_attempt_task_drop()
into io_uring_flush().

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Pavel Begunkov

stable inclusion
from stable-5.10.12
commit 7bf3fb6243a3b153ab1854b331ec19d67a4878bb
bugzilla: 47876

--------------------------------

[ Upstream commit b1b6b5a3 ]

To cancel io_uring requests, we either need to be able to run the
currently enqueued task_works, or to have task_work shut down by that
moment. Otherwise io_uring_cancel_files() may be waiting for requests
that won't ever complete.

Go with the first way and do cancellations before setting PF_EXITING,
and so before putting the task_work infrastructure into a transition
state where task_work_run() would better not be called.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-