Commits · 4.19.90-2304.2.0 · openeuler / Kernel

08 Apr, 2023 21 commits

!566 linux-4.19.y bugfixes backport · ce8f76ce

Committed by openeuler-ci-bot on Apr 08, 2023

Merge Pull Request from: @LiuYongQiang0816

20 bugfixes from linux-4.19.y

Link: https://gitee.com/openeuler/kernel/pulls/566

Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

bpf: add missing header file include · c6efc48a

Committed by Linus Torvalds on Apr 08, 2023

stable inclusion
from stable-v4.19.274
commit c7603df97635954165fb599e64e197efc353979b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit f3dd0c53 upstream.

Commit 74e19ef0 ("uaccess: Add speculation barrier to
copy_from_user()") built fine on x86-64 and arm64, and that's the extent
of my local build testing.

It turns out those got the <linux/nospec.h> include incidentally through
other header files (<linux/kvm_host.h> in particular), but that was not
true of other architectures, resulting in build errors

  kernel/bpf/core.c: In function ‘___bpf_prog_run’:
  kernel/bpf/core.c:1913:3: error: implicit declaration of function ‘barrier_nospec’

so just make sure to explicitly include the proper <linux/nospec.h>
header file to make everybody see it.

Fixes: 74e19ef0 ("uaccess: Add speculation barrier to copy_from_user()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Huacai Chen <chenhuacai@loongson.cn>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

uaccess: Add speculation barrier to copy_from_user() · bb47af81

Committed by Dave Hansen on Apr 07, 2023

stable inclusion
from stable-v4.19.274
commit f8e54da1c729cc23d9a7b7bd42379323e7fb7979
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit 74e19ef0 upstream.

The results of "access_ok()" can be mis-speculated.  The result is that
you can end up speculatively executing here:

	if (access_ok(from, size))
		// Right here

even for bad from/size combinations.  At first glance, it would be ideal
to just add a speculation barrier to "access_ok()" so that its results
can never be mis-speculated.

But there are lots of system calls just doing access_ok() via
"copy_to_user()" and friends (example: fstat() and friends).  Those are
generally not problematic because they do not _consume_ data from
userspace other than the pointer.  They are also very quick and common
system calls that should not be needlessly slowed down.

"copy_from_user()" on the other hand uses a user-controlled pointer and
is frequently followed up with code that might affect caches.  Take
something like this:

	if (!copy_from_user(&kernelvar, uptr, size))
		do_something_with(kernelvar);

If userspace passes in an evil 'uptr' that *actually* points to a kernel
address, and then do_something_with() has cache (or other)
side-effects, it could allow userspace to infer kernel data values.

Add a barrier to the common copy_from_user() code to prevent
mis-speculated values which happen after the copy.

Also add a stub for architectures that do not define barrier_nospec().
This makes the macro usable in generic code.

Since the barrier is now usable in generic code, the x86 #ifdef in the
BPF code can also go away.
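The placement of the barrier can be sketched in user space as follows. This is only a minimal model under stated assumptions: `fake_user_mem`, `model_access_ok()` and `model_copy_from_user()` are hypothetical names, and the `barrier_nospec()` stub is a no-op here, so the snippet shows where the barrier sits relative to the check and the copy rather than providing any real speculation hardening.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stub for targets without a real barrier, in the spirit of the commit. */
#ifndef barrier_nospec
#define barrier_nospec() do { } while (0)
#endif

/* Hypothetical model: this array is the only valid "user" memory. */
static unsigned char fake_user_mem[64];

/* Hypothetical stand-in for access_ok(): is [from, from + size) inside it? */
static int model_access_ok(const void *from, size_t size)
{
	uintptr_t start = (uintptr_t)fake_user_mem;
	uintptr_t addr = (uintptr_t)from;

	if (size > sizeof(fake_user_mem) || addr < start)
		return 0;
	return addr - start <= sizeof(fake_user_mem) - size;
}

/* Returns the number of bytes NOT copied, like copy_from_user(). */
static size_t model_copy_from_user(void *to, const void *from, size_t size)
{
	size_t not_copied = size;

	assert(to != NULL);
	if (model_access_ok(from, size)) {
		/* Keep a mis-speculated check from feeding the code after the copy. */
		barrier_nospec();
		memcpy(to, from, size);
		not_copied = 0;
	}
	if (not_copied)
		memset((unsigned char *)to + (size - not_copied), 0, not_copied);
	return not_copied;
}
```

A caller would treat a non-zero return as failure, for example `if (!model_copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar);`.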

Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>   # BPF bits
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	lib/usercopy.c

Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

random: always mix cycle counter in add_latent_entropy() · fff09007

Committed by Jason A. Donenfeld on Apr 07, 2023

stable inclusion
from stable-v4.19.274
commit e4935368448ce8097dada35163598e93567f1110
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e4935368448ce8097dada35163598e93567f1110

--------------------------------

[ Upstream commit d7bf7f3b ]

add_latent_entropy() is called every time a process forks, in
kernel_clone(). This in turn calls add_device_randomness() using the
latent entropy global state. add_device_randomness() does two things:

   1) Mixes in a cycle counter, a sort of measurement of when the event
      took place, the high precision bits of which are presumably
      difficult to predict; and
   2) Mixes into the input pool the latent entropy argument passed.

(2) is impossible without CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y. But (1) is
always possible. However, currently CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n
disables both (1) and (2), instead of just (2).

This commit causes the CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n case to still
do (1) by passing NULL (len 0) to add_device_randomness() when
add_latent_entropy() is called.
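A minimal user-space sketch of this behaviour follows. All `model_*` names, the toy mixing function and the `MODEL_LATENT_ENTROPY_PLUGIN` macro are assumptions made for illustration; only the shape (always mix a timestamp, mix the buffer only when one is passed) is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t pool;             /* toy stand-in for the input pool */
static unsigned int cycle_mixes;  /* how often a "cycle counter" was mixed */
static unsigned int data_mixes;   /* how often caller data was mixed */

/* Hypothetical cycle counter: a value that changes on every call. */
static uint64_t model_cycles(void)
{
	static uint64_t t;

	return t += 12345;
}

/* Toy mixing step, not a real entropy extractor. */
static void mix(uint64_t v)
{
	pool = (pool ^ v) * 0x100000001b3ULL;
}

static void model_add_device_randomness(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	mix(model_cycles());          /* always possible */
	cycle_mixes++;
	for (i = 0; i < len; i++)     /* skipped entirely for (NULL, 0) */
		mix(p[i]);
	if (len)
		data_mixes++;
}

#ifdef MODEL_LATENT_ENTROPY_PLUGIN
static uint64_t model_latent_entropy = 0x1234567890abcdefULL;

static void model_add_latent_entropy(void)
{
	model_add_device_randomness(&model_latent_entropy,
				    sizeof(model_latent_entropy));
}
#else
/* Without the plugin there is no latent state, but the timestamp still counts. */
static void model_add_latent_entropy(void)
{
	model_add_device_randomness(NULL, 0);
}
#endif
```

With the macro left undefined, each call to `model_add_latent_entropy()` still advances `cycle_mixes` while `data_mixes` stays at zero.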

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Fixes: 38addce8 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Conflicts:
	include/linux/random.h

Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: yiyang <yiyang13@huawei.com>
Reviewed-by: guozihua <guozihua@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

x86/mm: Fix use of uninitialized buffer in sme_enable() · 8295451e

Committed by Nikita Zhandarovich on Apr 07, 2023

stable inclusion
from stable-v4.19.279
commit ffdf8d81c48822a329af9f31dc239090f4a60761
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit cbebd68f upstream.

cmdline_find_option() may fail before doing any initialization of
the buffer array. This may lead to unpredictable results when the same
buffer is used later in calls to strncmp() function.  Fix the issue by
returning early if cmdline_find_option() returns an error.
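The early-return pattern can be illustrated with a small user-space sketch. `model_find_option()` is a hypothetical, simplified stand-in for cmdline_find_option() (it only looks at the first occurrence of the option name), and `model_sme_wanted()` is an assumed helper name; the point is only that the buffer is never compared when the lookup reports an error.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for cmdline_find_option(): copies the value of
 * "opt=value" into buf and returns its length, or returns -1 on failure
 * WITHOUT writing to buf.
 */
static int model_find_option(const char *cmdline, const char *opt,
			     char *buf, size_t bufsize)
{
	const char *p;
	size_t optlen, n = 0;

	if (!cmdline || !opt || bufsize == 0)
		return -1;              /* buf is left untouched */
	optlen = strlen(opt);
	p = strstr(cmdline, opt);
	if (!p || p[optlen] != '=')
		return -1;              /* buf is left untouched */
	p += optlen + 1;
	while (p[n] && p[n] != ' ' && n + 1 < bufsize) {
		buf[n] = p[n];
		n++;
	}
	buf[n] = '\0';
	return (int)n;
}

/* Returns 1 only for "mem_encrypt=on"; never reads an unset buffer. */
static int model_sme_wanted(const char *cmdline)
{
	char buffer[16];                /* deliberately not initialized */

	if (model_find_option(cmdline, "mem_encrypt", buffer,
			      sizeof(buffer)) < 0)
		return 0;               /* return early on error */
	return strncmp(buffer, "on", sizeof(buffer)) == 0;
}
```

Without the early return, the `strncmp()` call would read whatever happened to be on the stack whenever the lookup failed.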

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: aca20d54 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230306160656.14844-1-n.zhandarovich@fintech.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ext4: fail ext4_iget if special inode unallocated · 9188d638

Committed by Baokun Li on Apr 07, 2023

stable inclusion
from stable-v4.19.279
commit 3aea195acd977e82d970cbc7078f983880c7ee6a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit 5cd74028 ]

In ext4_fill_super(), EXT4_ORPHAN_FS flag is cleared after
ext4_orphan_cleanup() is executed. Therefore, when __ext4_iget() is
called to get an inode whose i_nlink is 0 when the flag exists, no error
is returned. If the inode is a special inode, a null pointer dereference
may occur. If the value of i_nlink is 0 for any inodes (except boot loader
inodes) got by using the EXT4_IGET_SPECIAL flag, the current file system
is corrupted. Therefore, make the ext4_iget() function return an error if
it gets such an abnormal special inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199179
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216541
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216539
Reported-by: Luís Henriques <lhenriques@suse.de>
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230107032126.4165860-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ext4: zero i_disksize when initializing the bootloader inode · 484d2779

Committed by Zhihao Cheng on Apr 07, 2023

stable inclusion
from stable-v4.19.278
commit 59eee0cdf8c036f554add97a4da7c06d7a9ff34a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit f5361da1 upstream.

If the boot loader inode has never been used before, the
EXT4_IOC_SWAP_BOOT ioctl will initialize it, including setting the
i_size to 0.  However, if the "never before used" boot loader has a
non-zero i_size, then i_disksize will be non-zero, and the
inconsistency between i_size and i_disksize can trigger a kernel
warning:

 WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
 RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
 Call Trace:
  vfs_write+0x3b1/0x5c0
  ksys_write+0x77/0x160
  __x64_sys_write+0x22/0x30
  do_syscall_64+0x39/0x80

Reproducer:
 1. create corrupted image and mount it:
       mke2fs -t ext4 /tmp/foo.img 200
       debugfs -wR "sif <5> size 25700" /tmp/foo.img
       mount -t ext4 /tmp/foo.img /mnt
       cd /mnt
       echo 123 > file
 2. Run the reproducer program:
       posix_memalign(&buf, 1024, 1024);
       fd = open("file", O_RDWR | O_DIRECT);
       ioctl(fd, EXT4_IOC_SWAP_BOOT);
       write(fd, buf, 1024);

Fix this by setting i_disksize as well as i_size to zero when
initializing the boot loader inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

irqdomain: Drop bogus fwspec-mapping error handling · 8c2b6143

Committed by Johan Hovold on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 525eb5cb8edfb7014711c2c87827ed17af8872fb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit e3b7ab02 upstream.

In case a newly allocated IRQ ever ends up not having any associated
struct irq_data it would not even be possible to dispose the mapping.

Replace the bogus disposal with a WARN_ON().

This will also be used to fix a shared-interrupt mapping race, hence the
CC-stable tag.

Fixes: 1e2a7d78 ("irqdomain: Don't set type when mapping an IRQ")
Cc: stable@vger.kernel.org      # 4.8
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-4-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

irqdomain: Fix disassociation race · 02f71880

Committed by Johan Hovold on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 1b9fe6e4930155f1083a4dc32d3a47b1c4e8f55c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit 3f883c38 upstream.

The global irq_domain_mutex is held when mapping interrupts from
non-hierarchical domains but currently not when disposing them.

This specifically means that updates of the domain mapcount are racy
(currently only used for statistics in debugfs).

Make sure to hold the global irq_domain_mutex also when disposing
mappings from non-hierarchical domains.

Fixes: 9dc6be3d ("genirq/irqdomain: Add map counter")
Cc: stable@vger.kernel.org      # 4.13
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

irqdomain: Fix association race · 1db52082

Committed by Johan Hovold on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 2dcccf91bc4e9937dccf86c9b1f5026ffd72a80b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit b06730a5 upstream.

The sanity check for an already mapped virq is done outside of the
irq_domain_mutex-protected section which means that an (unlikely) racing
association may not be detected.

Fix this by factoring out the association implementation, which will
also be used in a follow-on change to fix a shared-interrupt mapping
race.

Fixes: ddaf144c ("irqdomain: Refactor irq_domain_associate_many()")
Cc: stable@vger.kernel.org      # 3.11
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range · a96794fa

Committed by Yang Jihong on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit add105f090a6ad3af1caaf1d81f896208cfc7afb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit f1c97a1b upstream.

When arch_prepare_optimized_kprobe calculates the jump destination address,
it copies the original instructions from the jmp-optimized kprobe (see
__recover_optprobed_insn), and the calculation is based on the length of the
original instruction.

arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMIZED when
checking whether a jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by the jump destination address, resulting in an invalid opcode error.

For example, assume that two kprobes are registered at addresses
<func+9> and <func+11> in the "func" function.
The original code of the "func" function is as follows:

   0xffffffff816cb5e9 <+9>:     push   %r12
   0xffffffff816cb5eb <+11>:    xor    %r12d,%r12d
   0xffffffff816cb5ee <+14>:    test   %rdi,%rdi
   0xffffffff816cb5f1 <+17>:    setne  %r12b
   0xffffffff816cb5f5 <+21>:    push   %rbp

1. Register the kprobe for <func+11>; call it kp1, with corresponding optimized_kprobe op1.
   After the optimization, the "func" code changes to:

   0xffffffff816cc079 <+9>:     push   %r12
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

Now op1->flags == KPROBE_FLAG_OPTIMIZED;

2. Register the kprobe for <func+9>; call it kp2, with corresponding optimized_kprobe op2.

register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1->optinsn.copied_insn,
                                      // jump address = <func+14>

3. disable kp1:

disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p->flags |= KPROBE_FLAG_DISABLED;  // op1->flags ==  KPROBE_FLAG_OPTIMIZED | KPROBE_FLAG_DISABLED
    ...

4. unregister kp2
__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) && !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) < 0) // op1 has KPROBE_FLAG_DISABLED, so this does not return
        return;
      p->kp.flags |= KPROBE_FLAG_OPTIMIZED;   //  now op2 has KPROBE_FLAG_OPTIMIZED
  }

"func" code now is:

   0xffffffff816cc079 <+9>:     int3
   0xffffffff816cc07a <+10>:    push   %rsp
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

5. If "func" is called, the int3 handler calls setup_detour_execution:

  if (p->flags & KPROBE_FLAG_OPTIMIZED) {
    ...
    regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
    ...
  }

The code for the destination address is

   0xffffffffa021072c:  push   %r12
   0xffffffffa021072e:  xor    %r12d,%r12d
   0xffffffffa0210731:  jmp    0xffffffff816cb5ee <func+14>

However, <func+14> is not a valid instruction start address. As a result, an error occurs.

Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/

Fixes: f66c0447 ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

x86/kprobes: Fix __recover_optprobed_insn check optimizing logic · 754bca97

Committed by Yang Jihong on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 4334c26f53585a45455af324c08a4b0036bfaa8d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit 868a6fc0 upstream.

Since the following commit:

  commit f66c0447 ("kprobes: Set unoptimized flag after unoptimizing code")

modified the update timing of the KPROBE_FLAG_OPTIMIZED, an optimized_kprobe
may be in the optimizing or unoptimizing state when op.kp->flags
has KPROBE_FLAG_OPTIMIZED and op->list is not empty.

The __recover_optprobed_insn check logic is incorrect: a kprobe in the
unoptimizing state may be incorrectly determined as unoptimized.
As a result, incorrect instructions are copied.

The optprobe_queued_unopt function needs to be exported so that it can be
invoked from the arch directory.

Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/

Fixes: f66c0447 ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

x86/bugs: Reset speculation control settings on init · 4baf67fc

Committed by Breno Leitao on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit ca582161f5900991e26240e17f30740a8f3b9f2b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit 0125acda ]

Currently, x86_spec_ctrl_base is read at boot time and speculative bits
are set if Kconfig items are enabled. For example, IBRS is enabled if
CONFIG_CPU_IBRS_ENTRY is configured, etc. These MSR bits are not cleared
if the mitigations are disabled.

This is a problem when kexec-ing a kernel that has the mitigation
disabled from a kernel that has the mitigation enabled. In this case,
the MSR bits are not cleared during the new kernel boot. As a result,
this might cause some performance degradation that is hard to pinpoint.

This problem does not happen if the machine is (hard) rebooted because
the bit will be cleared by default.

  [ bp: Massage. ]

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221128153148.1129350-1-leitao@debian.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

timers: Prevent union confusion from unexpected restart_syscall() · 960d6b2a

Committed by Jann Horn on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 003e49fab13d0de9cda625489c402e5d18012a8b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit 9f76d591 ]

The nanosleep syscalls use the restart_block mechanism, with a quirk:
The `type` and `rmtp`/`compat_rmtp` fields are set up unconditionally on
syscall entry, while the rest of the restart_block is only set up in the
unlikely case that the syscall is actually interrupted by a signal (or
pseudo-signal) that doesn't have a signal handler.

If the restart_block was set up by a previous syscall (futex(...,
FUTEX_WAIT, ...) or poll()) and hasn't been invalidated somehow since then,
this will clobber some of the union fields used by futex_wait_restart() and
do_restart_poll().

If userspace afterwards wrongly calls the restart_syscall syscall,
futex_wait_restart()/do_restart_poll() will read struct fields that have
been clobbered.

This doesn't actually lead to anything particularly interesting because
none of the union fields contain trusted kernel data, and
futex(..., FUTEX_WAIT, ...) and poll() aren't syscalls where it makes much
sense to apply seccomp filters to their arguments.

So the current consequences are just of the "if userspace does bad stuff,
it can damage itself, and that's not a problem" flavor.

But still, it seems like a hazard for future developers, so invalidate the
restart_block when partly setting it up in the nanosleep syscalls.
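The hazard and the fix can be modelled in user space roughly as below. The struct layout and every `model_*` name are assumptions for illustration, not the kernel's definitions; the sketch only shows that resetting the restart function pointer while partially filling the union keeps a stale restart handler from reading clobbered fields.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, much reduced model of a restart block with a shared union. */
struct model_restart_block {
	long (*fn)(struct model_restart_block *);
	union {
		struct {
			unsigned int *uaddr;
			unsigned int val;
		} futex;
		struct {
			int type;
			void *rmtp;
		} nanosleep;
	};
};

/* "No restart pending": a later restart request just fails with -EINTR. */
static long model_no_restart(struct model_restart_block *rb)
{
	(void)rb;
	return -EINTR;
}

/* A restart handler that trusts the futex view of the union. */
static long model_futex_restart(struct model_restart_block *rb)
{
	return (long)rb->futex.val;
}

/* Syscall entry of the nanosleep model: only part of the block is set up. */
static void model_nanosleep_entry(struct model_restart_block *rb, void *rmtp)
{
	rb->fn = model_no_restart;      /* invalidate whatever was there before */
	rb->nanosleep.type = 1;
	rb->nanosleep.rmtp = rmtp;
}
```

After `model_nanosleep_entry()` runs, calling `rb.fn(&rb)` can no longer reach `model_futex_restart()` with a union that nanosleep has overwritten.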

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230105134403.754986-1-jannh@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

crypto: rsa-pkcs1pad - Use akcipher_request_complete · 90269052

Committed by Herbert Xu on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit a023f1a938ad43642ad68f68527001e5686f5e60
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit 564cabc0 ]

Use the akcipher_request_complete helper instead of calling the
completion function directly.  In fact the previous code was buggy
in that EINPROGRESS was never passed back to the original caller.

Fixes: 3d5b1ecd ("crypto: rsa - RSA padding algorithm")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

crypto: seqiv - Handle EBUSY correctly · 6196963f

Committed by Herbert Xu on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 1effbddaff60eeef8017c6dea1ee0ed970164d14
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit 32e62025 ]

As it is seqiv only handles the special return value of EINPROGRESS,
which means that in all other cases it will free data related to the
request.

However, as the caller of seqiv may specify MAY_BACKLOG, we also need
to expect EBUSY and treat it in the same way.  Otherwise backlogged
requests will trigger a use-after-free.
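A reduced user-space sketch of the completion-side rule follows. `struct model_req` and `model_seqiv_complete()` are hypothetical names and the request holds a single heap buffer for simplicity; the sketch assumes only what the text states, namely that both -EINPROGRESS and -EBUSY mean the request is still in flight and its data must not be freed yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical request that owns one temporary buffer. */
struct model_req {
	unsigned char *iv_copy;
};

/* Completion handler: free request data only once the request is final. */
static void model_seqiv_complete(struct model_req *req, int err)
{
	/*
	 * -EINPROGRESS: accepted and still running.
	 * -EBUSY: queued on the backlog, which a caller may allow.
	 * In both cases a later completion will still use the buffer.
	 */
	if (err == -EINPROGRESS || err == -EBUSY)
		return;

	free(req->iv_copy);
	req->iv_copy = NULL;
}
```

Handling only -EINPROGRESS would free the buffer when a backlogged request reports -EBUSY, and the eventual real completion would then touch freed memory.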

Fixes: 0a270321 ("[CRYPTO] seqiv: Add Sequence Number IV Generator")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ACPI: battery: Fix missing NUL-termination with large strings · 11b6c1a2

Committed by Armin Wolf on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 6671af7f52c382963b482c6ae55f3e6ee582e0f6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit f2ac14b5 ]

When encountering a string bigger than the destination buffer (32 bytes),
the string is not properly NUL-terminated, causing buffer overreads later.

This for example happens on the Inspiron 3505, where the battery
model name is larger than 32 bytes, which leads to sysfs showing
the model name together with the serial number string (which is
NUL-terminated and thus prevents worse).

Fix this by using strscpy() which ensures that the result is
always NUL-terminated.
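Since strscpy() is a kernel helper, the sketch below uses a hypothetical user-space look-alike, `model_strscpy()`, written only to show the property that matters here: the destination is always NUL-terminated and truncation is reported. The 8-byte buffer in the usage note is an arbitrary choice, not the driver's 32-byte field.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical look-alike of the kernel's strscpy(): copies at most
 * size - 1 bytes, always NUL-terminates when size > 0, and returns the
 * number of bytes copied or -E2BIG if the source did not fit.
 */
static long model_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;
	for (i = 0; i + 1 < size && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';                  /* terminator is written unconditionally */
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

For example, copying a 16-character model name into `char model[8]` leaves the first seven characters plus a terminator in the buffer, so a later `strlen()` or `printf("%s")` cannot run past the end of the field.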

Fixes: 106449e8 ("ACPI: Battery: Allow extract string from integer")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ACPICA: nsrepair: handle cases without a return value correctly · 721f9208

Committed by Daniil Tatianin on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit 331db828d34c37c466e4915fd50ea51415da143c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit ca843a4c ]

Previously acpi_ns_simple_repair() would crash if expected_btypes
contained any combination of ACPI_RTYPE_NONE with a different type,
e.g. ACPI_RTYPE_NONE | ACPI_RTYPE_INTEGER, because of slightly incorrect logic in the
!return_object branch, which wouldn't return AE_AML_NO_RETURN_VALUE
for such cases.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Link: https://github.com/acpica/acpica/pull/811
Fixes: 61db45ca ("ACPICA: Restore code that repairs NULL package elements in return values.")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

genirq: Fix the return type of kstat_cpu_irqs_sum() · 09f78ae4

Committed by Zhen Lei on Apr 07, 2023

stable inclusion
from stable-v4.19.276
commit b84d49628b0fcb1c4925b565ccfeb50a0b0f4630
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit 47904aed ]

The type of member ->irqs_sum is unsigned long, but kstat_cpu_irqs_sum()
returns int, which can result in truncation.  Therefore, change the
kstat_cpu_irqs_sum() function's return value to unsigned long to avoid
truncation.
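The truncation is easy to reproduce outside the kernel. In the sketch below, `uint64_t` and `uint32_t` are used as portable stand-ins for a 64-bit `unsigned long` counter and a narrower return type; the struct and function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU statistics with a 64-bit interrupt counter. */
struct model_kstat {
	uint64_t irqs_sum;
};

/* Narrow return type: the upper 32 bits are silently dropped. */
static uint32_t model_irqs_sum_narrow(const struct model_kstat *k)
{
	return (uint32_t)k->irqs_sum;
}

/* Return type matches the member, so the full count survives. */
static uint64_t model_irqs_sum_wide(const struct model_kstat *k)
{
	return k->irqs_sum;
}
```

With `irqs_sum` set to `0x100000005`, the narrow accessor yields 5 while the wide one returns the full value.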

Fixes: f2c66cd8 ("/proc/stat: scalability of irq num per cpu")
Reported-by: Elliott, Robert (Servers) <elliott@hpe.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Josh Don <joshdon@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ACPI: NFIT: fix a potential deadlock during NFIT teardown · e0ccb64b

Committed by Vishal Verma on Apr 07, 2023

stable inclusion
from stable-v4.19.275
commit 5f8401d7dba21e549306fe4a7a9ff8bf3bd8d56a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

[ Upstream commit fb6df436 ]

Lockdep reports that acpi_nfit_shutdown() may deadlock against an
opportune acpi_nfit_scrub(). acpi_nfit_scrub() is run from inside a
'work' and therefore has already acquired workqueue-internal locks. It
also acquires acpi_desc->init_mutex. acpi_nfit_shutdown() first
acquires init_mutex, and was subsequently attempting to cancel any
pending workqueue items. This reversed locking order causes a potential
deadlock:

    ======================================================
    WARNING: possible circular locking dependency detected
    6.2.0-rc3 #116 Tainted: G           O     N
    ------------------------------------------------------
    libndctl/1958 is trying to acquire lock:
    ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450

    but task is already holding lock:
    ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit]

    which lock already depends on the new lock.

    ...

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&acpi_desc->init_mutex);
                                  lock((work_completion)(&(&acpi_desc->dwork)->work));
                                  lock(&acpi_desc->init_mutex);
     lock((work_completion)(&(&acpi_desc->dwork)->work));

    *** DEADLOCK ***

Since the workqueue manipulation is protected by its own internal locking,
the cancellation of pending work doesn't need to be done under
acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the
init_mutex to fix the deadlock. Any work that starts after
acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the
cancel_delayed_work_sync() will safely flush it out.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20230112-acpi_nfit_lockdep-v1-1-660be4dd10be@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

alarmtimer: Prevent starvation by small intervals and SIG_IGN · 32bc4886

Committed by Thomas Gleixner on Apr 07, 2023

stable inclusion
from stable-v4.19.274
commit d6a300076d11a6e27b4d4f7fd986ec66ee97a3e1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TIG1
CVE: NA

--------------------------------

commit d125d134 upstream.

syzbot reported an RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path.  See posix_timer_fn() and commit
58229a18 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.

The reproducer does not set SIG_IGN explicitly, but it sets up the timer's
signal with SIGCONT. That has the same effect as explicitly setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.

The log clearly shows that:

   [pid  5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---

It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:

   [pid  5087] kill(-5102, SIGKILL <unfinished ...>
   ...
   ./strace-static-x86_64: Process 5107 detached

and after it's gone the stall can be observed:

   syzkaller login: [   79.439102][    C0] hrtimer: interrupt took 68471 ns
   [  184.460538][    C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
   ...
   [  184.658237][    C1] rcu: Stack dump where RCU GP kthread last ran:
   [  184.664574][    C1] Sending NMI from CPU 1 to CPUs 0:
   [  184.669821][    C0] NMI backtrace for cpu 0
   [  184.669831][    C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
   ...
   [  184.670036][    C0] Call Trace:
   [  184.670041][    C0]  <IRQ>
   [  184.670045][    C0]  alarmtimer_fired+0x327/0x670

posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffy and
artificially delaying it by shifting the next expiry out by a jiffy. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettime(2).

The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:

Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffy.
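
The rule stated above can be sketched in plain C. This is only an illustration of the stated workaround, not the code of the patch: `clamp_rearm_interval()` and `TICK_NSEC_SKETCH` are hypothetical names, and the 4 ms tick is an assumption (it corresponds to HZ=250; a real kernel derives the tick length from its configured HZ).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length in nanoseconds (4 ms, i.e. HZ=250). Hypothetical value. */
#define TICK_NSEC_SKETCH 4000000LL

/*
 * Return the interval to use when a periodic timer rearms itself.
 * interval_ns == 0 means one-shot, which never self-rearms.
 * Only timers whose signal is ignored are slowed down to one tick.
 */
int64_t clamp_rearm_interval(int64_t interval_ns, bool signal_ignored)
{
    if (interval_ns > 0 && signal_ignored && interval_ns < TICK_NSEC_SKETCH)
        return TICK_NSEC_SKETCH;
    return interval_ns;
}
```

With the reproducer's 9ns interval and an ignored signal, the sketch returns one tick, so the timer can no longer monopolize the CPU by rearming itself back to back. Timers whose signal is delivered keep their requested interval.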

Interestingly this has been fixed before via commit ff86bf0c
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.

Reported-by: syzbot+b9564ba6e8e00694511b@syzkaller.appspotmail.com
Fixes: f2c45807 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

32bc4886

07 Apr, 2023 19 commits

ring-buffer: Fix race while reader and writer are on the same page · 19ad53da

Committed by Zheng Yejian on Apr 07, 2023

mainline inclusion
from mainline-v6.3-rc6
commit 6455b616
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TJ97
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6455b6163d8c680366663cdb8c679514d55fc30c

--------------------------------

When a user reads the file 'trace_pipe', the kernel keeps printing the
following logs, which warn at "cpu_buffer->reader_page->read >
rb_page_size(reader)" in rb_get_reader_page(). It looks like an infinite
loop in tracing_read_pipe(). The problem occurred several times on the
arm64 platform when testing v5.10 and below.

  Call trace:
   rb_get_reader_page+0x248/0x1300
   rb_buffer_peek+0x34/0x160
   ring_buffer_peek+0xbc/0x224
   peek_next_entry+0x98/0xbc
   __find_next_entry+0xc4/0x1c0
   trace_find_next_entry_inc+0x30/0x94
   tracing_read_pipe+0x198/0x304
   vfs_read+0xb4/0x1e0
   ksys_read+0x74/0x100
   __arm64_sys_read+0x24/0x30
   el0_svc_common.constprop.0+0x7c/0x1bc
   do_el0_svc+0x2c/0x94
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

I then dumped the vmcore and looked into the problematic per_cpu
ring_buffer. tail_page, commit_page and reader_page are on the same page,
while reader_page->read is obviously abnormal:
  tail_page == commit_page == reader_page == {
    .write = 0x100d20,
    .read = 0x8f9f4805,  // Far greater than 0xd20, obviously abnormal!!!
    .entries = 0x10004c,
    .real_end = 0x0,
    .page = {
      .time_stamp = 0x857257416af0,
      .commit = 0xd20,  // This page has not been completely filled.
      // .data[0...0xd20] seems normal.
    }
 }

The root cause is most likely a race in which the reader and the writer
are on the same page and the reader sees an event that has not been fully
committed by the writer.

To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.
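
The barrier pairing described above can be illustrated in userspace with C11 atomics. This is a minimal sketch, not the ring-buffer code: `sketch_page`, `sketch_write()` and `sketch_read()` are hypothetical names, a single writer is assumed, and release/acquire ordering stands in for the kernel's write and read barriers.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PAGE_DATA_SIZE 4096

struct sketch_page {
    char data[PAGE_DATA_SIZE];
    atomic_size_t commit;   /* number of bytes the reader may consume */
};

/* Writer: copy the event first, then publish the new commit offset. */
int sketch_write(struct sketch_page *p, const void *ev, size_t len)
{
    size_t off = atomic_load_explicit(&p->commit, memory_order_relaxed);

    if (len > PAGE_DATA_SIZE - off)
        return -1;          /* event does not fit on this page */
    memcpy(p->data + off, ev, len);
    /* Release ordering acts as the write barrier: data before commit. */
    atomic_store_explicit(&p->commit, off + len, memory_order_release);
    return 0;
}

/* Reader: load commit with acquire, then read at most that many bytes. */
size_t sketch_read(struct sketch_page *p, size_t read_off, void *out, size_t cap)
{
    size_t commit = atomic_load_explicit(&p->commit, memory_order_acquire);
    size_t avail = commit > read_off ? commit - read_off : 0;
    size_t n = avail < cap ? avail : cap;

    if (n == 0)
        return 0;           /* nothing committed beyond read_off */
    memcpy(out, p->data + read_off, n);
    return n;
}
```

Without the release store on the writer side, a reader on a weakly ordered CPU could observe the new commit offset before the event bytes themselves, which matches the symptom of a reader consuming a partially written event.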

Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 77ae365e ("ring-buffer: make lockless")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

19ad53da

cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all() · 6ea7c5e7

Committed by Tetsuo Handa on Apr 07, 2023

stable inclusion
from stable-v4.19.280
commit 321488cfac7d0eb6d97de467015ff754f85813ff
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6TI3Y
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=321488cfac7d0eb6d97de467015ff754f85813ff

--------------------------------

commit 43626dad upstream.

syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236 ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e [1]
Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4f7e7236 ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

conflicts:
	kernel/cgroup/cgroup-internal.h
	kernel/cgroup/cgroup-v1.c
	kernel/cgroup/cgroup.c
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6ea7c5e7

cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock · b48ceb10

Committed by Tejun Heo on Apr 07, 2023

stable inclusion
from stable-v4.19.280
commit e446300968c6bd25d9cd6c33b9600780a39b3975
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6TI3Y
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e446300968c6bd25d9cd6c33b9600780a39b3975

--------------------------------

commit 4f7e7236 upstream.

Add #include <linux/cpu.h> to avoid a compile error on some architectures.

commit 9a3284fa ("cgroup: Optimize single thread migration") and
commit 671c11f0 ("cgroup: Elide write-locking threadgroup_rwsem
when updating csses on an empty subtree") are not backported, so the
input parameter of cgroup_attach_lock()/cgroup_attach_unlock() is ignored.

original commit message:

Bringing up a CPU may involve creating and destroying tasks which requires
read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
cpus_read_lock(). However, cpuset's ->attach(), which may be called with
threadgroup_rwsem write-locked, also wants to disable CPU hotplug and
acquires cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that ->attach() is always called with CPU hotplug
disabled and removing cpus_read_lock() call from cpuset_attach().
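
The resulting lock nesting can be modelled with a few lines of C. This is a single-threaded sketch that only tracks which locks are "held"; it does not perform real locking. `sketch_locks`, `sketch_attach_lock()`, `sketch_attach_unlock()` and `sketch_cpuset_attach()` are hypothetical stand-ins for the ordering that cgroup_attach_lock()/cgroup_attach_unlock() and cpuset's ->attach() are described as following.

```c
#include <stdbool.h>

/* Stand-ins for the two locks; they only record the held state. */
struct sketch_locks {
    int  cpus_read_held;      /* models the cpus_read_lock() nesting count */
    bool threadgroup_write;   /* models threadgroup_rwsem held for write */
};

/* Attach path: disable CPU hotplug first, then write-lock the rwsem. */
void sketch_attach_lock(struct sketch_locks *l)
{
    l->cpus_read_held++;            /* outer lock */
    l->threadgroup_write = true;    /* inner lock */
}

/* Release in the reverse order of acquisition. */
void sketch_attach_unlock(struct sketch_locks *l)
{
    l->threadgroup_write = false;
    l->cpus_read_held--;
}

/*
 * Models cpuset's ->attach(): it takes no hotplug lock itself and
 * relies on the caller having done so. Returns true when the
 * precondition holds.
 */
bool sketch_cpuset_attach(const struct sketch_locks *l)
{
    return l->cpus_read_held > 0 && l->threadgroup_write;
}
```

Because both the CPU bring-up path and the attach path now take the hotplug lock before threadgroup_rwsem, the two paths agree on the order and the inversion described above cannot occur.
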
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com>
Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com>
Fixes: 05c7b7a9 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b48ceb10

cgroup/cpuset: Change cpuset_rwsem and hotplug lock order · f49afae4

Committed by Juri Lelli on Apr 07, 2023

stable inclusion
from stable-v4.19.280
commit 224262583fabf3b6bf2a29d033cf9a8f28fde843
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6TI3Y
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=224262583fabf3b6bf2a29d033cf9a8f28fde843

--------------------------------

commit d74b27d6 upstream.

commit 1243dc51 ("cgroup/cpuset: Convert cpuset_mutex to
percpu_rwsem") is a performance patch which is not backported, so
percpu_rwsem is converted to cpuset_mutex here.

commit aa44002e7db25 ("cpuset: Fix unsafe lock order between
cpuset lock and cpuslock") keeps the lock order cpuset_mutex ->
cpu_hotplug_lock, so the lock order in cpuset_attach() has to change.

original commit message:

cpuset_rwsem is going to be acquired from sched_setscheduler() with a
following patch. There are however paths (e.g., spawn_ksoftirqd) in
which sched_setscheduler() is eventually called while holding hotplug lock;
this creates a dependency between hotplug lock (to be always acquired
first) and cpuset_rwsem (to be always acquired after hotplug lock).

Fix paths which currently take the two locks in the wrong order (after
a following patch is applied).
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-7-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f49afae4

Revert "cgroup/cpuset: Change cpuset_rwsem and hotplug lock order" · 1ffff695

Committed by Cai Xinchen on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TI3Y
CVE: NA

--------------------------------

This reverts commit c831178a.
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

1ffff695

Revert "cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock" · 5faaedb6

Committed by Cai Xinchen on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TI3Y
CVE: NA

--------------------------------

This reverts commit 4924308a.
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

5faaedb6

Revert "cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()" · 7d8391de

Committed by Cai Xinchen on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TI3Y
CVE: NA

--------------------------------

This reverts commit c2d83556.
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

7d8391de

block: fix wrong mode for blkdev_put() from disk_scan_partitions() · ad2c2fb9

Committed by Yu Kuai on Apr 07, 2023

mainline inclusion
from mainline-v6.3-rc2
commit 428913bc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f

--------------------------------

If disk_scan_partitions() is called with 'FMODE_EXCL',
blkdev_get_by_dev() is called without 'FMODE_EXCL'. However, the
following blkdev_put() is still called with 'FMODE_EXCL', which causes
the 'bd_holders' counter to leak.

Fix the problem by using the right mode for blkdev_put().
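
The mismatch can be reproduced with a toy model in C. This is a sketch under stated assumptions, not the block layer code: `sketch_bdev`, `sketch_get()`, `sketch_put()` and `sketch_scan()` are hypothetical, the two counters loosely mirror the opener and holder accounting, and `SKETCH_FMODE_EXCL` is an illustrative flag value.

```c
#define SKETCH_FMODE_EXCL 0x80u   /* illustrative flag value */

struct sketch_bdev {
    int openers;   /* every open counts here */
    int holders;   /* only exclusive opens count here */
};

void sketch_get(struct sketch_bdev *b, unsigned int mode)
{
    b->openers++;
    if (mode & SKETCH_FMODE_EXCL)
        b->holders++;
}

void sketch_put(struct sketch_bdev *b, unsigned int mode)
{
    b->openers--;
    if (mode & SKETCH_FMODE_EXCL)
        b->holders--;
}

/*
 * Rescan helper: the device is opened without the exclusive flag, so
 * it must be released with that same mode. Passing caller_mode to
 * sketch_put() instead would unbalance the holders counter.
 */
void sketch_scan(struct sketch_bdev *b, unsigned int caller_mode)
{
    unsigned int open_mode = caller_mode & ~SKETCH_FMODE_EXCL;

    sketch_get(b, open_mode);
    /* ... the partition scan would run here ... */
    sketch_put(b, open_mode);
}
```

The invariant is that a get/put pair always uses the same mode, so the counters are unchanged after the scan regardless of how the caller itself opened the device.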

Reported-by: syzbot+2bcc0d79e548c4f62a59@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/f9649d501bc8c3444769418f6c26263555d9d3be.camel@linux.ibm.com/T/
Tested-by: Julian Ruess <julianr@linux.ibm.com>
Fixes: e5cfefa9 ("block: fix scan partition for exclusively open device again")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ad2c2fb9

block: fix scan partition for exclusively open device again · fb518619

Committed by Yu Kuai on Apr 07, 2023

mainline inclusion
from mainline-v6.3-rc1
commit e5cfefa9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f

--------------------------------

As explained in commit 36369f46 ("block: Do not reread partition table
on exclusively open device"), rereading the partition table on a device
that is exclusively opened by someone else is problematic.

This patch makes sure that a partition scan only proceeds if the current
thread opened the device exclusively, or if the device is not opened
exclusively. In the latter case, other scanners and exclusive openers
are blocked temporarily until the partition scan is done.
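
The rule above can be written as a small decision function. This is a hedged, single-threaded sketch: `sketch_disk`, `sketch_rescan()` and the `scan_claimed` flag are hypothetical and only model the described behaviour, not the mechanism the patch actually uses to block other openers.

```c
#include <stdbool.h>

struct sketch_disk {
    int  holders;        /* number of exclusive openers */
    bool scan_claimed;   /* a non-exclusive scanner holds a temporary claim */
};

/*
 * Returns 0 when the scan ran and -1 when it was refused
 * (a real implementation would return an errno such as -EBUSY).
 */
int sketch_rescan(struct sketch_disk *d, bool caller_excl)
{
    if (!caller_excl) {
        /* Someone else owns the device or is already scanning. */
        if (d->holders > 0 || d->scan_claimed)
            return -1;
        /* Temporarily keep other scanners and exclusive openers out. */
        d->scan_claimed = true;
    }

    /* ... the partition scan would run here ... */

    if (!caller_excl)
        d->scan_claimed = false;
    return 0;
}
```

A caller that opened the device exclusively always proceeds, since it is the owner; everyone else proceeds only when no exclusive opener exists.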

Fixes: 10c70d95 ("block: remove the bd_openers checks in blk_drop_partitions")
Cc: <stable@vger.kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230217022200.3092987-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/genhd.c
	block/ioctl.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

fb518619

block: fix kabi broken in ioctl.c · cdfb5c11

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA

--------------------------------

Including blk.h in ioctl.c breaks kabi, because some data structure
definitions are exposed. This patch adds a separate header file to fix
this problem.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

cdfb5c11

block: merge disk_scan_partitions and blkdev_reread_part · efc73feb

Committed by Christoph Hellwig on Apr 07, 2023

mainline inclusion
from mainline-v5.17-rc1
commit e16e506c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16e506ccd673a3a888a34f8f694698305840044

--------------------------------

Unify the functionality that implements a partition rescan for a
gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
 - commit f0b870df ("block: remove (__)blkdev_reread_part as an
 exported API") is not backported, and this patch doesn't remove the
 (__)blkdev_reread_part apis either.
 - commit b98bcd9e ("block: reopen the device in blkdev_reread_part")
 is not backported; this patch switches blkdev_reread_part() to
 disk_scan_partitions() directly, which will reopen the device.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

efc73feb

block: cleanup partition scanning in register_disk · fbbec472

Committed by Christoph Hellwig on Apr 07, 2023

mainline inclusion
from mainline-v5.10-rc1
commit 9301fe73
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9301fe734384990ef9a2463cb7aeb3b00bf5dad5

--------------------------------

Use blkdev_get_by_dev instead of open coding it using bdget_disk +
blkdev_get, and split the code to read the partition table into a
separate helper to make it a little more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
 - this patch just factors out a helper; bdget_disk + blkdev_get is still
 used because 'bdev->bd_invalidated' needs to be set.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

fbbec472

block: Revert "block: check 'bd_super' before rescanning partition" · 33b040f7

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6MRB5
CVE: NA

--------------------------------

This reverts commit 00f20694.

The mainline solution will be backported in the following patches.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

33b040f7

md: fix kabi broken in struct mddev · d0acf215

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA

--------------------------------

struct mddev is used only inside the raid code. This fix covers the case
where md_mod is compiled from the new kernel while raid1/raid10 or other
out-of-tree raid modules are compiled from the old kernel.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d0acf215

md: use interruptible apis in idle/frozen_sync_thread · e990c814

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA

--------------------------------

Before idle and frozen were refactored out of action_store, interruptible
apis were used so that the hungtask warning would not be triggered if it
took too long to finish the idle/frozen sync_thread. This patch does the
same.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e990c814

md: wake up 'resync_wait' at last in md_reap_sync_thread() · dadf0563

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA

--------------------------------

md_reap_sync_thread() was just replaced with wait_event(resync_wait, ...)
in action_store(). This patch makes sure action_store() still waits for
everything to be done in md_reap_sync_thread().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

dadf0563

md: refactor idle/frozen_sync_thread() · 969e6f89

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA

--------------------------------

Our test found the following deadlock in raid10:

1) Issue a normal write, and such write failed:

  raid10_end_write_request
   set_bit(R10BIO_WriteError, &r10_bio->state)
   one_write_done
    reschedule_retry

  // later from md thread
  raid10d
   handle_write_completed
    list_add(&r10_bio->retry_list, &conf->bio_end_io_list)

  // later from md thread
  raid10d
   if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
    list_move(conf->bio_end_io_list.prev, &tmp)
    r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
    raid_end_bio_io(r10_bio)

Dependency chain 1: normal io is waiting for updating superblock

2) Trigger a recovery:

  raid10_sync_request
   raise_barrier

Dependency chain 2: sync thread is waiting for normal io

3) echo idle/frozen to sync_action:

  action_store
   mddev_lock
    md_unregister_thread
     kthread_stop

Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread

4) md thread can't update superblock:

  raid10d
   md_check_recovery
    if (mddev_trylock(mddev))
     md_update_sb

Dependency chain 4: update superblock is waiting for 'reconfig_mutex'

Hence a cyclic dependency exists. In order to fix the problem, we must
break one of the dependencies. Dependencies 1 and 2 can't be broken
because they are fundamental to the design. Dependency 4 might be
breakable if it can be guaranteed that no io is inflight; however, this
requires a new mechanism which seems complex. Dependency 3 is a good
choice, because idle/frozen only requires the sync thread to finish,
which can be done asynchronously and is already implemented, and
'reconfig_mutex' is not needed anymore.

This patch switches 'idle' and 'frozen' to wait for the sync thread to
finish asynchronously. It also adds a sequence counter that records how
many times the sync thread has finished, so that 'idle' won't keep
waiting on a newly started sync thread.
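
The role of the sequence counter can be shown with a small C model. This is a sketch only: `sketch_mddev`, its fields and the helper names are hypothetical, and the real patch evaluates a condition of this kind inside a wait on 'resync_wait' rather than through plain function calls.

```c
#include <stdbool.h>

struct sketch_mddev {
    bool         sync_running;   /* a sync thread currently exists */
    unsigned int sync_seq;       /* incremented each time a sync thread finishes */
};

void sketch_sync_start(struct sketch_mddev *m)
{
    m->sync_running = true;
}

void sketch_sync_done(struct sketch_mddev *m)
{
    m->sync_running = false;
    m->sync_seq++;
}

/*
 * Wait condition for 'idle': satisfied when no sync thread is running,
 * or when the thread that was running at entry has finished, even if a
 * new one has been started in the meantime.
 */
bool sketch_idle_wait_done(const struct sketch_mddev *m, unsigned int seq_at_entry)
{
    return !m->sync_running || m->sync_seq != seq_at_entry;
}
```

Without the counter, the condition would reduce to "no sync thread is running", and a sync thread restarted right after the old one exits could keep the 'idle' writer waiting indefinitely.
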
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

969e6f89

md: add a mutex to synchronize idle and frozen in action_store() · d2a9f128

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA

--------------------------------

Currently, for idle and frozen, action_store() holds 'reconfig_mutex'
and calls md_reap_sync_thread() to stop the sync thread; however, this
causes a deadlock (explained in the next patch). In order to fix the
problem, a following patch will release 'reconfig_mutex' and wait on
'resync_wait', like md_set_readonly() and do_md_stop() do.

Consider that action_store() sets/clears 'MD_RECOVERY_FROZEN'
unconditionally, which might cause unexpected problems. For example,
'frozen' has just set 'MD_RECOVERY_FROZEN' and is still in progress when
'idle' clears 'MD_RECOVERY_FROZEN' and a new sync thread is started, which
might starve the in-progress 'frozen'.

This patch adds a mutex to synchronize idle and frozen from
action_store().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d2a9f128

md: refactor action_store() for 'idle' and 'frozen' · dd9fcd21

Committed by Yu Kuai on Apr 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6T6VY
CVE: NA

--------------------------------

Prepare to handle 'idle' and 'frozen' differently to fix a deadlock. There
are no functional changes except that MD_RECOVERY_RUNNING is checked
again after 'reconfig_mutex' is held.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

dd9fcd21

openeuler / Kernel synced successfully more than 1 year ago