提交 · 91562cf99364dd29755988f3cc33ce9a46cd5b0a · openeuler / Kernel

28 10月, 2022 4 次提交

RISC-V: Fix /proc/cpuinfo cpumask warning · d14e99bf

由 Andrew Jones 提交于 10月 14, 2022

Commit 78e5a339 ("cpumask: fix checking valid cpu range") has
started issuing warnings[*] when cpu indices equal to nr_cpu_ids - 1
are passed to cpumask_next* functions. seq_read_iter() and cpuinfo's
start and next seq operations implement a pattern like

  n = cpumask_next(n - 1, mask);
  show(n);
  while (1) {
      ++n;
      n = cpumask_next(n - 1, mask);
      if (n >= nr_cpu_ids)
          break;
      show(n);
  }

which will issue the warning when reading /proc/cpuinfo. Ensure no
warning is generated by validating the cpu index before calling
cpumask_next().

[*] Warnings will only appear with DEBUG_PER_CPU_MAPS enabled.
Signed-off-by: NAndrew Jones <ajones@ventanamicro.com>
Reviewed-by: NAnup Patel <anup@brainfault.org>
Reviewed-by: NConor Dooley <conor.dooley@microchip.com>
Tested-by: NConor Dooley <conor.dooley@microchip.com>
Acked-by: NYury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20221014155845.1986223-2-ajones@ventanamicro.com/
Fixes: 78e5a339 ("cpumask: fix checking valid cpu range")
Cc: stable@vger.kernel.org
Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>

d14e99bf

riscv: fix detection of toolchain Zihintpause support · aae538cd

由 Conor Dooley 提交于 10月 06, 2022

It is not sufficient to check if a toolchain supports a particular
extension without checking if the linker supports that extension
too. For example, Clang 15 supports Zihintpause but GNU bintutils
2.35.2 does not, leading build errors like so:

riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zihintpause2p0: Invalid or unknown z ISA extension: 'zihintpause'

Add a TOOLCHAIN_HAS_ZIHINTPAUSE which checks if each of the compiler,
assembler and linker support the extension. Replace the ifdef in the
vdso with one depending on this new symbol.

Fixes: 8eb060e1 ("arch/riscv: add Zihintpause support")
Signed-off-by: NConor Dooley <conor.dooley@microchip.com>
Reviewed-by: NHeiko Stuebner <heiko@sntech.de>
Reviewed-by: NNathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20221006173520.1785507-3-conor@kernel.orgSigned-off-by: NPalmer Dabbelt <palmer@rivosinc.com>

aae538cd

riscv: fix detection of toolchain Zicbom support · b8c86872

由 Conor Dooley 提交于 10月 06, 2022

It is not sufficient to check if a toolchain supports a particular
extension without checking if the linker supports that extension too.
For example, Clang 15 supports Zicbom but GNU bintutils 2.35.2 does
not, leading build errors like so:

riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicbom1p0_zihintpause2p0: Invalid or unknown z ISA extension: 'zicbom'

Convert CC_HAS_ZICBOM to TOOLCHAIN_HAS_ZICBOM & check if the linker
also supports Zicbom.
Reported-by: NKevin Hilman <khilman@baylibre.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1714
Link: https://storage.kernelci.org/next/master/next-20220920/riscv/defconfig+CONFIG_EFI=n/clang-16/logs/kernel.log
Fixes: 1631ba12 ("riscv: Add support for non-coherent devices using zicbom extension")
Signed-off-by: NConor Dooley <conor.dooley@microchip.com>
Reviewed-by: NHeiko Stuebner <heiko@sntech.de>
Reviewed-by: NNathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20221006173520.1785507-2-conor@kernel.org
[Palmer: Check for ld-2.38, not 2.39, as 2.38 no longer errors.]
Signed-off-by: NPalmer Dabbelt <palmer@rivosinc.com>

b8c86872

riscv: mm: add missing memcpy in kasan_init · 9f2ac64d

由 Qinglin Pan 提交于 10月 09, 2022

Hi Atish,

It seems that the panic is due to the missing memcpy during kasan_init.
Could you please check whether this patch is helpful?

When doing kasan_populate, the new allocated base_pud/base_p4d should
contain kasan_early_shadow_{pud, p4d}'s content. Add the missing memcpy
to avoid page fault when read/write kasan shadow region.

Tested on:
 - qemu with sv57 and CONFIG_KASAN on.
 - qemu with sv48 and CONFIG_KASAN on.
Signed-off-by: NQinglin Pan <panqinglin2020@iscas.ac.cn>
Tested-by: NAtish Patra <atishp@rivosinc.com>
Fixes: 8fbdccd2 ("riscv: mm: Support kasan for sv57")
Link: https://lore.kernel.org/r/20221009083050.3814850-1-panqinglin2020@iscas.ac.cnSigned-off-by: NPalmer Dabbelt <palmer@rivosinc.com>

9f2ac64d

26 10月, 2022 7 次提交

powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context · 65722736

由 Nicholas Piggin 提交于 10月 22, 2022

Commit a4cb3651 ("powerpc/64s/interrupt: Fix lost interrupts when
returning to soft-masked context") fixed the problem of pending irqs
being cleared when clearing the HARD_DIS bit, but then it didn't clear
the bit at all. This change clears HARD_DIS without affecting other bits
in the mask.

When an interrupt hits in a soft-masked section that has MSR[EE]=1, it
can hard disable and set PACA_IRQS_HARD_DIS, which must be cleared when
returning to the EE=1 caller (unless it was set due to a MUST_HARD_MASK
interrupt becoming pending). Failure to clear this leaves the
returned-to context running with MSR[EE]=1 and PACA_IRQS_HARD_DIS, which
confuses irq assertions and could be dangerous for code that might test
the flag.

This was observed in a hash MMU kernel where a kernel hash fault hits in
a local_irqs_disabled region that has EE=1. The hash fault also runs
with EE=1, then as it returns, a decrementer hits in the restart section
and the irq restart code hard-masks which sets the PACA_IRQ_HARD_DIS
flag, which is not clear when the original context is returned to.
Reported-by: NSachin Sant <sachinp@linux.ibm.com>
Fixes: a4cb3651 ("powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context")
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Tested-by: NSachin Sant <sachinp@linux.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221022052207.471328-1-npiggin@gmail.com

65722736

s390/pai: fix raw data collection for PMU pai_ext · 8b1e6a3f

由 Thomas Richter 提交于 10月 20, 2022

Commit 838d9bb6 ("perf: Use sample_flags for raw_data")
changed the way the raw data of an event is collected.
Adjust the PMU pai_ext to the new scheme.

Fixes: 838d9bb6 ("perf: Use sample_flags for raw_data")
Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
Acked-by: NSumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

8b1e6a3f

s390/boot: add secure boot trailer · aa127a06

由 Peter Oberparleiter 提交于 9月 16, 2022

This patch enhances the kernel image adding a trailer as required for
secure boot by future firmware versions.

Cc: <stable@vger.kernel.org> # 5.2+
Signed-off-by: NPeter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: NSven Schnelle <svens@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

aa127a06

s390/pci: add missing EX_TABLE entries to __pcistg_mio_inuser()/__pcilg_mio_inuser() · 6ec80302

由 Heiko Carstens 提交于 10月 18, 2022

For some exception types the instruction address points behind the
instruction that caused the exception. Take that into account and add
the missing exception table entry.

Cc: <stable@vger.kernel.org>
Fixes: f058599e ("s390/pci: Fix s390_mmio_read/write with MIO")
Reviewed-by: NNiklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: NHeiko Carstens <hca@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

6ec80302

s390/futex: add missing EX_TABLE entry to __futex_atomic_op() · a262d3ad

由 Heiko Carstens 提交于 10月 18, 2022

For some exception types the instruction address points behind the
instruction that caused the exception. Take that into account and add
the missing exception table entry.

Cc: <stable@vger.kernel.org>
Reviewed-by: NVasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NHeiko Carstens <hca@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

a262d3ad

s390/uaccess: add missing EX_TABLE entries to __clear_user() · 4e1b5a86

由 Heiko Carstens 提交于 10月 18, 2022

For some exception types the instruction address points behind the
instruction that caused the exception. Take that into account and add
the missing exception table entries.

Cc: <stable@vger.kernel.org>
Reviewed-by: NVasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NHeiko Carstens <hca@linux.ibm.com>
Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>

4e1b5a86

riscv: jump_label: mark arguments as const to satisfy asm constraints · 89fd4a1d

由 Jisheng Zhang 提交于 10月 08, 2022

Samuel reported that the static branch usage in cpu_relax() breaks
building with CONFIG_CC_OPTIMIZE_FOR_SIZE:

In file included from <command-line>:
./arch/riscv/include/asm/jump_label.h: In function 'cpu_relax':
././include/linux/compiler_types.h:285:33: warning: 'asm' operand 0
probably does not match constraints
  285 | #define asm_volatile_goto(x...) asm goto(x)
      |                                 ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro
'asm_volatile_goto'
   41 |         asm_volatile_goto(
      |         ^~~~~~~~~~~~~~~~~
././include/linux/compiler_types.h:285:33: error: impossible constraint
in 'asm'
  285 | #define asm_volatile_goto(x...) asm goto(x)
      |                                 ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro
'asm_volatile_goto'
   41 |         asm_volatile_goto(
      |         ^~~~~~~~~~~~~~~~~
make[1]: *** [scripts/Makefile.build:249:
arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
make: *** [arch/riscv/Makefile:128: vdso_prepare] Error 2

Maybe "-Os" prevents GCC from detecting that the key/branch arguments
can be treated as constants and used as immediate operands. Inspired
by x86's commit 864b4355("x86/jump_label: Mark arguments as const to
satisfy asm constraints"), and as pointed out by Steven: "The "i"
constraint needs to be a constant.", let's do similar modifications to
riscv.

Tested by CC_OPTIMIZE_FOR_SIZE + gcc and CC_OPTIMIZE_FOR_SIZE + clang.

Link: https://lore.kernel.org/linux-riscv/20220922060958.44203-1-samuel@sholland.org/
Link: https://lore.kernel.org/all/20210212094059.5f8d05e8@gandalf.local.home/
Fixes: 8eb060e1 ("arch/riscv: add Zihintpause support")
Reported-by: NSamuel Holland <samuel@sholland.org>
Signed-off-by: NJisheng Zhang <jszhang@kernel.org>
Reviewed-by: NAndrew Jones <ajones@ventanamicro.com>
Tested-by: NHeiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20221008145437.491-1-jszhang@kernel.orgSigned-off-by: NPalmer Dabbelt <palmer@rivosinc.com>

89fd4a1d

25 10月, 2022 1 次提交

x86/mm: Do not verify W^X at boot up · a970174d

由 Steven Rostedt (Google) 提交于 10月 24, 2022

Adding on the kernel command line "ftrace=function" triggered:

  CPA detected W^X violation: 8000000000000063 -> 0000000000000063 range: 0xffffffffc0013000 - 0xffffffffc0013fff PFN 10031b
  WARNING: CPU: 0 PID: 0 at arch/x86/mm/pat/set_memory.c:609
  verify_rwx+0x61/0x6d
  Call Trace:
     __change_page_attr_set_clr+0x146/0x8a6
     change_page_attr_set_clr+0x135/0x268
     change_page_attr_clear.constprop.0+0x16/0x1c
     set_memory_x+0x2c/0x32
     arch_ftrace_update_trampoline+0x218/0x2db
     ftrace_update_trampoline+0x16/0xa1
     __register_ftrace_function+0x93/0xb2
     ftrace_startup+0x21/0xf0
     register_ftrace_function_nolock+0x26/0x40
     register_ftrace_function+0x4e/0x143
     function_trace_init+0x7d/0xc3
     tracer_init+0x23/0x2c
     tracing_set_tracer+0x1d5/0x206
     register_tracer+0x1c0/0x1e4
     init_function_trace+0x90/0x96
     early_trace_init+0x25c/0x352
     start_kernel+0x424/0x6e4
     x86_64_start_reservations+0x24/0x2a
     x86_64_start_kernel+0x8c/0x95
     secondary_startup_64_no_verify+0xe0/0xeb

This is because at boot up, kernel text is writable, and there's no
reason to do tricks to updated it.  But the verifier does not
distinguish updates at boot up and at run time, and causes a warning at
time of boot.

Add a check for system_state == SYSTEM_BOOTING and allow it if that is
the case.

[ These SYSTEM_BOOTING special cases are all pretty horrid, but the x86
  text_poke() code does some odd things at bootup, forcing this for now
    - Linus ]

Link: https://lore.kernel.org/r/20221024112730.180916b3@gandalf.local.home
Fixes: 652c5bf3 ("x86/mm: Refuse W^X violations")
Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a970174d

22 10月, 2022 3 次提交

KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER · 1739c701

由 Alexander Graf 提交于 10月 17, 2022

The KVM_X86_SET_MSR_FILTER ioctls contains a pointer in the passed in
struct which means it has a different struct size depending on whether
it gets called from 32bit or 64bit code.

This patch introduces compat code that converts from the 32bit struct to
its 64bit counterpart which then gets used going forward internally.
With this applied, 32bit QEMU can successfully set MSR bitmaps when
running on 64bit kernels.
Reported-by: NAndrew Randrianasulu <randrianasulu@gmail.com>
Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
Signed-off-by: NAlexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-4-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1739c701

KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter() · 2e3272bc

由 Alexander Graf 提交于 10月 17, 2022

In the next patch we want to introduce a second caller to
set_msr_filter() which constructs its own filter list on the stack.
Refactor the original function so it takes it as argument instead of
reading it through copy_from_user().
Signed-off-by: NAlexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-3-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2e3272bc

x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly · 471f0aa7

由 Chang S. Bae 提交于 10月 21, 2022

When an extended state component is not present in fpstate, but in init
state, the function copies from init_fpstate via copy_feature().

But, dynamic states are not present in init_fpstate because of all-zeros
init states. Then retrieving them from init_fpstate will explode like this:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:memcpy_erms+0x6/0x10
  ? __copy_xstate_to_uabi_buf+0x381/0x870
  fpu_copy_guest_fpstate_to_uabi+0x28/0x80
  kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
  ? __this_cpu_preempt_check+0x13/0x20
  ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
  kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? __fget_light+0xd4/0x130
  __x64_sys_ioctl+0xe3/0x910
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x27/0x50
  do_syscall_64+0x3f/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Adjust the 'mask' to zero out the userspace buffer for the features that
are not available both from fpstate and from init_fpstate.

The dynamic features depend on the compacted XSAVE format. Ensure it is
enabled before reading XCOMP_BV in init_fpstate.

Fixes: 2308ee57 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: NYuan Yao <yuan.yao@intel.com>
Suggested-by: NDave Hansen <dave.hansen@intel.com>
Signed-off-by: NChang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Tested-by: NYuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
Link: https://lkml.kernel.org/r/20221021185844.13472-1-chang.seok.bae@intel.com

471f0aa7

21 10月, 2022 6 次提交

x86/unwind/orc: Fix unreliable stack dump with gcov · 230db824

由 Chen Zhongjin 提交于 7月 27, 2022

When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL
enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder,
causing the stack trace to show all text addresses as unreliable:

  # echo l > /proc/sysrq-trigger
  [  477.521031] sysrq: Show backtrace of all active CPUs
  [  477.523813] NMI backtrace for cpu 0
  [  477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65
  [  477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
  [  477.526439] Call Trace:
  [  477.526854]  <TASK>
  [  477.527216]  ? dump_stack_lvl+0xc7/0x114
  [  477.527801]  ? dump_stack+0x13/0x1f
  [  477.528331]  ? nmi_cpu_backtrace.cold+0xb5/0x10d
  [  477.528998]  ? lapic_can_unplug_cpu+0xa0/0xa0
  [  477.529641]  ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0
  [  477.530393]  ? arch_trigger_cpumask_backtrace+0x1d/0x30
  [  477.531136]  ? sysrq_handle_showallcpus+0x1b/0x30
  [  477.531818]  ? __handle_sysrq.cold+0x4e/0x1ae
  [  477.532451]  ? write_sysrq_trigger+0x63/0x80
  [  477.533080]  ? proc_reg_write+0x92/0x110
  [  477.533663]  ? vfs_write+0x174/0x530
  [  477.534265]  ? handle_mm_fault+0x16f/0x500
  [  477.534940]  ? ksys_write+0x7b/0x170
  [  477.535543]  ? __x64_sys_write+0x1d/0x30
  [  477.536191]  ? do_syscall_64+0x6b/0x100
  [  477.536809]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [  477.537609]  </TASK>

This happens when the compiled code for show_stack() has a single word
on the stack, and doesn't use a tail call to show_stack_log_lvl().
(CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.)  Then the
__unwind_start() skip logic hits an off-by-one bug and fails to unwind
all the way to the intended starting frame.

Fix it by reverting the following commit:

  f1d9a2ab ("x86/unwind/orc: Don't skip the first frame for inactive tasks")

The original justification for that commit no longer exists.  That
original issue was later fixed in a different way, with the following
commit:

  f2ac57a4 ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels")

Fixes: f1d9a2ab ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
Signed-off-by: NChen Zhongjin <chenzhongjin@huawei.com>
[jpoimboe: rewrite commit log]
Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>

230db824

crypto: x86/polyval - Fix crashes when keys are not 16-byte aligned · 9f6035af

由 Nathan Huckleberry 提交于 10月 18, 2022

crypto_tfm::__crt_ctx is not guaranteed to be 16-byte aligned on x86-64.
This causes crashes due to movaps instructions in clmul_polyval_update.

Add logic to align polyval_tfm_ctx to 16 bytes.

Cc: <stable@vger.kernel.org>
Fixes: 34f7f6c3 ("crypto: x86/polyval - Add PCLMULQDQ accelerated implementation of POLYVAL")
Reported-by: NBruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: NNathan Huckleberry <nhuck@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

9f6035af

iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() · 5566e68d

由 Charlotte Tan 提交于 10月 19, 2022

arch_rmrr_sanity_check() warns if the RMRR is not covered by an ACPI
Reserved region, but it seems like it should accept an NVS region as
well. The ACPI spec
https://uefi.org/specs/ACPI/6.5/15_System_Address_Map_Interfaces.html
uses similar wording for "Reserved" and "NVS" region types; for NVS
regions it says "This range of addresses is in use or reserved by the
system and must not be used by the operating system."

There is an old comment on this mailing list that also suggests NVS
regions should pass the arch_rmrr_sanity_check() test:

 The warnings come from arch_rmrr_sanity_check() since it checks whether
 the region is E820_TYPE_RESERVED. However, if the purpose of the check
 is to detect RMRR has regions that may be used by OS as free memory,
 isn't  E820_TYPE_NVS safe, too?

This patch overlaps with another proposed patch that would add the region
type to the log since sometimes the bug reporter sees this log on the
console but doesn't know to include the kernel log:

https://lore.kernel.org/lkml/20220611204859.234975-3-atomlin@redhat.com/

Here's an example of the "Firmware Bug" apparent false positive (wrapped
for line length):

 DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR
       [0x000000006f760000-0x000000006f762fff], contact BIOS vendor for
       fixes
 DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR
       [0x000000006f760000-0x000000006f762fff]

This is the snippet from the e820 table:

 BIOS-e820: [mem 0x0000000068bff000-0x000000006ebfefff] reserved
 BIOS-e820: [mem 0x000000006ebff000-0x000000006f9fefff] ACPI NVS
 BIOS-e820: [mem 0x000000006f9ff000-0x000000006fffefff] ACPI data

Fixes: f036c7fa ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
Cc: Will Mortensen <will@extrahop.com>
Link: https://lore.kernel.org/linux-iommu/64a5843d-850d-e58c-4fc2-0a0eeeb656dc@nec.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216443Signed-off-by: NCharlotte Tan <charlotte@extrahop.com>
Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
Link: https://lore.kernel.org/r/20220929044449.32515-1-charlotte@extrahop.comSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

5566e68d

RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc · cea8896b

由 Anup Patel 提交于 10月 21, 2022

The kvm_riscv_vcpu_timer_pending() checks per-VCPU next_cycles
and per-VCPU software injected VS timer interrupt. This function
returns incorrect value when Sstc is available because the per-VCPU
next_cycles are only updated by kvm_riscv_vcpu_timer_save() called
from kvm_arch_vcpu_put(). As a result, when Sstc is available the
VCPU does not block properly upon WFI traps.

To fix the above issue, we introduce kvm_riscv_vcpu_timer_sync()
which will update per-VCPU next_cycles upon every VM exit instead
of kvm_riscv_vcpu_timer_save().

Fixes: 8f5cb44b ("RISC-V: KVM: Support sstc extension")
Signed-off-by: NAnup Patel <apatel@ventanamicro.com>
Reviewed-by: NAtish Patra <atishp@rivosinc.com>
Signed-off-by: NAnup Patel <anup@brainfault.org>

cea8896b

RISC-V: Fix compilation without RISCV_ISA_ZICBOM · 5c20a3a9

由 Andrew Jones 提交于 10月 21, 2022

riscv_cbom_block_size and riscv_init_cbom_blocksize() should always
be available and riscv_init_cbom_blocksize() should always be
invoked, even when compiling without RISCV_ISA_ZICBOM enabled. This
is because disabling RISCV_ISA_ZICBOM means "don't use zicbom
instructions in the kernel" not "pretend there isn't zicbom, even
when there is". When zicbom is available, whether the kernel enables
its use with RISCV_ISA_ZICBOM or not, KVM will offer it to guests.
Ensure we can build KVM and that the block size is initialized even
when compiling without RISCV_ISA_ZICBOM.

Fixes: 8f7e001e ("RISC-V: Clean up the Zicbom block size probing")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NAndrew Jones <ajones@ventanamicro.com>
Signed-off-by: NAnup Patel <apatel@ventanamicro.com>
Reviewed-by: NConor Dooley <conor.dooley@microchip.com>
Reviewed-by: NHeiko Stuebner <heiko@sntech.de>
Tested-by: NHeiko Stuebner <heiko@sntech.de>
Signed-off-by: NAnup Patel <anup@brainfault.org>

5c20a3a9

bpf: Fix dispatcher patchable function entry to 5 bytes nop · dbe69b29

由 Jiri Olsa 提交于 10月 18, 2022

The patchable_function_entry(5) might output 5 single nop
instructions (depends on toolchain), which will clash with
bpf_arch_text_poke check for 5 bytes nop instruction.

Adding early init call for dispatcher that checks and change
the patchable entry into expected 5 nop instruction if needed.

There's no need to take text_mutex, because we are using it
in early init call which is called at pre-smp time.

Fixes: ceea991a ("bpf: Move bpf_dispatcher function out of ftrace locations")
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221018075934.574415-1-jolsa@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>

dbe69b29

20 10月, 2022 3 次提交

perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap() · b329f5dd

由 Maxim Levitsky 提交于 7月 18, 2022

clear_cpu_cap(&boot_cpu_data) is very similar to setup_clear_cpu_cap()
except that the latter also sets a bit in 'cpu_caps_cleared' which
later clears the same cap in secondary cpus, which is likely what is
meant here.

Fixes: 47125db2 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20220718141123.136106-2-mlevitsk@redhat.com

b329f5dd

ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph() · 883bbbff

由 Peter Zijlstra 提交于 10月 18, 2022

Different function signatures means they needs to be different
functions; otherwise CFI gets upset.

As triggered by the ftrace boot tests:

  [] CFI failure at ftrace_return_to_handler+0xac/0x16c (target: ftrace_stub+0x0/0x14; expected type: 0x0a5d5347)

Fixes: 3c516f89 ("x86: Add support for CONFIG_CFI_CLANG")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NMark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/Y06dg4e1xF6JTdQq@hirez.programming.kicks-ass.net

883bbbff

x86/ftrace: Remove ftrace_epilogue() · b5f1fc31

由 Peter Zijlstra 提交于 9月 15, 2022

Remove the weird jumps to RET and simply use RET.

This then promotes ftrace_stub() to a real function; which becomes
important for kcfi.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111148.719080593@infradead.orgSigned-off-by: NPeter Zijlstra <peterz@infradead.org>

b5f1fc31

19 10月, 2022 1 次提交

x86/resctrl: Fix min_cbm_bits for AMD · 67bf6493

由 Babu Moger 提交于 9月 27, 2022

AMD systems support zero CBM (capacity bit mask) for cache allocation.
That is reflected in rdt_init_res_defs_amd() by:

  r->cache.arch_has_empty_bitmaps = true;

However given the unified code in cbm_validate(), checking for:

  val == 0 && !arch_has_empty_bitmaps

is not enough because of another check in cbm_validate():

  if ((zero_bit - first_bit) < r->cache.min_cbm_bits)

The default value of r->cache.min_cbm_bits = 1.

Leading to:

  $ cd /sys/fs/resctrl
  $ mkdir foo
  $ cd foo
  $ echo L3:0=0 > schemata
    -bash: echo: write error: Invalid argument
  $ cat /sys/fs/resctrl/info/last_cmd_status
    Need at least 1 bits in the mask

Initialize the min_cbm_bits to 0 for AMD. Also, remove the default
setting of min_cbm_bits and initialize it separately.

After the fix:

  $ cd /sys/fs/resctrl
  $ mkdir foo
  $ cd foo
  $ echo L3:0=0 > schemata
  $ cat /sys/fs/resctrl/info/last_cmd_status
    ok

Fixes: 316e7f90 ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
Co-developed-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Reviewed-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NReinette Chatre <reinette.chatre@intel.com>
Reviewed-by: NFenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com

67bf6493

18 10月, 2022 15 次提交

powerpc/64s/interrupt: Perf NMI should not take normal exit path · dc398a08

由 Nicholas Piggin 提交于 10月 07, 2022

NMI interrupts should exit with EXCEPTION_RESTORE_REGS not with
interrupt_return_srr, which is what the perf NMI handler currently does.
This breaks if a PMI hits after interrupt_exit_user_prepare_main() has
switched the context tracking to user mode, then the CT_WARN_ON() in
interrupt_exit_kernel_prepare() fires because it returns to kernel with
context set to user.

This could possibly be solved by soft-disabling PMIs in the exit path,
but that reduces our ability to profile that code. The warning could be
removed, but it's potentially useful.

All other NMIs and soft-NMIs return using EXCEPTION_RESTORE_REGS, so
this makes perf interrupts consistent with that and seems like the best
fix.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
[mpe: Squash in fixups from Nick]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006140413.126443-3-npiggin@gmail.com

dc398a08

powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning · a073672e

由 Nicholas Piggin 提交于 10月 14, 2022

NMI PMIs really should not return using the normal interrupt_return
function. If such a PMI hits in code returning to user with the context
switched to user mode, this warning can fire. This was enough to cause
crashes when reproducing on 64s, because another perf interrupt would
hit while reporting bug, and that would cause another bug, and so on
until smashing the stack.

Work around that particular crash for now by just disabling that context
warning for PMIs. This is a hack and not a complete fix, there could be
other such problems lurking in corners. But it does fix the known crash.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-3-npiggin@gmail.com

a073672e

KVM: PPC: BookS PR-KVM and BookE do not support context tracking · e59b3399

由 Nicholas Piggin 提交于 10月 14, 2022

The context tracking code in PR-KVM and BookE implementations is not
complete, and can cause host crashes if context tracking is enabled.

Make these implementations depend on !CONTEXT_TRACKING_USER.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-2-npiggin@gmail.com

e59b3399

powerpc: Fix reschedule bug in KUAP-unlocked user copy · 00ff1eaa

由 Nicholas Piggin 提交于 10月 14, 2022

schedule must not be explicitly called while KUAP is unlocked, because
the AMR register will not be saved across the context switch on
64s (preemption is allowed because that is driven by interrupts which do
save the AMR).

exit_vmx_usercopy() runs inside an unlocked user access region, and it
calls preempt_enable() which will call schedule() if need_resched() was
set while non-preemptible. This can cause tasks to run unprotected when
the should not, and can cause the user copy to be improperly blocked
when scheduling back to it.

Fix this by avoiding the explicit resched for preempt kernels by
generating an interrupt to reschedule the context if need_resched() got
set.
Reported-by: NSamuel Holland <samuel@sholland.org>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-3-npiggin@gmail.com

00ff1eaa

powerpc/64s: Fix hash__change_memory_range preemption warning · 2b2095f3

由 Nicholas Piggin 提交于 10月 14, 2022

stop_machine_cpuslocked takes a mutex so it must be called in a
preemptible context, so it can't simply be fixed by disabling
preemption.

This is not a bug, because CPU hotplug is locked, so this processor will
call in to the stop machine function. So raw_smp_processor_id() could be
used. This leaves a small chance that this thread will be migrated to
another CPU, so the master work would be done by a CPU from a different
context. Better for test coverage to make that a common case by just
having the first CPU to call in become the master.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-2-npiggin@gmail.com

2b2095f3

powerpc/64s: Disable preemption in hash lazy mmu mode · b9ef323e

由 Nicholas Piggin 提交于 10月 14, 2022

apply_to_page_range on kernel pages does not disable preemption, which
is a requirement for hash's lazy mmu mode, which keeps track of the
TLBs to flush with a per-cpu array.
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-1-npiggin@gmail.com

b9ef323e

powerpc/64s: make linear_map_hash_lock a raw spinlock · b12eb279

由 Nicholas Piggin 提交于 10月 14, 2022

This lock is taken while the raw kfence_freelist_lock is held, so it
must also be a raw spinlock, as reported by lockdep when raw lock
nesting checking is enabled.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-3-npiggin@gmail.com

b12eb279

powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe · 35159b57

由 Nicholas Piggin 提交于 10月 14, 2022

With kfence enabled, there are several cases where HPTE and TLBIE locks
are called from softirq context, for example:

  WARNING: inconsistent lock state
  6.0.0-11845-g0cbbc95b12ac #1 Tainted: G                 N
  --------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
  c000000002734de8 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600
  {IN-SOFTIRQ-W} state was registered at:
    .lock_acquire+0x20c/0x520
    ._raw_spin_lock+0x4c/0x70
    .native_hpte_invalidate+0x62c/0x840
    .hash__kernel_map_pages+0x450/0x640
    .kfence_protect+0x58/0xc0
    .kfence_guarded_free+0x374/0x5a0
    .__slab_free+0x3d0/0x630
    .put_cred_rcu+0xcc/0x120
    .rcu_core+0x3c4/0x14e0
    .__do_softirq+0x1dc/0x7dc
    .do_softirq_own_stack+0x40/0x60

Fix this by consistently disabling irqs while taking either of these
locks. Don't just disable bh because several of the more common cases
already disable irqs, so this just makes the locks always irq-safe.
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-2-npiggin@gmail.com

35159b57

powerpc/64s: Add lockdep for HPTE lock · be83d548

由 Nicholas Piggin 提交于 10月 14, 2022

Add lockdep annotation for the HPTE bit-spinlock. Modern systems don't
take the tlbie lock, so this shows up some of the same lockdep warnings
that were being reported by the ppc970. And they're not taken in exactly
the same places so this is nice to have in its own right.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-1-npiggin@gmail.com

be83d548

powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU · 2147783d

由 Haren Myneni 提交于 10月 06, 2022

The hypervisor assigns VAS (Virtual Accelerator Switchboard)
windows depends on cores configured in LPAR. The kernel uses
OF reconfig notifier to reconfig VAS windows for DLPAR CPU event.
In the case of shared CPU mode partition, the hypervisor assigns
VAS windows depends on CPU entitled capacity, not based on vcpus.
When the user changes CPU entitled capacity for the partition,
drmgr uses /proc/ppc64/lparcfg interface to notify the kernel.

This patch adds the following changes to update VAS resources
for shared mode:
- Call vas reconfig windows from lparcfg_write()
- Ignore reconfig changes in the VAS notifier
Signed-off-by: NHaren Myneni <haren@linux.ibm.com>
[mpe: Rework error handling, report any errors as EIO]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/efa9c16e4a78dda4567a16f13dabfd73cb4674a2.camel@linux.ibm.com

2147783d

powerpc/pseries/vas: Add VAS IRQ primary handler · 89ed0b76

由 Haren Myneni 提交于 10月 09, 2022

irq_default_primary_handler() can be used only with IRQF_ONESHOT
flag, but the flag disables IRQ before executing the thread handler
and enables it after the interrupt is handled. But this IRQ disable
sets the VAS IRQ OFF state in the hypervisor. In case if NX faults
during this window, the hypervisor will not deliver the fault
interrupt to the partition and the user space may wait continuously
for the CSB update. So use VAS specific IRQ handler instead of
calling the default primary handler.

Increment pending_faults counter in IRQ handler and the bottom
thread handler will process all faults based on this counter.
In case if the another interrupt is received while the thread is
running, it will be processed using this counter. The synchronization
of top and bottom handlers will be done with IRQTF_RUNTHREAD flag
and will re-enter to bottom half if this flag is set.
Signed-off-by: NHaren Myneni <haren@linux.ibm.com>
Reviewed-by: NFrederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaad8813b4762a6753cfcd0b605a7574a5192ec7.camel@linux.ibm.com

89ed0b76

x86/microcode/AMD: Apply the patch early on every logical thread · e7ad18d1

由 Borislav Petkov 提交于 10月 05, 2022

Currently, the patch application logic checks whether the revision
needs to be applied on each logical CPU (SMT thread). Therefore, on SMT
designs where the microcode engine is shared between the two threads,
the application happens only on one of them as that is enough to update
the shared microcode engine.

However, there are microcode patches which do per-thread modification,
see Link tag below.

Therefore, drop the revision check and try applying on each thread. This
is what the BIOS does too so this method is very much tested.

Btw, change only the early paths. On the late loading paths, there's no
point in doing per-thread modification because if is it some case like
in the bugzilla below - removing a CPUID flag - the kernel cannot go and
un-use features it has detected are there early. For that, one should
use early loading anyway.

  [ bp: Fixes does not contain the oldest commit which did check for
    equality but that is good enough. ]

Fixes: 8801b3fc ("x86/microcode/AMD: Rework container parsing")
Reported-by: NȘtefan Talpalaru <stefantalpalaru@yahoo.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Tested-by: NȘtefan Talpalaru <stefantalpalaru@yahoo.com>
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211

e7ad18d1

ARC: mm: fix leakage of memory allocated for PTE · 4fd9df10

由 Pavel Kozlov 提交于 10月 17, 2022

Since commit d9820ff7 ("ARC: mm: switch pgtable_t back to struct page *")
a memory leakage problem occurs. Memory allocated for page table entries
not released during process termination. This issue can be reproduced by
a small program that allocates a large amount of memory. After several
runs, you'll see that the amount of free memory has reduced and will
continue to reduce after each run. All ARC CPUs are effected by this
issue. The issue was introduced since the kernel stable release v5.15-rc1.

As described in commit d9820ff7 after switch pgtable_t back to struct
page *, a pointer to "struct page" and appropriate functions are used to
allocate and free a memory page for PTEs, but the pmd_pgtable macro hasn't
changed and returns the direct virtual address from the PMD (PGD) entry.
Than this address used as a parameter in the __pte_free() and as a result
this function couldn't release memory page allocated for PTEs.

Fix this issue by changing the pmd_pgtable macro and returning pointer to
struct page.

Fixes: d9820ff7 ("ARC: mm: switch pgtable_t back to struct page *")
Cc: Mike Rapoport <rppt@kernel.org>
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: NPavel Kozlov <pavel.kozlov@synopsys.com>
Signed-off-by: NVineet Gupta <vgupta@kernel.org>

4fd9df10

arc: update config files · 2df1f4a7

由 Lukas Bulwahn 提交于 9月 29, 2022

Clean up config files by:
  - removing configs that were deleted in the past
  - removing configs not in tree and without recently pending patches
  - adding new configs that are replacements for old configs in the file

For some detailed information, see Link.

Link: https://lore.kernel.org/kernel-janitors/20220929090645.1389-1-lukas.bulwahn@gmail.com/Signed-off-by: NLukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: NVineet Gupta <vgupta@kernel.org>

2df1f4a7

arc: iounmap() arg is volatile · c44f15c1

由 Randy Dunlap 提交于 10月 09, 2022

Add 'volatile' to iounmap()'s argument to prevent build warnings.
This make it the same as other major architectures.

Placates these warnings: (12 such warnings)

../drivers/video/fbdev/riva/fbdev.c: In function 'rivafb_probe':
../drivers/video/fbdev/riva/fbdev.c:2067:42: error: passing argument 1 of 'iounmap' discards 'volatile' qualifier from pointer target type [-Werror=discarded-qualifiers]
 2067 |                 iounmap(default_par->riva.PRAMIN);

Fixes: 1162b070 ("ARC: I/O and DMA Mappings")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NVineet Gupta <vgupta@kernel.org>

c44f15c1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功