- 01 May, 2020  1 commit
-
-
Submitted by Vincenzo Frascino
On arm64 Linux, gcc uses -fasynchronous-unwind-tables -funwind-tables by default since gcc-8, so the de facto platform ABI now allows unwinding from async signal handlers. However, on bare-metal targets (aarch64-none-elf), and with old gcc, async and sync unwind tables are not enabled by default, to avoid runtime memory costs. This means that if Linux is built with a bare-metal toolchain, the vdso.so may not have unwind tables, which breaks the gcc platform ABI guarantee in userspace. Add -fasynchronous-unwind-tables explicitly to the vgettimeofday.o cflags to address the ABI change. Fixes: 28b1a824 ("arm64: vdso: Substitute gettimeofday() with C implementation") Cc: Will Deacon <will@kernel.org> Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 25 April, 2020  5 commits
-
-
Submitted by Claudio Imbrenda
The kernel fails to compile with CONFIG_PROTECTED_VIRTUALIZATION_GUEST set but CONFIG_KVM unset. This patch fixes the issue by making the needed variable always available. Link: https://lkml.kernel.org/r/20200423120114.2027410-1-imbrenda@linux.ibm.com Fixes: a0f60f84 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Philipp Rudo <prudo@linux.ibm.com> Suggested-by: Philipp Rudo <prudo@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Submitted by Wang YanQing
When verifier_zext is true, we don't need to emit code for zero-extension. Fixes: 836256bf ("x32: bpf: eliminate zero extension code-gen") Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200423050637.GA4029@udknight
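As an illustration of the rule this change relies on, here is a minimal sketch (the context field and emit helpers are assumed names, not the actual ia32 emitter): when the verifier has already inserted explicit zero-extension instructions, the JIT skips its own clearing of the upper half of the destination register pair.

  #include <linux/types.h>

  struct jit_ctx_sketch {
          bool verifier_zext;     /* assumed to mirror bpf_prog->aux->verifier_zext */
  };

  static void emit_alu32_result(struct jit_ctx_sketch *ctx,
                                void (*emit_mov_lo)(void),
                                void (*emit_clear_hi)(void))
  {
          emit_mov_lo();                  /* 32-bit result into dst_lo */

          if (!ctx->verifier_zext)
                  emit_clear_hi();        /* zero dst_hi only when the verifier did not */
  }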
-
Submitted by Luke Nelson
The current JIT clobbers the destination register for BPF_JSET BPF_X and BPF_K by using "and" and "or" instructions. This is fine when the destination register is a temporary loaded from a register stored on the stack, but not otherwise. This patch fixes the problem (for both BPF_K and BPF_X) by always loading the destination register into temporaries, since BPF_JSET should not modify the destination register. This bug may not be currently triggerable, as BPF_REG_AX is the only register not stored on the stack and the verifier uses it in a limited way. Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-2-luke.r.nels@gmail.com
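A minimal sketch of the required semantics (illustrative, not the emitter itself): BPF_JSET is "jump if (dst & src) != 0" and must leave dst unchanged, so the AND has to be performed on a scratch copy of dst.

  #include <linux/types.h>

  static bool bpf_jset_taken(u64 dst, u64 src)
  {
          u64 tmp = dst & src;    /* AND performed on a temporary copy ... */

          return tmp != 0;        /* ... dst itself is never written */
  }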
-
Submitted by Luke Nelson
The current JIT uses the following sequence to zero-extend into the upper 32 bits of the destination register for BPF_LDX BPF_{B,H,W}, when the destination register is not on the stack: EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0); The problem is that C7 /0 encodes a MOV instruction that requires a 4-byte immediate; the current code emits only 1 byte of the immediate. This means that the first 3 bytes of the next instruction will be treated as the rest of the immediate, breaking the stream of instructions. This patch fixes the problem by instead emitting "xor dst_hi,dst_hi" to clear the upper 32 bits. This fixes the problem and is more efficient than using MOV to load a zero immediate. This bug may not be currently triggerable, as BPF_REG_AX is the only register not stored on the stack, the verifier uses it in a limited way, and the verifier implements a zero-extension optimization. But the JIT should avoid emitting incorrect encodings regardless. Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-1-luke.r.nels@gmail.com
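For illustration, a small standalone sketch of the encoding difference (byte values are standard x86, shown for %ecx; these are not the JIT's emit macros): "mov r/m32, imm32" (C7 /0) always carries a 4-byte immediate, while "xor reg,reg" needs no immediate at all, which is why emitting only one immediate byte corrupts the instruction stream.

  #include <stdio.h>

  int main(void)
  {
          /* mov ecx, 0   ->  C7 C1 00 00 00 00   (opcode, ModRM, imm32) */
          unsigned char mov_zero[] = { 0xC7, 0xC1, 0x00, 0x00, 0x00, 0x00 };
          /* xor ecx, ecx ->  31 C9                (opcode, ModRM, no immediate) */
          unsigned char xor_zero[] = { 0x31, 0xC9 };

          printf("mov ecx,0  : %zu bytes\n", sizeof(mov_zero));
          printf("xor ecx,ecx: %zu bytes\n", sizeof(xor_zero));
          return 0;
  }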
-
Submitted by Damien Le Moal
ARCH_HAS_STRICT_KERNEL_RWX is not useful for no-MMU systems. Furthermore, since this option leads to very large boot image files on 64-bit architectures, do not enable it, so that no-MMU platforms such as the Kendryte K210 SoC based boards can be supported. Fixes: 00cb41d5 ("riscv: add alignment for text, rodata and data sections") Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
- 23 April, 2020  8 commits
-
-
Submitted by Masahiro Yamada
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
Submitted by Vitor Massaru Iha
In this workflow: $ make ARCH=um defconfig && make ARCH=um -j8 [snip] $ make ARCH=um mrproper [snip] $ make ARCH=um defconfig O=./build_um && make ARCH=um -j8 O=./build_um [snip] CC scripts/mod/empty.o In file included from ../include/linux/types.h:6, from ../include/linux/mod_devicetable.h:12, from ../scripts/mod/devicetable-offsets.c:3: ../include/uapi/linux/types.h:5:10: fatal error: asm/types.h: No such file or directory 5 | #include <asm/types.h> | ^~~~~~~~~~~~~ compilation terminated. make[2]: *** [../scripts/Makefile.build:100: scripts/mod/devicetable-offsets.s] Error 1 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [/home/iha/sdb/opensource/lkmp/linux-kselftest.git/Makefile:1140: prepare0] Error 2 make[1]: Leaving directory '/home/iha/sdb/opensource/lkmp/linux-kselftest.git/build_um' make: *** [Makefile:180: sub-make] Error 2 The error occurred because the arch/$(SUBARCH)/include/generated files weren't properly cleaned by `make ARCH=um mrproper`. Fixes: a788b2ed ("kbuild: check arch/$(SRCARCH)/include/generated before out-of-tree build") Reported-by: Theodore Ts'o <tytso@mit.edu> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Vitor Massaru Iha <vitor@massaru.org> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Link: https://groups.google.com/forum/#!msg/kunit-dev/QmA27YEgEgI/hvS1kiz2CwAJ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
Submitted by Masahiro Yamada
As the bug report [1] pointed out, <linux/vermagic.h> must be included after <linux/module.h>. I believe we should not impose any include order restriction. We often sort include directives alphabetically, but that is just a coding style convention. Technically, we can include header files in any order by making every header self-contained. Currently, the arch-specific MODULE_ARCH_VERMAGIC is defined in <asm/module.h>, which is not included from <linux/vermagic.h>. Hence, the straightforward fix-up would be as follows: |--- a/include/linux/vermagic.h |+++ b/include/linux/vermagic.h |@@ -1,5 +1,6 @@ | /* SPDX-License-Identifier: GPL-2.0 */ | #include <generated/utsrelease.h> |+#include <linux/module.h> | | /* Simply sanity version stamp for modules. */ | #ifdef CONFIG_SMP This works well enough, but for further cleanups, I split the MODULE_ARCH_VERMAGIC definitions into <asm/vermagic.h>. With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal, and the location of the MODULE_ARCH_VERMAGIC definitions will be consistent. For arc and ia64, MODULE_PROC_FAMILY is only used for defining MODULE_ARCH_VERMAGIC, so I squashed it. For hexagon, nds32, and xtensa, I removed <asm/modules.h> entirely because they contained nothing but the MODULE_ARCH_VERMAGIC definition. Kbuild will automatically generate <asm/modules.h> at build time, wrapping <asm-generic/module.h>. [1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Jessica Yu <jeyu@kernel.org>
-
Submitted by Giovanni Gherdovich
Improve readability of the function intel_set_max_freq_ratio() by moving the check for KNL CPUs there, together with the checks for GLM and SKX. Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-5-ggherdovich@suse.cz
-
Submitted by Peter Zijlstra (Intel)
The static key arch_scale_freq_key only needs to be enabled once (at boot). This change fixes a bug by which the key was enabled every time cpu0 is started, even as a secondary CPU during CPU hotplug. Secondary CPUs are started from the idle thread: setting a static key from there means acquiring a lock and may result in sleeping in the idle task, causing a CPU lockup. Another consequence of this change is that init_counter_refs() is now called on each CPU correctly; previously the function on_each_cpu() was used, but it was called at boot when the only online CPU is cpu0. [ggherdovich@suse.cz: Tested and wrote changelog] Fixes: 1567c3e3 ("x86, sched: Add support for frequency invariance") Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-4-ggherdovich@suse.cz
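A minimal sketch of the pattern described (function names are illustrative, loosely following the x86 code): the static key is enabled once from boot-CPU init, while secondary CPUs only refresh their per-CPU counter state, so nothing sleeps in the idle task.

  #include <linux/jump_label.h>

  static DEFINE_STATIC_KEY_FALSE(arch_scale_freq_key);

  static void init_counter_refs(void)
  {
          /* per-CPU MSR snapshots only; no locks, safe from the idle thread */
  }

  static void __init bp_init_freq_invariance(void)        /* boot CPU, once */
  {
          static_branch_enable(&arch_scale_freq_key);      /* may sleep: boot only */
          init_counter_refs();
  }

  static void ap_init_freq_invariance(void)                /* secondary CPU bring-up */
  {
          if (static_branch_likely(&arch_scale_freq_key))
                  init_counter_refs();                     /* no static_branch_enable() here */
  }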
-
Submitted by Giovanni Gherdovich
If a CPU has fewer than 4 physical cores, MSR_TURBO_RATIO_LIMIT will rightfully report that the 4C turbo ratio is zero. In such cases, use the 1C turbo ratio instead for frequency-invariance calculations. Fixes: 1567c3e3 ("x86, sched: Add support for frequency invariance") Reported-by: Like Xu <like.xu@linux.intel.com> Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com> Link: https://lkml.kernel.org/r/20200416054745.740-3-ggherdovich@suse.cz
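A minimal sketch of the fallback (the byte layout of MSR_TURBO_RATIO_LIMIT is assumed from the SDM; the real code handles more cases): if the 4-core field reads back as zero, use the 1-core field so the turbo/base ratio stays non-zero.

  #include <linux/types.h>

  static bool pick_turbo_ratio(u64 turbo_ratio_limit, unsigned int *turbo)
  {
          unsigned int ratio_4c = (turbo_ratio_limit >> 24) & 0xff; /* 4C field */
          unsigned int ratio_1c = turbo_ratio_limit & 0xff;         /* 1C field */

          *turbo = ratio_4c ? ratio_4c : ratio_1c;
          return *turbo != 0;
  }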
-
Submitted by Giovanni Gherdovich
Some hypervisors such as VMWare ESXi 5.5 advertise support for X86_FEATURE_APERFMPERF but then fill all MSRs with zeroes. In particular, MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base clock frequency of the CPU (highest non-turbo frequency), producing a division by zero when computing the ratio turbo_freq/base_freq necessary for frequency-invariant accounting. It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance couldn't be done in principle (not that it would make a lot of sense in a VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise that feature. Fixes: 1567c3e3 ("x86, sched: Add support for frequency invariance") Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-2-ggherdovich@suse.cz
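A minimal sketch of the defensive check implied here (illustrative, not the exact function): bail out of frequency-invariance setup when the hypervisor hands back zeroed MSR values, rather than dividing by a zero base frequency.

  #include <linux/math64.h>

  static bool set_max_freq_ratio_sketch(u64 base_freq, u64 turbo_freq,
                                        u64 scale, u64 *ratio)
  {
          if (!base_freq || !turbo_freq)          /* e.g. ESXi 5.5 returns 0 */
                  return false;                   /* keep freq invariance disabled */

          *ratio = div64_u64(turbo_freq * scale, base_freq);
          return true;
  }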
-
Submitted by Harry Pan
The Jasper Lake processor is based on the Tremont microarchitecture; reuse the glm_cstates table of Goldmont and Goldmont Plus to enable C-state residency profiling. Signed-off-by: Harry Pan <harry.pan@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200402190658.1.Ic02e891daac41303aed1f2fc6c64f6110edd27bd@changeid
-
- 22 April, 2020  10 commits
-
-
Submitted by Niklas Schnelle
With the introduction of CPU directed interrupts, the kernel parameter pci=force_floating was introduced to fall back to the previous behavior using floating irqs. However, we were still setting the affinity in that case, both in __irq_alloc_descs() and via the irq_set_affinity callback in struct irq_chip. For the former, only set the affinity in the directed case. The latter is explicitly set in zpci_directed_irq_init(), so we can just leave it unset for the floating case. Fixes: e979ce7b ("s390/pci: provide support for CPU directed interrupts") Co-developed-by: Alexander Schmidt <alexs@linux.ibm.com> Signed-off-by: Alexander Schmidt <alexs@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Submitted by Philipp Rudo
Switching tracers involves instruction patching. To prevent an instruction from being patched while it is being read, the instruction patching is done in stop_machine 'context'. This also means that any function called during stop_machine must not be traced. Thus, add 'notrace' to all functions called within stop_machine. Fixes: 1ec2772e ("s390/diag: add a statistic for diagnose calls") Fixes: 38f2c691 ("s390: improve wait logic of stop_machine") Fixes: 4ecf0a43 ("processor: get rid of cpu_relax_yield") Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Submitted by Christophe Leroy
CONFIG_PPC_KUAP_DEBUG is not selectable because it depends on PPC_32, which doesn't exist. Fixing it leads to a deadlock due to a vital register getting clobbered in _switch(). Change the dependency to PPC32 and use r0 instead of r4 in _switch(). Fixes: e2fb9f54 ("powerpc/32: Prepare for Kernel Userspace Access Protection") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/540242f7d4573f7cdf1b3bf46bb35f743b2cd68f.1587124651.git.christophe.leroy@c-s.fr
-
Submitted by Christophe Leroy
The WRITE_RO lkdtm test works, but when CONFIG_DEBUG_RODATA_TEST is selected, the kernel reports "rodata_test: test data was not read only". This is because when the rodata test runs, there are still old entries in the TLB. Flush the TLB after setting kernel pages RO or NX. Fixes: d5f17ee9 ("powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/485caac75f195f18c11eb077b0031fdd2bb7fb9e.1587361039.git.christophe.leroy@c-s.fr
-
Submitted by Kefeng Wang
There is no shutdown call in SBI v0.2, so only set pm_power_off when RISCV_SBI_V01 is enabled, to fix the following build error: riscv64-linux-ld: arch/riscv/kernel/sbi.o: in function `sbi_power_off': sbi.c:(.text+0xe): undefined reference to `sbi_shutdown' Fixes: efca1398 ("RISC-V: Introduce a new config for SBI v0.1") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
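A minimal sketch of the guard described (the CONFIG symbol and sbi_shutdown()/pm_power_off follow the RISC-V SBI code, but the helper name is illustrative): sbi_shutdown() only exists for legacy SBI v0.1, so pm_power_off is only wired up in that configuration.

  #include <linux/pm.h>

  #ifdef CONFIG_RISCV_SBI_V01
  static void sbi_power_off(void)
  {
          sbi_shutdown();                 /* legacy v0.1 call only */
  }
  #endif

  void __init sbi_power_off_init(void)    /* illustrative helper */
  {
  #ifdef CONFIG_RISCV_SBI_V01
          pm_power_off = sbi_power_off;
  #endif
  }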
-
Submitted by Kefeng Wang
Fix an incorrect EXPORT_SYMBOL(). Fixes: efca1398 ("RISC-V: Introduce a new config for SBI v0.1") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Ilie Halip
When building with the LLVM linker, this error occurs: LD arch/riscv/kernel/vdso/vdso-syms.o ld.lld: error: no input files This happens because lld treats -R as an alias for -rpath, as opposed to ld, where -R means --just-symbols. Use the long option name for compatibility between the two. Link: https://github.com/ClangBuiltLinux/linux/issues/805 Reported-by: Dmitry Golovin <dima@golovin.in> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Peter Xu
Userfaultfd-wp does not yet work on 32-bit hosts, but it was accidentally enabled previously. Disable it. Fixes: 5a281062 ("userfaultfd: wp: add WP pagetable tracking to x86") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: http://lkml.kernel.org/r/20200413141608.109211-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Masahiro Yamada
The closing parenthesis is missing. Fixes: bfeb022f ("mm/memory_hotplug: add pgprot_t to mhp_params") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Link: http://lkml.kernel.org/r/20200413014743.16353-1-masahiroy@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Guenter Roeck
riscv:allnoconfig and riscv:tinyconfig fail to compile: arch/riscv/kernel/stacktrace.c: In function 'walk_stackframe': arch/riscv/kernel/stacktrace.c:78:8: error: 'sp_in_global' undeclared sp_in_global is declared inside CONFIG_FRAME_POINTER but used outside of it. Fixes: 52e7c52d ("RISC-V: Stop relying on GCC's register allocator's hueristics") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
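The shape of the fix, as a hedged sketch (the global register declaration mirrors what the RISC-V stacktrace code uses; the surrounding unwinder code is elided): the declaration must stay visible in both configurations, because walk_stackframe() references it either way.

  register unsigned long sp_in_global __asm__("sp");

  #ifdef CONFIG_FRAME_POINTER
  /* frame-pointer based unwinder uses sp_in_global ... */
  #else
  /* ... and so does the fallback unwinder, so the declaration
   * cannot live inside only one branch of the #ifdef.
   */
  #endif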
-
- 21 April, 2020  8 commits
-
-
Submitted by Mark Rutland
A direct write to an APxxKey_EL1 register requires a context synchronization event to ensure that indirect reads made by subsequent instructions (e.g. AUTIASP, PACIASP) observe the new value. When we initialize the boot task's APIAKey in boot_init_stack_canary() via ptrauth_keys_switch_kernel() we miss the necessary ISB, and so there is a window where instructions are not guaranteed to use the new APIAKey value. This has been observed to result in boot-time crashes where PACIASP and AUTIASP within a function used a mixture of the old and new key values. Fix this by having ptrauth_keys_switch_kernel() synchronize the new key value with an ISB. At the same time, __ptrauth_key_install() is renamed to __ptrauth_key_install_nosync() so that it is obvious that this performs no synchronization itself. Fixes: 28321582 ("arm64: initialize ptrauth keys for kernel booting task") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Will Deacon <will@kernel.org> Cc: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Will Deacon <will@kernel.org>
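A minimal sketch of the ordering requirement (sysreg names come from the arm64 headers; the real helpers also handle the other keys): the direct key writes are followed by an ISB so that later PACIASP/AUTIASP instructions are guaranteed to observe the new key.

  #include <asm/barrier.h>
  #include <asm/sysreg.h>

  static inline void ptrauth_apia_install_sync(u64 lo, u64 hi)
  {
          write_sysreg_s(lo, SYS_APIAKEYLO_EL1);  /* the __ptrauth_key_install_nosync() part */
          write_sysreg_s(hi, SYS_APIAKEYHI_EL1);
          isb();                                  /* context synchronization event */
  }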
-
Submitted by Christian Borntraeger
A page table upgrade in a kernel section that uses secondary address mode will mess up the kernel instructions as follows. Consider the following scenario: two threads are sharing memory. On CPU1, thread 1 does e.g. strnlen_user(). That gets to old_fs = enable_sacf_uaccess(); len = strnlen_user_srst(src, size); and " la %2,0(%1)\n" " la %3,0(%0,%1)\n" " slgr %0,%0\n" " sacf 256\n" "0: srst %3,%2\n" in strnlen_user_srst(). At that point we are in secondary space mode, control register 1 points to the kernel page table and instruction fetching happens via c1, rather than the usual c13. Interrupts are not disabled, for obvious reasons. On CPU2, thread 2 does a MAP_FIXED mmap(), forcing the upgrade of the page table from a 3-level to e.g. a 4-level one. We'd allocated the new top-level table, set it up and now we hit this: notify = 1; spin_unlock_bh(&mm->page_table_lock); } if (notify) on_each_cpu(__crst_table_upgrade, mm, 0); OK, we need to actually change over to using the new page table and we need that to happen in all threads that are currently running. Which happens to include thread 1. The IPI is delivered and we have static void __crst_table_upgrade(void *arg) { struct mm_struct *mm = arg; if (current->active_mm == mm) set_user_asce(mm); __tlb_flush_local(); } run on CPU1. That does static inline void set_user_asce(struct mm_struct *mm) { S390_lowcore.user_asce = mm->context.asce; OK, user page table address updated... __ctl_load(S390_lowcore.user_asce, 1, 1); ... and control register 1 set to it. clear_cpu_flag(CIF_ASCE_PRIMARY); } The IPI is run in home space mode, so it's fine - insns are fetched using c13, which always points to the kernel page table. But as soon as we return from the interrupt, the previous PSW is restored, putting CPU1 back into secondary space mode, at which point we no longer get the kernel instructions from the kernel mapping. The fix is to only fix up the control registers that are currently in use for user processes during the page table update. We must also disable interrupts in enable_sacf_uaccess to synchronize the cr and thread.mm_segment updates against the on_each_cpu(). Fixes: 0aaba41b ("s390: remove all code using the access register mode") Cc: stable@vger.kernel.org # 4.15+ Reported-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> References: CVE-2020-11884 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Dexuan Cui
Unlike the other CPUs, CPU0 is never offlined during hibernation, so in the resume path the "new" kernel's VP assist page is not suspended (i.e. not disabled), and later, when we jump to the "old" kernel, the page is not properly re-enabled for CPU0 with the allocated page from the old kernel. So far, the VP assist page is used by hv_apic_eoi_write(), and is also used in the case of nested virtualization (running KVM atop Hyper-V). For hv_apic_eoi_write(), when the page is not properly re-enabled, hvp->apic_assist is always 0, so the HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to performance, but Hyper-V can still correctly handle this according to the Hyper-V spec; nevertheless, Linux still must update the Hyper-V hypervisor with the correct VP assist page to prevent Hyper-V from writing to the stale page, which causes guest memory corruption and consequently may have caused the hangs and triple faults seen during non-boot CPUs resume. Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops. Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500. With the fix, hibernation can pass a long-haul test of 2000 runs. In the case of nested virtualization, disabling/reenabling the assist page upon hibernation may be unsafe if there are active L2 guests. It looks like KVM should be enhanced to abort the hibernation request if there is any active L2 guest. Fixes: 05bd330a ("x86/hyperv: Suspend/resume the hypercall page for hibernation") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1587437171-2472-1-git-send-email-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
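A minimal sketch of the syscore hook-up described (hv_cpu_init()/hv_cpu_die() are the per-CPU setup/teardown helpers in the x86 Hyper-V init code; the wrapper names here are illustrative): since CPU0 stays online across hibernation, its VP assist page is disabled and re-enabled explicitly from syscore suspend/resume.

  #include <linux/syscore_ops.h>

  static int hv_suspend_sketch(void)
  {
          return hv_cpu_die(0);           /* disable CPU0's VP assist page */
  }

  static void hv_resume_sketch(void)
  {
          hv_cpu_init(0);                 /* re-enable it for the resumed kernel */
  }

  static struct syscore_ops hv_syscore_ops = {
          .suspend = hv_suspend_sketch,
          .resume  = hv_resume_sketch,
  };

  /* register_syscore_ops(&hv_syscore_ops); from the Hyper-V init path */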
-
Submitted by Michael Kelley
Hyper-V on ARM64 doesn't provide a flag for the AEOI recommendation in ms_hyperv.hints, so having the test in architecture-independent code doesn't work. Resolve this by moving the check of the flag to an architecture-dependent helper function. No functionality is changed. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200420164926.24471-1-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
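A minimal sketch of the kind of helper involved (the hint flag name is from the x86 Hyper-V headers; the helper name here is illustrative): generic VMBus code asks the architecture whether auto-EOI should be used instead of peeking at ms_hyperv.hints directly.

  static inline bool hv_recommend_using_aeoi_sketch(void)
  {
  #ifdef HV_DEPRECATING_AEOI_RECOMMENDED
          /* x86: honour the hypervisor's "deprecate auto-EOI" hint */
          return !(ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED);
  #else
          /* architectures without the hint (e.g. ARM64) return a fixed answer */
          return false;
  #endif
  }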
-
Submitted by Chris Packham
If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, use the block-size value for both. Per the devicetree spec, cache-line-size is only needed if it differs from the block size. Originally the code would fall back from block size to line size; an error message was printed if both properties were missing. Later the code was refactored to use clearer names and logic, but it inadvertently made line size a required property, meaning that on systems without a line-size property we fall back to the default from the cputable. On powernv (OPAL) platforms, since the introduction of device tree CPU features (5a61ef74 ("powerpc/64s: Support new device tree binding for discovering CPU features")), that has led to the wrong value being used, as the fallback value is incorrect for Power8/Power9 CPUs. The incorrect values flow through to the VDSO and also to the sysconf values, SC_LEVEL1_ICACHE_LINESIZE etc. Fixes: bd067f83 ("powerpc/64: Fix naming of cache block vs. cache line") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reported-by: Qian Cai <cai@lca.pw> [mpe: Add even more detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200416221908.7886-1-chris.packham@alliedtelesis.co.nz
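A minimal sketch of the fallback order described (property strings follow the devicetree binding; of_property_read_u32() is the standard OF accessor, and the function name here is illustrative): prefer the line-size property, fall back to the block-size property, and only then to the cputable default.

  #include <linux/of.h>

  static u32 dcache_line_size_from_dt(const struct device_node *np,
                                      u32 cputable_default)
  {
          u32 size;

          if (!of_property_read_u32(np, "d-cache-line-size", &size))
                  return size;
          if (!of_property_read_u32(np, "d-cache-block-size", &size))
                  return size;            /* block size stands in for line size */
          return cputable_default;
  }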
-
Submitted by Luke Nelson
This patch fixes an encoding bug in emit_stx for BPF_B when the source register is BPF_REG_FP. The current implementation for BPF_STX BPF_B in emit_stx saves one REX byte when the operands can be encoded using Mod-R/M alone. The lower 8 bits of registers %rax, %rbx, %rcx, and %rdx can be accessed without using a REX prefix via %al, %bl, %cl, and %dl, respectively. Other registers (e.g., %rsi, %rdi, %rbp, %rsp) require a REX prefix to use their 8-bit equivalents (%sil, %dil, %bpl, %spl). The current code checks if the source for BPF_STX BPF_B is BPF_REG_1 or BPF_REG_2 (which map to %rdi and %rsi), in which case it emits the required REX prefix. However, it misses the case when the source is BPF_REG_FP (mapped to %rbp). The result is that BPF_STX BPF_B with BPF_REG_FP as the source operand will read from register %ch instead of the correct %bpl. This patch fixes the problem by fixing and refactoring the check on which registers need the extra REX byte. Since no BPF registers map to %rsp, there is no need to handle %spl. Fixes: 62258278 ("net: filter: x86: internal BPF JIT") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200418232655.23870-1-luke.r.nels@gmail.com
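For context, a hedged sketch of the hardware rule the refactored check has to capture (this is the standard x86-64 encoding fact, not the kernel's BPF-to-x86 register mapping code): low-byte access to registers with encodings 4-7 needs a REX prefix, otherwise the same ModRM bits select %ah/%ch/%dh/%bh.

  #include <linux/types.h>

  /*
   * Encodings 0-3 (%rax,%rcx,%rdx,%rbx) have REX-free low-byte forms
   * (%al,%cl,%dl,%bl); encodings 4-7 (%rsp,%rbp,%rsi,%rdi) need a REX
   * prefix to reach %spl,%bpl,%sil,%dil.  %rbp (encoding 5) is the one
   * the old check missed, hence the stray read from %ch.
   */
  static inline bool low_byte_needs_rex(u32 x86_reg_encoding)
  {
          return x86_reg_encoding >= 4 && x86_reg_encoding <= 7;
  }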
-
Submitted by Paul Mackerras
Since cd758a9b ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler"), it has been possible, in fairly rare circumstances, to load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a guest on a POWER8 host. Because that case wasn't checked for, we could misinterpret the non-present PTE as being a cache-inhibited PTE. That could mismatch with the corresponding hash PTE, which would cause the function to fail with -EFAULT a little further down. That would propagate up to the KVM_RUN ioctl(), generally causing the KVM userspace (usually qemu) to fall over. This addresses the problem by catching that case and returning to the guest instead. For completeness, this fixes the radix page fault handler in the same way. For radix this didn't cause any obvious misbehaviour, because we ended up putting the non-present PTE into the guest's partition-scoped page tables, leading immediately to another hypervisor data/instruction storage interrupt, which would go through the page fault path again and fix things up. Fixes: cd758a9b ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402 Reported-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Submitted by Josh Poimboeuf
Frame pointers are completely broken by vmenter.S because it clobbers RBP: arch/x86/kvm/svm/vmenter.o: warning: objtool: __svm_vcpu_run()+0xe4: BP used as a scratch register That's unavoidable, so just skip checking that file when frame pointers are configured in. On the other hand, ORC can handle that code just fine, so leave objtool enabled in the !FRAME_POINTER case. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Message-Id: <01fae42917bacad18be8d2cbc771353da6603473.1587398610.git.jpoimboe@redhat.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Fixes: 199cd1d7 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 20 April, 2020  1 commit
-
-
Submitted by Eric Farman
The diag 0x44 handler, which handles a directed yield, goes into a codepath that does a kvm_for_each_vcpu() and ultimately deliverable_irqs(). The new check for kvm_s390_pv_cpu_is_protected() contains an assertion that the vcpu->mutex is held, which isn't going to be the case in this scenario. The result is a plethora of these messages if the lock debugging is enabled, and thus an implication that we have a problem. WARNING: CPU: 9 PID: 16167 at arch/s390/kvm/kvm-s390.h:239 deliverable_irqs+0x1c6/0x1d0 [kvm] ...snip... Call Trace: [<000003ff80429bf2>] deliverable_irqs+0x1ca/0x1d0 [kvm] ([<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm]) [<000003ff8042ba82>] kvm_s390_vcpu_has_irq+0x2a/0xa8 [kvm] [<000003ff804101e2>] kvm_arch_dy_runnable+0x22/0x38 [kvm] [<000003ff80410284>] kvm_vcpu_on_spin+0x8c/0x1d0 [kvm] [<000003ff80436888>] kvm_s390_handle_diag+0x3b0/0x768 [kvm] [<000003ff80425af4>] kvm_handle_sie_intercept+0x1cc/0xcd0 [kvm] [<000003ff80422bb0>] __vcpu_run+0x7b8/0xfd0 [kvm] [<000003ff80423de6>] kvm_arch_vcpu_ioctl_run+0xee/0x3e0 [kvm] [<000003ff8040ccd8>] kvm_vcpu_ioctl+0x2c8/0x8d0 [kvm] [<00000001504ced06>] ksys_ioctl+0xae/0xe8 [<00000001504cedaa>] __s390x_sys_ioctl+0x2a/0x38 [<0000000150cb9034>] system_call+0xd8/0x2d8 2 locks held by CPU 2/KVM/16167: #0: 00000001951980c0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x90/0x8d0 [kvm] #1: 000000019599c0f0 (&kvm->srcu){....}, at: __vcpu_run+0x4bc/0xfd0 [kvm] Last Breaking-Event-Address: [<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm] irq event stamp: 11967 hardirqs last enabled at (11975): [<00000001502992f2>] console_unlock+0x4ca/0x650 hardirqs last disabled at (11982): [<0000000150298ee8>] console_unlock+0xc0/0x650 softirqs last enabled at (7940): [<0000000150cba6ca>] __do_softirq+0x422/0x4d8 softirqs last disabled at (7929): [<00000001501cd688>] do_softirq_own_stack+0x70/0x80 Considering what's being done here, let's fix this by removing the mutex assertion rather than acquiring the mutex for every other vcpu. Fixes: 201ae986 ("KVM: s390: protvirt: Implement interrupt injection") Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20200415190353.63625-1-farman@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 18 April, 2020  3 commits
-
-
Submitted by Tony Luck
Tremont CPUs support IA32_CORE_CAPABILITIES bits to indicate whether specific SKUs have support for split lock detection. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200416205754.21177-4-tony.luck@intel.com
-
Submitted by Tony Luck
The Intel Software Developers' Manual erroneously listed bit 5 of the IA32_CORE_CAPABILITIES register as an architectural feature. It is not. Features enumerated by IA32_CORE_CAPABILITIES are model specific, and implementation details may vary in different CPU models. Thus it is only safe to trust features after checking the CPU model. Icelake client and server models are known to implement the split lock detect feature even though they don't enumerate IA32_CORE_CAPABILITIES. [ tglx: Use switch() for readability and massage comments ] Fixes: 6650cdd9 ("x86/split_lock: Enable split lock detection by kernel") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200416205754.21177-3-tony.luck@intel.com
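A hedged sketch of the switch() shape referred to in the tglx note (model macros come from <asm/intel-family.h>; the exact model list belongs to the real patch): only explicitly listed models are trusted, either as known-good without enumeration or as allowed to have their IA32_CORE_CAPABILITIES bit consulted.

  #include <asm/intel-family.h>
  #include <asm/processor.h>

  static void split_lock_setup_sketch(struct cpuinfo_x86 *c)
  {
          switch (c->x86_model) {
          case INTEL_FAM6_ICELAKE_X:
          case INTEL_FAM6_ICELAKE_L:
                  /* known to support split lock detect without enumerating it */
                  break;
          default:
                  /* only for known models is the CORE_CAPABILITIES bit trusted */
                  break;
          }
  }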
-
Submitted by James Morse
Resctrl assumes that all CPUs are online when the filesystem is mounted, and that CPUs remember their CDP-enabled state over CPU hotplug. This goes wrong when resctrl's CDP-enabled state changes while all the CPUs in a domain are offline. When a domain comes online, enable (or disable!) CDP to match resctrl's current setting. Fixes: 5ff193fb ("x86/intel_rdt: Add basic resctrl filesystem support") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200221162105.154163-1-james.morse@arm.com
-
- 17 April, 2020  4 commits
-
-
Submitted by Venkatesh Srinivas
Linux 3.14 unconditionally reads the RAPL PMU MSRs on boot, without handling General Protection Faults on reading those MSRs. Rather than injecting a #GP, which prevents boot, handle the MSRs by returning 0 for their data. Zero was checked to be safe by code review of the RAPL PMU driver and in discussion with the original driver author (eranian@google.com). Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200416184254.248374-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
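A minimal sketch of the MSR-read handling described (the MSR constants are the usual ones from <asm/msr-index.h>; the helper name and its placement in the get_msr path are illustrative): the RAPL MSRs simply report zero instead of raising #GP.

  #include <asm/msr-index.h>

  static bool kvm_get_rapl_msr_sketch(u32 index, u64 *data)
  {
          switch (index) {
          case MSR_RAPL_POWER_UNIT:
          case MSR_PP0_ENERGY_STATUS:     /* core RAPL domain */
          case MSR_PP1_ENERGY_STATUS:     /* graphics RAPL domain */
          case MSR_PKG_ENERGY_STATUS:
          case MSR_DRAM_ENERGY_STATUS:
                  *data = 0;              /* benign: reads as "no energy accounted" */
                  return true;
          default:
                  return false;           /* not handled here */
          }
  }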
-
Submitted by Steve Rutherford
Fixes a NULL pointer dereference caused by the PIT firing an interrupt before the interrupt table has been initialized. SET_PIT2 can race with the creation of the IRQchip. In particular, if SET_PIT2 is called with a low PIT timer period (after the creation of the IOAPIC, but before the instantiation of the irq routes), the PIT can fire an interrupt at an uninitialized table. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200416191152.259434-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Ahmad Fatoum
Commit 512a928a ("ARM: imx: build v7_cpu_resume() unconditionally") introduced an unintended linker error for i.MX6 configurations that have ARM_CPU_SUSPEND=n, which can happen if neither CONFIG_PM, CONFIG_CPU_IDLE, nor ARM_PSCI_FW is selected. Fix this by compiling v7_cpu_resume() only when the cpu_resume() it calls is available as well. The C declaration for the function remains unguarded to avoid future code inadvertently using a stub and introducing a regression to the bug the original commit fixed. Cc: <stable@vger.kernel.org> Fixes: 512a928a ("ARM: imx: build v7_cpu_resume() unconditionally") Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Tested-by: Roland Hieber <rhi@pengutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-
Submitted by Reinette Chatre
The default resource group ("rdtgroup_default") is associated with the root of the resctrl filesystem and should never be removed. New resource groups can be created as subdirectories of the resctrl filesystem, and they can be removed from user space. There exists a safeguard in the directory removal code (rdtgroup_rmdir()) that ensures that only subdirectories can be removed, by testing that the directory to be removed has to be a child of the root directory. A possible deadlock was recently fixed with 334b0f4e ("x86/resctrl: Fix a deadlock due to inaccurate reference"). This fix involved associating the private data of the "mon_groups" and "mon_data" directories to the resource group to which they belong, instead of NULL as before. A consequence of this change was that the original safeguard code preventing removal of "mon_groups" and "mon_data" found in the root directory failed, resulting in attempts to remove the default resource group that end in a BUG: kernel BUG at mm/slub.c:3969! invalid opcode: 0000 [#1] SMP PTI Call Trace: rdtgroup_rmdir+0x16b/0x2c0 kernfs_iop_rmdir+0x5c/0x90 vfs_rmdir+0x7a/0x160 do_rmdir+0x17d/0x1e0 do_syscall_64+0x55/0x1d0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by improving the directory removal safeguard to ensure that subdirectories of the resctrl root directory can only be removed if they are a child of the resctrl filesystem's root _and_ not associated with the default resource group. Fixes: 334b0f4e ("x86/resctrl: Fix a deadlock due to inaccurate reference") Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/884cbe1773496b5dbec1b6bd11bb50cffa83603d.1584461853.git.reinette.chatre@intel.com
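A hedged sketch of the strengthened check (field names follow the resctrl code, where struct rdtgroup has a ->kn kernfs node and rdtgroup_default is the root group; the helper name is illustrative): removability now requires both being a child of the resctrl root and not resolving to the default resource group itself.

  static bool rdtgroup_is_removable_sketch(struct kernfs_node *kn,
                                           struct rdtgroup *rdtgrp)
  {
          return kn->parent == rdtgroup_default.kn &&
                 rdtgrp != &rdtgroup_default;
  }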
-