- 01 May, 2020  1 commit
-
-
Submitted by Vincenzo Frascino
On arm64 Linux, gcc uses -fasynchronous-unwind-tables -funwind-tables by default since gcc-8, so the de facto platform ABI now allows unwinding from async signal handlers. However, on bare-metal targets (aarch64-none-elf), and with old gcc, async and sync unwind tables are not enabled by default, to avoid runtime memory costs. This means that if Linux is built with a bare-metal toolchain, the vdso.so may not have unwind tables, which breaks the gcc platform ABI guarantee in userspace. Add -fasynchronous-unwind-tables explicitly to the vgettimeofday.o cflags to address the ABI change. Fixes: 28b1a824 ("arm64: vdso: Substitute gettimeofday() with C implementation") Cc: Will Deacon <will@kernel.org> Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 25 April, 2020  5 commits
-
-
Submitted by Claudio Imbrenda
The kernel fails to compile with CONFIG_PROTECTED_VIRTUALIZATION_GUEST set but CONFIG_KVM unset. This patch fixes the issue by making the needed variable always available. Link: https://lkml.kernel.org/r/20200423120114.2027410-1-imbrenda@linux.ibm.com Fixes: a0f60f84 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Philipp Rudo <prudo@linux.ibm.com> Suggested-by: Philipp Rudo <prudo@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Submitted by Wang YanQing
When verifier_zext is true, we don't need to emit code for zero-extension. Fixes: 836256bf ("x32: bpf: eliminate zero extension code-gen") Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200423050637.GA4029@udknight
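As an illustration of the rule this change relies on, here is a minimal sketch (the context field and emit helpers are assumed names, not the actual ia32 emitter): when the verifier has already inserted explicit zero-extension instructions, the JIT skips its own clearing of the upper half of the destination register pair.

  #include <linux/types.h>

  struct jit_ctx_sketch {
          bool verifier_zext;     /* assumed to mirror bpf_prog->aux->verifier_zext */
  };

  static void emit_alu32_result(struct jit_ctx_sketch *ctx,
                                void (*emit_mov_lo)(void),
                                void (*emit_clear_hi)(void))
  {
          emit_mov_lo();                  /* 32-bit result into dst_lo */

          if (!ctx->verifier_zext)
                  emit_clear_hi();        /* zero dst_hi only when the verifier did not */
  }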
-
Submitted by Luke Nelson
The current JIT clobbers the destination register for BPF_JSET BPF_X and BPF_K by using "and" and "or" instructions. This is fine when the destination register is a temporary loaded from a register stored on the stack, but not otherwise. This patch fixes the problem (for both BPF_K and BPF_X) by always loading the destination register into temporaries, since BPF_JSET should not modify the destination register. This bug may not be currently triggerable, as BPF_REG_AX is the only register not stored on the stack and the verifier uses it in a limited way. Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-2-luke.r.nels@gmail.com
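A minimal sketch of the required semantics (illustrative, not the emitter itself): BPF_JSET is "jump if (dst & src) != 0" and must leave dst unchanged, so the AND has to be performed on a scratch copy of dst.

  #include <linux/types.h>

  static bool bpf_jset_taken(u64 dst, u64 src)
  {
          u64 tmp = dst & src;    /* AND performed on a temporary copy ... */

          return tmp != 0;        /* ... dst itself is never written */
  }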
-
Submitted by Luke Nelson
The current JIT uses the following sequence to zero-extend into the upper 32 bits of the destination register for BPF_LDX BPF_{B,H,W}, when the destination register is not on the stack: EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0); The problem is that C7 /0 encodes a MOV instruction that requires a 4-byte immediate; the current code emits only 1 byte of the immediate. This means that the first 3 bytes of the next instruction will be treated as the rest of the immediate, breaking the stream of instructions. This patch fixes the problem by instead emitting "xor dst_hi,dst_hi" to clear the upper 32 bits. This fixes the problem and is more efficient than using MOV to load a zero immediate. This bug may not be currently triggerable, as BPF_REG_AX is the only register not stored on the stack, the verifier uses it in a limited way, and the verifier implements a zero-extension optimization. But the JIT should avoid emitting incorrect encodings regardless. Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-1-luke.r.nels@gmail.com
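For illustration, a small standalone sketch of the encoding difference (byte values are standard x86, shown for %ecx; these are not the JIT's emit macros): "mov r/m32, imm32" (C7 /0) always carries a 4-byte immediate, while "xor reg,reg" needs no immediate at all, which is why emitting only one immediate byte corrupts the instruction stream.

  #include <stdio.h>

  int main(void)
  {
          /* mov ecx, 0   ->  C7 C1 00 00 00 00   (opcode, ModRM, imm32) */
          unsigned char mov_zero[] = { 0xC7, 0xC1, 0x00, 0x00, 0x00, 0x00 };
          /* xor ecx, ecx ->  31 C9                (opcode, ModRM, no immediate) */
          unsigned char xor_zero[] = { 0x31, 0xC9 };

          printf("mov ecx,0  : %zu bytes\n", sizeof(mov_zero));
          printf("xor ecx,ecx: %zu bytes\n", sizeof(xor_zero));
          return 0;
  }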
-
Submitted by Damien Le Moal
ARCH_HAS_STRICT_KERNEL_RWX is not useful for no-MMU systems. Furthermore, since this option leads to very large boot image files on 64-bit architectures, do not enable it, so that no-MMU platforms such as the Kendryte K210 SoC based boards can be supported. Fixes: 00cb41d5 ("riscv: add alignment for text, rodata and data sections") Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
- 23 April, 2020  8 commits
-
-
Submitted by Masahiro Yamada
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
Submitted by Vitor Massaru Iha
In this workflow: $ make ARCH=um defconfig && make ARCH=um -j8 [snip] $ make ARCH=um mrproper [snip] $ make ARCH=um defconfig O=./build_um && make ARCH=um -j8 O=./build_um [snip] CC scripts/mod/empty.o In file included from ../include/linux/types.h:6, from ../include/linux/mod_devicetable.h:12, from ../scripts/mod/devicetable-offsets.c:3: ../include/uapi/linux/types.h:5:10: fatal error: asm/types.h: No such file or directory 5 | #include <asm/types.h> | ^~~~~~~~~~~~~ compilation terminated. make[2]: *** [../scripts/Makefile.build:100: scripts/mod/devicetable-offsets.s] Error 1 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [/home/iha/sdb/opensource/lkmp/linux-kselftest.git/Makefile:1140: prepare0] Error 2 make[1]: Leaving directory '/home/iha/sdb/opensource/lkmp/linux-kselftest.git/build_um' make: *** [Makefile:180: sub-make] Error 2 The error occurred because the arch/$(SUBARCH)/include/generated files weren't properly cleaned by `make ARCH=um mrproper`. Fixes: a788b2ed ("kbuild: check arch/$(SRCARCH)/include/generated before out-of-tree build") Reported-by: Theodore Ts'o <tytso@mit.edu> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Vitor Massaru Iha <vitor@massaru.org> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Link: https://groups.google.com/forum/#!msg/kunit-dev/QmA27YEgEgI/hvS1kiz2CwAJ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
Submitted by Masahiro Yamada
As the bug report [1] pointed out, <linux/vermagic.h> must be included after <linux/module.h>. I believe we should not impose any include order restriction. We often sort include directives alphabetically, but that is just a coding style convention. Technically, we can include header files in any order by making every header self-contained. Currently, the arch-specific MODULE_ARCH_VERMAGIC is defined in <asm/module.h>, which is not included from <linux/vermagic.h>. Hence, the straightforward fix-up would be as follows: |--- a/include/linux/vermagic.h |+++ b/include/linux/vermagic.h |@@ -1,5 +1,6 @@ | /* SPDX-License-Identifier: GPL-2.0 */ | #include <generated/utsrelease.h> |+#include <linux/module.h> | | /* Simply sanity version stamp for modules. */ | #ifdef CONFIG_SMP This works well enough, but for further cleanups, I split the MODULE_ARCH_VERMAGIC definitions into <asm/vermagic.h>. With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal, and the location of the MODULE_ARCH_VERMAGIC definitions will be consistent. For arc and ia64, MODULE_PROC_FAMILY is only used for defining MODULE_ARCH_VERMAGIC, so I squashed it. For hexagon, nds32, and xtensa, I removed <asm/modules.h> entirely because they contained nothing but the MODULE_ARCH_VERMAGIC definition. Kbuild will automatically generate <asm/modules.h> at build time, wrapping <asm-generic/module.h>. [1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Jessica Yu <jeyu@kernel.org>
-
Submitted by Giovanni Gherdovich
Improve readability of the function intel_set_max_freq_ratio() by moving the check for KNL CPUs there, together with the checks for GLM and SKX. Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-5-ggherdovich@suse.cz
-
Submitted by Peter Zijlstra (Intel)
The static key arch_scale_freq_key only needs to be enabled once (at boot). This change fixes a bug by which the key was enabled every time cpu0 is started, even as a secondary CPU during CPU hotplug. Secondary CPUs are started from the idle thread: setting a static key from there means acquiring a lock and may result in sleeping in the idle task, causing a CPU lockup. Another consequence of this change is that init_counter_refs() is now called on each CPU correctly; previously the function on_each_cpu() was used, but it was called at boot when the only online CPU is cpu0. [ggherdovich@suse.cz: Tested and wrote changelog] Fixes: 1567c3e3 ("x86, sched: Add support for frequency invariance") Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-4-ggherdovich@suse.cz
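A minimal sketch of the pattern described (function names are illustrative, loosely following the x86 code): the static key is enabled once from boot-CPU init, while secondary CPUs only refresh their per-CPU counter state, so nothing sleeps in the idle task.

  #include <linux/jump_label.h>

  static DEFINE_STATIC_KEY_FALSE(arch_scale_freq_key);

  static void init_counter_refs(void)
  {
          /* per-CPU MSR snapshots only; no locks, safe from the idle thread */
  }

  static void __init bp_init_freq_invariance(void)        /* boot CPU, once */
  {
          static_branch_enable(&arch_scale_freq_key);      /* may sleep: boot only */
          init_counter_refs();
  }

  static void ap_init_freq_invariance(void)                /* secondary CPU bring-up */
  {
          if (static_branch_likely(&arch_scale_freq_key))
                  init_counter_refs();                     /* no static_branch_enable() here */
  }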
-
Submitted by Giovanni Gherdovich
If a CPU has fewer than 4 physical cores, MSR_TURBO_RATIO_LIMIT will rightfully report that the 4C turbo ratio is zero. In such cases, use the 1C turbo ratio instead for frequency-invariance calculations. Fixes: 1567c3e3 ("x86, sched: Add support for frequency invariance") Reported-by: Like Xu <like.xu@linux.intel.com> Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com> Link: https://lkml.kernel.org/r/20200416054745.740-3-ggherdovich@suse.cz
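A minimal sketch of the fallback (the byte layout of MSR_TURBO_RATIO_LIMIT is assumed from the SDM; the real code handles more cases): if the 4-core field reads back as zero, use the 1-core field so the turbo/base ratio stays non-zero.

  #include <linux/types.h>

  static bool pick_turbo_ratio(u64 turbo_ratio_limit, unsigned int *turbo)
  {
          unsigned int ratio_4c = (turbo_ratio_limit >> 24) & 0xff; /* 4C field */
          unsigned int ratio_1c = turbo_ratio_limit & 0xff;         /* 1C field */

          *turbo = ratio_4c ? ratio_4c : ratio_1c;
          return *turbo != 0;
  }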
-
Submitted by Giovanni Gherdovich
Some hypervisors such as VMWare ESXi 5.5 advertise support for X86_FEATURE_APERFMPERF but then fill all MSRs with zeroes. In particular, MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base clock frequency of the CPU (highest non-turbo frequency), producing a division by zero when computing the ratio turbo_freq/base_freq necessary for frequency-invariant accounting. It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance couldn't be done in principle (not that it would make a lot of sense in a VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise that feature. Fixes: 1567c3e3 ("x86, sched: Add support for frequency invariance") Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-2-ggherdovich@suse.cz
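A minimal sketch of the defensive check implied here (illustrative, not the exact function): bail out of frequency-invariance setup when the hypervisor hands back zeroed MSR values, rather than dividing by a zero base frequency.

  #include <linux/math64.h>

  static bool set_max_freq_ratio_sketch(u64 base_freq, u64 turbo_freq,
                                        u64 scale, u64 *ratio)
  {
          if (!base_freq || !turbo_freq)          /* e.g. ESXi 5.5 returns 0 */
                  return false;                   /* keep freq invariance disabled */

          *ratio = div64_u64(turbo_freq * scale, base_freq);
          return true;
  }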
-
Submitted by Harry Pan
The Jasper Lake processor is based on the Tremont microarchitecture; reuse the glm_cstates table of Goldmont and Goldmont Plus to enable C-state residency profiling. Signed-off-by: Harry Pan <harry.pan@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200402190658.1.Ic02e891daac41303aed1f2fc6c64f6110edd27bd@changeid
-
- 22 April, 2020  10 commits
-
-
Submitted by Niklas Schnelle
With the introduction of CPU directed interrupts, the kernel parameter pci=force_floating was introduced to fall back to the previous behavior using floating irqs. However, we were still setting the affinity in that case, both in __irq_alloc_descs() and via the irq_set_affinity callback in struct irq_chip. For the former, only set the affinity in the directed case. The latter is explicitly set in zpci_directed_irq_init(), so we can just leave it unset for the floating case. Fixes: e979ce7b ("s390/pci: provide support for CPU directed interrupts") Co-developed-by: Alexander Schmidt <alexs@linux.ibm.com> Signed-off-by: Alexander Schmidt <alexs@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Submitted by Philipp Rudo
Switching tracers involves instruction patching. To prevent an instruction from being patched while it is being read, the instruction patching is done in stop_machine 'context'. This also means that any function called during stop_machine must not be traced. Thus, add 'notrace' to all functions called within stop_machine. Fixes: 1ec2772e ("s390/diag: add a statistic for diagnose calls") Fixes: 38f2c691 ("s390: improve wait logic of stop_machine") Fixes: 4ecf0a43 ("processor: get rid of cpu_relax_yield") Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Submitted by Christophe Leroy
CONFIG_PPC_KUAP_DEBUG is not selectable because it depends on PPC_32, which doesn't exist. Fixing it leads to a deadlock due to a vital register getting clobbered in _switch(). Change the dependency to PPC32 and use r0 instead of r4 in _switch(). Fixes: e2fb9f54 ("powerpc/32: Prepare for Kernel Userspace Access Protection") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/540242f7d4573f7cdf1b3bf46bb35f743b2cd68f.1587124651.git.christophe.leroy@c-s.fr
-
Submitted by Christophe Leroy
The WRITE_RO lkdtm test works, but when CONFIG_DEBUG_RODATA_TEST is selected, the kernel reports "rodata_test: test data was not read only". This is because when the rodata test runs, there are still old entries in the TLB. Flush the TLB after setting kernel pages RO or NX. Fixes: d5f17ee9 ("powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/485caac75f195f18c11eb077b0031fdd2bb7fb9e.1587361039.git.christophe.leroy@c-s.fr
-
Submitted by Kefeng Wang
There is no shutdown call in SBI v0.2, so only set pm_power_off when RISCV_SBI_V01 is enabled, to fix the following build error: riscv64-linux-ld: arch/riscv/kernel/sbi.o: in function `sbi_power_off': sbi.c:(.text+0xe): undefined reference to `sbi_shutdown' Fixes: efca1398 ("RISC-V: Introduce a new config for SBI v0.1") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
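A minimal sketch of the guard described (the CONFIG symbol and sbi_shutdown()/pm_power_off follow the RISC-V SBI code, but the helper name is illustrative): sbi_shutdown() only exists for legacy SBI v0.1, so pm_power_off is only wired up in that configuration.

  #include <linux/pm.h>

  #ifdef CONFIG_RISCV_SBI_V01
  static void sbi_power_off(void)
  {
          sbi_shutdown();                 /* legacy v0.1 call only */
  }
  #endif

  void __init sbi_power_off_init(void)    /* illustrative helper */
  {
  #ifdef CONFIG_RISCV_SBI_V01
          pm_power_off = sbi_power_off;
  #endif
  }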
-
Submitted by Kefeng Wang
Fix an incorrect EXPORT_SYMBOL(). Fixes: efca1398 ("RISC-V: Introduce a new config for SBI v0.1") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Ilie Halip
When building with the LLVM linker, this error occurs: LD arch/riscv/kernel/vdso/vdso-syms.o ld.lld: error: no input files This happens because lld treats -R as an alias for -rpath, as opposed to ld, where -R means --just-symbols. Use the long option name for compatibility between the two. Link: https://github.com/ClangBuiltLinux/linux/issues/805 Reported-by: Dmitry Golovin <dima@golovin.in> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-
Submitted by Peter Xu
Userfaultfd-wp does not yet work on 32-bit hosts, but it was accidentally enabled previously. Disable it. Fixes: 5a281062 ("userfaultfd: wp: add WP pagetable tracking to x86") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Link: http://lkml.kernel.org/r/20200413141608.109211-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Masahiro Yamada
The closing parenthesis is missing. Fixes: bfeb022f ("mm/memory_hotplug: add pgprot_t to mhp_params") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Link: http://lkml.kernel.org/r/20200413014743.16353-1-masahiroy@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Guenter Roeck
riscv:allnoconfig and riscv:tinyconfig fail to compile: arch/riscv/kernel/stacktrace.c: In function 'walk_stackframe': arch/riscv/kernel/stacktrace.c:78:8: error: 'sp_in_global' undeclared sp_in_global is declared inside CONFIG_FRAME_POINTER but used outside of it. Fixes: 52e7c52d ("RISC-V: Stop relying on GCC's register allocator's hueristics") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
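The shape of the fix, as a hedged sketch (the global register declaration mirrors what the RISC-V stacktrace code uses; the surrounding unwinder code is elided): the declaration must stay visible in both configurations, because walk_stackframe() references it either way.

  register unsigned long sp_in_global __asm__("sp");

  #ifdef CONFIG_FRAME_POINTER
  /* frame-pointer based unwinder uses sp_in_global ... */
  #else
  /* ... and so does the fallback unwinder, so the declaration
   * cannot live inside only one branch of the #ifdef.
   */
  #endif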
-
- 21 April, 2020  8 commits
-
-
Submitted by Mark Rutland
A direct write to an APxxKey_EL1 register requires a context synchronization event to ensure that indirect reads made by subsequent instructions (e.g. AUTIASP, PACIASP) observe the new value. When we initialize the boot task's APIAKey in boot_init_stack_canary() via ptrauth_keys_switch_kernel() we miss the necessary ISB, and so there is a window where instructions are not guaranteed to use the new APIAKey value. This has been observed to result in boot-time crashes where PACIASP and AUTIASP within a function used a mixture of the old and new key values. Fix this by having ptrauth_keys_switch_kernel() synchronize the new key value with an ISB. At the same time, __ptrauth_key_install() is renamed to __ptrauth_key_install_nosync() so that it is obvious that this performs no synchronization itself. Fixes: 28321582 ("arm64: initialize ptrauth keys for kernel booting task") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Will Deacon <will@kernel.org> Cc: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Will Deacon <will@kernel.org>
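A minimal sketch of the ordering requirement (sysreg names come from the arm64 headers; the real helpers also handle the other keys): the direct key writes are followed by an ISB so that later PACIASP/AUTIASP instructions are guaranteed to observe the new key.

  #include <asm/barrier.h>
  #include <asm/sysreg.h>

  static inline void ptrauth_apia_install_sync(u64 lo, u64 hi)
  {
          write_sysreg_s(lo, SYS_APIAKEYLO_EL1);  /* the __ptrauth_key_install_nosync() part */
          write_sysreg_s(hi, SYS_APIAKEYHI_EL1);
          isb();                                  /* context synchronization event */
  }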
-
Submitted by Christian Borntraeger
A page table upgrade in a kernel section that uses secondary address mode will mess up the kernel instructions as follows. Consider the following scenario: two threads are sharing memory. On CPU1, thread 1 does e.g. strnlen_user(). That gets to old_fs = enable_sacf_uaccess(); len = strnlen_user_srst(src, size); and " la %2,0(%1)\n" " la %3,0(%0,%1)\n" " slgr %0,%0\n" " sacf 256\n" "0: srst %3,%2\n" in strnlen_user_srst(). At that point we are in secondary space mode, control register 1 points to the kernel page table and instruction fetching happens via c1, rather than the usual c13. Interrupts are not disabled, for obvious reasons. On CPU2, thread 2 does a MAP_FIXED mmap(), forcing the upgrade of the page table from a 3-level to e.g. a 4-level one. We'd allocated the new top-level table, set it up and now we hit this: notify = 1; spin_unlock_bh(&mm->page_table_lock); } if (notify) on_each_cpu(__crst_table_upgrade, mm, 0); OK, we need to actually change over to using the new page table and we need that to happen in all threads that are currently running. Which happens to include thread 1. The IPI is delivered and we have static void __crst_table_upgrade(void *arg) { struct mm_struct *mm = arg; if (current->active_mm == mm) set_user_asce(mm); __tlb_flush_local(); } run on CPU1. That does static inline void set_user_asce(struct mm_struct *mm) { S390_lowcore.user_asce = mm->context.asce; OK, user page table address updated... __ctl_load(S390_lowcore.user_asce, 1, 1); ... and control register 1 set to it. clear_cpu_flag(CIF_ASCE_PRIMARY); } The IPI is run in home space mode, so it's fine - insns are fetched using c13, which always points to the kernel page table. But as soon as we return from the interrupt, the previous PSW is restored, putting CPU1 back into secondary space mode, at which point we no longer get the kernel instructions from the kernel mapping. The fix is to only fix up the control registers that are currently in use for user processes during the page table update. We must also disable interrupts in enable_sacf_uaccess to synchronize the cr and thread.mm_segment updates against the on_each_cpu(). Fixes: 0aaba41b ("s390: remove all code using the access register mode") Cc: stable@vger.kernel.org # 4.15+ Reported-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> References: CVE-2020-11884 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Dexuan Cui
Unlike the other CPUs, CPU0 is never offlined during hibernation, so in the resume path the "new" kernel's VP assist page is not suspended (i.e. not disabled), and later, when we jump to the "old" kernel, the page is not properly re-enabled for CPU0 with the allocated page from the old kernel. So far, the VP assist page is used by hv_apic_eoi_write(), and is also used in the case of nested virtualization (running KVM atop Hyper-V). For hv_apic_eoi_write(), when the page is not properly re-enabled, hvp->apic_assist is always 0, so the HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to performance, but Hyper-V can still correctly handle this according to the Hyper-V spec; nevertheless, Linux still must update the Hyper-V hypervisor with the correct VP assist page to prevent Hyper-V from writing to the stale page, which causes guest memory corruption and consequently may have caused the hangs and triple faults seen during non-boot CPUs resume. Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops. Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500. With the fix, hibernation can pass a long-haul test of 2000 runs. In the case of nested virtualization, disabling/reenabling the assist page upon hibernation may be unsafe if there are active L2 guests. It looks like KVM should be enhanced to abort the hibernation request if there is any active L2 guest. Fixes: 05bd330a ("x86/hyperv: Suspend/resume the hypercall page for hibernation") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1587437171-2472-1-git-send-email-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
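A minimal sketch of the syscore hook-up described (hv_cpu_init()/hv_cpu_die() are the per-CPU setup/teardown helpers in the x86 Hyper-V init code; the wrapper names here are illustrative): since CPU0 stays online across hibernation, its VP assist page is disabled and re-enabled explicitly from syscore suspend/resume.

  #include <linux/syscore_ops.h>

  static int hv_suspend_sketch(void)
  {
          return hv_cpu_die(0);           /* disable CPU0's VP assist page */
  }

  static void hv_resume_sketch(void)
  {
          hv_cpu_init(0);                 /* re-enable it for the resumed kernel */
  }

  static struct syscore_ops hv_syscore_ops = {
          .suspend = hv_suspend_sketch,
          .resume  = hv_resume_sketch,
  };

  /* register_syscore_ops(&hv_syscore_ops); from the Hyper-V init path */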
-
Submitted by Michael Kelley
Hyper-V on ARM64 doesn't provide a flag for the AEOI recommendation in ms_hyperv.hints, so having the test in architecture-independent code doesn't work. Resolve this by moving the check of the flag to an architecture-dependent helper function. No functionality is changed. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200420164926.24471-1-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
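A minimal sketch of the kind of helper involved (the hint flag name is from the x86 Hyper-V headers; the helper name here is illustrative): generic VMBus code asks the architecture whether auto-EOI should be used instead of peeking at ms_hyperv.hints directly.

  static inline bool hv_recommend_using_aeoi_sketch(void)
  {
  #ifdef HV_DEPRECATING_AEOI_RECOMMENDED
          /* x86: honour the hypervisor's "deprecate auto-EOI" hint */
          return !(ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED);
  #else
          /* architectures without the hint (e.g. ARM64) return a fixed answer */
          return false;
  #endif
  }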
-
Submitted by Chris Packham
If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, use the block-size value for both. Per the devicetree spec, cache-line-size is only needed if it differs from the block size. Originally the code would fall back from block size to line size; an error message was printed if both properties were missing. Later the code was refactored to use clearer names and logic, but it inadvertently made line size a required property, meaning that on systems without a line-size property we fall back to the default from the cputable. On powernv (OPAL) platforms, since the introduction of device tree CPU features (5a61ef74 ("powerpc/64s: Support new device tree binding for discovering CPU features")), that has led to the wrong value being used, as the fallback value is incorrect for Power8/Power9 CPUs. The incorrect values flow through to the VDSO and also to the sysconf values, SC_LEVEL1_ICACHE_LINESIZE etc. Fixes: bd067f83 ("powerpc/64: Fix naming of cache block vs. cache line") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reported-by: Qian Cai <cai@lca.pw> [mpe: Add even more detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200416221908.7886-1-chris.packham@alliedtelesis.co.nz
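A minimal sketch of the fallback order described (property strings follow the devicetree binding; of_property_read_u32() is the standard OF accessor, and the function name here is illustrative): prefer the line-size property, fall back to the block-size property, and only then to the cputable default.

  #include <linux/of.h>

  static u32 dcache_line_size_from_dt(const struct device_node *np,
                                      u32 cputable_default)
  {
          u32 size;

          if (!of_property_read_u32(np, "d-cache-line-size", &size))
                  return size;
          if (!of_property_read_u32(np, "d-cache-block-size", &size))
                  return size;            /* block size stands in for line size */
          return cputable_default;
  }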
-
Submitted by Luke Nelson
This patch fixes an encoding bug in emit_stx for BPF_B when the source register is BPF_REG_FP. The current implementation for BPF_STX BPF_B in emit_stx saves one REX byte when the operands can be encoded using Mod-R/M alone. The lower 8 bits of registers %rax, %rbx, %rcx, and %rdx can be accessed without using a REX prefix via %al, %bl, %cl, and %dl, respectively. Other registers (e.g., %rsi, %rdi, %rbp, %rsp) require a REX prefix to use their 8-bit equivalents (%sil, %dil, %bpl, %spl). The current code checks if the source for BPF_STX BPF_B is BPF_REG_1 or BPF_REG_2 (which map to %rdi and %rsi), in which case it emits the required REX prefix. However, it misses the case when the source is BPF_REG_FP (mapped to %rbp). The result is that BPF_STX BPF_B with BPF_REG_FP as the source operand will read from register %ch instead of the correct %bpl. This patch fixes the problem by fixing and refactoring the check on which registers need the extra REX byte. Since no BPF registers map to %rsp, there is no need to handle %spl. Fixes: 62258278 ("net: filter: x86: internal BPF JIT") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200418232655.23870-1-luke.r.nels@gmail.com
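For context, a hedged sketch of the hardware rule the refactored check has to capture (this is the standard x86-64 encoding fact, not the kernel's BPF-to-x86 register mapping code): low-byte access to registers with encodings 4-7 needs a REX prefix, otherwise the same ModRM bits select %ah/%ch/%dh/%bh.

  #include <linux/types.h>

  /*
   * Encodings 0-3 (%rax,%rcx,%rdx,%rbx) have REX-free low-byte forms
   * (%al,%cl,%dl,%bl); encodings 4-7 (%rsp,%rbp,%rsi,%rdi) need a REX
   * prefix to reach %spl,%bpl,%sil,%dil.  %rbp (encoding 5) is the one
   * the old check missed, hence the stray read from %ch.
   */
  static inline bool low_byte_needs_rex(u32 x86_reg_encoding)
  {
          return x86_reg_encoding >= 4 && x86_reg_encoding <= 7;
  }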
-
Submitted by Paul Mackerras
Since cd758a9b ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler"), it has been possible, in fairly rare circumstances, to load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a guest on a POWER8 host. Because that case wasn't checked for, we could misinterpret the non-present PTE as being a cache-inhibited PTE. That could mismatch with the corresponding hash PTE, which would cause the function to fail with -EFAULT a little further down. That would propagate up to the KVM_RUN ioctl(), generally causing the KVM userspace (usually qemu) to fall over. This addresses the problem by catching that case and returning to the guest instead. For completeness, this fixes the radix page fault handler in the same way. For radix this didn't cause any obvious misbehaviour, because we ended up putting the non-present PTE into the guest's partition-scoped page tables, leading immediately to another hypervisor data/instruction storage interrupt, which would go through the page fault path again and fix things up. Fixes: cd758a9b ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402 Reported-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Submitted by Josh Poimboeuf
Frame pointers are completely broken by vmenter.S because it clobbers RBP: arch/x86/kvm/svm/vmenter.o: warning: objtool: __svm_vcpu_run()+0xe4: BP used as a scratch register That's unavoidable, so just skip checking that file when frame pointers are configured in. On the other hand, ORC can handle that code just fine, so leave objtool enabled in the !FRAME_POINTER case. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Message-Id: <01fae42917bacad18be8d2cbc771353da6603473.1587398610.git.jpoimboe@redhat.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Fixes: 199cd1d7 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 20 April, 2020  1 commit
-
-
Submitted by Eric Farman
The diag 0x44 handler, which handles a directed yield, goes into a codepath that does a kvm_for_each_vcpu() and ultimately deliverable_irqs(). The new check for kvm_s390_pv_cpu_is_protected() contains an assertion that the vcpu->mutex is held, which isn't going to be the case in this scenario. The result is a plethora of these messages if the lock debugging is enabled, and thus an implication that we have a problem. WARNING: CPU: 9 PID: 16167 at arch/s390/kvm/kvm-s390.h:239 deliverable_irqs+0x1c6/0x1d0 [kvm] ...snip... Call Trace: [<000003ff80429bf2>] deliverable_irqs+0x1ca/0x1d0 [kvm] ([<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm]) [<000003ff8042ba82>] kvm_s390_vcpu_has_irq+0x2a/0xa8 [kvm] [<000003ff804101e2>] kvm_arch_dy_runnable+0x22/0x38 [kvm] [<000003ff80410284>] kvm_vcpu_on_spin+0x8c/0x1d0 [kvm] [<000003ff80436888>] kvm_s390_handle_diag+0x3b0/0x768 [kvm] [<000003ff80425af4>] kvm_handle_sie_intercept+0x1cc/0xcd0 [kvm] [<000003ff80422bb0>] __vcpu_run+0x7b8/0xfd0 [kvm] [<000003ff80423de6>] kvm_arch_vcpu_ioctl_run+0xee/0x3e0 [kvm] [<000003ff8040ccd8>] kvm_vcpu_ioctl+0x2c8/0x8d0 [kvm] [<00000001504ced06>] ksys_ioctl+0xae/0xe8 [<00000001504cedaa>] __s390x_sys_ioctl+0x2a/0x38 [<0000000150cb9034>] system_call+0xd8/0x2d8 2 locks held by CPU 2/KVM/16167: #0: 00000001951980c0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x90/0x8d0 [kvm] #1: 000000019599c0f0 (&kvm->srcu){....}, at: __vcpu_run+0x4bc/0xfd0 [kvm] Last Breaking-Event-Address: [<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm] irq event stamp: 11967 hardirqs last enabled at (11975): [<00000001502992f2>] console_unlock+0x4ca/0x650 hardirqs last disabled at (11982): [<0000000150298ee8>] console_unlock+0xc0/0x650 softirqs last enabled at (7940): [<0000000150cba6ca>] __do_softirq+0x422/0x4d8 softirqs last disabled at (7929): [<00000001501cd688>] do_softirq_own_stack+0x70/0x80 Considering what's being done here, let's fix this by removing the mutex assertion rather than acquiring the mutex for every other vcpu. Fixes: 201ae986 ("KVM: s390: protvirt: Implement interrupt injection") Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20200415190353.63625-1-farman@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 18 April, 2020  3 commits
-
-
Submitted by Tony Luck
Tremont CPUs support IA32_CORE_CAPABILITIES bits to indicate whether specific SKUs have support for split lock detection. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200416205754.21177-4-tony.luck@intel.com
-
Submitted by Tony Luck
The Intel Software Developers' Manual erroneously listed bit 5 of the IA32_CORE_CAPABILITIES register as an architectural feature. It is not. Features enumerated by IA32_CORE_CAPABILITIES are model specific, and implementation details may vary in different CPU models. Thus it is only safe to trust features after checking the CPU model. Icelake client and server models are known to implement the split lock detect feature even though they don't enumerate IA32_CORE_CAPABILITIES. [ tglx: Use switch() for readability and massage comments ] Fixes: 6650cdd9 ("x86/split_lock: Enable split lock detection by kernel") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200416205754.21177-3-tony.luck@intel.com
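A hedged sketch of the switch() shape referred to in the tglx note (model macros come from <asm/intel-family.h>; the exact model list belongs to the real patch): only explicitly listed models are trusted, either as known-good without enumeration or as allowed to have their IA32_CORE_CAPABILITIES bit consulted.

  #include <asm/intel-family.h>
  #include <asm/processor.h>

  static void split_lock_setup_sketch(struct cpuinfo_x86 *c)
  {
          switch (c->x86_model) {
          case INTEL_FAM6_ICELAKE_X:
          case INTEL_FAM6_ICELAKE_L:
                  /* known to support split lock detect without enumerating it */
                  break;
          default:
                  /* only for known models is the CORE_CAPABILITIES bit trusted */
                  break;
          }
  }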
-
Submitted by James Morse
Resctrl assumes that all CPUs are online when the filesystem is mounted, and that CPUs remember their CDP-enabled state over CPU hotplug. This goes wrong when resctrl's CDP-enabled state changes while all the CPUs in a domain are offline. When a domain comes online, enable (or disable!) CDP to match resctrl's current setting. Fixes: 5ff193fb ("x86/intel_rdt: Add basic resctrl filesystem support") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200221162105.154163-1-james.morse@arm.com
-
- 17 April, 2020  4 commits
-
-
Submitted by Venkatesh Srinivas
Linux 3.14 unconditionally reads the RAPL PMU MSRs on boot, without handling General Protection Faults on reading those MSRs. Rather than injecting a #GP, which prevents boot, handle the MSRs by returning 0 for their data. Zero was checked to be safe by code review of the RAPL PMU driver and in discussion with the original driver author (eranian@google.com). Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200416184254.248374-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
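A minimal sketch of the MSR-read handling described (the MSR constants are the usual ones from <asm/msr-index.h>; the helper name and its placement in the get_msr path are illustrative): the RAPL MSRs simply report zero instead of raising #GP.

  #include <asm/msr-index.h>

  static bool kvm_get_rapl_msr_sketch(u32 index, u64 *data)
  {
          switch (index) {
          case MSR_RAPL_POWER_UNIT:
          case MSR_PP0_ENERGY_STATUS:     /* core RAPL domain */
          case MSR_PP1_ENERGY_STATUS:     /* graphics RAPL domain */
          case MSR_PKG_ENERGY_STATUS:
          case MSR_DRAM_ENERGY_STATUS:
                  *data = 0;              /* benign: reads as "no energy accounted" */
                  return true;
          default:
                  return false;           /* not handled here */
          }
  }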
-
Submitted by Steve Rutherford
Fixes a NULL pointer dereference caused by the PIT firing an interrupt before the interrupt table has been initialized. SET_PIT2 can race with the creation of the IRQchip. In particular, if SET_PIT2 is called with a low PIT timer period (after the creation of the IOAPIC, but before the instantiation of the irq routes), the PIT can fire an interrupt at an uninitialized table. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200416191152.259434-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Ahmad Fatoum
Commit 512a928a ("ARM: imx: build v7_cpu_resume() unconditionally") introduced an unintended linker error for i.MX6 configurations that have ARM_CPU_SUSPEND=n, which can happen if neither CONFIG_PM, CONFIG_CPU_IDLE, nor ARM_PSCI_FW is selected. Fix this by compiling v7_cpu_resume() only when the cpu_resume() it calls is available as well. The C declaration for the function remains unguarded to avoid future code inadvertently using a stub and introducing a regression to the bug the original commit fixed. Cc: <stable@vger.kernel.org> Fixes: 512a928a ("ARM: imx: build v7_cpu_resume() unconditionally") Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Tested-by: Roland Hieber <rhi@pengutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-
Submitted by Reinette Chatre
The default resource group ("rdtgroup_default") is associated with the root of the resctrl filesystem and should never be removed. New resource groups can be created as subdirectories of the resctrl filesystem, and they can be removed from user space. There exists a safeguard in the directory removal code (rdtgroup_rmdir()) that ensures that only subdirectories can be removed, by testing that the directory to be removed has to be a child of the root directory. A possible deadlock was recently fixed with 334b0f4e ("x86/resctrl: Fix a deadlock due to inaccurate reference"). This fix involved associating the private data of the "mon_groups" and "mon_data" directories to the resource group to which they belong, instead of NULL as before. A consequence of this change was that the original safeguard code preventing removal of "mon_groups" and "mon_data" found in the root directory failed, resulting in attempts to remove the default resource group that end in a BUG: kernel BUG at mm/slub.c:3969! invalid opcode: 0000 [#1] SMP PTI Call Trace: rdtgroup_rmdir+0x16b/0x2c0 kernfs_iop_rmdir+0x5c/0x90 vfs_rmdir+0x7a/0x160 do_rmdir+0x17d/0x1e0 do_syscall_64+0x55/0x1d0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by improving the directory removal safeguard to ensure that subdirectories of the resctrl root directory can only be removed if they are a child of the resctrl filesystem's root _and_ not associated with the default resource group. Fixes: 334b0f4e ("x86/resctrl: Fix a deadlock due to inaccurate reference") Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/884cbe1773496b5dbec1b6bd11bb50cffa83603d.1584461853.git.reinette.chatre@intel.com
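A hedged sketch of the strengthened check (field names follow the resctrl code, where struct rdtgroup has a ->kn kernfs node and rdtgroup_default is the root group; the helper name is illustrative): removability now requires both being a child of the resctrl root and not resolving to the default resource group itself.

  static bool rdtgroup_is_removable_sketch(struct kernfs_node *kn,
                                           struct rdtgroup *rdtgrp)
  {
          return kn->parent == rdtgroup_default.kn &&
                 rdtgrp != &rdtgroup_default;
  }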
-