- 09 Feb, 2022 1 commit
-
-
Submitted by Marc Zyngier
Userspace can specify which events a guest is allowed to use with the KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be identified by a guest from reading the PMCEID{0,1}_EL0 registers. Changing the PMU event filter after a VCPU has run can cause reads of the registers performed before the filter is changed to return different values than reads performed with the new event filter in place. The architecture defines the two registers as read-only, and this behaviour contradicts that. Keep track of when the first VCPU has run and deny changes to the PMU event filter to prevent this from happening. Signed-off-by: Marc Zyngier <maz@kernel.org> [ Alexandru E: Added commit message, updated ioctl documentation ] Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
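A minimal sketch of the guard this describes, with hypothetical flag and error names (the upstream code may differ):

```c
/*
 * Hypothetical sketch: reject KVM_ARM_VCPU_PMU_V3_FILTER updates once
 * any vCPU has run, so values a guest has read from PMCEID{0,1}_EL0
 * can never be invalidated. The 'ran_once' flag name is an assumption.
 */
mutex_lock(&kvm->lock);
if (kvm->arch.ran_once) {	/* set the first time a vCPU runs */
	mutex_unlock(&kvm->lock);
	return -EBUSY;
}
/* ... safe to install the new PMU event filter here ... */
mutex_unlock(&kvm->lock);
```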
-
- 05 Feb, 2022 1 commit
-
-
Submitted by Jason A. Donenfeld
blake2s_compress_generic is weakly aliased by blake2s_compress. The current harness for function selection uses a function pointer, which is ordinarily inlined and resolved at compile time. But when Clang's CFI is enabled, CFI still triggers when making an indirect call via a weak symbol. This seems like a bug in Clang's CFI, as though it's bucketing weak symbols and strong symbols differently. It also only seems to trigger when "full LTO" mode is used, rather than "thin LTO". [ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444) [ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1 [ 0.000000][ T0] Hardware name: MT6873 (DT) [ 0.000000][ T0] Call trace: [ 0.000000][ T0] dump_backtrace+0xfc/0x1dc [ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c [ 0.000000][ T0] panic+0x194/0x464 [ 0.000000][ T0] __cfi_check_fail+0x54/0x58 [ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0 [ 0.000000][ T0] blake2s_update+0x14c/0x178 [ 0.000000][ T0] _extract_entropy+0xf4/0x29c [ 0.000000][ T0] crng_initialize_primary+0x24/0x94 [ 0.000000][ T0] rand_initialize+0x2c/0x6c [ 0.000000][ T0] start_kernel+0x2f8/0x65c [ 0.000000][ T0] __primary_switched+0xc4/0x7be4 [ 0.000000][ T0] Rebooting in 5 seconds.. Nonetheless, the function pointer method isn't so terrific anyway, so this patch replaces it with a simple boolean, which also gets inlined away. This successfully works around the Clang bug. In general, I'm not too keen on all of the indirection involved here; it clearly does more harm than good. Hopefully the whole thing can get cleaned up down the road when lib/crypto is overhauled more comprehensively. But for now, we go with a simple bandaid. Fixes: 6048fdcc ("lib/crypto: blake2s: include as built-in") Link: https://github.com/ClangBuiltLinux/linux/issues/1567 Reported-by: Miles Chen <miles.chen@mediatek.com> Tested-by: Miles Chen <miles.chen@mediatek.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
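A sketch of the selection pattern the patch moves to; the 'force_generic' parameter name and surrounding scaffolding are assumptions. The point is that both targets become direct calls the compiler can resolve and inline, so CFI never sees an indirect call through the weak symbol:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* assumed signatures, for illustration only */
void blake2s_compress_generic(void *state, const uint8_t *block,
			      size_t nblocks, uint32_t inc);
void blake2s_compress_arch(void *state, const uint8_t *block,
			   size_t nblocks, uint32_t inc);

/*
 * Before: an indirect call through a weak-aliased function pointer,
 * which trips Clang CFI under full LTO. After: a boolean selects one
 * of two direct calls; constant propagation folds the branch away.
 */
static void blake2s_compress_select(void *state, const uint8_t *block,
				    size_t nblocks, uint32_t inc,
				    bool force_generic)
{
	if (force_generic)
		blake2s_compress_generic(state, block, nblocks, inc);
	else
		blake2s_compress_arch(state, block, nblocks, inc);
}
```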
-
- 04 Feb, 2022 2 commits
-
-
Submitted by Sean Christopherson
Use ERR_PTR_USR() when returning -EFAULT from kvm_get_attr_addr(), as sparse complains about implicitly casting the kernel pointer from ERR_PTR() into a __user pointer: >> arch/x86/kvm/x86.c:4342:31: sparse: sparse: incorrect type in return expression (different address spaces) @@ expected void [noderef] __user * @@ got void * @@ arch/x86/kvm/x86.c:4342:31: sparse: expected void [noderef] __user * arch/x86/kvm/x86.c:4342:31: sparse: got void * No functional change intended. Fixes: 56f289a8 ("KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220202005157.2545816-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
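The fix is essentially a cast; a sketch of the macro involved (definition assumed, modeled on ERR_PTR()):

```c
/*
 * ERR_PTR() yields a plain kernel pointer, so returning it from a
 * function declared to return 'void __user *' mixes address spaces.
 * ERR_PTR_USR() adds the __user annotation explicitly; the encoded
 * errno value is unchanged.
 */
#define ERR_PTR_USR(e)	((void __user *)ERR_PTR(e))

	/* in kvm_get_attr_addr(), on failure: */
	return ERR_PTR_USR(-EFAULT);	/* was: ERR_PTR(-EFAULT) */
```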
-
Submitted by Jim Mattson
CPUID.(EAX=7,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6] and CPUID.(EAX=7,ECX=0):EBX.ZERO_FCS_FDS[bit 13] are "defeature" bits. Unlike most of the other CPUID feature bits, these bits are clear if the features are present and set if the features are not present. These bits should be reported in KVM_GET_SUPPORTED_CPUID, because if these bits are set on hardware, they cannot be cleared in the guest CPUID. Doing so would claim guest support for a feature that the hardware doesn't support and that can't be efficiently emulated. Of course, any software (e.g. WIN87EM.DLL) expecting these features to be present likely predates these CPUID feature bits and therefore doesn't know to check for them anyway. Aaron Lewis added the corresponding X86_FEATURE macros in commit cbb99c0f ("x86/cpufeatures: Add FDP_EXCPTN_ONLY and ZERO_FCS_FDS"), with the intention of reporting these bits in KVM_GET_SUPPORTED_CPUID, but I was unable to find a proposed patch on the kvm list. Opportunistically reordered the CPUID_7_0_EBX capability bits from least to most significant. Cc: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220204001348.2844660-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Feb, 2022 4 commits
-
-
Submitted by James Morse
Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when single-stepping authenticated ERET instructions. A single step is expected, but a pointer authentication trap is taken instead. The erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow EL1 to cause a return to EL2 with a guest controlled ELR_EL2. Because the conditions require an ERET into active-not-pending state, this is only a problem for EL2 when EL2 is stepping EL1. In this case the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be restored. Cc: stable@vger.kernel.org # 53960faf: arm64: Add Cortex-A510 CPU part definition Cc: stable@vger.kernel.org Signed-off-by: James Morse <james.morse@arm.com> [maz: fixup cpucaps ordering] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127122052.1584324-5-james.morse@arm.com
-
Submitted by James Morse
Prior to commit defe21f4 ("KVM: arm64: Move PC rollback on SError to HYP"), when an SError is synchronised due to another exception, KVM handles the SError first. If the guest survives, the instruction that triggered the original exception is re-executed to handle the first exception. HVC is treated as a special case as the instruction wouldn't normally be re-executed, as it's not a trap. Commit defe21f4 didn't preserve the behaviour of the 'return 1' that skips the rest of handle_exit(). Since commit defe21f4, KVM will try to handle the SError and the original exception at the same time. When the exception was an HVC, fixup_guest_exit() has already rolled back ELR_EL2, meaning if the guest has virtual SError masked, it will execute and handle the HVC twice. Restore the original behaviour. Fixes: defe21f4 ("KVM: arm64: Move PC rollback on SError to HYP") Cc: stable@vger.kernel.org Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127122052.1584324-4-james.morse@arm.com
-
Submitted by James Morse
When any exception other than an IRQ occurs, the CPU updates the ESR_EL2 register with the exception syndrome. An SError may also become pending, and will be synchronised by KVM. KVM notes the exception type, and whether an SError was synchronised in exit_code. When an exception other than an IRQ occurs, fixup_guest_exit() updates vcpu->arch.fault.esr_el2 from the hardware register. When an SError was synchronised, the vcpu esr value is used to determine if the exception was due to an HVC. If so, ELR_EL2 is moved back one instruction. This is so that KVM can process the SError first, and re-execute the HVC if the guest survives the SError. But if an IRQ synchronises an SError, the vcpu's esr value is stale. If the previous non-IRQ exception was an HVC, KVM will corrupt ELR_EL2, causing an unrelated guest instruction to be executed twice. Check ARM_EXCEPTION_CODE() before messing with ELR_EL2; IRQs don't update this register, so they don't need to be checked. Fixes: defe21f4 ("KVM: arm64: Move PC rollback on SError to HYP") Cc: stable@vger.kernel.org Reported-by: Steven Price <steven.price@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127122052.1584324-3-james.morse@arm.com
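A sketch of the shape of the guard, using the macro names mentioned above; the surrounding context is simplified and partly assumed:

```c
/*
 * Only trust the cached ESR when this exit actually updated it: IRQs
 * don't write vcpu->arch.fault.esr_el2, so skip the HVC rollback
 * entirely when an IRQ synchronised the SError.
 */
if (ARM_SERROR_PENDING(*exit_code) &&
    ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ) {
	u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);

	/*
	 * HVC has already advanced ELR_EL2; step it back one
	 * instruction so the guest re-executes the HVC if it
	 * survives the SError.
	 */
	if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
		write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
}
```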
-
Submitted by Jan Beulich
This started out with me noticing that "dom0_max_vcpus=<N>" with <N> larger than the number of physical CPUs reported through ACPI tables would not bring up the "excess" vCPU-s. Addressing this is the primary purpose of the change; CPU maps handling is being tidied only as far as is necessary for the change here (with the effect of also avoiding the setting up of too much per-CPU infrastructure, i.e. for CPUs which can never come online). Noticing that xen_fill_possible_map() is called way too early, whereas xen_filter_cpu_maps() is called too late (after per-CPU areas were already set up), and further observing that each of the functions serves only one of Dom0 or DomU, it looked like it was better to simplify this. Use the .get_smp_config hook instead, uniformly for Dom0 and DomU. xen_fill_possible_map() can be dropped altogether, while xen_filter_cpu_maps() is re-purposed but not otherwise changed. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/2dbd5f0a-9859-ca2d-085e-a02f7166c610@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
-
- 02 Feb, 2022 6 commits
-
-
Submitted by Anup Patel
The SBI implementation version returned by KVM RISC-V should be the host Linux version code. Fixes: c62a7685 ("RISC-V: KVM: Add SBI v0.2 base extension") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
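A sketch of the corresponding case in the SBI base-extension handler; the constant and output-variable names are assumptions:

```c
case SBI_EXT_BASE_GET_IMP_VERSION:
	/*
	 * Report the host kernel's version code as the SBI
	 * implementation version, not the SBI spec version.
	 */
	*out_val = LINUX_VERSION_CODE;
	break;
```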
-
Submitted by Mayuresh Chitale
Applications that run in VU mode and access the time CSR cause a virtual instruction trap, as the guest kernel currently does not initialize the scounteren CSR. To fix this, we should make the CY, TM, and IR counters accessible by default in VU mode (similar to OpenSBI). Fixes: a33c72fa ("RISC-V: KVM: Implement VCPU create, init and destroy functions") Cc: stable@vger.kernel.org Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
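A one-line sketch of the initialization, assuming the standard RISC-V CSR accessor names; where exactly it runs (e.g. on vCPU load) is not shown:

```c
/*
 * Enable the cycle (CY, bit 0), time (TM, bit 1) and instret (IR,
 * bit 2) counters for VU mode, mirroring what OpenSBI does at the
 * lower privilege levels.
 */
csr_write(CSR_SCOUNTEREN, 0x7);
```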
-
Submitted by Mark Rutland
In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state (EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to exiting the EQS by calling guest_exit(). As the IRQ entry code will not wake RCU in this case, we may run the core IRQ code and IRQ handler without RCU watching, leading to various potential problems. Additionally, we do not inform lockdep or tracing that interrupts will be enabled during guest execution, which can lead to misleading traces and warnings that interrupts have been enabled for overly-long periods. This patch fixes these issues by using the new timing and context entry/exit helpers to ensure that interrupts are handled during guest vtime but with RCU watching, with a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); Since instrumentation may make use of RCU, we must also ensure that no instrumented code is run during the EQS. I've split out the critical section into a new kvm_riscv_enter_exit_vcpu() helper which is marked noinstr. Fixes: 99cdc6c1 ("RISC-V: Add initial skeletal KVM support") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anup Patel <anup@brainfault.org> Cc: Atish Patra <atishp@atishpatra.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Tested-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
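A sketch of the noinstr critical section, using the helper name from the commit message; the actual world-switch call is an assumption:

```c
/*
 * Marked noinstr so that nothing instrumented (which might use RCU)
 * can run inside the RCU extended quiescent state.
 */
static void noinstr kvm_riscv_enter_exit_vcpu(struct kvm_vcpu *vcpu)
{
	guest_state_enter_irqoff();
	__kvm_riscv_switch_to(&vcpu->arch);	/* run the vcpu */
	guest_state_exit_irqoff();
}
```

Callers wrap this in guest_timing_enter_irqoff()/guest_timing_exit_irqoff() and take pending IRQs in between, per the sequence above.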
-
Submitted by Tristan Hume
Add a check for !buf->single before calling pt_buffer_region_size in a place where a missing check can cause a kernel crash. Fixes a bug introduced by commit 67063847 ("perf/x86/intel/pt: Opportunistically use single range output mode"), which added support for PT single-range output mode. Since that commit, if a PT stop filter range is hit while tracing, the kernel will crash because of a null pointer dereference in pt_handle_status due to calling pt_buffer_region_size without a ToPA configured. The commit which introduced single-range mode guarded almost all uses of the ToPA buffer variables with checks of the buf->single variable, but missed the case where tracing was stopped by the PT hardware, which happens when execution hits a configured stop filter. Tested: hitting a stop filter during PT recording successfully records a trace with this patch, but crashes without it. Fixes: 67063847 ("perf/x86/intel/pt: Opportunistically use single range output mode") Signed-off-by: Tristan Hume <tristan@thume.ca> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/20220127220806.73664-1-tristan@thume.ca
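A sketch of the shape of the guard; the surrounding condition and variable names are hypothetical, the essential point being that pt_buffer_region_size() is only reached when a ToPA is configured:

```c
/*
 * In single-range output mode there is no ToPA, so the region-size
 * lookup below would dereference a NULL pointer when a stop filter
 * halts tracing.
 */
if (!buf->single &&
    pt_buffer_region_size(buf) - offset <= threshold) { /* hypothetical */
	/* ... advance to the next ToPA region as before ... */
}
```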
-
Submitted by Peter Zijlstra
Kyle reported that rr[0] has started to malfunction on Comet Lake and later CPUs due to EFI starting to make use of CPL3 [1] and the PMU event filtering not distinguishing between regular CPL3 and SMM CPL3. Since this is a privilege violation, disable SMM visibility by default where possible. Administrators wanting to observe SMM cycles can easily change this using the sysfs attribute while regular users don't have access to this file. [0] https://rr-project.org/ [1] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform" at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the end of page 5. Reported-by: Kyle Huey <me@kylehuey.com> Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/YfKChjX61OW4CkYm@hirez.programming.kicks-ass.net
-
Submitted by Mark Rutland
In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state (EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to exiting the EQS by calling guest_exit(). As the IRQ entry code will not wake RCU in this case, we may run the core IRQ code and IRQ handler without RCU watching, leading to various potential problems. Additionally, we do not inform lockdep or tracing that interrupts will be enabled during guest execution, which can lead to misleading traces and warnings that interrupts have been enabled for overly-long periods. This patch fixes these issues by using the new timing and context entry/exit helpers to ensure that interrupts are handled during guest vtime but with RCU watching, with a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); Since instrumentation may make use of RCU, we must also ensure that no instrumented code is run during the EQS. I've split out the critical section into a new kvm_arm_enter_exit_vcpu() helper which is marked noinstr. Fixes: 1b3d546d ("arm/arm64: KVM: Properly account for guest CPU time") Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Message-Id: <20220201132926.3301912-3-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 01 Feb, 2022 5 commits
-
-
Submitted by Mark Rutland
For consistency and clarity, migrate x86 over to the generic helpers for guest timing and lockdep/RCU/tracing management, and remove the x86-specific helpers. Prior to this patch, the guest timing was entered in kvm_guest_enter_irqoff() (called by vmx_vcpu_enter_exit() and svm_vcpu_enter_exit()), and was exited by the call to vtime_account_guest_exit() within vcpu_enter_guest(). To minimize duplication and to more clearly balance entry and exit, both entry and exit of guest timing are placed in vcpu_enter_guest(), using the new guest_timing_{enter,exit}_irqoff() helpers. When context tracking is used a small amount of additional time will be accounted towards guests; tick-based accounting is unaffected as IRQs are disabled at this point and not enabled until after the return from the guest. This also corrects (benign) mis-balanced context tracking accounting introduced in commits: ae95f566 ("KVM: X86: TSCDEADLINE MSR emulation fastpath") 26efe2fd ("KVM: VMX: Handle preemption timer fastpath") Where KVM can enter a guest multiple times, calling vtime_guest_enter() without a corresponding call to vtime_account_guest_exit(), and with vtime_account_system() called when vtime_account_guest() should be used. As account_system_time() checks PF_VCPU and calls account_guest_time(), this doesn't result in any functional problem, but is unnecessarily confusing. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Message-Id: <20220201132926.3301912-4-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Mark Rutland
In kvm_arch_vcpu_ioctl_run() we use guest_enter_irqoff() and guest_exit_irqoff() directly, with interrupts masked between these. As we don't handle any timer ticks during this window, we will not account time spent within the guest as guest time, which is unfortunate. Additionally, we do not inform lockdep or tracing that interrupts will be enabled during guest execution, which can lead to misleading traces and warnings that interrupts have been enabled for overly-long periods. This patch fixes these issues by using the new timing and context entry/exit helpers to ensure that interrupts are handled during guest vtime but with RCU watching, with a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); In addition, as guest exits during the "run the vcpu" step are handled by kvm_mips_handle_exit(), a wrapper function is added which ensures that such exits are handled with a sequence: guest_state_exit_irqoff(); < handle the exit > guest_state_enter_irqoff(); This means that exits which stop the vCPU running will have a redundant guest_state_enter_irqoff() .. guest_state_exit_irqoff() sequence, which can be addressed with future rework. Since instrumentation may make use of RCU, we must also ensure that no instrumented code is run during the EQS. I've split out the critical section into a new kvm_mips_enter_exit_vcpu() helper which is marked noinstr. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Message-Id: <20220201132926.3301912-6-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Handle non-APICv interrupt delivery in vendor code, even though it means VMX and SVM will temporarily have duplicate code. SVM's AVIC has a race condition that requires KVM to fall back to legacy interrupt injection _after_ the interrupt has been logged in the vIRR, i.e. to fix the race, SVM will need to open code the full flow anyways[*]. Refactor the code so that the SVM bug can be fixed without introducing other issues, e.g. SVM returning "success" and thus invoking trace_kvm_apicv_accept_irq() even when delivery through the AVIC failed, and to opportunistically prepare for using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will rely on the vendor function matching the kvm_x86_op pointer name. No functional change intended. [*] https://lore.kernel.org/all/20211213104634.199141-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Randy Dunlap
Fix all kernel-doc warnings in mips/kvm/vz.c as reported by the kernel test robot: arch/mips/kvm/vz.c:471: warning: Function parameter or member 'out_compare' not described in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:471: warning: Function parameter or member 'out_cause' not described in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:471: warning: Excess function parameter 'compare' description in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:471: warning: Excess function parameter 'cause' description in '_kvm_vz_save_htimer' arch/mips/kvm/vz.c:1551: warning: No description found for return value of 'kvm_trap_vz_handle_cop_unusable' arch/mips/kvm/vz.c:1552: warning: expecting prototype for kvm_trap_vz_handle_cop_unusuable(). Prototype was for kvm_trap_vz_handle_cop_unusable() instead arch/mips/kvm/vz.c:1597: warning: No description found for return value of 'kvm_trap_vz_handle_msa_disabled' Fixes: c992a4f6 ("KVM: MIPS: Implement VZ support") Fixes: f4474d50 ("KVM: MIPS/VZ: Support hardware guest timer") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: James Hogan <jhogan@kernel.org> Cc: kvm@vger.kernel.org Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Submitted by Thomas Bogendoerfer
Fixes: fa62f39d ("MIPS: Fix build error due to PTR used in more places") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
- 30 Jan, 2022 1 commit
-
-
Submitted by Randy Dunlap
In linux-next, IA64_MCA_RECOVERY uses the (new) function make_task_dead(), which is not exported for use by modules. Instead of exporting it for one user, convert IA64_MCA_RECOVERY to be a bool Kconfig symbol. In a config file from "kernel test robot <lkp@intel.com>" for a different problem, this linker error was exposed when CONFIG_IA64_MCA_RECOVERY=m. This fixes the build error: ERROR: modpost: "make_task_dead" [arch/ia64/kernel/mca_recovery.ko] undefined! Link: https://lkml.kernel.org/r/20220124213129.29306-1-rdunlap@infradead.org Fixes: 0e25498f ("exit: Add and use make_task_dead.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 29 Jan, 2022 1 commit
-
-
Submitted by James Morse
Versions of Cortex-A510 before r0p3 are affected by a hardware erratum where the hardware update of the dirty bit is not correctly ordered. Add these CPUs to the cpu_has_broken_dbm list. Signed-off-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20220125154040.549272-3-james.morse@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 28 Jan, 2022 13 commits
-
-
Submitted by Vitaly Kuznetsov
Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when the Enlightened VMCS interface is in use: "Any VMREAD or VMWRITE instructions while an enlightened VMCS is active is unsupported and can result in unexpected behavior." Windows 11 + WSL2 seems to ignore this; attempts to VMREAD VMCS field 0x4404 ("VM-exit interruption information") are observed. Failing these attempts with nested_vmx_failInvalid() makes such guests unbootable. Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed eventually, but for the time being we need a workaround. (Temporarily) allow VMREAD to get data from the currently loaded Enlightened VMCS. Note: VMWRITE instructions remain forbidden; it is not clear how to handle them properly and hopefully they won't ever be needed. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Vitaly Kuznetsov
In preparation for allowing reads from the Enlightened VMCS from handle_vmread(), implement evmcs_field_offset() to get the correct read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V, is almost what's needed but a few things need to be adjusted. First, WARN_ON() is unacceptable for handle_vmread() as any field can (in theory) be supplied by the guest and not all fields are defined in eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields). It also sounds like a good idea to WARN_ON() if such fields are ever accessed by KVM-on-Hyper-V. Implement a dedicated evmcs_field_offset() helper. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
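A sketch of what such a helper can look like; the table, struct and macro names are recalled from vmx/evmcs.c and should be treated as assumptions:

```c
static int evmcs_field_offset(unsigned long field, u16 *clean_field)
{
	unsigned int index = ROL16(field, 6);
	const struct evmcs_field *evmcs_field;

	/* Guest-supplied fields: fail gracefully, no WARN_ON(). */
	if (index >= nr_evmcs_1_fields)
		return -ENOENT;

	evmcs_field = &vmcs_field_to_evmcs_1[index];

	/* Offset 0 marks a 'hole': a field not defined in eVMCS v1. */
	if (!evmcs_field->offset)
		return -ENOENT;

	if (clean_field)
		*clean_field = evmcs_field->clean_field;

	return evmcs_field->offset;
}
```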
-
Submitted by Vitaly Kuznetsov
vmcs_to_field_offset{,_table} may sound misleading, as VMCS is an opaque blob which is not supposed to be accessed directly. In fact, vmcs_to_field_offset{,_table} are related to the KVM-defined VMCS12 structure. Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity. No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Vitaly Kuznetsov
Enlightened VMCS v1 doesn't have the VMX_PREEMPTION_TIMER_VALUE field; PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already, so it makes sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too. Note, none of the currently existing Windows/Hyper-V versions are known to enable 'save VMX-preemption timer value' when eVMCS is in use; the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Vitaly Kuznetsov
Similar to the MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS and MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pairs, MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way MSR_IA32_VMX_PINBASED_CTLS is currently filtered, as guests may solely rely on 'true' MSR data. Note, none of the currently existing Windows/Hyper-V versions are known to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS; the change is aimed at making the filtering future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that have not been enabled. Probing those, for example so that they can be passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl. The latter is in fact worse, even though that is what the rest of the API uses, because it would require supported_xcr0 to be moved from the KVM module to the kernel just for this use. In addition, the value would be nonsensical (or an error would have to be returned) until the KVM module is loaded. Therefore, to limit the growth of system ioctls, add a /dev/kvm variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86 with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
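A hedged userspace sketch of the new system-scoped ioctl, using the attribute name from the text above and minimal error handling:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	uint64_t xcomp = 0;
	struct kvm_device_attr attr = {
		.group = 0,
		.attr  = KVM_X86_XCOMP_GUEST_SUPP,
		.addr  = (uint64_t)(unsigned long)&xcomp,
	};
	int kvm = open("/dev/kvm", O_RDONLY | O_CLOEXEC);

	if (kvm < 0)
		return 1;
	/* KVM_HAS_DEVICE_ATTR can be used first to probe support. */
	if (ioctl(kvm, KVM_GET_DEVICE_ATTR, &attr) < 0)
		return 1;
	printf("guest-supported xfeatures: %#llx\n",
	       (unsigned long long)xcomp);
	return 0;
}
```

The returned mask can then be used when requesting dynamic states via ARCH_REQ_XCOMP_GUEST_PERM before creating vCPUs.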
-
Submitted by Sean Christopherson
Add a helper to handle converting the u64 userspace address embedded in struct kvm_device_attr into a userspace pointer, as it's all too easy to forget the intermediate "unsigned long" cast as well as the truncation check. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
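A sketch of the helper's shape; the body is an assumption, with the truncation check being the point (the __user cast was later adjusted for sparse, see the Feb 4 entry above):

```c
static void __user *kvm_get_attr_addr(struct kvm_device_attr *attr)
{
	void __user *uaddr = (void __user *)(unsigned long)attr->addr;

	/*
	 * Reject addresses that the intermediate 'unsigned long' cast
	 * would truncate, i.e. a 64-bit address on a 32-bit kernel.
	 */
	if ((u64)(unsigned long)attr->addr != attr->addr)
		return ERR_PTR_USR(-EFAULT);
	return uaddr;
}
```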
-
Submitted by Roger Pau Monne
There's no point in disabling x2APIC mode when running as a Xen HVM guest; just enable it when available. Remove some unneeded wrapping around the detection functions, and simply provide a xen_x2apic_available helper that's a wrapper around x2apic_supported. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220121090146.13697-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
-
Submitted by Steven Rostedt (Google)
First S390 complained that the sorting of the mcount sections at build time caused the kernel to crash on their architecture. Now PowerPC is complaining about it too. ARM64 also appears to be having issues. It may be necessary to also update the relocation table for the values in the mcount table. Not only do we have to sort the table, but also update the relocations that may be applied to the items in the table. If the system is not relocatable, then it is fine to sort, but if it is, some architectures may have issues (although x86 does not, as it shifts all addresses the same). Add a HAVE_BUILDTIME_MCOUNT_SORT that an architecture can set to say it is safe to do the sorting at build time. Also update the config to compile in build time sorting in the sorttable code in scripts/ to depend on CONFIG_BUILDTIME_MCOUNT_SORT. Link: https://lore.kernel.org/all/944D10DA-8200-4BA9-8D0A-3BED9AA99F82@linux.ibm.com/ Link: https://lkml.kernel.org/r/20220127153821.3bc1ac6e@gandalf.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Yinan Liu <yinan@linux.alibaba.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Reported-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Tested-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: 72b3942a ("scripts: ftrace - move the sort-processing in ftrace_init") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Submitted by Anshuman Khandual
TRBE implementations affected by Arm erratum #1902691 might corrupt trace data or deadlock when trace data is being written into memory. So effectively TRBE is broken and hence cannot be used to capture trace data. This adds a new erratum ARM64_ERRATUM_1902691 in the arm64 errata framework. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Submitted by Anshuman Khandual
TRBE implementations affected by Arm erratum #2038923 might get TRBE into an inconsistent view on whether trace is prohibited within the CPU. As a result, the trace buffer or trace buffer state might be corrupted. This happens after the TRBE buffer has been enabled by setting TRBLIMITR_EL1.E, followed by just a single context synchronization event before execution changes from a context in which trace is prohibited to one where it isn't, or vice versa. In these conditions, the view of whether trace is prohibited is inconsistent between parts of the CPU, and the trace buffer or the trace buffer state might be corrupted. This adds a new erratum ARM64_ERRATUM_2038923 in the arm64 errata framework. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Submitted by Anshuman Khandual
TRBE implementations affected by Arm erratum #2064142 might fail to write into certain system registers after the TRBE has been disabled. Under some conditions after TRBE has been disabled, writes into certain TRBE registers (TRBLIMITR_EL1, TRBPTR_EL1, TRBBASER_EL1, TRBSR_EL1 and TRBTRG_EL1) will be ignored and will not take effect. This adds a new erratum ARM64_ERRATUM_2064142 in the arm64 errata framework. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-doc@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Submitted by Anshuman Khandual
Add the CPU part numbers for the new Arm designs. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1643120437-14352-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
- 27 Jan, 2022 6 commits
-
-
Submitted by Thomas Bogendoerfer
Use PTR_WD instead of PTR to avoid clashes with other parts. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Submitted by Evgenii Stepanov
In ex_handler_load_unaligned_zeropad() we erroneously extract the data and addr register indices from ex->type rather than ex->data. As ex->type will contain EX_TYPE_LOAD_UNALIGNED_ZEROPAD (i.e. 4): * We'll always treat X0 as the address register, since EX_DATA_REG_ADDR is extracted from bits [9:5]. Thus, we may attempt to dereference an arbitrary address as X0 may hold an arbitrary value. * We'll always treat X4 as the data register, since EX_DATA_REG_DATA is extracted from bits [4:0]. Thus we will corrupt X4 and cause arbitrary behaviour within load_unaligned_zeropad() and its caller. Fix this by extracting both values from ex->data as originally intended. On an MTE-enabled QEMU image we are hitting the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: fixup_exception+0xc4/0x108 __do_kernel_fault+0x3c/0x268 do_tag_check_fault+0x3c/0x104 do_mem_abort+0x44/0xf4 el1_abort+0x40/0x64 el1h_64_sync_handler+0x60/0xa0 el1h_64_sync+0x7c/0x80 link_path_walk+0x150/0x344 path_openat+0xa0/0x7dc do_filp_open+0xb8/0x168 do_sys_openat2+0x88/0x17c __arm64_sys_openat+0x74/0xa0 invoke_syscall+0x48/0x148 el0_svc_common+0xb8/0xf8 do_el0_svc+0x28/0x88 el0_svc+0x24/0x84 el0t_64_sync_handler+0x88/0xec el0t_64_sync+0x1b4/0x1b8 Code: f8695a69 71007d1f 540000e0 927df12a (f940014a) Fixes: 753b3236 ("arm64: extable: add load_unaligned_zeropad() handler") Cc: <stable@vger.kernel.org> # 5.16.x Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Evgenii Stepanov <eugenis@google.com> Link: https://lore.kernel.org/r/20220125182217.2605202-1-eugenis@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
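A sketch of the corrected extraction, with the handler body elided; the FIELD_GET() usage follows the bit ranges given above:

```c
static bool
ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
				  struct pt_regs *regs)
{
	/*
	 * The register indices are encoded in ex->data. Using ex->type
	 * (always EX_TYPE_LOAD_UNALIGNED_ZEROPAD == 4) made X0 the
	 * address register and X4 the data register unconditionally.
	 */
	int reg_data = FIELD_GET(EX_DATA_REG_DATA, ex->data); /* was ex->type */
	int reg_addr = FIELD_GET(EX_DATA_REG_ADDR, ex->data); /* was ex->type */

	/* ... fix up pt_regs using the correct registers ... */
	return true;
}
```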
-
Submitted by Like Xu
XCR0 is reset to 1 by RESET but not INIT, and IA32_XSS is zeroed by both RESET and INIT. kvm_set_msr_common()'s handling of MSR_IA32_XSS also needs to call kvm_update_cpuid_runtime(). In the above cases, the size in bytes of the XSAVE area containing all states enabled by XCR0 or (XCR0 | IA32_XSS) needs to be updated. For simplicity and consistency, existing helpers are used to write values and call kvm_update_cpuid_runtime(), and it's not exactly a fast path. Fixes: a554d207 ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Like Xu
Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the size in bytes of the XSAVE area is affected by the states enabled in XSS. Fixes: 20300099 ("kvm: vmx: add MSR logic for XSAVES") Cc: stable@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> [sean: split out as a separate patch, adjust Fixes tag] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
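A sketch of the MSR handler's shape after the fix; the permission checks and mask name are assumptions based on the code of that era:

```c
case MSR_IA32_XSS:
	if (!msr_info->host_initiated &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
		return 1;
	if (data & ~supported_xss)	/* mask name assumed */
		return 1;
	vcpu->arch.ia32_xss = data;
	/*
	 * XSS changes the XSAVE area size reported via
	 * CPUID.(EAX=0xD,ECX=1):EBX, so refresh the runtime CPUID.
	 */
	kvm_update_cpuid_runtime(vcpu);
	break;
```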
-
Submitted by Xiaoyao Li
As of SDM version 075, the documentation has been corrected: MSR_IA32_XSS is reset to zero on power-up and RESET but is kept unchanged on INIT. Fixes: a554d207 ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220126172226.2298529-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Vasily Gorbik
Currently, if a z/VM guest is allowed to retrieve hypervisor performance data globally for all guests (privilege class B), the query is formed in a way to include all guests, but the group name is left empty. As a result, z/VM guests which have an access control group set are not included in the results (not even the local VM). Change the query group identifier from empty to "any" to retrieve information about all guests from any group (or without a group set). Cc: stable@vger.kernel.org Fixes: 31cb4bd3 ("[S390] Hypervisor filesystem (s390_hypfs) for z/VM") Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-