- 15 3月, 2021 40 次提交
-
-
由 Sean Christopherson 提交于
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to allow directly invoking common x86 exit handlers (in a future patch). Opportunistically convert an absurd number of instances of 'svm->vcpu' to direct uses of 'vcpu' to avoid pointless casting. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The logic of update_cr0_intercept is pointlessly complicated. All svm_set_cr0 is compute the effective cr0 and compare it with the guest value. Inlining the function and simplifying the condition clarifies what it is doing. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Use trace_kvm_nested_vmenter_failed() and its macro magic to trace consistency check failures on nested VMRUN. Tracing such failures by running the buggy VMM as a KVM guest is often the only way to get a precise explanation of why VMRUN failed. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-13-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move KVM's CC() macro to x86.h so that it can be reused by nSVM. Debugging VM-Enter is as painful on SVM as it is on VMX. Rename the more visible macro to KVM_NESTED_VMENTER_CONSISTENCY_CHECK to avoid any collisions with the uber-concise "CC". Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-12-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Krish Sadhukhan 提交于
The path for SVM_SET_NESTED_STATE needs to have the same checks for the CPU registers, as we have in the VMRUN path for a nested guest. This patch adds those missing checks to svm_set_nested_state(). Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20201006190654.32305-3-krish.sadhukhan@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The VMLOAD/VMSAVE data is not taken from userspace, since it will not be restored on VMEXIT (it will be copied from VMCB02 to VMCB01). For clarity, replace the wholesale copy of the VMCB save area with a copy of that state only. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Since L1 and L2 now use different VMCBs, most of the fields remain the same in VMCB02 from one L2 run to the next. Since KVM itself is not looking at VMCB12's clean field, for now not much can be optimized. However, in the future we could avoid more copies if the VMCB12's SEG and DT sections are clean. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Since L1 and L2 now use different VMCBs, most of the fields remain the same from one L1 run to the next. svm_set_cr0 and other functions called by nested_svm_vmexit already take care of clearing the corresponding clean bits; only the TSC offset is special. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Most fields were going to be overwritten by vmcb12 control fields, or do not matter at all because they are filled by the processor on vmexit. Therefore, we need not copy them from vmcb01 to vmcb02 on vmentry. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Now that SVM is using a separate vmcb01 and vmcb02 (and also uses the vmcb12 naming) we can give clearer names to functions that write to and read from those VMCBs. Likewise, variables and parameters can be renamed from nested_vmcb to vmcb12. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Cathy Avery 提交于
This patch moves the asid_generation from the vcpu to the vmcb in order to track the ASID generation that was active the last time the vmcb was run. If sd->asid_generation changes between two runs, the old ASID is invalid and must be changed. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NCathy Avery <cavery@redhat.com> Message-Id: <20210112164313.4204-3-cavery@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Cathy Avery 提交于
This patch moves the physical cpu tracking from the vcpu to the vmcb in svm_switch_vmcb. If either vmcb01 or vmcb02 change physical cpus from one vmrun to the next the vmcb's previous cpu is preserved for comparison with the current cpu and the vmcb is marked dirty if different. This prevents the processor from using old cached data for a vmcb that may have been updated on a prior run on a different processor. It also moves the physical cpu check from svm_vcpu_load to pre_svm_run as the check only needs to be done at run. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NCathy Avery <cavery@redhat.com> Message-Id: <20210112164313.4204-2-cavery@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Cathy Avery 提交于
svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2 (nested). The main advantages are removing get_host_vmcb and hsave, in favor of concepts that are shared with VMX. We don't need anymore to stash the L1 registers in hsave while L2 runs, but we need to copy the VMLOAD/VMSAVE registers from VMCB01 to VMCB02 and back. This more or less has the same cost, but code-wise nested_svm_vmloadsave can be reused. This patch omits several optimizations that are possible: - for simplicity there is some wholesale copying of vmcb.control areas which can go away. - we should be able to better use the VMCB01 and VMCB02 clean bits. - another possibility is to always use VMCB01 for VMLOAD and VMSAVE, thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to VMCB02 and back. Tested: kvm-unit-tests kvm self tests Loaded fedora nested guest on fedora Signed-off-by: NCathy Avery <cavery@redhat.com> Message-Id: <20201011184818.3609-3-cavery@redhat.com> [Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add a few more comments. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Override the shadow root level in the MMU context when configuring NPT for shadowing nested NPT. The level is always tied to the TDP level of the host, not whatever level the guest happens to be using. Fixes: 096586fd ("KVM: nSVM: Correctly set the shadow NPT root level in its MMU role") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Don't strip the C-bit from the faulting address on an intercepted #PF, the address is a virtual address, not a physical address. Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-13-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
WARN if KVM is about to dereference a NULL pae_root or lm_root when loading an MMU, and convert the BUG() on a bad shadow_root_level into a WARN (now that errors are handled cleanly). With nested NPT, botching the level and sending KVM down the wrong path is all too easy, and the on-demand allocation of pae_root and lm_root means bugs crash the host. Obviously, KVM could unconditionally allocate the roots, but that's arguably a worse failure mode as it would potentially corrupt the guest instead of crashing it. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-18-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
For clarity, explicitly skip syncing roots if the MMU load failed instead of relying on the !VALID_PAGE check in kvm_mmu_sync_roots(). Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-17-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Unexport the MMU load and unload helpers now that they are no longer used (incorrectly) in vendor code. Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h, it should not be exposed to vendor code. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-16-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Defer unloading the MMU after a INVPCID until the instruction emulation has completed, i.e. until after RIP has been updated. On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping an emulated instruction. However, on SVM, if nrip is disabled, the emulator is used to skip an instruction, which would lead to fireworks if the emulator were invoked without a valid MMU. Fixes: eb4b248e ("kvm: vmx: Support INVPCID in shadow paging mode") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-15-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Defer reloading the MMU after a EPTP successful EPTP switch. The VMFUNC instruction itself is executed in the previous EPTP context, any side effects, e.g. updating RIP, should occur in the old context. Practically speaking, this bug is benign as VMX doesn't touch the MMU when skipping an emulated instruction, nor does queuing a single-step #DB. No other post-switch side effects exist. Fixes: 41ab9372 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-14-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Set the C-bit in SPTEs that are set outside of the normal MMU flows, specifically the PDPDTRs and the handful of special cased "LM root" entries, all of which are shadow paging only. Note, the direct-mapped-root PDPTR handling is needed for the scenario where paging is disabled in the guest, in which case KVM uses a direct mapped MMU even though TDP is disabled. Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-11-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Exempt NULL PAE roots from the check to detect leaks, since kvm_mmu_free_roots() doesn't set them back to INVALID_PAGE. Stop hiding the WARNs to detect PAE root leaks behind MMU_WARN_ON, the hidden WARNs obviously didn't do their job given the hilarious number of bugs that could lead to PAE roots being leaked, not to mention the above false positive. Opportunistically delete a warning on root_hpa being valid, there's nothing special about 4/5-level shadow pages that warrants a WARN. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-9-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Check the validity of the PDPTRs before allocating any of the PAE roots, otherwise a bad PDPTR will cause KVM to leak any previously allocated roots. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-8-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Hold the mmu_lock for write for the entire duration of allocating and initializing an MMU's roots. This ensures there are MMU pages available and thus prevents root allocations from failing. That in turn fixes a bug where KVM would fail to free valid PAE roots if a one of the later roots failed to allocate. Add a comment to make_mmu_pages_available() to call out that the limit is a soft limit, e.g. KVM will temporarily exceed the threshold if a page fault allocates multiple shadow pages and there was only one page "available". Note, KVM _still_ leaks the PAE roots if the guest PDPTR checks fail. This will be addressed in a future commit. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-7-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the on-demand allocation of the pae_root and lm_root pages, used by nested NPT for 32-bit L1s, into a separate helper. This will allow a future patch to hold mmu_lock while allocating the non-special roots so that make_mmu_pages_available() can be checked once at the start of root allocation, and thus avoid having to deal with failure in the middle of root allocation. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-6-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Allocate lm_root before the PAE roots so that the PAE roots aren't leaked if the memory allocation for the lm_root happens to fail. Note, KVM can still leak PAE roots if mmu_check_root() fails on a guest's PDPTR, or if mmu_alloc_root() fails due to MMU pages not being available. Those issues will be fixed in future commits. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-5-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Grab 'mmu' and do s/vcpu->arch.mmu/mmu to shorten line lengths and yield smaller diffs when moving code around in future cleanup without forcing the new code to use the same ugly pattern. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Allocate the so called pae_root page on-demand, along with the lm_root page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a 32-bit L1. KVM currently only allocates the page when NPT is disabled, or when L0 is 32-bit (using PAE paging). Note, there is an existing memory leak involving the MMU roots, as KVM fails to free the PAE roots on failure. This will be addressed in a future commit. Fixes: ee6268ba ("KVM: x86: Skip pae_root shadow allocation if tdp enabled") Fixes: b6b80c78 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT") Cc: stable@vger.kernel.org Reviewed-by: NBen Gardon <bgardon@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Dongli Zhang 提交于
The new per-cpu stat 'nested_run' is introduced in order to track if L1 VM is running or used to run L2 VM. An example of the usage of 'nested_run' is to help the host administrator to easily track if any L1 VM is used to run L2 VM. Suppose there is issue that may happen with nested virtualization, the administrator will be able to easily narrow down and confirm if the issue is due to nested virtualization via 'nested_run'. For example, whether the fix like commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Linus Torvalds 提交于
-
由 Alexey Dobriyan 提交于
Doing a prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1); will copy 1 byte from userspace to (quite big) on-stack array and then stash everything to mm->saved_auxv. AT_NULL terminator will be inserted at the very end. /proc/*/auxv handler will find that AT_NULL terminator and copy original stack contents to userspace. This devious scheme requires CAP_SYS_RESOURCE. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull irq fixes from Thomas Gleixner: "A set of irqchip updates: - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct - Add a missing DT compatible string for the Ingenic driver - Remove the pointless debugfs_file pointer from struct irqdomain" * tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/ingenic: Add support for the JZ4760 dt-bindings/irq: Add compatible string for the JZ4760B irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly irqdomain: Remove debugfs_file from struct irq_domain
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull timer fix from Thomas Gleixner: "A single fix in for hrtimers to prevent an interrupt storm caused by the lack of reevaluation of the timers which expire in softirq context under certain circumstances, e.g. when the clock was set" * tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull scheduler fixes from Thomas Gleixner: "A set of scheduler updates: - Prevent a NULL pointer dereference in the migration_stop_cpu() mechanims - Prevent self concurrency of affine_move_task() - Small fixes and cleanups related to task migration/affinity setting - Ensure that sync_runqueues_membarrier_state() is invoked on the current CPU when it is in the cpu mask" * tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/membarrier: fix missing local execution of ipi_sync_rq_state() sched: Simplify set_affinity_pending refcounts sched: Fix affine_move_task() self-concurrency sched: Optimize migration_cpu_stop() sched: Collate affine_move_task() stoppers sched: Simplify migration_cpu_stop() sched: Fix migration_cpu_stop() requeueing
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull objtool fix from Thomas Gleixner: "A single objtool fix to handle the PUSHF/POPF validation correctly for the paravirt changes which modified arch_local_irq_restore not to use popf" * tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool,x86: Fix uaccess PUSHF/POPF validation
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull locking fixes from Thomas Gleixner: "A couple of locking fixes: - A fix for the static_call mechanism so it handles unaligned addresses correctly. - Make u64_stats_init() a macro so every instance gets a seperate lockdep key. - Make seqcount_latch_init() a macro as well to preserve the static variable which is used for the lockdep key" * tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: seqlock,lockdep: Fix seqcount_latch_init() u64_stats,lockdep: Fix u64_stats_init() vs lockdep static_call: Fix the module key fixup
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull perf fixes from Borislav Petkov: - Make sure PMU internal buffers are flushed for per-CPU events too and properly handle PID/TID for large PEBS. - Handle the case properly when there's no PMU and therefore return an empty list of perf MSRs for VMX to switch instead of reading random garbage from the stack. * tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR perf/core: Flush PMU internal buffers for per-CPU events
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull EFI fix from Ard Biesheuvel via Borislav Petkov: "Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was added v5.10, but failed to take the SetVirtualAddressMap() RT service into account" * tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip由 Linus Torvalds 提交于
Pull x86 fixes from Borislav Petkov: - A couple of SEV-ES fixes and robustifications: verify usermode stack pointer in NMI is not coming from the syscall gap, correctly track IRQ states in the #VC handler and access user insn bytes atomically in same handler as latter cannot sleep. - Balance 32-bit fast syscall exit path to do the proper work on exit and thus not confuse audit and ptrace frameworks. - Two fixes for the ORC unwinder going "off the rails" into KASAN redzones and when ORC data is missing. * tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Use __copy_from_user_inatomic() x86/sev-es: Correctly track IRQ states in runtime #VC handler x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack x86/sev-es: Introduce ip_within_syscall_gap() helper x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls x86/unwind/orc: Silence warnings caused by missing ORC data x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
-