- 05 2月, 2020 2 次提交
-
-
由 Paolo Bonzini 提交于
It is unused now. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
There are several reasons in which a VM needs to deactivate APICv e.g. disable APICv via parameter during module loading, or when enable Hyper-V SynIC support. Additional inhibit reasons will be introduced later on when dynamic APICv is supported, Introduce KVM APICv inhibit reason bits along with a new variable, apicv_inhibit_reasons, to help keep track of APICv state for each VM, Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate the case where APICv is disabled during KVM module load. (e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0). Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 1月, 2020 1 次提交
-
-
由 John Allen 提交于
Current SVM implementation does not have support for handling PKU. Guests running on a host with future AMD cpus that support the feature will read garbage from the PKRU register and will hit segmentation faults on boot as memory is getting marked as protected that should not be. Ensure that cpuid from SVM does not advertise the feature. Signed-off-by: NJohn Allen <john.allen@amd.com> Cc: stable@vger.kernel.org Fixes: 0556cbdc ("x86/pkeys: Don't check if PKRU is zero before writing it") Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 1月, 2020 5 次提交
-
-
由 Sean Christopherson 提交于
Move the kvm_cpu_{un}init() calls to common x86 code as an intermediate step to removing kvm_cpu_{un}init() altogether. Note, VMX'x alloc_apic_access_page() and init_rmode_identity_map() are per-VM allocations and are intentionally kept if vCPU creation fails. They are freed by kvm_arch_destroy_vm(). No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
The allocation of FPU structs is identical across VMX and SVM, move it to common x86 code. Somewhat arbitrarily place the allocation so that it resides directly above the associated initialization via fx_init(), e.g. instead of retaining its position with respect to the overall vcpu creation flow. Although the names names kvm_arch_vcpu_create() and kvm_arch_vcpu_init() might suggest otherwise, x86 does not have a clean split between 'create' and 'init'. Allocating the struct immediately prior to the first use arguably improves readability *now*, and will yield even bigger improvements when kvm_arch_vcpu_init() is removed in a future patch. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move allocation of VMX and SVM vcpus to common x86. Although the struct being allocated is technically a VMX/SVM struct, it can be interpreted directly as a 'struct kvm_vcpu' because of the pre-existing requirement that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu struct. Remove the message from the build-time assertions regarding placement of the struct, as compatibility with the arch usercopy region is no longer the sole dependent on 'struct kvm_vcpu' being at offset zero. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Capture the vcpu pointer in a local varaible and replace '&svm->vcpu' references with a direct reference to the pointer in anticipation of moving bits of the code to common x86 and passing the vcpu pointer into svm_create_vcpu(), i.e. eliminate unnecessary noise from future patches. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
If the guest is configured to have SPEC_CTRL but the host does not (which is a nonsensical configuration but these are not explicitly forbidden) then a host-initiated MSR write can write vmx->spec_ctrl (respectively svm->spec_ctrl) and trigger a #GP when KVM tries to restore the host value of the MSR. Add a more comprehensive check for valid bits of SPEC_CTRL, covering host CPUID flags and, since we are at it and it is more correct that way, guest CPUID flags too. For AMD, remove the unnecessary is_guest_mode check around setting the MSR interception bitmap, so that the code looks the same as for Intel. Cc: Jim Mattson <jmattson@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 1月, 2020 4 次提交
-
-
由 Tom Lendacky 提交于
The KVM MMIO support uses bit 51 as the reserved bit to cause nested page faults when a guest performs MMIO. The AMD memory encryption support uses a CPUID function to define the encryption bit position. Given this, it is possible that these bits can conflict. Use svm_hardware_setup() to override the MMIO mask if memory encryption support is enabled. Various checks are performed to ensure that the mask is properly defined and rsvd_bits() is used to generate the new mask (as was done prior to the change that necessitated this patch). Fixes: 28a1f3ac ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename bit() to __feature_bit() to give it a more descriptive name, and add a macro, feature_bit(), to stuff the X68_FEATURE_ prefix to keep line lengths manageable for code that hardcodes the bit to be retrieved. No functional change intended. Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Now that KVM prevents setting host-reserved CR4 bits, drop the dedicated XSAVE check in guest_cpuid_has() in favor of open coding similar checks in the SVM/VMX XSAVES enabling flows. Note, checking boot_cpu_has(X86_FEATURE_XSAVE) in the XSAVES flows is technically redundant with respect to the CR4 reserved bit checks, e.g. XSAVES #UDs if CR4.OSXSAVE=0 and arch.xsaves_enabled is consumed if and only if CR4.OXSAVE=1 in guest. Keep (add?) the explicit boot_cpu_has() checks to help document KVM's usage of arch.xsaves_enabled. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
ICR and TSCDEADLINE MSRs write cause the main MSRs write vmexits in our product observation, multicast IPIs are not as common as unicast IPI like RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc. This patch introduce a mechanism to handle certain performance-critical WRMSRs in a very early stage of KVM VMExit handler. This mechanism is specifically used for accelerating writes to x2APIC ICR that attempt to send a virtual IPI with physical destination-mode, fixed delivery-mode and single target. Which was found as one of the main causes of VMExits for Linux workloads. The reason this mechanism significantly reduce the latency of such virtual IPIs is by sending the physical IPI to the target vCPU in a very early stage of KVM VMExit handler, before host interrupts are enabled and before expensive operations such as reacquiring KVM’s SRCU lock. Latency is reduced even more when KVM is able to use APICv posted-interrupt mechanism (which allows to deliver the virtual IPI directly to target vCPU without the need to kick it to host). Testing on Xeon Skylake server: The virtual IPI latency from sender send to receiver receive reduces more than 200+ cpu cycles. Reviewed-by: NLiran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 1月, 2020 1 次提交
-
-
由 Peter Xu 提交于
We have both APIC_SHORT_MASK and KVM_APIC_SHORT_MASK defined for the shorthand mask. Similarly, we have both APIC_DEST_MASK and KVM_APIC_DEST_MASK defined for the destination mode mask. Drop the KVM_APIC_* macros and replace the only user of them to use the APIC_DEST_* macros instead. At the meantime, move APIC_SHORT_MASK and APIC_DEST_MASK from lapic.c to lapic.h. Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 11月, 2019 1 次提交
-
-
由 Peter Gonda 提交于
Memory encryption support does not have module parameter dependencies and can be moved into the general x86 cpuid __do_cpuid_ent function. This changes maintains current behavior of passing through all of CPUID.8000001F. Suggested-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPeter Gonda <pgonda@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 11月, 2019 2 次提交
-
-
由 Liran Alon 提交于
This check is unnecessary as x86 update_cr8_intercept() which calls this VMX/SVM specific callback already performs this check. Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrea Arcangeli 提交于
It's enough to check the exit value and issue a direct call to avoid the retpoline for all the common vmexit reasons. After this commit is applied, here the most common retpolines executed under a high resolution timer workload in the guest on a SVM host: [..] @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get_update_offsets_now+70 hrtimer_interrupt+131 smp_apic_timer_interrupt+106 apic_timer_interrupt+15 start_sw_timer+359 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 1940 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_r12+33 force_qs_rnp+217 rcu_gp_kthread+1270 kthread+268 ret_from_fork+34 ]: 4644 @[]: 25095 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 lapic_next_event+28 clockevents_program_event+148 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41474 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 clockevents_program_event+148 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41474 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 clockevents_program_event+84 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41887 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 lapic_next_event+28 clockevents_program_event+148 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42723 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 clockevents_program_event+148 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42766 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 clockevents_program_event+84 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42848 @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 start_sw_timer+279 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 499845 @total: 1780243 SVM has no TSC based programmable preemption timer so it is invoking ktime_get() frequently. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 31 10月, 2019 1 次提交
-
-
由 Paolo Bonzini 提交于
VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors. Cc: stable@vger.kernel.org Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 10月, 2019 1 次提交
-
-
由 Miaohe Lin 提交于
Guest physical APIC ID may not equal to vcpu->vcpu_id in some case. We may set the wrong physical id in avic_handle_ldr_update as we always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page instead. Export and use kvm_xapic_id here and in avic_handle_apic_id_update as suggested by Vitaly. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2019 10 次提交
-
-
由 Aaron Lewis 提交于
AMD CPUs now support XSAVES in a limited fashion (they require IA32_XSS to be zero). AMD has no equivalent of Intel's "Enable XSAVES/XRSTORS" VM-execution control. Instead, XSAVES is always available to the guest when supported on the host. Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NAaron Lewis <aaronlewis@google.com> Change-Id: I40dc2c682eb0d38c2208d95d5eb7bbb6c47f6317 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Aaron Lewis 提交于
Hoist the vendor-specific code related to loading the hardware IA32_XSS MSR with guest/host values on VM-entry/VM-exit to common x86 code. Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NAaron Lewis <aaronlewis@google.com> Change-Id: Ic6e3430833955b98eb9b79ae6715cf2a3fdd6d82 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Aaron Lewis 提交于
When the guest can execute the XSAVES/XRSTORS instructions, set the hardware IA32_XSS MSR to guest/host values on VM-entry/VM-exit. Note that vcpu->arch.ia32_xss is currently guaranteed to be 0 on AMD, since there is no way to change it. Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NAaron Lewis <aaronlewis@google.com> Change-Id: Id51a782462086e6d7a3ab621838e200f1c005afd Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Aaron Lewis 提交于
Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to vcpu->arch. Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NAaron Lewis <aaronlewis@google.com> Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68 Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suthikulpanit, Suravee 提交于
Generally, APICv for all vcpus in the VM are enable/disable in the same manner. So, get_enable_apicv() should represent APICv status of the VM instead of each VCPU. Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as parameter instead of struct kvm_vcpu. Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common cache_reg() callback and drop the dedicated decache_cr3(). The name decache_cr3() is somewhat confusing as the caching behavior of CR3 follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and has nothing in common with the caching behavior of CR0/CR4 (whose decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage). This would effectivel adds a BUG() if KVM attempts to cache CR3 on SVM. Change it to a WARN_ON_ONCE() -- if the cache never requires filling, the value is already in the right place -- and opportunistically add one in VMX to provide an equivalent check. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tom Lendacky 提交于
Performing a WBINVD and DF_FLUSH are expensive operations. Currently, a WBINVD/DF_FLUSH is performed every time an SEV guest terminates. However, the WBINVD/DF_FLUSH is only required when an ASID is being re-allocated to a new SEV guest. Also, a single WBINVD/DF_FLUSH can enable all ASIDs that have been disassociated from guests through DEACTIVATE. To reduce the number of WBINVD/DF_FLUSH invocations, introduce a new ASID bitmap to track ASIDs that need to be reclaimed. When an SEV guest is terminated, add its ASID to the reclaim bitmap instead of clearing the bitmap in the existing SEV ASID bitmap. This delays the need to perform a WBINVD/DF_FLUSH invocation when an SEV guest terminates until all of the available SEV ASIDs have been used. At that point, the WBINVD/DF_FLUSH invocation can be performed and all ASIDs in the reclaim bitmap moved to the available ASIDs bitmap. The semaphore around DEACTIVATE can be changed to a read semaphore with the semaphore taken in write mode before performing the WBINVD/DF_FLUSH. Tested-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tom Lendacky 提交于
Performing a WBINVD and DF_FLUSH are expensive operations. The SEV support currently performs this WBINVD/DF_FLUSH combination when an SEV guest is terminated, so there is no need for it to be done before LAUNCH. However, when the SEV firmware transitions the platform from UNINIT state to INIT state, all ASIDs will be marked invalid across all threads. Therefore, as part of transitioning the platform to INIT state, perform a WBINVD/DF_FLUSH after a successful INIT in the PSP/SEV device driver. Since the PSP/SEV device driver is x86 only, it can reference and use the WBINVD related functions directly. Cc: Gary Hook <gary.hook@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Tested-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tom Lendacky 提交于
The SEV firmware DEACTIVATE command disassociates an SEV guest from an ASID, clears the WBINVD indicator on all threads and indicates that the SEV firmware DF_FLUSH command must be issued before the ASID can be re-used. The SEV firmware DF_FLUSH command will return an error if a WBINVD has not been performed on every thread before it has been invoked. A window exists between the WBINVD and the invocation of the DF_FLUSH command where an SEV firmware DEACTIVATE command could be invoked on another thread, clearing the WBINVD indicator. This will cause the subsequent SEV firmware DF_FLUSH command to fail which, in turn, results in the SEV firmware ACTIVATE command failing for the reclaimed ASID. This results in the SEV guest failing to start. Use a mutex to close the WBINVD/DF_FLUSH window by obtaining the mutex before the DEACTIVATE and releasing it after the DF_FLUSH. This ensures that any DEACTIVATE cannot run before a DF_FLUSH has completed. Fixes: 59414c98 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Tested-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tom Lendacky 提交于
The SEV ASID bitmap currently is not protected against parallel SEV guest startups. This can result in an SEV guest failing to start because another SEV guest could have been assigned the same ASID value. Use a mutex to serialize access to the SEV ASID bitmap. Fixes: 1654efcb ("KVM: SVM: Add KVM_SEV_INIT command") Tested-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 9月, 2019 8 次提交
-
-
由 Jim Mattson 提交于
The RDPRU instruction gives the guest read access to the IA32_APERF MSR and the IA32_MPERF MSR. According to volume 3 of the APM, "When virtualization is enabled, this instruction can be intercepted by the Hypervisor. The intercept bit is at VMCB byte offset 10h, bit 14." Since we don't enumerate the instruction in KVM_SUPPORTED_CPUID, intercept it and synthesize #UD. Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NDrew Schmitt <dasch@google.com> Reviewed-by: NJacob Xu <jacobhxu@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
VMX's EPT misconfig flow to handle fast-MMIO path falls back to decoding the instruction to determine the instruction length when running as a guest (Hyper-V doesn't fill VMCS.VM_EXIT_INSTRUCTION_LEN because it's technically not defined for EPT misconfigs). Rather than implement the slow skip in VMX's generic skip_emulated_instruction(), handle_ept_misconfig() directly calls kvm_emulate_instruction() with EMULTYPE_SKIP, which intentionally doesn't do single-step detection, and so handle_ept_misconfig() misses a single-step #DB. Rework the EPT misconfig fallback case to route it through kvm_skip_emulated_instruction() so that single-step #DBs and interrupt shadow updates are handled automatically. I.e. make VMX's slow skip logic match SVM's and have the SVM flow not intentionally avoid the shadow update. Alternatively, the handle_ept_misconfig() could manually handle single- step detection, but that results in EMULTYPE_SKIP having split logic for the interrupt shadow vs. single-step #DBs, and split emulator logic is largely what led to this mess in the first place. Modifying SVM to mirror VMX flow isn't really an option as SVM's case isn't limited to a specific exit reason, i.e. handling the slow skip in skip_emulated_instruction() is mandatory for all intents and purposes. Drop VMX's skip_emulated_instruction() wrapper since it can now fail, and instead WARN if it fails unexpectedly, e.g. if exit_reason somehow becomes corrupted. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: d391f120 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Deferring emulation failure handling (in some cases) to the caller of x86_emulate_instruction() has proven fragile, e.g. multiple instances of KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it being difficult to discern what emulation types can return what result, and which combination of types and results are handled where. Now that x86_emulate_instruction() always handles emulation failure, i.e. EMULATION_FAIL is only referenced in callers, remove the emulation_result enums entirely. Per KVM's existing exit handling conventions, return '0' and '1' for "exit to userspace" and "resume guest" respectively. Doing so cleans up many callers, e.g. they can return kvm_emulate_instruction() directly instead of having to interpret its result. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Consolidate the reporting of emulation failure into kvm_task_switch() so that it can return EMULATE_USER_EXIT. This helps pave the way for removing EMULATE_FAIL altogether. This also fixes a theoretical bug where task switch interception could suppress an EMULATE_USER_EXIT return. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Kill a few birds with one stone by forcing an exit to userspace on skip emulation failure. This removes a reference to EMULATE_FAIL, fixes a bug in handle_ept_misconfig() where it would exit to userspace without setting run->exit_reason, and fixes a theoretical bug in SVM's task_switch_interception() where it would overwrite run->exit_reason on a return of EMULATE_USER_EXIT. Note, this technically doesn't fully fix task_switch_interception() as it now incorrectly handles EMULATE_FAIL, but in practice there is no bug as EMULATE_FAIL will never be returned for EMULTYPE_SKIP. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Immediately inject a #GP when VMware emulation fails and return EMULATE_DONE instead of propagating EMULATE_FAIL up the stack. This helps pave the way for removing EMULATE_FAIL altogether. Rename EMULTYPE_VMWARE to EMULTYPE_VMWARE_GP to document that the x86 emulator is called to handle VMware #GP interception, e.g. why a #GP is injected on emulation failure for EMULTYPE_VMWARE_GP. Drop EMULTYPE_NO_UD_ON_FAIL as a standalone type. The "no #UD on fail" is used only in the VMWare case and is obsoleted by having the emulator itself reinject #GP. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
The VMware backdoor hooks #GP faults on IN{S}, OUT{S}, and RDPMC, none of which generate a non-zero error code for their #GP. Re-injecting #GP instead of attempting emulation on a non-zero error code will allow a future patch to move #GP injection (for emulation failure) into kvm_emulate_instruction() without having to plumb in the error code. Reviewed-and-tested-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Since commit 5158917c ("KVM: x86: nVMX: Allow nested_enable_evmcs to be NULL") the code in x86.c is prepared to see nested_enable_evmcs being NULL and in VMX case it actually is when nesting is disabled. Remove the unneeded stub from SVM code. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 9月, 2019 1 次提交
-
-
由 Liran Alon 提交于
Commit cd7764fe ("KVM: x86: latch INITs while in system management mode") changed code to latch INIT while vCPU is in SMM and process latched INIT when leaving SMM. It left a subtle remark in commit message that similar treatment should also be done while vCPU is in VMX non-root-mode. However, INIT signals should actually be latched in various vCPU states: (*) For both Intel and AMD, INIT signals should be latched while vCPU is in SMM. (*) For Intel, INIT should also be latched while vCPU is in VMX operation and later processed when vCPU leaves VMX operation by executing VMXOFF. (*) For AMD, INIT should also be latched while vCPU runs with GIF=0 or in guest-mode with intercept defined on INIT signal. To fix this: 1) Add kvm_x86_ops->apic_init_signal_blocked() such that each CPU vendor can define the various CPU states in which INIT signals should be blocked and modify kvm_apic_accept_events() to use it. 2) Modify vmx_check_nested_events() to check for pending INIT signal while vCPU in guest-mode. If so, emualte vmexit on EXIT_REASON_INIT_SIGNAL. Note that nSVM should have similar behaviour but is currently left as a TODO comment to implement in the future because nSVM don't yet implement svm_check_nested_events(). Note: Currently KVM nVMX implementation don't support VMX wait-for-SIPI activity state as specified in MSR_IA32_VMX_MISC bits 6:8 exposed to guest (See nested_vmx_setup_ctls_msrs()). If and when support for this activity state will be implemented, kvm_check_nested_events() would need to avoid emulating vmexit on INIT signal in case activity-state is wait-for-SIPI. In addition, kvm_apic_accept_events() would need to be modified to avoid discarding SIPI in case VMX activity-state is wait-for-SIPI but instead delay SIPI processing to vmx_check_nested_events() that would clear pending APIC events and emulate vmexit on SIPI. Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Co-developed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 9月, 2019 3 次提交
-
-
由 Dan Carpenter 提交于
We refactored this code a bit and accidentally deleted the "-" character from "-EINVAL". The kvm_vcpu_map() function never returns positive EINVAL. Fixes: c8e16b78 ("x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()") Cc: stable@vger.kernel.org Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
Receiving an unexpected exit reason from hardware should be considered as a severe bug in KVM. Therefore, instead of just injecting #UD to guest and ignore it, exit to userspace on internal error so that it could handle it properly (probably by terminating guest). In addition, prefer to use vcpu_unimpl() instead of WARN_ONCE() as handling unexpected exit reason should be a rare unexpected event (that was expected to never happen) and we prefer to print a message on it every time it occurs to guest. Furthermore, dump VMCS/VMCB to dmesg to assist diagnosing such cases. Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move RDMSR and WRMSR emulation into common x86 code to consolidate nearly identical SVM and VMX code. Note, consolidating RDMSR introduces an extra indirect call, i.e. retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a guest kernel likely has bigger problems if increasing the latency of RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM performance. E.g. the only recurring RDMSR VM-Exits (after booting) on my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via arch_cpu_idle_enter(). No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-