- 19 April 2021, 1 commit
-
Submitted by chenjiajun

virt inclusion
category: feature
bugzilla: 46853
CVE: NA

Export EXIT_REASON_PREEMPTION_TIMER kvm exits to the vcpu_stat debugfs. Add a new column to vcpu_stat, providing the preemption_timer exit count to virtualization detection tools.

Signed-off-by: chenjiajun <chenjiajun8@huawei.com>
Reviewed-by: Xiangyou Xie <xiexiangyou@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 09 March 2021, 3 commits
-
Submitted by Paolo Bonzini

stable inclusion from stable-5.10.17
commit 2aba53830f5d02dcd0bb74a00c8b8023df9d1398
bugzilla: 48169

--------------------------------

[ Upstream commit c1c35cf7 ]

If not in long mode, the low bits of CR3 are reserved but not enforced to be zero, so remove those checks. If in long mode, however, the MBZ bits extend down to the highest physical address bit of the guest, excluding the encryption bit.

Make the checks consistent with the above, and match them between nested_vmcb_checks and KVM_SET_SREGS.

Cc: stable@vger.kernel.org
Fixes: 761e4169 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Sean Christopherson

stable inclusion from stable-5.10.15
commit 7159239d2de1eea77b322e661141a36c6a909b05
bugzilla: 48167

--------------------------------

commit 031b91a5 upstream.

Set cr3_lm_rsvd_bits, which is effectively an invalid-GPA mask, at vCPU reset. The reserved bits check needs to be done even if userspace never configures the guest's CPUID model.

Cc: stable@vger.kernel.org
Fixes: 0107973a ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Paolo Bonzini

stable inclusion from stable-5.10.15
commit 6c0e069ac6e8db0c49dcc90d37ede5b1da08fe0b
bugzilla: 48167

--------------------------------

commit 7131636e upstream.

Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST will generally use the default value for MSR_IA32_ARCH_CAPABILITIES. When this happens and the host has tsx=on, it is possible to end up with virtual machines that have HLE and RTM disabled, but TSX_CTRL available.

If the fleet is then switched to tsx=off, kvm_get_arch_capabilities() will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to use the tsx=off hosts as migration destinations, even though the guests do not have TSX enabled.

To allow this migration, allow guests to write to their TSX_CTRL MSR, while keeping the host MSR unchanged for the entire life of the guests. This ensures that TSX remains disabled and also saves MSR reads and writes, and it's okay to do because with tsx=off we know that guests will not have the HLE and RTM features in their CPUID. (If userspace sets bogus CPUID data, we do not expect HLE and RTM to work in guests anyway.)

Cc: stable@vger.kernel.org
Fixes: cbbaa272 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 09 February 2021, 3 commits
-
Submitted by Jay Zhou

stable inclusion from stable-5.10.13
commit e895a39a2bcdbb1d602316f3fb416aa93818805a
bugzilla: 47995

--------------------------------

commit 1f7becf1 upstream.

The injection of an SMI has two steps:

        Qemu                              KVM
Step1:
    cpu->interrupt_request &= \
        ~CPU_INTERRUPT_SMI;
    kvm_vcpu_ioctl(cpu, KVM_SMI)
                                  call kvm_vcpu_ioctl_smi() and
                                  kvm_make_request(KVM_REQ_SMI, vcpu);
Step2:
    kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
                                  call process_smi() if
                                  kvm_check_request(KVM_REQ_SMI, vcpu) is
                                  true, mark vcpu->arch.smi_pending = true;

vcpu->arch.smi_pending is only set true in step 2. Unfortunately, if the vcpu is paused between step 1 and step 2, kvm_run->immediate_exit will be set and the vcpu has to exit to Qemu immediately during step 2, before vcpu->arch.smi_pending is marked true. During VM migration, Qemu gets the SMI pending status from KVM using the KVM_GET_VCPU_EVENTS ioctl at downtime, so the pending SMI status is lost.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com>
Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
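For reference, a minimal userspace sketch of the read described above: fetching the pending-SMI flag with KVM_GET_VCPU_EVENTS on an already-created vCPU fd. This is an illustration only; vcpu_fd is assumed to come from the usual KVM_CREATE_VM / KVM_CREATE_VCPU sequence, which is omitted.

/* Sketch: query pending-SMI state from an existing vCPU fd. */
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int print_smi_pending(int vcpu_fd)
{
	struct kvm_vcpu_events events;

	if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events) < 0) {
		perror("KVM_GET_VCPU_EVENTS");
		return -1;
	}
	/* events.smi.pending is the bit that migration must not lose. */
	printf("smi pending: %u, in SMM: %u\n",
	       (unsigned)events.smi.pending, (unsigned)events.smi.smm);
	return 0;
}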
-
Submitted by Paolo Bonzini

stable inclusion from stable-5.10.13
commit cffcb5e0fe2c994f0aa5d01b3c16e3f8a59350aa
bugzilla: 47995

--------------------------------

commit 9a78e158 upstream.

VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed.

Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
Submitted by Maxim Levitsky

stable inclusion from stable-5.10.13
commit 0faceb7d6dda6f370ff1fa0464d7180f7e5cb417
bugzilla: 47995

--------------------------------

commit f2c7ef3b upstream.

It is possible to exit the nested guest mode that was entered by svm_set_nested_state prior to the first VM entry into it (e.g. due to a pending event), if the nested run was not pending during the migration. In this case we must not switch to the nested MSR permission bitmap. Also add a warning to catch similar cases in the future.

Fixes: a7d5c7ce ("KVM: nSVM: delay MSR permission processing to first nested VM run")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 12 January 2021, 2 commits
-
Submitted by chenjiajun

virt inclusion
category: feature
bugzilla: 46853
CVE: NA

Export vcpu_stat via debugfs for x86, containing the x86 kvm exit items. The path of the vcpu_stat is /sys/kernel/debug/kvm/vcpu_stat, and each line of vcpu_stat is the collection of the various kvm exits for one vcpu. Through vcpu_stat, we only need to open (and tail) one file to monitor the performance of a virtual machine, which is more convenient.

Signed-off-by: Feng Lin <linfeng23@huawei.com>
Signed-off-by: chenjiajun <chenjiajun8@huawei.com>
Reviewed-by: Xiangyou Xie <xiexiangyou@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
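As a rough illustration of how a monitoring tool might consume this interface, the sketch below simply prints the file. It assumes a kernel carrying this out-of-tree patch (the path does not exist on mainline kernels) and that debugfs is mounted at /sys/kernel/debug, which usually requires root.

/* Sketch: dump the per-vCPU exit counters exposed by this patch. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/debug/kvm/vcpu_stat", "r");
	char line[1024];

	if (!f) {
		perror("open vcpu_stat");
		return 1;
	}
	while (fgets(line, sizeof(line), f))   /* one line per vCPU */
		fputs(line, stdout);
	fclose(f);
	return 0;
}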
-
Submitted by chenjiajun

virt inclusion
category: feature
bugzilla: 46853
CVE: NA

This patch creates a debugfs entry for vcpu stat. The entry path is /sys/kernel/debug/kvm/vcpu_stat, and vcpu_stat contains a subset of the kvm exit items of each vcpu: pid, hvc_exit_stat, wfe_exit_stat, wfi_exit_stat, mmio_exit_user, mmio_exit_kernel, exits. Currently, the maximum vcpu limit is 1024.

From this vcpu_stat, users can get the number of these kvm exit items over a period of time, which is helpful for monitoring the virtual machine.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: chenjiajun <chenjiajun8@huawei.com>
Reviewed-by: Xiangyou Xie <xiexiangyou@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
-
- 27 November 2020, 1 commit
-
Submitted by Paolo Bonzini

kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are a hodge-podge of conditions, hacked together to get something that more or less works. But what is actually needed is much simpler; in both cases the fundamental question is: do we have a place to stash an interrupt if userspace does KVM_INTERRUPT?

In userspace irqchip mode, that is !vcpu->arch.interrupt.injected. Currently kvm_event_needs_reinjection(vcpu) covers it, but it is unnecessarily restrictive.

In split irqchip mode it's a bit more complicated: we need to check kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK cycle and thus requires ExtINTs not to be masked) as well as !pending_userspace_extint(vcpu). However, there is no need to check kvm_event_needs_reinjection(vcpu), since split irqchip keeps pending ExtINT state separate from event injection state, and checking kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher priority than APIC interrupts. In fact the latter fixes a bug: when userspace requests an IRQ window vmexit, an interrupt in the local APIC can cause kvm_cpu_has_interrupt() to be true and thus kvm_vcpu_ready_for_interrupt_injection() to return false. When this happens, vcpu_run does not exit to userspace but the interrupt window vmexits keep occurring. The VM loops without any hope of making progress.

Once we try to fix these with something like

     return kvm_arch_interrupt_allowed(vcpu) &&
-        !kvm_cpu_has_interrupt(vcpu) &&
-        !kvm_event_needs_reinjection(vcpu) &&
-        kvm_cpu_accept_dm_intr(vcpu);
+        (!lapic_in_kernel(vcpu)
+         ? !vcpu->arch.interrupt.injected
+         : (kvm_apic_accept_pic_intr(vcpu)
+            && !pending_userspace_extint(v)));

we realize two things. First, thanks to the previous patch the complex conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt window request in vcpu_enter_guest()

     bool req_int_win =
         dm_request_for_irq_injection(vcpu) &&
         kvm_cpu_accept_dm_intr(vcpu);

should be kept in sync with kvm_vcpu_ready_for_interrupt_injection(): it is unnecessary to ask the processor for an interrupt window if we would not be able to return to userspace. Therefore, kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu) ANDed with the existing check for masked ExtINT. It all makes sense:

- we can accept an interrupt from userspace if there is a place to stash it (and, for irqchip split, ExtINTs are not masked). Interrupts from userspace _can_ be accepted even if right now EFLAGS.IF=0.

- in order to tell userspace we will inject its interrupt ("IRQ window open", i.e. kvm_vcpu_ready_for_interrupt_injection), both KVM and the vCPU need to be ready to accept the interrupt.

... and this is what the patch implements.

Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Analyzed-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
-
- 13 November 2020, 1 commit
-
Submitted by Babu Moger

SEV guests fail to boot on a system that supports the PCID feature.

While emulating the RSM instruction, KVM reads the guest CR3 and calls kvm_set_cr3(). If the vCPU is in long mode, kvm_set_cr3() does a sanity check on the CR3 value: it validates that no reserved bits are set. The reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory encryption is enabled, the memory encryption bit is set in the CR3 value. The memory encryption bit may fall within the KVM reserved bit range, causing a KVM emulation failure.

Introduce a new field, cr3_lm_rsvd_bits, in kvm_vcpu_arch which caches the reserved bits in the CR3 value. It is initialized to rsvd_bits(cpuid_maxphyaddr(vcpu), 63). Any architecture-specific bits (like the AMD SEV encryption bit) that need to be masked from the reserved bits should be cleared in the vendor-specific kvm_x86_ops.vcpu_after_set_cpuid handler.

Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
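A small self-contained sketch of the mask arithmetic described above. The helper mirrors the idea of rsvd_bits() (set bits hi..lo); the MAXPHYADDR and C-bit values are made-up example numbers, not real CPUID output, and the code is an illustration rather than the actual KVM change.

/* Sketch of the CR3 reserved-bits mask logic with illustrative constants. */
#include <stdio.h>
#include <stdint.h>

/* Bits hi..lo set, everything else clear. */
static uint64_t rsvd_bits(int lo, int hi)
{
	return ((~0ULL) << lo) & (~0ULL >> (63 - hi));
}

int main(void)
{
	int maxphyaddr = 43;               /* example guest MAXPHYADDR */
	int sev_c_bit = 47;                /* example encryption-bit position */
	uint64_t cr3_lm_rsvd_bits = rsvd_bits(maxphyaddr, 63);

	/* Vendor code clears the encryption bit from the reserved mask,
	 * so a CR3 with the C-bit set no longer fails the sanity check. */
	cr3_lm_rsvd_bits &= ~(1ULL << sev_c_bit);

	printf("reserved CR3 mask: %#018llx\n",
	       (unsigned long long)cr3_lm_rsvd_bits);
	return 0;
}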
-
- 08 November 2020, 5 commits
-
Submitted by Pankaj Gupta

A Windows 2016 guest tries to enable LBR by setting the corresponding bits in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and spams the host kernel logs with error messages like:

  kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop

This patch fixes this by enabling error logging only with 'report_ignored_msrs=1'.

Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Oliver Upton

Commit 5b9bb0eb ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) subtly changed the behavior of guest writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior: update the masterclock any time the guest uses a different msr than before.

Fixes: 5b9bb0eb ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21)
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Oliver Upton

Make the paravirtual cpuid enforcement mechanism idempotent to ioctl() ordering by updating pv_cpuid.features whenever userspace requests the capability. Extract this update out of kvm_update_cpuid_runtime() into a new helper function and move its other call site into kvm_vcpu_after_set_cpuid() where it more likely belongs.

Fixes: 66570e96 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Oliver Upton

Commit 66570e96 ("kvm: x86: only provide PV features if enabled in guest's CPUID") only protects against disallowed guest writes to KVM paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing KVM_CPUID_FEATURES for msr reads as well.

Fixes: 66570e96 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-4-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky

The recent introduction of userspace msr filtering added code that uses negative error codes for cases that result in either #GP delivery to the guest or handling by the userspace msr filter. This breaks the assumption that a negative error code returned from the msr emulation code is a semi-fatal error that should be returned to userspace via the KVM_RUN ioctl and will usually kill the guest.

Fix this by reusing the already existing KVM_MSR_RET_INVALID error code, and by adding a new KVM_MSR_RET_FILTERED error code for the userspace-filtered msrs.

Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 31 October 2020, 1 commit
-
Submitted by Takashi Iwai

The newly introduced kvm_msr_ignored_check() tries to print error or debug messages via the vcpu_*() macros, but those may cause an Oops when a NULL vcpu is passed for the KVM_GET_MSRS ioctl. Fix it by replacing the print calls with the kvm_*() macros.

(Note that this leaves the vcpu argument completely unused in the function, but I didn't touch it to keep the fix as small as possible. A clean-up may be applied later.)

Fixes: 12bc2132 ("KVM: X86: Do the same ignore_msrs check for feature msrs")
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178280
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-Id: <20201030151414.20165-1-tiwai@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
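For context, the NULL-vCPU case comes from the system-scoped use of KVM_GET_MSRS, which reads feature MSRs on the /dev/kvm fd rather than on a vCPU. A minimal sketch of such a call, assuming the host kernel advertises KVM_CAP_GET_MSR_FEATURES; the wrapper struct is just a convenient way to lay out one entry after the kvm_msrs header.

/* Sketch: read a feature MSR (ARCH_CAPABILITIES, index 0x10a) through the
 * system-scoped KVM_GET_MSRS ioctl on /dev/kvm -- the path that could
 * reach kvm_msr_ignored_check() with a NULL vcpu. */
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} req;

	if (kvm < 0) {
		perror("open /dev/kvm");
		return 1;
	}
	memset(&req, 0, sizeof(req));
	req.hdr.nmsrs = 1;
	req.entry.index = 0x10a;            /* MSR_IA32_ARCH_CAPABILITIES */

	if (ioctl(kvm, KVM_GET_MSRS, &req) == 1)   /* returns #MSRs read */
		printf("ARCH_CAPABILITIES = %#llx\n",
		       (unsigned long long)req.entry.data);
	else
		perror("KVM_GET_MSRS");
	return 0;
}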
-
- 22 October 2020, 9 commits
-
Submitted by Maxim Levitsky

This will be used to signal an error to userspace in case the vendor code failed during handling of this msr (e.g. -ENOMEM).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky

This will allow KVM to report such errors (e.g. -ENOMEM) to userspace.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky

Return 1 on errors that are caused by wrong guest behavior (which will inject #GP into the guest), and return a negative error value on issues that are the kernel's fault (e.g. -ENOMEM).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Vitaly Kuznetsov

The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80) is reported to be insufficient, but before we bump it, let's switch to allocating the vcpu->arch.cpuid_entries[] array dynamically. Currently 'struct kvm_cpuid_entry2' is 40 bytes, so vcpu->arch.cpuid_entries is 3200 bytes, which accounts for 1/4 of the whole 'struct kvm_vcpu_arch', but having it pre-allocated (for all vCPUs, which we also pre-allocate) gives us no real benefit.

Another plus of the dynamic allocation is that we now do the kvm_check_cpuid() check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries, so no changes are made in case the check fails.

Opportunistically remove unneeded 'out' labels from kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return directly whenever possible.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201001130541.1398392-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
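The "check before assigning anything" point generalizes to a validate-then-swap pattern. A toy sketch in plain C (not the actual KVM code; the entry struct and the validator are invented for illustration): build the candidate buffer, validate it, and only then replace the live pointer, so a failed check leaves the existing state untouched.

/* Toy model of validate-then-swap; names are illustrative only. */
#include <stdlib.h>
#include <string.h>

struct entry { unsigned int function, index; };

struct state {
	struct entry *entries;   /* dynamically allocated, like cpuid_entries */
	int nent;
};

static int check_entries(const struct entry *e, int nent)
{
	return e && nent > 0;    /* stand-in for the real validation */
}

static int set_entries(struct state *s, const struct entry *src, int nent)
{
	struct entry *tmp = calloc(nent, sizeof(*tmp));

	if (!tmp)
		return -1;                        /* nothing modified */
	memcpy(tmp, src, nent * sizeof(*tmp));
	if (!check_entries(tmp, nent)) {
		free(tmp);
		return -1;                        /* still nothing modified */
	}
	free(s->entries);                         /* commit only after the check */
	s->entries = tmp;
	s->nent = nent;
	return 0;
}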
-
Submitted by Oliver Upton

KVM unconditionally provides PV features to the guest, regardless of the configured CPUID. An unwitting guest that doesn't check KVM_CPUID_FEATURES before use could access paravirt features that userspace did not intend to provide. Fix this by checking the guest's CPUID before performing any paravirtual operations.

Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the aforementioned enforcement. Migrating a VM from a host without this patch to a host with this patch could silently change the ABI exposed to the guest, warranting that we default to the old behavior and opt in to the new one.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98
Message-Id: <20200818152429.1923996-4-oupton@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
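Since the new behavior is opt-in, a VMM has to enable the capability explicitly. A hedged sketch of that call on a vCPU fd; vcpu_fd is assumed to exist already, and the capability should first be probed with KVM_CHECK_EXTENSION on a kernel that actually carries this feature.

/* Sketch: opt in to PV-feature CPUID enforcement on one vCPU. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_pv_cpuid_enforcement(int vcpu_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ENFORCE_PV_FEATURE_CPUID;
	cap.args[0] = 1;                  /* 1 = turn enforcement on */
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}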
-
Submitted by Oliver Upton

Small change to avoid meaningless duplication in the subsequent patch. No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I77ab9cdad239790766b7a49d5cbae5e57a3005ea
Message-Id: <20200818152429.1923996-3-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Oliver Upton

No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I7cbe71069db98d1ded612fd2ef088b70e7618426
Message-Id: <20200818152429.1923996-2-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini

Allowing userspace to intercept reads to x2APIC MSRs when APICv is fully enabled for the guest simply can't work. But more generally, the LAPIC could be set to in-kernel after the MSR filter is set up, and allowing accesses by userspace would be very confusing.

We could in principle allow userspace to intercept reads and writes to TPR, and writes to EOI and SELF_IPI, but while that could be made to work, it would still be silly.

Cc: Alexander Graf <graf@amazon.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace filtering. Allowing userspace to intercept reads to x2APIC MSRs when APICv is fully enabled for the guest simply can't work; the LAPIC, and thus the virtual APIC, is in-kernel and cannot be directly accessed by userspace. To keep things simple we will in fact forbid intercepting x2APIC MSRs altogether, independent of the default_allow setting.

Cc: Alexander Graf <graf@amazon.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201005195532.8674-3-sean.j.christopherson@intel.com>
[Modified to operate even if APICv is disabled, adjust documentation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 28 September 2020, 13 commits
-
Submitted by Paolo Bonzini

KVM special-cases writes to MSR_IA32_TSC so that all CPUs have the same base for the TSC. This logic is complicated, and we do not want it to have any effect once the VM is started. In particular, if any guest started to synchronize its TSCs with writes to MSR_IA32_TSC rather than MSR_IA32_TSC_ADJUST, the additional effect of the kvm_write_tsc code would be uncharted territory.

Therefore, this patch makes writes to MSR_IA32_TSC behave essentially the same as writes to MSR_IA32_TSC_ADJUST when they come from the guest.

A new selftest (which passes both before and after the patch) checks the current semantics of writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST originating from both the host and the guest. Upcoming work to remove the special side effects of host-initiated writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST will be able to build onto this test, adjusting the host side to use the new APIs and achieve the same effect.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini

We are going to use it for SVM too, so use a more generic name.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Alexander Graf

It's not desirable to have all MSRs always handled by KVM kernel space. Some MSRs would be useful to handle in user space, either to emulate behavior (like uCode updates) or to decide whether they are valid based on the CPU model.

To allow user space to specify which MSRs it wants to see handled by KVM, this patch introduces a new ioctl to push filter rules with bitmaps into KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access. With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the denied MSR events to user space to operate on.

If no filter is populated, MSR handling stays identical to before.

Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-8-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
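A rough userspace-side sketch of installing such a filter. It leaves the default action at "allow" and filters writes to one example MSR (a cleared bitmap bit marks the access as filtered). vm_fd is assumed to come from KVM_CREATE_VM, and the struct layout should be double-checked against the <linux/kvm.h> shipped with the target kernel.

/* Sketch: filter guest writes to one MSR (example base 0x10a), allow the rest. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int install_example_filter(int vm_fd)
{
	static __u8 bitmap[1] = { 0x00 };      /* bit 0 clear = filter this MSR */
	struct kvm_msr_filter filter;

	memset(&filter, 0, sizeof(filter));
	filter.flags = KVM_MSR_FILTER_DEFAULT_ALLOW;   /* unlisted MSRs pass */

	filter.ranges[0].flags = KVM_MSR_FILTER_WRITE; /* apply to writes only */
	filter.ranges[0].base = 0x10a;                 /* first MSR of the range */
	filter.ranges[0].nmsrs = 1;
	filter.ranges[0].bitmap = bitmap;

	return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
}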
-
Submitted by Alexander Graf

In the following commits we will add pieces of MSR filtering. To ensure that code compiles even with the feature half-merged, let's add a few stubs and struct definitions before the real patches start.

Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-4-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Alexander Graf

MSRs are weird. Some of them are normal control registers, such as EFER. Some, however, are registers that really are model specific, not very interesting to virtualization workloads, and not performance critical. Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in kernel space. Rarely accessed MSRs, MSRs that should be fine-tuned against certain CPU models, and MSRs that contain information at the package level are much better suited for user space to process. However, over time we have accumulated a lot of MSRs that are not in the first category, but are still handled by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user space. With this, any future MSR that is part of the latter categories can be handled in user space. Furthermore, it allows us to replace the existing "ignore_msrs" logic with something that applies per-VM rather than to the full system. That way you can run productive VMs in parallel to experimental ones where you don't care about proper MSR handling.

Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200925143422.21718-3-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
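When this mechanism is enabled (via KVM_ENABLE_CAP with KVM_CAP_X86_USER_SPACE_MSR), deflected accesses show up as dedicated exit reasons in the run loop. A hedged sketch of handling them; field names follow the kvm_run::msr layout in the uapi header, and the surrounding KVM_RUN loop and vCPU setup are assumed to exist elsewhere.

/* Sketch: handle MSR accesses that KVM deflected to user space.
 * 'run' is the mmap'ed struct kvm_run of the vCPU. */
#include <linux/kvm.h>

static void handle_user_space_msr(struct kvm_run *run)
{
	switch (run->exit_reason) {
	case KVM_EXIT_X86_RDMSR:
		/* Supply a value, or set error to make KVM inject #GP. */
		run->msr.data = 0;
		run->msr.error = 0;
		break;
	case KVM_EXIT_X86_WRMSR:
		/* run->msr.index / run->msr.data describe the write;
		 * accept it by leaving error at 0. */
		run->msr.error = 0;
		break;
	default:
		break;
	}
}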
-
Submitted by Alexander Graf

When we find an MSR that we can not handle, bubble up that error code as the MSR error return code. Follow-up patches will use that to expose to user space the fact that an MSR is not handled by KVM.

Suggested-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-2-graf@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Rename the "shared_msrs" mechanism, which is used to defer restoring MSRs that are only consumed when running in userspace, to the more banal but less confusing "user_return_msrs".

The "shared" nomenclature is confusing as it's not obvious who is sharing what; reasonable interpretations are that the guest value is shared by vCPUs in a VM, or that the MSR value is shared/common to guest and host, both of which are wrong. "shared" is also misleading as the MSR value (in hardware) is not guaranteed to be shared/reused between VMs (if that's indeed the correct interpretation of the name), as the ability to share values between VMs is simply a side effect (albeit a very nice side effect) of deferring restoration of the host value until returning from userspace.

"user_return" avoids the above confusion by describing the mechanism itself instead of its effects.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Add RIP to the kvm_entry tracepoint to help debug if the kvm_exit tracepoint is disabled or if VM-Enter fails, in which case the kvm_exit tracepoint won't be hit. Read RIP from within the tracepoint itself to avoid a potential VMREAD and retpoline if the guest's RIP isn't available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923201349.16097-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a more generic is_emulatable(), and unconditionally call the new function in x86_emulate_instruction().

KVM will use the generic hook to support multiple security-related technologies that prevent emulation in one way or another. Similar to the existing AMD #NPF case, where emulation of the current instruction is not possible due to lack of information, AMD's SEV-ES and Intel's SGX and TDX will introduce scenarios where emulation is impossible due to the guest's register state being inaccessible. And again similar to the existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(), i.e. outside of the control of vendor-specific code.

While the cause and architecturally visible behavior of the various cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean resume or complete shutdown, and SEV-ES and TDX "return" an error, the impact on the common emulation code is identical: KVM must stop emulation immediately and resume the guest.

Query is_emulatable() in handle_ud() as well so that the force_emulation_prefix code doesn't incorrectly modify RIP before calling emulate_instruction() in the absurdly unlikely scenario that KVM encounters forced emulation in conjunction with "do not emulate".

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Maxim Levitsky

MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the msrs it wants to adjust, and those that it doesn't intercept should be read by the guest as if the host had read them.

However IA32_TSC is an exception. Even when not intercepted, the guest still reads the value + TSC offset. The write, however, does not take any TSC offset into account. This is documented in Intel's SDM and seems also to happen on AMD.

This creates a problem when userspace wants to read the IA32_TSC value and then write it (e.g. for migration): in this case it reads the L2 value, but the write is interpreted as an L1 value. To fix this, make userspace-initiated reads of IA32_TSC return the L1 value as well.

Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
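The userspace read in question is the per-vCPU KVM_GET_MSRS used at migration time. A minimal sketch, assuming vcpu_fd is an existing vCPU file descriptor created elsewhere:

/* Sketch: the host-initiated IA32_TSC read (index 0x10) that this patch
 * makes return the L1 value even while the vCPU is in guest mode. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int read_guest_tsc(int vcpu_fd, unsigned long long *tsc)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} req;

	memset(&req, 0, sizeof(req));
	req.hdr.nmsrs = 1;
	req.entry.index = 0x10;              /* MSR_IA32_TSC */

	if (ioctl(vcpu_fd, KVM_GET_MSRS, &req) != 1)
		return -1;
	*tsc = req.entry.data;
	return 0;
}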
-
Submitted by Babu Moger

INVPCID instruction handling is mostly the same across both VMX and SVM, so move the code to the common x86.c.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Babu Moger

Handling of kvm_read/write_guest_virt*() errors can be moved to common code, where the same code can be used by both VMX and SVM.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Wanpeng Li

The kick after setting KVM_REQ_PENDING_TIMER is used to handle the case where the timer fires on a different pCPU from the one the vCPU is running on. This kick costs about 1000 clock cycles, and we don't need it when injecting an already-expired timer or when using the VMX preemption timer, because kvm_lapic_expired_hv_timer() is called from the target vCPU.

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-6-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 25 September 2020, 1 commit
-
Submitted by Sean Christopherson

Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled. Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are toggled inadvertently skipped the MMU context reset, because the mask of bits that triggers PDPTR loads was also being used to trigger MMU context resets.

Fixes: 427890af ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode")
Fixes: cb957adb ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode")
Cc: Jim Mattson <jmattson@google.com>
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-