- 18 June 2021 (27 commits)
-
-
Committed by Sean Christopherson

Remove the @reset_roots param from kvm_init_mmu(); its one user, kvm_mmu_reset_context(), has already unloaded the MMU and thus freed and invalidated all roots. This also happens to be why the reset_roots=true paths don't leak roots; they're already invalid. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Use __kvm_mmu_new_pgd() via kvm_init_shadow_ept_mmu() to emulate VMFUNC[EPTP_SWITCH] instead of nuking all MMUs. EPTP_SWITCH is the EPT equivalent of MOV to CR3, i.e. is a perfect fit for the common PGD flow, the only hiccup being that A/D enabling is buried in the EPTP. But that is easily handled by bouncing through kvm_init_shadow_ept_mmu().

Explicitly request a guest TLB flush if VPID is disabled. Per Intel's SDM, if VPID is disabled, "an EPTP-switching VMFUNC invalidates combined mappings associated with VPID 0000H (for all PCIDs and for all EP4TA values, where EP4TA is the value of bits 51:12 of EPTP)".

Note, this technically is a very bizarre bug fix of sorts if L2 is using PAE paging, as avoiding the full MMU reload also avoids incorrectly reloading the PDPTEs, which the SDM explicitly states are not touched:

    If PAE paging is in use, an EPTP-switching VMFUNC does not load the
    four page-directory-pointer-table entries (PDPTEs) from the
    guest-physical address in CR3. The logical processor continues to use
    the four guest-physical addresses already present in the PDPTEs. The
    guest-physical address in CR3 is not translated through the new EPT
    paging structures (until some operation that would load the PDPTEs).

In addition to optimizing L2's MMU shenanigans, avoiding the full reload also optimizes L1's MMU, as KVM_REQ_MMU_RELOAD wipes out all roots in both root_mmu and guest_mmu.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

When emulating INVVPID for L1, free only L2+ roots, using the guest_mode tag in the MMU role to identify L2+ roots. From L1's perspective, its own TLB entries use VPID=0, and INVVPID is not required to invalidate such entries. Per Intel's SDM, INVVPID _may_ invalidate entries with VPID=0, but it is not required to do so.

Cc: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
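A minimal sketch of the idea, built from KVM's existing MMU plumbing (prev_roots, to_shadow_page(), kvm_mmu_free_roots()); the loop itself is an illustration of the approach, not the exact patch:

    /*
     * Free only cached roots whose shadow page was created while running
     * L2 (role.guest_mode is set), leaving L1's own roots intact.
     */
    unsigned long roots_to_free = 0;
    int i;

    for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
        struct kvm_mmu_page *sp = to_shadow_page(mmu->prev_roots[i].hpa);

        if (sp && sp->role.guest_mode)
            roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
    }
    kvm_mmu_free_roots(vcpu, mmu, roots_to_free);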
-
Committed by Sean Christopherson

Drop the dedicated nested_vmx_transition_mmu_sync() now that the MMU sync is handled via KVM_REQ_TLB_FLUSH_GUEST, and fold that flush into the all-encompassing nested_vmx_transition_tlb_flush(). Opportunistically add a comment explaining why nested EPT never needs to sync the MMU on VM-Enter.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Drop skip_mmu_sync and skip_tlb_flush from __kvm_mmu_new_pgd() now that all call sites unconditionally skip both the sync and flush. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Drop bogus logic that incorrectly clobbers the accessed/dirty enabling status of the nested MMU on an EPTP switch. When nested EPT is enabled, walk_mmu points at L2's _legacy_ page tables, not L1's EPT for L2.

This is likely a benign bug, as mmu->ept_ad is never consumed (since the MMU is not a nested EPT MMU), and stuffing mmu_role.base.ad_disabled will never propagate into future shadow pages since the nested MMU isn't used to map anything, just to walk L2's page tables.

Note, KVM also does a full MMU reload, i.e. the guest_mmu will be recreated using the new EPTP, and thus any change in A/D enabling will be properly recognized in the relevant MMU.

Fixes: 41ab9372 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Use BIT_ULL() instead of an open-coded shift to check whether or not a function is enabled in L1's VMFUNC bitmap. This is a benign bug as KVM supports only bit 0, and will fail VM-Enter if any other bits are set, i.e. bits 63:32 are guaranteed to be zero.

Note, "function" is bounded by hardware, as VMFUNC will #UD before taking a VM-Exit if the function is greater than 63.

Before:

    if ((vmcs12->vm_function_control & (1 << function)) == 0)
       0x000000000001a916 <+118>:	mov    $0x1,%eax
       0x000000000001a91b <+123>:	shl    %cl,%eax
       0x000000000001a91d <+125>:	cltq
       0x000000000001a91f <+127>:	and    0x128(%rbx),%rax

After:

    if (!(vmcs12->vm_function_control & BIT_ULL(function & 63)))
       0x000000000001a955 <+117>:	mov    0x128(%rbx),%rdx
       0x000000000001a95c <+124>:	bt     %rax,%rdx

Fixes: 27c42a1b ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
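For reference, BIT_ULL() guarantees a 64-bit-wide shift, which is what keeps bits 63:32 of the bitmap in play; its effective definition is:

    /* A 64-bit-wide shift, unlike the int-typed (1 << function). */
    #define BIT_ULL(nr)	(1ULL << (nr))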
-
Committed by Sean Christopherson

Trigger a full TLB flush on behalf of the guest on nested VM-Enter and VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the current PGD, which can theoretically leave stale, unsync'd entries in a previous guest PGD, which could be consumed if L2 is allowed to load CR3 with PCID_NOFLUSH=1.

Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can be utilized for its obvious purpose of emulating a guest TLB flush.

Note, there is no change to the actual TLB flush executed by KVM, even though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is disabled for L2, vpid02 is guaranteed to be '0', and thus nested_get_vpid02() will return the VPID that is shared by L1 and L2.

Generate the request outside of kvm_mmu_new_pgd(), as getting the common helper to correctly identify which request is needed is quite painful. E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong, as a TLB flush from the L1 kernel's perspective does not invalidate EPT mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can be simplified in the future by moving the logic into nested_vmx_transition_tlb_flush().

Fixes: 41fab65e ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
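A minimal sketch of the transition-time check this describes, reusing KVM's existing nested_cpu_has_vpid() helper and request machinery (placement in the transition path is illustrative):

    /*
     * When L2 runs without a VPID, nested VM-Enter/VM-Exit architecturally
     * flushes the TLB; emulate that with a full guest TLB flush request.
     */
    if (!nested_cpu_has_vpid(vmcs12))
        kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);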
-
Committed by Vitaly Kuznetsov

VMCS12 is used to keep the authoritative state during nested state migration. When the 'need_vmcs12_to_shadow_sync' flag is set, we're in between an L2->L1 vmexit and the L1 guest run, i.e. the actual sync to the enlightened (or shadow) VMCS has yet to happen. Nested state, however, has no flag for 'need_vmcs12_to_shadow_sync', so vmx_set_nested_state()->set_current_vmptr() always sets it. The enlightened vmptrld path, however, doesn't have this quirk, so some VMCS12 changes may not get properly reflected to the eVMCS and L1 will see an incorrect state.

Note, during L2 execution or when need_vmcs12_to_shadow_sync is not set the change is effectively a nop: in the former case all changes will get reflected during the first L2->L1 vmexit, and in the latter case VMCS12 and the eVMCS are already in sync (thanks to copy_enlightened_to_vmcs12() in vmx_get_nested_state()).

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-11-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

When nested state migration happens during L1's execution, it is incorrect to modify the eVMCS, as it is L1 who 'owns' it at the moment. At least genuine Hyper-V seems to not be very happy when 'clean fields' data changes underneath it. 'Clean fields' data is used in KVM twice: by copy_enlightened_to_vmcs12() and prepare_vmcs02_rare(), so we can reset it from prepare_vmcs02() instead. While at it, update a comment stating why exactly we need to reset 'hv_clean_fields' data from L0.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-10-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

'need_vmcs12_to_shadow_sync' is used for both shadow and enlightened VMCS sync when we exit to L1. The comment in nested_vmx_failValid() validly states why the shadow vmcs sync can be omitted, but this doesn't apply to enlightened VMCS, as it 'shadows' all VMCS12 fields.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

'Clean fields' data from the enlightened VMCS is only valid upon vmentry: the L1 hypervisor is not obliged to keep it up-to-date while it is mangling L2's state, so a KVM_GET_NESTED_STATE request may come at the wrong moment, when actual eVMCS changes are unsynchronized with 'hv_clean_fields'. As VMCS12 is used as the source of ultimate truth upon migration, we must make sure we pick up all the changes to the eVMCS, and thus 'clean fields' data must be ignored.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

Unlike VMREAD/VMWRITE/VMPTRLD, VMCLEAR is a valid instruction when enlightened VMCS is in use. The TLFS has the following brief description: "The L1 hypervisor can execute a VMCLEAR instruction to transition an enlightened VMCS from the active to the non-active state". Normally, this change can be ignored, as unmapping the active eVMCS can be postponed until the next VMLAUNCH instruction, but in case nested state is migrated with KVM_GET_NESTED_STATE/KVM_SET_NESTED_STATE, keeping the eVMCS mapped may result in its synchronization with VMCS12, and this is incorrect: the L1 hypervisor is free to reuse the inactive eVMCS memory for something else. An inactive eVMCS can simply be unmapped after VMCLEAR.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

Unlike regular set_current_vmptr(), nested_vmx_handle_enlightened_vmptrld() cannot be called directly from vmx_set_nested_state(), as KVM may not have all the information yet (e.g. the HV_X64_MSR_VP_ASSIST_PAGE MSR may not be restored yet). The enlightened VMCS is mapped later, while getting nested state pages. In the meantime, vmx->nested.hv_evmcs_vmptr remains 'EVMPTR_INVALID', which is indistinguishable from the 'evmcs is not in use' case. This leads to certain issues; in particular, if KVM_GET_NESTED_STATE is called right after KVM_SET_NESTED_STATE, the KVM_STATE_NESTED_EVMCS flag in the resulting state will be unset (and such state will later fail to load).

Introduce an 'EVMPTR_MAP_PENDING' state to detect a not-yet-mapped eVMCS after restore. With this, the 'is_guest_mode(vcpu)' hack in vmx_has_valid_vmcs12() is no longer needed.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
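The sentinel scheme described across this series might look roughly like the following sketch (sentinel values chosen here for illustration: -1 for "not in use", -2 for "in use but not yet mapped"):

    #define EVMPTR_INVALID		(-1ULL)
    #define EVMPTR_MAP_PENDING	(-2ULL)

    static inline bool evmptr_is_valid(u64 evmptr)
    {
        /* Valid only when the eVMCS is both in use and actually mapped. */
        return evmptr != EVMPTR_INVALID && evmptr != EVMPTR_MAP_PENDING;
    }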
-
Committed by Vitaly Kuznetsov

copy_vmcs12_to_enlightened()/copy_enlightened_to_vmcs12() don't return any meaningful result; make them return 'void'. No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

In theory, L1 can try to disable enlightened VMENTRY in the VP assist page and then issue VMLAUNCH/VMRESUME. While nested_vmx_handle_enlightened_vmptrld() properly handles this as 'EVMPTRLD_DISABLED', the previously mapped eVMCS remains mapped, and thus all evmptr_is_valid() checks will still pass and nested_vmx_run() will proceed when it shouldn't. Release the eVMCS immediately when we detect that enlightened vmentry was disabled by L1.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

'dirty_vmcs12' is only checked in prepare_vmcs02_early()/prepare_vmcs02(), and both checks look like 'vmx->nested.dirty_vmcs12 || evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)', so for the eVMCS case the flag changes nothing. Drop the assignment to avoid the confusion. No functional change intended.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov

Instead of checking 'vmx->nested.hv_evmcs', use '-1' in 'vmx->nested.hv_evmcs_vmptr' to indicate the 'evmcs is not in use' state. This matches how we check 'vmx->nested.current_vmptr'. Introduce EVMPTR_INVALID and evmptr_is_valid() and use them instead of the raw '-1' check as preparation for adding other 'special' values. No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

If the new KVM_*_SREGS2 ioctls are used, the PDPTRs are a part of the migration state and are correctly restored by those ioctls.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210607090203.133058-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

Similar to the rest of the guest page accesses after a migration, this access should be delayed to KVM_REQ_GET_NESTED_STATE_PAGES.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210607090203.133058-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Remove the pdptrs_changed() check when loading L2's CR3. The set of available registers is always reset when switching VMCSes (see commit e5d03de5, "KVM: nVMX: Reset register cache (available and dirty masks) on VMCS switch"), thus the "are PDPTRs available" check will always fail. And even if it didn't fail, reading guest memory to check the PDPTRs is just as expensive as reading guest memory to load 'em.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210607090203.133058-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Krish Sadhukhan

Currently, the 'nested_run' statistic counts all guest-entry attempts, including those that fail during vmentry checks on Intel and during consistency checks on AMD. Convert this statistic to count only those guest-entries that make it past these state checks and make it to guest code. This will tell us the number of guest-entries that actually executed or tried to execute guest code.

Signed-off-by: Krish Sadhukhan <Krish.Sadhukhan@oracle.com>
Message-Id: <20210609180340.104248-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jim Mattson

Don't allow posted interrupts to modify a stale posted interrupt descriptor (including the initial value of 0).

Empirical tests on real hardware reveal that a posted interrupt descriptor referencing an unbacked address has PCI bus error semantics (reads as all 1's; writes are ignored). However, KVM can't distinguish unbacked addresses from device-backed (MMIO) addresses, so it should really ask userspace for an MMIO completion. That's overly complicated, so just punt with KVM_INTERNAL_ERROR.

Don't return the error until the posted interrupt descriptor is actually accessed. We don't want to break the existing kvm-unit-tests that assume they can launch an L2 VM with a posted interrupt descriptor that references MMIO space in L1.

Fixes: 6beb7bd5 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210604172611.281819-8-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jim Mattson

When the kernel has no mapping for the vmcs02 virtual APIC page, userspace MMIO completion is necessary to process nested posted interrupts. This is not a configuration that KVM supports. Rather than silently ignoring the problem, try to exit to userspace with KVM_INTERNAL_ERROR.

Note that the event that triggers this error is consumed as a side effect of a call to kvm_check_nested_events. On some paths (notably through kvm_vcpu_check_block), the error is dropped. In any case, this is an incremental improvement over always ignoring the error.

Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210604172611.281819-7-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
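Punting to userspace with an internal error generally follows the standard KVM pattern; a sketch (the suberror chosen here is illustrative, not necessarily the one the patch uses):

    /* Report the unsupported configuration instead of ignoring it. */
    vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
    vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
    vcpu->run->internal.ndata = 0;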
-
Committed by Jim Mattson

No functional change intended.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20210604172611.281819-4-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ilias Stamatis

Calculate the TSC offset and multiplier on nested transitions and expose the TSC scaling feature to L1.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210526184418.28881-11-ilstam@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
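The merged offset/multiplier arithmetic might look like this sketch, using the kernel's mul_u64_u64_shr() helper; the variable names are illustrative, and frac is the hardware's fractional-bits count (48 on VMX):

    /*
     * L1's TSC: (host TSC * l1_mult >> frac) + l1_off
     * L2's TSC: (L1 TSC   * l2_mult >> frac) + l2_off
     * Merging the two levels gives the values to program while L2 runs:
     */
    u64 mult   = mul_u64_u64_shr(l1_mult, l2_mult, frac);
    u64 offset = mul_u64_u64_shr(l1_off, l2_mult, frac) + l2_off;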
-
Committed by Ilias Stamatis

Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the VMCS every time the VMCS is loaded. Instead of doing this, set this field from common code on initialization and whenever the scaling ratio changes.

Additionally, remove vmx->current_tsc_ratio. This field is redundant, as vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling ratio. The vmx->current_tsc_ratio field is only used for avoiding unnecessary writes, but it is no longer needed after removing the code from the VMCS load path.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Message-Id: <20210607105438.16541-1-ilstam@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
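A sketch of the dedicated setter this describes (the function name is an assumption for illustration):

    static void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
    {
        /* Invoked from common code only when the ratio actually changes,
         * replacing the unconditional write on every VMCS load. */
        vmcs_write64(TSC_MULTIPLIER, multiplier);
    }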
-
- 07 May 2021 (1 commit)
-
-
Committed by Vitaly Kuznetsov

When enlightened VMCS is in use and nested state is migrated with vmx_get_nested_state()/vmx_set_nested_state(), KVM can't map the evmcs page right away: the evmcs gpa is not a member of 'struct kvm_vmx_nested_state_hdr', and we can't read it from the VP assist page because userspace may decide to restore HV_X64_MSR_VP_ASSIST_PAGE after restoring nested state (and QEMU, for example, does exactly that). To make sure the eVMCS is mapped, vmx_set_nested_state() raises the KVM_REQ_GET_NESTED_STATE_PAGES request.

Commit f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to nested_vmx_vmexit() to make sure the MSR permission bitmap is not switched when an immediate exit from L2 to L1 happens right after migration (caused by a pending event, for example). Unfortunately, in the exact same situation we still need to have the eVMCS mapped so nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to the eVMCS.

As a band-aid, restore nested_get_evmcs_page() when clearing KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far from ideal, as we can't easily propagate possible failures, and even if we could, it is most likely already too late to do so. The whole 'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping the eVMCS after migration seems to be fragile, as we diverge too much from the 'native' path when vmptr loading happens on vmx_set_nested_state().

Fixes: f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 April 2021 (4 commits)
-
-
Committed by Sean Christopherson

Append 'raw' to the direct variants of kvm_register_read()/kvm_register_write(), and drop the "l" from the mode-aware variants. I.e. make the mode-aware variants the default, and make the direct variants scary sounding so as to discourage use. Accessing the full 64-bit values irrespective of mode is rarely the desired behavior.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
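The resulting default read is mode-aware; a sketch of its shape (treat the exact body as illustrative):

    static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
    {
        unsigned long val = kvm_register_read_raw(vcpu, reg);

        /* Outside 64-bit mode, only the low 32 bits of a GPR are
         * architecturally meaningful. */
        return is_64_bit_mode(vcpu) ? val : (u32)val;
    }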
-
Committed by Sean Christopherson

Drop bits 63:32 of the base and/or index GPRs when calculating the effective address of a VMX instruction memory operand. Outside of 64-bit mode, memory encodings are strictly limited to E*X and below.

Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
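With the mode-aware kvm_register_read() from the same series, the effective-address computation drops the high bits automatically; a sketch (base_is_valid/index_is_valid/scaling stand for the instruction decoder's fields and are assumptions here):

    if (base_is_valid)
        off += kvm_register_read(vcpu, base_reg);   /* truncated to 32 bits
                                                       outside 64-bit mode */
    if (index_is_valid)
        off += kvm_register_read(vcpu, index_reg) << scaling;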
-
Committed by Sean Christopherson

Drop bits 63:32 of the VMCS field encoding when checking for a nested VM-Exit on VMREAD/VMWRITE in !64-bit mode. VMREAD and VMWRITE always use 32-bit operands outside of 64-bit mode. The actual emulation of VMREAD/VMWRITE does the right thing; this bug is purely limited to incorrectly causing a nested VM-Exit if a GPR happens to have bits 63:32 set outside of 64-bit mode.

Fixes: a7cde481 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Disable pass-through of the FS and GS base MSRs for 32-bit KVM. Intel's SDM unequivocally states that the MSRs exist if and only if the CPU supports x86-64. FS_BASE and GS_BASE are mostly a non-issue; a clever guest could opportunistically use the MSRs without issue. KERNEL_GS_BASE is a bigger problem, as a clever guest would subtly be broken if it were migrated, as KVM disallows software access to the MSRs, and unlike the direct variants, KERNEL_GS_BASE needs to be explicitly migrated as it's not captured in the VMCS.

Fixes: 25c5f225 ("KVM: VMX: Enable MSR Bitmap feature")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422023831.3473491-1-seanjc@google.com>
[*NOT* for stable kernels. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
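The natural shape of the fix is to make the pass-through 64-bit only; a sketch using vmx.c's existing MSR-intercept helper:

    #ifdef CONFIG_X86_64
        vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
        vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
        vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
    #endif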
-
- 20 April 2021 (2 commits)
-
-
Committed by Sean Christopherson

Enable SGX virtualization now that KVM has the VM-Exit handlers needed to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU model exposed to the guest. Add a KVM module param, "sgx", to allow an admin to disable SGX virtualization independent of the kernel.

When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX LC to userspace via CPUID, and wire up the ENCLS_EXITING bitmap based on the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an SGX-enabled guest. With the exception of the provision key, all SGX attribute bits may be exposed to the guest. Guest access to the provision key, which is controlled via securityfs, will be added in a future patch.

Note, KVM does not yet support exposing ENCLS_C leafs or ENCLV leafs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <a99e9c23310c79f2f4175c1af4c4cbcef913c3e5.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
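The module param presumably follows the standard kernel pattern; a sketch:

    /* Let an admin turn off SGX virtualization ("kvm_intel.sgx=0")
     * regardless of kernel-side SGX support; read-only at runtime. */
    static bool __read_mostly enable_sgx = true;
    module_param_named(sgx, enable_sgx, bool, 0444);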
-
Committed by Sean Christopherson

Add support for handling VM-Exits that originate from a guest SGX enclave. In SGX, an "enclave" is a new CPL3-only execution environment, wherein the CPU and memory state is protected by hardware to make the state inaccessible to code running outside of the enclave. When exiting an enclave due to an asynchronous event (from the perspective of the enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state is automatically saved and scrubbed (the CPU loads synthetic state), and then reloaded when re-entering the enclave. E.g. after an instruction based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP of the enclave instruction that triggered the VM-Exit, but will instead point to a RIP in the enclave's untrusted runtime (the guest userspace code that coordinates entry/exit to/from the enclave).

To help a VMM recognize and handle exits from enclaves, SGX adds bits to existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR. Define the new architectural bits, and add a boolean to struct vcpu_vmx to cache VMX_EXIT_REASON_FROM_ENCLAVE. Clear the bit in exit_reason so that checks against exit_reason do not need to account for SGX, e.g. "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work.

KVM is largely a passive observer of the new bits, e.g. KVM needs to account for the bits when propagating information to a nested VMM, but otherwise doesn't need to act differently for the majority of VM-Exits from enclaves.

The one scenario that is directly impacted is emulation, which is for all intents and purposes impossible[1], since KVM does not have access to the RIP or instruction stream that triggered the VM-Exit. The inability to emulate is a non-issue for KVM, as most instructions that might trigger a VM-Exit unconditionally #UD in an enclave (before the VM-Exit check). For the few instructions that conditionally #UD, KVM either never sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only if the feature is not exposed to the guest in order to inject a #UD, e.g. RDRAND_EXITING.

But, because it is still possible for a guest to trigger emulation, e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit from an enclave. This is architecturally accurate for instruction VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable to killing the VM. In practice, only broken or particularly stupid guests should ever encounter this behavior.

Add a WARN in skip_emulated_instruction to detect any attempt to modify the guest's RIP during an SGX enclave VM-Exit, as all such flows should either be unreachable or must handle exits from enclaves before getting to skip_emulated_instruction.

[1] Impossible for all practical purposes. Not truly impossible since KVM could implement some form of para-virtualization scheme.

[2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at CPL3, so we also don't need to worry about that interaction.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
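Per Intel's SDM, the "exit from enclave mode" flag lives in bit 27 of the exit-reason field and the "enclave interruption" flag in bit 4 of the interruptibility state; the new defines might look like this sketch (names taken from the commit text, bit positions from the SDM):

    #define VMX_EXIT_REASON_FROM_ENCLAVE	(1u << 27)
    #define GUEST_INTR_STATE_ENCLAVE_INTR	(1u << 4)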
-
- 17 April 2021 (1 commit)
-
-
Committed by Maxim Levitsky

Injected interrupts/NMIs should not block a pending exception, but rather be either lost if the nested hypervisor doesn't intercept the pending exception (as in stock x86), or be delivered in the exitintinfo/IDT_VECTORING_INFO field, as part of a VMexit that corresponds to the pending exception.

The only reason for an exception to be blocked is when a nested run is pending (and that can't really happen currently, but it is still worth checking for).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 22 March 2021 (1 commit)
-
-
Committed by Ingo Molnar

Fix another ~42 single-word typos in arch/x86/ code comments; a few were missed in the first pass, in particular in .S files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
-
- 15 March 2021 (4 commits)
-
-
Committed by Sean Christopherson

Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_ intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate a triple fault.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210302174515.2812275-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
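The reflection itself can be sketched with the existing nested_vmx_vmexit() helper and EXIT_REASON_TRIPLE_FAULT; the surrounding check is illustrative, not the exact patch:

    /* Reflect an emulated triple fault in L2 to L1 as a VM-Exit instead
     * of shutting down the whole VM (which would kill L1 too). */
    if (is_guest_mode(vcpu) && kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu))
        nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);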
-
Committed by Sean Christopherson

Move KVM's CC() macro to x86.h so that it can be reused by nSVM. Debugging VM-Enter is as painful on SVM as it is on VMX. Rename the more visible macro to KVM_NESTED_VMENTER_CONSISTENCY_CHECK to avoid any collisions with the uber-concise "CC".

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
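The macro in question is roughly the following sketch; the stringified condition feeds KVM's nested-VM-Enter-failure tracepoint so a failing check is reported by name:

    #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)	\
    ({									\
        bool failed = (consistency_check);				\
        if (failed)							\
            trace_kvm_nested_vmenter_failed(#consistency_check, 0);	\
        failed;								\
    })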
-
Committed by Sean Christopherson

Defer reloading the MMU until after a successful EPTP switch. The VMFUNC instruction itself is executed in the previous EPTP context; any side effects, e.g. updating RIP, should occur in the old context. Practically speaking, this bug is benign, as VMX doesn't touch the MMU when skipping an emulated instruction, nor does queuing a single-step #DB. No other post-switch side effects exist.

Fixes: 41ab9372 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Dongli Zhang

The new per-cpu stat 'nested_run' is introduced in order to track whether an L1 VM is running or is used to run an L2 VM. An example use of 'nested_run' is to help the host administrator easily track whether any L1 VM is used to run an L2 VM. Should an issue arise that may be related to nested virtualization, the administrator will be able to easily narrow down and confirm whether it is due to nested virtualization via 'nested_run', e.g. whether a fix like commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
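In practice such a stat is a per-vCPU counter bumped on the nested-run path and aggregated by KVM's stats infrastructure, readable via debugfs (e.g. /sys/kernel/debug/kvm/nested_run); a sketch:

    /* In struct kvm_vcpu_stat: */
    u64 nested_run;

    /* On each L1 attempt to enter L2 (VMLAUNCH/VMRESUME/VMRUN): */
    ++vcpu->stat.nested_run;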
-