- 28 5月, 2020 2 次提交
-
-
由 Paolo Bonzini 提交于
In case an interrupt arrives after nested.check_events but before the call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt window even if the interrupt is actually going to be a vmexit. This is useless rather than harmful, but it really complicates reasoning about SVM's handling of the VINTR intercept. We'd like to never bother with the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in that case there is no interrupt window and we can just exit the nested guest whenever we want. This patch moves the opening of the interrupt window inside inject_pending_event. This consolidates the check for pending interrupt/NMI/SMI in one place, and makes KVM's usage of immediate exits more consistent, extending it beyond just nested virtualization. There are two functional changes here. They only affect corner cases, but overall they simplify the inject_pending_event. - re-injection of still-pending events will also use req_immediate_exit instead of using interrupt-window intercepts. This should have no impact on performance on Intel since it simply replaces an interrupt-window or NMI-window exit for a preemption-timer exit. On AMD, which has no equivalent of the preemption time, it may incur some overhead but an actual effect on performance should only be visible in pathological cases. - kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true if an interrupt, NMI or SMI is blocked by nested_run_pending. This makes sense because entering the VM will allow it to make progress and deliver the event. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Take a u32 for the index in has_emulated_msr() to match hardware, which treats MSR indices as unsigned 32-bit values. Functionally, taking a signed int doesn't cause problems with the current code base, but could theoretically cause problems with 32-bit KVM, e.g. if the index were checked via a less-than statement, which would evaluate incorrectly for MSR indices with bit 31 set. Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 5月, 2020 7 次提交
-
-
由 David Matlack 提交于
Two new stats for exposing halt-polling cpu usage: halt_poll_success_ns halt_poll_fail_ns Thus sum of these 2 stats is the total cpu time spent polling. "success" means the VCPU polled until a virtual interrupt was delivered. "fail" means the VCPU had to schedule out (either because the maximum poll time was reached or it needed to yield the CPU). To avoid touching every arch's kvm_vcpu_stat struct, only update and export halt-polling cpu usage stats if we're on x86. Exporting cpu usage as a u64 and in nanoseconds means we will overflow at ~500 years, which seems reasonably large. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NJon Cargille <jcargill@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <20200508182240.68440-1-jcargill@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
The hrtimer used to emulate the VMX-preemption timer must be pinned to the same logical processor as the vCPU thread to be interrupted if we want to have any hope of adhering to the architectural specification of the VMX-preemption timer. Even with this change, the emulated VMX-preemption timer VM-exit occasionally arrives too late. Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Reviewed-by: NOliver Upton <oupton@google.com> Message-Id: <20200508203643.85477-4-jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Adds a fastpath_t typedef since enum lines are a bit long, and replace EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values. - EXIT_FASTPATH_EXIT_HANDLED kvm will still go through it's full run loop, but it would skip invoking the exit handler. - EXIT_FASTPATH_REENTER_GUEST complete fastpath, guest can be re-entered without invoking the exit handler or going back to vcpu_run Tested-by: NHaiwei Li <lihaiwei@tencent.com> Cc: Haiwei Li <lihaiwei@tencent.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
Forcing the ASYNC_PF_PER_VCPU to be power of two is much easier to be used rather than calling roundup_pow_of_two() from time to time. Do this by adding a BUILD_BUG_ON() inside the hash function. Another point is that generally async pf does not allow concurrency over ASYNC_PF_PER_VCPU after all (see kvm_setup_async_pf()), so it does not make much sense either to have it not a power of two or some of the entries will definitely be wasted. Signed-off-by: NPeter Xu <peterx@redhat.com> Message-Id: <20200416155859.267366-1-peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G. KVM's enums are borderline impossible to remember and result in code that is visually difficult to audit, e.g. if (!enable_ept) ept_lpage_level = 0; else if (cpu_has_vmx_ept_1g_page()) ept_lpage_level = PT_PDPE_LEVEL; else if (cpu_has_vmx_ept_2m_page()) ept_lpage_level = PT_DIRECTORY_LEVEL; else ept_lpage_level = PT_PAGE_TABLE_LEVEL; versus if (!enable_ept) ept_lpage_level = 0; else if (cpu_has_vmx_ept_1g_page()) ept_lpage_level = PG_LEVEL_1G; else if (cpu_has_vmx_ept_2m_page()) ept_lpage_level = PG_LEVEL_2M; else ept_lpage_level = PG_LEVEL_4K; No functional change intended. Suggested-by: NBarret Rhoden <brho@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename PT_MAX_HUGEPAGE_LEVEL to KVM_MAX_HUGEPAGE_LEVEL and make it a separate define in anticipation of dropping KVM's PT_*_LEVEL enums in favor of the kernel's PG_LEVEL_* enums. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200428005422.4235-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoyao Li 提交于
vcpu->arch.guest_xstate_size lost its only user since commit df1daba7 ("KVM: x86: support XSAVES usage in the host"), so clean it up. Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200429154312.1411-1-xiaoyao.li@intel.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 5月, 2020 7 次提交
-
-
由 Sean Christopherson 提交于
Snapshot the TDP level now that it's invariant (SVM) or dependent only on host capabilities and guest CPUID (VMX). This avoids having to call kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or calculating the page role, and thus avoids the associated retpoline. Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is active is legal, if dodgy. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move CR0 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0() is called multiple times in a single VM-Exit, and more importantly eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR0, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr0_guest_bits". No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move CR4 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs and retpolines (when configured) during nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times on each transition, e.g. when stuffing CR0 and CR3. As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR4, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr4_guest_bits". No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops hook read_l1_tsc_offset(). This avoids a retpoline (when configured) when reading L1's effective TSC, which is done at least once on every VM-Exit. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Add an argument to interrupt_allowed and nmi_allowed, to checking if interrupt injection is blocked. Use the hook to handle the case where an interrupt arrives between check_nested_events() and the injection logic. Drop the retry of check_nested_events() that hack-a-fixed the same condition. Blocking injection is also a bit of a hack, e.g. KVM should do exiting and non-exiting interrupt processing in a single pass, but it's a more precise hack. The old comment is also misleading, e.g. KVM_REQ_EVENT is purely an optimization, setting it on every run loop (which KVM doesn't do) should not affect functionality, only performance. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com> [Extend to SVM, add SMI and NMI. Even though NMI and SMI cannot come asynchronously right now, making the fix generic is easy and removes a special case. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Return an actual bool for kvm_x86_ops' {interrupt_nmi}_allowed() hook to better reflect the return semantics, and to avoid creating an even bigger mess when the related VMX code is refactored in upcoming patches. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and use it to effectively open a window for servicing the expired timer. Like pending SMIs on VMX, opening a window simply means requesting an immediate exit. This fixes a bug where an expired VMX preemption timer (for L2) will be delayed and/or lost if a pending exception is injected into L2. The pending exception is rightly prioritized by vmx_check_nested_events() and injected into L2, with the preemption timer left pending. Because no window opened, L2 is free to run uninterrupted. Fixes: f4124500 ("KVM: nVMX: Fully emulate preemption timer") Reported-by: NJim Mattson <jmattson@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com> [Check it in kvm_vcpu_has_events too, to ensure that the preemption timer is serviced promptly even if the vCPU is halted and L1 is not intercepting HLT. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 5月, 2020 1 次提交
-
-
由 Babu Moger 提交于
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU resource isn't. It can be read with XSAVE and written with XRSTOR. So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state), the guest can read the host value. In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could potentially use XRSTOR to change the host PKRU value. While at it, move pkru state save/restore to common code and the host_pkru field to kvm_vcpu_arch. This will let SVM support protection keys. Cc: stable@vger.kernel.org Reported-by: NJim Mattson <jmattson@google.com> Signed-off-by: NBabu Moger <babu.moger@amd.com> Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 5月, 2020 2 次提交
-
-
由 Paolo Bonzini 提交于
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD. On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use: * the guest DR6 is clobbered * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue). This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest is KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled. Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless. This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 5月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
Go through kvm_queue_exception_p so that the payload is correctly delivered through the exit qualification, and add a kvm_update_dr6 call to kvm_deliver_exception_payload that is needed on AMD. Reported-by: NPeter Xu <peterx@redhat.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 5月, 2020 1 次提交
-
-
由 Suravee Suthikulpanit 提交于
Current logic incorrectly uses the enum ioapic_irq_destination_types to check the posted interrupt destination types. However, the value was set using APIC_DM_XXX macros, which are left-shifted by 8 bits. Fixes by using the APIC_DM_FIXED and APIC_DM_LOWEST instead. Fixes: (fdcf7562 'KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes') Cc: Alexander Graf <graf@amazon.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Tested-by: NMaxim Levitsky <mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 4月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to nested virtualization into a separate struct. As a result, these ops will always be non-NULL on VMX. This is not a problem: * check_nested_events is only called if is_guest_mode(vcpu) returns true * get_nested_state treats VMXOFF state the same as nested being disabled * set_nested_state fails if you attempt to set nested state while nesting is disabled * nested_enable_evmcs could already be called on a CPU without VMX enabled in CPUID. * nested_get_evmcs_version was fixed in the previous patch Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 4月, 2020 12 次提交
-
-
由 Wanpeng Li 提交于
IPI and Timer cause the main MSRs write vmexits in cloud environment observation, let's optimize virtual IPI latency more aggressively to inject target IPI as soon as possible. Running kvm-unit-tests/vmexit.flat IPI testing on SKX server, disable adaptive advance lapic timer and adaptive halt-polling to avoid the interference, this patch can give another 7% improvement. w/o fastpath -> x86.c fastpath 4238 -> 3543 16.4% x86.c fastpath -> vmx.c fastpath 3543 -> 3293 7% w/o fastpath -> vmx.c fastpath 4238 -> 3293 22.3% Cc: Haiwei Li <lihaiwei@tencent.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Introduce a new "extended register" type, EXIT_INFO_2 (to pair with the nomenclature in .get_exit_info()), and use it to cache VMX's vmcs.EXIT_INTR_INFO. Drop a comment in vmx_recover_nmi_blocking() that is obsoleted by the generic caching mechanism. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200415203454.8296-6-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Introduce a new "extended register" type, EXIT_INFO_1 (to pair with the nomenclature in .get_exit_info()), and use it to cache VMX's vmcs.EXIT_QUALIFICATION. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200415203454.8296-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename functions and variables in kvm_mmu_new_cr3() and related code to replace "cr3" with "pgd", i.e. continue the work started by commit 727a7e27 ("KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd"). kvm_mmu_new_cr3() and company are not always loading a new CR3, e.g. when nested EPT is enabled "cr3" is actually an EPTP. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-37-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add a separate "skip" override for MMU sync, a future change to avoid TLB flushes on nested VMX transitions may need to sync the MMU even if the TLB flush is unnecessary. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-32-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the retrieval of the HPA associated with L1's APIC access page into VMX code to avoid unnecessarily calling gfn_to_page(), e.g. when the vCPU is in guest mode (L2). Alternatively, the optimization logic in VMX could be mirrored into the common x86 code, but that will get ugly fast when further optimizations are introduced. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-29-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g. to flush L2's context during nested VM-Enter. Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where the flush is directly associated with vCPU-scoped instruction emulation, i.e. MOV CR3 and INVPCID. Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to make it clear that it deliberately requests a flush of all contexts. Service any pending flush request on nested VM-Exit as it's possible a nested VM-Exit could occur after requesting a flush for L2. Add the same logic for nested VM-Enter even though it's _extremely_ unlikely for flush to be pending on nested VM-Enter, but theoretically possible (in the future) due to RSM (SMM) emulation. [*] Intel also has an Address Space Identifier (ASID) concept, e.g. EPTP+VPID+PCID == ASID, it's just not documented in the SDM because the rules of invalidation are different based on which piece of the ASID is being changed, i.e. whether the EPTP, VPID, or PCID context must be invalidated. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a new hook to flush only the current ASID/context. Opportunstically replace the comment in vmx_flush_tlb() that explains why it flushes all EPTP/VPID contexts with a comment explaining why it unconditionally uses INVEPT when EPT is enabled. I.e. rely on the "all" part of the name to clarify why it does global INVEPT/INVVPID. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-23-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now that all callers pass %true for said param, or ignore the param (SVM has an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat arbitrarily passes %false). Remove __vmx_flush_tlb() as it is no longer used. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-17-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Hyper-V PV TLB flush mechanism does TLB flush on behalf of the guest so doing tlb_flush_all() is an overkill, switch to using tlb_flush_guest() (just like KVM PV TLB flush mechanism) instead. Introduce KVM_REQ_HV_TLB_FLUSH to support the change. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add a dedicated hook to handle flushing TLB entries on behalf of the guest, i.e. for a paravirtualized TLB flush, and use it directly instead of bouncing through kvm_vcpu_flush_tlb(). For VMX, change the effective implementation implementation to never do INVEPT and flush only the current context, i.e. to always flush via INVVPID(SINGLE_CONTEXT). The INVEPT performed by __vmx_flush_tlb() when @invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only flush guest-physical mappings; linear and combined mappings are flushed by VM-Enter when VPID is disabled, and changes in the guest pages tables do not affect guest-physical mappings. When EPT and VPID are enabled, doing INVVPID is not required (by Intel's architecture) to invalidate guest-physical mappings, i.e. TLB entries that cache guest-physical mappings can live across INVVPID as the mappings are associated with an EPTP, not a VPID. The intent of @invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate gpa mappings", i.e. do INVEPT and not simply INVVPID. Other than nested VPID handling, which now calls vpid_sync_context() directly, the only scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is enabled) is if KVM is flushing TLB entries from the guest's perspective, i.e. is only required to invalidate linear mappings. For SVM, flushing TLB entries from the guest's perspective can be done by flushing the current ASID, as changes to the guest's page tables are associated only with the current ASID. Adding a dedicated ->tlb_flush_guest() paves the way toward removing @invalidate_gpa, which is a potentially dangerous control flag as its meaning is not exactly crystal clear, even for those who are familiar with the subtleties of what mappings Intel CPUs are/aren't allowed to keep across various invalidation scenarios. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Wrap the combination of mmu->invlpg and kvm_x86_ops->tlb_flush_gva into a new function. This function also lets us specify the host PGD to invalidate and also the MMU, both of which will be useful in fixing and simplifying kvm_inject_emulated_page_fault. A nested guest's MMU however has g_context->invlpg == NULL. Instead of setting it to nonpaging_invlpg, make kvm_mmu_invalidate_gva the only entry point to mmu->invlpg and make a NULL invlpg pointer equivalent to nonpaging_invlpg, saving a retpoline. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 4月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Export the page fault propagation helper so that VMX can use it to correctly emulate TLB invalidation on page faults in an upcoming patch. In the (hopefully) not-too-distant future, SGX virtualization will also want access to the helper for injecting page faults to the correct level (L1 vs. L2) when emulating ENCLS instructions. Rename the function to kvm_inject_emulated_page_fault() to clarify that it is (a) injecting a fault and (b) only for page faults. WARN if it's invoked with an exception other than PF_VECTOR. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-6-sean.j.christopherson@intel.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 31 3月, 2020 3 次提交
-
-
由 Sean Christopherson 提交于
Remove the __exit annotation from VMX hardware_unsetup(), the hook can be reached during kvm_init() by way of kvm_arch_hardware_unsetup() if failure occurs at various points during initialization. Removing the annotation also lets us annotate vmx_x86_ops and svm_x86_ops with __initdata; otherwise, objtool complains because it doesn't understand that the vendor specific __initdata is being copied by value to a non-__initdata instance. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-8-sean.j.christopherson@intel.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Replace the kvm_x86_ops pointer in common x86 with an instance of the struct to save one pointer dereference when invoking functions. Copy the struct by value to set the ops during kvm_init(). Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the ops have been initialized, i.e. a vendor KVM module has been loaded. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the kvm_x86_ops functions that are used only within the scope of kvm_init() into a separate struct, kvm_x86_init_ops. In addition to identifying the init-only functions without restorting to code comments, this also sets the stage for waiting until after ->hardware_setup() to set kvm_x86_ops. Setting kvm_x86_ops after ->hardware_setup() is desirable as many of the hooks are not usable until ->hardware_setup() completes. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-3-sean.j.christopherson@intel.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 17 3月, 2020 2 次提交
-
-
由 Paolo Bonzini 提交于
The set_cr3 callback is not setting the guest CR3, it is setting the root of the guest page tables, either shadow or two-dimensional. To make this clearer as well as to indicate that the MMU calls it via kvm_mmu_load_cr3, rename it to load_mmu_pgd. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Similar to what kvm-intel.ko is doing, provide a single callback that merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3. This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops. I'm doing that in this same patch because splitting it adds quite a bit of churn due to the need for forward declarations. For the same reason the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu from init_kvm_softmmu and nested_svm_init_mmu_context. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-