- 13 11月, 2019 8 次提交
-
-
由 Junaid Shahid 提交于
commit 1aa9b9572b10529c2e64e2b8f44025d86e124308 upstream. The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered, if it is not longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Paolo Bonzini 提交于
commit b8e8c8303ff28c61046a4d0f6ea99aea609a7dc0 upstream. With some Intel processors, putting the same virtual address in the TLB as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit and cause the processor to issue a machine check resulting in a CPU lockup. Unfortunately when EPT page tables use huge pages, it is possible for a malicious guest to cause this situation. Add a knob to mark huge pages as non-executable. When the nx_huge_pages parameter is enabled (and we are using EPT), all huge pages are marked as NX. If the guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable. This is not an issue for shadow paging (except nested EPT), because then the host is in control of TLB flushes and the problematic situation cannot happen. With nested EPT, again the nested guest can cause problems shadow and direct EPT is treated in the same way. [ tglx: Fixup default to auto and massage wording a bit ] Originally-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Paolo Bonzini 提交于
commit 335e192a3fa415e1202c8b9ecdaaecd643f823cc upstream. These are useful in debugging shadow paging. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Paolo Bonzini 提交于
commit e9f2a760b158551bfbef6db31d2cae45ab8072e5 upstream. Note that in such a case it is quite likely that KVM will BUG_ON in __pte_list_remove when the VM is closed. However, there is no immediate risk of memory corruption in the host so a WARN_ON is enough and it lets you gather traces for debugging. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Paolo Bonzini 提交于
commit d679b32611c0102ce33b9e1a4e4b94854ed1812a upstream. After the previous patch, the low bits of the gfn are masked in both FNAME(fetch) and __direct_map, so we do not need to clear them in transparent_hugepage_adjust. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Paolo Bonzini 提交于
commit 3fcf2d1bdeb6a513523cb2c77012a6b047aa859c upstream. These two functions are basically doing the same thing through kvm_mmu_get_page, link_shadow_page and mmu_set_spte; yet, for historical reasons, their code looks very different. This patch tries to take the best of each and make them very similar, so that it is easy to understand changes that apply to both of them. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Junaid Shahid 提交于
commit 43fdcda96e2550c6d1c46fb8a78801aa2f7276ed upstream. Release the page at the call-site where it was originally acquired. This makes the exit code cleaner for most call sites, since they do not need to duplicate code between success and the failure label. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Junaid Shahid 提交于
commit 0d9ce162cf46c99628cc5da9510b959c7976735b upstream. It doesn't seem as if there is any particular need for kvm_lock to be a spinlock, so convert the lock to a mutex so that sleepable functions (in particular cond_resched()) can be called while holding it. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 16 9月, 2019 1 次提交
-
-
由 Ben Gardon 提交于
[ Upstream commit bc8a3d8925a8fa09fa550e0da115d95851ce33c6 ] KVM bases its memory usage limits on the total number of guest pages across all memslots. However, those limits, and the calculations to produce them, use 32 bit unsigned integers. This can result in overflow if a VM has more guest pages that can be represented by a u32. As a result of this overflow, KVM can use a low limit on the number of MMU pages it will allocate. This makes KVM unable to map all of guest memory at once, prompting spurious faults. Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch introduced no new failures. Signed-off-by: NBen Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 07 8月, 2019 1 次提交
-
-
由 Arnd Bergmann 提交于
[ Upstream commit a6a6d3b1f867d34ba5bd61aa7bb056b48ca67cff ] clang finds a contruct suspicious that converts an unsigned character to a signed integer and back, causing an overflow: arch/x86/kvm/mmu.c:4605:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Werror,-Wconstant-conversion] u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0; ~~ ^~ arch/x86/kvm/mmu.c:4607:38: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -241 to 15 [-Werror,-Wconstant-conversion] u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0; ~~ ^~ arch/x86/kvm/mmu.c:4609:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -171 to 85 [-Werror,-Wconstant-conversion] u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0; ~~ ^~ Add an explicit cast to tell clang that everything works as intended here. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Link: https://github.com/ClangBuiltLinux/linux/issues/95Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 03 7月, 2019 1 次提交
-
-
由 Sean Christopherson 提交于
commit b6b80c78af838bef17501416d5d383fedab0010a upstream. SVM's Nested Page Tables (NPT) reuses x86 paging for the host-controlled page walk. For 32-bit KVM, this means PAE paging is used even when TDP is enabled, i.e. the PAE root array needs to be allocated. Fixes: ee6268ba ("KVM: x86: Skip pae_root shadow allocation if tdp enabled") Cc: stable@vger.kernel.org Reported-by: NJiri Palecek <jpalecek@web.de> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Jiri Palecek <jpalecek@web.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 3月, 2019 2 次提交
-
-
由 Sean Christopherson 提交于
commit e1359e2beb8b0a1188abc997273acbaedc8ee791 upstream. The check to detect a wrap of the MMIO generation explicitly looks for a generation number of zero. Now that unique memslots generation numbers are assigned to each address space, only address space 0 will get a generation number of exactly zero when wrapping. E.g. when address space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will wrap to 0x2. Adjust the MMIO generation to strip the address space modifier prior to checking for a wrap. Fixes: 4bd518f1 ("KVM: use separate generations for each address space") Cc: <stable@vger.kernel.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Sean Christopherson 提交于
commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream. kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0. Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0. Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()") Cc: <stable@vger.kernel.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 12月, 2018 1 次提交
-
-
由 Junaid Shahid 提交于
commit 0e0fee5c539b61fdd098332e0e2cc375d9073706 upstream. When a guest page table is updated via an emulated write, kvm_mmu_pte_write() is called to update the shadow PTE using the just written guest PTE value. But if two emulated guest PTE writes happened concurrently, it is possible that the guest PTE and the shadow PTE end up being out of sync. Emulated writes do not mark the shadow page as unsync-ed, so this inconsistency will not be resolved even by a guest TLB flush (unless the page was marked as unsync-ed at some other point). This is fixed by re-reading the current value of the guest PTE after the MMU lock has been acquired instead of just using the value that was written prior to calling kvm_mmu_pte_write(). Signed-off-by: NJunaid Shahid <junaids@google.com> Reviewed-by: NWanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 01 10月, 2018 1 次提交
-
-
由 Sean Christopherson 提交于
One defense against L1TF in KVM is to always set the upper five bits of the *legal* physical address in the SPTEs for non-present and reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the MMIO SPTE may overlap with the upper five bits that are being usurped to defend against L1TF. To preserve the GFN, the bits of the GFN that overlap with the repurposed bits are shifted left into the reserved bits, i.e. the GFN in the SPTE will be split into high and low parts. When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO access, get_mmio_spte_gfn() unshifts the affected bits and restores the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() neglects to mask off the reserved bits in the SPTE that were used to store the upper chunk of the GFN. As a result, KVM fails to detect MMIO accesses whose GPA overlaps the repurprosed bits, which in turn causes guest panics and hangs. Fix the bug by generating a mask that covers the lower chunk of the GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The alternative approach would be to explicitly zero the five reserved bits that are used to store the upper chunk of the GFN, but that requires additional run-time computation and makes an already-ugly bit of code even more inscrutable. I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to warn if GENMASK_ULL() generated a nonsensical value, but that seemed silly since that would mean a system that supports VMX has less than 18 bits of physical address space... Reported-by: NSakari Ailus <sakari.ailus@iki.fi> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Reviewed-by: NJunaid Shahid <junaids@google.com> Tested-by: NSakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 9月, 2018 2 次提交
-
-
由 Tianyu Lan 提交于
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page() This patch is to fix the commit. Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wei Yang 提交于
Here is the code path which shows kvm_mmu_setup() is invoked after kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path, this means the root_hpa and prev_roots are guaranteed to be invalid. And it is not necessary to reset it again. kvm_vm_ioctl_create_vcpu() kvm_arch_vcpu_create() vmx_create_vcpu() kvm_vcpu_init() kvm_arch_vcpu_init() kvm_mmu_create() kvm_arch_vcpu_setup() kvm_mmu_setup() kvm_init_mmu() This patch set reset_roots to false in kmv_mmu_setup(). Fixes: 50c28f21Signed-off-by: NWei Yang <richard.weiyang@gmail.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 9月, 2018 1 次提交
-
-
由 Marc Zyngier 提交于
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: fb1522e0 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
- 30 8月, 2018 4 次提交
-
-
由 Sean Christopherson 提交于
Commit a6f177ef ("KVM: Reenter guest after emulation failure if due to access to non-mmio address") added reexecute_instruction() to handle the scenario where two (or more) vCPUS race to write a shadowed page, i.e. reexecute_instruction() is intended to return true if and only if the instruction being emulated was accessing a shadowed page. As L0 is only explicitly shadowing L1 tables, an emulation failure of a nested VM instruction cannot be due to a race to write a shadowed page and so should never be re-executed. This fixes an issue where an "MMIO" emulation failure[1] in L2 is all but guaranteed to result in an infinite loop when TDP is enabled. Because "cr2" is actually an L2 GPA when TDP is enabled, calling kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA and cause reexecute_instruction() to immediately return true. The !mmio_info_in_cache() check in kvm_mmu_page_fault() doesn't catch this case because mmio_info_in_cache() returns false for a nested MMU (the MMIO caching currently handles L1 only, e.g. to cache nested guests' GPAs we'd have to manually flush the cache when switching between VMs and when L1 updated its page tables controlling the nested guest). Way back when, commit 68be0803 ("KVM: x86: never re-execute instruction with enabled tdp") changed reexecute_instruction() to always return false when using TDP under the assumption that KVM would only get into the emulator for MMIO. Commit 95b3cf69 ("KVM: x86: let reexecute_instruction work for tdp") effectively reverted that behavior in order to handle the scenario where emulation failed due to an access from L1 to the shadow page tables for L2, but it didn't account for the case where emulation failed in L2 with TDP enabled. All of the above logic also applies to retry_instruction(), added by commit 1cb3f3ae ("KVM: x86: retry non-page-table writing instructions"). An indefinite loop in retry_instruction() should be impossible as it protects against retrying the same instruction over and over, but it's still correct to not retry an L2 instruction in the first place. Fix the immediate issue by adding a check for a nested guest when determining whether or not to allow retry in kvm_mmu_page_fault(). In addition to fixing the immediate bug, add WARN_ON_ONCE in the retry functions since they are not designed to handle nested cases, i.e. they need to be modified even if there is some scenario in the future where we want to allow retrying a nested guest. [1] This issue was encountered after commit 3a2936de ("kvm: mmu: Don't expose private memslots to L2") changed the page fault path to return KVM_PFN_NOSLOT when translating an L2 access to a prive memslot. Returning KVM_PFN_NOSLOT is semantically correct when we want to hide a memslot from L2, i.e. there effectively is no defined memory region for L2, but it has the unfortunate side effect of making KVM think the GFN is a MMIO page, thus triggering emulation. The failure occurred with in-development code that deliberately exposed a private memslot to L2, which L2 accessed with an instruction that is not emulated by KVM. Fixes: 95b3cf69 ("KVM: x86: let reexecute_instruction work for tdp") Fixes: 1cb3f3ae ("KVM: x86: retry non-page-table writing instructions") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Effectively force kvm_mmu_page_fault() to opt-in to allowing retry to make it more obvious when and why it allows emulation to be retried. Previously this approach was less convenient due to retry and re-execute behavior being controlled by separate flags that were also inverted in their implementations (opt-in versus opt-out). Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
retry_instruction() and reexecute_instruction() are a package deal, i.e. there is no scenario where one is allowed and the other is not. Merge their controlling emulation type flags to enforce this in code. Name the combined flag EMULTYPE_ALLOW_RETRY to make it abundantly clear that we are allowing re{try,execute} to occur, as opposed to explicitly requesting retry of a previously failed instruction. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Re-execution of an instruction after emulation decode failure is intended to be used only when emulating shadow page accesses. Invert the flag to make allowing re-execution opt-in since that behavior is by far in the minority. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 15 8月, 2018 1 次提交
-
-
由 Junaid Shahid 提交于
Always set the 5 upper-most supported physical address bits to 1 for SPTEs that are marked as non-present or reserved, to make them unusable for L1TF attacks from the guest. Currently, this just applies to MMIO SPTEs. (We do not need to mark PTEs that are completely 0 as physical page 0 is already reserved.) This allows mitigation of L1TF without disabling hyper-threading by using shadow paging mode instead of EPT. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 8月, 2018 17 次提交
-
-
由 Liang Chen 提交于
Considering the fact that the pae_root shadow is not needed when tdp is in use, skip the pae_root shadow page allocation to allow mmu creation even not being able to obtain memory from DMA32 zone when particular cgroup cpuset.mems or mempolicy control is applied. Signed-off-by: NLiang Chen <liangchen.linux@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tianyu Lan 提交于
mmu_set_spte() flushes remote tlbs for drop_parent_pte/drop_spte() and set_spte() separately. This may introduce redundant flush. This patch is to combine these flushes and check flush request after calling set_spte(). Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com> Reviewed-by: NJunaid Shahid <junaids@google.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tianyu Lan 提交于
Merge check of "sp->role.cr4_pae != !!is_pae(vcpu))" and "vcpu-> arch.mmu.sync_page(vcpu, sp) == 0". kvm_mmu_prepare_zap_page() is called under both these conditions. Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Adds support for storing multiple previous CR3/root_hpa pairs maintained as an LRU cache, so that the lockless CR3 switch path can be used when switching back to any of them. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
This needs a minor bug fix. The updated patch is as follows. Thanks, Junaid ------------------------------------------------------------------------------ kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB entries for the specific guest virtual address, instead of flushing all TLB entries associated with the VM. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
When the guest indicates that the TLB doesn't need to be flushed in a CR3 switch, we can also skip resyncing the shadow page tables since an out-of-sync shadow page table is equivalent to an out-of-sync TLB. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
kvm_mmu_free_roots() now takes a mask specifying which roots to free, so that either one of the roots (active/previous) can be individually freed when needed. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
This allows invlpg() to be called using either the active root_hpa or the prev_root_hpa. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
When PCIDs are enabled, the MSb of the source operand for a MOV-to-CR3 instruction indicates that the TLB doesn't need to be flushed. This change enables this optimization for MOV-to-CR3s in the guest that have been intercepted by KVM for shadow paging and are handled within the fast CR3 switch path. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Implement support for INVPCID in shadow paging mode as well. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Remove the implicit flush from the set_cr3 handlers, so that the callers are able to decide whether to flush the TLB or not. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Use the fast CR3 switch mechanism to locklessly change the MMU root page when switching between L1 and L2. The switch from L2 to L1 should always go through the fast path, while the switch from L1 to L2 should go through the fast path if L1's CR3/EPTP for L2 hasn't changed since the last time. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
This adds support for re-initializing the MMU context in a different mode while preserving the active root_hpa and the prev_root. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
This generalizes the lockless CR3 switch path to be able to work across different MMU modes (e.g. nested vs non-nested) by checking that the expected page role of the new root page matches the page role of the previously stored root page in addition to checking that the new CR3 matches the previous CR3. Furthermore, instead of loading the hardware CR3 in fast_cr3_switch(), it is now done in vcpu_enter_guest(), as by that time the MMU context would be up-to-date with the VCPU mode. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the current root_hpa. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
These functions factor out the base role calculation from the corresponding kvm_init_*_mmu() functions. The new functions return what would be the role assigned to a root page in the current VCPU state. This can be masked with mmu_base_role_mask to derive the base role. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
When using shadow paging, a CR3 switch in the guest results in a VM Exit. In the common case, that VM exit doesn't require much processing by KVM. However, it does acquire the MMU lock, which can start showing signs of contention under some workloads even on a 2 VCPU VM when the guest is using KPTI. Therefore, we add a fast path that avoids acquiring the MMU lock in the most common cases e.g. when switching back and forth between the kernel and user mode CR3s used by KPTI with no guest page table changes in between. For now, this fast path is implemented only for 64-bit guests and hosts to avoid the handling of PDPTEs, but it can be extended later to 32-bit guests and/or hosts as well. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-