- 20 Apr, 2021 4 commits
-
-
Committed by Wanpeng Li
To analyze some performance issues with lock contention and scheduling, it is nice to know when directed yields succeed or fail.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1617941911-5338-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Wanpeng Li
Enabling PV TLB shootdown when !CONFIG_SMP doesn't make sense, so move it inside CONFIG_SMP. In addition, we can avoid defining and allocating __pv_cpu_mask when !CONFIG_SMP and get rid of the 'alloc' variable in kvm_alloc_cpumask.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1617941911-5338-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
To avoid saddling a vCPU thread with the work of tearing down an entire paging structure, take a reference on each root before it becomes obsolete, so that the thread initiating the fast invalidation can tear down the paging structure and (most likely) release the last reference. As a bonus, this teardown can happen under the MMU lock in read mode so as not to block the progress of vCPU threads.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-14-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
Provide a real mechanism for fast invalidation by marking roots as invalid so that their reference count will quickly fall to zero and they will be torn down. One negative side effect of this approach is that a vCPU thread will likely drop the last reference to a root and be saddled with the work of tearing down an entire paging structure. This issue will be resolved in a later commit.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-13-bgardon@google.com>
[Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 19 Apr, 2021 12 commits
-
-
Committed by Ben Gardon
To reduce lock contention and interference with page fault handlers, allow the TDP MMU functions which enable and disable dirty logging to operate under the MMU read lock.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-12-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
To reduce the impact of disabling dirty logging, change the TDP MMU function which zaps collapsible SPTEs to run under the MMU read lock. This way, page faults on zapped SPTEs can proceed in parallel with kvm_mmu_zap_collapsible_sptes.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-11-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
To reduce lock contention and interference with page fault handlers, allow the TDP MMU function that zaps a GFN range to operate under the MMU read lock.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-10-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
Protect the contents of the TDP MMU roots list with RCU in preparation for a future patch which will allow the iterator macro to be used under the MMU lock in read mode.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
To reduce dependence on the MMU write lock, don't rely on the assumption that the atomic operation in kvm_tdp_mmu_get_root will always succeed. By not relying on that assumption, threads do not need to hold the MMU lock in write mode in order to take a reference on a TDP MMU root. In the root iterator, this change means that some roots might have to be skipped if they are found to have a zero refcount. This will still never happen as of this patch, but a future patch will need that flexibility to make the root iterator safe under the MMU read lock.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-8-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
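The "take a reference unless the count is already zero" pattern described in this commit can be sketched in userspace C11 as follows. This is only an illustration under stated assumptions: struct demo_root, demo_get_root and demo_put_root are hypothetical names, and the sketch uses C11 atomics rather than the kernel's own refcount helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TDP MMU root; not the kernel's structure. */
struct demo_root {
    atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the root is already being torn down,
 * in which case the caller (e.g. an iterator) must skip it.
 */
bool demo_get_root(struct demo_root *root)
{
    int old = atomic_load(&root->refcount);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&root->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if the caller dropped the last one. */
bool demo_put_root(struct demo_root *root)
{
    return atomic_fetch_sub(&root->refcount, 1) == 1;
}
```

Because the increment can fail, no lock is needed to guarantee the count is non-zero at the time of the call; the caller simply handles the failure.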
-
Committed by Ben Gardon
In order to parallelize more operations for the TDP MMU, make the refcount on TDP MMU roots atomic, so that a future patch can allow multiple threads to take a reference on the root concurrently, while holding the MMU lock in read mode.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-7-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
Refactor the yield-safe TDP MMU root iterator to be more amenable to changes in future commits which will allow it to be used under the MMU lock in read mode. Currently the iterator requires a complicated dance between the helper functions and different parts of the for loop, which makes it hard to reason about. Moving all the logic into a single function simplifies the iterator substantially.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-6-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
kvm_tdp_mmu_put_root and kvm_tdp_mmu_free_root are always called together, so merge the functions to simplify TDP MMU root refcounting / freeing.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-5-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
Minor cleanup to deduplicate the code used to free a struct kvm_mmu_page in the TDP MMU. No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-4-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
The TDP MMU is almost the only user of kvm_mmu_get_root and kvm_mmu_put_root. There is only one use of put_root in mmu.c for the legacy / shadow MMU. Open code that one use and move the get / put functions to the TDP MMU so they can be extended in future commits. No functional change intended.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-3-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
kvm_tdp_mmu_zap_collapsible_sptes unnecessarily removes the const qualifier from its memslot argument, leading to a compiler warning. Add the const annotation and pass it to subsequent functions.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Let the TDP MMU yield when unmapping a range in response to an MMU notification, if yielding is allowed by said notification. There is no reason to disallow yielding in this case, and in theory the range being invalidated could be quite large.
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 17 Apr, 2021 24 commits
-
-
Committed by Sean Christopherson
Yank out the hva-based MMU notifier APIs now that all architectures that use the notifiers have moved to the gfn-based APIs. No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Move PPC to the gfn-based MMU notifier APIs, and update all 15 bajillion PPC-internal hooks to work with gfns instead of hvas. No meaningful functional change intended, though the exact order of operations is slightly different since the memslot lookups occur before calling into arch code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Move MIPS to the gfn-based MMU notifier APIs, which do the hva->gfn lookup in common code, and whose code is nearly identical to MIPS' lookup. No meaningful functional change intended, though the exact order of operations is slightly different since the memslot lookups occur before calling into arch code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Move arm64 to the gfn-based MMU notifier APIs, which do the hva->gfn lookup in common code. No meaningful functional change intended, though the exact order of operations is slightly different since the memslot lookups occur before calling into arch code.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Move the hva->gfn lookup for MMU notifiers into common code. Every arch does a similar lookup, and some arch code is all but identical across multiple architectures. In addition to consolidating code, this will allow introducing optimizations that will benefit all architectures without incurring multiple walks of the memslots, e.g. by taking mmu_lock if and only if a relevant range exists in the memslots. The use of __always_inline to avoid indirect call retpolines, as done by x86, may also benefit other architectures. Consolidating the lookups also fixes a wart in x86, where the legacy MMU and TDP MMU each do their own memslot walks. Lastly, future enhancements to the memslot implementation, e.g. to add an interval tree to track host addresses, will need to touch far less arch-specific code. MIPS, PPC, and arm64 will be converted one at a time in future patches.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
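For readers unfamiliar with the hva->gfn lookup, a minimal sketch of the arithmetic for a single memslot might look like the following. The struct layout, the DEMO_ constants and the function name are assumptions made for illustration; they are not the kernel's definitions, and the real code also walks all memslots and address spaces.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumes 4 KiB pages */

/* Hypothetical, simplified memslot: a run of guest frames backed by host VAs. */
struct demo_memslot {
    uint64_t base_gfn;        /* first guest frame number in the slot */
    uint64_t npages;          /* number of pages in the slot */
    uint64_t userspace_addr;  /* page-aligned host virtual address of page 0 */
};

/*
 * Clip the host range [hva_start, hva_end) against the slot and convert the
 * overlap to a guest frame range [*gfn_start, *gfn_end).
 * Returns false if the range does not touch the slot at all.
 */
bool demo_hva_range_to_gfn_range(const struct demo_memslot *slot,
                                 uint64_t hva_start, uint64_t hva_end,
                                 uint64_t *gfn_start, uint64_t *gfn_end)
{
    uint64_t slot_start = slot->userspace_addr;
    uint64_t slot_end = slot_start + (slot->npages << DEMO_PAGE_SHIFT);
    uint64_t start = hva_start > slot_start ? hva_start : slot_start;
    uint64_t end = hva_end < slot_end ? hva_end : slot_end;

    if (start >= end)
        return false;

    *gfn_start = slot->base_gfn + ((start - slot_start) >> DEMO_PAGE_SHIFT);
    /* Include the last page even if the range covers it only partially. */
    *gfn_end = slot->base_gfn + ((end - 1 - slot_start) >> DEMO_PAGE_SHIFT) + 1;
    return true;
}
```

Doing this once in common code means an arch handler only ever sees gfn ranges, which is the consolidation the commit message describes.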
-
Committed by Paolo Bonzini
Return 1 from kvm_unmap_hva_range and kvm_set_spte_hva if a flush is needed, so that the generic code can coalesce the flushes.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
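The coalescing idea can be sketched as follows: each handler reports whether a flush is needed, and the caller issues a single flush at the end. All names here (demo_handle_ranges, demo_unmap_range, demo_flush_remote_tlbs, demo_flush_count) are hypothetical and only model the control flow, not the kernel's notifier code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handler type: returns non-zero if a TLB flush is needed. */
typedef int (*demo_range_handler)(unsigned long start, unsigned long end);

/* Counts simulated remote TLB flushes so the effect is observable. */
int demo_flush_count;

void demo_flush_remote_tlbs(void)
{
    demo_flush_count++;
}

/* Example handler: pretend every non-empty range had mappings to zap. */
int demo_unmap_range(unsigned long start, unsigned long end)
{
    return start < end;
}

/*
 * Run the handler on every range, OR the results together, and flush
 * at most once instead of once per range.
 */
int demo_handle_ranges(demo_range_handler handler,
                       const unsigned long (*ranges)[2], size_t n)
{
    bool need_flush = false;

    for (size_t i = 0; i < n; i++)
        need_flush |= (handler(ranges[i][0], ranges[i][1]) != 0);

    if (need_flush)
        demo_flush_remote_tlbs();
    return need_flush;
}
```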
-
Committed by Paolo Bonzini
Since all calls to kvm_flush_remote_tlbs must be preceded by kvm_mips_callbacks->prepare_flush_shadow, repurpose kvm_arch_flush_remote_tlb to invoke it. This makes it possible to use the TLB flushing mechanism provided by the generic MMU notifier code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Both trap-and-emulate and VZ have a single implementation that covers both .flush_shadow_all and .flush_shadow_memslot, and both of them end with a call to kvm_flush_remote_tlbs. Unify the callbacks into one and extract the call to kvm_flush_remote_tlbs. The next patches will pull it further out of the architecture-specific MMU notifier functions kvm_unmap_hva_range and kvm_set_spte_hva.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
memslots are stored in RCU and there should be no need to change them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
When using manual protection of dirty pages, it is not necessary to protect nested page tables down to the 4K level; instead KVM can protect only hugepages in order to split them lazily, and delay write protection at 4K-granularity until KVM_CLEAR_DIRTY_LOG. This was overlooked in the TDP MMU, so do it there as well.
Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky
Define KVM_GUESTDBG_VALID_MASK and use it to implement this capability. Compile tested only.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401135451.1004564-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky
Move KVM_GUESTDBG_VALID_MASK to kvm_host.h and use it to return the value of this capability. Compile tested only.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401135451.1004564-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky
Store the supported bits in the KVM_GUESTDBG_VALID_MASK macro, similar to how arm does this.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401135451.1004564-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
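A valid-bits mask of this kind is typically used to reject unsupported flags in one comparison. The sketch below shows the pattern with hypothetical DEMO_ names; the bit values are illustrative placeholders chosen for the example, not a statement of the actual KVM_GUESTDBG_* definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative flag bits (assumed values, for demonstration only). */
#define DEMO_GUESTDBG_ENABLE      0x00000001u
#define DEMO_GUESTDBG_SINGLESTEP  0x00000002u
#define DEMO_GUESTDBG_USE_SW_BP   0x00010000u
#define DEMO_GUESTDBG_USE_HW_BP   0x00020000u

/* Every supported bit collected in one place. */
#define DEMO_GUESTDBG_VALID_MASK \
    (DEMO_GUESTDBG_ENABLE | DEMO_GUESTDBG_SINGLESTEP | \
     DEMO_GUESTDBG_USE_SW_BP | DEMO_GUESTDBG_USE_HW_BP)

/* Returns 0 if only supported bits are set, -EINVAL otherwise. */
int demo_check_guest_debug_flags(uint32_t control)
{
    if (control & ~DEMO_GUESTDBG_VALID_MASK)
        return -EINVAL;
    return 0;
}
```

Keeping the mask in a single macro lets the same value be reported to userspace as the capability and used for validation, so the two cannot drift apart.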
-
Committed by Maxim Levitsky
Injected interrupts/NMIs should not block a pending exception. Instead, they should either be lost if the nested hypervisor doesn't intercept the pending exception (as in stock x86), or be delivered in the exitintinfo/IDT_VECTORING_INFO field as part of a VMexit that corresponds to the pending exception. The only reason for an exception to be blocked is when a nested run is pending (that can't really happen currently, but it is still worth checking for).
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky
While KVM's MMU should be fully reset when KVM_SET_SREGS loads the nested CR0/CR3/CR4, we are not in nested mode yet at that point, and therefore only root_mmu is reset. On regular nested entries we call nested_svm_load_cr3, which updates the guest's CR3 in the MMU when needed and also initializes the MMU again, which in turn initializes walk_mmu when nested paging is enabled in both host and guest. Since we don't call nested_svm_load_cr3 on nested state load, walk_mmu can be left uninitialized. This can lead to a NULL pointer dereference if we get a nested page fault right after entering the nested guest for the first time after migration and decide to emulate it: the emulator then tries to access walk_mmu->gva_to_gpa, which is NULL. Therefore we should call this function on nested state load as well.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Edmondson
When dumping the current VMCS state, include the MSRs that are being automatically loaded/stored during VM entry/exit.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210318120841.133123-6-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Edmondson
If EFER is not being loaded from the VMCS, show the effective value by reference to the MSR autoload list or calculation.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210318120841.133123-5-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Edmondson
When deciding whether to dump the GUEST_IA32_EFER and GUEST_IA32_PAT fields of the VMCS, examine only the VM entry load controls, as saving on VM exit has no effect on whether VM entry succeeds or fails.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210318120841.133123-4-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Edmondson
Show EFER and PAT based on their individual entry/exit controls.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210318120841.133123-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Edmondson
If the VM entry/exit controls for loading/saving MSR_EFER are either not available (an older processor or explicitly disabled) or not used (host and guest values are the same), reading GUEST_IA32_EFER from the VMCS returns an inaccurate value. Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide whether to print the PDPTRs; always do so if the fields exist.
Fixes: 4eb64dce ("KVM: x86: dump VMCS on invalid entry")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky
Currently, to support Intel->AMD migration, if the CPU vendor is GenuineIntel, we emulate the full 64-bit value for the MSR_IA32_SYSENTER_{EIP|ESP} MSRs, and we also emulate the sysenter/sysexit instructions in long mode. (The emulator does still refuse to emulate sysenter in 64-bit mode, on the grounds that the code for that wasn't tested and likely has no users.)
However, when virtual vmload/vmsave is enabled, the vmload instruction will update these 32-bit MSRs without triggering their MSR intercept. This leaves stale values in KVM's shadow copy of these MSRs, which relies on the intercept to stay up to date.
Fix/optimize this by doing the following:
1. Enable the MSR intercepts for the SYSENTER MSRs iff vendor=GenuineIntel. (This is both a tiny optimization and also ensures that, in case the guest CPU vendor is AMD, the MSRs will be 32 bits wide as AMD defined.)
2. Store only the high 32-bit part of these MSRs on interception and combine it with the hardware MSR value on intercepted reads/writes iff vendor=GenuineIntel.
3. Disable vmload/vmsave virtualization if vendor=GenuineIntel. (It is somewhat insane to set vendor=GenuineIntel and still enable SVM for the guest, but well, whatever.) Then zero the high 32-bit parts when KVM intercepts and emulates vmload.
Thanks a lot to Paolo Bonzini for helping me with fixing this in the most correct way.
This patch fixes nested migration of 32-bit nested guests, which was broken because incorrect cached values of the SYSENTER MSRs were stored in the migration stream if L1 changed these MSRs with vmload prior to L2 entry.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
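Step 2 of this commit message (keep only the high 32 bits in software and combine them with the hardware-held low 32 bits) can be illustrated with the sketch below. The structure and function names are hypothetical, and the "hardware" value is modeled as a plain 32-bit field rather than a real MSR access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one SYSENTER MSR as seen by the hypervisor. */
struct demo_sysenter_msr {
    uint32_t hw_low;     /* low 32 bits, held by (simulated) hardware */
    uint32_t shadow_hi;  /* high 32 bits, tracked only in software */
};

/* Intercepted write: hardware keeps the low half, software the high half. */
void demo_sysenter_write(struct demo_sysenter_msr *msr, uint64_t value,
                         bool guest_is_intel)
{
    msr->hw_low = (uint32_t)value;
    /* For a non-Intel guest the MSR is only 32 bits wide. */
    msr->shadow_hi = guest_is_intel ? (uint32_t)(value >> 32) : 0;
}

/* Intercepted read: combine the software high half with the hardware value. */
uint64_t demo_sysenter_read(const struct demo_sysenter_msr *msr)
{
    return ((uint64_t)msr->shadow_hi << 32) | msr->hw_low;
}

/* Emulated vmload: hardware low half is replaced, high half is zeroed. */
void demo_sysenter_vmload(struct demo_sysenter_msr *msr, uint32_t loaded_low)
{
    msr->hw_low = loaded_low;
    msr->shadow_hi = 0;
}
```

Since the low half is always read back from the hardware-held value, an update that bypasses the intercept cannot leave a stale software copy of it.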
-
Committed by Maxim Levitsky
This is similar to the existing 'guest_cpuid_is_amd_or_hygon'.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401111928.996871-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331023025.2485960-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV enabled, and svm_create_vcpu() needs to allocate the VMSA. At best, creating vCPUs before SEV/SEV-ES init will lead to unexpected errors and/or behavior, and at worst it will crash the host, e.g. sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.
Fixes: 1654efcb ("KVM: SVM: Add KVM_SEV_INIT command")
Fixes: ad73109a ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
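The ordering check described here amounts to an early bail-out before any state is changed. The following is a minimal sketch with hypothetical types and names (struct demo_vm, demo_sev_init); the choice of -EINVAL as the error code is an assumption of the example.

```c
#include <errno.h>

/* Hypothetical, simplified VM state. */
struct demo_vm {
    int created_vcpus;  /* number of vCPUs created so far */
    int sev_active;     /* non-zero once the VM is tagged as SEV */
};

/*
 * Tag the VM as SEV, but only if no vCPU exists yet, since per-vCPU
 * setup is assumed to depend on the tag being present at creation time.
 */
int demo_sev_init(struct demo_vm *vm)
{
    if (vm->created_vcpus)
        return -EINVAL;

    vm->sev_active = 1;
    return 0;
}
```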
-