- 15 March 2021: 14 commits
-
-
Committed by Sean Christopherson
WARN if KVM is about to dereference a NULL pae_root or lm_root when loading an MMU, and convert the BUG() on a bad shadow_root_level into a WARN (now that errors are handled cleanly). With nested NPT, botching the level and sending KVM down the wrong path is all too easy, and the on-demand allocation of pae_root and lm_root means bugs crash the host. Obviously, KVM could unconditionally allocate the roots, but that's arguably a worse failure mode as it would potentially corrupt the guest instead of crashing it.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
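A minimal sketch of the hardening this describes, assuming the checks sit in the shadow-root allocation path (illustrative only; the exact upstream placement and error code may differ):

```c
/*
 * Fail the MMU load with a WARN instead of letting a NULL
 * dereference crash the host.  Field names follow the commit message.
 */
if (WARN_ON_ONCE(!vcpu->arch.mmu->pae_root || !vcpu->arch.mmu->lm_root))
        return -EIO;
```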
-
Committed by Sean Christopherson
For clarity, explicitly skip syncing roots if the MMU load failed instead of relying on the !VALID_PAGE check in kvm_mmu_sync_roots().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Unexport the MMU load and unload helpers now that they are no longer used (incorrectly) in vendor code. Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h; it should not be exposed to vendor code. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Defer unloading the MMU after an INVPCID until the instruction emulation has completed, i.e. until after RIP has been updated. On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping an emulated instruction. However, on SVM, if nrip is disabled, the emulator is used to skip an instruction, which would lead to fireworks if the emulator were invoked without a valid MMU.

Fixes: eb4b248e ("kvm: vmx: Support INVPCID in shadow paging mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
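A sketch of the reordering, under the assumption that the handler can lean on KVM's deferred-request machinery (KVM_REQ_MMU_RELOAD existed in kernels of this era); the actual patch may differ:

```c
case INVPCID_TYPE_ALL_INCL_GLOBAL:
        /*
         * Queue the MMU teardown instead of doing it inline, so that
         * kvm_skip_emulated_instruction() -- which may bounce through
         * the emulator on SVM when nrip is disabled -- still runs
         * with a valid MMU.  The buggy code unloaded the MMU first.
         */
        kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
        return kvm_skip_emulated_instruction(vcpu);
```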
-
Committed by Sean Christopherson
Defer reloading the MMU after a successful EPTP switch. The VMFUNC instruction itself is executed in the previous EPTP context, so any side effects, e.g. updating RIP, should occur in the old context. Practically speaking, this bug is benign as VMX doesn't touch the MMU when skipping an emulated instruction, nor does queuing a single-step #DB. No other post-switch side effects exist.

Fixes: 41ab9372 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Set the C-bit in SPTEs that are set outside of the normal MMU flows, specifically the PDPTRs and the handful of special-cased "LM root" entries, all of which are shadow paging only. Note, the direct-mapped-root PDPTR handling is needed for the scenario where paging is disabled in the guest, in which case KVM uses a direct mapped MMU even though TDP is disabled.

Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Exempt NULL PAE roots from the check to detect leaks, since kvm_mmu_free_roots() doesn't set them back to INVALID_PAGE. Stop hiding the WARNs to detect PAE root leaks behind MMU_WARN_ON; the hidden WARNs obviously didn't do their job given the hilarious number of bugs that could lead to PAE roots being leaked, not to mention the above false positive. Opportunistically delete a warning on root_hpa being valid; there's nothing special about 4/5-level shadow pages that warrants a WARN.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Check the validity of the PDPTRs before allocating any of the PAE roots, otherwise a bad PDPTR will cause KVM to leak any previously allocated roots.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Hold the mmu_lock for write for the entire duration of allocating and initializing an MMU's roots. This ensures there are MMU pages available and thus prevents root allocations from failing. That in turn fixes a bug where KVM would fail to free valid PAE roots if one of the later roots failed to allocate.

Add a comment to make_mmu_pages_available() to call out that the limit is a soft limit, e.g. KVM will temporarily exceed the threshold if a page fault allocates multiple shadow pages and there was only one page "available".

Note, KVM _still_ leaks the PAE roots if the guest PDPTR checks fail. This will be addressed in a future commit.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
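The locking pattern this describes, as a rough sketch (function body abbreviated; names follow the commit message, not the exact diff):

```c
write_lock(&vcpu->kvm->mmu_lock);

/*
 * Check the soft limit once, up front.  KVM may temporarily exceed
 * the threshold, e.g. if a page fault allocates multiple shadow
 * pages, but no root allocation below can now fail for lack of
 * MMU pages.
 */
r = make_mmu_pages_available(vcpu);
if (r < 0)
        goto out_unlock;

/* ... allocate *all* roots while the lock is held ... */

out_unlock:
        write_unlock(&vcpu->kvm->mmu_lock);
```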
-
Committed by Sean Christopherson
Move the on-demand allocation of the pae_root and lm_root pages, used by nested NPT for 32-bit L1s, into a separate helper. This will allow a future patch to hold mmu_lock while allocating the non-special roots so that make_mmu_pages_available() can be checked once at the start of root allocation, and thus avoid having to deal with failure in the middle of root allocation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Allocate lm_root before the PAE roots so that the PAE roots aren't leaked if the memory allocation for the lm_root happens to fail. Note, KVM can still leak PAE roots if mmu_check_root() fails on a guest's PDPTR, or if mmu_alloc_root() fails due to MMU pages not being available. Those issues will be fixed in future commits.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Grab 'mmu' and do s/vcpu->arch.mmu/mmu to shorten line lengths and yield smaller diffs when moving code around in future cleanup, without forcing the new code to use the same ugly pattern. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Allocate the so-called pae_root page on-demand, along with the lm_root page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a 32-bit L1. KVM currently only allocates the page when NPT is disabled, or when L0 is 32-bit (using PAE paging).

Note, there is an existing memory leak involving the MMU roots, as KVM fails to free the PAE roots on failure. This will be addressed in a future commit.

Fixes: ee6268ba ("KVM: x86: Skip pae_root shadow allocation if tdp enabled")
Fixes: b6b80c78 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT")
Cc: stable@vger.kernel.org
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Dongli Zhang
The new per-vCPU stat 'nested_run' is introduced to track whether an L1 VM is running or being used to run an L2 VM. One use of 'nested_run' is to help the host administrator easily determine whether any L1 VM is being used to run an L2 VM. If an issue may be related to nested virtualization, the administrator can easily narrow down and confirm whether that is the case via 'nested_run', e.g. whether a fix like commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
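A sketch of how such a counter is bumped; the placement is an assumption, not spelled out in the commit text:

```c
/* Assumed placement: the nested VMRUN/VMLAUNCH emulation path,
 * counting every launch of L2 on behalf of this vCPU. */
++vcpu->stat.nested_run;
```

vCPU stats of this kind surface through KVM's debugfs, so a host administrator can check for a nonzero aggregate (e.g. a nested_run entry under /sys/kernel/debug/kvm/, path assumed) to confirm nested virtualization is in use.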
-
- 13 March 2021: 2 commits
-
-
Committed by Wanpeng Li
Advancing the timer expiration should only be necessary on guest-initiated writes. When we cancel the timer and clear .pending during state restore, clear expired_tscdeadline as well.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to REMOVED_SPTE when recursively zapping SPTEs as part of shadow page removal. The concurrent write protections provided by REMOVED_SPTE are not needed, there are no backing page side effects to record, and MMIO SPTEs can be left as is since they are protected by the memslot generation, not by ensuring that the MMIO SPTE is unreachable (which is racy with respect to lockless walks regardless of zapping behavior).

Skipping !PRESENT drastically reduces the number of updates needed to tear down sparsely populated MMUs, e.g. when tearing down a 6GB VM that didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could be skipped. Avoiding the write itself is likely close to a wash, but avoiding __handle_changed_spte() is a clear-cut win as that involves saving and restoring all non-volatile GPRs (it's a subtly big function), as well as several conditional branches before bailing out.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210310003029.1250571-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
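A sketch of the zap-time filter, assuming the recursive child walk has roughly this shape (iterator and constant names are approximations of the TDP MMU code, not the exact diff):

```c
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
        u64 old_spte = READ_ONCE(pt[i]);

        /*
         * mmu_lock is held for write: no need to mark the entry
         * REMOVED_SPTE or invoke __handle_changed_spte() for
         * !PRESENT SPTEs.  MMIO SPTEs stay as-is, protected by the
         * memslot generation.
         */
        if (!is_shadow_present_pte(old_spte))
                continue;

        /* ... zap the present SPTE ... */
}
```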
-
- 10 March 2021: 1 commit
-
-
Committed by Sean Christopherson
Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop" case. Patching in perf_guest_get_msrs_nop() during setup does not work if there is no PMU, as setup bails before updating the static calls, leaving x86_pmu.guest_get_msrs NULL and thus a complete nop. Ultimately, this causes a VMX abort on VM-Exit due to KVM putting random garbage from the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the return value is non-NULL.

Fixes: abd562df ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+cce9ef2dd25246f815ee@syzkaller.appspotmail.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210309171019.1125243-1-seanjc@google.com
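The nop default and the KVM-side contract, sketched from the commit description (nr_msrs is valid if and only if the return value is non-NULL):

```c
/* Default used when no PMU is present and setup bails early. */
static struct perf_guest_switch_msr *perf_guest_get_msrs_nop(int *nr)
{
        *nr = 0;
        return NULL;
}

/* KVM side: bail before touching the MSR load list. */
struct perf_guest_switch_msr *msrs;
int nr_msrs;

msrs = perf_guest_get_msrs(&nr_msrs);
if (!msrs)      /* nr_msrs is valid iff the return is non-NULL */
        return;
```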
-
- 06 March 2021: 1 commit
-
-
Committed by Muhammad Usama Anjum
Remove sparse warnings: "warning: Using plain integer as NULL pointer".

Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Message-Id: <20210305180816.GA488770@LEGION>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 05 March 2021: 2 commits
-
-
Committed by Sean Christopherson
Directly connect the 'npt' param to the 'npt_enabled' variable so that runtime adjustments to npt_enabled are reflected in sysfs. Move the !PAE restriction to a runtime check to ensure NPT is forced off if the host is using 2-level paging, and add a comment explicitly stating why NPT requires a 64-bit kernel or a kernel with PAE enabled. Opportunistically switch the param to octal permissions.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021637.3768573-1-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
When posting a deadline timer interrupt, open code the checks guarding __kvm_wait_lapic_expire() in order to skip the lapic_timer_int_injected() check in kvm_wait_lapic_expire(). The injection check will always fail since the interrupt has not yet been injected. Moving the call after injection would also be wrong, as that wouldn't actually delay delivery of the IRQ if it is indeed sent via posted interrupt.

Fixes: 010fd37f ("KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021808.3769732-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 March 2021: 4 commits
-
-
Committed by Babu Moger
This problem was reported on an SVM guest while executing kexec. Kexec fails to load the new kernel when the PCID feature is enabled.

When kexec starts loading the new kernel, it starts the process by resetting the vCPUs and then bringing each vCPU online one by one. The vCPU reset is supposed to reset all the register states before the vCPUs are brought online. However, the CR4 register is not reset during this process. If this register was already set up during the last boot, all the flags can remain intact. The X86_CR4_PCIDE bit can only be enabled in long mode, so it must be enabled much later in SMP initialization. Having the X86_CR4_PCIDE bit set during SMP boot can cause boot failures.

Fix the issue by resetting the CR4 register in init_vmcb().

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of the running/runnable/blocked/offline states.

In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running.

The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state.

The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time.

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210301125309.874953-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
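For reference, the Xen ABI structure being emulated (from Xen's public vcpu.h; comments restate the invariants from the commit message):

```c
struct vcpu_runstate_info {
        /* The vCPU's current state (RUNSTATE_running/runnable/blocked/offline). */
        int      state;
        /* When the current state was entered (system time, nanoseconds);
         * in KVM's emulation this corresponds to get_kvmclock_ns(). */
        uint64_t state_entry_time;
        /* Time spent in each RUNSTATE_* (nanoseconds).  The four entries
         * should always sum to state_entry_time. */
        uint64_t time[4];
};
```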
-
Committed by David Woodhouse
When clearing the per-vCPU shared regions, set the return value to zero to indicate success. This was causing spurious errors to be returned to userspace on soft reset. Also add a paranoid BUILD_BUG_ON() for compat structure compatibility.

Fixes: 0c165b3c ("KVM: x86/xen: Allow reset of Xen attributes")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210301125309.874953-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
The Xen hypercall interface adds to the attack surface of the hypervisor and will be used quite rarely. Allow compiling it out.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 February 2021: 3 commits
-
-
Committed by Paolo Bonzini
A missing flush would cause the static branch to trigger incorrectly.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Check that PML is actually enabled before setting the mask to force a SPTE to be write-protected. The bits used for the !AD_ENABLED case are in the upper half of the SPTE. With 64-bit paging and EPT, these bits are ignored, but with 32-bit PAE paging they are reserved. Setting them for L2 SPTEs without checking PML breaks NPT on 32-bit KVM.

Fixes: 1f4e5fc8 ("KVM: x86: fix nested guest live migration with PML")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Wanpeng Li
Reported by syzkaller:

    KASAN: null-ptr-deref in range [0x0000000000000140-0x0000000000000147]
    CPU: 1 PID: 8370 Comm: syz-executor859 Not tainted 5.11.0-syzkaller #0
    RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline]
    RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline]
    RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498
    Call Trace:
     kvm_set_irq_routing+0x69b/0x940 arch/x86/kvm/../../../virt/kvm/irqchip.c:223
     kvm_vm_ioctl+0x12d0/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3959
     vfs_ioctl fs/ioctl.c:48 [inline]
     __do_sys_ioctl fs/ioctl.c:753 [inline]
     __se_sys_ioctl fs/ioctl.c:739 [inline]
     __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
     do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Hyper-V context is lazily allocated until Hyper-V specific MSRs are accessed or SynIC is enabled. However, the syzkaller testcase sets the irq routing table directly w/o enabling SynIC. This results in a null-ptr-deref when accessing the SynIC Hyper-V context. This patch fixes it.

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=163342ccd00000

Reported-by: syzbot+6987f3b2dbd9eda95f12@syzkaller.appspotmail.com
Fixes: 8f014550 ("KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614326399-5762-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
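A sketch of the guard this implies in synic_get(), whose shape is inferred from the stack trace and the description of lazy allocation (not the exact diff):

```c
static struct kvm_vcpu_hv_synic *synic_get(struct kvm *kvm, u32 vpidx)
{
        struct kvm_vcpu *vcpu = get_vcpu_by_vpidx(kvm, vpidx);

        /* The Hyper-V context is allocated lazily and may not exist yet. */
        if (!vcpu || !to_hv_vcpu(vcpu))
                return NULL;

        return to_hv_synic(vcpu);
}
```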
-
- 25 February 2021: 1 commit
-
-
Committed by Sean Christopherson
Fix the interpretation of nested_svm_vmexit()'s return value when synthesizing a nested VM-Exit after intercepting an SVM instruction while L2 was running. The helper returns '0' on success, whereas a return value of '0' in the exit handler path means "exit to userspace". The incorrect return value causes KVM to exit to userspace without filling the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0".

Fixes: 14c2bf81 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210224005627.657028-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 24 February 2021: 1 commit
-
-
Committed by Like Xu
If lbr_desc->event is successfully created, intel_pmu_create_guest_lbr_event() will return 0, otherwise it will return -ENOENT, and then jump to LBR MSRs dummy handling.

Fixes: 1b5ac322 ("KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210223013958.1280444-1-like.xu@linux.intel.com>
[Add "< 0" and PTR_ERR to make the code clearer. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
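A sketch of the corrected error propagation, following the commit message and Paolo's note about using "< 0" and PTR_ERR (surrounding code elided; 'dummy' is a hypothetical label):

```c
/* In intel_pmu_create_guest_lbr_event(): */
event = perf_event_create_kernel_counter(&attr, -1, current, NULL, NULL);
if (IS_ERR(event))
        return PTR_ERR(event);  /* e.g. -ENOENT, not 0 */
lbr_desc->event = event;
return 0;

/* In the caller, fall back to dummy LBR MSR handling only on error: */
if (intel_pmu_create_guest_lbr_event(vcpu) < 0)
        goto dummy;
```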
-
- 23 February 2021: 3 commits
-
-
Committed by David Stevens
Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity.

The primary benefit of this change is the reduction in the likelihood of extreme latency when handling a page fault due to another thread having been preempted while modifying host virtual addresses.

Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210222024522.1751719-3-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
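The range combining reduces to a min/max union plus an overlap test; a sketch with assumed field names:

```c
/* In invalidate_range_start, under the lock protecting the counters: */
kvm->mmu_notifier_count++;
if (likely(kvm->mmu_notifier_count == 1)) {
        kvm->mmu_notifier_range_start = start;
        kvm->mmu_notifier_range_end = end;
} else {
        /*
         * Widen to the union of all in-flight invalidations.  The
         * combined range may cover unrelated addresses and cannot
         * shrink as individual invalidations complete.
         */
        kvm->mmu_notifier_range_start =
                min(kvm->mmu_notifier_range_start, start);
        kvm->mmu_notifier_range_end =
                max(kvm->mmu_notifier_range_end, end);
}

/* In the page-fault path: retry only if the faulting hva is affected. */
if (unlikely(kvm->mmu_notifier_count) &&
    hva >= kvm->mmu_notifier_range_start &&
    hva < kvm->mmu_notifier_range_end)
        return true;    /* retry the fault */
```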
-
Committed by Sean Christopherson
Don't retry a page fault due to an mmu_notifier invalidation when handling a page fault for a GPA that did not resolve to a memslot, i.e. an MMIO page fault. Invalidations from the mmu_notifier signal a change in a host virtual address (HVA) mapping; without a memslot, there is no HVA and thus no possibility that the invalidation is relevant to the page fault being handled.

Note, the MMIO vs. memslot generation checks handle the case where a pending memslot will create a memslot overlapping the faulting GPA. The mmu_notifier checks are orthogonal to memslot updates.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210222024522.1751719-2-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and nested_prepare_vmcb_control. This results in is_guest_mode being false until the end of nested_prepare_vmcb_control.

This is a problem because nested_prepare_vmcb_save can in turn cause changes to the intercepts, and these have to be applied to the "host VMCB" (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts into svm->vmcb. In particular, without this change we forget to set the CR0 read and CR0 write intercepts when running a real mode L2 guest with NPT disabled. The guest is therefore able to see the CR0.PG bit that KVM sets to enable "paged real mode". This patch fixes the svm.flat mode_switch test case with npt=0. There are no other problematic calls in nested_prepare_vmcb_save.

Moving is_guest_mode to the end has been done since commit 06fc7772 ("KVM: SVM: Activate nested state only when guest state is complete", 2010-04-25) [1]. However, back then KVM didn't grab a different VMCB when updating the intercepts: it had already copied/merged L1's stuff to L0's VMCB, and then updated L0's VMCB regardless of is_nested(). Later, recalc_intercepts was introduced in commit 384c6368 ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12). This introduced the bug, because recalc_intercepts now throws away the intercept manipulations that svm_set_cr0 had done in the meanwhile to svm->vmcb.

[1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/

Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
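A sketch of the reordering, with the function shape assumed from the names in the message: entering guest mode first means recalc_intercepts() merges, rather than discards, the intercept changes that svm_set_cr0() makes from within nested_prepare_vmcb_save().

```c
static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb12_gpa,
                                 struct vmcb *vmcb12)
{
        svm->nested.vmcb12_gpa = vmcb12_gpa;

        /* Moved up: is_guest_mode() must be true before the save-area
         * load so intercept updates land on the right VMCB. */
        enter_guest_mode(&svm->vcpu);

        nested_prepare_vmcb_save(svm, vmcb12);
        nested_prepare_vmcb_control(svm);
        /* ... */
}
```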
-
- 19 February 2021: 8 commits
-
-
Committed by Sean Christopherson
Remove several exports from the MMU that are no longer necessary. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole caller to use kvm_mmu_slot_remove_write_access(). Remove the now-unused slot_handle_large_level() and slot_handle_all_level() helpers. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled.

This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared. In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Makarand Sonare
Currently, if enable_pml=1, PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enabled or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance-sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits.

In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot.

Signed-off-by: Makarand Sonare <makarandsonare@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
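A sketch of the first/last-memslot toggle; the counter field and the two hooks are hypothetical stand-ins for whatever the patch actually uses:

```c
/* Called when a memslot's KVM_MEM_LOG_DIRTY_PAGES flag is toggled. */
if (log_dirty_pages) {
        /* First memslot to enable dirty logging: turn PML on. */
        if (kvm->nr_memslots_dirty_logging++ == 0)
                kvm_mmu_cpu_dirty_log_enable(kvm);      /* hypothetical hook */
} else {
        /* Last memslot to disable dirty logging: turn PML off. */
        if (--kvm->nr_memslots_dirty_logging == 0)
                kvm_mmu_cpu_dirty_log_disable(kvm);     /* hypothetical hook */
}
```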
-
Committed by Sean Christopherson
Add a sanity check in kvm_mmu_slot_apply_flags to assert that the LOG_DIRTY_PAGES flag is indeed being toggled, and explicitly rely on that holding true when zapping collapsible SPTEs. Manipulating the CPU dirty log (PML) and write-protection also relies on this assertion, but that's not obvious in the current code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation; bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Store the vendor-specific dirty log size in a variable; there's no need to wrap it in a function since the value is constant after hardware_setup() runs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Expand the comment about the need to use write-protection for nested EPT when PML is enabled to clarify that the tagging is a nop when PML is _not_ enabled. Without the clarification, omitting the PML check looks wrong at first^Wfifth glance.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-