- 22 October 2021, 3 commits
-
-
Committed by David Matlack
slot_handle_leaf is a misnomer because it only operates on 4K SPTEs, whereas "leaf" is used to describe any valid terminal SPTE (4K or large page). Rename slot_handle_leaf to slot_handle_level_4k to avoid confusion. Making this change makes it more obvious that there is a benign discrepancy between the legacy MMU and the TDP MMU when it comes to dirty logging. The legacy MMU only iterates through 4K SPTEs when zapping for collapsing and when clearing D-bits. The TDP MMU, on the other hand, iterates through SPTEs on all levels. The TDP MMU behavior of zapping SPTEs at all levels is technically overkill for its current dirty logging implementation, which always demotes to 4K SPTEs, but both the TDP MMU and legacy MMU zap if and only if the SPTE can be replaced by a larger page, i.e. will not spuriously zap 2M (or larger) SPTEs. Opportunistically add comments to explain this discrepancy in the code. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20211019162223.3935109-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
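As a rough illustration (not the verbatim patch), the renamed helper is just a thin wrapper that pins both the minimum and maximum level to 4K; slot_handle_level() and PG_LEVEL_4K are the kernel's own names, but the exact parameter list here is a simplification:

    /* Sketch: the new name says explicitly that only 4K SPTEs are visited. */
    static __always_inline bool
    slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
                         slot_level_handler fn, bool flush_on_yield)
    {
            /* Both start and end level are PG_LEVEL_4K, i.e. no large pages. */
            return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K, PG_LEVEL_4K,
                                     flush_on_yield);
    }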
-
Committed by Paolo Bonzini
"prefetch", "prefault" and "speculative" are used throughout KVM to mean the same thing. Use a single name, standardizing on "prefetch", which is already used by various functions such as direct_pte_prefetch, FNAME(prefetch_gpte), FNAME(pte_prefetch), etc. Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Stevens
Unify the flags for rmaps and page tracking data, using a single flag in struct kvm_arch and a single loop to go over all the address spaces and memslots. This avoids code duplication between alloc_all_memslots_rmaps and kvm_page_track_enable_mmu_write_tracking. Signed-off-by: David Stevens <stevensd@chromium.org> [This patch is the delta between David's v2 and v3, with conflicts fixed and my own commit message. - Paolo] Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
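A rough sketch of the unified shape, assuming a hypothetical flag name and allocation helper (the real code also handles locking and error rollback):

    /* Sketch: one flag in struct kvm_arch, one loop over every address space
     * and memslot, instead of two nearly identical allocation paths. */
    static int mmu_alloc_slot_tracking_data(struct kvm *kvm)
    {
            struct kvm_memslots *slots;
            struct kvm_memory_slot *slot;
            int i, r;

            if (kvm->arch.memslots_mmu_write_tracking)   /* hypothetical flag name */
                    return 0;

            for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                    slots = __kvm_memslots(kvm, i);
                    kvm_for_each_memslot(slot, slots) {
                            r = alloc_rmaps_and_gfn_track(kvm, slot);   /* hypothetical helper */
                            if (r)
                                    return r;
                    }
            }

            kvm->arch.memslots_mmu_write_tracking = true;
            return 0;
    }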
-
- 18 October 2021, 1 commit
-
-
Committed by Andrei Vagin
This looks like a typo in 8f32d5e5. That change was not intended to make any functional changes. The problem was caught by gVisor tests. Fixes: 8f32d5e5 ("KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code") Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Message-Id: <20211015163221.472508-1-avagin@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 01 October 2021, 27 commits
-
-
Committed by David Stevens
Avoid allocating the gfn_track arrays if nothing needs them. If there are no users of the API external to KVM (i.e. no GVT-g), then page tracking is only needed for shadow page tables. This means that when TDP is enabled and there are no external users, the gfn_track arrays can be lazily allocated when the shadow MMU is actually used. This avoids allocations equal to 0.05% of guest memory when nested virtualization is not used, if the kernel is compiled without GVT-g. Signed-off-by: David Stevens <stevensd@chromium.org> Message-Id: <20210922045859.2011227-3-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
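The deciding condition can be pictured with a sketch along these lines (the function and helper names here are hypothetical; the real series keys the external-user case off a config option selected by GVT-g):

    /* Sketch: when must the gfn_track arrays exist from memslot creation? */
    static bool gfn_track_needed_at_memslot_creation(struct kvm *kvm)
    {
            /* Shadow paging always write-protects shadowed guest page tables. */
            if (!tdp_enabled)
                    return true;

            /*
             * Only external page-track users (e.g. GVT-g) need the arrays up
             * front; otherwise allocate them lazily when the shadow MMU is
             * first used, e.g. for nested virtualization.
             */
            return kvm_external_write_tracking_enabled(kvm);   /* hypothetical */
    }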
-
Committed by David Matlack
mmu_try_to_unsync_pages checks if page tracking is active for the given gfn, which requires knowing the memslot. We can pass down the memslot via make_spte to avoid this lookup. The memslot is also handy for make_spte's marking of the gfn as dirty: we can test whether dirty page tracking is enabled, and if so ensure that pages are mapped as writable with 4K granularity. Apart from the warning, no functional change is intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210813203504.2742757-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
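The fragment of make_spte() that benefits looks roughly like this (a sketch, not the verbatim patch): with the slot already in hand, neither the dirty-log test nor the dirty marking needs a gfn-to-memslot lookup, and the warning documents that writable dirty-logged mappings are expected to be 4K:

    /* Writable SPTEs in dirty-logged slots must be 4K, as enforced by the
     * earlier hugepage adjustment; mark the gfn dirty in that slot. */
    if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
            WARN_ON_ONCE(level > PG_LEVEL_4K);
            mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
    }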
-
Committed by David Matlack
Avoid the memslot lookup in rmap_add by passing it down from the fault handling code to mmu_set_spte and then to rmap_add. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210813203504.2742757-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
mmu_set_spte is called for either PTE prefetching or page faults. The three boolean arguments write_fault, speculative and host_writable are always respectively false/true/true for prefetching, and come from a struct kvm_page_fault for page faults. Let mmu_set_spte distinguish these two situations by accepting a possibly NULL struct kvm_page_fault argument. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
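A sketch of how the NULL-fault convention plays out at the top of mmu_set_spte() (field names taken from struct kvm_page_fault; the derivations are simplified):

    static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
                            u64 *sptep, unsigned int pte_access, gfn_t gfn,
                            kvm_pfn_t pfn, struct kvm_page_fault *fault)
    {
            bool prefetch = !fault;                              /* prefetch paths pass NULL */
            bool write_fault = fault && fault->write;            /* always false for prefetch */
            bool host_writable = !fault || fault->map_writable;  /* always true for prefetch */

            /* ... the rest of the function is unchanged ... */
    }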
-
Committed by Paolo Bonzini
The level and A/D bit support of the new SPTE can be found in the role, which is stored in the kvm_mmu_page struct. This merges two arguments into one. For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does not use it if the SPTE is already present), so we fetch it just before calling make_spte. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
The level of the new SPTE can be found in the kvm_mmu_page struct; there is no need to pass it down. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Now that make_spte is called directly by the shadow MMU (rather than wrapped by set_spte), it only has to return one boolean value. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Since the two callers of set_spte do different things with the results, inlining it actually makes the code simpler to reason about. For example, FNAME(sync_page) already has a struct kvm_mmu_page *, but set_spte had to fish it back out of sptep's private page data. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Since the two callers of set_spte do different things with the results, inlining it actually makes the code simpler to reason about. For example, mmu_set_spte looks quite like tdp_mmu_map_handle_target_level, but the similarity is hidden by set_spte. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Matlack
Now that kvm_page_fault has a pointer to the memslot, it can be passed down to the page tracking code to avoid a redundant slot lookup. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210813203504.2742757-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Matlack
The memslot for the faulting gfn is used throughout the page fault handling code, so capture it in kvm_page_fault as soon as we know the gfn and use it in the page fault handling code that has direct access to the kvm_page_fault struct. Replace various tests using is_noslot_pfn with more direct tests on fault->slot being NULL. This, in combination with the subsequent patch, improves "Populate memory time" in dirty_log_perf_test by 5% when using the legacy MMU. There is no discernible improvement to the performance of the TDP MMU. No functional change intended. Suggested-by: Ben Gardon <bgardon@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210813203504.2742757-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
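Conceptually, the change is along these lines (a sketch for the direct-map case; handle_noslot_fault is a hypothetical stand-in for the MMIO path):

    /* Capture the slot once, as soon as the gfn is known ... */
    fault->gfn = fault->addr >> PAGE_SHIFT;   /* shadow MMU derives gfn from the guest walk instead */
    fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);

    /* ... so later checks are direct NULL tests instead of is_noslot_pfn(): */
    if (!fault->slot)
            return handle_noslot_fault(vcpu, fault);   /* hypothetical helper */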
-
Committed by Paolo Bonzini
This simplifies set_spte, which we want to remove, and unifies code between the shadow MMU and the TDP MMU. The warning will be added back later to make_spte as well. There is a small disadvantage in the TDP MMU: it may unnecessarily mark a page as dirty twice if two vCPUs end up mapping the same page twice. However, this is a very small cost for a case that is already rare. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Matlack
Consolidate rmap_recycle and rmap_add into a single function, since they are only ever called together (and only from one place). This has a nice side effect of eliminating an extra kvm_vcpu_gfn_to_memslot(). In addition it makes mmu_set_spte(), which is a very long function, a little shorter. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210813203504.2742757-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
WARN and bail if the shadow walk for faulting in a SPTE terminates early, i.e. doesn't reach the expected level because the walk encountered a terminal SPTE. The shadow walks for page faults are subtle in that they install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop body, and consume the newly created non-leaf SPTE in the loop control, e.g. __shadow_walk_next(). In other words, the walks guarantee that the walk will stop if and only if the target level is reached, by installing non-leaf SPTEs to keep the walk valid. Opportunistically use fault->goal_level instead of it.level in FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at the target level. Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210906122547.263316-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
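The added guard, in sketch form (placed after the fault-in walk, before the leaf SPTE is installed):

    /*
     * The walk is supposed to stop only at fault->goal_level; stopping
     * earlier means a terminal SPTE appeared underneath us, so warn and
     * bail rather than install a leaf SPTE at the wrong level.
     */
    if (WARN_ON_ONCE(it.level != fault->goal_level))
            return -EFAULT;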
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to tracepoints instead of extracting the arguments from the struct. This also lets the kvm_mmu_spte_requested tracepoint pick the gfn directly from fault->gfn, instead of using the address. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to disallowed_hugepage_adjust() instead of extracting the arguments from the struct. Tweak the conditions a bit to avoid long lines. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to kvm_mmu_hugepage_adjust() instead of extracting the arguments from the struct; the results are also stored in the struct, so the callers are adjusted accordingly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to fast_page_fault() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to kvm_tdp_mmu_map() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to __direct_map() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to handle_abnormal_pfn() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Add fields to struct kvm_page_fault corresponding to the outputs of kvm_faultin_pfn(). For now they have to be extracted again from struct kvm_page_fault in the subsequent steps, but this is temporary until other functions in the chain are switched over as well. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Add fields to struct kvm_page_fault corresponding to the arguments of page_fault_handle_page_track(). The fields are initialized in the callers, and page_fault_handle_page_track() receives a struct kvm_page_fault instead of having to extract the arguments out of it. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Add fields to struct kvm_page_fault corresponding to the arguments of direct_page_fault(). The fields are initialized in the callers, and direct_page_fault() receives a struct kvm_page_fault instead of having to extract the arguments out of it. Also adjust FNAME(page_fault) to store the max_level in struct kvm_page_fault, to keep it similar to the direct map path. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Pass struct kvm_page_fault to mmu->page_fault() instead of extracting the arguments from the struct. FNAME(page_fault) can use the precomputed bools from the error code. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
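For orientation, an abbreviated sketch of struct kvm_page_fault as it looks once this series lands (a subset of the fields, grouped by how they are filled in; comments are mine):

    struct kvm_page_fault {
            /* Arguments to kvm_mmu_do_page_fault. */
            const gpa_t addr;
            const u32 error_code;

            /* Precomputed from error_code so handlers stop re-deriving them. */
            const bool exec;
            const bool write;
            const bool present;
            const bool rsvd;
            const bool user;
            const bool is_tdp;

            /* Filled in as the fault is resolved. */
            gfn_t gfn;
            struct kvm_memory_slot *slot;   /* NULL for MMIO / no-slot faults */
            u8 max_level;
            u8 goal_level;

            /* Outputs of kvm_faultin_pfn. */
            kvm_pfn_t pfn;
            hva_t hva;
            bool map_writable;
    };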
-
Committed by Paolo Bonzini
Do not bother removing the low bits of the gpa. This masking dates back to the very first commit of KVM, but it is unnecessary, as exemplified by the other call in kvm_tdp_page_fault. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Lai Jiangshan
So far, the loop bodies already ensure the PTE is present before calling __shadow_walk_next(): some loop bodies simply exit directly on a !PRESENT SPTE, and some others, i.e. FNAME(fetch) and __direct_map(), do not currently guard their walks with is_shadow_present_pte, but only because they install present non-leaf SPTEs in the loop itself. But checking PTE presence in __shadow_walk_next() (which is called from shadow_walk_okay()) is more prudent; walking past a !PRESENT SPTE would lead to attempting to read the next-level SPTE from a garbage iter->shadow_addr. It also allows removing the is_shadow_present_pte() checks from the loop bodies. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210906122547.263316-2-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
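The strengthened termination check, approximately (the shape of __shadow_walk_next from the kernel source of that era, simplified):

    static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
                                   u64 spte)
    {
            /* A !PRESENT SPTE now ends the walk here, so loop bodies no longer
             * need their own is_shadow_present_pte() guards. */
            if (!is_shadow_present_pte(spte) ||
                is_last_spte(spte, iterator->level)) {
                    iterator->level = 0;
                    return;
            }

            iterator->shadow_addr = spte & PT64_BASE_ADDR_MASK;
            --iterator->level;
    }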
-
- 30 September 2021, 6 commits
-
-
Committed by Lai Jiangshan
Only unsync the page table when there has just been an actual write fault on a level-1 page table. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-10-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Lai Jiangshan
In mmu_sync_children(), the invalid list can be zapped after the remote TLB flush. Emptying the invalid list as soon as possible may help avoid a remote TLB flush in some cases. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-8-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Lai Jiangshan
Currently kvm_sync_page() returns true when there is any present SPTE, but the return value is ignored by the callers. Changing kvm_sync_page() to return true when a remote flush is needed, and changing mmu->sync_page() not to flush directly, allows remote flush requests to be combined and reduced. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-7-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Lai Jiangshan
Because local_flush is useless, kvm_mmu_flush_or_zap() can be removed and kvm_mmu_remote_flush_or_zap() used instead. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-6-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Lai Jiangshan
After any shadow page modification, flushing the TLB only on the current vCPU is odd, since other vCPUs' TLBs might still be stale. In other words, if there is any mandatory TLB flushing after a shadow page modification, SET_SPTE_NEED_REMOTE_TLB_FLUSH or remote_flush should be set and the TLBs of all vCPUs should be flushed. There is no point in flushing only the current TLB except when the request comes from the vCPU's or pCPU's own activities. If there were any bug where mandatory TLB flushing is required but SET_SPTE_NEED_REMOTE_TLB_FLUSH/remote_flush fails to be set, this patch would expose the bug in a more destructive way. The related code paths have been checked and no missing SET_SPTE_NEED_REMOTE_TLB_FLUSH has been found yet. Currently, there is no optional TLB flushing left after the sync-page-related code was changed to flush the TLB in a timely manner, so these local flushes can simply be removed. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-5-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Make a final call to direct_pte_prefetch_many() if there are "trailing" SPTEs to prefetch, i.e. SPTEs for GFNs following the faulting GFN. The call to direct_pte_prefetch_many() in the loop only handles the case where there are !PRESENT SPTEs preceding a PRESENT SPTE. E.g. if the faulting GFN is a multiple of 8 (the prefetch size) and all SPTEs for the following GFNs are !PRESENT, the loop will terminate with "start = sptep+1" and not prefetch any SPTEs. Prefetching trailing SPTEs as intended can drastically reduce the number of guest page faults, e.g. when accessing the first byte of every 4KB page in a 6GB chunk of virtual memory in a VM with 8GB of preallocated memory, the number of pf_fixed events observed in L0 drops from ~1.75M to <0.27M. Note, this only affects memory that is backed by 4KB pages, as KVM doesn't prefetch when installing hugepages. Shadow paging prefetching is not affected, as it does not batch the prefetches due to the need to process the corresponding guest PTE. The TDP MMU is not affected because it doesn't have prefetching, yet... Fixes: 957ed9ef ("KVM: MMU: prefetch ptes when intercepted guest #PF") Cc: Sergey Senozhatsky <senozhatsky@google.com> Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210818235615.2047588-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
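The fixed prefetch loop, roughly (shape taken from __direct_pte_prefetch in the kernel source, lightly simplified); the last two lines are the added "trailing" call:

    static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
                                      struct kvm_mmu_page *sp, u64 *sptep)
    {
            u64 *spte, *start = NULL;
            int i;

            /* Align down to the start of the PTE_PREFETCH_NUM-sized window. */
            i = (sptep - sp->spt) & ~(PTE_PREFETCH_NUM - 1);
            spte = sp->spt + i;

            for (i = 0; i < PTE_PREFETCH_NUM; i++, spte++) {
                    if (is_shadow_present_pte(*spte) || spte == sptep) {
                            if (!start)
                                    continue;
                            /* Flush the run of !PRESENT SPTEs preceding this one. */
                            if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
                                    return;
                            start = NULL;
                    } else if (!start) {
                            start = spte;
                    }
            }
            if (start)      /* the fix: prefetch a trailing run of !PRESENT SPTEs */
                    direct_pte_prefetch_many(vcpu, sp, start, spte);
    }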
-
- 23 September 2021, 1 commit
-
-
Committed by Lai Jiangshan
If a gpte is changed from non-present to present, the guest doesn't need to flush the TLB per the SDM. So the host must synchronize the SP before linking it. Otherwise the guest might use a wrong mapping. For example: the guest first changes a level-1 pagetable, and then links its parent to a new place where the original gpte is non-present. Finally the guest can access the remapped area without flushing the TLB. The guest's behavior should be allowed per the SDM, but the host KVM MMU gets it wrong. Fixes: 4731d4c7 ("KVM: MMU: out of sync shadow core") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210918005636.3675-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 06 September 2021, 2 commits
-
-
Committed by Paolo Bonzini
It is reasonable for these functions to be used only in some configurations, for example only if the host is 64-bit (and therefore supports 64-bit guests). It is also reasonable to keep the role_regs and role accessors in sync even though some of the accessors may be used only for one of the two sets (as is the case currently for CR4.LA57). Because clang reports warnings for unused inlines declared in a .c file, mark both sets of accessors as __maybe_unused. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
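For illustration, one expanded instance of the accessor pattern being annotated (the real code generates these with a macro; this is a hand-expanded sketch):

    /* __maybe_unused silences clang's unused-function warning for
     * configurations that never call this accessor (e.g. CR4.LA57 on hosts
     * that cannot expose 5-level paging). */
    static inline bool __maybe_unused
    ____is_cr4_la57(const struct kvm_mmu_role_regs *regs)
    {
            return !!(regs->cr4 & X86_CR4_LA57);
    }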
-
Committed by Sean Christopherson
Revert a misguided illegal GPA check when "translating" a non-nested GPA. The check is woefully incomplete as it does not fill in @exception as expected by all callers, which leads to KVM attempting to inject a bogus exception, potentially exposing kernel stack information in the process. WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0 RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 Call Trace: x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853 kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199 handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336 __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline] vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038 vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712 vcpu_run arch/x86/kvm/x86.c:9779 [inline] kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010 kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652 The bug has escaped notice because, practically speaking, the GPA check is useless. The GPA check in question only comes into play when KVM is walking guest page tables (or "translating" CR3), and KVM already handles illegal GPA checks by setting reserved bits in rsvd_bits_mask for each PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an illegal CR3. This particular failure doesn't hit the existing reserved bits checks because syzbot sets guest.MAXPHYADDR=1, and the IA32 architecture simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging doesn't define any reserved PA bits checks, which KVM emulates by only incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32. Simply remove the bogus check. There is zero meaningful value and no architectural justification for supporting guest.MAXPHYADDR < 32, and properly filling the exception would introduce non-trivial complexity. This reverts commit ec7771ab. Fixes: ec7771ab ("KVM: x86: mmu: Add guest physical address check in translate_gpa()") Cc: stable@vger.kernel.org Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210831164224.1119728-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-