- 22 9月, 2021 9 次提交
-
-
由 Hou Wenlong 提交于
According to Intel's SDM Vol2 and AMD's APM Vol3, when CR4.TSD is set, use rdtsc/rdtscp instruction above privilege level 0 should trigger a #GP. Fixes: d7eb8203 ("KVM: SVM: Add intercept checks for remaining group7 instructions") Signed-off-by: NHou Wenlong <houwenlong93@linux.alibaba.com> Message-Id: <1297c0dd3f1bb47a6d089f850b629c7aa0247040.1629257115.git.houwenlong93@linux.alibaba.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Require the target guest page to be writable when pinning memory for RECEIVE_UPDATE_DATA. Per the SEV API, the PSP writes to guest memory: The result is then encrypted with GCTX.VEK and written to the memory pointed to by GUEST_PADDR field. Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Cc: stable@vger.kernel.org Cc: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210914210951.2994260-2-seanjc@google.com> Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com> Reviewed-by: NPeter Gonda <pgonda@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Mingwei Zhang 提交于
DECOMMISSION the current SEV context if binding an ASID fails after RECEIVE_START. Per AMD's SEV API, RECEIVE_START generates a new guest context and thus needs to be paired with DECOMMISSION: The RECEIVE_START command is the only command other than the LAUNCH_START command that generates a new guest context and guest handle. The missing DECOMMISSION can result in subsequent SEV launch failures, as the firmware leaks memory and might not able to allocate more SEV guest contexts in the future. Note, LAUNCH_START suffered the same bug, but was previously fixed by commit 934002cd ("KVM: SVM: Call SEV Guest Decommission if ASID binding fails"). Cc: Alper Gun <alpergun@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: David Rienjes <rientjes@google.com> Cc: Marc Orr <marcorr@google.com> Cc: John Allen <john.allen@amd.com> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vipin Sharma <vipinsh@google.com> Cc: stable@vger.kernel.org Reviewed-by: NMarc Orr <marcorr@google.com> Acked-by: NBrijesh Singh <brijesh.singh@amd.com> Fixes: af43cbbf ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command") Signed-off-by: NMingwei Zhang <mizhang@google.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210912181815.3899316-1-mizhang@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Gonda 提交于
The update-VMSA ioctl touches data stored in struct kvm_vcpu, and therefore should not be performed concurrently with any VCPU ioctl that might cause KVM or the processor to use the same data. Adds vcpu mutex guard to the VMSA updating code. Refactors out __sev_launch_update_vmsa() function to deal with per vCPU parts of sev_launch_update_vmsa(). Fixes: ad73109a ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Signed-off-by: NPeter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20210915171755.3773766-1-pgonda@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yu Zhang 提交于
"VMXON pointer" is saved in vmx->nested.vmxon_ptr since commit 3573e22c ("KVM: nVMX: additional checks on vmxon region"). Also, handle_vmptrld() & handle_vmclear() now have logic to check the VMCS pointer against the VMXON pointer. So just remove the obsolete comments of handle_vmon(). Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haimin Zhang 提交于
Check the return of init_srcu_struct(), which can fail due to OOM, when initializing the page track mechanism. Lack of checking leads to a NULL pointer deref found by a modified syzkaller. Reported-by: NTCS Robot <tcs_robot@tencent.com> Signed-off-by: NHaimin Zhang <tcs_kernel@tencent.com> Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com> [Move the call towards the beginning of kvm_arch_init_vm. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are both defunct now that KVM keeps the list constant and instead explicitly tracks which entries need to be loaded into hardware. No functional change intended. Fixes: ee9d22e0 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210908002401.1947049-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Explicitly zero the guest's CR3 and mark it available+dirty at RESET/INIT. Per Intel's SDM and AMD's APM, CR3 is zeroed at both RESET and INIT. For RESET, this is a nop as vcpu is zero-allocated. For INIT, the bug has likely escaped notice because no firmware/kernel puts its page tables root at PA=0, let alone relies on INIT to get the desired CR3 for such page tables. Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210921000303.400537-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Mark all registers as available and dirty at vCPU creation, as the vCPU has obviously not been loaded into hardware, let alone been given the chance to be modified in hardware. On SVM, reading from "uninitialized" hardware is a non-issue as VMCBs are zero allocated (thus not truly uninitialized) and hardware does not allow for arbitrary field encoding schemes. On VMX, backing memory for VMCSes is also zero allocated, but true initialization of the VMCS _technically_ requires VMWRITEs, as the VMX architectural specification technically allows CPU implementations to encode fields with arbitrary schemes. E.g. a CPU could theoretically store the inverted value of every field, which would result in VMREAD to a zero-allocated field returns all ones. In practice, only the AR_BYTES fields are known to be manipulated by hardware during VMREAD/VMREAD; no known hardware or VMM (for nested VMX) does fancy encoding of cacheable field values (CR0, CR3, CR4, etc...). In other words, this is technically a bug fix, but practically speakings it's a glorified nop. Failure to mark registers as available has been a lurking bug for quite some time. The original register caching supported only GPRs (+RIP, which is kinda sorta a GPR), with the masks initialized at ->vcpu_reset(). That worked because the two cacheable registers, RIP and RSP, are generally speaking not read as side effects in other flows. Arguably, commit aff48baa ("KVM: Fetch guest cr3 from hardware on demand") was the first instance of failure to mark regs available. While _just_ marking CR3 available during vCPU creation wouldn't have fixed the VMREAD from an uninitialized VMCS bug because ept_update_paging_mode_cr0() unconditionally read vmcs.GUEST_CR3, marking CR3 _and_ intentionally not reading GUEST_CR3 when it's available would have avoided VMREAD to a technically-uninitialized VMCS. Fixes: aff48baa ("KVM: Fetch guest cr3 from hardware on demand") Fixes: 6de4f3ad ("KVM: Cache pdptrs") Fixes: 6de12732 ("KVM: VMX: Optimize vmx_get_rflags()") Fixes: 2fb92db1 ("KVM: VMX: Cache vmcs segment fields") Fixes: bd31fe49 ("KVM: VMX: Add proper cache tracking for CR0") Fixes: f98c1e77 ("KVM: VMX: Add proper cache tracking for CR4") Fixes: 5addc235 ("KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags") Fixes: 87915858 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210921000303.400537-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 9月, 2021 8 次提交
-
-
由 Zelin Deng 提交于
When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature especially there's a big tsc warp (like a new vCPU is hot-added into VM which has been up for a long time), tsc_offset is added by a large value then go back to guest. This causes system time jump as tsc_timestamp is not adjusted in the meantime and pvclock monotonic character. To fix this, just notify kvm to update vCPU's guest time before back to guest. Cc: stable@vger.kernel.org Signed-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
It is reasonable for these functions to be used only in some configurations, for example only if the host is 64-bits (and therefore supports 64-bit guests). It is also reasonable to keep the role_regs and role accessors in sync even though some of the accessors may be used only for one of the two sets (as is the case currently for CR4.LA57).. Because clang reports warnings for unused inlines declared in a .c file, mark both sets of accessors as __maybe_unused. Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the first cache line, of kvm_mmu_page so that "spt" and to a lesser extent "gfns" land in the first cache line. "lpage_disallowed_link" is accessed relatively infrequently compared to "spt", which is accessed any time KVM is walking and/or manipulating the shadow page tables. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210901221023.1303578-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move "tdp_mmu_page" into the 1-byte void left by the recently removed "mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page, i.e. in the same cache line as the most commonly accessed fields. Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in 32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch can always wrap the field in the unlikely event KVM gains a 1-byte flag that is 32-bit specific. Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due to it previously sharing an 8-byte chunk with write_flooding_count. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210901221023.1303578-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Revert a misguided illegal GPA check when "translating" a non-nested GPA. The check is woefully incomplete as it does not fill in @exception as expected by all callers, which leads to KVM attempting to inject a bogus exception, potentially exposing kernel stack information in the process. WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0 RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 Call Trace: x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853 kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199 handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336 __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline] vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038 vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712 vcpu_run arch/x86/kvm/x86.c:9779 [inline] kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010 kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652 The bug has escaped notice because practically speaking the GPA check is useless. The GPA check in question only comes into play when KVM is walking guest page tables (or "translating" CR3), and KVM already handles illegal GPA checks by setting reserved bits in rsvd_bits_mask for each PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an illegal CR3. This particular failure doesn't hit the existing reserved bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging doesn't define any reserved PA bits checks, which KVM emulates by only incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32. Simply remove the bogus check. There is zero meaningful value and no architectural justification for supporting guest.MAXPHYADDR < 32, and properly filling the exception would introduce non-trivial complexity. This reverts commit ec7771ab. Fixes: ec7771ab ("KVM: x86: mmu: Add guest physical address check in translate_gpa()") Cc: stable@vger.kernel.org Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210831164224.1119728-2-seanjc@google.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jia He 提交于
After reverting and restoring the fast tlb invalidation patch series, the mmio_cached is not removed. Hence a unused field is left in kvm_mmu_page. Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: NJia He <justin.he@arm.com> Message-Id: <20210830145336.27183-1-justin.he@arm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
If we are emulating an invalid guest state, we don't have a correct exit reason, and thus we shouldn't do anything in this function. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18) Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Include pml5_root in the set of special roots if and only if the host, and thus NPT, is using 5-level paging. mmu_alloc_special_roots() expects special roots to be allocated as a bundle, i.e. they're either all valid or all NULL. But for pml5_root, that expectation only holds true if the host uses 5-level paging, which causes KVM to WARN about pml5_root being NULL when the other special roots are valid. The silver lining of 4-level vs. 5-level NPT being tied to the host kernel's paging level is that KVM's shadow root level is constant; unlike VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR. That means KVM can still expect pml5_root to be bundled with the other special roots, it just needs to be conditioned on the shadow root level. Fixes: cb0f722a ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host") Reported-by: NMaxim Levitsky <mlevitsk@redhat.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210824005824.205536-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 8月, 2021 23 次提交
-
-
由 Wei Huang 提交于
When the 5-level page table is enabled on host OS, the nested page table for guest VMs must use 5-level as well. Update get_npt_level() function to reflect this requirement. In the meanwhile, remove the code that prevents kvm-amd driver from being loaded when 5-level page table is detected. Signed-off-by: NWei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-4-wei.huang2@amd.com> [Tweak condition as suggested by Sean. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wei Huang 提交于
When the 5-level page table CPU flag is set in the host, but the guest has CR4.LA57=0 (including the case of a 32-bit guest), the top level of the shadow NPT page tables will be fixed, consisting of one pointer to a lower-level table and 511 non-present entries. Extend the existing code that creates the fixed PML4 or PDP table, to provide a fixed PML5 table if needed. This is not needed on EPT because the number of layers in the tables is specified in the EPTP instead of depending on the host CR4. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-3-wei.huang2@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wei Huang 提交于
AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set. To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force a fixed TDP level. Signed-off-by: NWei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This change started as a way to make kvm_mmu_hugepage_adjust a bit simpler, but it does fix two bugs as well. One bug is in zapping collapsible PTEs. If a large page size is disallowed but not all of them, kvm_mmu_max_mapping_level will return the host mapping level and the small PTEs will be zapped up to that level. However, if e.g. 1GB are prohibited, we can still zap 4KB mapping and preserve the 2MB ones. This can happen for example when NX huge pages are in use. The second would happen when userspace backs guest memory with a 1gb hugepage but only assign a subset of the page to the guest. 1gb pages would be disallowed by the memslot, but not 2mb. kvm_mmu_max_mapping_level() would fall through to the host_pfn_mapping_level() logic, see the 1gb hugepage, and map the whole thing into the guest. Fixes: 2f57b705 ("KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level") Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts while running. This change is mostly intended for more robust single stepping of the guest and it has the following benefits when enabled: * Resuming from a breakpoint is much more reliable. When resuming execution from a breakpoint, with interrupts enabled, more often than not, KVM would inject an interrupt and make the CPU jump immediately to the interrupt handler and eventually return to the breakpoint, to trigger it again. From the user point of view it looks like the CPU never executed a single instruction and in some cases that can even prevent forward progress, for example, when the breakpoint is placed by an automated script (e.g lx-symbols), which does something in response to the breakpoint and then continues the guest automatically. If the script execution takes enough time for another interrupt to arrive, the guest will be stuck on the same breakpoint RIP forever. * Normal single stepping is much more predictable, since it won't land the debugger into an interrupt handler. * RFLAGS.TF has less chance to be leaked to the guest: We set that flag behind the guest's back to do single stepping but if single step lands us into an interrupt/exception handler it will be leaked to the guest in the form of being pushed to the stack. This doesn't completely eliminate this problem as exceptions can still happen, but at least this reduces the chances of this happening. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
Split the check for having a vmexit handler to svm_check_exit_valid, and make svm_handle_invalid_exit only handle a vmexit that is already not valid. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210811122927.900604-2-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Drop @shared from tdp_mmu_link_page() and hardcode it to work for mmu_lock being held for read. The helper has exactly one caller and in all likelihood will only ever have exactly one caller. Even if KVM adds a path to install translations without an initiating page fault, odds are very, very good that the path will just be a wrapper to the "page fault" handler (both SNP and TDX RFCs propose patches to do exactly that). No functional change intended. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210810224554.2978735-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Mingwei Zhang 提交于
Existing KVM code tracks the number of large pages regardless of their sizes. Therefore, when large page of 1GB (or larger) is adopted, the information becomes less useful because lpages counts a mix of 1G and 2M pages. So remove the lpages since it is easy for user space to aggregate the info. Instead, provide a comprehensive page stats of all sizes from 4K to 512G. Suggested-by: NBen Gardon <bgardon@google.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NBen Gardon <bgardon@google.com> Signed-off-by: NMingwei Zhang <mizhang@google.com> Cc: Jing Zhang <jingzhangos@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Sean Christopherson <seanjc@google.com> Message-Id: <20210803044607.599629-4-mizhang@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Factor in whether or not the old/new SPTEs are shadow-present when adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can toggle the page size bit, as bit 7 is used to store the MMIO generation, i.e. is_large_pte() can get a false positive when called on a MMIO SPTE. Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic value. Opportunistically move the logic below the check to verify at least one of the old/new SPTEs is shadow present. Use is/was_leaf even though is/was_present would suffice. The code generation is roughly equivalent since all flags need to be computed prior to the code in question, and using the *_leaf flags will minimize the diff in a future enhancement to account all pages, i.e. will change the check to "is_leaf != was_leaf". Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NBen Gardon <bgardon@google.com> Fixes: 1699f65c ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMingwei Zhang <mizhang@google.com> Message-Id: <20210803044607.599629-3-mizhang@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Mingwei Zhang 提交于
Drop an unnecessary is_shadow_present_pte() check when updating the rmaps after installing a non-MMIO SPTE. set_spte() is used only to create shadow-present SPTEs, e.g. MMIO SPTEs are handled early on, mmu_set_spte() runs with mmu_lock held for write, i.e. the SPTE can't be zapped between writing the SPTE and updating the rmaps. Opportunistically combine the "new SPTE" logic for large pages and rmaps. No functional change intended. Suggested-by: NBen Gardon <bgardon@google.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NBen Gardon <bgardon@google.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMingwei Zhang <mizhang@google.com> Message-Id: <20210803044607.599629-2-mizhang@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jing Zhang 提交于
Add new types of KVM stats, linear and logarithmic histogram. Histogram are very useful for observing the value distribution of time or size related stats. Signed-off-by: NJing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-2-jingzhangos@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
APIC base relocation is not supported anyway and won't work correctly so just drop the code that handles it and keep AVIC MMIO bar at the default APIC base. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-17-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
Currently it is possible to have the following scenario: 1. AVIC is disabled by svm_refresh_apicv_exec_ctrl 2. svm_vcpu_blocking calls avic_vcpu_put which does nothing 3. svm_vcpu_unblocking enables the AVIC (due to KVM_REQ_APICV_UPDATE) and then calls avic_vcpu_load 4. warning is triggered in avic_vcpu_load since AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK was never cleared While it is possible to just remove the warning, it seems to be more robust to fully disable/enable AVIC in svm_refresh_apicv_exec_ctrl by calling the avic_vcpu_load/avic_vcpu_put Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-16-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
No functional change intended. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-15-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
Since AVIC can be inhibited and uninhibited rapidly it is possible that we have nothing to do by the time the svm_refresh_apicv_exec_ctrl is called. Detect and avoid this, which will be useful when we will start calling avic_vcpu_load/avic_vcpu_put when the avic inhibition state changes. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-14-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
Now that kvm_request_apicv_update doesn't need to drop the kvm->srcu lock, we can call kvm_request_apicv_update directly. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20210810205251.424103-13-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is, however, possible to track whether the feature was actually used by the guest and only inhibit APICv/AVIC when needed. TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let Windows know that AutoEOI feature should be avoided. While it's up to KVM userspace to set the flag, KVM can help a bit by exposing global APICv/AVIC enablement. Maxim: - always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid, since this feature can be used regardless of AVIC Paolo: - use arch.apicv_update_lock to protect the hv->synic_auto_eoi_used instead of atomic ops Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-12-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
It is never a good idea to enter a guest on a vCPU when the AVIC inhibition state doesn't match the enablement of the AVIC on the vCPU. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-11-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
Currently on SVM, the kvm_request_apicv_update toggles the APICv memslot without doing any synchronization. If there is a mismatch between that memslot state and the AVIC state, on one of the vCPUs, an APIC mmio access can be lost: For example: VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT VCPU1: access an APIC mmio register. Since AVIC is still disabled on VCPU1, the access will not be intercepted by it, and neither will it cause MMIO fault, but rather it will just be read/written from/to the dummy page mapped into the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT. Fix that by adding a lock guarding the AVIC state changes, and carefully order the operations of kvm_request_apicv_update to avoid this race: 1. Take the lock 2. Send KVM_REQ_APICV_UPDATE 3. Update the apic inhibit reason 4. Release the lock This ensures that at (2) all vCPUs are kicked out of the guest mode, but don't yet see the new avic state. Then only after (4) all other vCPUs can update their AVIC state and resume. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-10-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
Thanks to the former patches, it is now possible to keep the APICv memslot always enabled, and it will be invisible to the guest when it is inhibited This code is based on a suggestion from Sean Christopherson: https://lkml.org/lkml/2021/7/19/2970Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-9-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
on AMD, APIC virtualization needs to dynamicaly inhibit the AVIC in a response to some events, and this is problematic and not efficient to do by enabling/disabling the memslot that covers APIC's mmio range. Plus due to SRCU locking, it makes it more complex to request AVIC inhibition. Instead, the APIC memslot will be always enabled, but be invisible to the guest, such as the MMU code will not install a SPTE for it, when it is inhibited and instead jump straight to emulating the access. When inhibiting the AVIC, this SPTE will be zapped. This code is based on a suggestion from Sean Christopherson: https://lkml.org/lkml/2021/7/19/2970Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-8-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
This will allow it to return RET_PF_EMULATE for APIC mmio emulation. This code is based on a patch from Sean Christopherson: https://lkml.org/lkml/2021/7/19/2970Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20210810205251.424103-7-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
try_async_pf is a wrong name for this function, since this code is used when asynchronous page fault is not enabled as well. This code is based on a patch from Sean Christopherson: https://lkml.org/lkml/2021/7/19/2970Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20210810205251.424103-6-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-