- 14 April 2022, 1 commit
Committed by Like Xu

The header lapic.h is included more than once; remove the duplicate include.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220406063715.55625-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 April 2022, 2 commits
Committed by Vitaly Kuznetsov

The following WARN is triggered from kvm_vm_ioctl_set_clock():

  WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm]
  ...
  CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G W O 5.16.0.stable #20
  Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021
  RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm]
  ...
  Call Trace:
   <TASK>
   ? kvm_write_guest+0x114/0x120 [kvm]
   kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm]
   kvm_arch_vm_ioctl+0xa26/0xc50 [kvm]
   ? schedule+0x4e/0xc0
   ? __cond_resched+0x1a/0x50
   ? futex_wait+0x166/0x250
   ? __send_signal+0x1f1/0x3d0
   kvm_vm_ioctl+0x747/0xda0 [kvm]
   ...

The WARN was introduced by commit 03c0304a86bc ("KVM: Warn if mark_page_dirty() is called without an active vCPU"), and that change seems to be correct (unlike the Hyper-V TSC page update mechanism). In fact, there is no real need to actually write to guest memory to invalidate the TSC page; this can be done by the first vCPU that goes through kvm_guest_time_update().

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220407201013.963226-1-vkuznets@redhat.com>
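A minimal sketch of the deferred approach the commit describes; the status enum and lock exist in KVM's Hyper-V code, but treat the exact shape below as an assumption:

```c
/* Sketch: instead of writing guest memory from a vCPU-less ioctl (which
 * trips the WARN), only record that the TSC page must be refreshed; the
 * first vCPU through kvm_guest_time_update() then performs the write. */
void kvm_hv_request_tsc_page_update(struct kvm *kvm)
{
	struct kvm_hv *hv = &kvm->arch.hyperv;	/* field path assumed */

	mutex_lock(&hv->hv_lock);
	if (hv->hv_tsc_page_status == HV_TSC_PAGE_SET)
		hv->hv_tsc_page_status = HV_TSC_PAGE_HOST_CHANGED;
	mutex_unlock(&hv->hv_lock);
}
```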
-
Committed by Suravee Suthikulpanit

Since the current AVIC implementation cannot support encrypted memory, inhibit AVIC for SEV-enabled guests.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220408133710.54275-1-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 05 April 2022, 3 commits
Committed by Lv Ruyi

destroy_workqueue() already completes all work pending at the time of the call, so there is no need to flush the workqueue explicitly first.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220401083530.2407703-1-lv.ruyi@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
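In pattern form (a generic sketch, not the exact KVM call site):

```c
/* Before: the explicit flush is redundant. */
flush_workqueue(wq);
destroy_workqueue(wq);

/* After: destroy_workqueue() drains all pending work itself. */
destroy_workqueue(wq);
```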
-
Committed by Sean Christopherson

Resolve nx_huge_pages to true/false when kvm.ko is loaded; leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'.

Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user.

In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation.

Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows.

  =========================================================================
  UBSAN: invalid-load in kernel/params.c:320:33
  load of value 255 is not a valid value for type '_Bool'
  CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   ubsan_epilogue+0x5/0x40
   __ubsan_handle_load_invalid_value.cold+0x43/0x48
   param_get_bool.cold+0xf/0x14
   param_attr_show+0x55/0x80
   module_attr_show+0x1c/0x30
   sysfs_kf_seq_show+0x93/0xc0
   seq_read_iter+0x11c/0x450
   new_sync_read+0x11b/0x1a0
   vfs_read+0xf0/0x190
   ksys_read+0x5f/0xe0
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>
  =========================================================================

Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220331221359.3912754-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
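A minimal sketch of the resolution step; the helper names follow the commit's description, but treat the details as assumptions:

```c
/* The param is an int so it can hold the -1 "auto" sentinel. */
static int __read_mostly nx_huge_pages = -1;

static bool get_nx_auto_mode(void)
{
	/* "Auto" enables the mitigation only on vulnerable hosts. */
	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
}

/* Runs when kvm.ko loads, i.e. before any vendor module: resolve the
 * sentinel so param_get_bool() only ever sees 0 or 1. */
void __init kvm_mmu_x86_module_init(void)
{
	if (nx_huge_pages == -1)
		nx_huge_pages = get_nx_auto_mode();
}
```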
-
Committed by Peter Gonda

Add a reschedule point to avoid warnings from sev_clflush_pages() when flushing a large number of pages.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20220330164306.2376085-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
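A sketch of the pattern, reduced to just the flush loop:

```c
/* Flush guest pages one at a time, offering to reschedule between
 * pages so that flushing a huge region can't hog the CPU and trigger
 * soft-lockup warnings. */
static void sev_clflush_pages(struct page *pages[], unsigned long npages)
{
	uint8_t *page_virtual;
	unsigned long i;

	for (i = 0; i < npages; i++) {
		page_virtual = kmap_atomic(pages[i]);
		clflush_cache_range(page_virtual, PAGE_SIZE);
		kunmap_atomic(page_virtual);
		cond_resched();	/* the added reschedule point */
	}
}
```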
-
- 02 April 2022, 34 commits
Committed by Hou Wenlong

Before commit c3e5e415 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed"), the return value of kvm_sync_page() indicated whether the page was synced, and kvm_mmu_get_page() would rebuild the page when the sync failed. But now, kvm_sync_page() returns false when the page is synced and no TLB flushing is required, which leads kvm_mmu_get_page() to rebuild the page. So return the value of mmu->sync_page() directly and check it in kvm_mmu_get_page(). If the sync fails, the page will be zapped and the invalid_list will not be empty, so setting flush to true is acceptable in mmu_sync_children().

Cc: stable@vger.kernel.org
Fixes: c3e5e415 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>
Message-Id: <0dabeeb789f57b0d793f85d073893063e692032d.1647336064.git.houwenlong.hwl@antgroup.com>
[mmu_sync_children should not flush if the page is zapped. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jon Kohler

kvm_load_{guest|host}_xsave_state handles XSAVE state on VM entry and exit, part of which is managing memory protection key state. The latest arch.pkru is updated with a rdpkru, and if that doesn't match the base host_pkru (which is the case about 70% of the time), we issue a __write_pkru.

To improve performance, implement the following optimizations:

1. Reorder the conditions prior to wrpkru in both kvm_load_{guest|host}_xsave_state. Flip the ordering of the || condition so that XFEATURE_MASK_PKRU is checked first, which when instrumented in our environment appeared to be always true and less overall work than kvm_read_cr4_bits. For kvm_load_guest_xsave_state, hoist arch.pkru != host_pkru ahead one position. When instrumented, I saw this be true roughly ~70% of the time vs the other conditions, which were almost always true. With this change, we will avoid the third condition check ~30% of the time.

2. Wrap the PKU sections with CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS, so that if the user compiles out this feature, we do not have these branches at all.

Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-Id: <20220324004439.6709-1-jon@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
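Roughly, the reordered guest-load path looks like this; condition order per the description above, but treat it as a sketch rather than the literal diff:

```c
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS	/* optimization 2 */
	if (cpu_feature_enabled(X86_FEATURE_PKU) &&
	    vcpu->arch.pkru != vcpu->arch.host_pkru &&	/* ~70% true: test early */
	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||	/* cheap, almost always true */
	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))	/* most expensive: last */
		write_pkru(vcpu->arch.pkru);
#endif
```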
-
Committed by Maxim Levitsky

Inhibit the AVIC of the vCPU that is running nested for the duration of the nested run, so that all interrupts arriving from both its vCPU siblings and from KVM are delivered using normal IPIs and cause that vCPU to vmexit.

Note that unlike normal AVIC inhibition, there is no need to update the AVIC mmio memslot, because the nested guest uses its own set of paging tables. That also means that AVIC doesn't need to be inhibited VM-wide.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

Add an optional callback .vcpu_get_apicv_inhibit_reasons that returns extra inhibit reasons which prevent APICv from working on a given vCPU.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

In case L1 enables vGIF for L2, L2 cannot affect L1's GIF, regardless of STGI/CLGI intercepts, and since VM entry enables GIF, this means that L1's GIF is always 1 while L2 is running. Thus, in this case, keep L1's vGIF in vmcb01 while letting L2 control its own vGIF, thereby implementing nested vGIF. Also allow KVM to toggle L1's GIF during nested entry/exit by always using vmcb01.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

Expose the pause filtering and threshold in the guest CPUID and support PAUSE filtering when possible:

- If L0 doesn't intercept PAUSE (cpu_pm=on), then allow L1 to have full control over PAUSE filtering.
- If L1 doesn't intercept PAUSE, use host values and update the adaptive count/threshold even when running nested.
- Otherwise always exit to L1; it is not really possible to merge the fields correctly. It is expected that in this case, userspace will not enable this feature in the guest CPUID, to avoid having the guest update both fields pointlessly.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

This was tested with a kvm-unit-test that was developed for this purpose.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-3-mlevitsk@redhat.com>
[Copy all of DEBUGCTL except for reserved bits. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

When L2 is running without LBR virtualization, we should ensure that L1's LBR MSRs continue to update as usual.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

KVM always uses vGIF when allowed, so there is no need to query the current vmcb for it.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

This makes the code a bit shorter and cleaner. No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

Clarify that this function is not used to initialize any part of the vmcb02. No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the host TSC is constant, in which case the actual TSC frequency will never change and thus capturing the "max" TSC during initialization is unnecessary: KVM can simply use tsc_khz during VM creation.

On CPUs with a constant TSC, but not a hardware-specified TSC frequency, snapshotting max_tsc_khz and using that to set a VM's default TSC frequency can lead to KVM thinking it needs to manually scale the guest's TSC if refining the TSC completes after KVM snapshots tsc_khz. The actual frequency never changes, only the kernel's calculation of what that frequency is changes. On systems without hardware TSC scaling, this either puts KVM into "always catchup" mode (extremely inefficient), or prevents creating VMs altogether.

Ideally, KVM would not be able to race with TSC refinement, or would have a hook into tsc_refine_calibration_work() to get an alert when refinement is complete. Avoiding the race altogether isn't practical as refinement takes a relative eternity; it's deliberately put on a work queue outside of the normal boot sequence to avoid unnecessarily delaying boot. Adding a hook is doable, but somewhat gross due to KVM's ability to be built as a module. And if the TSC is constant, which is likely the case for every VMX/SVM-capable CPU produced in the last decade, the race can be hit if and only if userspace is able to create a VM before TSC refinement completes; refinement is slow, but not that slow.

For now, punt on a proper fix, as not taking a snapshot can help some use cases and not taking a snapshot is arguably correct irrespective of the race with refinement.

[ dwmw2: Rebase on top of KVM-wide default_tsc_khz to ensure that all vCPUs get the same frequency even if we hit the race. ]

Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Anton Romanov <romanton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220225145304.36166-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse

This sets the default TSC frequency for subsequently created vCPUs.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220225145304.36166-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
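A hypothetical userspace usage sketch, assuming (per this change) that KVM_SET_TSC_KHZ is now also accepted on a VM file descriptor:

```c
/* Set the VM-wide default TSC frequency once, before creating vCPUs;
 * every subsequently created vCPU inherits it. 2500000 kHz = 2.5 GHz
 * is an arbitrary example value. */
if (ioctl(vm_fd, KVM_SET_TSC_KHZ, 2500000) < 0)
	perror("KVM_SET_TSC_KHZ (VM)");

int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);	/* inherits 2.5 GHz */
```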
-
Committed by Like Xu

The [clang-analyzer-deadcode.DeadStores] helper reports that the value stored to 'irq' is never read.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220301120217.38092-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Zeng Guang

Currently KVM sets up the posted-interrupt VMCS fields depending only on the per-vCPU APICv activation status at vCPU creation time. However, this status can be toggled dynamically under some circumstances, so enabling posted interrupts later may be problematic if the VMCS was never made ready for them. To fix this, always settle the VMCS settings for posted interrupts as long as APICv is available and the LAPIC is emulated in the kernel.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220315145836.9910-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Boris Ostrovsky

Add support for the SCHEDOP_poll hypercall. This implementation is optimized for polling on a single channel, which is what Linux does. Polling on multiple channels is not especially efficient (and has not been tested). The PV spinlock slow path uses this hypercall, and explicitly crashes if it's not supported.

[ dwmw2: Rework to use kvm_vcpu_halt(); not supported for 32-bit guests ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-17-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
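For reference, the guest side of this hypercall (the structure is from Xen's public sched.h ABI) and its single-port use, sketched after the way Linux's event polling path does it; the BUG() is the "explicit crash" mentioned above:

```c
/* Xen ABI: poll a set of event channel ports until one is pending or
 * the timeout (absolute system time, ns) expires. */
struct sched_poll {
	GUEST_HANDLE(evtchn_port_t) ports;
	unsigned int nr_ports;
	uint64_t timeout;
};

/* Single-channel polling, as the PV spinlock slow path does it. */
static void poll_one_port(evtchn_port_t evtchn, u64 timeout)
{
	struct sched_poll poll;

	poll.nr_ports = 1;
	poll.timeout = timeout;
	set_xen_guest_handle(poll.ports, &evtchn);

	if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
		BUG();	/* guest crashes if SCHEDOP_poll is unsupported */
}
```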
-
Committed by David Woodhouse

At the end of the patch series adding this batch of event channel acceleration features, finally add the feature bit which advertises them, and document it all.

For SCHEDOP_poll we need to wake a polling vCPU when a given port is triggered, even when it's masked, and we want to implement that in the kernel for efficiency. So we want the kernel to know that it has sole ownership of event channel delivery. Thus, we allow userspace to make that 'promise' by setting the corresponding feature bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll bypass later, we will do so only if that promise has been made by userspace.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-16-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse

Windows uses a per-vCPU vector, delivered via the local APIC basically like an MSI (with an associated EOI), unlike the traditional guest-wide vector which is just magically asserted by Xen (and, in the KVM case, by kvm_xen_has_interrupt() / kvm_cpu_get_extint()). Now that the kernel is able to raise event channel events for itself, being able to do so for Windows guests is also going to be useful.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-15-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse

It turns out this is a fast path for PV guests, because they use it to trigger the event channel upcall. So letting it bounce all the way up to userspace is not great.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-14-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Joao Martins

If the guest has offloaded the timer virq, handle the following hypercalls for programming the timer:

    VCPUOP_set_singleshot_timer
    VCPUOP_stop_singleshot_timer
    set_timer_op(timestamp_ns)

The event channel corresponding to the timer virq is then used to inject events once timer deadlines are met. For now we back the PV timer with an hrtimer (see the sketch below).

[ dwmw2: Add save/restore, 32-bit compat mode, immediate delivery, don't check timer in kvm_vcpu_has_event() ]

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-13-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
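A sketch of the hrtimer backing; the arch.xen.timer field and helper names are assumptions based on the description:

```c
/* Fires when the guest's singleshot deadline passes: note the pending
 * timer event and kick the vCPU so it injects the timer virq. */
static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer)
{
	struct kvm_vcpu *vcpu = container_of(timer, struct kvm_vcpu,
					     arch.xen.timer);

	atomic_inc(&vcpu->arch.xen.timer_pending);
	kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
	kvm_vcpu_kick(vcpu);
	return HRTIMER_NORESTART;
}

static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, s64 delta_ns)
{
	hrtimer_start(&vcpu->arch.xen.timer,
		      ktime_add_ns(ktime_get(), delta_ns),
		      HRTIMER_MODE_ABS_HARD);
}
```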
-
Committed by David Woodhouse

In order to intercept hypercalls such as VCPUOP_set_singleshot_timer, we need to be aware of the Xen CPU numbering. This looks a lot like the Hyper-V handling of vpidx, for obvious reasons.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-12-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Joao Martins

Cooperative Linux guests may, after an IPI-many, yield the vCPU if any of the IPI'd vCPUs were preempted (i.e. their runstate is 'runnable'). Support SCHEDOP_yield to handle such yields.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-11-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Joao Martins

Userspace registers a sending @port to either deliver to an @eventfd or directly back to a local event channel port. After binding events, the guest or host may wish to bind those events to a particular vCPU; this is usually done for unbound and interdomain events. Update requests are handled via the KVM_XEN_EVTCHN_UPDATE flag. Unregistered ports are handled by the emulator.

Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Co-developed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-10-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
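A hypothetical userspace sketch of registering a port; the attribute layout below is assumed from the uapi this series introduces and may not match field-for-field:

```c
/* Route guest event channel port 3 to an eventfd owned by the VMM. */
struct kvm_xen_hvm_attr attr = {
	.type = KVM_XEN_ATTR_TYPE_EVTCHN,
	.u.evtchn = {
		.send_port = 3,
		.type = EVTCHNSTAT_interdomain,	/* Xen ABI constant */
		.deliver.eventfd = { .port = 0, .fd = efd },
	},
};

if (ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &attr) < 0)
	perror("KVM_XEN_HVM_SET_ATTR");
/* To rebind later, repeat with KVM_XEN_EVTCHN_UPDATE in .u.evtchn.flags. */
```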
-
Committed by David Woodhouse

This adds a KVM_XEN_HVM_EVTCHN_SEND ioctl which allows direct injection of events given an explicit { vcpu, port, priority } in precisely the same form that those fields are given in the IRQ routing table.

Userspace is currently able to inject 2-level events purely by setting the bits in the shared_info and vcpu_info, but FIFO event channels are harder to deal with; we will need the kernel to take sole ownership of delivery when we support those. A patch advertising this feature with a new bit in the KVM_CAP_XEN_HVM ioctl will be added in a subsequent patch.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-9-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
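A hypothetical userspace usage sketch; the payload type and priority constant come from KVM's IRQ routing uapi, since the text says the ioctl reuses that form:

```c
/* Inject 2-level event channel port 3 on vCPU 0 directly from the VMM. */
struct kvm_irq_routing_xen_evtchn evt = {
	.port = 3,	/* example port number */
	.vcpu = 0,
	.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL,
};

if (ioctl(vm_fd, KVM_XEN_HVM_EVTCHN_SEND, &evt) < 0)
	perror("KVM_XEN_HVM_EVTCHN_SEND");
```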
-
Committed by David Woodhouse

Clean it up to return -errno on error consistently, while still being compatible with the return conventions for kvm_arch_set_irq_inatomic() and the kvm_set_irq() callback. We use -ENOTCONN to indicate when the port is masked; no existing users care, except that it's negative.

Also allow it to optimize the vCPU lookup. Unless we abuse the LAPIC map, there is no quick lookup from APIC ID to a vCPU; the logic in kvm_get_vcpu_by_id() will just iterate over all vCPUs till it finds the one it wants. So do that just once and stash the result in the struct kvm_xen_evtchn for next time.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-8-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
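The stash-and-revalidate pattern described above, in sketch form (field names assumed):

```c
	/* Try the cached vCPU index first; fall back to the O(n) search by
	 * Xen vCPU ID only on a miss, then cache the result for next time. */
	vcpu = kvm_get_vcpu(kvm, READ_ONCE(xe->vcpu_idx));
	if (!vcpu || vcpu->vcpu_id != xe->vcpu_id) {
		vcpu = kvm_get_vcpu_by_id(kvm, xe->vcpu_id);	/* slow path */
		if (!vcpu)
			return -EINVAL;
		WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx);	/* stash */
	}
```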
-
Committed by David Woodhouse

This switches the final pvclock to kvm_setup_pvclock_pfncache() and now the old kvm_setup_pvclock_page() can be removed.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-7-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse

Currently, the fast path of kvm_xen_set_evtchn_fast() doesn't set the index bits in the target vCPU's evtchn_pending_sel, because it only has a userspace virtual address with which to do so. It just sets them in the kernel, and kvm_xen_has_interrupt() then completes the delivery to the actual vcpu_info structure when the vCPU runs.

Using a gfn_to_pfn_cache allows kvm_xen_set_evtchn_fast() to do the full delivery in the common case. Clean up the fallback case too, by moving the deferred delivery out into a separate kvm_xen_inject_pending_events() function which isn't ever called in atomic contexts as __kvm_xen_has_interrupt() is.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse

Add a new kvm_setup_guest_pvclock() which parallels the existing kvm_setup_pvclock_page(). The latter will be removed once we convert all users to the gfn_to_pfn_cache version. Using the new cache, we could potentially let kvm_set_guest_paused() set the PVCLOCK_GUEST_STOPPED bit directly rather than having to delegate to the vCPU via KVM_REQ_CLOCK_UPDATE. But not yet.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

Use a dummy, unused vmexit reason to mark the 'VM exit' that happens when KVM exits to handle SMM, which is not a real VM exit. This makes it a bit easier to read the KVM trace, and avoids other potential problems due to a stale vmexit reason in the vmcb. If SVM_EXIT_SW somehow reaches svm_invoke_exit_handler(), svm_check_exit_valid() will return false and a WARN will be logged.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220301135526.136554-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

Apparently on some systems AVIC is disabled in CPUID but still usable. Allow the user to override the CPUID if the user is willing to take the risk.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220301143650.143749-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Maxim Levitsky

This was tested by booting L1, L2, and L3 (all Linux) and checking that no VMLOAD/VMSAVE vmexits happened.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220301143650.143749-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Oliver Upton

KVM handles the VMCALL/VMMCALL instructions very strangely. Even though both of these instructions really should #UD when executed on the wrong vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the guest's instruction with the appropriate instruction for the vendor. Nonetheless, older guest kernels without commit c1118b36 ("x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") do not patch in the appropriate instruction using alternatives, likely motivating KVM's intervention.

Add a quirk allowing userspace to opt out of hypercall patching. If the quirk is disabled, KVM synthesizes a #UD in the guest.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220316005538.2282772-2-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
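The shape of the opt-out, simplified; the quirk name is from this series, but the exception plumbing in the real emulator callback is more involved:

```c
	/* If userspace disabled the quirk, don't rewrite the instruction:
	 * synthesize the #UD the guest would see on real hardware. */
	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
		kvm_queue_exception(vcpu, UD_VECTOR);
		return 1;
	}

	/* Otherwise, patch in the vendor's VMCALL/VMMCALL as before. */
```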
-
Committed by Paolo Bonzini

FNAME(cmpxchg_gpte) is an inefficient mess. It is at least decent if it can go through get_user_pages_fast(), but if it cannot then it tries to use memremap(); that is not just terribly slow, it is also wrong, because it assumes that the VM_PFNMAP VMA is contiguous.

The right way to do it would be to do the same thing as hva_to_pfn_remapped() does since commit add6a0cd ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05), using follow_pte() and fixup_user_fault() to determine the correct address to use for memremap(). To do this, one could for example extract hva_to_pfn() for use outside virt/kvm/kvm_main.c. But really there is no reason to do that either, because there is already a perfectly valid address to do the cmpxchg() on; it is just a userspace address. That means doing user_access_begin()/user_access_end() and writing the code in assembly to handle exceptions correctly. Worse, the guest PTE can be 8 bytes even on i686, so there is the extra complication of cmpxchg8b to account for. But at least it is an efficient mess. (Thanks to Linus for suggesting improvements to the inline assembly.)

Reported-by: Qiuhao Li <qiuhao@sysec.org>
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com
Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: bd53cb35 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
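The shape of the fix, greatly simplified; the real patch open-codes the compare-and-exchange (including the i686 cmpxchg8b case) in inline assembly with exception fixups, which the hypothetical CMPXCHG_USER() stands in for here:

```c
static bool FNAME(cmpxchg_gpte)(pt_element_t __user *ptep_user,
				pt_element_t orig_pte, pt_element_t new_pte)
{
	bool ok;

	/* Open a user-access window and cmpxchg the userspace address
	 * directly: no GUP, no memremap(), no contiguity assumption. */
	if (!user_access_begin(ptep_user, sizeof(pt_element_t)))
		return false;
	ok = CMPXCHG_USER(ptep_user, orig_pte, new_pte);	/* hypothetical */
	user_access_end();
	return ok;
}
```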
-