- May 7, 2021 (29 commits)
-
-
Authored by Nicholas Piggin

Commit b1c5356e ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") causes unmap_gfn_range and age_gfn callbacks to only work on the first gfn in the range. It also makes the aging callbacks call into both radix and hash aging functions for radix guests. Fix this.

Add warnings for the single-gfn calls that have been converted to range callbacks, in case they ever receive ranges greater than 1.

Fixes: b1c5356e ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks")
Reported-by: Bharata B Rao <bharata@linux.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210505121509.1470207-1-npiggin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
If probing MSR_TSC_AUX failed, hide RDTSCP and RDPID, and WARN if either feature was reported as supported. In theory, such a scenario should never happen as both Intel and AMD state that MSR_TSC_AUX is available if RDTSCP or RDPID is supported. But, KVM injects #GP on MSR_TSC_AUX accesses if probing failed, faults on WRMSR(MSR_TSC_AUX) may be fatal to the guest (because they happen during early CPU bringup), and KVM itself has effectively misreported RDPID support in the past.

Note, this also has the happy side effect of omitting MSR_TSC_AUX from the list of MSRs that are exposed to userspace if probing the MSR fails.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
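A minimal sketch of the probe-and-hide flow this describes, assuming KVM's kvm_cpu_cap_* helpers and the kvm_probe_user_return_msr() helper from later in this series; the exact call site is an assumption, not the literal patch:

    /*
     * Hypothetical sketch: if MSR_TSC_AUX can't be probed, RDTSCP and
     * RDPID must not be advertised, since emulating them relies on it.
     */
    if (kvm_probe_user_return_msr(MSR_TSC_AUX)) {
            WARN_ON_ONCE(kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
                         kvm_cpu_cap_has(X86_FEATURE_RDPID));
            kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
            kvm_cpu_cap_clear(X86_FEATURE_RDPID);
    }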
-
Authored by Sean Christopherson
Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to the guest CPU model instead of the host CPU behavior. While not strictly necessary to avoid guest breakage, emulating cross-vendor "architecture" will provide consistent behavior for the guest, e.g. WRMSR fault behavior won't change if the vCPU is migrated to a host with divergent behavior.

Note, the "new" kvm_is_supported_user_return_msr() checks do not add new functionality on either SVM or VMX. On SVM, the equivalent was "tsc_aux_uret_slot < 0", and on VMX the check was buried in the vmx_find_uret_msr() call at the find_uret_msr label.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Now that SVM and VMX both probe MSRs before "defining" user return slots for them, consolidate the code for probe+define into common x86 and eliminate the odd behavior of having the vendor code define the slot for a given MSR.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Split out and export the number of configured user return MSRs so that VMX can iterate over the set of MSRs without having to do its own tracking. Keep the list itself internal to x86 so that vendor code still has to go through the "official" APIs to add/modify entries.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Tag TSX_CTRL as not needing to be loaded when RTM isn't supported in the host. Crushing the write mask to '0' has the same effect, but requires more mental gymnastics to understand.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Drop VMX's global list of user return MSRs now that VMX doesn't re-sort said list to isolate "active" MSRs, i.e. now that VMX's list and x86's list have the same MSRs in the same order.

In addition to eliminating the redundant list, this will also allow moving more of the list management into common x86.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Explicitly flag a uret MSR as needing to be loaded into hardware instead of re-sorting the list of "active" MSRs and tracking how many MSRs in total need to be loaded. The only benefit to sorting the list is that the loop to load MSRs during vmx_prepare_switch_to_guest() doesn't need to iterate over all supported uret MSRs, only those that are active. But that is a pointless optimization, as the most common case, running a 64-bit guest, will load the vast majority of MSRs. Not to mention that a single WRMSR is far more expensive than iterating over the list.

Providing a stable list order obviates the need to track a given MSR's "slot" in the per-CPU list of user return MSRs; all lists simply use the same ordering. Future patches will take advantage of the stable order to further simplify the related code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
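As a rough illustration (the field names follow the commit's description, but the exact layout and loop are assumptions, not the literal patch), the flagged approach looks like this:

    /* Sketch: a per-MSR flag replaces re-sorting an "active" prefix. */
    struct vmx_uret_msr_sketch {
            bool load_into_hardware;  /* load before entering the guest */
            u64 data;                 /* guest value */
            u64 mask;                 /* write mask applied on load */
    };

    /* The switch-to-guest loop walks the full, stably ordered list. */
    for (i = 0; i < ARRAY_SIZE(uret_msrs); i++) {
            if (uret_msrs[i].load_into_hardware)
                    kvm_set_user_return_msr(i, uret_msrs[i].data,
                                            uret_msrs[i].mask);
    }

Skipping inactive slots in the loop costs almost nothing next to a WRMSR, which is the commit's point: a stable order is worth more than a shorter loop.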
-
Authored by Sean Christopherson

Configure the list of user return MSRs that are actually supported at module init instead of reprobing the list of possible MSRs every time a vCPU is created. Curating the list on a per-vCPU basis is pointless; KVM is completely hosed if the set of supported MSRs changes after module init, or if the set of MSRs differs per physical CPU. The per-vCPU lists also increase complexity (see __vmx_find_uret_msr()) and create corner cases that _should_ be impossible, but theoretically exist in KVM, e.g. advertising RDTSCP to userspace without actually being able to virtualize RDTSCP if probing MSR_TSC_AUX fails.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Allow userspace to enable RDPID for a guest without also enabling RDTSCP. Aside from checking for RDPID support in the obvious flows, VMX also needs to set ENABLE_RDTSCP=1 when RDPID is exposed.

For the record, there is no known scenario where enabling RDPID without RDTSCP is desirable. But, both AMD and Intel architectures allow for the condition, i.e. this is purely to make KVM more architecturally accurate.

Fixes: 41cd02c6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: stable@vger.kernel.org
Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Probe MSR_TSC_AUX whether or not RDTSCP is supported in the host, and if probing succeeds, load the guest's MSR_TSC_AUX into hardware prior to VMRUN. Because SVM doesn't support interception of RDPID, RDPID cannot be disallowed in the guest (without resorting to binary translation). Leaving the host's MSR_TSC_AUX in hardware would leak the host's value to the guest if RDTSCP is not supported.

Note, there is also a kernel bug that prevents leaking the host's value. The host kernel initializes MSR_TSC_AUX if and only if RDTSCP is supported, even though the vDSO usage consumes MSR_TSC_AUX via RDPID. I.e. if RDTSCP is not supported, there is no host value to leak. But, if/when the host kernel bug is fixed, KVM would start leaking MSR_TSC_AUX in the case where hardware supports RDPID but RDTSCP is unavailable for whatever reason.

Probing MSR_TSC_AUX will also allow consolidating the probe and define logic in common x86, and will make it simpler to condition the existence of MSR_TSC_AUX (from the guest's perspective) on RDTSCP *or* RDPID.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Disable preemption when probing a user return MSR via RDMSR/WRMSR. If the MSR holds a different value per logical CPU, the WRMSR could corrupt the host's value if KVM is preempted between the RDMSR and WRMSR, and then rescheduled on a different CPU.

Opportunistically land the helper in common x86; SVM will use the helper in a future commit.

Fixes: 4be53410 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
Cc: stable@vger.kernel.org
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-6-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
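The helper boils down to a read-then-write-back under preempt_disable(); a sketch close to the shape the commit describes, though the exact final code is an assumption:

    int kvm_probe_user_return_msr(u32 msr)
    {
            u64 val;
            int ret;

            preempt_disable();               /* stay on one CPU for RDMSR+WRMSR */
            ret = rdmsrl_safe(msr, &val);    /* does the MSR exist at all? */
            if (ret)
                    goto out;
            ret = wrmsrl_safe(msr, val);     /* write back this CPU's own value */
    out:
            preempt_enable();
            return ret;
    }

Without the preempt_disable(), the RDMSR could read CPU A's value and the WRMSR could land on CPU B, clobbering B's per-CPU value, which is exactly the corruption the commit fixes.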
-
Authored by Sean Christopherson
Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP. Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept.

Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-5-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest. Note, SVM does not support intercepting RDPID. Unlike VMX's ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.

This is a benign virtualization hole as the host kernel (incorrectly) sets MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX into hardware if RDTSCP is supported in the host, i.e. KVM will not leak the host's MSR_TSC_AUX to the guest. But, when the kernel bug is fixed, KVM will start leaking the host's MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available for whatever reason. This leak will be remedied in a future commit.

Fixes: 46896c73 ("KVM: svm: add support for RDTSCP")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-4-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
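A hedged sketch of the interception logic; the helper name and the exact conditions are assumptions based on the description, not the literal patch:

    /* Sketch: intercept RDTSCP so KVM can inject #UD when the guest lacks it. */
    static void svm_recalc_rdtscp_intercept_sketch(struct kvm_vcpu *vcpu,
                                                   struct vcpu_svm *svm)
    {
            if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
                    return;  /* RDTSCP is not exposed at all */

            if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
                    svm_clr_intercept(svm, INTERCEPT_RDTSCP);
            else
                    svm_set_intercept(svm, INTERCEPT_RDTSCP); /* exit handler injects #UD */
    }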
-
Authored by Sean Christopherson
Do not advertise emulation support for RDPID if RDTSCP is unsupported. RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as both vmx_get_msr() and svm_get_msr() will return an error if the MSR is unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will inject a #UD.

Note, RDPID emulation also relies on RDTSCP being enabled in the guest, but this is a KVM bug and will eventually be fixed.

Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-3-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is unsupported. Despite being enumerated in a separate CPUID flag, RDPID is bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root if ENABLE_RDTSCP is not enabled.

Fixes: 41cd02c6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-2-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
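In vmx_set_cpu_caps()-style terms, the fix amounts to clearing the two capabilities together; a sketch under the assumption that this is where the check lives:

    /*
     * RDPID #UDs in VMX non-root unless ENABLE_RDTSCP is set, so the
     * two features are only supportable as a pair on VMX.
     */
    if (!cpu_has_vmx_rdtscp()) {
            kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
            kvm_cpu_cap_clear(X86_FEATURE_RDPID);
    }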
-
Authored by Maxim Levitsky

While in most cases, when returning to use the VMCB01, the exit reason stored in it will be SVM_EXIT_VMRUN, on the first VM exit after a nested migration this field can contain anything, since the VM entry happened before the migration. Remove this warning to avoid the false positive.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-3-mlevitsk@redhat.com>
Fixes: 9a7de6ec ("KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit()")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky

While usually L1's GIF is set while L2 runs, and migrated nested state is usually loaded after a vCPU reset (which also sets L1's GIF to true), this is not guaranteed.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Siddharth Chandrasekaran
In ioctl KVM_X86_SET_MSR_FILTER, input from user space is validated after a memdup_user(). For invalid inputs we'd memdup and then call kfree unnecessarily. Hoist input validation to avoid kfree altogether.

Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Message-Id: <20210503122111.13775-1-sidcha@amazon.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
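The general pattern is to reject bad input before any allocation happens; a hypothetical sketch where filter_range_is_invalid() stands in for the actual filter checks:

    /* Sketch: validate user input first so the error path needs no kfree(). */
    if (filter_range_is_invalid(&range))      /* hypothetical predicate */
            return -EINVAL;

    bitmap = memdup_user(user_bitmap_ptr, bitmap_size);
    if (IS_ERR(bitmap))
            return PTR_ERR(bitmap);
    /* only success paths reach this point with memory to manage */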
-
Authored by Bill Wendling
Clang's integrated assembler does not allow symbols with non-absolute values to be reassigned. Modify the interrupt entry loop macro to be compatible with IAS by using a label and an offset.

Cc: Jian Cai <caij2003@gmail.com>
Signed-off-by: Bill Wendling <morbo@google.com>
References: https://lore.kernel.org/lkml/20200714233024.1789985-1-caij2003@gmail.com/
Message-Id: <20201211012317.3722214-1-morbo@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov
Eliminate the probably unwanted hole in 'struct kvm_vmx_nested_state_hdr':

Pre-patch:

    struct kvm_vmx_nested_state_hdr {
            __u64 vmxon_pa;                       /*  0  8 */
            __u64 vmcs12_pa;                      /*  8  8 */
            struct {
                    __u16 flags;                  /* 16  2 */
            } smm;                                /* 16  2 */

            /* XXX 2 bytes hole, try to pack */

            __u32 flags;                          /* 20  4 */
            __u64 preemption_timer_deadline;      /* 24  8 */
    };

Post-patch:

    struct kvm_vmx_nested_state_hdr {
            __u64 vmxon_pa;                       /*  0  8 */
            __u64 vmcs12_pa;                      /*  8  8 */
            struct {
                    __u16 flags;                  /* 16  2 */
            } smm;                                /* 16  2 */
            __u16 pad;                            /* 18  2 */
            __u32 flags;                          /* 20  4 */
            __u64 preemption_timer_deadline;      /* 24  8 */
    };

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210503150854.1144255-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

Add a test for the regression introduced by commit f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit"). When an L2->L1 exit is forced immediately after restoring nested state, the KVM_REQ_GET_NESTED_STATE_PAGES request is cleared and VMCS12 changes (e.g. a fresh RIP) are not reflected to eVMCS, so the subsequent nested vCPU run is broken. Utilize NMI injection to do the job.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210505151823.1341678-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

The 'run->exit_reason == KVM_EXIT_SHUTDOWN' check is not ideal, as we may be getting some unexpected exception. Directly check for #UD instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210505151823.1341678-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

When enlightened VMCS is in use and nested state is migrated with vmx_get_nested_state()/vmx_set_nested_state(), KVM can't map the evmcs page right away: the evmcs gpa is not part of 'struct kvm_vmx_nested_state_hdr', and we can't read it from the VP assist page because userspace may decide to restore HV_X64_MSR_VP_ASSIST_PAGE after restoring nested state (and QEMU, for example, does exactly that). To make sure eVMCS is mapped, vmx_set_nested_state() raises the KVM_REQ_GET_NESTED_STATE_PAGES request.

Commit f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to nested_vmx_vmexit() to make sure the MSR permission bitmap is not switched when an immediate exit from L2 to L1 happens right after migration (caused by a pending event, for example). Unfortunately, in the exact same situation we still need to have eVMCS mapped so nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to eVMCS.

As a band-aid, restore nested_get_evmcs_page() when clearing KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far from ideal, as we can't easily propagate possible failures, and even if we could, this is most likely already too late to do so. The whole 'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping eVMCS after migration seems to be fragile, as we diverge too much from the 'native' path where vmptr loading happens on vmx_set_nested_state().

Fixes: f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Siddharth Chandrasekaran

The capability that exposes the new ioctl KVM_X86_SET_MSR_FILTER to userspace is specified incorrectly as the ioctl itself (instead of KVM_CAP_X86_MSR_FILTER). This patch fixes it.

Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
Reviewed-by: Alexander Graf <graf@amazon.de>
Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Message-Id: <20210503120059.9283-1-sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov
Simplify the code by making PV features shutdown happen in one place.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

The crash shutdown handler only disables kvmclock and steal time; other PV features remain active, so we risk corrupting memory or getting some side-effects in the kdump kernel. Move the crash handler to kvm.c and unify it with CPU offline.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

Currently, we disable kvmclock from the machine_shutdown() hook, and this only happens for the boot CPU. We need to disable it for all CPUs to guard against memory corruption, e.g. on restore from hibernate.

Note, writing '0' to the kvmclock MSR doesn't clear the memory location, it just prevents the hypervisor from updating the location. So for a short while after the write, and while the CPU is still alive, the clock remains usable and correct, and we don't need to switch to some other clocksource.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
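A sketch of the per-CPU disable; msr_kvm_system_time is the variable kvmclock.c keys the MSR off, but the function name and calling context here are assumptions:

    /* Writing 0 stops the hypervisor from updating this CPU's pvclock page. */
    static void kvmclock_disable_sketch(void)
    {
            native_write_msr(msr_kvm_system_time, 0, 0);
    }

Run on each CPU (e.g. from a CPU-offline or shutdown callback), this detaches every pvclock page before the kernel image they live in is replaced.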
-
Authored by Vitaly Kuznetsov

Various PV features (Async PF, PV EOI, steal time) work through memory shared with the hypervisor, and when we restore from hibernation we must properly tear down all these features to make sure the hypervisor doesn't write to stale locations after we jump to the previously hibernated kernel (which can try to place anything there). For secondary CPUs the job is already done by kvm_cpu_down_prepare(); register syscore ops to do the same for the boot CPU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
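Syscore ops run on the boot CPU only, which is exactly the missing piece; a sketch with hypothetical callback names (the real teardown/re-registration bodies live elsewhere in kvm.c):

    static int kvm_pv_suspend_sketch(void)
    {
            /* tear down Async PF, PV EOI and steal time for the boot CPU */
            return 0;
    }

    static void kvm_pv_resume_sketch(void)
    {
            /* re-register the shared memory regions on resume */
    }

    static struct syscore_ops kvm_pv_syscore_ops_sketch = {
            .suspend = kvm_pv_suspend_sketch,
            .resume  = kvm_pv_resume_sketch,
    };

    /* register_syscore_ops(&kvm_pv_syscore_ops_sketch); from init code */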
-
- May 5, 2021 (1 commit)
-
-
Authored by Vitaly Kuznetsov

'pr_fmt' already has 'kvm-guest: ', so the 'KVM' prefix is redundant. "Unregister pv shared memory" is very ambiguous; it's hard to say which particular PV feature it relates to.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- May 3, 2021 (10 commits)
-
-
Authored by Colin Ian King

fget() can potentially return NULL, so the fput() on the error return path can cause a NULL pointer dereference. Fix this by checking for a NULL source_kvm_file before doing the fput().

Addresses-Coverity: ("Dereference null return")
Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <20210430170303.131924-1-colin.king@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
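The fix pattern, sketched; variable names follow the commit text, and the surrounding code is an assumption:

    source_kvm_file = fget(source_fd);
    if (!file_is_kvm(source_kvm_file)) {   /* tolerates a NULL file */
            ret = -EBADF;
            goto out_fput;
    }
    /* ... */
    out_fput:
    if (source_kvm_file)                   /* fput(NULL) would dereference NULL */
            fput(source_kvm_file);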
-
Authored by Kai Huang

The function name in the kdoc of __handle_changed_spte() should be __handle_changed_spte() itself, rather than handle_changed_spte(). Fix the typo.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20210503042446.154695-1-kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky

This allows KVM to load nested state more than once without warnings.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky
* Define and use an invalid GPA (all ones) for init value of last and current nested vmcb physical addresses.

* Reset the current vmcb12 gpa to the invalid value when leaving the nested mode, similar to what is done on nested vmexit.

* Reset the last seen vmcb12 address when disabling the nested SVM, as it relies on vmcb02 fields which are freed at that point.

Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky

When forcibly leaving nested mode, we should switch to vmcb01.

Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Benjamin Segall

single_task_running() is usually more general than need_resched(), but CFS_BANDWIDTH throttling will use resched_task() when there is just one task, to get that task to block. This was causing long need_resched warnings and was likely allowing VMs to overrun their quota when halt polling.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Message-Id: <20210429162233.116849-1-venkateshs@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
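The resulting polling condition is small; a sketch that should be close to the final helper, though treat the exact name and placement as assumptions:

    /* Keep halt-polling only while nothing else wants this CPU. */
    static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop)
    {
            return single_task_running() && !need_resched() &&
                   ktime_before(cur, stop);
    }

Adding !need_resched() makes the poll loop yield when the scheduler asks the lone task to block, closing the quota-overrun window described above.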
-
Authored by Wanpeng Li

Commit ee66e453 ("KVM: lapic: Busy wait for timer to expire when using hv_timer") tries to set ktime->expired_tscdeadline by checking ktime->hv_timer_in_use, since lapic timer oneshot/periodic modes, which are emulated by the vmx preemption timer, also get advanced; they leverage the same vmx preemption timer logic as tsc-deadline mode. However, ktime->hv_timer_in_use is cleared before the apic_timer_expired() handling; let's delay this clearing to the preemption-disabled region.

Fixes: ee66e453 ("KVM: lapic: Busy wait for timer to expire when using hv_timer")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1619608082-4187-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Md Shahadat Hossain Shahin

Large pages not being created properly may result in increased memory access time. The 'lpages' kvm stat used to keep track of the current number of large pages in the system, but with the TDP MMU enabled the stat is not showing the correct number. This patch extends the lpages counter to cover the TDP case.

Signed-off-by: Md Shahadat Hossain Shahin <shahinmd@amazon.de>
Cc: Bartosz Szczepanek <bsz@amazon.de>
Message-Id: <1619783551459.35424@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Kai Huang

In kvm_tdp_mmu_map(), while iterating TDP MMU page table entries, it is possible that an SPTE has already been frozen by another thread but the freeze is not done yet, for instance, when another thread is still in the middle of zapping a large page. In this case, the !is_shadow_present_pte() check for the old SPTE in tdp_mmu_for_each_pte() may hit true, and allocating a new page table is unnecessary, since tdp_mmu_set_spte_atomic() will later return false and the page table will need to be freed. Add an is_removed_spte() check before allocating the new page table to avoid this.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20210429041226.50279-1-kai.huang@intel.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
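Inside the mapping loop the check is a one-liner; sketched in context with the surrounding loop abridged (placement within the loop is an assumption):

    tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
            /*
             * If the SPTE was frozen (marked removed) by another thread,
             * bail before allocating a page table that would only be freed.
             */
            if (is_removed_spte(iter.old_spte))
                    break;
            /* ... normal present/non-present handling ... */
    }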
-
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Authored by Linus Torvalds
Pull orangefs updates from Mike Marshall:
 "orangefs: implement orangefs_readahead

  mm/readahead.c/read_pages was quite a bit different back when I put my open-coded readahead logic into orangefs_readpage. That logic seemed to work as designed back then, it is a trainwreck now.

  This implements orangefs_readahead using the new xarray and readahead_expand features and removes all my open-coded readahead logic.

  This results in an extreme read performance improvement, these sample numbers are from my test VM:

  Here's an example of what's upstream in 5.11.8-200.fc33.x86_64:

    30+0 records in
    30+0 records out
    125829120 bytes (126 MB, 120 MiB) copied, 5.77943 s, 21.8 MB/s

  And here's this version of orangefs_readahead on top of 5.12.0-rc4:

    30+0 records in
    30+0 records out
    125829120 bytes (126 MB, 120 MiB) copied, 0.325919 s, 386 MB/s

  There are four xfstest regressions with this patch. David Howells and Matthew Wilcox have been helping me work with this code"

* tag 'for-linus-5.13-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: leave files in the page cache for a few micro seconds at least
  Orangef: implement orangefs_readahead.
-