- 04 Oct, 2019 (1 commit)
-
-
Submitted by Paolo Bonzini

INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since some machines actually have MSRs past the reserved range, filtering them against x86_pmu.num_counters_gp may have false positives. Cut the list to 18 entries to avoid this.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Fixes: e2ada66e ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Oct, 2019 (1 commit)
-
-
Submitted by Paolo Bonzini

INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since some machines actually have MSRs past the reserved range, these may survive the filtering of the msrs_to_save array and would be rejected by KVM_GET/SET_MSR. To avoid this, cut the list to whatever CPUID reports for the host's architectural PMU.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Fixes: e2ada66e ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
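The two entries above trim the PMU MSR list to the number of general-purpose counters the host actually exposes. As a point of reference, the standalone C sketch below (illustrative only, not the kernel patch) shows where that count comes from: bits 15:8 of EAX in the architectural PMU CPUID leaf 0xA.

    #include <cpuid.h>
    #include <stdio.h>

    /* Number of general-purpose PMU counters reported by the architectural
     * PMU leaf: CPUID.0AH:EAX[15:8]. */
    static unsigned int arch_pmu_gp_counters(void)
    {
            unsigned int eax, ebx, ecx, edx;

            if (!__get_cpuid(0x0a, &eax, &ebx, &ecx, &edx))
                    return 0;
            return (eax >> 8) & 0xff;
    }

    int main(void)
    {
            printf("general-purpose PMU counters: %u\n", arch_pmu_gp_counters());
            return 0;
    }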
-
- 01 Oct, 2019 (1 commit)
-
-
Submitted by Paolo Bonzini

The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only.

Cc: kvm-ppc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 Sep, 2019 (1 commit)
-
-
Submitted by Wanpeng Li

Reported by syzkaller:

  WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
  CPU: 0 PID: 6544 Comm: a.out Tainted: G OE 5.3.0-rc4+ #4
  RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
  Call Trace:
   vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
   vcpu_enter_guest+0x4dc/0x18d0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
   kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
   do_vfs_ioctl+0xa2/0x690
   ksys_ioctl+0x6d/0x80
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x74/0x720
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

When CR4.UMIP is set, the guest should also have the UMIP CPUID flag, but the current __set_sregs() path does not perform such a check on the sregs values supplied by userspace. As a result, SECONDARY_EXEC_DESC gets enabled in vmx_set_cr4() on a write to CR4.UMIP even though the guest lacks the UMIP CPUID flag. The test case then triggers the handle_desc() warning when executing an ltr instruction, since the guest's architectural CR4 does not have UMIP set. Fix this by validating the CR4 and CPUID combination in __set_sregs().

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000

Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
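The gist of the fix is to validate the userspace-supplied CR4 against the guest's CPUID before committing it. A minimal sketch of that kind of check is below; it is KVM-style pseudo-code, not the verbatim patch, and the helper name kvm_valid_sregs_cr4() is hypothetical (the actual fix routes the value through KVM's generic CR4 validity checking).

    /* Hypothetical helper, illustrative only: reject a CR4 that enables
     * UMIP when the guest's CPUID does not advertise the feature. */
    static int kvm_valid_sregs_cr4(struct kvm_vcpu *vcpu,
                                   const struct kvm_sregs *sregs)
    {
            if ((sregs->cr4 & X86_CR4_UMIP) &&
                !guest_cpuid_has(vcpu, X86_FEATURE_UMIP))
                    return -EINVAL;      /* __set_sregs() bails out */

            return 0;
    }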
-
- 25 Sep, 2019 (1 commit)
-
-
Submitted by Sean Christopherson

Explicitly check kvm_rebooting in kvm_spurious_fault() prior to invoking BUG(), as opposed to assuming the caller has already done so. Letting kvm_spurious_fault() be called "directly" will allow VMX to better optimize its low level assembly flows. As a happy side effect, kvm_spurious_fault() no longer needs to be marked as a dead end since it doesn't unconditionally BUG().

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 24 Sep, 2019 (17 commits)
-
-
Submitted by Sean Christopherson

Now that the fast invalidate mechanism has been reintroduced, restore the performance tweaks for fast invalidation that existed prior to its removal.

Paraphrasing the original changelog (commit 5ff05683 was itself a partial revert): don't force reloading the remote mmu when zapping an obsolete page, as a MMU_RELOAD request has already been issued by kvm_mmu_zap_all_fast() immediately after incrementing mmu_valid_gen, i.e. after marking pages obsolete.

This reverts commit 5ff05683.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Tao Xu

UMWAIT and TPAUSE instructions use the 32-bit IA32_UMWAIT_CONTROL MSR at index E1H to determine the maximum time in TSC-quanta that the processor can reside in either C0.1 or C0.2. This patch emulates MSR IA32_UMWAIT_CONTROL in the guest and differentiates IA32_UMWAIT_CONTROL between host and guest. The variable mwait_control_cached in arch/x86/kernel/cpu/umwait.c caches the MSR value, so this patch uses it to avoid frequent rdmsr of IA32_UMWAIT_CONTROL.

Co-developed-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
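For reference, here is a minimal standalone sketch of the MSR's layout and the write-validation rule implied above (bit 1 is reserved and the register is 32 bits wide). This is plain C for illustration, not the kernel patch, and the struct/function names are made up for the example.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* IA32_UMWAIT_CONTROL (MSR 0xE1): bit 0 disables C0.2, bit 1 is
     * reserved, bits 31:2 hold the maximum wait deadline in TSC-quanta. */
    #define MSR_IA32_UMWAIT_CONTROL     0xe1
    #define UMWAIT_CONTROL_RESERVED_BIT (1u << 1)

    struct guest_msrs { uint32_t umwait_control; };

    /* A guest WRMSR lands in a per-vCPU shadow value, separate from the
     * host's cached copy; an invalid write would raise #GP in real KVM. */
    static bool set_umwait_control(struct guest_msrs *g, uint64_t data)
    {
            if ((data & UMWAIT_CONTROL_RESERVED_BIT) || (data >> 32))
                    return false;
            g->umwait_control = (uint32_t)data;
            return true;
    }

    int main(void)
    {
            struct guest_msrs g = { 0 };
            printf("write 0x100000: %s\n", set_umwait_control(&g, 0x100000) ? "ok" : "#GP");
            printf("write 0x2:      %s\n", set_umwait_control(&g, 0x2) ? "ok" : "#GP");
            return 0;
    }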
-
Submitted by Sean Christopherson

VMX's EPT misconfig flow for the fast-MMIO path falls back to decoding the instruction to determine the instruction length when running as a guest (Hyper-V doesn't fill VMCS.VM_EXIT_INSTRUCTION_LEN because it's technically not defined for EPT misconfigs). Rather than implement the slow skip in VMX's generic skip_emulated_instruction(), handle_ept_misconfig() directly calls kvm_emulate_instruction() with EMULTYPE_SKIP, which intentionally doesn't do single-step detection, and so handle_ept_misconfig() misses a single-step #DB.

Rework the EPT misconfig fallback case to route it through kvm_skip_emulated_instruction() so that single-step #DBs and interrupt shadow updates are handled automatically, i.e. make VMX's slow skip logic match SVM's and have the SVM flow not intentionally avoid the shadow update.

Alternatively, handle_ept_misconfig() could manually handle single-step detection, but that results in EMULTYPE_SKIP having split logic for the interrupt shadow vs. single-step #DBs, and split emulator logic is largely what led to this mess in the first place. Modifying SVM to mirror the VMX flow isn't really an option as SVM's case isn't limited to a specific exit reason, i.e. handling the slow skip in skip_emulated_instruction() is mandatory for all intents and purposes.

Drop VMX's skip_emulated_instruction() wrapper since it can now fail, and instead WARN if it fails unexpectedly, e.g. if exit_reason somehow becomes corrupted.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: d391f120 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Deferring emulation failure handling (in some cases) to the caller of x86_emulate_instruction() has proven fragile, e.g. multiple instances of KVM not setting run->exit_reason on EMULATE_FAIL, largely due to it being difficult to discern what emulation types can return what result, and which combination of types and results are handled where.

Now that x86_emulate_instruction() always handles emulation failure, i.e. EMULATION_FAIL is only referenced in callers, remove the emulation_result enums entirely. Per KVM's existing exit handling conventions, return '0' and '1' for "exit to userspace" and "resume guest" respectively. Doing so cleans up many callers, e.g. they can return kvm_emulate_instruction() directly instead of having to interpret its result.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
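The "0 = exit to userspace, 1 = resume guest" convention is what lets callers forward the emulator's result untouched. A hedged sketch (the handler name is hypothetical):

    /* An exit handler can simply propagate kvm_emulate_instruction()'s
     * return value: 1 resumes the guest, 0 exits to userspace with
     * vcpu->run already describing why. */
    static int handle_some_exit(struct kvm_vcpu *vcpu)
    {
            return kvm_emulate_instruction(vcpu, 0);  /* no special EMULTYPE */
    }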
-
Submitted by Sean Christopherson

Request triple fault in kvm_inject_realmode_interrupt() instead of returning EMULATE_FAIL and deferring to the caller. All existing callers request triple fault and it's highly unlikely Real Mode is going to acquire new features. While this consolidates a small amount of code, the real goal is to remove the last reference to EMULATE_FAIL.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Consolidate the reporting of emulation failure into kvm_task_switch() so that it can return EMULATE_USER_EXIT. This helps pave the way for removing EMULATE_FAIL altogether. This also fixes a theoretical bug where task switch interception could suppress an EMULATE_USER_EXIT return.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Kill a few birds with one stone by forcing an exit to userspace on skip emulation failure. This removes a reference to EMULATE_FAIL, fixes a bug in handle_ept_misconfig() where it would exit to userspace without setting run->exit_reason, and fixes a theoretical bug in SVM's task_switch_interception() where it would overwrite run->exit_reason on a return of EMULATE_USER_EXIT.

Note, this technically doesn't fully fix task_switch_interception() as it now incorrectly handles EMULATE_FAIL, but in practice there is no bug as EMULATE_FAIL will never be returned for EMULTYPE_SKIP.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Immediately inject a #UD and return EMULATE_DONE if emulation fails when handling an intercepted #UD. This helps pave the way for removing EMULATE_FAIL altogether.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Add an explicit emulation type for forced #UD emulation and use it to detect that KVM should unconditionally inject a #UD instead of falling into its standard emulation failure handling.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Immediately inject a #GP when VMware emulation fails and return EMULATE_DONE instead of propagating EMULATE_FAIL up the stack. This helps pave the way for removing EMULATE_FAIL altogether.

Rename EMULTYPE_VMWARE to EMULTYPE_VMWARE_GP to document that the x86 emulator is called to handle VMware #GP interception, e.g. why a #GP is injected on emulation failure for EMULTYPE_VMWARE_GP.

Drop EMULTYPE_NO_UD_ON_FAIL as a standalone type. The "no #UD on fail" is used only in the VMware case and is obsoleted by having the emulator itself reinject #GP.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Return the single-step emulation result directly instead of via an out param. Presumably at some point in the past kvm_vcpu_do_singlestep() could be called with *r == EMULATE_USER_EXIT, but that is no longer the case, i.e. all callers are happy to overwrite their own return variable.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

When handling emulation failure, return the emulation result directly instead of capturing it in a local variable. Future patches will move additional cases into handle_emulation_failure(); clean up the cruft beforehand so there isn't an ugly mix of setting a local variable and returning directly.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Move the stat.mmio_exits update into x86_emulate_instruction(). This is both a bug fix, e.g. the current update flows will incorrectly increment mmio_exits on emulation failure, and a preparatory change to set the stage for eliminating EMULATE_DONE and company.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Zapping collapsible sptes, a.k.a. 4k sptes that can be promoted into a large page, is only necessary when the sole change to a memory region is its dirty logging flag. If the memslot is also being moved, then all sptes for the memslot are zapped when it is invalidated. When a memslot is being created, it is impossible for there to be existing dirty mappings, e.g. KVM can have MMIO sptes, but not present (and thus dirty) sptes.

Note, the comment and logic are shamelessly borrowed from MIPS's version of kvm_arch_commit_memory_region().

Fixes: 3ea3b7fa ("kvm: mmu: lazy collapse small sptes into large sptes")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
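A hedged sketch, in the style of kvm_arch_commit_memory_region(), of the condition this describes (not necessarily the literal patch):

    /* Only zap collapsible sptes when an existing slot's dirty-logging
     * flag is being cleared; moves and creations are handled elsewhere. */
    if (change == KVM_MR_FLAGS_ONLY &&
        (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
        !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
            kvm_mmu_zap_collapsible_sptes(kvm, new);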
-
Submitted by Vitaly Kuznetsov

It was discovered that after commit 65efa61d ("selftests: kvm: provide common function to enable eVMCS") the hyperv_cpuid selftest is failing on AMD. The reason is that the commit changed _vcpu_ioctl() to vcpu_ioctl() in the test, and the latter can't fail. Instead of fixing the test, it seems to make more sense to not announce KVM_CAP_HYPERV_ENLIGHTENED_VMCS support if it is definitely missing (on svm and in case kvm_intel.nested=0).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Tianyu Lan

The Hyper-V direct tlb flush function should be enabled only for guests that exclusively use Hyper-V hypercalls. A userspace hypervisor (e.g. QEMU) can disable KVM identification in CPUID and expose only the Hyper-V identification to guarantee this precondition. Add a new KVM capability, KVM_CAP_HYPERV_DIRECT_TLBFLUSH, for userspace to enable the Hyper-V direct tlb flush function; it is disabled by default in KVM.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Jim Mattson

These MSRs should be enumerated by KVM_GET_MSR_INDEX_LIST, so that userspace knows that these MSRs may be part of the vCPU state.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Eric Hankland <ehankland@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 Sep, 2019 (1 commit)
-
-
Submitted by Fuqian Huang

Emulation of VMPTRST can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Cc: stable@vger.kernel.org
[add comment]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 Sep, 2019 (1 commit)
-
-
Submitted by Jan Dakinevich

x86_emulate_instruction() takes the ctxt->have_exception flag into account during instruction decoding, but in practice this flag is never set in x86_decode_insn().

Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn")
Cc: stable@vger.kernel.org
Cc: Denis Lunev <den@virtuozzo.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
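The implied fix is to set the flag at the end of decode whenever a fault was generated. A minimal hedged sketch (x86_decode_insn() internals, simplified):

    /* After decode, remember that a fault was raised so that
     * x86_emulate_instruction() can act on ctxt->have_exception. */
    if (rc == X86EMUL_PROPAGATE_FAULT)
            ctxt->have_exception = true;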
-
- 11 Sep, 2019 (5 commits)
-
-
Submitted by Jan Dakinevich

inject_emulated_exception() returns true if and only if a nested page fault happens. However, a page fault can also come from the guest page table walk, whether nested or not. In both cases we should stop the attempt to fetch the instruction at RIP and let the guest run its own page fault handler.

This is also visible when an emulated instruction causes a #GP fault and the VMware backdoor is enabled. To handle the VMware backdoor, KVM intercepts #GP faults; with only the next patch applied, x86_emulate_instruction() injects a #GP but returns EMULATE_FAIL instead of EMULATE_DONE. EMULATE_FAIL causes handle_exception_nmi() (or gp_interception() for SVM) to re-inject the original #GP because it thinks emulation failed due to a non-VMware opcode. This patch prevents the issue as x86_emulate_instruction() will return EMULATE_DONE after injecting the #GP.

Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn")
Cc: stable@vger.kernel.org
Cc: Denis Lunev <den@virtuozzo.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Debugging a failed VM-Enter is often like searching for a needle in a haystack, e.g. there are over 80 consistency checks that funnel into the "invalid control field" error code. One way to expedite debug is to run the buggy code as an L1 guest under KVM (and pray that the failing check is detected by KVM). However, extracting useful debug information out of L0 KVM requires attaching a debugger to KVM and/or modifying the source, e.g. to log which check is failing.

Make life a little less painful for VMM developers and add a tracepoint for failed VM-Enter consistency checks. Ideally the tracepoint would capture both what check failed and precisely why it failed, but logging why a check failed is difficult to do in a generic tracepoint without resorting to invasive techniques, e.g. generating a custom string on failure. That being said, for the vast majority of VM-Enter failures the most difficult step is figuring out exactly what to look at, e.g. figuring out which bit was incorrectly set in a control field is usually not too painful once the guilty field has been identified.

To reach a happy medium between precision and ease of use, simply log the code that detected a failed check, using a macro to execute the check and log the trace event on failure. This approach enables tracing arbitrary code, e.g. it's not limited to function calls or specific formats of checks, and the changes to the existing code are minimally invasive.

A macro with a two-character name is desirable as usage of the macro doesn't result in overly long lines or confusing alignment, while still retaining some amount of readability. I.e. a one-character name is a little too terse, and a three-character name results in the contents being passed to the macro aligning with an indented line when the macro is used in an if-statement, e.g.:

    if (VCC(nested_vmx_check_long_line_one(...) &&
            nested_vmx_check_long_line_two(...)))
            return -EINVAL;

And that is the story of how the CC(), a.k.a. Consistency Check, macro got its name.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
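A hedged sketch of what such a macro looks like (the tracepoint name and its exact signature are illustrative, as is the example check):

    /* Evaluate a consistency check; on failure, trace the stringified
     * expression so the failing check can be identified from the log. */
    #define CC(consistency_check)                                        \
    ({                                                                   \
            bool failed = (consistency_check);                           \
            if (failed)                                                  \
                    trace_kvm_nested_vmenter_failed(#consistency_check); \
            failed;                                                      \
    })

    /* Example usage in a nested VM-Enter check: */
    if (CC(!page_address_valid(vcpu, vmcs12->io_bitmap_a)))
            return -EINVAL;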
-
Submitted by Sean Christopherson

Move RDMSR and WRMSR emulation into common x86 code to consolidate nearly identical SVM and VMX code.

Note, consolidating RDMSR introduces an extra indirect call, i.e. retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a guest kernel likely has bigger problems if increasing the latency of RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM performance. E.g. the only recurring RDMSR VM-Exits (after booting) on my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via arch_cpu_idle_enter().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
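For orientation, this is roughly the shape a consolidated RDMSR handler takes (simplified sketch; tracepoints omitted and details may differ from the actual patch):

    int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
    {
            u32 ecx = kvm_rcx_read(vcpu);
            u64 data;

            if (kvm_get_msr(vcpu, ecx, &data)) {
                    /* unknown or inaccessible MSR: #GP to the guest */
                    kvm_inject_gp(vcpu, 0);
                    return 1;
            }

            /* RDMSR returns the value split across EAX:EDX */
            kvm_rax_write(vcpu, (u32)data);
            kvm_rdx_write(vcpu, (u32)(data >> 32));

            return kvm_skip_emulated_instruction(vcpu);
    }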
-
Submitted by Sean Christopherson

Refactor the top-level MSR accessors to take/return the index and value directly instead of requiring the caller to dump them into a msr_data struct.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Peter Xu

The PLE window tracepoint triggers even if the window is not changed, and the wording can be a bit confusing too. One example line:

    kvm_ple_window: vcpu 0: ple_window 4096 (shrink 4096)

It easily lets people think "the window is now 4096, which was just shrunk", but the truth is the value actually didn't change (4096). Let's only dump this message if the value really changed, and make the message even simpler, like:

    kvm_ple_window: vcpu 4 old 4096 new 8192 (growed)

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
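A hedged sketch of the grow path with this behaviour (VMX side; the tracepoint and helper names are illustrative and some bookkeeping is omitted):

    static void grow_ple_window(struct kvm_vcpu *vcpu)
    {
            struct vcpu_vmx *vmx = to_vmx(vcpu);
            unsigned int old = vmx->ple_window;

            vmx->ple_window = __grow_ple_window(old, ple_window,
                                                ple_window_grow,
                                                ple_window_max);

            /* Only emit the tracepoint when the window actually changed. */
            if (vmx->ple_window != old)
                    trace_kvm_ple_window_update(vcpu->vcpu_id,
                                                vmx->ple_window, old);
    }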
-
- 10 Sep, 2019 (1 commit)
-
-
Submitted by Sean Christopherson

Manually generate the PDPTR reserved bit mask when explicitly loading PDPTRs. The reserved bits that are being tracked by the MMU reflect the current paging mode, which is unlikely to be PAE paging in the vast majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation, __set_sregs(), etc... This can cause KVM to incorrectly signal a bad PDPTR, or more likely, miss a reserved bit check and subsequently fail a VM-Enter due to a bad VMCS.GUEST_PDPTR.

Add a one-off helper to generate the reserved bits instead of sharing code across the MMU's calculations and the PDPTR emulation. The PDPTR reserved bits are basically set in stone, and pushing a helper into the MMU's calculation adds unnecessary complexity without improving readability.

Opportunistically fix/update the comment for load_pdptrs().

Note, the buggy commit also introduced a deliberate functional change, "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was effectively (and correctly) reverted by commit cd9ae5fe ("KVM: x86: Fix page-tables reserved bits"). A bit of SDM archaeology shows that the SDM from late 2008 had a bug (likely a copy+paste error) where it listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved for 2mb entries. I.e. the SDM contradicted itself, and bits 6:5 are and always have been reserved.

Fixes: 20c466b5 ("KVM: Use rsvd_bits_mask in load_pdptrs()")
Cc: stable@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Reported-by: Doug Reiland <doug.reiland@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
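A minimal sketch of such a one-off helper, assuming the usual KVM helpers rsvd_bits() and cpuid_maxphyaddr(); for PAE PDPTEs, bits 2:1, 8:5 and 63:MAXPHYADDR are reserved:

    static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
    {
            return rsvd_bits(cpuid_maxphyaddr(vcpu), 63) |
                   rsvd_bits(5, 8) | rsvd_bits(1, 2);
    }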
-
- 28 Aug, 2019 (1 commit)
-
-
Submitted by Sean Christopherson

Don't advance RIP or inject a single-step #DB if emulation signals a fault. This logic applies to all state updates that are conditional on clean retirement of the emulated instruction, e.g. updating RFLAGS was previously handled by commit 38827dbd ("KVM: x86: Do not update EFLAGS on faulting emulation").

Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with ctxt->_eip until emulation "retires" anyways. Skipping #DB injection fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to invalid state with EFLAGS.TF=1 would loop indefinitely due to emulation overwriting the #UD with #DB and thus restarting the bad SYSCALL over and over.

Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Fixes: 663f4c61 ("KVM: x86: handle singlestep during emulation")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
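The resulting gating in x86_emulate_instruction() looks roughly like the hedged sketch below (simplified; details may differ from the actual patch):

    /* Commit RIP, the single-step #DB and RFLAGS only if the emulator
     * did not raise a (non-trap) exception. */
    if (!ctxt->have_exception ||
        exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
            kvm_rip_write(vcpu, ctxt->eip);
            if (r == EMULATE_DONE && ctxt->tf)
                    kvm_vcpu_do_singlestep(vcpu, &r);
            __kvm_set_rflags(vcpu, ctxt->eflags);
    }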
-
- 22 Aug, 2019 (7 commits)
-
-
Submitted by Wanpeng Li

Add a pv tlb shootdown tracepoint.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Remove a few stale checks for non-NULL ops now that the ops in question are implemented by both VMX and SVM.

Note, this is **not** stable material; the Fixes tags are there purely to show when a particular op was first supported by both VMX and SVM.

Fixes: 74f16909 ("kvm/svm: Setup MCG_CAP on AMD properly")
Fixes: b31c114b ("KVM: X86: Provide a capability to disable PAUSE intercepts")
Fixes: 411b44ba ("svm: Implements update_pi_irte hook to setup posted interrupt")
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson

Rename "access" to "mmio_access" to match the other MMIO cache members and to make it more obvious that it's tracking the access permissions for the MMIO cache.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Vitaly Kuznetsov

To avoid hardcoding the xsetbv instruction length to '3', we need to support decoding it in the emulator.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Vitaly Kuznetsov

When doing x86_emulate_instruction(EMULTYPE_SKIP), the interrupt shadow has to be cleared if and only if the skipping is successful. There are two immediate issues:

 - In SVM's skip_emulated_instruction() we are not zapping the interrupt shadow in case kvm_emulate_instruction(EMULTYPE_SKIP) is used to advance RIP (!nrpip_save).
 - In VMX's handle_ept_misconfig(), when running as a nested hypervisor (the static_cpu_has(X86_FEATURE_HYPERVISOR) case), we forget to clear the interrupt shadow.

Note that we intentionally don't handle the case when the skipped instruction is supposed to prolong the interrupt shadow ("MOV/POP SS") as skip-emulation of those instructions should not happen under normal circumstances.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
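The rule can be summarised with a small hedged sketch (SVM-flavoured; advance_rip() is a hypothetical placeholder for however RIP actually gets advanced, not a real KVM function):

    static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
    {
            if (!advance_rip(vcpu))      /* hypothetical helper */
                    return 0;            /* skip failed: keep the shadow */

            /* Only a successful skip drops the interrupt shadow. */
            svm_set_interrupt_shadow(vcpu, 0);
            return 1;
    }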
-
Submitted by Vitaly Kuznetsov

On AMD, kvm_x86_ops->skip_emulated_instruction(vcpu) can, in theory, fail: in the !nrips case we call kvm_emulate_instruction(EMULTYPE_SKIP). Currently, we only do printk(KERN_DEBUG) when this happens and this is not ideal. Propagate the error up the stack.

On VMX, skip_emulated_instruction() doesn't fail; we have two call sites calling it explicitly, handle_exception_nmi() and handle_task_switch(), and we can just ignore the result. On SVM, we also have two explicit call sites: svm_queue_exception(), where it seems we don't need to do anything as we check whether RIP was advanced or not, and task_switch_interception(), where we are better off not proceeding to kvm_task_switch() in case skip_emulated_instruction() failed.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini

Similar to the AMD bits, set the Intel bits from the vendor-independent feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care about the vendor and they should be set on AMD processors as well.

Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 05 Aug, 2019 (1 commit)
-
-
Submitted by Wanpeng Li

After commit d73eb57b ("KVM: Boost vCPUs that are delivering interrupts"), a five-year-old bug was exposed. Running the ebizzy benchmark in three 80-vCPU VMs on one 80-pCPU Skylake server, a lot of rcu_sched stall warnings splat in the VMs after stress testing:

  INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
  Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() remains true before finish_swait() is called in kvm_vcpu_block(), and voluntarily preempted vCPUs are taken into account by the kvm_vcpu_on_spin() loop, which greatly increases the probability that the condition kvm_arch_vcpu_runnable(vcpu) is checked and found true. When APICv is enabled, the yield-candidate vCPU's VMCS RVI field then leaks (via vmx_sync_pir_to_irr()) into the spinning-on-a-taken-lock vCPU's current VMCS.

This patch fixes it by conservatively checking only a subset of events.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: stable@vger.kernel.org
Fixes: 98f4a146 ("KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
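A hedged sketch of what "checking conservatively a subset of events" can look like: a dedicated predicate used by the directed-yield path that only tests state which is cheap and safe to read from another vCPU's context (simplified; the APICv-specific check is omitted and details may differ from the actual patch):

    bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
    {
            if (READ_ONCE(vcpu->arch.pv.pv_unhalted))
                    return true;

            if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
                kvm_test_request(KVM_REQ_SMI, vcpu) ||
                kvm_test_request(KVM_REQ_EVENT, vcpu))
                    return true;

            return false;
    }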
-
- 24 Jul, 2019 (1 commit)
-
-
Submitted by Wanpeng Li

Commit 11752adb ("locking/pvqspinlock: Implement hybrid PV queued/unfair locks") introduces hybrid PV queued/unfair locks:

 - queued mode (no starvation)
 - unfair mode (good performance on not heavily contended locks)

The lock waiter goes into the unfair mode especially in VMs with over-committed vCPUs, since increasing over-commitment increases the likelihood that the queue head vCPU has been preempted and is not actively spinning. However, rescheduling the queue head vCPU in time so it can acquire the lock still gives better performance than just depending on lock stealing in the over-subscribed scenario.

Testing on an 80 HT, 2-socket Xeon Skylake server with 80-vCPU, 80GB RAM VMs (ebizzy -M):

            vanilla   boosting   improved
  1VM        23520      25040         6%
  2VM         8000      13600        70%
  3VM         3100       5400        74%

On unlock, the lock holder vCPU yields to the queue head vCPU, boosting the queue head vCPU that was either involuntarily preempted or voluntarily halted after failing to acquire the lock within a short spin in the guest.

Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-