- 24 9月, 2019 15 次提交
-
-
由 Sean Christopherson 提交于
Immediately inject a #GP when VMware emulation fails and return EMULATE_DONE instead of propagating EMULATE_FAIL up the stack. This helps pave the way for removing EMULATE_FAIL altogether. Rename EMULTYPE_VMWARE to EMULTYPE_VMWARE_GP to document that the x86 emulator is called to handle VMware #GP interception, e.g. why a #GP is injected on emulation failure for EMULTYPE_VMWARE_GP. Drop EMULTYPE_NO_UD_ON_FAIL as a standalone type. The "no #UD on fail" is used only in the VMWare case and is obsoleted by having the emulator itself reinject #GP. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
The VMware backdoor hooks #GP faults on IN{S}, OUT{S}, and RDPMC, none of which generate a non-zero error code for their #GP. Re-injecting #GP instead of attempting emulation on a non-zero error code will allow a future patch to move #GP injection (for emulation failure) into kvm_emulate_instruction() without having to plumb in the error code. Reviewed-and-tested-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Return the single-step emulation result directly instead of via an out param. Presumably at some point in the past kvm_vcpu_do_singlestep() could be called with *r==EMULATE_USER_EXIT, but that is no longer the case, i.e. all callers are happy to overwrite their own return variable. Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
When handling emulation failure, return the emulation result directly instead of capturing it in a local variable. Future patches will move additional cases into handle_emulation_failure(), clean up the cruft before so there isn't an ugly mix of setting a local variable and returning directly. Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the stat.mmio_exits update into x86_emulate_instruction(). This is both a bug fix, e.g. the current update flows will incorrectly increment mmio_exits on emulation failure, and a preparatory change to set the stage for eliminating EMULATE_DONE and company. Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Krish Sadhukhan 提交于
According to section "Checks Related to Address-Space Size" in Intel SDM vol 3C, the following checks are performed on vmentry of nested guests: If the logical processor is outside IA-32e mode (if IA32_EFER.LMA = 0) at the time of VM entry, the following must hold: - The "IA-32e mode guest" VM-entry control is 0. - The "host address-space size" VM-exit control is 0. If the logical processor is in IA-32e mode (if IA32_EFER.LMA = 1) at the time of VM entry, the "host address-space size" VM-exit control must be 1. If the "host address-space size" VM-exit control is 0, the following must hold: - The "IA-32e mode guest" VM-entry control is 0. - Bit 17 of the CR4 field (corresponding to CR4.PCIDE) is 0. - Bits 63:32 in the RIP field are 0. If the "host address-space size" VM-exit control is 1, the following must hold: - Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1. - The RIP field contains a canonical address. On processors that do not support Intel 64 architecture, checks are performed to ensure that the "IA-32e mode guest" VM-entry control and the "host address-space size" VM-exit control are both 0. Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: NKarl Heubaum <karl.heubaum@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot guarantee that two virtual processors won't end up running on sibling SMT threads without knowing about it. This is done as an optimization as in this case there is nothing the guest can do to protect itself against MDS and issuing additional flush requests is just pointless. On bare metal the topology is known, however, when Hyper-V is running nested (e.g. on top of KVM) it needs an additional piece of information: a confirmation that the exposed topology (wrt vCPU placement on different SMT threads) is trustworthy. NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in TLFS as follows: "Indicates that a virtual processor will never share a physical core with another virtual processor, except for virtual processors that are reported as sibling SMT threads." From KVM we can give such guarantee in two cases: - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work as it can become re-enabled during the lifetime of the guest). - vCPUs are properly pinned so the scheduler won't put them on sibling SMT threads (when they're not reported as such). This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the first case. The second case is outside of KVM's domain of responsibility (as vCPU pinning is actually done by someone who manages KVM's userspace - e.g. libvirt pinning QEMU threads). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Reported by syzkaller: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__apic_accept_irq+0x46/0x740 arch/x86/kvm/lapic.c:1029 Call Trace: kvm_apic_set_irq+0xb4/0x140 arch/x86/kvm/lapic.c:558 stimer_notify_direct arch/x86/kvm/hyperv.c:648 [inline] stimer_expiration arch/x86/kvm/hyperv.c:659 [inline] kvm_hv_process_stimers+0x594/0x1650 arch/x86/kvm/hyperv.c:686 vcpu_enter_guest+0x2b2a/0x54b0 arch/x86/kvm/x86.c:7896 vcpu_run+0x393/0xd40 arch/x86/kvm/x86.c:8152 kvm_arch_vcpu_ioctl_run+0x636/0x900 arch/x86/kvm/x86.c:8360 kvm_vcpu_ioctl+0x6cf/0xaf0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2765 The testcase programs HV_X64_MSR_STIMERn_CONFIG/HV_X64_MSR_STIMERn_COUNT, in addition, there is no lapic in the kernel, the counters value are small enough in order that kvm_hv_process_stimers() inject this already-expired timer interrupt into the guest through lapic in the kernel which triggers the NULL deferencing. This patch fixes it by don't advertise direct mode synthetic timers and discarding the inject when lapic is not in kernel. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=1752fe0a600000 Reported-by: syzbot+dff25ee91f0c7d5c1695@syzkaller.appspotmail.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Zapping collapsible sptes, a.k.a. 4k sptes that can be promoted into a large page, is only necessary when changing only the dirty logging flag of a memory region. If the memslot is also being moved, then all sptes for the memslot are zapped when it is invalidated. When a memslot is being created, it is impossible for there to be existing dirty mappings, e.g. KVM can have MMIO sptes, but not present, and thus dirty, sptes. Note, the comment and logic are shamelessly borrowed from MIPS's version of kvm_arch_commit_memory_region(). Fixes: 3ea3b7fa ("kvm: mmu: lazy collapse small sptes into large sptes") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
It was discovered that after commit 65efa61d ("selftests: kvm: provide common function to enable eVMCS") hyperv_cpuid selftest is failing on AMD. The reason is that the commit changed _vcpu_ioctl() to vcpu_ioctl() in the test and this one can't fail. Instead of fixing the test is seems to make more sense to not announce KVM_CAP_HYPERV_ENLIGHTENED_VMCS support if it is definitely missing (on svm and in case kvm_intel.nested=0). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Since commit 5158917c ("KVM: x86: nVMX: Allow nested_enable_evmcs to be NULL") the code in x86.c is prepared to see nested_enable_evmcs being NULL and in VMX case it actually is when nesting is disabled. Remove the unneeded stub from SVM code. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Hyper-V provides direct tlb flush function which helps L1 Hypervisor to handle Hyper-V tlb flush request from L2 guest. Add the function support for VMX. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tianyu Lan 提交于
Hyper-V direct tlb flush function should be enabled for guest that only uses Hyper-V hypercall. User space hypervisor(e.g, Qemu) can disable KVM identification in CPUID and just exposes Hyper-V identification to make sure the precondition. Add new KVM capability KVM_CAP_ HYPERV_DIRECT_TLBFLUSH for user space to enable Hyper-V direct tlb function and this function is default to be disabled in KVM. Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tianyu Lan 提交于
The struct hv_vp_assist_page was defined incorrectly. The "vtl_control" should be u64[3], "nested_enlightenments _control" should be a u64 and there are 7 reserved bytes following "enlighten_vmentry". Fix the definition. Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
These MSRs should be enumerated by KVM_GET_MSR_INDEX_LIST, so that userspace knows that these MSRs may be part of the vCPU state. Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NEric Hankland <ehankland@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 9月, 2019 5 次提交
-
-
由 Johannes Berg 提交于
When we have virtio enabled, we must have real barriers since we may be running on an SMP machine (quite likely are, in fact), so the other process can be on another CPU. Since in any other case we don't really use DMA barriers, remove their override completely so real barriers will get used. In the future we might need them for other cases as well. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Johannes Berg 提交于
UML has its own platform-specific barrier.h under arch/x86/um/, which should get used. Fix the build system to use it, and then fix the barrier.h to actually compile. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Johannes Berg 提交于
Fix a warning about the function type being wrong. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Rasmus Villemoes 提交于
This helps preventing a BUG* or WARN* in some static inline from preventing that (or one of its callers) being inlined, so should allow gcc to make better informed inlining decisions. For example, with gcc 9.2, tcp_fastopen_no_cookie() vanishes from net/ipv4/tcp_fastopen.o. It does not itself have any BUG or WARN, but it calls dst_metric() which has a WARN_ON_ONCE - and despite that WARN_ON_ONCE vanishing since the condition is compile-time false, dst_metric() is apparently sufficiently "large" that when it gets inlined into tcp_fastopen_no_cookie(), the latter becomes too large for inlining. Overall, if one asks size(1), .text decreases a little and .data increases by about the same amount (x86-64 defconfig) $ size vmlinux.{before,after} text data bss dec hex filename 19709726 5202600 1630280 26542606 195020e vmlinux.before 19709330 5203068 1630280 26542678 1950256 vmlinux.after while bloat-o-meter says add/remove: 10/28 grow/shrink: 103/51 up/down: 3669/-2854 (815) ... Total: Before=14783683, After=14784498, chg +0.01% Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
-
由 Rasmus Villemoes 提交于
Most, if not all, uses of the alternative* family just provide one or two instructions in .text, but the string literal can be quite large, causing gcc to overestimate the size of the generated code. That in turn affects its decisions about inlining of the function containing the alternative() asm statement. New enough versions of gcc allow one to overrule the estimated size by using "asm inline" instead of just "asm". So replace asm by the helper asm_inline, which for older gccs just expands to asm. Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
-
- 14 9月, 2019 4 次提交
-
-
由 Sean Christopherson 提交于
James Harvey reported a livelock that was introduced by commit d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot""). The livelock occurs because kvm_mmu_zap_all() as it exists today will voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck in an infinite loop as it can never zap all pages before observing lock contention or the need to reschedule. The equivalent of kvm_mmu_zap_all() that was in use at the time of the reverted commit (4e103134, "KVM: x86/mmu: Zap only the relevant pages when removing a memslot") employed a fast invalidate mechanism and was not susceptible to the above livelock. There are three ways to fix the livelock: - Reverting the revert (commit d012a06a) is not a viable option as the revert is needed to fix a regression that occurs when the guest has one or more assigned devices. It's unlikely we'll root cause the device assignment regression soon enough to fix the regression timely. - Remove the conditional reschedule from kvm_mmu_zap_all(). However, although removing the reschedule would be a smaller code change, it's less safe in the sense that the resulting kvm_mmu_zap_all() hasn't been used in the wild for flushing memslots since the fast invalidate mechanism was introduced by commit 6ca18b69 ("KVM: x86: use the fast way to invalidate all pages"), back in 2013. - Reintroduce the fast invalidate mechanism and use it when zapping shadow pages in response to a memslot being deleted/moved, which is what this patch does. For all intents and purposes, this is a revert of commit ea145aac ("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of commit 7390de1e ("Revert "KVM: x86: use the fast way to invalidate all pages""), i.e. restores the behavior of commit 5304b8d3 ("KVM: MMU: fast invalidate all pages") and commit 6ca18b69 ("KVM: x86: use the fast way to invalidate all pages") respectively. Fixes: d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"") Reported-by: NJames Harvey <jamespharvey20@gmail.com> Cc: Alex Willamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Fuqian Huang 提交于
Emulation of VMPTRST can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Signed-off-by: NFuqian Huang <huangfq.daxian@gmail.com> Cc: stable@vger.kernel.org [add comment] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The implementation of vmread to memory is still incomplete, as it lacks the ability to do vmread to I/O memory just like vmptrst. Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
IPI shorthand is supported now by linux apic/x2apic driver, switch to IPI shorthand for all excluding self and all including self destination shorthand in kvm guest, to avoid splitting the target mask into several PV IPI hypercalls. This patch removes the kvm_send_ipi_all() and kvm_send_ipi_allbutself() since the callers in APIC codes have already taken care of apic_use_ipi_shorthand and fallback to ->send_IPI_mask and ->send_IPI_mask_allbutself if it is false. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Nadav Amit <namit@vmware.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 9月, 2019 6 次提交
-
-
由 Liran Alon 提交于
Commit cd7764fe ("KVM: x86: latch INITs while in system management mode") changed code to latch INIT while vCPU is in SMM and process latched INIT when leaving SMM. It left a subtle remark in commit message that similar treatment should also be done while vCPU is in VMX non-root-mode. However, INIT signals should actually be latched in various vCPU states: (*) For both Intel and AMD, INIT signals should be latched while vCPU is in SMM. (*) For Intel, INIT should also be latched while vCPU is in VMX operation and later processed when vCPU leaves VMX operation by executing VMXOFF. (*) For AMD, INIT should also be latched while vCPU runs with GIF=0 or in guest-mode with intercept defined on INIT signal. To fix this: 1) Add kvm_x86_ops->apic_init_signal_blocked() such that each CPU vendor can define the various CPU states in which INIT signals should be blocked and modify kvm_apic_accept_events() to use it. 2) Modify vmx_check_nested_events() to check for pending INIT signal while vCPU in guest-mode. If so, emualte vmexit on EXIT_REASON_INIT_SIGNAL. Note that nSVM should have similar behaviour but is currently left as a TODO comment to implement in the future because nSVM don't yet implement svm_check_nested_events(). Note: Currently KVM nVMX implementation don't support VMX wait-for-SIPI activity state as specified in MSR_IA32_VMX_MISC bits 6:8 exposed to guest (See nested_vmx_setup_ctls_msrs()). If and when support for this activity state will be implemented, kvm_check_nested_events() would need to avoid emulating vmexit on INIT signal in case activity-state is wait-for-SIPI. In addition, kvm_apic_accept_events() would need to be modified to avoid discarding SIPI in case VMX activity-state is wait-for-SIPI but instead delay SIPI processing to vmx_check_nested_events() that would clear pending APIC events and emulate vmexit on SIPI. Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Co-developed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
According to Intel SDM section 25.2 "Other Causes of VM Exits", When INIT signal is received on a CPU that is running in VMX non-root mode it should cause an exit with exit-reason of 3. (See Intel SDM Appendix C "VMX BASIC EXIT REASONS") This patch introduce the exit-reason definition. Reviewed-by: NBhavesh Davda <bhavesh.davda@oracle.com> Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Co-developed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
The hrtimer which is used to emulate lapic timer is stopped during vcpu reset, preemption timer should do the same. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
This patch optimizes the virtual IPI emulation sequence: write ICR2 write ICR2 write ICR read ICR2 read ICR ==> send virtual IPI read ICR2 write ICR send virtual IPI It can reduce kvm-unit-tests/vmexit.flat IPI testing latency(from sender send IPI to sender receive the ACK) from 3319 cycles to 3203 cycles on SKylake server. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jiří Paleček 提交于
On AMD processors, in PAE 32bit mode, nested KVM instances don't work. The L0 host get a kernel OOPS, which is related to arch.mmu->pae_root being NULL. The reason for this is that when setting up nested KVM instance, arch.mmu is set to &arch.guest_mmu (while normally, it would be &arch.root_mmu). However, the initialization and allocation of pae_root only creates it in root_mmu. KVM code (ie. in mmu_alloc_shadow_roots) then accesses arch.mmu->pae_root, which is the unallocated arch.guest_mmu->pae_root. This fix just allocates (and frees) pae_root in both guest_mmu and root_mmu (and also lm_root if it was allocated). The allocation is subject to previous restrictions ie. it won't allocate anything on 64-bit and AFAIK not on Intel. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=203923 Fixes: 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu") Signed-off-by: NJiri Palecek <jpalecek@web.de> Tested-by: NJiri Palecek <jpalecek@web.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jan Dakinevich 提交于
x86_emulate_instruction() takes into account ctxt->have_exception flag during instruction decoding, but in practice this flag is never set in x86_decode_insn(). Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn") Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Signed-off-by: NJan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 9月, 2019 10 次提交
-
-
由 Jan Dakinevich 提交于
inject_emulated_exception() returns true if and only if nested page fault happens. However, page fault can come from guest page tables walk, either nested or not nested. In both cases we should stop an attempt to read under RIP and give guest to step over its own page fault handler. This is also visible when an emulated instruction causes a #GP fault and the VMware backdoor is enabled. To handle the VMware backdoor, KVM intercepts #GP faults; with only the next patch applied, x86_emulate_instruction() injects a #GP but returns EMULATE_FAIL instead of EMULATE_DONE. EMULATE_FAIL causes handle_exception_nmi() (or gp_interception() for SVM) to re-inject the original #GP because it thinks emulation failed due to a non-VMware opcode. This patch prevents the issue as x86_emulate_instruction() will return EMULATE_DONE after injecting the #GP. Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn") Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Signed-off-by: NJan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
The downside of guest side polling is that polling is performed even with other runnable tasks in the host. However, even if poll in kvm can aware whether or not other runnable tasks in the same pCPU, it can still incur extra overhead in over-subscribe scenario. Now we can just enable guest polling when dedicated pCPUs are available. Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Sean Christopherson 提交于
Use the recently added tracepoint for logging nested VM-Enter failures instead of spamming the kernel log when hardware detects a consistency check failure. Take the opportunity to print the name of the error code instead of dumping the raw hex number, but limit the symbol table to error codes that can reasonably be encountered by KVM. Add an equivalent tracepoint in nested_vmx_check_vmentry_hw(), e.g. so that tracing of "invalid control field" errors isn't suppressed when nested early checks are enabled. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Debugging a failed VM-Enter is often like searching for a needle in a haystack, e.g. there are over 80 consistency checks that funnel into the "invalid control field" error code. One way to expedite debug is to run the buggy code as an L1 guest under KVM (and pray that the failing check is detected by KVM). However, extracting useful debug information out of L0 KVM requires attaching a debugger to KVM and/or modifying the source, e.g. to log which check is failing. Make life a little less painful for VMM developers and add a tracepoint for failed VM-Enter consistency checks. Ideally the tracepoint would capture both what check failed and precisely why it failed, but logging why a checked failed is difficult to do in a generic tracepoint without resorting to invasive techniques, e.g. generating a custom string on failure. That being said, for the vast majority of VM-Enter failures the most difficult step is figuring out exactly what to look at, e.g. figuring out which bit was incorrectly set in a control field is usually not too painful once the guilty field as been identified. To reach a happy medium between precision and ease of use, simply log the code that detected a failed check, using a macro to execute the check and log the trace event on failure. This approach enables tracing arbitrary code, e.g. it's not limited to function calls or specific formats of checks, and the changes to the existing code are minimally invasive. A macro with a two-character name is desirable as usage of the macro doesn't result in overly long lines or confusing alignment, while still retaining some amount of readability. I.e. a one-character name is a little too terse, and a three-character name results in the contents being passed to the macro aligning with an indented line when the macro is used an in if-statement, e.g.: if (VCC(nested_vmx_check_long_line_one(...) && nested_vmx_check_long_line_two(...))) return -EINVAL; And that is the story of how the CC(), a.k.a. Consistency Check, macro got its name. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Dan Carpenter 提交于
We refactored this code a bit and accidentally deleted the "-" character from "-EINVAL". The kvm_vcpu_map() function never returns positive EINVAL. Fixes: c8e16b78 ("x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()") Cc: stable@vger.kernel.org Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
Receiving an unexpected exit reason from hardware should be considered as a severe bug in KVM. Therefore, instead of just injecting #UD to guest and ignore it, exit to userspace on internal error so that it could handle it properly (probably by terminating guest). In addition, prefer to use vcpu_unimpl() instead of WARN_ONCE() as handling unexpected exit reason should be a rare unexpected event (that was expected to never happen) and we prefer to print a message on it every time it occurs to guest. Furthermore, dump VMCS/VMCB to dmesg to assist diagnosing such cases. Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoph Hellwig 提交于
Now that we know we always have the dma-noncoherent.h helpers available if we are on an architecture with support for non-coherent devices, we can just call them directly, and remove the calls to the dma-direct routines, including the fact that we call the dma_direct_map_page routines but ignore the value returned from it. Instead we now have Xen wrappers for the arch_sync_dma_for_{device,cpu} helpers that call the special Xen versions of those routines for foreign pages. Note that the new helpers get the physical address passed in addition to the dma address to avoid another translation for the local cache maintainance. The pfn_valid checks remain on the dma address as in the old code, even if that looks a little funny. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: NStefano Stabellini <sstabellini@kernel.org> Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Christoph Hellwig 提交于
These routines are only used by swiotlb-xen, which cannot be modular. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NStefano Stabellini <sstabellini@kernel.org>
-
由 Sean Christopherson 提交于
Move RDMSR and WRMSR emulation into common x86 code to consolidate nearly identical SVM and VMX code. Note, consolidating RDMSR introduces an extra indirect call, i.e. retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a guest kernel likely has bigger problems if increasing the latency of RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM performance. E.g. the only recurring RDMSR VM-Exits (after booting) on my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via arch_cpu_idle_enter(). No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Refactor the top-level MSR accessors to take/return the index and value directly instead of requiring the caller to dump them into a msr_data struct. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-