- 19 5月, 2016 6 次提交
-
-
由 Suravee Suthikulpanit 提交于
When enable AVIC: * Do not intercept CR8 since this should be handled by AVIC HW. * Also, we don't need to sync cr8/V_TPR and APIC backing page. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
Since AVIC only virtualizes xAPIC hardware for the guest, this patch disable x2APIC support in guest CPUID. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
Adding kvm_x86_ops hooks to allow APICv to do post state restore. This is required to support VM save and restore feature. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception() and avic_unaccelerated_access_interception() along with two trace points (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access). Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
This patch introduces a new mechanism to inject interrupt using AVIC. Since VINTR is not supported when enable AVIC, we need to inject interrupt via APIC backing page instead. This patch also adds support for AVIC doorbell, which is used by KVM to signal a running vcpu to check IRR for injected interrupts. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
This patch introduces AVIC-related data structure, and AVIC initialization code. There are three main data structures for AVIC: * Virtual APIC (vAPIC) backing page (per-VCPU) * Physical APIC ID table (per-VM) * Logical APIC ID table (per-VM) Currently, AVIC is disabled by default. Users can manually enable AVIC via kernel boot option kvm-amd.avic=1 or during kvm-amd module loading with parameter avic=1. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Avoid extra indentation (Boris). - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 3月, 2016 1 次提交
-
-
由 Huaitong Han 提交于
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of leaf entries of the page tables, the PKEY is an index to PKRU register(16 domains), every domain has 2 bits(write disable bit, access disable bit). Static logic has been produced in update_pkru_bitmask, dynamic logic need read pkey from page table entries, get pkru value, and deduce the correct result. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 17 2月, 2016 1 次提交
-
-
由 Andrey Smetanin 提交于
Pass the return code from kvm_emulate_hypercall on to the caller, in order to allow it to indicate to the userspace that the hypercall has to be handled there. Also adjust all the existing code paths to return 1 to make sure the hypercall isn't passed to the userspace without setting kvm_run appropriately. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Joerg Roedel <joro@8bytes.org> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 12月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a lockdep splat. Reported-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 11月, 2015 3 次提交
-
-
由 Paolo Bonzini 提交于
RDTSCP was never supported for AMD CPUs, which nobody noticed because Linux does not use it. But exactly the fact that Linux does not use it makes the implementation very simple; we can freely trash MSR_TSC_AUX while running the guest. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrey Smetanin 提交于
The decision on whether to use hardware APIC virtualization used to be taken globally, based on the availability of the feature in the CPU and the value of a module parameter. However, under certain circumstances we want to control it on per-vcpu basis. In particular, when the userspace activates HyperV synthetic interrupt controller (SynIC), APICv has to be disabled as it's incompatible with SynIC auto-EOI behavior. To achieve that, introduce 'apicv_active' flag on struct kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv off. The flag is initialized based on the module parameter and CPU capability, and consulted whenever an APICv-specific action is performed. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrey Smetanin 提交于
The function to determine if the vector is handled by ioapic used to rely on the fact that only ioapic-handled vectors were set up to cause vmexits when virtual apic was in use. We're going to break this assumption when introducing Hyper-V synthetic interrupts: they may need to cause vmexits too. To achieve that, introduce a new bitmap dedicated specifically for ioapic-handled vectors, and populate EOI exit bitmap from it for now. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 11月, 2015 1 次提交
-
-
由 Borislav Petkov 提交于
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it checks whether the way access filter is enabled on some F15h models, and, if so, disables it. kvm doesn't handle that MSR access and complains about it, which can get really noisy in dmesg when one starts kvm guests all the time for testing. And it is useless anyway - guest kernel shouldn't be doing such changes anyway so tell it that that filter is disabled. Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 11月, 2015 10 次提交
-
-
由 Paolo Bonzini 提交于
Because #DB is now intercepted unconditionally, this callback only operates on #BP for both VMX and SVM. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This is needed to avoid the possibility that the guest triggers an infinite stream of #DB exceptions (CVE-2015-8104). VMX is not affected: because it does not save DR6 in the VMCS, it already intercepts #DB unconditionally. Reported-by: NJan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Eric Northup 提交于
It was found that a guest can DoS a host by triggering an infinite stream of "alignment check" (#AC) exceptions. This causes the microcode to enter an infinite loop where the core never receives another interrupt. The host kernel panics pretty quickly due to the effects (CVE-2015-5307). Signed-off-by: NEric Northup <digitaleric@google.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
Both VMX and SVM scales the host TSC in the same way in call-back read_l1_tsc(), so this patch moves the scaling logic from call-back read_l1_tsc() to a common function kvm_read_l1_tsc(). Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
For both VMX and SVM, if the 2nd argument of call-back adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale it first. This patch moves this common TSC scaling logic to its caller adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to adjust_tsc_offset_guest(). Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
Both VMX and SVM calculate the tsc-offset in the same way, so this patch removes the call-back compute_tsc_offset() and replaces it with a common function kvm_compute_tsc_offset(). Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
Both VMX and SVM propagate virtual_tsc_khz in the same way, so this patch removes the call-back set_tsc_khz() and replaces it with a common function. Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
VMX and SVM calculate the TSC scaling ratio in a similar logic, so this patch generalizes it to a common TSC scaling function. Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> [Inline the multiplication and shift steps into mul_u64_u64_shr. Remove BUG_ON. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
This patch moves the field of TSC scaling ratio from the architecture struct vcpu_svm to the common struct kvm_vcpu_arch. Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
The number of bits of the fractional part of the 64-bit TSC scaling ratio in VMX and SVM is different. This patch makes the architecture code to collect the number of fractional bits and other related information into variables that can be accessed in the common code. Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 19 10月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
Not zeroing EFER means that a 32-bit firmware cannot enter paging mode without clearing EFER.LME first (which it should not know about). Yang Zhang from Intel confirmed that the manual is wrong and EFER is cleared to zero on INIT. Fixes: d28bc9dd Cc: stable@vger.kernel.org Cc: Yang Z Zhang <yang.z.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 10月, 2015 1 次提交
-
-
由 Joerg Roedel 提交于
Currently we always write the next_rip of the shadow vmcb to the guests vmcb when we emulate a vmexit. This could confuse the guest when its cpuid indicated no support for the next_rip feature. Fix this by only propagating next_rip if the guest actually supports it. Cc: Bandan Das <bsd@redhat.com> Cc: Dirk Mueller <dmueller@suse.com> Tested-By: NDirk Mueller <dmueller@suse.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 10月, 2015 9 次提交
-
-
由 Paolo Bonzini 提交于
The interrupt window is currently checked twice, once in vmx.c/svm.c and once in dm_request_for_irq_injection. The only difference is the extra check for kvm_arch_interrupt_allowed in dm_request_for_irq_injection, and the different return value (EINTR/KVM_EXIT_INTR for vmx.c/svm.c vs. 0/KVM_EXIT_IRQ_WINDOW_OPEN for dm_request_for_irq_injection). However, dm_request_for_irq_injection is basically dead code! Revive it by removing the checks in vmx.c and svm.c's vmexit handlers, and fixing the returned values for the dm_request_for_irq_injection case. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Avoid pointer chasing and memory barriers, and simplify the code when split irqchip (LAPIC in kernel, IOAPIC/PIC in userspace) is introduced. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This will avoid an unnecessary trip to ->kvm and from there to the VPIC. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
We can reuse the algorithm that computes the EOI exit bitmap to figure out which vectors are handled by the IOAPIC. The only difference between the two is for edge-triggered interrupts other than IRQ8 that have no notifiers active; however, the IOAPIC does not have to do anything special for these interrupts anyway. This again limits the interactions between the IOAPIC and the LAPIC, making it easier to move the former to userspace. Inspired by a patch from Steve Rutherford. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Dirk Müller 提交于
The cpu feature flags are not ever going to change, so warning everytime can cause a lot of kernel log spam (in our case more than 10GB/hour). The warning seems to only occur when nested virtualization is enabled, so it's probably triggered by a KVM bug. This is a sensible and safe change anyway, and the KVM bug fix might not be suitable for stable releases anyway. Cc: stable@vger.kernel.org Signed-off-by: NDirk Mueller <dmueller@suse.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit 3c2e7f7d. Initializing the mapping from MTRR to PAT values was reported to fail nondeterministically, and it also caused extremely slow boot (due to caching getting disabled---bug 103321) with assigned devices. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Reported-by: NSebastian Schuette <dracon@ewetel.net> Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit 54928303. It builds on the commit that is being reverted next. Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit e098223b, which has a dependency on other commits being reverted. Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit fd717f11. It was reported to cause Machine Check Exceptions (bug 104091). Reported-by: harn-solo@gmx.de Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 9月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the memslots array (SRCU protected). Using a mini SRCU critical section is ugly, and adding it to kvm_arch_vcpu_create doesn't work because the VMX vcpu_create callback calls synchronize_srcu. Fixes this lockdep splat: =============================== [ INFO: suspicious RCU usage. ] 4.3.0-rc1+ #1 Not tainted ------------------------------- include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-i38/17000: #0: (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm] [...] Call Trace: dump_stack+0x4e/0x84 lockdep_rcu_suspicious+0xfd/0x130 kvm_zap_gfn_range+0x188/0x1a0 [kvm] kvm_set_cr0+0xde/0x1e0 [kvm] init_vmcb+0x760/0xad0 [kvm_amd] svm_create_vcpu+0x197/0x250 [kvm_amd] kvm_arch_vcpu_create+0x47/0x70 [kvm] kvm_vm_ioctl+0x302/0x7e0 [kvm] ? __lock_is_held+0x51/0x70 ? __fget+0x101/0x210 do_vfs_ioctl+0x2f4/0x560 ? __fget_light+0x29/0x90 SyS_ioctl+0x4c/0x90 entry_SYSCALL_64_fastpath+0x16/0x73 Reported-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 9月, 2015 1 次提交
-
-
由 Igor Mammedov 提交于
When INIT/SIPI sequence is sent to VCPU which before that was in use by OS, VMRUN might fail with: KVM: entry failed, hardware error 0xffffffff EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3 ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 00000000 0000ffff 00009300 CS =9a00 0009a000 0000ffff 00009a00 [...] CR0=60000010 CR2=b6f3e000 CR3=01942000 CR4=000007e0 [...] EFER=0000000000000000 with corresponding SVM error: KVM: FAILED VMRUN WITH VMCB: [...] cpl: 0 efer: 0000000000001000 cr0: 0000000080010010 cr2: 00007fd7fe85bf90 cr3: 0000000187d0c000 cr4: 0000000000000020 [...] What happens is that VCPU state right after offlinig: CR0: 0x80050033 EFER: 0xd01 CR4: 0x7e0 -> long mode with CR3 pointing to longmode page tables and when VCPU gets INIT/SIPI following transition happens CR0: 0 -> 0x60000010 EFER: 0x0 CR4: 0x7e0 -> paging disabled with stale CR3 However SVM under the hood puts VCPU in Paged Real Mode* which effectively translates CR0 0x60000010 -> 80010010 after svm_vcpu_reset() -> init_vmcb() -> kvm_set_cr0() -> svm_set_cr0() but from kvm_set_cr0() perspective CR0: 0 -> 0x60000010 only caching bits are changed and commit d81135a5 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")' regressed svm_vcpu_reset() which relied on MMU being reset. As result VMRUN after svm_vcpu_reset() tries to run VCPU in Paged Real Mode with stale MMU context (longmode page tables), which causes some AMD CPUs** to bail out with VMEXIT_INVALID. Fix issue by unconditionally resetting MMU context at init_vmcb() time. * AMD64 Architecture Programmer’s Manual, Volume 2: System Programming, rev: 3.25 15.19 Paged Real Mode ** Opteron 1216 Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Fixes: d81135a5 Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 8月, 2015 1 次提交
-
-
由 Xiao Guangrong 提交于
We have abstracted the data struct and functions which are used to check reserved bit on guest page tables, now we extend the logic to check zero bits on shadow page tables The zero bits on sptes include not only reserved bits on hardware but also the bits that SPTEs willnever use. For example, shadow pages will never use GB pages unless the guest uses them too. Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 7月, 2015 3 次提交
-
-
由 Paolo Bonzini 提交于
We can disable CD unconditionally when there is no assigned device. KVM now forces guest PAT to all-writeback in that case, so it makes sense to also force CR0.CD=0. When there are assigned devices, emulate cache-disabled operation through the page tables. This behavior is consistent with VMX microcode, where CD/NW are not touched by vmentry/vmexit. However, keep this dependent on the quirk because OVMF enables the caches too late. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Make them clearly architecture-dependent; the capability is valid for all architectures, but the argument is not. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The logic of the disabled_quirks field usually results in a double negation. Wrap it in a simple function that checks the bit and negates it. Based on a patch from Xiao Guangrong. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-