- 23 9月, 2016 2 次提交
-
-
由 Wanpeng Li 提交于
I observed that kvmvapic(to optimize flexpriority=N or AMD) is used to boost TPR access when testing kvm-unit-test/eventinj.flat tpr case on my haswell desktop (w/ flexpriority, w/o APICv). Commit (8d14695f x86, apicv: add virtual x2apic support) disable virtual x2apic mode completely if w/o APICv, and the author also told me that windows guest can't enter into x2apic mode when he developed the APICv feature several years ago. However, it is not truth currently, Interrupt Remapping and vIOMMU is added to qemu and the developers from Intel test windows 8 can work in x2apic mode w/ Interrupt Remapping enabled recently. This patch enables TPR shadow for virtual x2apic mode to boost windows guest in x2apic mode even if w/o APICv. Can pass the kvm-unit-test. Suggested-by: NRadim Krčmář <rkrcmar@redhat.com> Suggested-by: NWincy Van <fanwenyi0529@gmail.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wincy Van <fanwenyi0529@gmail.com> Cc: Yang Zhang <yang.zhang.wz@gmail.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80 do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0 CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_fmt+0x4f/0x60 ? prepare_to_swait+0x39/0xa0 ? prepare_to_swait+0x39/0xa0 __might_sleep+0x7e/0x80 __gfn_to_pfn_memslot+0x156/0x480 [kvm] gfn_to_pfn+0x2a/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] =============================== [ INFO: suspicious RCU usage. ] 4.8.0-rc5+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/4230: #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm] stack backtrace: CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 lockdep_rcu_suspicious+0xe7/0x120 gfn_to_memslot+0x12a/0x140 [kvm] gfn_to_pfn+0x12/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] ? __fget+0xfd/0x210 ? __lock_is_held+0x54/0x70 do_vfs_ioctl+0x96/0x6a0 ? __fget+0x11c/0x210 ? __fget+0x5/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat The nested preemption timer is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE) if need to yield pCPU and w/o holding srcu read lock when accesses memslots, both can be in nested preemption timer evaluation path which results in the warning above. This patch fix it by leveraging request bit to async reload APIC access page before vmentry in order to avoid to reload directly during the nested preemption timer evaluation, it is safe since the vmcs01 is loaded and current is nested vmexit. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 16 9月, 2016 1 次提交
-
-
由 Luiz Capitulino 提交于
The TSC offset can now be read directly from struct kvm_arch_vcpu. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 9月, 2016 6 次提交
-
-
由 Jan Dakinevich 提交于
Expose the feature to L1 hypervisor if host CPU supports it, since certain hypervisors requires it for own purposes. According to Intel SDM A.1, if CPU supports the feature, VMX_INSTRUCTION_INFO field of VMCS will contain detailed information about INS/OUTS instructions handling. This field is already copied to VMCS12 for L1 hypervisor (see prepare_vmcs12 routine) independently feature presence. Signed-off-by: NJan Dakinevich <jan.dakinevich@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
setup_vmcs_config takes a pointer to the vmcs_config global. The indirection is somewhat pointless, but just keep things consistent for now. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
These are mostly related to nested VMX. They needn't have a loglevel as high as KERN_WARN, and mustn't be allowed to pollute the host logs. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jan Dakinevich 提交于
If EPT support is exposed to L1 hypervisor, guest linear-address field of VMCS should contain GVA of L2, the access to which caused EPT violation. Signed-off-by: NJan Dakinevich <jan.dakinevich@gmail.com> Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Commit 61abdbe0 ("kvm: x86: make lapic hrtimer pinned") pins the emulated lapic timer. This patch does the same for the emulated nested preemption timer to avoid vmexit an unrelated vCPU and the latency of kicking IPI to another vCPU. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liang Li 提交于
The validity check for the guest line address is inefficient, check the invalid value instead of enumerating the valid ones. Signed-off-by: NLiang Li <liang.z.li@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 8月, 2016 3 次提交
-
-
由 Peter Feiner 提交于
When the host supported TSC scaling, L2 would use a TSC multiplier of 0, which causes a VM entry failure. Now L2's TSC uses the same multiplier as L1. Signed-off-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the write with vmcs02 as the current VMCS. This will incorrectly apply modifications intended for vmcs01 to vmcs02 and L2 can use it to gain access to L0's x2APIC registers by disabling virtualized x2APIC while using msr bitmap that assumes enabled. Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the current VMCS. An alternative solution would temporarily make vmcs01 the current VMCS, but it requires more care. Fixes: 8d14695f ("x86, apicv: add virtual x2apic support") Reported-by: NJim Mattson <jmattson@google.com> Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
msr bitmap can be used to avoid a VM exit (interception) on guest MSR accesses. In some configurations of VMX controls, the guest can even directly access host's x2APIC MSRs. See SDM 29.5 VIRTUALIZING MSR-BASED APIC ACCESSES. L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI. To do so, L1 would first trick KVM to disable all possible interceptions by enabling APICv features and then would turn those features off; nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would not intercept previously enabled MSRs even though they were not safe with the new configuration. Correctly re-enabling interceptions is not enough as a second bug would still allow L1+L2 to access host's MSRs: msr bitmap was shared for all VMCSs, so L1 could trigger a race to get the desired combination of msr bitmap and VMX controls. This fix allocates a msr bitmap for every L1 VCPU, allows only safe x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would have to intercept everything anyway. Fixes: 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap") Reported-by: NJim Mattson <jmattson@google.com> Suggested-by: NWincy Van <fanwenyi0529@gmail.com> Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 04 8月, 2016 2 次提交
-
-
由 Bandan Das 提交于
Commit 4b855078 ("KVM: nVMX: Don't advertise single context invalidation for invept") removed advertising single context invalidation since the spec does not mandate it. However, some hypervisors (such as ESX) require it to be present before willing to use ept in a nested environment. Advertise it and fallback to the global case. Signed-off-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Bandan Das 提交于
Nested vpid is already supported and both single/global modes are advertised to the guest Signed-off-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 8月, 2016 2 次提交
-
-
由 Jim Mattson 提交于
Kexec needs to know the addresses of all VMCSs that are active on each CPU, so that it can flush them from the VMCS caches. It is safe to record superfluous addresses that are not associated with an active VMCS, but it is not safe to omit an address associated with an active VMCS. After a call to vmcs_load, the VMCS that was loaded is active on the CPU. The VMCS should be added to the CPU's list of active VMCSs before it is loaded. Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Matlack 提交于
KVM maintains L1's current VMCS in guest memory, at the guest physical page identified by the argument to VMPTRLD. This makes hairy time-of-check to time-of-use bugs possible,as VCPUs can be writing the the VMCS page in memory while KVM is emulating VMLAUNCH and VMRESUME. The spec documents that writing to the VMCS page while it is loaded is "undefined". Therefore it is reasonable to load the entire VMCS into an internal cache during VMPTRLD and ignore writes to the VMCS page -- the guest should be using VMREAD and VMWRITE to access the current VMCS. To adhere to the spec, KVM should flush the current VMCS during VMPTRLD, and the target VMCS during VMCLEAR (as given by the operand to VMCLEAR). Since this implementation of VMCS caching only maintains the the current VMCS, VMCLEAR will only do a flush if the operand to VMCLEAR is the current VMCS pointer. KVM will also flush during VMXOFF, which is not mandated by the spec, but also not in conflict with the spec. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 7月, 2016 1 次提交
-
-
由 Dan Williams 提交于
This reverts commit 8b3e34e4. Given the deprecation of the pcommit instruction, the relevant VMX features and CPUID bits are not going to be rolled into the SDM. Remove their usage from KVM. Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 16 7月, 2016 1 次提交
-
-
由 Cao, Lei 提交于
With PML enabled, guest will shut down if a PML full VMEXIT occurs during event delivery. According to Intel SDM 27.2.3, PML full VMEXIT can occur when event is being delivered through IDT, so KVM should not exit to user space with error. Instead, it should let EXIT_REASON_PML_FULL go through and the event will be re-injected on the next VMENTRY. Signed-off-by: NLei Cao <lei.cao@stratus.com> Cc: stable@vger.kernel.org Fixes: 843e4330 ("KVM: VMX: Add PML support in VMX") [Shortened the summary and Cc'd stable.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 15 7月, 2016 2 次提交
-
-
由 Jim Mattson 提交于
When freeing the nested resources of a vcpu, there is an assumption that the vcpu's vmcs01 is the current VMCS on the CPU that executes nested_release_vmcs12(). If this assumption is violated, the vcpu's vmcs01 may be made active on multiple CPUs at the same time, in violation of Intel's specification. Moreover, since the vcpu's vmcs01 is not VMCLEARed on every CPU on which it is active, it can linger in a CPU's VMCS cache after it has been freed and potentially repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity miss can result in memory corruption. It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If the vcpu in question was last loaded on a different CPU, it must be migrated to the current CPU before calling vmx_load_vmcs01(). Signed-off-by: NJim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Feiner 提交于
Between loading the new VMCS and enabling PML, the CPU was unpinned. If the vCPU thread were migrated to another CPU in the interim (e.g., due to preemption or sleeping alloc_page), then the VMWRITEs to enable PML would target the wrong VMCS -- or no VMCS at all: [ 2087.266950] vmwrite error: reg 200e value 3fe1d52000 (err -506126336) [ 2087.267062] vmwrite error: reg 812 value 1ff (err 511) [ 2087.267125] vmwrite error: reg 401e value 12229c00 (err 304258048) This patch ensures that the VMCS remains current while enabling PML by doing the VMWRITEs while the CPU is pinned. Allocation of the PML buffer is hoisted out of the critical section. Signed-off-by: NPeter Feiner <pfeiner@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 7月, 2016 5 次提交
-
-
由 Radim Krčmář 提交于
KVM_CAP_X2APIC_API is a capability for features related to x2APIC enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to extend APIC ID in get/set ioctl and MSI addresses to 32 bits. Both are needed to support x2APIC. The feature has to be enableable and disabled by default, because get/set ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol inspired by xAPIC and the change is not backward-compatible. Changes to MSI addresses follow the format used by interrupt remapping unit. The upper address word, that used to be 0, contains upper 24 bits of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as 0. Using the upper address word is not backward-compatible either as we didn't check that userspace zeroed the word. Reserved bits are still not explicitly checked, but non-zero data will affect LAPIC addresses, which will cause a bug. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
The register is in hardware-compatible format now, so there is not need to intercept. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Bandan Das 提交于
MMU now knows about execute only mappings, so advertise the feature to L1 hypervisors Signed-off-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Bandan Das 提交于
To support execute only mappings on behalf of L1 hypervisors, reuse ACC_USER_MASK to signify if the L1 hypervisor has the R bit set. For the nested EPT case, we assumed that the U bit was always set since there was no equivalent in EPT page tables. Strictly speaking, this was not necessary because handle_ept_violation never set PFERR_USER_MASK in the error code (uf=0 in the parlance of update_permission_bitmask). We now have to set both U and UF correctly, respectively in FNAME(gpte_access) and in handle_ept_violation. Also in handle_ept_violation bit 3 of the exit qualification is not enough to detect a present PTE; all three bits 3-5 have to be checked. Signed-off-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Bandan Das 提交于
To support execute only mappings on behalf of L1 hypervisors, we need to teach set_spte() to honor all three of L1's XWR bits. As a start, add a new variable "shadow_present_mask" that will be set for non-EPT shadow paging and clear for EPT. Signed-off-by: NBandan Das <bsd@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 7月, 2016 4 次提交
-
-
由 Paolo Bonzini 提交于
There is no reason to read the entry/exit control fields of the VMCS and immediately write back the same value. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Because the vmcs12 preemption timer is emulated through a separate hrtimer, we can keep on using the preemption timer in the vmcs02 to emulare L1's TSC deadline timer. However, the corresponding bit in the pin-based execution control field must be kept consistent between vmcs01 and vmcs02. On vmentry we copy it into the vmcs02; on vmexit the preemption timer must be disabled in the vmcs01 if a preemption timer vmexit happened while in guest mode. The preemption timer value in the vmcs02 is set by vmx_vcpu_run, so it need not be considered in prepare_vmcs02. Cc: Yunhong Jiang <yunhong.jiang@intel.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Tested-by: NWanpeng Li <kernellwp@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
The preemption timer for nested VMX is emulated by hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the check_nested_events hook. However, nested_vmx_exit_handled is always returning true for preemption timer vmexit. Then, the L1 preemption timer vmexit is captured and be treated as a L2 preemption timer vmexit, causing NULL pointer dereferences or worse in the L1 guest's vmexit handler: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] SMP Call Trace: ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm] handle_preemption_timer+0xe/0x20 [kvm_intel] vmx_handle_exit+0x169/0x15a0 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm] kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm] ? vcpu_load+0x1c/0x60 [kvm] ? kvm_arch_vcpu_load+0x57/0x260 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] do_vfs_ioctl+0x96/0x6a0 ? __fget_light+0x2a/0x90 SyS_ioctl+0x79/0x90 do_syscall_64+0x68/0x180 entry_SYSCALL64_slow_path+0x25/0x25 Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff8800b5263c48> CR2: 0000000000000000 ---[ end trace 9c70c48b1a2bc66e ]--- This can be reproduced readily by preemption timer enabled on L0 and disabled on L1. Return false since preemption timer vmexits must never be reflected to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Simplify cpu_has_vmx_preemption_timer. This is consistent with the rest of setup_vmcs_config and preparatory for the next patch. Tested-by: NWanpeng Li <kernellwp@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 7月, 2016 1 次提交
-
-
由 Wei Yongjun 提交于
Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 7月, 2016 3 次提交
-
-
由 Paolo Bonzini 提交于
If the TSC deadline timer is programmed really close to the deadline or even in the past, the computation in vmx_set_hv_timer can underflow and cause delta_tsc to be set to a huge value. This generally results in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting delta_tsc to be positive or zero. Reported-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This gains a few clock cycles per vmexit. On Intel there is no need anymore to enable the interrupts in vmx_handle_external_intr, since we are using the "acknowledge interrupt on exit" feature. AMD needs to do that, and must be careful to avoid the interrupt shadow. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This is necessary to simplify handle_external_intr in the next patch. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 6月, 2016 1 次提交
-
-
由 Quentin Casasnovas 提交于
I couldn't get Xen to boot a L2 HVM when it was nested under KVM - it was getting a GP(0) on a rather unspecial vmread from Xen: (XEN) ----[ Xen-4.7.0-rc x86_64 debug=n Not tainted ]---- (XEN) CPU: 1 (XEN) RIP: e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450 (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d1v0) (XEN) rax: ffff82d0801e6288 rbx: ffff83003ffbfb7c rcx: fffffffffffab928 (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: ffff83000bdd0000 (XEN) rbp: ffff83000bdd0000 rsp: ffff83003ffbfab0 r8: ffff830038813910 (XEN) r9: ffff83003faf3958 r10: 0000000a3b9f7640 r11: ffff83003f82d418 (XEN) r12: 0000000000000000 r13: ffff83003ffbffff r14: 0000000000004802 (XEN) r15: 0000000000000008 cr0: 0000000080050033 cr4: 00000000001526e0 (XEN) cr3: 000000003fc79000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450): (XEN) 00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00 (XEN) Xen stack trace from rsp=ffff83003ffbfab0: ... (XEN) Xen call trace: (XEN) [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450 (XEN) [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300 (XEN) [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60 (XEN) [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70 (XEN) [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0 (XEN) [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0 (XEN) [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0 (XEN) [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270 (XEN) [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0 (XEN) [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90 (XEN) [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140 (XEN) [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140 (XEN) [<ffff82d080164aeb>] context_switch+0x13b/0xe40 (XEN) [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570 (XEN) [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90 (XEN) [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=0000] (XEN) **************************************** Tracing my host KVM showed it was the one injecting the GP(0) when emulating the VMREAD and checking the destination segment permissions in get_vmx_mem_address(): 3) | vmx_handle_exit() { 3) | handle_vmread() { 3) | nested_vmx_check_permission() { 3) | vmx_get_segment() { 3) 0.074 us | vmx_read_guest_seg_base(); 3) 0.065 us | vmx_read_guest_seg_selector(); 3) 0.066 us | vmx_read_guest_seg_ar(); 3) 1.636 us | } 3) 0.058 us | vmx_get_rflags(); 3) 0.062 us | vmx_read_guest_seg_ar(); 3) 3.469 us | } 3) | vmx_get_cs_db_l_bits() { 3) 0.058 us | vmx_read_guest_seg_ar(); 3) 0.662 us | } 3) | get_vmx_mem_address() { 3) 0.068 us | vmx_cache_reg(); 3) | vmx_get_segment() { 3) 0.074 us | vmx_read_guest_seg_base(); 3) 0.068 us | vmx_read_guest_seg_selector(); 3) 0.071 us | vmx_read_guest_seg_ar(); 3) 1.756 us | } 3) | kvm_queue_exception_e() { 3) 0.066 us | kvm_multiple_exception(); 3) 0.684 us | } 3) 4.085 us | } 3) 9.833 us | } 3) + 10.366 us | } Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software Developper Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine Control Structure", I found that we're enforcing that the destination operand is NOT located in a read-only data segment or any code segment when the L1 is in long mode - BUT that check should only happen when it is in protected mode. Shuffling the code a bit to make our emulation follow the specification allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests without problems. Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions") Signed-off-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Eugene Korenevsky <ekorenevsky@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 6月, 2016 3 次提交
-
-
由 Ashok Raj 提交于
On Intel platforms, this patch adds LMCE to KVM MCE supported capabilities and handles guest access to LMCE related MSRs. Signed-off-by: NAshok Raj <ashok.raj@intel.com> [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported Only enable LMCE on Intel platform Check MSR_IA32_FEATURE_CONTROL when handling guest access to MSR_IA32_MCG_EXT_CTL] Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
KVM currently does not check the value written to guest MSR_IA32_FEATURE_CONTROL, though bits corresponding to disabled features may be set. This patch makes KVM to validate individual bits written to guest MSR_IA32_FEATURE_CONTROL according to enabled features. Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Haozhong Zhang 提交于
msr_ia32_feature_control will be used for LMCE and not depend only on nested anymore, so move it from struct nested_vmx to struct vcpu_vmx. Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 6月, 2016 3 次提交
-
-
由 Yunhong Jiang 提交于
Hook the VMX preemption timer to the "hv timer" functionality added by the previous patch. This includes: checking if the feature is supported, if the feature is broken on the CPU, the hooks to setup/clean the VMX preemption timer, arming the timer on vmentry and handling the vmexit. A module parameter states if the VMX preemption timer should be utilized. Signed-off-by: NYunhong Jiang <yunhong.jiang@intel.com> [Move hv_deadline_tsc to struct vcpu_vmx, use -1 as the "unset" value. Put all VMX bits here. Enable it by default #yolo. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yunhong Jiang 提交于
Prepare to switch from preemption timer to hrtimer in the vmx_pre/post_block. Current functions are only for posted interrupt, rename them accordingly. Signed-off-by: NYunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yang Zhang 提交于
VT-d posted interrupt is relying on the CPU side's posted interrupt. Need to check whether VCPU's APICv is active before enabing VT-d posted interrupt. Fixes: d62caabb Cc: stable@vger.kernel.org Signed-off-by: NYang Zhang <yang.zhang.wz@gmail.com> Signed-off-by: NShengge Ding <shengge.dsg@alibaba-inc.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-