- 08 12月, 2016 13 次提交
-
-
由 Ladi Prosek 提交于
Loading CR3 as part of emulating vmentry is different from regular CR3 loads, as implemented in kvm_set_cr3, in several ways. * different rules are followed to check CR3 and it is desirable for the caller to distinguish between the possible failures * PDPTRs are not loaded if PAE paging and nested EPT are both enabled * many MMU operations are not necessary This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of nested vmentry and vmexit, and makes use of it on the nested vmentry path. Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Ladi Prosek 提交于
It is possible that prepare_vmcs02 fails to load the guest state. This patch adds the proper error handling for such a case. L1 will receive an INVALID_STATE vmexit with the appropriate exit qualification if it happens. A failure to set guest CR3 is the only error propagated from prepare_vmcs02 at the moment. Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Ladi Prosek 提交于
KVM does not correctly handle L1 hypervisors that emulate L2 real mode with PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two related issues: 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are overwritten in ept_load_pdptrs because the registers are believed to have been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is running with PAE enabled but PDPTRs have been set up by L1. 2) When L2 is about to enable paging and loads its CR3, we, again, attempt to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees that this will succeed (it's just a CR3 load, paging is not enabled yet) and if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is then lost and L2 crashes right after it enables paging. This patch replaces the kvm_set_cr3 call with a simple register write if PAE and EPT are both on. CR3 is not to be interpreted in this case. Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Matlack 提交于
vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry is more faithful to VMCS12. This patch correctly causes VM-entry to fail when "IA-32e mode guest" is 1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and "IA-32e mode guest" would silently be disabled by KVM. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Matlack 提交于
MSR_IA32_CR{0,4}_FIXED1 define which bits in CR0 and CR4 are allowed to be 1 during VMX operation. Since the set of allowed-1 bits is the same in and out of VMX operation, we can generate these MSRs entirely from the guest's CPUID. This lets userspace avoiding having to save/restore these MSRs. This patch also initializes MSR_IA32_CR{0,4}_FIXED1 from the CPU's MSRs by default. This is a saner than the current default of -1ull, which includes bits that the host CPU does not support. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Matlack 提交于
KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning all CR0 and CR4 bits are allowed to be 1 during VMX operation. This does not match real hardware, which disallows the high 32 bits of CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits which are defined in the SDM but missing according to CPUID). A guest can induce a VM-entry failure by setting these bits in GUEST_CR0 and GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are valid. Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing checks on these registers do not verify must-be-0 bits. Fix these checks to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1. This patch should introduce no change in behavior in KVM, since these MSRs are still -1ULL. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Matlack 提交于
The VMX capability MSRs advertise the set of features the KVM virtual CPU can support. This set of features varies across different host CPUs and KVM versions. This patch aims to addresses both sources of differences, allowing VMs to be migrated across CPUs and KVM versions without guest-visible changes to these MSRs. Note that cross-KVM- version migration is only supported from this point forward. When the VMX capability MSRs are restored, they are audited to check that the set of features advertised are a subset of what KVM and the CPU support. Since the VMX capability MSRs are read-only, they do not need to be on the default MSR save/restore lists. The userspace hypervisor can set the values of these MSRs or read them from KVM at VCPU creation time, and restore the same value after every save/restore. Signed-off-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Matlack 提交于
The "non-true" VMX capability MSRs can be generated from their "true" counterparts, by OR-ing the default1 bits. The default1 bits are fixed and defined in the SDM. Since we can generate the non-true VMX MSRs from the true versions, there's no need to store both in struct nested_vmx. This also lets userspace avoid having to restore the non-true MSRs. Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so, we simply need to set all the default1 bits in the true MSRs (such that the true MSRs and the generated non-true MSRs are equal). Signed-off-by: NDavid Matlack <dmatlack@google.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Kyle Huey 提交于
The trap flag stays set until software clears it. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Kyle Huey 提交于
kvm_skip_emulated_instruction calls both kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep, skipping the emulated instruction and generating a trap if necessary. Replacing skip_emulated_instruction calls with kvm_skip_emulated_instruction is straightforward, except for: - ICEBP, which is already inside a trap, so avoid triggering another trap. - Instructions that can trigger exits to userspace, such as the IO insns, MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a KVM_GUESTDBG_SINGLESTEP exit, and the handling code for IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will take precedence. The singlestep will be triggered again on the next instruction, which is the current behavior. - Task switch instructions which would require additional handling (e.g. the task switch bit) and are instead left alone. - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction, which do not trigger singlestep traps as mentioned previously. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Kyle Huey 提交于
We can't return both the pass/fail boolean for the vmcs and the upcoming continue/exit-to-userspace boolean for skip_emulated_instruction out of nested_vmx_check_vmcs, so move skip_emulated_instruction out of it instead. Additionally, VMENTER/VMRESUME only trigger singlestep exceptions when they advance the IP to the following instruction, not when they a) succeed, b) fail MSR validation or c) throw an exception. Add a separate call to skip_emulated_instruction that will later not be converted to the variant that checks the singlestep flag. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Kyle Huey 提交于
The functions being moved ahead of skip_emulated_instruction here don't need updated IPs, and skipping the emulated instruction at the end will make it easier to return its value. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Kyle Huey 提交于
Once skipping the emulated instruction can potentially trigger an exit to userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to propagate a return value. Signed-off-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 25 11月, 2016 2 次提交
-
-
由 Tom Lendacky 提交于
Update the I/O interception support to add the kvm_fast_pio_in function to speed up the in instruction similar to the out instruction. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Tom Lendacky 提交于
AMD hardware adds two additional bits to aid in nested page fault handling. Bit 32 - NPF occurred while translating the guest's final physical address Bit 33 - NPF occurred while translating the guest page tables The guest page tables fault indicator can be used as an aid for nested virtualization. Using V0 for the host, V1 for the first level guest and V2 for the second level guest, when both V1 and V2 are using nested paging there are currently a number of unnecessary instruction emulations. When V2 is launched shadow paging is used in V1 for the nested tables of V2. As a result, KVM marks these pages as RO in the host nested page tables. When V2 exits and we resume V1, these pages are still marked RO. Every nested walk for a guest page table is treated as a user-level write access and this causes a lot of NPFs because the V1 page tables are marked RO in the V0 nested tables. While executing V1, when these NPFs occur KVM sees a write to a read-only page, emulates the V1 instruction and unprotects the page (marking it RW). This patch looks for cases where we get a NPF due to a guest page table walk where the page was marked RO. It immediately unprotects the page and resumes the guest, leading to far fewer instruction emulations when nested virtualization is used. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 23 11月, 2016 3 次提交
-
-
由 Bandan Das 提交于
Change unimplemented msrs messages to use pr_debug. If CONFIG_DYNAMIC_DEBUG is set, then these messages can be enabled at run time or else -DDEBUG can be used at compile time to enable them. These messages will still be printed if ignore_msrs=1. Signed-off-by: NBandan Das <bsd@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Jan Dakinevich 提交于
- Expose all invalidation types to the L1 - Reject invvpid instruction, if L1 passed zero vpid value to single context invalidations Signed-off-by: NJan Dakinevich <jan.dakinevich@gmail.com> Tested-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Jan Dakinevich 提交于
- Remove VMX_EPT_EXTENT_INDIVIDUAL_ADDR, since there is no such type of EPT invalidation - Add missing VPID types names Signed-off-by: NJan Dakinevich <jan.dakinevich@gmail.com> Tested-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 22 11月, 2016 1 次提交
-
-
由 Jim Mattson 提交于
From the Intel SDM, volume 3, section 10.4.3, "Enabling or Disabling the Local APIC," When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an IA-32 processor without an on-chip APIC. The CPUID feature flag for the APIC (see Section 10.4.2, "Presence of the Local APIC") is also set to 0. Signed-off-by: NJim Mattson <jmattson@google.com> [Changed subject tag from nVMX to x86.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 17 11月, 2016 10 次提交
-
-
由 Luwei Kang 提交于
Add two new AVX512 subfeatures support for KVM guest. AVX512_4VNNIW: Vector instructions for deep learning enhanced word variable precision. AVX512_4FMAPS: Vector instructions for deep learning floating-point single precision. Reviewed-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NHe Chen <he.chen@linux.intel.com> Signed-off-by: NLuwei Kang <luwei.kang@intel.com> [Changed subject tags.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
Internal errors were reported on 16 bit fxsave and fxrstor with ipxe. Old Intels don't have unrestricted_guest, so we have to emulate them. The patch takes advantage of the hardware implementation. AMD and Intel differ in saving and restoring other fields in first 32 bytes. A test wrote 0xff to the fxsave area, 0 to upper bits of MCSXR in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee, and executed fxsave: Intel (Nehalem): 7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00 ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00 Intel (Haswell -- deprecated FPU CS and FPU DS): 7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00 AMD (Opteron 2300-series): 7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00 fxsave/fxrstor will only be emulated on early Intels, so KVM can't do much to improve the situation. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
Move the existing exception handling for inline assembly into a macro and switch its return values to X86EMUL type. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Alignments are exclusive, so 5 modes can be expressed in 3 bits. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Needed for FXSAVE and FXRSTOR. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jiang Biao 提交于
The local variable *gpa_offset* is set but not used afterwards, which make the compiler issue a warning with option -Wunused-but-set-variable. Remove it to avoid the warning. Signed-off-by: NJiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jiang Biao 提交于
synic_set_irq is only used in hyperv.c, and should be static to avoid compiling warning when with -Wmissing-prototypes option. Signed-off-by: NJiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jiang Biao 提交于
The use of local variable *function* is not necessary here. Remove it to avoid compiling warning with -Wunused-but-set-variable option. Signed-off-by: NJiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jiang Biao 提交于
kvm_emulate_wbinvd_noskip is only used in x86.c, and should be static to avoid compiling warning when with -Wmissing-prototypes option. Signed-off-by: NJiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jiang Biao 提交于
vmx_arm_hv_timer is only used in vmx.c, and should be static to avoid compiling warning when with -Wmissing-prototypes option. Signed-off-by: NJiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 11月, 2016 2 次提交
-
-
由 He Chen 提交于
Sparse populated CPUID leafs are collected in a software provided leaf to avoid bloat of the x86_capability array, but there is no way to rebuild the real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID leaf from the CPU. While this is possible it is problematic as it does not take software disabled features into account. If a feature is disabled on the host it should not be exposed to a guest either. Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered cpuid table information and the active CPU features. [ tglx: Rewrote changelog ] Signed-off-by: NHe Chen <he.chen@linux.intel.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Luwei Kang <luwei.kang@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Piotr Luc <Piotr.Luc@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.chen@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 He Chen 提交于
cpuid_regs is defined multiple times as structure and enum. Rename the enum and move all of it to processor.h so we don't end up with more instances. Rename the misnomed register enumeration from CR_* to the obvious CPUID_*. [ tglx: Rewrote changelog ] Signed-off-by: NHe Chen <he.chen@linux.intel.com> Reviewed-by: NBorislav Petkov <bp@alien8.de> Cc: Luwei Kang <luwei.kang@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Piotr Luc <Piotr.Luc@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/1478856336-9388-2-git-send-email-he.chen@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 12 11月, 2016 2 次提交
-
-
由 Arnd Bergmann 提交于
The rfc4106 encrypy/decrypt helper functions cause an annoying false-positive warning in allmodconfig if we turn on -Wmaybe-uninitialized warnings again: arch/x86/crypto/aesni-intel_glue.c: In function ‘helper_rfc4106_decrypt’: include/linux/scatterlist.h:67:31: warning: ‘dst_sg_walk.sg’ may be used uninitialized in this function [-Wmaybe-uninitialized] The problem seems to be that the compiler doesn't track the state of the 'one_entry_in_sg' variable across the kernel_fpu_begin/kernel_fpu_end section. This takes the easy way out by adding a bogus initialization, which should be harmless enough to get the patch into v4.9 so we can turn on this warning again by default without producing useless output. A follow-up patch for v4.10 rearranges the code to make the warning go away. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
apm_bios_call() can fail, and return a status in its argument structure. If that status however is zero during a call from apm_get_power_status(), we end up using data that may have never been set, as reported by "gcc -Wmaybe-uninitialized": arch/x86/kernel/apm_32.c: In function ‘apm’: arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here This changes the function to return "APM_NO_ERROR" here, which makes the code more robust to broken BIOS versions, and avoids the warning. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NJiri Kosina <jkosina@suse.cz> Reviewed-by: NLuis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 11月, 2016 3 次提交
-
-
由 Jike Song 提交于
Signed-off-by: NJike Song <jike.song@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jike Song 提交于
The user of page_track might needs extra information, so pass the kvm_page_track_notifier_node to callbacks. Signed-off-by: NJike Song <jike.song@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoguang Chen 提交于
When a memory slot is being moved or removed users of page track can be notified. So users can drop write-protection for the pages in that memory slot. This notifier type is needed by KVMGT to sync up its shadow page table when memory slot is being moved or removed. Register the notifier type track_flush_slot to receive memslot move and remove event. Reviewed-by: NXiao Guangrong <guangrong.xiao@intel.com> Signed-off-by: NChen Xiaoguang <xiaoguang.chen@intel.com> [Squashed commits to avoid bisection breakage and reworded the subject.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 03 11月, 2016 4 次提交
-
-
由 Paolo Bonzini 提交于
On some benchmarks (e.g. netperf with ioeventfd disabled), APICv posted interrupts turn out to be slower than interrupt injection via KVM_REQ_EVENT. This patch optimizes a bit the IRR update, avoiding expensive atomic operations in the common case where PI.ON=0 at vmentry or the PIR vector is mostly zero. This saves at least 20 cycles (1%) per vmexit, as measured by kvm-unit-tests' inl_from_qemu test (20 runs): | enable_apicv=1 | enable_apicv=0 | mean stdev | mean stdev ----------|-----------------|------------------ before | 5826 32.65 | 5765 47.09 after | 5809 43.42 | 5777 77.02 Of course, any change in the right column is just placebo effect. :) The savings are bigger if interrupts are frequent. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
These are never used by the host, but they can still be reflected to the guest. Tested-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The base clock for the LAPIC timer is always CLOCK_MONOTONIC. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Most windows guests still utilize APIC Timer periodic/oneshot mode instead of tsc-deadline mode, and the APIC Timer periodic/oneshot mode are still emulated by high overhead hrtimer on host. This patch converts the expected expire time of the periodic/oneshot mode to guest deadline tsc in order to leverage VMX preemption timer logic for APIC Timer tsc-deadline mode. After each preemption timer vmexit preemption timer is restarted to emulate LVTT current-count register is automatically reloaded from the initial-count register when the count reaches 0. This patch reduces ~5600 cycles for each APIC Timer periodic mode operation virtualization. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> [Squashed with my fixes that were reviewed-by Paolo.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-