- 14 Aug 2012, 1 commit
-
-
Committed by Gleb Natapov
MSR_IA32_DEBUGCTLMSR is zeroed on VMEXIT. Restore it to the correct value.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
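A minimal sketch of the pattern this fix implies, assuming the x86 helpers get_debugctlmsr()/update_debugctlmsr() and a hypothetical vmx_do_vcpu_run() standing in for the real VMLAUNCH/VMRESUME path:

```c
#include <linux/kvm_host.h>
#include <asm/processor.h>	/* get_debugctlmsr(), update_debugctlmsr() */

static void vmx_do_vcpu_run(struct kvm_vcpu *vcpu);	/* hypothetical inner run path */

/*
 * Snapshot the host's DEBUGCTL before guest entry and write it back after
 * VMEXIT, because hardware zeroes MSR_IA32_DEBUGCTLMSR on exit.
 */
static void vmx_run_and_restore_debugctl(struct kvm_vcpu *vcpu)
{
	unsigned long debugctlmsr = get_debugctlmsr();

	vmx_do_vcpu_run(vcpu);

	if (debugctlmsr)
		update_debugctlmsr(debugctlmsr);
}
```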
-
- 06 Aug 2012, 1 commit
-
-
Committed by Xiao Guangrong
After commit a2766325, the error page is replaced by the error code, so it no longer needs to be released. [ The patch has been compile-tested for powerpc ]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 02 Aug 2012, 1 commit
-
-
Committed by Avi Kivity
Commit b2da15ac ("KVM: VMX: Optimize %ds, %es reload") broke i386 in the following scenario:

  vcpu_load
  ...
  vmx_save_host_state
  vmx_vcpu_run
  (ds.rpl, es.rpl cleared by hardware)

  interrupt
    push ds, es  # pushes bad ds, es
    schedule
      vmx_vcpu_put
        vmx_load_host_state
          reload ds, es (with __USER_DS)
    pop ds, es  # of other thread's stack
    iret
  # other thread runs
  interrupt
    push ds, es
    schedule  # back in vcpu thread
    pop ds, es  # now with rpl=0
    iret
  ...
  vcpu_put
  resume_userspace
  iret  # clears ds, es due to mismatched rpl

(Instead of resume_userspace, we might return with SYSEXIT and then take an exception; when the exception IRETs, we end up with cleared ds, es.)

Fix by avoiding the optimization on i386 and reloading ds, es on the lightweight exit path.

Reported-by: Chris Clayron <chris2553@googlemail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
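A minimal sketch of the reload step described above, assuming loadsegment()/__USER_DS from the x86 headers (whose exact location varies by kernel version); the function name is illustrative:

```c
#include <asm/segment.h>	/* loadsegment(), __USER_DS */

/*
 * On i386, put known-good selectors back into %ds/%es on the lightweight
 * exit path: VMEXIT leaves them with rpl=0, and a later interrupt could
 * save and restore the bad values behind our back.
 */
static inline void vmx_fix_ds_es_after_exit(void)
{
#ifdef CONFIG_X86_32
	loadsegment(ds, __USER_DS);
	loadsegment(es, __USER_DS);
#endif
}
```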
-
- 21 Jul 2012, 1 commit
-
-
Committed by Guo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 12 Jul 2012, 1 commit
-
-
Committed by Mao, Junjie
This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces, so that the processor may retain cached information when software switches to a different linear-address space. Refer to section 4.10.1 in the IA-32 Intel Software Developer's Manual, Volume 3A, for details.

For guests with EPT, the PCID feature is enabled and INVPCID behaves as it does when running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.

Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
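A hedged sketch of the gating logic: SECONDARY_EXEC_ENABLE_INVPCID is the VMX secondary execution control this patch introduces (bit value and helper shape are assumptions; the real code lives in vmx.c):

```c
#include <linux/types.h>

#ifndef SECONDARY_EXEC_ENABLE_INVPCID
#define SECONDARY_EXEC_ENABLE_INVPCID	0x00001000	/* bit 12 of secondary controls */
#endif

/*
 * Keep INVPCID exposed only for EPT guests; with the control bit clear,
 * guest INVPCID raises #UD as required.
 */
static u32 adjust_secondary_exec_control(u32 exec_control, bool enable_ept,
					 bool guest_has_invpcid)
{
	if (!enable_ept || !guest_has_invpcid)
		exec_control &= ~SECONDARY_EXEC_ENABLE_INVPCID;
	return exec_control;
}
```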
-
- 11 Jul 2012, 1 commit
-
-
Committed by Xiao Guangrong
Export the present bit of the page fault error code; a later patch will use it.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 09 Jul 2012, 8 commits
-
-
Committed by Avi Kivity
Our emulation should be complete enough that we can emulate guests while they are in big real mode, or in a mode transition that is not virtualizable without unrestricted guest support.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
If instruction emulation fails, report it properly to userspace.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
Process the event, possibly injecting an interrupt, before continuing.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
If we return early from an invalid guest state emulation loop, make sure we return to it later if the guest state is still invalid.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
Checking EFLAGS.IF is incorrect, as we might be in interrupt shadow. If that is the case, the main loop will notice that and not inject the interrupt, causing an endless loop. Fix by using vmx_interrupt_allowed() to check whether we can inject an interrupt instead.

Signed-off-by: Avi Kivity <avi@redhat.com>
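A minimal sketch of the substitution, assuming vmx.c's vmx_interrupt_allowed() is in scope; the wrapper name is illustrative:

```c
#include <linux/kvm_host.h>

static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu);	/* from vmx.c */

/*
 * RFLAGS.IF alone is not sufficient: an interrupt shadow (e.g. right
 * after STI or MOV SS) blocks injection even with IF=1, so defer to the
 * full predicate.
 */
static bool can_inject_interrupt(struct kvm_vcpu *vcpu)
{
	/* buggy version: return kvm_get_rflags(vcpu) & X86_EFLAGS_IF; */
	return vmx_interrupt_allowed(vcpu);
}
```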
-
Committed by Avi Kivity
Otherwise, if the guest ends up looping, we never exit the srcu critical section, which causes synchronize_srcu() to hang.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
Some userspace (e.g. QEMU 1.1) munges the d and g bits of segment descriptors, causing us not to recognize them as unusable segments with emulate_invalid_guest_state=1. Relax the check by testing for segment not present instead (a non-present segment cannot be usable).

Signed-off-by: Avi Kivity <avi@redhat.com>
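A hedged sketch of the relaxed test; the helper name is illustrative, and struct kvm_segment comes from the KVM uapi headers:

```c
#include <linux/kvm_host.h>	/* struct kvm_segment */

/*
 * A non-present segment can never be usable, so test the present bit
 * instead of pattern-matching descriptor bits that userspace may munge.
 */
static bool segment_is_unusable(const struct kvm_segment *seg)
{
	return !seg->present;
}
```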
-
Committed by Avi Kivity
In protected mode, the CPL is defined as the lower two bits of CS, as set by the last far jump. But during the transition to protected mode, there is no last far jump, so we need to return zero (the inherited real-mode CPL).

Fix by reading the CPL from the cache during the transition. This isn't 100% correct, since we don't set the CPL cache on a far jump, but since a protected-mode transition will always jump to a segment with RPL=0, it will always work.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 04 Jul 2012, 1 commit
-
-
Committed by Guo Chao
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 06 Jun 2012, 1 commit
-
-
Committed by Christoffer Dall
Introduces a couple of print functions, which are essentially wrappers around standard printk functions, with a "KVM:" prefix. Functions introduced or modified are:

  - kvm_err(fmt, ...)
  - kvm_info(fmt, ...)
  - kvm_debug(fmt, ...)
  - kvm_pr_unimpl(fmt, ...)
  - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...)

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
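A hedged sketch of the wrappers' shape (the real definitions live in include/linux/kvm_host.h; the exact format strings and rate-limiting choice are assumptions):

```c
#include <linux/printk.h>
#include <linux/sched.h>	/* task_pid_nr(), current */

#define kvm_err(fmt, ...) \
	pr_err("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
#define kvm_info(fmt, ...) \
	pr_info("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
#define kvm_debug(fmt, ...) \
	pr_debug("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
#define kvm_pr_unimpl(fmt, ...) \
	pr_err_ratelimited("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)

/* per-vcpu variant replacing pr_unimpl(vcpu, ...) */
#define vcpu_unimpl(vcpu, fmt, ...) \
	kvm_pr_unimpl("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
```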
-
- 05 Jun 2012, 4 commits
-
-
Committed by Orit Wasserman
For example, migration between Westmere and Nehalem hosts can get caught in big real mode. The code that fixes up the segments for a real-mode guest was moved from enter_rmode to vmx_set_segments; enter_rmode calls vmx_set_segments for each segment.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xudong Hao
Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xudong Hao
In the EPT page structure entries, enable the EPT A/D bits if the processor supports them.

Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
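A hedged sketch of the wiring step, assuming this era's five-argument kvm_mmu_set_mask_ptes() and the VMX_EPT_* bit names from asm/vmx.h; the function wrapper is illustrative:

```c
#include <linux/kvm_host.h>
#include <asm/vmx.h>	/* VMX_EPT_ACCESS_BIT, VMX_EPT_DIRTY_BIT, ... */

/*
 * Tell the common MMU code which EPT PTE bits serve as accessed/dirty,
 * so the A/D bits are only used when the hardware supports them.
 */
static void ept_setup_ad_masks(bool have_ad_bits)
{
	if (have_ad_bits)
		kvm_mmu_set_mask_ptes(0ull, VMX_EPT_ACCESS_BIT,
				      VMX_EPT_DIRTY_BIT, 0ull,
				      VMX_EPT_EXECUTABLE_MASK);
	else
		kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
				      VMX_EPT_EXECUTABLE_MASK);
}
```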
-
Committed by Xudong Hao
Add a kernel parameter to control A/D-bits support; it is on by default.

Signed-off-by: Haitao Shan <haitao.shan@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
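A minimal sketch of such a knob as a read-only module parameter defaulting to on; the parameter name "eptad" and variable name are assumptions:

```c
#include <linux/module.h>
#include <linux/cache.h>	/* __read_mostly */
#include <linux/stat.h>		/* S_IRUGO */

/* Assumed knob: visible as /sys/module/kvm_intel/parameters/eptad. */
static bool __read_mostly enable_ept_ad_bits = true;
module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO);
MODULE_PARM_DESC(eptad, "Enable EPT accessed/dirty bits if supported");
```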
-
- 17 May 2012, 2 commits
-
-
Committed by Avi Kivity
On x86_64, we can defer the %ds and %es reload to the heavyweight context switch, since nothing in the lightweight paths uses the host %ds or %es (they are ignored by the processor). Furthermore, we can avoid the load if the segments are null, by letting the hardware load the null segments for us. This is the expected case.

On i386, we could avoid the reload entirely, since the entry.S paths take care of the reload, except for the SYSEXIT path, which leaves %ds and %es set to __USER_DS. So we set them to the same values as well.

Saves about 70 cycles out of 1600 (around 4%; noisy measurements).

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
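A hedged sketch of the x86_64 policy, assuming savesegment()/loadsegment() from the x86 headers; the struct and function names are hypothetical stand-ins for the vmx host-state bookkeeping:

```c
#include <linux/types.h>
#include <asm/segment.h>	/* savesegment(), loadsegment() */

/* Hypothetical container for the saved selectors. */
struct host_seg_state {
	u16 ds_sel, es_sel;
};

/* Record %ds/%es once, at heavyweight save time. */
static void save_host_ds_es(struct host_seg_state *s)
{
	savesegment(ds, s->ds_sel);
	savesegment(es, s->es_sel);
}

/*
 * Reload only non-null selectors on the restore side; the hardware
 * already loaded null ones for free, which is the expected case.
 */
static void restore_host_ds_es(const struct host_seg_state *s)
{
	if (s->ds_sel)
		loadsegment(ds, s->ds_sel);
	if (s->es_sel)
		loadsegment(es, s->es_sel);
}
```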
-
Committed by Avi Kivity
The vmx exit code unconditionally restores %ds and %es to __USER_DS. This can override the user's values, since %ds and %es are not saved and restored in x86_64 syscalls. In practice, this isn't dangerous, since nobody uses segment registers in long mode, least of all programs that use KVM.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 14 May 2012, 1 commit
-
-
Committed by Xiao Guangrong
Fix:

  [ 1529.577273] Call Trace:
  [ 1529.577289] [<ffffffffa060d58f>] kvm_arch_hardware_disable+0x13/0x30 [kvm]
  [ 1529.577302] [<ffffffffa05fa2d4>] hardware_disable_nolock+0x35/0x39 [kvm]
  [ 1529.577311] [<ffffffffa05fa29f>] ? cpumask_clear_cpu.constprop.31+0x13/0x13 [kvm]
  [ 1529.577315] [<ffffffff81096ba8>] on_each_cpu+0x44/0x84
  [ 1529.577326] [<ffffffffa05f98b5>] hardware_disable_all_nolock+0x34/0x36 [kvm]
  [ 1529.577335] [<ffffffffa05f98e2>] hardware_disable_all+0x2b/0x39 [kvm]
  [ 1529.577349] [<ffffffffa05fafe5>] kvm_put_kvm+0xed/0x10f [kvm]
  [ 1529.577358] [<ffffffffa05fb3d7>] kvm_vm_release+0x22/0x28 [kvm]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 19 Apr 2012, 1 commit
-
-
Committed by Avi Kivity
kvm_set_shared_msr() may not be called in preemptible context, but vmx_set_msr() does so:

  BUG: using smp_processor_id() in preemptible [00000000] code: qemu-kvm/22713
  caller is kvm_set_shared_msr+0x32/0xa0 [kvm]
  Pid: 22713, comm: qemu-kvm Not tainted 3.4.0-rc3+ #39
  Call Trace:
  [<ffffffff8131fa82>] debug_smp_processor_id+0xe2/0x100
  [<ffffffffa0328ae2>] kvm_set_shared_msr+0x32/0xa0 [kvm]
  [<ffffffffa03a103b>] vmx_set_msr+0x28b/0x2d0 [kvm_intel]
  ...

Making kvm_set_shared_msr() work in preemptible context would be cleaner, but it's used in the fast path. Making two variants is overkill, so this patch just disables preemption around the call.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
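A minimal sketch of the fix, assuming this era's void kvm_set_shared_msr(unsigned, u64, u64) from the x86 KVM headers; the wrapper name is illustrative:

```c
#include <linux/preempt.h>
#include <linux/kvm_host.h>	/* kvm_set_shared_msr() declaration */

/*
 * Pin the task to the current CPU for the duration of the call, since
 * the helper touches per-cpu shared-MSR state via smp_processor_id().
 */
static void set_shared_msr_preempt_safe(unsigned int slot, u64 value, u64 mask)
{
	preempt_disable();
	kvm_set_shared_msr(slot, value, mask);
	preempt_enable();
}
```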
-
- 08 Apr 2012, 1 commit
-
-
Committed by Josh Triplett
Enable x86 feature-based autoloading for the kvm-intel module on CPUs with X86_FEATURE_VMX.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
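The shape of the match table such a patch adds, using the kernel's cpu_device_id API of this era; with it in place the module gains a CPU-feature modalias, so udev loads kvm-intel automatically on VMX-capable hardware:

```c
#include <linux/module.h>
#include <asm/cpu_device_id.h>	/* struct x86_cpu_id, X86_FEATURE_MATCH() */

/* Match any CPU advertising the VMX feature flag. */
static const struct x86_cpu_id vmx_cpu_id[] = {
	X86_FEATURE_MATCH(X86_FEATURE_VMX),
	{}
};
MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);
```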
-
- 06 Apr 2012, 1 commit
-
-
Committed by Marcelo Tosatti
vmx_set_cr0 is called from vcpu-run context, and therefore expects kvm->srcu to be held (for setting up the real-mode TSS).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 08 Mar 2012, 6 commits
-
-
Committed by Nadav Har'El
The code which checks whether to inject a page fault to L1 or L2 (in nested VMX) was incorrect in how it checked the PF_VECTOR bit. Thanks to Dan Carpenter for spotting this.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
Shared MSRs (MSR_*STAR and related) are stored in both vmx->guest_msrs and in the CPU registers, but vmx_set_msr() only updated memory. Prior to 46199f33, this didn't matter, since we called vmx_load_host_state(), which scheduled a vmx_save_host_state(), which re-synchronized the CPU state; but now we don't, so the CPU state will not be synchronized until the next exit to host userspace. This mostly affects nested vmx workloads, which play with these MSRs a lot.

Fix by loading the MSR eagerly.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Kevin Wolf
Currently, all task switches check privileges against the DPL of the TSS. This is only correct for jmp/call to a TSS. If a task gate is used, the DPL of this task gate is used for the check instead. Exceptions, external interrupts and iret shouldn't perform any check.

[avi: kill kvm-kmod remnants]

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Raghavendra K T
yield_on_hlt was introduced for CPU bandwidth capping. Now it is redundant with the CFS hard limit. yield_on_hlt also complicates scenarios in paravirtual environments that need to trap halt, e.g. paravirtualized ticket spinlocks.

Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Marcelo Tosatti
Redefine the API to take a parameter indicating whether an adjustment is in host or guest cycles.

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Zachary Amsden
This requires some restructuring; rather than using 'virtual_tsc_khz' to indicate whether hardware rate scaling is in effect, we consider each VCPU to always have a virtual TSC rate. Instead, there is new logic above the vendor-specific hardware scaling that decides whether it is even necessary to use it, and updates all rate variables used by common code. This means we can simply query the virtual rate at any point, which is needed for software rate scaling.

There is also now a threshold added to the TSC rate scaling; minor differences and variations of the measured TSC rate can accidentally provoke rate scaling to be used when it is not needed. Instead, we have a tolerance variable called tsc_tolerance_ppm, which is the maximum variation from the user-requested rate at which scaling will be used. The default is 250ppm, which is half the threshold for NTP adjustment, allowing for some hardware variation. A sketch of the threshold test appears below.

In the event that hardware rate scaling is not available, we can kludge a bit by forcing TSC catchup to turn on when a faster-than-hardware speed has been requested, but there is nothing available yet for the reverse case; this requires a trap-and-emulate software implementation for RDTSC, which is still forthcoming.

[avi: fix 64-bit division on i386]

Signed-off-by: Zachary Amsden <zamsden@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
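A hedged sketch of the tolerance test described above; the helper name is illustrative and the comparison is rearranged to avoid division:

```c
#include <linux/types.h>

static u32 tsc_tolerance_ppm = 250;	/* half the NTP adjustment threshold */

/*
 * Engage hardware rate scaling only when the user-requested rate strays
 * from the host TSC by more than tsc_tolerance_ppm parts per million.
 */
static bool tsc_rate_needs_scaling(u32 user_tsc_khz, u32 host_tsc_khz)
{
	u64 delta_khz = user_tsc_khz > host_tsc_khz ?
			user_tsc_khz - host_tsc_khz :
			host_tsc_khz - user_tsc_khz;

	/* delta/user > ppm/1e6  <=>  delta * 1e6 > user * ppm */
	return delta_khz * 1000000ULL >
	       (u64)user_tsc_khz * tsc_tolerance_ppm;
}
```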
-
- 22 Feb 2012, 1 commit
-
-
Committed by Linus Torvalds
While various modules include <asm/i387.h> to get access to things we actually *intend* for them to use, most of that header file was really pretty low-level internal stuff that we really don't want to expose to others.

So split the header file into two: the small exported interfaces remain in <asm/i387.h>, while the internal definitions that are only used by core architecture code are now in <asm/fpu-internal.h>. The guiding principle for this was to expose functions that we export to modules and leave them in <asm/i387.h>, while stuff that is used by task switching or was marked GPL-only goes in <asm/fpu-internal.h>.

The fpu-internal.h file could be further split up too, especially since arch/x86/kvm/ uses some of the remaining stuff for its module. But that kvm usage should probably be abstracted out a bit, and at least now the internal FPU accessor functions are much more contained. Even if it isn't perhaps as contained as it _could_ be.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 19 Feb 2012, 1 commit
-
-
Committed by Linus Torvalds
This moves the bit that indicates whether a thread has ownership of the FPU from the TS_USEDFPU bit in thread_info->status to a word of its own (called 'has_fpu') in task_struct->thread.has_fpu.

This fixes two independent bugs at the same time:

- Changing 'thread_info->status' from the scheduler causes nasty problems for the other users of that variable, since it is defined to be thread-synchronous (that's what the "TS_" part of the naming was supposed to indicate). So perfectly valid code could (and did) do

    ti->status |= TS_RESTORE_SIGMASK;

  and the compiler was free to do that as separate load, OR, and store instructions. This can cause problems with preemption, since a task switch could happen in between and change the TS_USEDFPU bit; the change to TS_USEDFPU would then be overwritten by the final store. In practice, this seldom happened, though, because the 'status' field was seldom used more than once, so gcc would generally tend to generate code that used a read-modify-write instruction and thus happened to avoid this problem (RMW instructions are naturally low fat and preemption-safe).

- On x86-32, the current_thread_info() pointer would, during interrupts and softirqs, point to a *copy* of the real thread_info, because x86-32 uses %esp to calculate the thread_info address, and thus the separate irq (and softirq) stacks would cause these kinds of odd thread_info copy aliases. This is normally not a problem, since interrupts aren't supposed to look at thread information anyway (what thread is running at interrupt time really isn't very well-defined), but it confused the heck out of irq_fpu_usable() and the code that tried to squirrel away the FPU state. (It also caused untold confusion for us poor kernel developers.)

It also turns out that using 'task_struct' is actually much more natural for most of the call sites that care about the FPU state, since they tend to work with the task struct for other reasons anyway (i.e. scheduling), and the FPU data that we are going to save/restore is found there too.

Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to the %esp issue.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: Raphael Prevost <raphael@buro.asia>
Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Feb 2012, 1 commit
-
-
Committed by Linus Torvalds
This creates three helper functions that do the TS_USEDFPU accesses, and makes everybody that used to do it by hand use those helpers instead.

In addition, there are a couple of helper functions for the "change both CR0.TS and TS_USEDFPU at the same time" case, and the places that do that together have been changed to use those. That means we have fewer random places that open-code this situation.

The intent is partly to clarify the code without actually changing any semantics yet (since we clearly still have some hard-to-reproduce bug in this area), but also to make it much easier to use another approach entirely for caching the CR0.TS bit for software accesses. Right now we use a bit in the thread_info 'status' variable (this patch does not change that), but we might want to make it a full field of its own, or even a per-cpu variable.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
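A hedged sketch of the three helpers' shape; the real ones live in the x86 FPU headers of this era and may take a task_struct instead of a thread_info:

```c
#include <linux/thread_info.h>	/* struct thread_info, TS_USEDFPU on x86 */

static inline bool __thread_has_fpu(struct thread_info *ti)
{
	return ti->status & TS_USEDFPU;
}

static inline void __thread_clear_has_fpu(struct thread_info *ti)
{
	ti->status &= ~TS_USEDFPU;
}

static inline void __thread_set_has_fpu(struct thread_info *ti)
{
	ti->status |= TS_USEDFPU;
}
```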
-
- 13 Jan 2012, 1 commit
-
-
Committed by Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break in the next kernel version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
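An illustrative before/after with a hypothetical parameter named "enable":

```c
#include <linux/module.h>
#include <linux/stat.h>		/* S_IRUGO */

/*
 * Old style (now warns, soon an error):
 *
 *   static int enable = 1;                // int backing a bool parameter
 *   module_param(enable, bool, S_IRUGO);
 */
static bool enable = true;		/* new style: the variable is really bool */
module_param(enable, bool, S_IRUGO);
```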
-
- 27 Dec 2011, 4 commits
-
-
Committed by Avi Kivity
Intercept RDPMC and forward it to the PMU emulation code.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Avi Kivity
The cpuid code has grown; put it into a separate file.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Xiao Guangrong
Introduce id_to_memslot to get a memslot by its slot id.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
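A hedged sketch of the accessor; at this point in history a slot's id is also its index in the memslots array, so the lookup is a direct index (later kernels sort the array and this becomes a search):

```c
#include <linux/kvm_host.h>	/* struct kvm_memslots, struct kvm_memory_slot */

static inline struct kvm_memory_slot *
id_to_memslot(struct kvm_memslots *slots, int id)
{
	return &slots->memslots[id];
}
```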
-
Committed by Gleb Natapov
vmx_load_host_state() does not handle MSR switching (except MSR_KERNEL_GS_BASE) since commit 26bb0981. Remove calls to it where they no longer make sense.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-