- 14 7月, 2011 3 次提交
-
-
由 Glauber Costa 提交于
This patch makes update_rq_clock() aware of steal time. The mechanism of operation is not different from irq_time, and follows the same principles. This lives in a CONFIG option itself, and can be compiled out independently of the rest of steal time reporting. The effect of disabling it is that the scheduler will still report steal time (that cannot be disabled), but won't use this information for cpu power adjustments. Everytime update_rq_clock_task() is invoked, we query information about how much time was stolen since last call, and feed it into sched_rt_avg_update(). Although steal time reporting in account_process_tick() keeps track of the last time we read the steal clock, in prev_steal_time, this patch do it independently using another field, prev_steal_time_rq. This is because otherwise, information about time accounted in update_process_tick() would never reach us in update_rq_clock(). Signed-off-by: NGlauber Costa <glommer@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Tested-by: NEric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Glauber Costa 提交于
This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: NGlauber Costa <glommer@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NEric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Glauber Costa 提交于
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count. This information is acquired through the run_delay field of delayacct/schedstats infrastructure, that counts time spent in a runqueue but not running. Steal time is a per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the memory area address containing information about steal time This patch contains the hypervisor part of the steal time infrasructure, and can be backported independently of the guest portion. [avi, yongjie: export delayacct_on, to avoid build failures in some configs] Signed-off-by: NGlauber Costa <glommer@redhat.com> Tested-by: NEric B Munson <emunson@mgebm.net> CC: Rik van Riel <riel@redhat.com> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: NYongjie Ren <yongjie.ren@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 12 7月, 2011 37 次提交
-
-
由 Glauber Costa 提交于
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM. This is per-vcpu, and using the kvmclock structure for that is an abuse we decided not to make. In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that holds the memory area address containing information about steal time This patch contains the headers for it. I am keeping it separate to facilitate backports to people who wants to backport the kernel part but not the hypervisor, or the other way around. Signed-off-by: NGlauber Costa <glommer@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NEric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Glauber Costa 提交于
This patch is simple, put in a different commit so it can be more easily shared between guest and hypervisor. It just defines a named constant to indicate the enable bit for KVM-specific MSRs. Signed-off-by: NGlauber Costa <glommer@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NEric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Takuya Yoshikawa 提交于
Suggested by Ingo and Avi. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
The current name does not explain the meaning well. So give it a better name "retry_walk" to show that we are trying the walk again. This was suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
Avoid two step jump to the error handling part. This eliminates the use of the variables present and rsvd_fault. We also use the const type qualifier to show that write/user/fetch_fault do not change in the function. Both of these were suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Marcelo Tosatti 提交于
This reverts commit bee931d31e588b8eb86b7edee32fac2d16930cd7. TLB flush should be done lazily during guest entry, in kvm_mmu_load(). Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
kvm_set_cr0() and kvm_set_cr4(), and possible other functions, assume that kvm_mmu_reset_context() flushes the guest TLB. However, it does not. Fix by flushing the tlb (and syncing the new root as well). Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances. Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei 提交于
This patch exposes ERMS feature to KVM guests. The REP MOVSB/STOSB instruction can enhance fast strings attempts to move as much of the data with larger size load/stores as possible. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei 提交于
This patch exposes RDWRGSFS bit to KVM guests. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei 提交于
This patch adds RDWRGSFS support when setting CR4. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei 提交于
This patch removes RDWRGSFS bit from CR4_RESERVED_BITS. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei Y 提交于
This patch exposes DRNG feature to KVM guests. The RDRAND instruction can provide software with sequences of random numbers generated from white noise. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Andre Przywara 提交于
commit 123108f1c1aafd51d6a5c79cc04d7999dd88a930 tried to fix KVMs XSAVE valid feature scanning, but it was wrong. It was not considering the sparse nature of this bitfield, instead reading values from uninitialized members of the entries array. This patch now separates subleaf indicies from KVM's array indicies and fills the entry before querying it's value. This fixes AVX support in KVM guests. Signed-off-by: NAndre Przywara <andre.przywara@amd.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei Y 提交于
This patch adds instruction fetch checking when walking guest page table, to implement SMEP when emulating instead of executing natively. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NShan, Haitao <haitao.shan@intel.com> Signed-off-by: NLi, Xin <xin.li@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei Y 提交于
This patch masks CPUID leaf 7 ebx against host capability word9. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NShan, Haitao <haitao.shan@intel.com> Signed-off-by: NLi, Xin <xin.li@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei Y 提交于
This patch adds SMEP handling when setting CR4. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NShan, Haitao <haitao.shan@intel.com> Signed-off-by: NLi, Xin <xin.li@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Yang, Wei Y 提交于
This patch removes SMEP bit from CR4_RESERVED_BITS. Signed-off-by: NYang, Wei <wei.y.yang@intel.com> Signed-off-by: NShan, Haitao <haitao.shan@intel.com> Signed-off-by: NLi, Xin <xin.li@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Nadav Har'El 提交于
The nested VMX feature is supposed to fully emulate VMX for the guest. This (theoretically) not only allows it to run its own guests, but also also to further emulate VMX for its own guests, and allow arbitrarily deep nesting. This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH by L2, which prevented deeper nesting. Deeper nesting now works (I only actually tested L3), but is currently *absurdly* slow, to the point of being unusable. Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
This saves a lot of pointless casts x86_emulate_ctxt and decode_cache. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Avi Kivity 提交于
The name eip conflicts with a field of the same name in x86_emulate_ctxt, which we plan to fold decode_cache into. The name _eip is unfortunate, but what's really needed is a refactoring here, not a better name. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Jan Kiszka 提交于
a is unused now on CONFIG_X86_32. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
LOOP/LOOPcc : E0-E2 JCXZ/JECXZ/JRCXZ : E3 Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
Call emulate_int() directly to avoid spaghetti goto's. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
Different functions for those which take segment register operands. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
In addition, replace one "goto xchg" with an em_xchg() call. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
Move the following functions to the opcode tables: RET (Far return) : CB IRET : CF JMP (Jump far) : EA SYSCALL : 0F 05 CLTS : 0F 06 SYSENTER : 0F 34 SYSEXIT : 0F 35 Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
The next patch will change these to be called by opcode::execute. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Takuya Yoshikawa 提交于
We should use the local variables ctxt and c when the emulate_ctxt and decode appears many times. At least, we need to be consistent about how we use these in a function. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Nadav Har'El 提交于
Small corrections of KVM (spelling, etc.) not directly related to nested VMX. Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Nadav Har'El 提交于
If the "nested" module option is enabled, add the "VMX" CPU feature to the list of CPU features KVM advertises with the KVM_GET_SUPPORTED_CPUID ioctl. Qemu uses this ioctl, and intersects KVM's list with its own list of desired cpu features (depending on the -cpu option given to qemu) to determine the final list of features presented to the guest. Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Nadav Har'El 提交于
In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to set vmcs12.tsc_offset, for this change to survive the next nested entry (see prepare_vmcs02()). Additionally, we also need to modify vmx_adjust_tsc_offset: The semantics of this function is that the TSC of all guests on this vcpu, L1 and possibly several L2s, need to be adjusted. To do this, we need to adjust vmcs01's tsc_offset (this offset will also apply to each L2s we enter). We can't set vmcs01 now, so we have to remember this adjustment and apply it when we later exit to L1. Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Nadav Har'El 提交于
KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and NM exceptions, even if we have a guest hypervisor (L1) who didn't want these traps. And of course, conversely: If L1 wanted to trap these events, we must let it, even if L0 is not interested in them. This patch fixes some existing KVM code (in update_exception_bitmap(), vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's and L1's needs. Note that handle_cr() was already fixed in the above patch, and that new code in introduced in previous patches already handles CR0 correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()). Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Nadav Har'El 提交于
When L2 tries to modify CR0 or CR4 (with mov or clts), and modifies a bit which L1 asked to shadow (via CR[04]_GUEST_HOST_MASK), we already do the right thing: we let L1 handle the trap (see nested_vmx_exit_handled_cr() in a previous patch). When L2 modifies bits that L1 doesn't care about, we let it think (via CR[04]_READ_SHADOW) that it did these modifications, while only changing (in GUEST_CR[04]) the bits that L0 doesn't shadow. This is needed for corect handling of CR0.TS for lazy FPU loading: L0 may want to leave TS on, while pretending to allow the guest to change it. Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-