Commits · 095c0aa83e52d6c3dd7168610746703921f570af · openeuler / Kernel

14 Jul, 2011 3 commits

sched: adjust scheduler cpu power for stolen time · 095c0aa8

Committed by Glauber Costa on Jul 11, 2011

This patch makes update_rq_clock() aware of steal time.
The mechanism of operation is no different from irq_time,
and follows the same principles. It lives under its own CONFIG
option, and can be compiled out independently of
the rest of steal time reporting. The effect of disabling it
is that the scheduler will still report steal time (that cannot be
disabled), but won't use this information for cpu power adjustments.

Every time update_rq_clock_task() is invoked, we query information
about how much time was stolen since the last call, and feed it into
sched_rt_avg_update().

Although steal time reporting in account_process_tick() keeps
track of the last time we read the steal clock, in prev_steal_time,
this patch does it independently using another field,
prev_steal_time_rq. This is because otherwise, information about time
accounted in update_process_tick() would never reach us in update_rq_clock().

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
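
The accounting described in this commit message can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel code: struct rq_model, rq_clock_task_update() and the steal_clock_now parameter are hypothetical names, and only the field name prev_steal_time_rq is taken from the message above.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a runqueue clock. */
struct rq_model {
    uint64_t clock_task;         /* time credited to tasks, in ns */
    uint64_t prev_steal_time_rq; /* steal clock value consumed so far */
    uint64_t stolen_total;       /* stand-in for what would be fed to
                                    sched_rt_avg_update() */
};

/*
 * Advance the task clock by `delta` ns of elapsed time, excluding the part
 * that the hypervisor reports as stolen since the previous call.
 * `steal_clock_now` is assumed to be a monotonically increasing reading.
 */
static void rq_clock_task_update(struct rq_model *rq, uint64_t delta,
                                 uint64_t steal_clock_now)
{
    uint64_t steal = steal_clock_now - rq->prev_steal_time_rq;

    /* Never remove more time than actually elapsed in this interval. */
    if (steal > delta)
        steal = delta;

    rq->prev_steal_time_rq += steal; /* the remainder is picked up next time */
    rq->stolen_total += steal;
    rq->clock_task += delta - steal;
}
```

Keeping a separate prev_steal_time_rq means this path consumes the steal clock at its own pace, independently of whatever the tick accounting has already read.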

095c0aa8

KVM guest: Add a pv_ops stub for steal time · 3c404b57

Committed by Glauber Costa on Jul 11, 2011

This patch adds a function pointer in one of the many paravirt_ops
structs, to allow guests to register a steal time function. Besides
a steal time function, we also declare two jump_labels. They will be
used to allow the steal time code to be easily bypassed when not
in use.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
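
A minimal sketch of the idea, assuming a user-space model: a table holds a steal-clock function pointer with a do-nothing default, and a plain boolean stands in for the jump label that lets callers skip the hook when nobody has registered. All names below are hypothetical and are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of an ops table with a steal-clock hook. */
struct pv_time_ops_model {
    uint64_t (*steal_clock)(int cpu);
};

/* Default stub: without a hypervisor-provided clock, nothing is stolen. */
static uint64_t native_steal_clock_model(int cpu)
{
    (void)cpu;
    return 0;
}

static struct pv_time_ops_model pv_time_model = {
    .steal_clock = native_steal_clock_model,
};

/* A plain flag standing in for a jump label; off until a guest registers. */
static bool steal_enabled_model = false;

/* Called by guest support code that has a real steal clock to offer. */
static void register_steal_clock(uint64_t (*fn)(int cpu))
{
    pv_time_model.steal_clock = fn;
    steal_enabled_model = true;
}

/* Callers bypass the indirect call entirely while the feature is unused. */
static uint64_t steal_clock_read(int cpu)
{
    if (!steal_enabled_model)
        return 0;
    return pv_time_model.steal_clock(cpu);
}

/* Example guest implementation, invented purely for illustration. */
static uint64_t example_guest_steal_clock(int cpu)
{
    return 1000u * (uint64_t)(cpu + 1);
}
```

A real jump label patches the branch in the instruction stream rather than testing a variable, so the disabled path costs even less than this model suggests.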

3c404b57

KVM: Steal time implementation · c9aaa895

Committed by Glauber Costa on Jul 11, 2011

To implement steal time, we need the hypervisor to pass the guest
information about how much time was spent running other processes
outside the VM, while the vcpu had meaningful work to do - halt
time does not count.

This information is acquired through the run_delay field of the
delayacct/schedstats infrastructure, which counts time spent in a
runqueue but not running.

Steal time is per-cpu information, so the traditional MSR-based
infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the
memory area address containing information about steal time.

This patch contains the hypervisor part of the steal time infrastructure,
and can be backported independently of the guest portion.

[avi, yongjie: export delayacct_on, to avoid build failures in some configs]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Rik van Riel <riel@redhat.com>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
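
As a rough illustration of how a cumulative steal value could be derived from run_delay, the sketch below keeps the last sampled run_delay per vcpu and adds the difference on each update. The struct and function names are hypothetical; the message above confirms only that run_delay is the source of the data.

```c
#include <stdint.h>

/* Hypothetical per-vcpu bookkeeping on the host side. */
struct vcpu_steal_model {
    uint64_t last_run_delay; /* run_delay sampled at the previous update */
    uint64_t steal;          /* cumulative steal time exposed to the guest, ns */
};

/*
 * Start accounting: remember the current run_delay so that waiting which
 * happened before the guest enabled the feature is not counted.
 */
static void steal_start(struct vcpu_steal_model *v, uint64_t run_delay_now)
{
    v->last_run_delay = run_delay_now;
    v->steal = 0;
}

/*
 * Called when the vcpu thread gets to run again: the time it sat runnable
 * on a runqueue since the last sample is time stolen from the guest.
 */
static void steal_update(struct vcpu_steal_model *v, uint64_t run_delay_now)
{
    v->steal += run_delay_now - v->last_run_delay;
    v->last_run_delay = run_delay_now;
}
```

Because run_delay only grows while the thread is runnable but not running, time the vcpu spends halted never enters the total, which matches the "halt time does not count" requirement.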

c9aaa895

12 Jul, 2011 37 commits

KVM: KVM Steal time guest/host interface · 9ddabbe7

Committed by Glauber Costa on Jul 11, 2011

To implement steal time, we need the hypervisor to pass the guest information
about how much time was spent running other processes outside the VM.
This is per-vcpu, and using the kvmclock structure for that is an abuse
we decided not to make.

In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that
holds the memory area address containing information about steal time.

This patch contains the headers for it. I am keeping it separate to facilitate
backports for people who want to backport the kernel part but not the
hypervisor, or the other way around.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

9ddabbe7

KVM: Add constant to represent KVM MSRs enabled bit in guest/host interface · 4b6b35f5

Committed by Glauber Costa on Jul 11, 2011

This patch is simple; it is kept in a separate commit so it can be more easily
shared between guest and hypervisor. It just defines a named constant
to indicate the enable bit for KVM-specific MSRs.

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
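
To show why a named enable bit helps both sides, here is a small sketch of packing an enable flag into the same MSR value as a memory area address. The bit position, the 64-byte alignment and every identifier below are assumptions made for illustration, not definitions taken from this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: bit 0 of the MSR value is the enable bit. */
#define MSR_ENABLED_BIT_MODEL 1ULL

/* Assumed for illustration: the shared area is 64-byte aligned. */
#define AREA_ALIGN_MASK_MODEL 63ULL

/* Guest side: combine an aligned guest-physical address with the enable bit. */
static uint64_t msr_encode(uint64_t area_gpa)
{
    return (area_gpa & ~AREA_ALIGN_MASK_MODEL) | MSR_ENABLED_BIT_MODEL;
}

/* Host side: test the flag with the same named constant the guest used. */
static bool msr_is_enabled(uint64_t value)
{
    return (value & MSR_ENABLED_BIT_MODEL) != 0;
}

/* Host side: recover the address by masking off the low control bits. */
static uint64_t msr_area_gpa(uint64_t value)
{
    return value & ~AREA_ALIGN_MASK_MODEL;
}
```

Sharing one constant avoids the guest and the hypervisor each hard-coding a literal 1 and drifting apart later.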

4b6b35f5

KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic() · 3c8c652a

Committed by Takuya Yoshikawa on Jul 01, 2011

Suggested by Ingo and Avi.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3c8c652a

KVM: MMU: Rename the walk label in walk_addr_generic() · 92c1c1e8

Committed by Takuya Yoshikawa on Jul 01, 2011

The current name does not explain the meaning well.  So give it a better
name "retry_walk" to show that we are trying the walk again.

This was suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

92c1c1e8

KVM: MMU: Clean up the error handling of walk_addr_generic() · 134291bf

Committed by Takuya Yoshikawa on Jul 01, 2011

Avoid the two-step jump to the error handling part.  This eliminates the use
of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

134291bf

Revert "KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB" · f8f7e5ee

Committed by Marcelo Tosatti on Jun 21, 2011

This reverts commit bee931d31e588b8eb86b7edee32fac2d16930cd7.

TLB flush should be done lazily during guest entry, in
kvm_mmu_load().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f8f7e5ee

KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB · 45bd07b9

Committed by Avi Kivity on Jun 12, 2011

kvm_set_cr0() and kvm_set_cr4(), and possibly other functions,
assume that kvm_mmu_reset_context() flushes the guest TLB.  However,
it does not.

Fix by flushing the tlb (and syncing the new root as well).

Signed-off-by: Avi Kivity <avi@redhat.com>

45bd07b9

KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 · 411c588d

Committed by Avi Kivity on Jun 06, 2011

When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them).  Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.

Adjust for this by also setting NX on the spte in these circumstances.

Signed-off-by: Avi Kivity <avi@redhat.com>
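
The adjustment can be illustrated with a small model of shadow access bits. This is a sketch under simplifying assumptions: the ACC_*_MODEL flags and shadow_access() are hypothetical, and "clearing exec" here stands in for setting NX on the spte.

```c
#include <stdbool.h>

/* Hypothetical access flags for a shadow mapping. */
#define ACC_EXEC_MODEL  1u
#define ACC_WRITE_MODEL 2u
#define ACC_USER_MODEL  4u

/*
 * Compute the access bits for the shadow mapping of a guest page.
 * With CR0.WP=0 the guest kernel may write to read-only user pages, so the
 * shadow maps such a page as a writable kernel page. That conversion hides
 * the "user" property that SMEP relies on, so execution is revoked as well
 * whenever CR4.SMEP is set.
 */
static unsigned shadow_access(unsigned guest_access, bool cr0_wp,
                              bool cr4_smep, bool kernel_write_fault)
{
    unsigned acc = guest_access;
    bool user_readonly = (acc & ACC_USER_MODEL) && !(acc & ACC_WRITE_MODEL);

    if (!cr0_wp && kernel_write_fault && user_readonly) {
        acc |= ACC_WRITE_MODEL;     /* let the kernel write */
        acc &= ~ACC_USER_MODEL;     /* now mapped as a kernel page */
        if (cr4_smep)
            acc &= ~ACC_EXEC_MODEL; /* equivalent of setting NX */
    }
    return acc;
}
```

Without the last step, a supervisor-mode fetch from the converted page would succeed in the shadow even though the guest's own page tables mark the page as user-accessible.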

411c588d

KVM: Enable ERMS feature support for KVM · a01c8f9b

Committed by Yang, Wei on Jun 14, 2011

This patch exposes the ERMS feature to KVM guests.

With ERMS, the REP MOVSB/STOSB fast-string operations attempt to
move as much of the data as possible using larger load/stores.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a01c8f9b

KVM: Expose RDWRGSFS bit to KVM guests · 176f61da

Committed by Yang, Wei on Jun 14, 2011

This patch exposes the RDWRGSFS bit to KVM guests.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

176f61da

KVM: Add RDWRGSFS support when setting CR4 · 74dc2b4f

Committed by Yang, Wei on Jun 14, 2011

This patch adds RDWRGSFS support when setting CR4.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

74dc2b4f

KVM: Remove RDWRGSFS bit from CR4_RESERVED_BITS · d9c3476d

Committed by Yang, Wei on Jun 14, 2011

This patch removes the RDWRGSFS bit from CR4_RESERVED_BITS.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d9c3476d

KVM: Enable DRNG feature support for KVM · 4a00efdf

Committed by Yang, Wei Y on Jun 13, 2011

This patch exposes the DRNG feature to KVM guests.

The RDRAND instruction can provide software with sequences of
random numbers generated from white noise.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4a00efdf

KVM: fix XSAVE bit scanning (now properly) · 02668b06

Committed by Andre Przywara on Jun 10, 2011

commit 123108f1c1aafd51d6a5c79cc04d7999dd88a930 tried to fix KVM's
XSAVE valid feature scanning, but it was wrong. It was not considering
the sparse nature of this bitfield, instead reading values from
uninitialized members of the entries array.
This patch now separates subleaf indices from KVM's array indices
and fills the entry before querying its value.
This fixes AVX support in KVM guests.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

02668b06

KVM: Add instruction fetch checking when walking guest page table · e57d4a35

Committed by Yang, Wei Y on Jun 03, 2011

This patch adds instruction fetch checking when walking the guest page table,
to implement SMEP when emulating instead of executing natively.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
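
When the processor runs guest code natively it enforces SMEP itself, but a software page-table walk has to reproduce the check. The sketch below models only the architectural rule for instruction fetches; the struct and function names are hypothetical and the real walker tracks considerably more state.

```c
#include <stdbool.h>

/* Hypothetical summary of one translation and the access being checked. */
struct fetch_check_model {
    bool page_user;  /* translation is user-accessible at every level */
    bool page_nx;    /* some level of the translation has NX set */
    bool efer_nx;    /* NX is enabled (EFER.NXE) */
    bool cr4_smep;   /* CR4.SMEP is set */
    bool supervisor; /* the fetch is made in supervisor mode */
};

/* Returns true if an instruction fetch through this translation must fault. */
static bool fetch_faults(const struct fetch_check_model *a)
{
    /* Classic no-execute protection. */
    if (a->efer_nx && a->page_nx)
        return true;

    /* SMEP: supervisor code may not execute from user-accessible pages. */
    if (a->cr4_smep && a->supervisor && a->page_user)
        return true;

    return false;
}
```

User-mode fetches from user pages are unaffected by SMEP, which is why the supervisor flag is part of the condition.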

e57d4a35

KVM: Mask function7 ebx against host capability word9 · 611c120f

Committed by Yang, Wei Y on Jun 03, 2011

This patch masks CPUID leaf 7 ebx against host capability word9.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

611c120f

KVM: Add SMEP support when setting CR4 · c68b734f

Committed by Yang, Wei Y on Jun 03, 2011

This patch adds SMEP handling when setting CR4.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c68b734f

KVM: Remove SMEP bit from CR4_RESERVED_BITS · 8d9c975f

Committed by Yang, Wei Y on Jun 03, 2011

This patch removes the SMEP bit from CR4_RESERVED_BITS.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

8d9c975f

KVM: nVMX: Fix bug preventing more than two levels of nesting · 509c75ea

Committed by Nadav Har'El on Jun 02, 2011

The nested VMX feature is supposed to fully emulate VMX for the guest. This
(theoretically) not only allows it to run its own guests, but also
to further emulate VMX for its own guests, and allow arbitrarily deep nesting.

This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH
by L2, which prevented deeper nesting.

Deeper nesting now works (I only actually tested L3), but is currently
*absurdly* slow, to the point of being unusable.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

509c75ea

KVM: x86 emulator: fold decode_cache into x86_emulate_ctxt · 9dac77fa

Committed by Avi Kivity on Jun 01, 2011

This saves a lot of pointless casts between x86_emulate_ctxt and decode_cache.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9dac77fa

KVM: x86 emulator: rename decode_cache::eip to _eip · 36dd9bb5

Committed by Avi Kivity on Jun 01, 2011

The name eip conflicts with a field of the same name in x86_emulate_ctxt,
which we plan to fold decode_cache into.

The name _eip is unfortunate, but what's really needed is a refactoring
here, not a better name.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

36dd9bb5

KVM: VMX: Silence warning on 32-bit hosts · 2e4ce7f5

Committed by Jan Kiszka on Jun 01, 2011

The variable 'a' is now unused on CONFIG_X86_32.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2e4ce7f5

KVM: x86 emulator: Use opcode::execute for CLI/STI(FA/FB) · f411e6cd

Committed by Takuya Yoshikawa on May 29, 2011

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f411e6cd

KVM: x86 emulator: Use opcode::execute for LOOP/JCXZ · d06e03ad

Committed by Takuya Yoshikawa on May 29, 2011

  LOOP/LOOPcc      : E0-E2
  JCXZ/JECXZ/JRCXZ : E3

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d06e03ad

KVM: x86 emulator: Clean up INT n/INTO/INT 3(CC/CD/CE) · 5c5df76b

Committed by Takuya Yoshikawa on May 29, 2011

Call emulate_int() directly to avoid spaghetti gotos.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5c5df76b

KVM: x86 emulator: Use opcode::execute for MOV(8C/8E) · 1bd5f469

Committed by Takuya Yoshikawa on May 29, 2011

Use different functions for the forms that take segment register operands.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1bd5f469

KVM: x86 emulator: Use opcode::execute for RET(C3) · ebda02c2

Committed by Takuya Yoshikawa on May 29, 2011

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ebda02c2

KVM: x86 emulator: Use opcode::execute for XCHG(86/87) · e4f973ae

Committed by Takuya Yoshikawa on May 29, 2011

In addition, replace one "goto xchg" with an em_xchg() call.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e4f973ae

KVM: x86 emulator: Use opcode::execute for TEST(84/85, A8/A9) · 9f21ca59

Committed by Takuya Yoshikawa on May 29, 2011

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9f21ca59

KVM: x86 emulator: Use opcode::execute for some instructions · db5b0762

Committed by Takuya Yoshikawa on May 29, 2011

Move the following functions to the opcode tables:

  RET (Far return) : CB
  IRET             : CF
  JMP (Jump far)   : EA

  SYSCALL          : 0F 05
  CLTS             : 0F 06
  SYSENTER         : 0F 34
  SYSEXIT          : 0F 35

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
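
The general shape of table-driven dispatch can be sketched as below, assuming a much reduced model: each table slot carries an execute callback, and the emulator calls it instead of branching through a large switch. The types, handler bodies and the emulate_one() helper are hypothetical; only the opcode bytes CB, CF and EA come from the list above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulation context; records which handler ran. */
struct emu_ctxt_model {
    int last_handler;
};

/* Hypothetical opcode table entry with an execute callback. */
struct opcode_model {
    int (*execute)(struct emu_ctxt_model *ctxt);
};

/* Placeholder handlers: a real emulator would perform the instruction. */
static int em_ret_far_model(struct emu_ctxt_model *ctxt)
{
    ctxt->last_handler = 0xCB;
    return 0;
}

static int em_iret_model(struct emu_ctxt_model *ctxt)
{
    ctxt->last_handler = 0xCF;
    return 0;
}

static int em_jmp_far_model(struct emu_ctxt_model *ctxt)
{
    ctxt->last_handler = 0xEA;
    return 0;
}

/* One-byte opcode table; slots without a handler stay NULL. */
static const struct opcode_model opcode_table_model[256] = {
    [0xCB] = { .execute = em_ret_far_model },
    [0xCF] = { .execute = em_iret_model },
    [0xEA] = { .execute = em_jmp_far_model },
};

/* Dispatch through the table; -1 means "not table-driven in this sketch". */
static int emulate_one(struct emu_ctxt_model *ctxt, uint8_t opcode)
{
    const struct opcode_model *op = &opcode_table_model[opcode];

    if (op->execute)
        return op->execute(ctxt);
    return -1;
}
```

Moving an instruction into the table then amounts to writing one handler and filling one slot, which is why the series can convert instructions a few at a time.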

db5b0762

KVM: x86 emulator: Rename emulate_xxx() to em_xxx() · e01991e7

Committed by Takuya Yoshikawa on May 29, 2011

The next patch will change these to be called by opcode::execute.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e01991e7

KVM: x86 emulator: Use the pointers ctxt and c consistently · 9d74191a

Committed by Takuya Yoshikawa on May 29, 2011

We should use the local variables ctxt and c when the emulate_ctxt and
decode appear many times.  At least, we need to be consistent about
how we use these in a function.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9d74191a

KVM: nVMX: Miscellenous small corrections · 2844d849