Commits · 838815a78785022f6611e5c48386567aea7b818b · openanolis / cloud-kernel

19 May, 2010 25 commits

x86: KVM guest: Try using new kvm clock msrs · 838815a7

Authored by Glauber Costa on May 11, 2010

We have added a new set of clock-related MSRs to replace the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existent MSR, we raise a lethal #GP, since our
IDT handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of MSRs is supported.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
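
The selection described in the commit above can be sketched in plain C as follows. This is an illustrative, user-space sketch and not the kernel code: the helper `kvmclock_select_msrs` and the struct `kvmclock_msrs` are hypothetical names, and the feature bit numbers and MSR indices are the values I believe the kernel's kvm_para.h defines, so treat them as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, believed to match the kernel's kvm_para.h. */
#define KVM_FEATURE_CLOCKSOURCE   0
#define KVM_FEATURE_CLOCKSOURCE2  3

#define MSR_KVM_WALL_CLOCK        0x11
#define MSR_KVM_SYSTEM_TIME       0x12
#define MSR_KVM_WALL_CLOCK_NEW    0x4b564d00
#define MSR_KVM_SYSTEM_TIME_NEW   0x4b564d01

/* Hypothetical container for the pair of MSRs the guest will use. */
struct kvmclock_msrs {
    uint32_t wall_clock;
    uint32_t system_time;
};

/*
 * Pick the MSR set from the feature bitmap the host exports via cpuid,
 * so the guest never has to probe an MSR that may not exist.
 * Returns false when the host advertises no kvmclock at all.
 */
static bool kvmclock_select_msrs(uint32_t features, struct kvmclock_msrs *out)
{
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE2)) {
        out->wall_clock = MSR_KVM_WALL_CLOCK_NEW;
        out->system_time = MSR_KVM_SYSTEM_TIME_NEW;
        return true;
    }
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE)) {
        out->wall_clock = MSR_KVM_WALL_CLOCK;
        out->system_time = MSR_KVM_SYSTEM_TIME;
        return true;
    }
    return false;
}
```

Preferring the new feature bit when both are set keeps older hosts working while letting newer hosts move guests off the deprecated MSRs.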

KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID · 84478c82

Authored by Glauber Costa on May 11, 2010

Until now, we have used individual KVM_CAP entries to tell userspace
which cpuid features we support. This is suboptimal, since it
delays a feature that has arrived in the host from becoming
available to the guest: whenever we add a new cpuid bit, we have to
add a capability again, which creates complexity and delay in feature
adoption.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: add new KVMCLOCK cpuid feature · 0e6ac58a

Authored by Glauber Costa on May 11, 2010

This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest
that kvmclock is available through a new set of MSRs. The old ones
are deprecated.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: change msr numbers for kvmclock · 11c6bffa

Authored by Glauber Costa on May 11, 2010

Avi pointed out a while ago that those MSRs fall into the Pentium
PMU range. So the idea here is to add new ones and, after a while,
deprecate the old ones.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

x86, paravirt: Add a global synchronization point for pvclock · 489fb490

Authored by Glauber Costa on May 11, 2010

In recent stress tests, it was found that pvclock-based systems
could seriously warp on SMP systems. Using Ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5 million warps a minute on some systems.
(To be fair, it wasn't that bad on most of them.) Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in their tsc flags,
especially on multi-socket ones.

Two reads of the same kernel timestamp at approximately the same time will likely
have the tsc timestamped on different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller on a cpu
that is legitimately reading the clock at a later occasion.

Some adjustments on the host could make this window less likely to happen,
but still, it is pretty much an intrinsic problem of the mechanism.

A while ago, I thought about using a shared variable anyway, to hold the clock's
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detect that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new reading. This does cause contention on big
boxes (but big here means *BIG*), but it seems like a good trade-off, since
it provides us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, on any of the machines where I saw them before.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
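
The lock-free scheme argued for in the commit above can be sketched with portable C11 atomics. This is a minimal sketch, not the kernel implementation: the names `last_value` and `pvclock_monotonic` are chosen here for illustration, and the kernel uses its own atomic64 primitives rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Globally shared "last clock value handed out", initially zero. */
static _Atomic uint64_t last_value;

/*
 * Take a per-cpu clock reading and return a value that never goes
 * backwards relative to what any other cpu has already returned.
 */
static uint64_t pvclock_monotonic(uint64_t reading)
{
    uint64_t last = atomic_load(&last_value);

    do {
        /*
         * Someone already published a later time. The global can only
         * have grown, so returning it is safe even if it has moved on
         * again since we loaded it.
         */
        if (reading < last)
            return last;
        /*
         * Otherwise try to publish our reading. On failure the
         * compare-exchange reloads 'last' with the current global
         * value and we re-check.
         */
    } while (!atomic_compare_exchange_weak(&last_value, &last, reading));

    return reading;
}
```

A caller would pass the locally computed pvclock value through this function before returning it. The loop repeats only when another cpu changed the global between the load and the exchange (or the weak exchange fails spuriously).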

x86, paravirt: Enable pvclock flags in vcpu_time_info structure · 424c32f1

Authored by Glauber Costa on May 11, 2010

This patch removes one padding byte and transforms it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
is usually determined via hypervisor negotiation.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
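
The opt-in behaviour described in the commit above amounts to masking the raw flags byte with a guest-chosen set. A minimal sketch, assuming hypothetical names throughout (`SKETCH_FLAG_A`, `SKETCH_FLAG_B`, `sketch_set_valid_flags`, `sketch_effective_flags`); none of them are taken from the kernel source:

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_A (1u << 0)
#define SKETCH_FLAG_B (1u << 1)

/* Flags the guest has agreed to honor; none until it opts in. */
static uint8_t valid_flags;

/* Called once the guest has negotiated a feature with the hypervisor. */
static void sketch_set_valid_flags(uint8_t flags)
{
    valid_flags = flags;
}

/*
 * Called on each clock read: flags the guest did not opt into are
 * ignored even if the hypervisor sets them in the shared structure.
 */
static uint8_t sketch_effective_flags(uint8_t raw_flags)
{
    return raw_flags & valid_flags;
}
```

Masking on every read means a host that sets bits the guest does not understand cannot change guest behaviour.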

KVM: x86: Inject #GP with the right rip on efer writes · b69e8cae

Authored by Roedel, Joerg on May 06, 2010

This patch fixes a bug in the KVM efer-msr write path. If a
guest writes to a reserved efer bit, the set_efer function
injects the #GP directly. The architecture-dependent wrmsr
function does not see this, assumes success and advances the
rip. This results in a #GP in the guest with the wrong rip.
This patch fixes this by reporting efer write errors back to
the architectural wrmsr function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
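
The control flow of the fix above can be sketched as follows. This is a simplified model, not the KVM code: `struct sketch_vcpu`, `sketch_set_efer` and `sketch_handle_wrmsr_efer` are hypothetical, and the reserved-bit mask is an assumption that treats only SCE, LME, LMA and NX as writable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bits treated as valid in this sketch. */
#define EFER_SCE (1ull << 0)
#define EFER_LME (1ull << 8)
#define EFER_LMA (1ull << 10)
#define EFER_NX  (1ull << 11)

/* Assumption: everything else counts as reserved here. */
#define EFER_RESERVED_BITS (~(EFER_SCE | EFER_LME | EFER_LMA | EFER_NX))

/* Hypothetical, heavily reduced vcpu state. */
struct sketch_vcpu {
    uint64_t efer;
    uint64_t rip;
    bool gp_pending;
};

/* Reports failure to the caller instead of injecting #GP itself. */
static int sketch_set_efer(struct sketch_vcpu *vcpu, uint64_t efer)
{
    if (efer & EFER_RESERVED_BITS)
        return 1;
    vcpu->efer = efer;
    return 0;
}

/* The wrmsr path decides between injecting #GP and advancing rip. */
static void sketch_handle_wrmsr_efer(struct sketch_vcpu *vcpu, uint64_t data,
                                     unsigned int insn_len)
{
    if (sketch_set_efer(vcpu, data)) {
        /* rip still points at the faulting wrmsr instruction. */
        vcpu->gp_pending = true;
        return;
    }
    vcpu->rip += insn_len;
}
```

Keeping the injection decision in one place ensures the guest sees the #GP at the address of the wrmsr rather than at the following instruction.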

KVM: SVM: Don't allow nested guest to VMMCALL into host · 0d945bd9

Authored by Joerg Roedel on May 05, 2010

This patch disables the possibility for an l2-guest to do a
VMMCALL directly into the host. This would happen if the
l1-hypervisor doesn't intercept VMMCALL and the l2-guest
executes this instruction.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Fix exception reinjection forced to true · 3f0fd292

Authored by Joerg Roedel on May 05, 2010

The recently merged patch that allows marking an exception
as reinjected has a bug: it always marks the exception as
reinjected. This breaks the nested-svm shadow-on-shadow
implementation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Fix wallclock version writing race · 9ed3c444

Authored by Avi Kivity on May 04, 2010

Wallclock writing uses an unprotected global variable to hold the version;
this can cause one guest to interfere with another if both write their
wallclock at the same time.
Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots · 8facbbff

Authored by Avi Kivity on May 04, 2010

On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.

Push the spinlock into mmu_alloc_roots(), and only take it after we've read
the pdptr.
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) · cafd6659

Authored by Shane Wang on April 29, 2010

Per the documentation for the feature control MSR:

Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
of VMXON in SMX operation causes a general-protection exception.
Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
of VMXON outside SMX operation causes a general-protection exception.

This patch enables this kind of check with SMX for VMXON in KVM.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
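
The two bits quoted in the commit above lend themselves to a small predicate. The following is a hedged sketch rather than the KVM function: `sketch_vmxon_disabled` and the macro names are hypothetical, and the lock bit (bit 0) is not mentioned in the commit message; it is included here on the assumption, taken from the Intel SDM, that the enable bits are only binding once the MSR is locked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the feature control MSR used in this sketch. */
#define FC_LOCKED             (1ull << 0) /* assumption: lock bit per Intel SDM */
#define FC_VMXON_INSIDE_SMX   (1ull << 1)
#define FC_VMXON_OUTSIDE_SMX  (1ull << 2)

/*
 * Returns true when firmware has locked the MSR with the enable bit
 * for the current operating mode left clear, in which case VMXON
 * would raise a general-protection exception.
 */
static bool sketch_vmxon_disabled(uint64_t feature_control, bool in_smx)
{
    if (!(feature_control & FC_LOCKED))
        return false; /* not locked: the enable bits can still be set */

    if (in_smx)
        return !(feature_control & FC_VMXON_INSIDE_SMX);

    return !(feature_control & FC_VMXON_OUTSIDE_SMX);
}
```

The `in_smx` parameter stands in for however the caller determines that the system was launched in SMX operation (for example via Intel TXT).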

KVM: x86: properly update ready_for_interrupt_injection · f1d86e46

Authored by Marcelo Tosatti on May 03, 2010

The recent changes to emulate string instructions without entering guest
mode exposed a bug where pending interrupts are not properly reflected
in ready_for_interrupt_injection.

The result is that userspace overwrites a previously queued interrupt
when irqchips are emulated in userspace.

Fix by always updating state before returning to userspace.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Atomically switch efer if EPT && !EFER.NX · 84ad33ef

Authored by Avi Kivity on April 28, 2010

When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page
tables. This causes accesses through ptes with bit 63 set to succeed instead
of failing a reserved bit check.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit · 61d2ef2c

Authored by Avi Kivity on April 28, 2010

Some guest msr values cannot be used on the host (for example, EFER.NX=0),
so we need to switch them atomically during guest entry or exit.

Add a facility to program the vmx msr autoload registers accordingly.
Signed-off-by: Avi Kivity <avi@redhat.com>
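
One way to picture such a facility is a pair of parallel tables, one loaded on guest entry and one on exit. The sketch below is an assumption-laden model, not the KVM code: `struct sketch_autoload`, `sketch_add_switch_msr`, `sketch_clear_switch_msr` and the capacity of 4 are hypothetical, and a real implementation would also have to program the table addresses and counts into the VMCS, which is omitted here. The entry layout (32-bit index, 32 reserved bits, 64-bit value) follows the Intel SDM as I understand it.

```c
#include <stdint.h>

#define SKETCH_NR_AUTOLOAD 4 /* hypothetical capacity */

/* One MSR load entry: index, reserved padding, value. */
struct sketch_msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t value;
};

/* Parallel tables: guest values for entry, host values for exit. */
struct sketch_autoload {
    unsigned int nr;
    struct sketch_msr_entry guest[SKETCH_NR_AUTOLOAD];
    struct sketch_msr_entry host[SKETCH_NR_AUTOLOAD];
};

/* Add an MSR to both tables, or update it if already present. */
static int sketch_add_switch_msr(struct sketch_autoload *m, uint32_t msr,
                                 uint64_t guest_val, uint64_t host_val)
{
    unsigned int i;

    for (i = 0; i < m->nr; i++)
        if (m->guest[i].index == msr)
            break;

    if (i == m->nr) {
        if (m->nr == SKETCH_NR_AUTOLOAD)
            return -1; /* tables are full */
        m->nr++;
    }

    m->guest[i] = (struct sketch_msr_entry){ msr, 0, guest_val };
    m->host[i] = (struct sketch_msr_entry){ msr, 0, host_val };
    return 0;
}

/* Remove an MSR by moving the last entry into its slot. */
static void sketch_clear_switch_msr(struct sketch_autoload *m, uint32_t msr)
{
    unsigned int i;

    for (i = 0; i < m->nr; i++)
        if (m->guest[i].index == msr)
            break;

    if (i == m->nr)
        return; /* not present */

    m->nr--;
    m->guest[i] = m->guest[m->nr];
    m->host[i] = m->host[m->nr];
}
```

Keeping the guest and host tables index-aligned lets a single count describe both, so adding or removing an MSR always updates entry and exit behaviour together.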

KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries · 5dfa3d17

Authored by Avi Kivity on April 28, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Add definition for msr autoload entry · 19b95dba

Authored by Avi Kivity on April 28, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Let vcpu structure alignment be determined at runtime · 0ee75bea

Authored by Avi Kivity on April 28, 2010

vmx and svm vcpus have different contents and therefore may have different
alignment requirements. Let each specify its required alignment.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: cleanup invlpg code · 884a0ff0

Authored by Xiao Guangrong on April 28, 2010

Use is_last_spte() to clean up the invlpg code.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: move unsync/sync tracpoints to proper place · 5e1b3ddb

Authored by Xiao Guangrong on April 28, 2010

Move the unsync/sync tracepoints to the proper place, which lets
us measure how long a page stays unsync.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: convert mmu tracepoints · 85f2067c

Authored by Xiao Guangrong on April 28, 2010

Convert the mmu tracepoints to use DECLARE_EVENT_CLASS.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix for calculating gpa in invlpg code · 22c9b2d1

Authored by Xiao Guangrong on April 28, 2010

If the guest is 32-bit, we should use 'quadrant' to adjust the gpa
offset.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: powerpc: use of kzalloc/kfree requires including slab.h · 329d20ba

Authored by Stephen Rothwell on April 27, 2010

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fix mmu shrinker error · d35b8dd9

Authored by Gui Jianfeng on April 27, 2010

kvm_mmu_remove_one_alloc_mmu_page() assumes that kvm_mmu_zap_page() reclaims
only one sp, but that's not the case. This causes the mmu shrinker to return
a wrong number. This patch fixes the counting error.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix hashing for TDP and non-paging modes · 5a7388c2

Authored by Eric Northup on April 26, 2010

For TDP mode, avoid creating multiple page table roots for the single
guest-to-host physical address map by fixing the inputs used for the
shadow page table hash in mmu_alloc_roots().
Signed-off-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17 May, 2010 15 commits

KVM: Minor MMU documentation edits · c4bd09b2

Authored by Avi Kivity on April 26, 2010

Reported by Andrew Jones.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Document KVM_GET_MP_STATE and KVM_SET_MP_STATE · b843f065

Authored by Avi Kivity on April 25, 2010

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix sp->unsync type error in trace event definition · df2fb6e7

Authored by Gui Jianfeng on April 22, 2010

sp->unsync is bool now, so update the trace event declaration.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Handle MCE intercepts always on host level · ff47a49b