Commits · 18863bdd60f895f3b3ba16b15e8331aee781e8ec · openanolis / cloud-kernel

03 Dec, 2009 24 commits

KVM: x86 shared msr infrastructure · 18863bdd

Committed by Avi Kivity on Sep 07, 2009

The various syscall-related MSRs are fairly expensive to switch.  Currently
we switch them on every vcpu preemption, which is far too often:

- if we're switching to a kernel thread (idle task, threaded interrupt,
  kernel-mode virtio server (vhost-net), for example) and back, then
  there's no need to switch those MSRs since kernel threads won't
  be exiting to userspace.

- if we're switching to another guest running an identical OS, most likely
  those MSRs will have the same value, so there's little point in reloading
  them.

- if we're running the same OS on the guest and host, the MSRs will have
  identical values and reloading is unnecessary.

This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.
Signed-off-by: Avi Kivity <avi@redhat.com>

18863bdd
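
The value check described in this commit can be illustrated with a minimal userspace sketch. Everything here is hypothetical: `struct shared_msrs`, `fake_wrmsr()` and `set_shared_msr()` are illustration names, the slot count is arbitrary, and the expensive MSR write is simulated by a counter rather than a real `wrmsr`. The user return notifier part of the patch is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SHARED_MSRS 4            /* hypothetical slot count */

/* Cache of what the (simulated) hardware MSRs currently hold on one CPU. */
struct shared_msrs {
    uint64_t cur[NR_SHARED_MSRS];
    unsigned long hw_writes;        /* counts simulated wrmsr operations */
};

/* Stand-in for the expensive MSR write. */
static void fake_wrmsr(struct shared_msrs *s, size_t slot, uint64_t value)
{
    s->cur[slot] = value;
    s->hw_writes++;
}

/* Load a value into a slot, skipping the write when nothing would change. */
static void set_shared_msr(struct shared_msrs *s, size_t slot, uint64_t value)
{
    if (slot >= NR_SHARED_MSRS || s->cur[slot] == value)
        return;
    fake_wrmsr(s, slot, value);
}
```

Two guests running an identical OS would request the same values, so under this model the second request costs only a comparison.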

KVM: allow userspace to adjust kvmclock offset · afbcf7ab

Committed by Glauber Costa on Oct 16, 2009

When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

The situation is much worse when we do the opposite and migrate to a host
with a smaller monotonic clock.

This proposed ioctl allows userspace to tell us the monotonic clock value
of the source host, so that we can keep the time skew small and, more
importantly, ensure that time never goes backwards. Userspace may also need
to query the current value, since from the first migration onwards it won't
be reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

afbcf7ab
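
A rough model of the idea, assuming the guest clock is simply the host monotonic clock plus a per-VM offset in nanoseconds. The names `struct guest_clock`, `guest_clock_read()` and `guest_clock_set()` are hypothetical and are not the kernel's ioctl interface; host time is passed in as a parameter so the sketch stays deterministic.

```c
#include <stdint.h>

/* Hypothetical model: guest time = host monotonic time + offset (ns). */
struct guest_clock {
    int64_t offset_ns;
};

/* What the guest would see at a given host monotonic time. */
static uint64_t guest_clock_read(const struct guest_clock *gc,
                                 uint64_t host_mono_ns)
{
    return (uint64_t)((int64_t)host_mono_ns + gc->offset_ns);
}

/* On the destination host: adopt the clock value saved on the source. */
static void guest_clock_set(struct guest_clock *gc,
                            uint64_t source_clock_ns,
                            uint64_t host_mono_ns)
{
    gc->offset_ns = (int64_t)source_clock_ns - (int64_t)host_mono_ns;
}
```

With the offset recomputed on arrival, the guest's view continues from the saved value whether the destination's monotonic clock is larger or smaller than the source's.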

KVM: x86: Fix guest single-stepping while interruptible · 94fe45da

Committed by Jan Kiszka on Oct 18, 2009

Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.

Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

94fe45da

KVM: Xen PV-on-HVM guest support · ffde22ac

Committed by Ed Swierk on Oct 15, 2009

Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ffde22ac

KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check · 94c30d9c

Committed by Jan Kiszka on Oct 12, 2009

This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

94c30d9c

KVM: x86: Harden against cpufreq · 6b7d7e76

Committed by Zachary Amsden on Oct 09, 2009

If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fall back to the measured TSC khz.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

6b7d7e76

KVM: SVM: Add tracepoint for skinit instruction · 532a46b9

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extension not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

532a46b9

KVM: SVM: Add tracepoint for invlpga instruction · ec1ff790

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ec1ff790

KVM: SVM: Add tracepoint for #vmexit because intr pending · 236649de

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

236649de

KVM: SVM: Add tracepoint for injected #vmexit · 17897f36

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17897f36

KVM: SVM: Add tracepoint for nested #vmexit · d8cabddf

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for every #vmexit we get from a
nested guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d8cabddf

KVM: SVM: Add tracepoint for nested vmrun · 0ac406de

Committed by Joerg Roedel on Oct 09, 2009

This patch adds a dedicated kvm tracepoint for a nested
vmrun.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0ac406de

KVM: x86: include pvclock MSRs in msrs_to_save · e3267cbb

Committed by Glauber Costa on Oct 06, 2009

For a while now, we have been issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves them to the beginning of the list and skips testing them.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e3267cbb

KVM: x86: Rework guest single-step flag injection and filtering · 91586a3b

Committed by Jan Kiszka on Oct 05, 2009

Push TF and RF injection and filtering on guest single-stepping into the
vendor get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

91586a3b

KVM: x86: disable paravirt mmu reporting · a68a6a72

Committed by Marcelo Tosatti on Oct 01, 2009

Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a68a6a72

KVM: x86: Refactor guest debug IOCTL handling · 355be0b9

Committed by Jan Kiszka on Oct 03, 2009

Much of the so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part wrt processing KVM_GUESTDBG_ENABLE.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

355be0b9

KVM: remove pre_task_link setting in save_state_to_tss16 · 201d945b

Committed by Juan Quintela on Sep 30, 2009

Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37
  Author: Gleb Natapov <gleb@redhat.com>
  Date:   Mon Mar 30 16:03:24 2009 +0300

    KVM: Fix task switch back link handling.

CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

201d945b

KVM: Kill the confusing tsc_ref_khz and ref_freq variables · 0cca7907

Committed by Zachary Amsden on Sep 29, 2009

They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error-prone measurement
is required.

On such machines, when a new CPU is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0cca7907

KVM: Separate timer initialization into an independent function · b820cc0c

Committed by Zachary Amsden on Sep 29, 2009

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b820cc0c

KVM: Activate Virtualization On Demand · 10474ae8

Committed by Alexander Graf on Sep 15, 2009

X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on-demand
activation of virtualization. This means that virtualization is instead
enabled on creation of the first virtual machine and disabled on
destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

10474ae8
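
The first-VM / last-VM behaviour described above amounts to reference counting. A minimal single-threaded sketch follows; the names `usage_count`, `virt_enabled`, `vm_create()` and `vm_destroy()` are hypothetical, hardware enabling is modeled by a boolean, and a real implementation would need locking and per-CPU handling, both omitted here.

```c
#include <stdbool.h>

static int  usage_count;    /* number of live VMs (hypothetical name) */
static bool virt_enabled;   /* models whether the extensions are switched on */

static void hardware_enable(void)  { virt_enabled = true; }
static void hardware_disable(void) { virt_enabled = false; }

/* Creating the first VM switches virtualization on. */
static void vm_create(void)
{
    if (usage_count++ == 0)
        hardware_enable();
}

/* Destroying the last VM switches it off again. */
static void vm_destroy(void)
{
    if (usage_count > 0 && --usage_count == 0)
        hardware_disable();
}
```

While no VM exists, the extensions stay disabled in this model, which is what leaves them free for another hypervisor even when the module is loaded.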

KVM: Return -ENOTTY on unrecognized ioctls · 367e1319

Committed by Avi Kivity on Aug 26, 2009

Not the incorrect -EINVAL.
Signed-off-by: Avi Kivity <avi@redhat.com>

367e1319

KVM: Drop kvm->irq_lock lock from irq injection path · 680b3648

Committed by Gleb Natapov on Aug 24, 2009

The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now, with kvm->irq_lock in place, access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

680b3648

KVM: Move IO APIC to its own lock · eba0226b

Committed by Gleb Natapov on Aug 24, 2009

This allows removal of irq_lock from the injection path.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

eba0226b

KVM: Don't pass kvm_run arguments · 851ba692

Committed by Avi Kivity on Aug 24, 2009

They're just copies of vcpu->run, which is readily accessible.
Signed-off-by: Avi Kivity <avi@redhat.com>

851ba692

04 Nov, 2009 2 commits

KVM: get_tss_base_addr() should return a gpa_t · abb39119

Committed by Gleb Natapov on Oct 25, 2009

If the TSS we are switching to resides in high memory, the task switch will
fail since the address will be truncated. Windows2k3 does this sometimes when
running with more than 4G.

Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

abb39119
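
The truncation can be reproduced in isolation. In this sketch `gpa_t` is a 64-bit type, as in KVM, while the two helper functions are hypothetical stand-ins that only contrast a 32-bit return type with a full-width one; they are not the kernel's `get_tss_base_addr()`.

```c
#include <stdint.h>

typedef uint64_t gpa_t;   /* guest physical address, 64 bits wide */

/* Buggy shape: a 32-bit return type silently drops the upper bits. */
static uint32_t tss_base_truncated(gpa_t base)
{
    return (uint32_t)base;
}

/* Fixed shape: return the full guest physical address. */
static gpa_t tss_base_full(gpa_t base)
{
    return base;
}
```

For a TSS placed just above the 4G boundary, the truncated variant yields an unrelated low address, so the task switch would read the wrong memory.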

KVM: x86: Catch potential overrun in MCE setup · a9e38c3e

Committed by Jan Kiszka on Oct 23, 2009

We only allocate memory for 32 MCE banks (KVM_MAX_MCE_BANKS) but we
allow user space to fill up to 255 on setup (mcg_cap & 0xff), corrupting
kernel memory. Catch these overflows.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

a9e38c3e
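
A sketch of the kind of check this implies, using only the numbers given in the message (32 banks allocated, bank count taken from `mcg_cap & 0xff`). The function name `mce_setup_check()`, the exact comparison, the rejection of a zero count, and the `-EINVAL` return value are assumptions made for illustration; the actual patch may differ on each of these.

```c
#include <errno.h>
#include <stdint.h>

#define KVM_MAX_MCE_BANKS 32   /* banks for which memory is allocated */

/* Validate the bank count requested by user space before using it. */
static int mce_setup_check(uint64_t mcg_cap)
{
    unsigned int bank_num = mcg_cap & 0xff;   /* low byte holds the count */

    if (bank_num == 0 || bank_num > KVM_MAX_MCE_BANKS)
        return -EINVAL;                        /* would overrun the array */
    return 0;
}
```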

04 Oct, 2009 1 commit

KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID · 6a544355

Committed by Avi Kivity on Oct 04, 2009

The number of entries is multiplied by the entry size, which can
overflow on 32-bit hosts.  Bound the entry count instead.
Reported-by: David Wagner <daw@cs.berkeley.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

6a544355
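
The wraparound is easy to demonstrate with 32-bit arithmetic. In this sketch the entry size (40 bytes) and the bound (40 entries) are assumed values chosen for illustration, and both function names are hypothetical; a 32-bit host is modeled by doing the multiplication in `uint32_t`.

```c
#include <stdint.h>

#define ENTRY_SIZE  40u   /* assumed size of one entry, in bytes */
#define MAX_ENTRIES 40u   /* assumed upper bound on the entry count */

/* Unchecked: the product wraps modulo 2^32 for large counts. */
static uint32_t alloc_size_unchecked(uint32_t nent)
{
    return nent * ENTRY_SIZE;
}

/* Bounded: clamp the count first, so the product cannot wrap. */
static uint32_t alloc_size_bounded(uint32_t nent)
{
    if (nent > MAX_ENTRIES)
        nent = MAX_ENTRIES;
    return nent * ENTRY_SIZE;
}
```

A count of 107374183 makes the unchecked product wrap to 24 bytes, far less than the entries would need, while the bounded variant never exceeds 1600 bytes.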

10 Sep, 2009 13 commits

KVM: VMX: Check cpl before emulating debug register access · 0a79b009

Committed by Avi Kivity on Sep 01, 2009

Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
try to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0a79b009

KVM: x86: drop duplicate kvm_flush_remote_tlb calls · e3904e6e

Committed by Marcelo Tosatti on Sep 08, 2009

kvm_mmu_slot_remove_write_access already calls it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e3904e6e

KVM: Use thread debug register storage instead of kvm specific data · 3d53c27d

Committed by Avi Kivity on Sep 01, 2009

Instead of saving the debug registers from the processor to a kvm data
structure, rely on the debug registers stored in the thread structure.
This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.
Signed-off-by: Avi Kivity <avi@redhat.com>

3d53c27d

KVM: Protect update_cr8_intercept() when running without an apic · 88c808fd

Committed by Avi Kivity on Aug 17, 2009

update_cr8_intercept() can be triggered from userspace while there
is no apic present.
Signed-off-by: Avi Kivity <avi@redhat.com>

88c808fd

KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors · d9048d32

Committed by Mikhail Ershov on Aug 19, 2009

Segment descriptor tables can be placed on two non-contiguous pages.
This patch reads segment descriptors by linear address.
Signed-off-by: Mikhail Ershov <Mike.Ershov@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d9048d32

KVM: Rename x86_emulate.c to emulate.c · 56e82318

Committed by Avi Kivity on Aug 12, 2009

We're in arch/x86, what could we possibly be emulating?
Signed-off-by: Avi Kivity <avi@redhat.com>

56e82318

KVM: When switching to a vm8086 task, load segments as 16-bit · c0c7c04b

Committed by Anthony Liguori on Aug 11, 2009

According to 16.2.5 in the SDM, eflags.vm in the tss is consulted before loading
any new segments. If eflags.vm == 1, then the segments are treated as 16-bit
segments. The LDTR and TR are not normally available in vm86 mode so if they
happen to somehow get loaded, they need to be treated as 32-bit segments.

This fixes an invalid vmentry failure in a custom OS that was happening after
a task switch into vm8086 mode. Since the segments were being mistakenly
treated as 32-bit, we loaded garbage state.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

c0c7c04b

KVM: Update cr8 intercept when APIC TPR is changed by userspace · cb142eb7

Committed by Gleb Natapov on Aug 09, 2009

Since on vcpu entry we do it only if the apic is enabled, we should also do
it when the TPR is changed while the apic is disabled. This happens when Windows
resets HW without setting TPR to zero.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

cb142eb7

KVM: ignore reads to perfctr msrs · 1f3ee616

Committed by Amit Shah on Jun 30, 2009

We ignore writes to the perfctr msrs. Ignore reads as well.

Kaspersky antivirus crashes Windows guests if it can't read
these MSRs.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1f3ee616

KVM: x86: Disallow hypercalls for guest callers in rings > 0 · 07708c4a

Committed by Jan Kiszka on Aug 03, 2009

So far unprivileged guest callers running in ring 3 can issue, e.g., MMU
hypercalls. Normally, such callers cannot provide any hand-crafted MMU
command structure as it has to be passed by its physical address, but
they can still crash the guest kernel by passing random addresses.

To close the hole, this patch considers hypercalls valid only if issued
from guest ring 0. This may still be relaxed on a per-hypercall basis in
the future once required.

Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

07708c4a
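
The privilege check described above could look roughly like the following. This is a sketch under stated assumptions: `struct vcpu_model` and `handle_hypercall()` are hypothetical, the current privilege level is stored as a plain field rather than read from vendor callbacks, and returning `-EPERM` is an assumed choice of error code.

```c
#include <errno.h>

/* Hypothetical stand-in for the vcpu state relevant to this check. */
struct vcpu_model {
    int cpl;   /* current privilege level (ring) of the guest caller */
};

/* Reject hypercalls from any ring other than 0 before dispatching. */
static long handle_hypercall(const struct vcpu_model *vcpu, unsigned long nr)
{
    if (vcpu->cpl != 0)
        return -EPERM;   /* unprivileged guest code may not issue hypercalls */

    (void)nr;            /* dispatch on the hypercall number omitted here */
    return 0;
}
```

Placing the check ahead of the dispatch keeps it global; relaxing it for an individual hypercall later would mean moving the test into that hypercall's handler.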

KVM: report 1GB page support to userspace · 344f414f

Committed by Joerg Roedel on Jul 27, 2009

If userspace knows that the kernel part supports 1GB pages it can enable
the corresponding cpuid bit so that guests actually use GB pages.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

344f414f

KVM: Align cr8 threshold when userspace changes cr8 · 5f0269f5

Committed by Mikhail Ershov on Aug 03, 2009

Commit f0a3602c20 ("KVM: Move interrupt injection logic to x86.c") does not
update the cr8 intercept if the lapic is disabled, so when userspace updates
cr8, the cr8 threshold control is not updated and we are left with illegal
control fields.

Fix by explicitly resetting the cr8 threshold.
Signed-off-by: Avi Kivity <avi@redhat.com>

5f0269f5

KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl · b927a3ce

Committed by Sheng Yang on Jul 21, 2009

KVM now allows the guest to modify the guest physical address of the EPT identity mapping page.

(Changes from v1: discard an unnecessary check; change the ioctl to accept the
parameter's address rather than its value.)
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b927a3ce

openanolis / cloud-kernel: synced successfully about 1 year ago
