Commits · d8cabddf7e8fbdced2dd668c98d7762c7ef75245 · openeuler / Kernel

03 Dec, 2009 14 commits

KVM: SVM: Add tracepoint for nested #vmexit · d8cabddf

Authored by Joerg Roedel on Oct 09, 2009

This patch adds a tracepoint for every #vmexit we get from a
nested guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Add tracepoint for nested vmrun · 0ac406de

Authored by Joerg Roedel on Oct 09, 2009

This patch adds a dedicated kvm tracepoint for a nested
vmrun.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: include pvclock MSRs in msrs_to_save · e3267cbb

Authored by Glauber Costa on Oct 06, 2009

For a while now, we have been issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves them to the beginning of the list, and skips testing them.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
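
The list handling described in the commit above can be sketched roughly as follows. This is only an illustration, not the kernel code: the names `SAVE_LIST_KVM_ENTRIES`, `filter_msr_list` and `host_has_msr_stub` are hypothetical, the MSR numbers are sample values, and the stub stands in for the real rdmsr probe on the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of KVM-specific entries kept at the front of the list. */
#define SAVE_LIST_KVM_ENTRIES 2

/* Stand-in for the hardware probe (the real code issues rdmsr on the host). */
typedef int (*msr_probe_fn)(uint32_t msr);

/* Sample stub: pretend the host only implements these two MSRs. */
static int host_has_msr_stub(uint32_t msr)
{
    return msr == 0x10 || msr == 0x174;
}

/*
 * Compact the list in place. The first SAVE_LIST_KVM_ENTRIES entries are
 * kept without probing; a later entry is kept only if the probe accepts it.
 * Returns the new length of the list.
 */
static size_t filter_msr_list(uint32_t *list, size_t len, msr_probe_fn probe)
{
    size_t i, j;

    if (len < SAVE_LIST_KVM_ENTRIES)
        return len;

    for (i = j = SAVE_LIST_KVM_ENTRIES; i < len; i++) {
        if (!probe(list[i]))
            continue;           /* host lacks this MSR: drop it */
        list[j++] = list[i];
    }
    return j;
}
```

Because the loop starts after the KVM-specific entries, those entries survive even though a hardware probe would reject them.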

KVM: x86: Rework guest single-step flag injection and filtering · 91586a3b

Authored by Jan Kiszka on Oct 05, 2009

Push TF and RF injection and filtering on guest single-stepping into the
vendor get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: disable paravirt mmu reporting · a68a6a72

Authored by Marcelo Tosatti on Oct 01, 2009

Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Refactor guest debug IOCTL handling · 355be0b9

Authored by Jan Kiszka on Oct 03, 2009

Much of the so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part wrt processing KVM_GUESTDBG_ENABLE.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove pre_task_link setting in save_state_to_tss16 · 201d945b

Authored by Juan Quintela on Sep 30, 2009

Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37
  Author: Gleb Natapov <gleb@redhat.com>
  Date:   Mon Mar 30 16:03:24 2009 +0300

    KVM: Fix task switch back link handling.

CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Kill the confusing tsc_ref_khz and ref_freq variables · 0cca7907

Authored by Zachary Amsden on Sep 29, 2009

They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.

On such machines, when a new CPU is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Separate timer intialization into an indepedent function · b820cc0c

Authored by Zachary Amsden on Sep 29, 2009

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Activate Virtualization On Demand · 10474ae8

Authored by Alexander Graf on Sep 15, 2009

X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means that virtualization
is instead enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
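
The first-VM/last-VM rule from the commit above amounts to a usage counter. The sketch below is a simplified illustration under stated assumptions: all names (`vm_usage_count`, `hw_enable`, `vm_created`, and so on) are hypothetical, and it leaves out the per-CPU enabling and the locking that real kernel code would need.

```c
#include <stdbool.h>

/* Hypothetical state: number of live VMs and whether virtualization is on. */
static int vm_usage_count;
static bool virt_enabled;

/* Stand-ins for turning the hardware virtualization extensions on and off. */
static void hw_enable(void)  { virt_enabled = true;  }
static void hw_disable(void) { virt_enabled = false; }

/* Called when a VM is created: only the first VM enables virtualization. */
static void vm_created(void)
{
    if (vm_usage_count++ == 0)
        hw_enable();
}

/* Called when a VM is destroyed: only the last VM disables virtualization. */
static void vm_destroyed(void)
{
    if (vm_usage_count > 0 && --vm_usage_count == 0)
        hw_disable();
}
```

With this shape, merely loading the module leaves `virt_enabled` false, which is what keeps other hypervisors usable until a VM actually exists.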

KVM: Return -ENOTTY on unrecognized ioctls · 367e1319

Authored by Avi Kivity on Aug 26, 2009

Not the incorrect -EINVAL.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Drop kvm->irq_lock lock from irq injection path · 680b3648

Authored by Gleb Natapov on Aug 24, 2009

The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now, with kvm->irq_lock in place, access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Move IO APIC to its own lock · eba0226b

Authored by Gleb Natapov on Aug 24, 2009

This allows removal of irq_lock from the injection path.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Don't pass kvm_run arguments · 851ba692

Authored by Avi Kivity on Aug 24, 2009

They're just copies of vcpu->run, which is readily accessible.

Signed-off-by: Avi Kivity <avi@redhat.com>

04 Nov, 2009 2 commits

KVM: get_tss_base_addr() should return a gpa_t · abb39119

Authored by Gleb Natapov on Oct 25, 2009

If the TSS we are switching to resides in high memory, the task switch will
fail since the address will be truncated. Windows2k3 does this sometimes when
running with more than 4G.

Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
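
The truncation described in the commit above can be reproduced in a few lines. This is a minimal sketch: `gpa_t` is a 64-bit type in KVM, but the two helper functions are hypothetical stand-ins for the old and new return types, not the kernel function itself.

```c
#include <stdint.h>

typedef uint64_t gpa_t;   /* guest physical address, 64 bits wide */

/* Hypothetical "before": returning a 32-bit value drops the high bits. */
static uint32_t tss_base_truncated(gpa_t base)
{
    return (uint32_t)base;
}

/* Hypothetical "after": returning gpa_t keeps the full address. */
static gpa_t tss_base_full(gpa_t base)
{
    return base;
}
```

For a TSS located just above the 4G boundary, say at `0x100002000`, the 32-bit variant yields `0x2000`, so the task switch would read from the wrong page.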

KVM: x86: Catch potential overrun in MCE setup · a9e38c3e

Authored by Jan Kiszka on Oct 23, 2009

We only allocate memory for 32 MCE banks (KVM_MAX_MCE_BANKS) but we
allow user space to fill up to 255 on setup (mcg_cap & 0xff), corrupting
kernel memory. Catch these overflows.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
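
A bounds check of the kind the commit above describes might look like the sketch below. The value 32 and the `mcg_cap & 0xff` mask come from the commit message; the function name `check_mce_setup`, the exact comparison, and the `-EINVAL` return value are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define KVM_MAX_MCE_BANKS 32   /* bank count stated in the commit message */

/*
 * Validate the bank count requested by user space before any per-bank
 * state is written. Returns 0 if the request fits, -EINVAL otherwise
 * (the error code is an assumption of this sketch).
 */
static int check_mce_setup(uint64_t mcg_cap)
{
    unsigned int bank_num = mcg_cap & 0xff;   /* can be as large as 255 */

    if (bank_num > KVM_MAX_MCE_BANKS)
        return -EINVAL;
    return 0;
}
```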

04 Oct, 2009 1 commit

KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID · 6a544355

Authored by Avi Kivity on Oct 04, 2009

The number of entries is multiplied by the entry size, which can
overflow on 32-bit hosts.  Bound the entry count instead.

Reported-by: David Wagner <daw@cs.berkeley.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
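
The overflow in the commit above is ordinary 32-bit wraparound. The sketch below models a 32-bit host with `uint32_t` arithmetic; `ENTRY_SIZE`, `MAX_CPUID_ENTRIES` and both function names are assumptions chosen for illustration, and clamping is only one possible way to bound the count.

```c
#include <stdint.h>

#define ENTRY_SIZE        40u   /* assumed size of one entry, in bytes */
#define MAX_CPUID_ENTRIES 40u   /* hypothetical upper bound on entries */

/* Unbounded: the product wraps modulo 2^32, as on a 32-bit host. */
static uint32_t alloc_size_unbounded(uint32_t nent)
{
    return (uint32_t)(nent * ENTRY_SIZE);
}

/* Bounded: clamp the entry count first, so the product cannot wrap. */
static uint32_t alloc_size_bounded(uint32_t nent)
{
    if (nent > MAX_CPUID_ENTRIES)
        nent = MAX_CPUID_ENTRIES;
    return nent * ENTRY_SIZE;
}
```

With 107374183 entries the unbounded product is 2^32 + 24, which wraps to a 24-byte allocation, while the caller still believes it has room for every entry.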

10 Sep, 2009 23 commits

KVM: VMX: Check cpl before emulating debug register access · 0a79b009

Authored by Avi Kivity on Sep 01, 2009

Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: drop duplicate kvm_flush_remote_tlb calls · e3904e6e

Authored by Marcelo Tosatti on Sep 08, 2009

kvm_mmu_slot_remove_write_access already calls it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Use thread debug register storage instead of kvm specific data · 3d53c27d

Authored by Avi Kivity on Sep 01, 2009

Instead of saving the debug registers from the processor to a kvm data
structure, rely on the debug registers stored in the thread structure.
This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Protect update_cr8_intercept() when running without an apic · 88c808fd

Authored by Avi Kivity on Aug 17, 2009

update_cr8_intercept() can be triggered from userspace while there
is no apic present.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors · d9048d32

Authored by Mikhail Ershov on Aug 19, 2009

Segment descriptor tables can be placed on two non-contiguous pages.
This patch reads segment descriptors by linear address.

Signed-off-by: Mikhail Ershov <Mike.Ershov@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Rename x86_emulate.c to emulate.c · 56e82318

Authored by Avi Kivity on Aug 12, 2009

We're in arch/x86, what could we possibly be emulating?

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: When switching to a vm8086 task, load segments as 16-bit · c0c7c04b

Authored by Anthony Liguori on Aug 11, 2009

According to 16.2.5 in the SDM, eflags.vm in the tss is consulted before loading
any new segments. If eflags.vm == 1, then the segments are treated as 16-bit
segments. The LDTR and TR are not normally available in vm86 mode so if they
happen to somehow get loaded, they need to be treated as 32-bit segments.

This fixes an invalid vmentry failure in a custom OS that was happening after
a task switch into vm8086 mode. Since the segments were being mistakenly
treated as 32-bit, we loaded garbage state.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Update cr8 intercept when APIC TPR is changed by userspace · cb142eb7

Authored by Gleb Natapov on Aug 09, 2009

Since on vcpu entry we do it only if apic is enabled, we should do
it when TPR is changed while apic is disabled. This happens when Windows
resets HW without setting TPR to zero.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: ignore reads to perfctr msrs · 1f3ee616

Authored by Amit Shah on Jun 30, 2009

We ignore writes to the perfctr msrs. Ignore reads as well.

Kaspersky antivirus crashes Windows guests if it can't read
these MSRs.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Disallow hypercalls for guest callers in rings > 0 · 07708c4a

Authored by Jan Kiszka on Aug 03, 2009

So far unprivileged guest callers running in ring 3 can issue, e.g., MMU
hypercalls. Normally, such callers cannot provide any hand-crafted MMU
command structure as it has to be passed by its physical address, but
they can still crash the guest kernel by passing random addresses.

To close the hole, this patch considers hypercalls valid only if issued
from guest ring 0. This may still be relaxed on a per-hypercall basis in
the future once required.

Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
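
The ring check from the commit above is a single privilege test placed in front of the dispatch. The sketch below is illustrative only: `dispatch_hypercall` and its parameters are hypothetical, the actual hypercall handling is elided, and returning `-EPERM` is an assumption about how the refusal is reported.

```c
#include <errno.h>

/*
 * Hypothetical dispatcher: 'cpl' is the guest's current privilege level
 * (0 = kernel, 3 = user) and 'nr' is the hypercall number.
 */
static long dispatch_hypercall(int cpl, unsigned long nr)
{
    /* Refuse every hypercall that does not come from guest ring 0. */
    if (cpl != 0)
        return -EPERM;

    (void)nr;   /* the per-hypercall handling would go here */
    return 0;
}
```

Putting the check ahead of the dispatch covers all hypercalls at once; relaxing it per hypercall, as the commit message anticipates, would mean moving the test into the individual handlers.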

KVM: report 1GB page support to userspace · 344f414f

Authored by Joerg Roedel on Jul 27, 2009

If userspace knows that the kernel part supports 1GB pages it can enable
the corresponding cpuid bit so that guests actually use GB pages.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Align cr8 threshold when userspace changes cr8 · 5f0269f5

Authored by Mikhail Ershov on Aug 03, 2009

Commit f0a3602c20 ("KVM: Move interrupt injection logic to x86.c") does not
update the cr8 intercept if the lapic is disabled, so when userspace updates
cr8, the cr8 threshold control is not updated and we are left with illegal
control fields.

Fix by explicitly resetting the cr8 threshold.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl · b927a3ce

Authored by Sheng Yang on Jul 21, 2009

Now KVM allows the guest to modify the guest's physical address of EPT's
identity mapping page.

(change from v1, discard unnecessary check, change ioctl to accept parameter
address rather than value)

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: use kvm_get_gdt() and kvm_read_ldt() · b792c344

Authored by Akinobu Mita on Jul 19, 2009

Use kvm_get_gdt() and kvm_read_ldt() to reduce inline assembly code.

Cc: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: use get_desc_base() and get_desc_limit() · 46a359e7

Authored by Akinobu Mita on Jul 18, 2009

Use get_desc_base() and get_desc_limit() to get the base address and
limit in desc_struct.

Cc: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Reduce runnability interface with arch support code · a1b37100

Authored by Gleb Natapov on Jul 09, 2009

Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Move exception handling to the same place as other events · b59bb7bd

Authored by Gleb Natapov on Jul 09, 2009

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: add ioeventfd support · d34e6b17

Authored by Gregory Haskins on Jul 07, 2009

ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asynchronously and
return as quickly as possible.  All the synchronous infrastructure to ensure
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell".  This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl().  The other is direct via
ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:

qemu-pio:      153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.

--------------------
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
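
The eventfd counter semantics that the commit above builds on can be tried from plain userspace. The sketch below uses only the Linux `eventfd(2)` interface and does not register anything with KVM; the helper names `doorbell_signal` and `doorbell_drain` are hypothetical and merely echo the "doorbell" test harness mentioned in the message.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the eventfd once, as a trigger point would on a guest write. */
static int doorbell_signal(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the eventfd counter. Returns the number of signals
 * received since the last drain, or -1 on error.
 */
static int64_t doorbell_drain(int efd)
{
    uint64_t count;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

A consumer that creates the descriptor with `eventfd(0, 0)` and drains it after three signals reads the value 3, since in the default (non-semaphore) mode a read returns the accumulated count and resets it to zero.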

KVM: PIT support for HPET legacy mode · e9f42757

Authored by Beth Kon on Jul 07, 2009

When kvm is in hpet_legacy_mode, the hpet is providing the timer
interrupt and the pit should not be. So in legacy mode, the pit timer
is destroyed, but the *state* of the pit is maintained. So if kvm or
the guest tries to modify the state of the pit, this modification is
accepted, *except* that the timer isn't actually started. When we exit
hpet_legacy_mode, the current state of the pit (which is up to date
since we've been accepting modifications) is used to restart the pit
timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to
0xff in order to destroy the timer, but then restores the actual
value, again maintaining "current" state of the pit for possible later
reenablement.

[avi: add some reserved storage in the ioctl; make SET_PIT2 IOW]
[marcelo: fix memory corruption due to reserved storage]
Signed-off-by: Beth Kon <eak@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Always report x2apic as supported feature · 0d1de2d9

Authored by Gleb Natapov on Jul 12, 2009

We emulate x2apic in software, so host support is not required.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: No need to kick cpu if not in a guest mode · c7f0f24b

Authored by Gleb Natapov on Jul 07, 2009

This will save a couple of IPIs.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: fix MMIO_CONF_BASE MSR access · f7c6d140

Authored by Andre Przywara on Jul 02, 2009

Some Windows versions check whether the BIOS has set up MMI/O for
config space accesses on AMD Fam10h CPUs. We say "no" by returning 0 on
reads and only allow disabling of MMI/O CfgSpace setup by ignoring "0" writes.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

Revert "KVM: x86: check for cr3 validity in ioctl_set_sregs" · dc7e795e

Authored by Jan Kiszka on Jul 01, 2009

This reverts commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba.

To my understanding, it became obsolete with the advent of the more
robust check in mmu_alloc_roots (89da4ff17f). Moreover, it prevents
the conceptually safe pattern

 1. set sregs
 2. register mem-slots
 3. run vcpu

by setting a sticky triple fault during step 1.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
