Commits · d28c6cfbbc5e2d4fccfe6d733995ed5971ca87f6 · OpenHarmony / kernel_linux

03 May, 2007 · 8 commits

KVM: MMU: Fix hugepage pdes mapping same physical address with different access · d28c6cfb

Committed by Avi Kivity on Mar 23, 2007

The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map
the same physical address, they share the same shadow page. This is a fairly
common case (kernel mappings on i386 nonpae Linux, for example).

However, if the two pdes map the same memory but with different permissions, kvm
will happily use the cached shadow page. If an access through the more
permissive pde occurs after an access through the strict pde, an endless pagefault
loop is generated and the guest makes no progress.

Fix by making the access permissions part of the cache lookup key.

The fix allows Xen pae to boot on kvm and run guest domains.

Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix.
Signed-off-by: Avi Kivity <avi@qumranet.com>

d28c6cfb
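
The fix in d28c6cfb can be pictured with a small sketch. This is not the kernel's code: the struct, its field names, and the permission constants are hypothetical, chosen only to show why adding the access permissions to the lookup key keeps two differently-permissioned pdes from sharing one cached shadow page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits carried by a hugepage pde. */
#define ACC_WRITE 0x1u
#define ACC_USER  0x2u

/* Hypothetical lookup key for a cached shadow page. */
struct shadow_key {
    uint64_t gfn;     /* guest frame number the hugepage pde maps */
    unsigned access;  /* permissions of the pde; the part the fix adds */
};

/* Two keys match only if both the frame and the permissions agree. */
static bool shadow_key_equal(struct shadow_key a, struct shadow_key b)
{
    return a.gfn == b.gfn && a.access == b.access;
}
```

With a key like this, a read-only pde and a read-write pde for the same frame miss each other in the cache and each gets its own shadow page.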

KVM: Remove set_cr0_no_modeswitch() arch op · f6528b03

Committed by Avi Kivity on Mar 20, 2007

set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers.
As we now cache the protected mode values on entry to real mode, this
isn't an issue anymore, and it interferes with reboot (which usually _is_
a modeswitch).
Signed-off-by: Avi Kivity <avi@qumranet.com>

f6528b03

KVM: MMU: Remove global pte tracking · aac01224

Committed by Avi Kivity on Mar 20, 2007

The initial, noncaching, version of the kvm mmu flushed all nonglobal
shadow page table translations (much like a native tlb flush).  The new
implementation flushes translations only when they change, rendering global
pte tracking superfluous.

This removes the unused tracking mechanism and storage space.
Signed-off-by: Avi Kivity <avi@qumranet.com>

aac01224

KVM: Avoid guest virtual addresses in string pio userspace interface · 039576c0

Committed by Avi Kivity on Mar 20, 2007

The current string pio interface communicates using guest virtual addresses,
relying on userspace to translate addresses and to check permissions. This
interface cannot fully support guest smp, as the check needs to take into
account two pages at once in case an unaligned string transfer straddles a
page boundary.

Change the interface not to communicate guest addresses at all; instead use
a buffer page (mmaped by userspace) and do transfers there. The kernel
manages the virtual to physical translation and can perform the checks
atomically by taking the appropriate locks.
Signed-off-by: Avi Kivity <avi@qumranet.com>

039576c0

KVM: Add guest mode signal mask · 1961d276

Committed by Avi Kivity on Mar 05, 2007

Allow a special signal mask to be used while executing in guest mode. This
allows signals to be used to interrupt a vcpu without requiring signal
delivery to a userspace handler, which is quite expensive. Userspace still
receives -EINTR and can get the signal via sigwait().
Signed-off-by: Avi Kivity <avi@qumranet.com>

1961d276
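
The userspace half of 1961d276, collecting a signal with sigwait() instead of a handler, can be sketched with plain POSIX calls. The function name is hypothetical, and raise() stands in for whatever would really send the signal. The KVM call that installs the guest-mode mask is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Block SIGUSR1, make it pending, then collect it synchronously with
 * sigwait(). No signal handler runs at any point. Returns the number of
 * the collected signal, or -1 on error. Intended for a single-threaded
 * caller, since sigprocmask() is only specified for that case.
 */
int collect_pending_signal(void)
{
    sigset_t set;
    int sig = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)      /* stays pending because it is blocked */
        return -1;
    if (sigwait(&set, &sig) != 0) /* dequeues the pending signal */
        return -1;
    return sig;
}
```

The saving the commit describes comes from skipping handler setup and teardown: the signal only needs to interrupt the vcpu, and sigwait() retrieves it afterwards.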

KVM: Handle cpuid in the kernel instead of punting to userspace · 06465c5a

Committed by Avi Kivity on Feb 28, 2007

KVM used to handle cpuid by letting userspace decide what values to
return to the guest.  We now handle cpuid completely in the kernel.  We
still let userspace decide which values the guest will see by having
userspace set up the value table beforehand (this is necessary to allow
management software to set the cpu features to the least common denominator,
so that live migration can work).

The motivation for the change is that kvm kernel code can be impacted by
cpuid features, for example the x86 emulator.
Signed-off-by: Avi Kivity <avi@qumranet.com>

06465c5a
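
The two ideas in 06465c5a, a value table prepared by userspace and a least common denominator of cpu features, can be sketched as follows. The entry layout and both function names are hypothetical and do not reproduce the kernel's structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: the values to return for one cpuid function. */
struct cpuid_entry {
    uint32_t function;
    uint32_t eax, ebx, ecx, edx;
};

/* Find the entry for a cpuid function, or NULL if userspace did not
 * provide one. */
const struct cpuid_entry *cpuid_lookup(const struct cpuid_entry *table,
                                       size_t n, uint32_t function)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].function == function)
            return &table[i];
    return NULL;
}

/* Least common denominator of feature masks from several hosts: a bit
 * survives only if every host sets it. With n == 0 all bits are set. */
uint32_t common_features(const uint32_t *masks, size_t n)
{
    uint32_t common = ~(uint32_t)0;

    for (size_t i = 0; i < n; i++)
        common &= masks[i];
    return common;
}
```

Management software would compute the common mask once, place it in the table, and hand the table to the kernel before the guest runs.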

KVM: Do not communicate to userspace through cpu registers during PIO · 46fc1477

Committed by Avi Kivity on Feb 22, 2007

Currently, when passing a PIO emulation request to userspace, we
rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx
(on string instructions).  This (a) requires two extra ioctls for getting
and setting the registers and (b) is unfriendly to non-x86 archs, when
they get kvm ports.

So fix by doing the register fixups in the kernel and passing to userspace
only an abstract description of the PIO to be done.
Signed-off-by: Avi Kivity <avi@qumranet.com>

46fc1477

KVM: Use a shared page for kernel/user communication when running a vcpu · 9a2bb7f4

Committed by Avi Kivity on Feb 22, 2007

Instead of passing a 'struct kvm_run' back and forth between the kernel and
userspace, allocate a page and allow the user to mmap() it.  This reduces
needless copying and makes the interface expandable by providing lots of
free space.
Signed-off-by: Avi Kivity <avi@qumranet.com>

9a2bb7f4

04 Mar, 2007 · 4 commits

KVM: Per-vcpu inodes · bccf2150

Committed by Avi Kivity on Feb 21, 2007

Allocate a distinct inode for every vcpu in a VM.  This has the following
benefits:

 - the filp cachelines are no longer bounced when f_count is incremented on
   every ioctl()
 - the API and internal code are distinctly clearer; for example, on the
   KVM_GET_REGS ioctl, there is no need to copy the vcpu number from
   userspace and then copy the registers back; the vcpu identity is derived
   from the fd used to make the call

Right now the performance benefits are completely theoretical since (a) we
don't support more than one vcpu per VM and (b) virtualization hardware
inefficiencies completely overwhelm any cacheline bouncing effects.  But
both of these will change, and we need to prepare the API today.
Signed-off-by: Avi Kivity <avi@qumranet.com>

bccf2150

KVM: Wire up hypercall handlers to a central arch-independent location · 270fd9b9

Committed by Avi Kivity on Feb 19, 2007

Signed-off-by: Avi Kivity <avi@qumranet.com>

270fd9b9

KVM: add MSR based hypercall API · 102d8325

Committed by Ingo Molnar on Feb 19, 2007

This adds a special MSR based hypercall API to KVM. This is to be
used by paravirtual kernels and virtual drivers.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>

102d8325

KVM: Use page_private()/set_page_private() apis · 5972e953

Committed by Markus Rechberger on Feb 19, 2007

Besides using an established api, this allows using kvm in older kernels.
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

5972e953

13 Feb, 2007 · 3 commits

[PATCH] KVM: cpu hotplug support · 774c47f1

Committed by Avi Kivity on Feb 12, 2007

On hotplug, we execute the hardware extension enable sequence.  On unplug, we
decache any vcpus that last ran on the exiting cpu, and execute the hardware
extension disable sequence.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

774c47f1

[PATCH] KVM: Add a global list of all virtual machines · 133de902

Committed by Avi Kivity on Feb 12, 2007

This will allow us to iterate over all vcpus and see which cpus they are
running on.

[akpm@osdl.org: use standard (ugly) initialisers]
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

133de902

[PATCH] kvm: Fix asm constraint for lldt instruction · a0610ddf

Committed by S.Caglar Onur on Feb 12, 2007

lldt does not accept immediate operands, which "g" allows.
Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a0610ddf

27 Jan, 2007 · 1 commit

[PATCH] KVM: Emulate IA32_MISC_ENABLE msr · 6f00e68f

Committed by Avi Kivity on Jan 26, 2007

This allows NetBSD 3.1 i386 to get further along installing.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6f00e68f

06 Jan, 2007 · 13 commits

[PATCH] KVM: MMU: Replace atomic allocations by preallocated objects · 714b93da

Committed by Avi Kivity on Jan 05, 2007

The mmu sometimes needs memory for reverse mapping and parent pte chains.
However, we can't allocate from within the mmu because of the atomic context.

So, move the allocations to a central place that can be executed before the
main mmu machinery, where we can bail out on failure before any damage is
done.

(Error handling is deferred for now, but the basic structure is there.)
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

714b93da
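
The preallocation pattern described in 714b93da can be sketched in userspace C. The cache type, its capacity, and both function names are hypothetical, and calloc() stands in for the kernel allocator.

```c
#include <stdlib.h>

#define CACHE_CAPACITY 4   /* hypothetical upper bound per cache */

/* Hypothetical cache of objects allocated ahead of time. */
struct obj_cache {
    int nobjs;
    void *objects[CACHE_CAPACITY];
};

/*
 * Called where sleeping and failing are still allowed: fill the cache
 * up to min objects. Returns 0 on success, -1 if memory ran out or min
 * exceeds the capacity.
 */
int cache_topup(struct obj_cache *c, int min, size_t size)
{
    if (min > CACHE_CAPACITY)
        return -1;
    while (c->nobjs < min) {
        void *obj = calloc(1, size);

        if (!obj)
            return -1;
        c->objects[c->nobjs++] = obj;
    }
    return 0;
}

/*
 * Called from the atomic path: performs no allocation, only hands out an
 * object that cache_topup() prepared. Returns NULL if the cache is empty.
 */
void *cache_alloc(struct obj_cache *c)
{
    return c->nobjs > 0 ? c->objects[--c->nobjs] : NULL;
}
```

The caller tops up the cache before entering the mmu code and bails out if that fails, so nothing inside the atomic section can run out of memory.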

[PATCH] KVM: MMU: Never free a shadow page actively serving as a root · 3bb65a22

Committed by Avi Kivity on Jan 05, 2007

We always need cr3 to point to something valid, so if we detect that we're
freeing a root page, simply push it back to the top of the active list.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3bb65a22

[PATCH] KVM: MMU: Page table write flood protection · 86a5ba02

Committed by Avi Kivity on Jan 05, 2007

In fork() (or when we protect a page that is no longer a page table), we can
experience floods of writes to a page, which have to be emulated.  This is
expensive.

So, if we detect such a flood, zap the page so subsequent writes can proceed
natively.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

86a5ba02
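
One way to picture the flood detection in 86a5ba02 is a counter of consecutive emulated writes to the same frame. The commit message does not say how the kernel detects a flood, so the state struct, the threshold, and the function name below are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOOD_THRESHOLD 3  /* hypothetical; the real value is not stated */

/* Hypothetical per-vcpu record of the last emulated page table write. */
struct flood_state {
    uint64_t last_gfn;
    int count;
};

/*
 * Record an emulated write to a write-protected guest page. Returns true
 * once consecutive writes to the same frame reach the threshold, which is
 * the point where the shadow page would be zapped.
 */
bool note_pt_write(struct flood_state *s, uint64_t gfn)
{
    if (s->count > 0 && gfn == s->last_gfn) {
        s->count++;
    } else {
        s->last_gfn = gfn;
        s->count = 1;
    }
    return s->count >= FLOOD_THRESHOLD;
}
```

After the zap the page is no longer write-protected, so later writes run natively instead of trapping into the emulator.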

[PATCH] KVM: MMU: Remove invlpg interception · 5f015a5b

Committed by Avi Kivity on Jan 05, 2007

Since we write protect shadowed guest page tables, there is no need to trap
page invalidations (the guest will always change the mapping before issuing
the invlpg instruction).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5f015a5b

[PATCH] KVM: MMU: oom handling · ebeace86

Committed by Avi Kivity on Jan 05, 2007

When beginning to process a page fault, make sure we have enough shadow pages
available to service the fault.  If not, free some pages.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ebeace86

[PATCH] KVM: MMU: If emulating an instruction fails, try unprotecting the page · a436036b

Committed by Avi Kivity on Jan 05, 2007

A page table may have been recycled into a regular page, and so any
instruction can be executed on it.  Unprotect the page and let the cpu do its
thing.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a436036b

[PATCH] KVM: MMU: Support emulated writes into RAM · da4a00f0

Committed by Avi Kivity on Jan 05, 2007

As the mmu write-protects guest page tables, we emulate those writes.  Since
they are not mmio, there is no need to go to userspace to perform them.

So, perform the writes in the kernel if possible, and notify the mmu about
them so it can take the appropriate action.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

da4a00f0

[PATCH] KVM: MMU: Shadow page table caching · cea0f0e7

Committed by Avi Kivity on Jan 05, 2007

Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.

The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
   * we can cache real mode, 32-bit mode, pae, and long mode page
     tables simultaneously.  this is useful for smp bootup.
- the guest page table level
   * some kernels use a page as both a page table and a page directory.  this
     allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
   * 32-bit mode page tables span 4MB, whereas a shadow page table spans
     2MB.  similarly, a 32-bit page directory spans 4GB, while a shadow
     page directory spans 1GB.  the quadrant allows caching up to 4 shadow page
     tables for one guest page in one level.
- a "metaphysical" bit
   * for real mode, and for pse pages, there is no guest page table, so set
     the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

cea0f0e7
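
The "quadrant" arithmetic from cea0f0e7 can be worked through in a few lines. The function name is hypothetical. The shifts follow from the sizes in the message: a shadow page table covers 2MB (bit 21 picks the half of a 4MB guest table), and a shadow page directory covers 1GB (bits 31:30 pick the quarter of a 4GB guest directory).

```c
#include <stdint.h>

/*
 * Quadrant of a guest virtual address for a 32-bit (non-pae) guest.
 * level 1: a guest page table spans 4MB, a shadow one 2MB -> 2 halves.
 * level 2: a guest page directory spans 4GB, a shadow one 1GB -> 4 quarters.
 * Other levels have no quadrant in this sketch and yield 0.
 */
unsigned quadrant_32bit(uint32_t gva, unsigned level)
{
    if (level == 1)
        return (gva >> 21) & 1;
    if (level == 2)
        return (gva >> 30) & 3;
    return 0;
}
```

For example, address 0x00200000 falls in the second 2MB half of the first guest page table, so its level 1 quadrant is 1.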

[PATCH] KVM: MMU: Special treatment for shadow pae root pages · 17ac10ad

Committed by Avi Kivity on Jan 05, 2007

Since we're not going to cache the pae-mode shadow root pages, allocate a
single pae shadow that will hold the four lower-level pages, which will act as
roots.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

17ac10ad

[PATCH] KVM: MMU: Load the pae pdptrs on cr3 change like the processor does · 1342d353

Committed by Avi Kivity on Jan 05, 2007

In pae mode, a load of cr3 loads the four third-level page table entries in
addition to cr3 itself.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1342d353
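
Where the four entries mentioned in 1342d353 live can be shown with a small address calculation. The function name is hypothetical. The layout is the architectural one for pae paging: cr3 bits 31:5 hold the base of a 32-byte aligned table of four 8-byte entries.

```c
#include <stdint.h>

/*
 * Guest physical address of page-directory-pointer entry i (0..3) for a
 * pae-mode cr3. The low 5 bits of cr3 are not part of the table address.
 */
uint64_t pdpte_gpa(uint32_t cr3, unsigned i)
{
    uint64_t base = cr3 & ~0x1fu;  /* table is 32-byte aligned */

    return base + 8u * (i & 3);    /* each entry is 8 bytes */
}
```

Emulating the processor here means reading all four of these entries whenever the guest writes cr3, rather than lazily on first use.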

[PATCH] KVM: MMU: Implement simple reverse mapping · cd4a4e53

Committed by Avi Kivity on Jan 05, 2007

Keep in each host page frame's page->private a pointer to the shadow pte which
maps it.  If there are multiple shadow ptes mapping the page, set bit 0 of
page->private, and use the rest as a pointer to a linked list of all such
mappings.

Reverse mappings are needed because when we cache shadow page tables, we
must protect the guest page tables from being modified by the guest, as that
would invalidate the cached ptes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

cd4a4e53
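
The bit 0 encoding described in cd4a4e53 is a tagged pointer. The sketch below is a simplified userspace version: a plain uintptr_t stands in for page->private, malloc() for the kernel allocator, and the node type and function names are hypothetical. It relies on pointers being at least 2-byte aligned, so that bit 0 is free to act as the tag.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node holding one shadow pte pointer. */
struct rmap_node {
    uint64_t *spte;
    struct rmap_node *next;
};

/*
 * Add a mapping. *priv stands in for page->private:
 *   0            no mappings
 *   bit 0 clear  pointer to the single shadow pte
 *   bit 0 set    pointer (bit 0 masked off) to a list of nodes
 * Returns 0 on success, -1 on allocation failure.
 */
int rmap_add(uintptr_t *priv, uint64_t *spte)
{
    struct rmap_node *node;

    if (*priv == 0) {
        *priv = (uintptr_t)spte;
        return 0;
    }
    node = malloc(sizeof(*node));
    if (!node)
        return -1;
    if (!(*priv & 1)) {
        /* Second mapping: move the single pointer into a list node. */
        struct rmap_node *first = malloc(sizeof(*first));

        if (!first) {
            free(node);
            return -1;
        }
        first->spte = (uint64_t *)*priv;
        first->next = NULL;
        node->next = first;
    } else {
        node->next = (struct rmap_node *)(*priv & ~(uintptr_t)1);
    }
    node->spte = spte;
    *priv = (uintptr_t)node | 1;
    return 0;
}

/* Count the shadow ptes recorded in priv. */
int rmap_count(uintptr_t priv)
{
    struct rmap_node *node;
    int n = 0;

    if (priv == 0)
        return 0;
    if (!(priv & 1))
        return 1;
    for (node = (struct rmap_node *)(priv & ~(uintptr_t)1); node; node = node->next)
        n++;
    return n;
}
```

The common case of a single mapping costs no extra memory; the list is built only when a second shadow pte maps the same page.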

[PATCH] KVM: Prevent stale bits in cr0 and cr4 · 399badf3

Committed by Avi Kivity on Jan 05, 2007

Hardware virtualization implementations allow the guests to freely change some
of the bits in cr0 and cr4, but trap when changing the other bits. This is
useful to avoid excessive exits due to changing, for example, the ts flag.

It also means that kvm's copy of cr0 and cr4 may be stale with respect to these
bits. Most of the time this doesn't matter as these bits are not very
interesting. Other times, however (for example when returning cr0 to
userspace), they are, so get the fresh contents of these bits from the guest
by means of a new arch operation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

399badf3
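
Refreshing the stale bits as described in 399badf3 amounts to a masked merge. The function name and the choice of guest-owned bits are assumptions. Only the ts flag is used here because the message names it, and its position (bit 3 of cr0) is architectural.

```c
#include <stdint.h>

#define CR0_TS (1ull << 3)            /* task-switched flag, bit 3 of cr0 */
#define CR0_GUEST_OWNED_MASK CR0_TS   /* hypothetical set of untrapped bits */

/*
 * Take the guest-owned bits from the value the hardware currently holds
 * and every other bit from kvm's cached copy.
 */
uint64_t refresh_cr0(uint64_t cached, uint64_t hw)
{
    return (cached & ~CR0_GUEST_OWNED_MASK) | (hw & CR0_GUEST_OWNED_MASK);
}
```

The new arch operation in the commit would supply the hardware value; the merge itself is the same for cr4 with a different mask.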

[PATCH] KVM: Improve interrupt response · c1150d8c

Committed by Dor Laor on Jan 05, 2007

The current interrupt injection mechanism might delay an interrupt under
the following circumstances:

 - if injection fails because the guest is not interruptible (rflags.IF clear,
   or after a 'mov ss' or 'sti' instruction).  Userspace can check rflags,
   but the other cases are not testable under the current API.
 - if injection fails because of a fault during delivery.  This probably
   never happens under normal guests.
 - if injection fails due to a physical interrupt causing a vmexit so that
   it can be handled by the host.

In all cases the guest proceeds without processing the interrupt, reducing
the interactive feel and interrupt throughput of the guest.

This patch fixes the situation by allowing userspace to request an exit
when the 'interrupt window' opens, so that it can re-inject the interrupt
at the right time.  Guest interactivity is very visibly improved.
Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c1150d8c

31 Dec, 2006 · 3 commits

[PATCH] kvm: fix GFP_KERNEL allocation in atomic section in kvm_dev_ioctl_create_vcpu() · 8018c27b

Committed by Ingo Molnar on Dec 29, 2006

Fix a GFP_KERNEL allocation in an atomic section: kvm_dev_ioctl_create_vcpu()
called kvm_mmu_init(), which calls alloc_pages(), while holding the vcpu.

The fix is to set up the MMU state in two phases: kvm_mmu_create() and
kvm_mmu_setup().

(NOTE: free_vcpus does a kvm_mmu_destroy() call so there's no need for any
extra teardown branch on allocation/init failure here.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8018c27b

[PATCH] KVM: Move common msr handling to arch independent code · 3bab1f5d

Committed by Avi Kivity on Dec 29, 2006

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3bab1f5d

[PATCH] KVM: Simplify is_long_mode() · a9058ecd

Committed by Avi Kivity on Dec 29, 2006

Instead of doing tricky stuff with the arch dependent virtualization
registers, take a peek at the guest's efer.

This simplifies some code, and fixes some confusion in the mmu branch.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a9058ecd
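
"Taking a peek at the guest's efer" in a9058ecd can be sketched as a single bit test. The bit positions are architectural (LME is bit 8, LMA is bit 10), but which bit the commit actually tests is not stated in the message. This sketch assumes LMA and takes the efer value as a plain argument.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (1ull << 8)   /* long mode enable */
#define EFER_LMA (1ull << 10)  /* long mode active */

/* True if the given guest efer value says long mode is active. */
bool is_long_mode(uint64_t guest_efer)
{
    return (guest_efer & EFER_LMA) != 0;
}
```

A guest that has set LME but not yet enabled paging still has LMA clear, so it reports false here.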

14 Dec, 2006 · 1 commit

[PATCH] KVM: Replace __x86_64__ with CONFIG_X86_64 · 05b3e0c2

Committed by Avi Kivity on Dec 13, 2006

As per akpm's request.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

05b3e0c2

11 Dec, 2006 · 1 commit

[PATCH] kvm: userspace interface · 6aa8b732

Committed by Avi Kivity on Dec 10, 2006

web site: http://kvm.sourceforge.net

mailing list: kvm-devel@lists.sourceforge.net
  (http://lists.sourceforge.net/lists/listinfo/kvm-devel)

The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture.  The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace.  Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.

Using this driver, one can start multiple virtual machines on a host.

Each virtual machine is a process on the host; a virtual cpu is a thread in
that process.  kill(1), nice(1), top(1) work as expected.  In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode.  Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm).  Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.

The driver supports i386 and x86_64 hosts and guests.  All combinations are
allowed except x86_64 guest on i386 host.  For i386 guests and hosts, both pae
and non-pae paging modes are supported.

SMP hosts and UP guests are supported.  At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.

Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch.  We plan to address this in two ways:

- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables

Currently a virtual desktop is responsive but consumes a lot of CPU.  Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization.  Linux/X is slower, probably due
to X being in a separate process.

In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.

Caveats (akpm: might no longer be true):

- The Windows install currently bluescreens due to a problem with the
  virtual APIC.  We are working on a fix.  A temporary workaround is to
  use an existing image or install through qemu
- Windows 64-bit does not work.  That's also true for qemu, so it's
  probably a problem with the device model.

[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6aa8b732

OpenHarmony / kernel_linux · last synced more than 3 years ago