- 15 6月, 2007 1 次提交
-
-
由 Avi Kivity 提交于
The lazy fpu changes did not take into account that some vmexit handlers can sleep. Move loading the guest state into the inner loop so that it can be reloaded if necessary, and move loading the host state into vmx_vcpu_put() so it can be performed whenever we relinquish the vcpu. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
- 22 5月, 2007 1 次提交
-
-
由 Alexey Dobriyan 提交于
First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 5月, 2007 17 次提交
-
-
由 Avi Kivity 提交于
When emulating an mmio read, we actually emulate twice: once to determine the physical address of the mmio, and, after we've exited to userspace to get the mmio value, we emulate again to place the value in the result register and update any flags. But we don't really need to enter the guest again for that, only to take an immediate vmexit. So, if we detect that we're doing an mmio read, emulate a single instruction before entering the guest again. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Anthony Liguori 提交于
Set all of the host mask bits for CR0 so that we can maintain a proper shadow of CR0. This exposes CR0.TS, paving the way for lazy fpu handling. Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Anthony Liguori 提交于
Avoid saving and restoring the guest fpu state on every exit. This shaves ~100 cycles off the guest/host switch. Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Better leak detection, statistics, memory use, speed -- goodness all around. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
With this, we can specify that accesses to one physical memory range will be remapped to another. This is useful for the vga window at 0xa0000 which is used as a movable window into the (much larger) framebuffer. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Dor Laor 提交于
Functions that play around with the physical memory map need a way to clear mappings to possibly nonexistent or invalid memory. Both the mmu cache and the processor tlb are cleared. Signed-off-by: NDor Laor <dor.laor@qumranet.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
When a vcpu is migrated from one cpu to another, its timestamp counter may lose its monotonic property if the host has unsynced timestamp counters. This can confuse the guest, sometimes to the point of refusing to boot. As the rdtsc instruction is rather fast on AMD processors (7-10 cycles), we can simply record the last host tsc when we drop the cpu, and adjust the vcpu tsc offset when we detect that we've migrated to a different cpu. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map the same physical address, they share the same shadow page. This is a fairly common case (kernel mappings on i386 nonpae Linux, for example). However, if the two pdes map the same memory but with different permissions, kvm will happily use the cached shadow page. If the access through the more permissive pde will occur after the access to the strict pde, an endless pagefault loop will be generated and the guest will make no progress. Fix by making the access permissions part of the cache lookup key. The fix allows Xen pae to boot on kvm and run guest domains. Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers. As we now cache the protected mode values on entry to real mode, this isn't an issue anymore, and it interferes with reboot (which usually _is_ a modeswitch). Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
The initial, noncaching, version of the kvm mmu flushed the all nonglobal shadow page table translations (much like a native tlb flush). The new implementation flushes translations only when they change, rendering global pte tracking superfluous. This removes the unused tracking mechanism and storage space. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Allow a special signal mask to be used while executing in guest mode. This allows signals to be used to interrupt a vcpu without requiring signal delivery to a userspace handler, which is quite expensive. Userspace still receives -EINTR and can get the signal via sigwait(). Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
KVM used to handle cpuid by letting userspace decide what values to return to the guest. We now handle cpuid completely in the kernel. We still let userspace decide which values the guest will see by having userspace set up the value table beforehand (this is necessary to allow management software to set the cpu features to the least common denominator, so that live migration can work). The motivation for the change is that kvm kernel code can be impacted by cpuid features, for example the x86 emulator. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Currently when passing the a PIO emulation request to userspace, we rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx (on string instructions). This (a) requires two extra ioctls for getting and setting the registers and (b) is unfriendly to non-x86 archs, when they get kvm ports. So fix by doing the register fixups in the kernel and passing to userspace only an abstract description of the PIO to be done. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Instead of passing a 'struct kvm_run' back and forth between the kernel and userspace, allocate a page and allow the user to mmap() it. This reduces needless copying and makes the interface expandable by providing lots of free space. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
- 04 3月, 2007 4 次提交
-
-
由 Avi Kivity 提交于
Allocate a distinct inode for every vcpu in a VM. This has the following benefits: - the filp cachelines are no longer bounced when f_count is incremented on every ioctl() - the API and internal code are distinctly clearer; for example, on the KVM_GET_REGS ioctl, there is no need to copy the vcpu number from userspace and then copy the registers back; the vcpu identity is derived from the fd used to make the call Right now the performance benefits are completely theoretical since (a) we don't support more than one vcpu per VM and (b) virtualization hardware inefficiencies completely everwhelm any cacheline bouncing effects. But both of these will change, and we need to prepare the API today. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Ingo Molnar 提交于
This adds a special MSR based hypercall API to KVM. This is to be used by paravirtual kernels and virtual drivers. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Markus Rechberger 提交于
Besides using an established api, this allows using kvm in older kernels. Signed-off-by: NMarkus Rechberger <markus.rechberger@amd.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
- 13 2月, 2007 3 次提交
-
-
由 Avi Kivity 提交于
On hotplug, we execute the hardware extension enable sequence. On unplug, we decache any vcpus that last ran on the exiting cpu, and execute the hardware extension disable sequence. Signed-off-by: NAvi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Avi Kivity 提交于
This will allow us to iterate over all vcpus and see which cpus they are running on. [akpm@osdl.org: use standard (ugly) initialisers] Signed-off-by: NAvi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 S.Caglar Onur 提交于
lldt does not accept immediate operands, which "g" allows. Signed-off-by: NS.Caglar Onur <caglar@pardus.org.tr> Signed-off-by: NAvi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 1月, 2007 1 次提交
-
-
由 Avi Kivity 提交于
This allows netbsd 3.1 i386 to get further along installing. Signed-off-by: NAvi Kivity <avi@qumranet.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 1月, 2007 13 次提交
-
-
由 Avi Kivity 提交于
The mmu sometimes needs memory for reverse mapping and parent pte chains. however, we can't allocate from within the mmu because of the atomic context. So, move the allocations to a central place that can be executed before the main mmu machinery, where we can bail out on failure before any damage is done. (error handling is deffered for now, but the basic structure is there) Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
We always need cr3 to point to something valid, so if we detect that we're freeing a root page, simply push it back to the top of the active list. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
In fork() (or when we protect a page that is no longer a page table), we can experience floods of writes to a page, which have to be emulated. This is expensive. So, if we detect such a flood, zap the page so subsequent writes can proceed natively. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
Since we write protect shadowed guest page tables, there is no need to trap page invalidations (the guest will always change the mapping before issuing the invlpg instruction). Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
When beginning to process a page fault, make sure we have enough shadow pages available to service the fault. If not, free some pages. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
A page table may have been recycled into a regular page, and so any instruction can be executed on it. Unprotect the page and let the cpu do its thing. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
As the mmu write protects guest page table, we emulate those writes. Since they are not mmio, there is no need to go to userspace to perform them. So, perform the writes in the kernel if possible, and notify the mmu about them so it can take the approriate action. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
Define a hashtable for caching shadow page tables. Look up the cache on context switch (cr3 change) or during page faults. The key to the cache is a combination of - the guest page table frame number - the number of paging levels in the guest * we can cache real mode, 32-bit mode, pae, and long mode page tables simultaneously. this is useful for smp bootup. - the guest page table table * some kernels use a page as both a page table and a page directory. this allows multiple shadow pages to exist for that page, one per level - the "quadrant" * 32-bit mode page tables span 4MB, whereas a shadow page table spans 2MB. similarly, a 32-bit page directory spans 4GB, while a shadow page directory spans 1GB. the quadrant allows caching up to 4 shadow page tables for one guest page in one level. - a "metaphysical" bit * for real mode, and for pse pages, there is no guest page table, so set the bit to avoid write protecting the page. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
Since we're not going to cache the pae-mode shadow root pages, allocate a single pae shadow that will hold the four lower-level pages, which will act as roots. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
In pae mode, a load of cr3 loads the four third-level page table entries in addition to cr3 itself. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
Keep in each host page frame's page->private a pointer to the shadow pte which maps it. If there are multiple shadow ptes mapping the page, set bit 0 of page->private, and use the rest as a pointer to a linked list of all such mappings. Reverse mappings are needed because we when we cache shadow page tables, we must protect the guest page tables from being modified by the guest, as that would invalidate the cached ptes. Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Avi Kivity 提交于
Hardware virtualization implementations allow the guests to freely change some of the bits in cr0 and cr4, but trap when changing the other bits. This is useful to avoid excessive exits due to changing, for example, the ts flag. It also means the kvm's copy of cr0 and cr4 may be stale with respect to these bits. most of the time this doesn't matter as these bits are not very interesting. Other times, however (for example when returning cr0 to userspace), they are, so get the fresh contents of these bits from the guest by means of a new arch operation. Signed-off-by: NAvi Kivity <avi@qumranet.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Dor Laor 提交于
The current interrupt injection mechanism might delay an interrupt under the following circumstances: - if injection fails because the guest is not interruptible (rflags.IF clear, or after a 'mov ss' or 'sti' instruction). Userspace can check rflags, but the other cases or not testable under the current API. - if injection fails because of a fault during delivery. This probably never happens under normal guests. - if injection fails due to a physical interrupt causing a vmexit so that it can be handled by the host. In all cases the guest proceeds without processing the interrupt, reducing the interactive feel and interrupt throughput of the guest. This patch fixes the situation by allowing userspace to request an exit when the 'interrupt window' opens, so that it can re-inject the interrupt at the right time. Guest interactivity is very visibly improved. Signed-off-by: NDor Laor <dor.laor@qumranet.com> Signed-off-by: NAvi Kivity <avi@qumranet.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-