- 15 10月, 2008 13 次提交
-
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Mohammed Gamal 提交于
Since checking for vcpu->arch.rmode.active is already done whenever we call handle_rmode_exception(), checking it inside the function is redundant. Signed-off-by: NMohammed Gamal <m.gamal005@gmail.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Instead of looking at failed injections in the vm entry path, move processing to the exit path in vmx_complete_interrupts(). This simplifes the logic and removes any state that is hidden in vmx registers. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Similar to the exception queue, this hold interrupts that have been accepted by the virtual processor core but not yet injected. Not yet used. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
The vmx code assumes that IDT-Vectoring can only be set when an exception is injected due to the exception in question. That's not true, however: if the exception is injected correctly, and later another exception occurs but its delivery is blocked due to a fault, then we will incorrectly assume the first exception was not delivered. Fix by unconditionally dequeuing the pending exception, and requeuing it (or the second exception) if we see it in the IDT-Vectoring field. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
If we're emulating an instruction, either it will succeed, in which case any previously queued exception will be spurious, or we will requeue the same exception. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Instead of processing nmi injection failure in the vm entry path, move it to the vm exit path (vm_complete_interrupts()). This separates nmi injection from nmi post-processing, and moves the nmi state from the VT state into vcpu state (new variable nmi_injected specifying an injection in progress). Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Currently most interrupt exit processing is handled on the entry path, which is confusing. Move the NMI IRET fault processing to a new function, vmx_complete_interrupts(), which is called on the vmexit path. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
The twisty maze of conditionals can be reduced. [joerg: fix tlb flushing] Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Place into own function, in preparation for further cleanups. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Amit Shah 提交于
This function injects an interrupt into the guest given the kvm struct, the (guest) irq number and the interrupt level. Signed-off-by: NAmit Shah <amit.shah@qumranet.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Marcelo Tosatti 提交于
As suggested by Avi, introduce accessors to read/write guest registers. This simplifies the ->cache_regs/->decache_regs interface, and improves register caching which is important for VMX, where the cost of vmcs_read/vmcs_write is significant. [avi: fix warnings] Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Sheng Yang 提交于
MSR_IA32_FEATURE_LOCKED is just a bit in fact, which shouldn't be prefixed with MSR_. So is MSR_IA32_FEATURE_VMXON_ENABLED. Signed-off-by: NSheng Yang <sheng.yang@intel.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
- 11 9月, 2008 3 次提交
-
-
由 Sheng Yang 提交于
As well as discard fake accessed bit and dirty bit of EPT. Signed-off-by: NSheng Yang <sheng.yang@intel.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Joerg Roedel 提交于
Accesses to CR4 are intercepted even with Nested Paging enabled. But the code does not check if the guest wants to do a global TLB flush. So this flush gets lost. This patch adds the check and the flush to svm_set_cr4. Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Joerg Roedel 提交于
This patch introduces a guest TLB flush on every NPF exit in KVM. This fixes random segfaults and #UD exceptions in the guest seen under some workloads (e.g. long running compile workloads or tbench). A kernbench run with and without that fix showed that it has a slowdown lower than 0.5% Cc: stable@kernel.org Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com> Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
- 10 9月, 2008 1 次提交
-
-
由 Sheng Yang 提交于
They are hardware specific MSRs, and we would use them in virtualization feature detection later. Signed-off-by: NSheng Yang <sheng.yang@intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 25 8月, 2008 1 次提交
-
-
由 Avi Kivity 提交于
The shadow code assigns a pte directly in one place, which is nonatomic on i386 can can cause random memory references. Fix by using an atomic setter. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
- 29 7月, 2008 5 次提交
-
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Andrea Arcangeli 提交于
Synchronize changes to host virtual addresses which are part of a KVM memory slot to the KVM shadow mmu. This allows pte operations like swapping, page migration, and madvise() to transparently work with KVM. Signed-off-by: NAndrea Arcangeli <andrea@qumranet.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Andrea Arcangeli 提交于
This allows reading memslots with only the mmu_lock hold for mmu notifiers that runs in atomic context and with mmu_lock held. Signed-off-by: NAndrea Arcangeli <andrea@qumranet.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Andrea Arcangeli 提交于
This allows the mmu notifier code to run unalias_gfn with only the mmu_lock held. Only alias writes need the mmu_lock held. Readers will either take the slots_lock in read mode or the mmu_lock. Signed-off-by: NAndrea Arcangeli <andrea@qumranet.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Andrea Arcangeli 提交于
With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages. There are secondary MMUs (with secondary sptes and secondary tlbs) too. sptes in the kvm case are shadow pagetables, but when I say spte in mmu-notifier context, I mean "secondary pte". In GRU case there's no actual secondary pte and there's only a secondary tlb because the GRU secondary MMU has no knowledge about sptes and every secondary tlb miss event in the MMU always generates a page fault that has to be resolved by the CPU (this is not the case of KVM where the a secondary tlb miss will walk sptes in hardware and it will refill the secondary tlb transparently to software if the corresponding spte is present). The same way zap_page_range has to invalidate the pte before freeing the page, the spte (and secondary tlb) must also be invalidated before any page is freed and reused. Currently we take a page_count pin on every page mapped by sptes, but that means the pages can't be swapped whenever they're mapped by any spte because they're part of the guest working set. Furthermore a spte unmap event can immediately lead to a page to be freed when the pin is released (so requiring the same complex and relatively slow tlb_gather smp safe logic we have in zap_page_range and that can be avoided completely if the spte unmap event doesn't require an unpin of the page previously mapped in the secondary MMU). The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know when the VM is swapping or freeing or doing anything on the primary MMU so that the secondary MMU code can drop sptes before the pages are freed, avoiding all page pinning and allowing 100% reliable swapping of guest physical address space. Furthermore it avoids the code that teardown the mappings of the secondary MMU, to implement a logic like tlb_gather in zap_page_range that would require many IPI to flush other cpu tlbs, for each fixed number of spte unmapped. To make an example: if what happens on the primary MMU is a protection downgrade (from writeable to wrprotect) the secondary MMU mappings will be invalidated, and the next secondary-mmu-page-fault will call get_user_pages and trigger a do_wp_page through get_user_pages if it called get_user_pages with write=1, and it'll re-establishing an updated spte or secondary-tlb-mapping on the copied page. Or it will setup a readonly spte or readonly tlb mapping if it's a guest-read, if it calls get_user_pages with write=0. This is just an example. This allows to map any page pointed by any pte (and in turn visible in the primary CPU MMU), into a secondary MMU (be it a pure tlb like GRU, or an full MMU with both sptes and secondary-tlb like the shadow-pagetable layer with kvm), or a remote DMA in software like XPMEM (hence needing of schedule in XPMEM code to send the invalidate to the remote node, while no need to schedule in kvm/gru as it's an immediate event like invalidating primary-mmu pte). At least for KVM without this patch it's impossible to swap guests reliably. And having this feature and removing the page pin allows several other optimizations that simplify life considerably. Dependencies: 1) mm_take_all_locks() to register the mmu notifier when the whole VM isn't doing anything with "mm". This allows mmu notifier users to keep track if the VM is in the middle of the invalidate_range_begin/end critical section with an atomic counter incraese in range_begin and decreased in range_end. No secondary MMU page fault is allowed to map any spte or secondary tlb reference, while the VM is in the middle of range_begin/end as any page returned by get_user_pages in that critical section could later immediately be freed without any further ->invalidate_page notification (invalidate_range_begin/end works on ranges and ->invalidate_page isn't called immediately before freeing the page). To stop all page freeing and pagetable overwrites the mmap_sem must be taken in write mode and all other anon_vma/i_mmap locks must be taken too. 2) It'd be a waste to add branches in the VM if nobody could possibly run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only enabled if CONFIG_KVM=m/y. In the current kernel kvm won't yet take advantage of mmu notifiers, but this already allows to compile a KVM external module against a kernel with mmu notifiers enabled and from the next pull from kvm.git we'll start using them. And GRU/XPMEM will also be able to continue the development by enabling KVM=m in their config, until they submit all GRU/XPMEM GPLv2 code to the mainline kernel. Then they can also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM=n). This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM are all =n. The mmu_notifier_register call can fail because mm_take_all_locks may be interrupted by a signal and return -EINTR. Because mmu_notifier_reigster is used when a driver startup, a failure can be gracefully handled. Here an example of the change applied to kvm to register the mmu notifiers. Usually when a driver startups other allocations are required anyway and -ENOMEM failure paths exists already. struct kvm *kvm_arch_create_vm(void) { struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } mmu_notifier_unregister returns void and it's reliable. The patch also adds a few needed but missing includes that would prevent kernel to compile after these changes on non-x86 archs (x86 didn't need them by luck). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix mm/filemap_xip.c build] [akpm@linux-foundation.org: fix mm/mmu_notifier.c build] Signed-off-by: NAndrea Arcangeli <andrea@qumranet.com> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kanoj Sarcar <kanojsarcar@yahoo.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Chris Wright <chrisw@redhat.com> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Izik Eidus <izike@qumranet.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 7月, 2008 7 次提交
-
-
由 Sheng Yang 提交于
As well as move set base/mask ptes to vmx_init(). Signed-off-by: NSheng Yang <sheng.yang@intel.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Sheng Yang 提交于
Signed-off-by: NSheng Yang <sheng.yang@intel.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Marcelo Tosatti 提交于
If 'g' is one then limit is 4kb granular. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
When an event (such as an interrupt) is injected, and the stack is shadowed (and therefore write protected), the guest will exit. The current code will see that the stack is shadowed and emulate a few instructions, each time postponing the injection. Eventually the injection may succeed, but at that time the guest may be unwilling to accept the interrupt (for example, the TPR may have changed). This occurs every once in a while during a Windows 2008 boot. Fix by unshadowing the fault address if the fault was due to an event injection. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Marcelo Tosatti 提交于
There is no guarantee that the old TSS descriptor in the GDT contains the proper base address. This is the case for Windows installation's reboot-via-triplefault. Use guest registers instead. Also translate the address properly. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Marcelo Tosatti 提交于
The segment base is always a linear address, so translate before accessing guest memory. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Joerg Roedel 提交于
If NPT is enabled after loading both KVM modules on AMD and it should be disabled, both KVM modules must be reloaded. If only the architecture module is reloaded the behavior is undefined. With this patch it is possible to disable NPT only by reloading the kvm_amd module. Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
- 20 7月, 2008 10 次提交
-
-
由 Avi Kivity 提交于
The direct mapped shadow code (used for real mode and two dimensional paging) sets upper-level ptes using direct assignment rather than calling set_shadow_pte(). A nonpae host will split this into two writes, which opens up a race if another vcpu accesses the same memory area. Fix by calling set_shadow_pte() instead of assigning directly. Noticed by Izik Eidus. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Glauber Costa 提交于
If the guest issues a clflush in a mmio address, the instruction can trap into the hypervisor. Currently, we do not decode clflush properly, causing the guest to hang. This patch fixes this emulating clflush (opcode 0f ae). Signed-off-by: NGlauber Costa <gcosta@redhat.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Marcelo Tosatti 提交于
Harden kvm_mmu_zap_page() against invalid root pages that had been shadowed from memslots that are gone. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Marcelo Tosatti 提交于
Flush the shadow mmu before removing regions to avoid stale entries. Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Fixes compilation with CONFIG_VMI enabled. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Ben-Ami Yassour 提交于
Check that an injected pic irq is between 0 and 15. Signed-off-by: NBen-Ami Yassour <benami@il.ibm.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Mohammed Gamal 提交于
This patch fixes issue encountered with HLT instruction under FreeDOS's HIMEM XMS Driver. The HLT instruction jumped directly to the done label and skips updating the EIP value, therefore causing the guest to spin endlessly on the same instruction. The patch changes the instruction so that it writes back the updated EIP value. Signed-off-by: NMohammed Gamal <m.gamal005@gmail.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Avi Kivity 提交于
Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Sheng Yang 提交于
Fix a potention issue caused by kvm_mmu_slot_remove_write_access(). The old behavior don't sync EPT TLB with modified EPT entry, which result in inconsistent content of EPT TLB and EPT table. Signed-off-by: NSheng Yang <sheng.yang@intel.com> Signed-off-by: NAvi Kivity <avi@qumranet.com>
-
由 Marcelo Tosatti 提交于
kvm_mmu_zap_page() needs slots lock held (rmap_remove->gfn_to_memslot, for example). Since kvm_lock spinlock is held in mmu_shrink(), do a non-blocking down_read_trylock(). Untested. Signed-off-by: NAvi Kivity <avi@qumranet.com>
-