- 10 June 2009, 11 commits
-
-
Submitted by Avi Kivity
If a slot's guest physical address and host virtual address are unequal (mod large page size), then we would erroneously try to back guest large pages with host large pages. Detect this misalignment and disable large page support for the troubled slot. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
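For context, a minimal sketch of the misalignment test described here, assuming the 2009-era struct kvm_memory_slot fields (userspace_addr, base_gfn, lpage_info) and the KVM_PAGES_PER_HPAGE constant; field names are taken from that era's code, not from this exact patch:

```c
/* Hedged sketch: if the gfn and the hva are not congruent modulo the large
 * page size, mark every large page write-protected so the MMU never backs
 * it with a host huge page. */
unsigned long ugfn = new.userspace_addr >> PAGE_SHIFT;

if ((new.base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1)) {
	int i;

	for (i = 0; i < largepages; ++i)
		new.lpage_info[i].write_count = 1;	/* disable large pages */
}
```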
-
Submitted by Marcelo Tosatti
kvm_handle_hva relies on mmu_lock protection to safely access the memslot structures. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Marcelo Tosatti
kvm_assigned_dev_ack_irq is vulnerable to a race condition with the interrupt handler function. It does: if (dev->host_irq_disabled) { enable_irq(dev->host_irq); dev->host_irq_disabled = false; } If an interrupt triggers before the dev->host_irq_disabled assignment, it will disable the interrupt and set dev->host_irq_disabled to true. On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to false, and the next kvm_assigned_dev_ack_irq call will fail to reenable it. Other than that, having the interrupt handler and work handlers run in parallel sounds like asking for trouble (we could not spot any obvious problem, but better not to have to; it's fragile). CC: sheng.yang@intel.com Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
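A sketch of how such a race is typically closed: serialize the interrupt handler and the ack notifier with a per-device spinlock. The lock name and surrounding fields are assumptions for illustration, not necessarily the exact fix that was merged:

```c
/* In the ack notifier: test and re-enable under the same lock the
 * interrupt handler takes before disabling the line. */
spin_lock_irqsave(&dev->assigned_dev_lock, flags);	/* assumed lock name */
if (dev->host_irq_disabled) {
	enable_irq(dev->host_irq);
	dev->host_irq_disabled = false;
}
spin_unlock_irqrestore(&dev->assigned_dev_lock, flags);
```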
-
Submitted by Sheng Yang
Intel TXT (Trusted Execution Technology) requires VMX to be off on all CPUs for system shutdown to work. CC: Joseph Cihula <joseph.cihula@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Gleb Natapov
kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking whether the interrupt window is actually open. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Gleb Natapov
Currently timer events are processed before entering guest mode. Move this to the main vcpu event loop, since timer events should be processed even while the vcpu is halted. A timer may cause an interrupt/NMI to be injected, and only then will the vcpu be unhalted. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Gleb Natapov
free_mmu_pages() should only undo what alloc_mmu_pages() does. Free the mmu pages from the generic VM destruction function, kvm_destroy_vm(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Sheng Yang
After discussion with Marcelo, we decided to rework the device assignment framework together. The old problem is that the kernel logic was unnecessarily complex, so Marcelo suggested splitting it in a more elegant way: 1. Split host IRQ assignment from guest IRQ assignment, and let userspace determine the combination. Also discard the msi2intx parameter; userspace can specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to enable MSI-to-INTx conversion. 2. Split IRQ assignment from IRQ deassignment. Introduce two new ioctls: KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ. This patch also fixes the reversed _IOR vs _IOW in the definitions (by deprecating the old interface). [avi: replace homemade bitcount() by hweight_long()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
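A hedged userspace sketch of the split flow; the dev_id and vm_fd values are illustrative, while the flag and ioctl names are the ones this patch introduces:

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Route the device's host MSI to a guest INTx pin (the msi2intx replacement). */
struct kvm_assigned_irq irq = {
	.assigned_dev_id = dev_id,	/* id used when assigning the PCI device */
	.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX,
};

if (ioctl(vm_fd, KVM_ASSIGN_DEV_IRQ, &irq) < 0)
	perror("KVM_ASSIGN_DEV_IRQ");

/* Later, tear the routing down independently of device deassignment. */
if (ioctl(vm_fd, KVM_DEASSIGN_DEV_IRQ, &irq) < 0)
	perror("KVM_DEASSIGN_DEV_IRQ");
```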
-
Submitted by Sheng Yang
This patch finally enables MSI-X. What we need for MSI-X: 1. Intercept one page in the MMIO region of the device, so that we can get the guest's desired MSI-X table and set up the real one. This is now done by the guest and transferred to the kernel using the KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY ioctls. 2. Information for the incoming interrupt. One device can now have more than one interrupt, and they are all handled by one workqueue structure, so we need to identify them. The gsi_msg_pending_bitmap from the previous patch gets this done. 3. A mapping from host IRQ to guest GSI, as well as from guest GSI to the real MSI/MSI-X message address/data. We use the same entry number for the host and the guest here, so that it is easy to find the correlated guest GSI. What we lack for now: 1. The PCI spec says nothing can exist in the same MMIO page as the MSI-X table, except the pending bits. The patch ignores pending bits as a first step (so they are always 0 - no pending). 2. The PCI spec allows the MSI-X table to be changed dynamically. That means the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch does not support this, and Linux does not work this way either. 3. The patch does not implement MSI-X mask-all or mask-single-entry. I will implement the former in drivers/pci/msi.c later; for single entries, userspace should have the responsibility to handle it. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Sheng Yang
We have to handle more than one interrupt with one handler for MSI-X. Avi suggested using a flag to indicate a pending interrupt, so here it is. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Sheng Yang
Introduce two new ioctls, KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY. These two ioctls are used by userspace to specify the guest device's MSI-X entry number and to correlate each MSI-X entry with a GSI during the initialization stage. MSI-X should be fully initialized before being enabled. Changing the MSI-X entry number is not supported for now. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
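A sketch of the expected userspace call order, using the ioctl names as they appear in the commit message; the struct layouts and all values are assumptions for illustration:

```c
struct kvm_assigned_msix_nr nr = {
	.assigned_dev_id = dev_id,
	.entry_nr        = 3,			/* MSI-X vectors the guest enables */
};
ioctl(vm_fd, KVM_SET_MSIX_NR, &nr);		/* must precede MSI-X enable */

for (int i = 0; i < 3; i++) {
	struct kvm_assigned_msix_entry entry = {
		.assigned_dev_id = dev_id,
		.entry           = i,		/* index into the device's MSI-X table */
		.gsi             = guest_gsi[i],/* routing-table GSI for vector i */
	};
	ioctl(vm_fd, KVM_SET_MSIX_ENTRY, &entry);
}
```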
-
- 09 June 2009, 1 commit
-
-
Submitted by Avi Kivity
On one system, a crash during reboot was found when using kvm with MAXSMP:
Sending all processes the KILL signal... done
Please stand by while rebooting the system...
[ 1721.856538] md: stopping all md devices.
[ 1722.852139] kvm: exiting hardware virtualization
[ 1722.854601] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1722.872219] IP: [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
[ 1722.877955] PGD 0
[ 1722.880042] Oops: 0000 [#1] SMP
[ 1722.892548] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:2:0/0:2:0:0/vendor
[ 1722.900977] CPU 9
[ 1722.912606] Modules linked in:
[ 1722.914226] Pid: 0, comm: swapper Not tainted 2.6.30-rc7-tip-01843-g2305324-dirty #299 ...
[ 1722.932589] RIP: 0010:[<ffffffff8102c6b6>] [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
[ 1722.942709] RSP: 0018:ffffc900010b6ed8 EFLAGS: 00010046
[ 1722.956121] RAX: 0000000000000000 RBX: ffffc9000e253140 RCX: 0000000000000009
[ 1722.972202] RDX: 000000000000b020 RSI: ffffc900010c3220 RDI: ffffffffffffd790
[ 1722.977399] RBP: ffffc900010b6f08 R08: 0000000000000000 R09: 0000000000000000
[ 1722.995149] R10: 00000000000004b8 R11: 966912b6c78fddbd R12: 0000000000000009
[ 1723.011551] R13: 000000000000b020 R14: 0000000000000009 R15: 0000000000000000
[ 1723.019898] FS: 0000000000000000(0000) GS:ffffc900010b3000(0000) knlGS:0000000000000000
[ 1723.034389] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1723.041164] CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
[ 1723.056192] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1723.072546] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1723.080562] Process swapper (pid: 0, threadinfo ffff88107e464000, task ffff88047e5a2550)
[ 1723.096144] Stack:
[ 1723.099071] 0000000000000046 ffffc9000e253168 966912b6c78fddbd ffffc9000e253140
[ 1723.115471] ffff880c7d4304d0 ffffc9000e253168 ffffc900010b6f28 ffffffff81011022
[ 1723.132428] ffffc900010b6f48 966912b6c78fddbd ffffc900010b6f48 ffffffff8100b83b
[ 1723.141973] Call Trace:
[ 1723.142981] <IRQ> <0> [<ffffffff81011022>] kvm_arch_hardware_disable+0x26/0x3c
[ 1723.158153] [<ffffffff8100b83b>] hardware_disable+0x3f/0x55
[ 1723.172168] [<ffffffff810b95f6>] generic_smp_call_function_interrupt+0x76/0x13c
[ 1723.178836] [<ffffffff8104cbea>] smp_call_function_interrupt+0x3a/0x5e
[ 1723.194689] [<ffffffff81035bf3>] call_function_interrupt+0x13/0x20
[ 1723.199750] <EOI> <0> [<ffffffff814ad3b4>] ? acpi_idle_enter_c1+0xd3/0xf4
[ 1723.217508] [<ffffffff814ad3ae>] ? acpi_idle_enter_c1+0xcd/0xf4
[ 1723.232172] [<ffffffff814ad4bc>] ? acpi_idle_enter_bm+0xe7/0x2ce
[ 1723.235141] [<ffffffff81a8d93f>] ? __atomic_notifier_call_chain+0x0/0xac
[ 1723.253381] [<ffffffff818c3dff>] ? menu_select+0x58/0xd2
[ 1723.258179] [<ffffffff818c2c9d>] ? cpuidle_idle_call+0xa4/0xf3
[ 1723.272828] [<ffffffff81034085>] ? cpu_idle+0xb8/0x101
[ 1723.277085] [<ffffffff81a80163>] ? start_secondary+0x1bc/0x1d7
[ 1723.293708] Code: b0 00 00 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 48 8b 04 cd 30 ee 27 82 49 89 cc 49 89 d5 48 8b 04 10 48 8d b8 90 d7 ff ff <48> 8b 87 70 28 00 00 48 8d 98 90 d7 ff ff eb 16 e8 e9 fe ff ff
[ 1723.335524] RIP [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
[ 1723.342076] RSP <ffffc900010b6ed8>
[ 1723.352021] CR2: 0000000000000000
[ 1723.354348] ---[ end trace e2aec53dae150aa1 ]---
It turns out that we need to clear cpus_hardware_enabled in that case.
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 08 June 2009, 1 commit
-
-
Submitted by Avi Kivity
Under CONFIG_MAXSMP, cpus_hardware_enabled is allocated from the heap and not statically initialized. This causes a crash on reboot when kvm thinks vmx is enabled on random nonexistent cpus and accesses nonexistent percpu lists. Fix by explicitly clearing the variable. Cc: stable@kernel.org Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
-
- 22 April 2009, 2 commits
-
-
Submitted by Jan Kiszka
When checking for overlapping slots on registration of a new one, kvm currently also considers zero-length (i.e. deleted) slots and rejects requests incorrectly. This ultimately prevents user space from joining slots. Fix the check by skipping deleted slots, and advertise this via KVM_CAP_JOIN_MEMORY_REGIONS_WORKS. Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
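The fix amounts to one extra condition in the overlap scan; a sketch against the memslot array of the time:

```c
/* Skip the slot being updated and any deleted (zero-length) slot. */
for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
	struct kvm_memory_slot *s = &kvm->memslots[i];

	if (s == memslot || !s->npages)		/* !s->npages is the new check */
		continue;
	if (!((base_gfn + npages <= s->base_gfn) ||
	      (base_gfn >= s->base_gfn + s->npages)))
		return -EEXIST;			/* genuine overlap */
}
```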
-
Submitted by Avi Kivity
The large page initialization code concludes there are two large pages spanned by a slot covering 1 (small) page starting at gfn 1. This is incorrect, and also results in incorrect write_count initialization in some cases (base = 1, npages = 513 for example). Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
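The corrected count is the index of the last spanned large page minus the index of the first, plus one; a sketch:

```c
/* Number of host large pages touched by [base_gfn, base_gfn + npages). */
int largepages = (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE
	       - base_gfn / KVM_PAGES_PER_HPAGE + 1;
```

For the examples above this yields 1 large page for base = 1, npages = 1, and 2 for base = 1, npages = 513 (gfns 1..513 touch large-page indices 0 and 1, assuming KVM_PAGES_PER_HPAGE = 512).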
-
- 24 March 2009, 8 commits
-
-
Submitted by Sheng Yang
In the capability probing ioctl. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Weidong Han
Only assigned_dev_id needs to be set for deassignment; use match->flags to determine what to deassign. Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Joerg Roedel
The function kvm_is_mmio_pfn is called before put_page is called on a page by KVM. This is a problem when this function is called on some struct page which is part of a compound page: it does not test the reserved flag of the compound page but of the struct page within the compound page. This is a problem when KVM works with hugepages allocated at boot time. These pages have the reserved bit set in all tail pages; only the flag in the compound head is cleared. KVM would not put such a page, which results in a memory leak. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
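The usual fix is to test the flag on the compound head; a hedged sketch of kvm_is_mmio_pfn along those lines:

```c
static int kvm_is_mmio_pfn(pfn_t pfn)
{
	if (pfn_valid(pfn)) {
		/* Boot-time hugepages keep PG_reserved set on tail pages;
		 * only the head page reflects the real state. */
		struct page *head = compound_head(pfn_to_page(pfn));
		return PageReserved(head);
	}
	return 1;	/* invalid pfn: treat as MMIO */
}
```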
-
Submitted by Sheng Yang
Merge the MSI userspace interface with the IRQ routing table. Note the API has been changed; using the IRQ routing table is now the only interface kvm-userspace supports. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Avi Kivity
Currently KVM has a static routing from GSI numbers to interrupts (namely, 0-15 are mapped 1:1 to both the PIC and the IOAPIC, and 16-23 are mapped 1:1 to the IOAPIC). This is insufficient for several reasons: - HPET requires a non-1:1 mapping for the timer interrupt - MSIs need a new method to assign interrupt numbers and dispatch them - ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the ioapics This patch implements an interrupt routing table (as a linked list, but this can be easily changed) and a userspace interface to replace the table. The routing table is initialized according to the current hardwired mapping. Signed-off-by: Avi Kivity <avi@redhat.com>
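Roughly, each routing entry binds a GSI to a delivery callback; a sketch with approximate field names (the in-tree structure may differ):

```c
struct kvm_kernel_irq_routing_entry {
	u32 gsi;				/* guest-visible interrupt number */
	void (*set)(struct kvm_kernel_irq_routing_entry *e,
		    struct kvm *kvm, int level);/* PIC/IOAPIC/MSI delivery */
	union {
		struct { unsigned irqchip; unsigned pin; } irqchip;
	};
	struct list_head link;			/* chained on the per-VM table */
};

/* Dispatch: walk the table and fire every entry matching the GSI. */
list_for_each_entry(e, &kvm->irq_routing, link)
	if (e->gsi == gsi)
		e->set(e, kvm, level);
```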
-
Submitted by Avi Kivity
Allow clients to request notifications when the guest masks or unmasks a particular irq line. This complements irq ack notifications, as the guest will not ack an irq line that is masked. Currently implemented for the ioapic only. Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Sheng Yang
MSI is always enabled by default for msi2intx=1, but if msi2intx=0, we have to disable MSI if the guest requires us to do so. The patch also discards the unnecessary msi2intx check when the guest wants to update the MSI state. Note that KVM_DEV_IRQ_ASSIGN_MSI_ACTION is a mask which should cover all MSI-related operations, though we only have one for now. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Jan Kiszka
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic part, controlling the "main switch" and the single-step feature. The arch-specific part adds an x86 interface for intercepting both types of debug exceptions separately and re-injecting them when the host was not interested. Moreover, the foundation for guest debugging via debug registers is laid. To signal breakpoint events properly back to userland, an arch-specific data block is now returned along with KVM_EXIT_DEBUG. For x86, the arch block contains the PC, the debug exception, and relevant debug registers to tell debug events properly apart. The availability of this new interface is signaled by KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not-yet-supported archs are provided. Note that both SVM and VTX are supported, but only the latter has been tested so far. Based on the experience with all those VTX corner cases, I would be fairly surprised if SVM worked out of the box. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
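A minimal userspace sketch of the new interface; the fd variables are illustrative, and the flag and struct names are those defined by this patch:

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

struct kvm_guest_debug dbg = {
	.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP,
};

if (ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg) < 0)
	perror("KVM_SET_GUEST_DEBUG");

/* After the next KVM_RUN, a single-step trap returns KVM_EXIT_DEBUG, and the
 * x86 arch block in kvm_run->debug.arch carries the PC, the debug exception
 * number and the relevant debug registers. */
```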
-
- 15 February 2009, 5 commits
-
-
Submitted by Mark McLoughlin
kvm->slots_lock is outer to kvm->lock, so take slots_lock in kvm_vm_ioctl_assign_device() before taking kvm->lock, rather than taking it in kvm_iommu_map_memslots(). Cc: stable@kernel.org Signed-off-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
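In other words, the ioctl path now nests the locks in the documented order; a sketch (the inner helper name is hypothetical):

```c
down_read(&kvm->slots_lock);	/* outer: protects the memslot layout */
mutex_lock(&kvm->lock);		/* inner: protects assigned-device state */

r = assign_device_locked(kvm, assigned_dev);	/* hypothetical helper that
						 * also iommu-maps memslots */

mutex_unlock(&kvm->lock);
up_read(&kvm->slots_lock);
```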
-
Submitted by Sheng Yang
Missing brackets and wrong parameter for free_irq(). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Sheng Yang
In the past, kvm_get_kvm() and kvm_put_kvm() were called in the assigned device irq handler and interrupt_work, in order to prevent cancel_work_sync() in kvm_free_assigned_irq from hitting an illegal state while waiting for interrupt_work to finish. But it is tricky and still has two problems: 1. A bug ignored two conditions under which cancel_work_sync() would return true, resulting in an additional kvm_put_kvm(). 2. If the interrupt type is MSI, there is a window between cancel_work_sync() and free_irq() in which an interrupt can be injected again... This patch discards the reference counting used for the irq handler and interrupt_work, and ensures a legal state by moving the free function to the very beginning of kvm_destroy_vm(). The patch fixes the second bug by disabling the irq before cancel_work_sync(), which may result in a nested disable of the irq, but that is fine since we are going to free it. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Sheng Yang
kvm_arch_sync_events is introduced to quiesce all other events that may happen concurrently with the VM destroy process, such as the IRQ handler and the work struct for assigned devices. Since kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), the state of KVM there is still legal and provides an environment in which to quiesce the other events. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Marcelo Tosatti
The destructor for huge pages uses the backing inode for adjusting hugetlbfs accounting. Hugepage mappings are destroyed by exit_mmap, after mmu_notifier_release, so there are no notifications through unmap_hugepage_range at this point. The hugetlbfs inode can be freed while pages backed by it are still referenced by the shadow. When the shadow releases its reference, the huge page destructor will access a now freed inode. Implement the release operation for kvm mmu notifiers to release page refs before the hugetlbfs inode is gone. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
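A hedged sketch of such a release callback; the flush helper is the obvious candidate, but the exact body may differ from what was merged:

```c
static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
				     struct mm_struct *mm)
{
	struct kvm *kvm = container_of(mn, struct kvm, mmu_notifier);

	/* Drop all shadow mappings (and their page references) before
	 * exit_mmap() tears down the hugetlbfs-backed VMAs. */
	kvm_arch_flush_shadow(kvm);
}

static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
	.release = kvm_mmu_notifier_release,
	/* existing invalidate/clear_flush_young callbacks unchanged */
};
```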
-
- 03 January 2009, 4 commits
-
-
Submitted by Joerg Roedel
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
Submitted by Weidong Han
In kvm_iommu_unmap_memslots(), assigned_dev_head is already empty. Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
Submitted by Weidong Han
Support device deassignment; it can be used for device hotplug. Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
Submitted by Weidong Han
The Intel IOMMU APIs have been updated; use the new APIs. In addition, change kvm_iommu_map_guest() to just create the domain, and let kvm_iommu_assign_device() assign the device. Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
- 31 December 2008, 8 commits
-
-
Submitted by Mark McLoughlin
If an assigned device shares a guest irq with an emulated device, then we currently interpret an ack generated by the emulated device as originating from the assigned device, leading to e.g. "Unbalanced enable for IRQ 4347" from the enable_irq() in kvm_assigned_dev_ack_irq(). The fix is fairly simple - don't enable the physical device irq unless it was previously disabled. Of course, this can still lead to a situation where a non-assigned device ack can cause the physical device irq to be reenabled before the device was serviced. However, being level sensitive, the interrupt will merely be regenerated. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
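For reference, the flag being tested here is set on the other side, in the host interrupt handler; a hedged sketch of that half (type and field names assumed from the era's code):

```c
static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
{
	struct kvm_assigned_dev_kernel *dev = dev_id;	/* assumed type name */

	disable_irq_nosync(irq);
	dev->host_irq_disabled = true;	/* the ack notifier only re-enables
					 * the line when this is set */
	schedule_work(&dev->interrupt_work);
	return IRQ_HANDLED;
}
```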
-
Submitted by Avi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Avi Kivity
Userspace might need to act differently. Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Rusty Russell
This changes cpus_hardware_enabled from a cpumask_t to a cpumask_var_t: equivalent for CONFIG_CPUMASK_OFFSTACK=n, otherwise dynamically allocated. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Rusty Russell
We're getting rid of on-stack cpumasks for large NR_CPUS. 1) Use cpumask_var_t/alloc_cpumask_var. 2) smp_call_function_mask -> smp_call_function_many. 3) cpus_clear, cpus_empty, cpu_set -> cpumask_clear, cpumask_empty, cpumask_set_cpu. This actually generates slightly smaller code than the old one with CONFIG_CPUMASK_OFFSTACK=n (gcc knows that cpus cannot be NULL in that case, where cpumask_var_t is cpumask_t[1]). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@redhat.com>
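A sketch of what the conversion looks like at the definition and in the per-cpu enable path; these are fragments under the assumptions above, not a complete function:

```c
static cpumask_var_t cpus_hardware_enabled;	/* was: static cpumask_t */

/* module init: allocates when CONFIG_CPUMASK_OFFSTACK=y, no-op otherwise */
if (!alloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL))
	return -ENOMEM;

/* per-cpu enable path */
int cpu = raw_smp_processor_id();

if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
	return;
cpumask_set_cpu(cpu, cpus_hardware_enabled);
kvm_arch_hardware_enable(NULL);
```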
-
Submitted by Rusty Russell
Avi said: > Wow, code duplication from Rusty. Things must be bad. Something about glass houses comes to mind. But instead, a patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Christian Borntraeger
There is a race between a "close of the file descriptors" and module unload in the kvm module. You can easily trigger this problem by applying this debug patch:
>--- kvm.orig/virt/kvm/kvm_main.c
>+++ kvm/virt/kvm/kvm_main.c
>@@ -648,10 +648,14 @@ void kvm_free_physmem(struct kvm *kvm)
> kvm_free_physmem_slot(&kvm->memslots[i], NULL);
> }
>
>+#include <linux/delay.h>
> static void kvm_destroy_vm(struct kvm *kvm)
> {
> struct mm_struct *mm = kvm->mm;
>
>+ printk("off1\n");
>+ msleep(5000);
>+ printk("off2\n");
> spin_lock(&kvm_lock);
> list_del(&kvm->vm_list);
> spin_unlock(&kvm_lock);
and killing the userspace, followed by an rmmod. The problem is that kvm_destroy_vm can run while the module count is 0. That means you can remove the module while kvm_destroy_vm is running. But kvm_destroy_vm is part of the module text. This causes a kernel oops. The race exists without the msleep but is much harder to trigger. This patch requires the fix for anon_inodes (anon_inodes: use fops->owner for module refcount). With this patch, we can set the owner of all anonymous KVM inodes' file operations. The VFS will then control the KVM module refcount as long as there is an open file. kvm_destroy_vm will be called by the release function of the last closed file - before the VFS drops the module refcount. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
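The companion KVM change is essentially one assignment per fops; a hedged sketch of where the owners could be set during kvm_init() (signature and surrounding calls are assumptions from that era, elided where unknown):

```c
int kvm_init(void *opaque, unsigned vcpu_size, struct module *module)
{
	/* ... hardware setup elided ... */
	kvm_chardev_ops.owner = module;
	kvm_vm_fops.owner     = module;
	kvm_vcpu_fops.owner   = module;
	/* The VFS now pins the module for every open /dev/kvm, VM and vcpu
	 * fd, so kvm_destroy_vm() cannot run after rmmod. */
	return misc_register(&kvm_dev);
}
```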
-
Submitted by Glauber Costa
Right now, KVM does not remove a slot when we do a register ioctl for size 0 (which would be the expected behaviour). Instead, we only mark it as empty but keep all bitmaps and allocated data structures present. This completely nullifies our chances of reusing that same slot again for mapping a different piece of memory. In this patch, we destroy the rmaps and vfree() the pointers that used to hold the dirty bitmap, rmap and lpage_info structures. Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
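So slot deletion now really frees the per-slot metadata; a sketch assuming the 2009-era struct kvm_memory_slot field names:

```c
/* When the register ioctl shrinks a slot to zero pages, free its metadata
 * instead of merely marking the slot empty. */
if (!npages) {
	vfree(free->rmap);
	vfree(free->dirty_bitmap);
	vfree(free->lpage_info);
	free->rmap = NULL;
	free->dirty_bitmap = NULL;
	free->lpage_info = NULL;
}
```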
-