1. 10 Jun 2009: 11 commits
    • KVM: Disable large pages on misaligned memory slots · ac04527f
      Committed by Avi Kivity
      If a slot's guest physical address and host virtual address are unequal
      (mod large page size), then we would erroneously try to back guest large
      pages with host large pages. Detect this misalignment and disable large
      page support for the troublesome slot.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
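      The misalignment test reduces to comparing the low-order bits of the two
      base addresses. The sketch below is a userspace model, not the patch: the
      helper name slot_allows_large_pages() is hypothetical, and the 2 MiB large
      page size is an assumption (the common x86-64 value). The kernel works on
      its own memslot fields rather than raw addresses like these.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed large page size: 2 MiB, as on x86-64. */
#define LARGE_PAGE_SIZE (2ULL << 20)

/*
 * Hypothetical helper: guest large pages can be backed by host large pages
 * only if the slot's guest physical base and host virtual base are congruent
 * modulo the large page size.
 */
static bool slot_allows_large_pages(uint64_t guest_phys, uint64_t host_virt)
{
    uint64_t mask = LARGE_PAGE_SIZE - 1;

    /* XOR keeps exactly the bits in which the two bases differ. */
    return ((guest_phys ^ host_virt) & mask) == 0;
}
```

      With these assumptions, a slot at guest physical 0x40000000 backed by host
      virtual 0x7f0000001000 fails the check, because the bases differ in bit 12.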
    • KVM: take mmu_lock when updating a deleted slot · b43b1901
      Committed by Marcelo Tosatti
      kvm_handle_hva relies on mmu_lock protection to safely access
      the memslot structures.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: protect assigned dev workqueue, int handler and irq acker · 547de29e
      Committed by Marcelo Tosatti
      kvm_assigned_dev_ack_irq is vulnerable to a race condition with the
      interrupt handler function. It does:
      
              if (dev->host_irq_disabled) {
                      enable_irq(dev->host_irq);
                      dev->host_irq_disabled = false;
              }
      
      If an interrupt triggers before the dev->host_irq_disabled assignment,
      it will disable the interrupt and set dev->host_irq_disabled to true.
      
      On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to
      false, and the next kvm_assigned_dev_ack_irq call will fail to re-enable
      it.
      
      Other than that, having the interrupt handler and work handlers run in
      parallel sounds like asking for trouble (I could not spot any obvious
      problem, but it is better not to have to look; it is fragile).
      
      CC: sheng.yang@intel.com
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
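      The lost update can be reproduced deterministically with a single-threaded
      model. Everything here is hypothetical: struct model_dev, raise_irq() and
      the two ack functions only imitate enable_irq(), the host interrupt handler
      and a spin_lock_irqsave()-style critical section on one CPU. It illustrates
      why serializing the ack path against the handler closes the window; it is
      not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one assigned device; not the kernel structure. */
struct model_dev {
    bool line_enabled;      /* models the host line's enable/disable state */
    bool host_irq_disabled; /* the bookkeeping flag from the commit message */
    bool irqs_masked;       /* models local IRQs off inside a locked section */
    bool irq_pending;       /* an interrupt that arrived while masked */
};

/* Host interrupt handler: disable the line and record that fact. */
static void host_irq_handler(struct model_dev *d)
{
    d->line_enabled = false;
    d->host_irq_disabled = true;
}

/* Deliver an interrupt now, or defer it while interrupts are masked. */
static void raise_irq(struct model_dev *d)
{
    if (d->irqs_masked)
        d->irq_pending = true;
    else
        host_irq_handler(d);
}

/* Unprotected ack: the handler may run between enabling and clearing the flag. */
static void ack_irq_unlocked(struct model_dev *d, bool irq_in_window)
{
    if (d->host_irq_disabled) {
        d->line_enabled = true;          /* enable_irq() */
        if (irq_in_window)
            raise_irq(d);                /* handler runs immediately */
        d->host_irq_disabled = false;    /* overwrites the handler's update */
    }
}

/* Serialized ack: check, enable and clear all happen with interrupts masked. */
static void ack_irq_locked(struct model_dev *d, bool irq_in_window)
{
    d->irqs_masked = true;               /* lock taken, local IRQs off */
    if (d->host_irq_disabled) {
        d->line_enabled = true;
        if (irq_in_window)
            raise_irq(d);                /* recorded as pending, not run yet */
        d->host_irq_disabled = false;
    }
    d->irqs_masked = false;              /* lock released, local IRQs on */
    if (d->irq_pending) {
        d->irq_pending = false;
        host_irq_handler(d);             /* runs after the critical section */
    }
}
```

      In the unprotected run the line ends up disabled while host_irq_disabled
      reads false, so no later ack will re-enable it. In the serialized run the
      handler executes after the critical section, and the flag and the line
      agree again.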
    • KVM: VMX: Disable VMX when system shutdown · 8e1c1815
      Committed by Sheng Yang
      Intel TXT (Trusted Execution Technology) requires VMX to be turned off on
      all CPUs in order to work when the system shuts down.
      
      CC: Joseph Cihula <joseph.cihula@intel.com>
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Fix interrupt unhalting a vcpu when it shouldn't · 78646121
      Committed by Gleb Natapov
      kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking
      whether the interrupt window is actually open.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Timer event should not unconditionally unhalt vcpu. · 09cec754
      Committed by Gleb Natapov
      Currently timer events are processed before entering guest mode. Move them
      to the main vcpu event loop, since timer events should be processed even
      while the vcpu is halted. A timer may cause an interrupt/NMI to be
      injected, and only then will the vcpu be unhalted.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
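      The behaviour described in 78646121 and 09cec754 can be modeled as one
      step of a halt loop: timer events are processed even while halted, they
      may queue an interrupt, and the vcpu is unhalted only if that interrupt can
      actually be injected. struct model_vcpu and its fields are hypothetical,
      the interrupt window is reduced to a single boolean, and NMIs (which
      ignore the window) are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vcpu model for illustration only. */
struct model_vcpu {
    bool halted;
    bool timer_expired;    /* a pending timer event */
    bool irq_pending;      /* an interrupt queued for injection */
    bool irq_window_open;  /* simplified: guest can accept an interrupt now */
};

/* Timer events are handled even while halted; they may queue an interrupt. */
static void process_timer_events(struct model_vcpu *v)
{
    if (v->timer_expired) {
        v->timer_expired = false;
        v->irq_pending = true;
    }
}

/* A pending interrupt only makes the vcpu runnable if it can be injected. */
static bool vcpu_runnable(const struct model_vcpu *v)
{
    return v->irq_pending && v->irq_window_open;
}

/* One iteration of a simplified halt loop. */
static void halt_loop_iteration(struct model_vcpu *v)
{
    process_timer_events(v);
    if (v->halted && vcpu_runnable(v))
        v->halted = false;
}
```

      With the window closed, the timer still queues its interrupt but the vcpu
      stays halted; once the window opens, the next iteration unhalts it.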
    • KVM: MMU: do not free active mmu pages in free_mmu_pages() · f00be0ca
      Committed by Gleb Natapov
      free_mmu_pages() should only undo what alloc_mmu_pages() does.
      Free mmu pages from the generic VM destruction function, kvm_destroy_vm().
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Device assignment framework rework · e56d532f
      Committed by Sheng Yang
      After discussion with Marcelo, we decided to rework the device assignment
      framework together. The old problem was that the kernel logic was
      unnecessarily complex, so Marcelo suggested splitting it up more elegantly:
      
      1. Split host IRQ assignment from guest IRQ assignment, and let userspace
      determine the combination. Also discard the msi2intx parameter; userspace
      can specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in
      assigned_irq->flags to enable MSI-to-INTx conversion.
      
      2. Split IRQ assignment from IRQ deassignment. Introduce two new ioctls:
      KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.
      
      This patch also fixes the reversed _IOR vs _IOW in the definition (by
      deprecating the old interface).
      
      [avi: replace homemade bitcount() by hweight_long()]
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
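      With host and guest IRQ types carried in separate fields of one flags
      word, validating a request is a population count per field, which is what
      the hweight_long() note refers to. The bit positions and masks below are
      assumptions recalled from the kvm.h of that period and are defined locally
      under a DEV_IRQ_ prefix; __builtin_popcount() (a GCC/Clang builtin) stands
      in for hweight_long(). The rule "at most one type per side, at least one
      side" is likewise an assumed simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: host IRQ type in the low byte, guest type in the next byte. */
#define DEV_IRQ_HOST_INTX  (1u << 0)
#define DEV_IRQ_HOST_MSI   (1u << 1)
#define DEV_IRQ_HOST_MSIX  (1u << 2)
#define DEV_IRQ_GUEST_INTX (1u << 8)
#define DEV_IRQ_GUEST_MSI  (1u << 9)
#define DEV_IRQ_GUEST_MSIX (1u << 10)
#define DEV_IRQ_HOST_MASK  0x00ffu
#define DEV_IRQ_GUEST_MASK 0xff00u

/* Accept at most one host type and one guest type, and at least one of the two. */
static bool assigned_irq_flags_valid(uint32_t flags)
{
    uint32_t host = flags & DEV_IRQ_HOST_MASK;
    uint32_t guest = flags & DEV_IRQ_GUEST_MASK;

    /* __builtin_popcount stands in for the kernel's hweight_long(). */
    if (__builtin_popcount(host) > 1 || __builtin_popcount(guest) > 1)
        return false;
    return (host | guest) != 0;
}

/* MSI-to-INTx conversion is now just one particular host/guest combination. */
static bool is_msi_to_intx(uint32_t flags)
{
    return (flags & DEV_IRQ_HOST_MASK) == DEV_IRQ_HOST_MSI &&
           (flags & DEV_IRQ_GUEST_MASK) == DEV_IRQ_GUEST_INTX;
}
```

      Under these assumptions, HOST_MSI | GUEST_INTX is accepted and recognized
      as MSI-to-INTx, while HOST_MSI | HOST_INTX is rejected for naming two host
      types.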
    • KVM: Enable MSI-X for KVM assigned device · d510d6cc
      Committed by Sheng Yang
      This patch finally enables MSI-X.
      
      What we need for MSI-X:
      1. Intercept one page in MMIO region of device. So that we can get guest desired
      MSI-X table and set up the real one. Now this have been done by guest, and
      transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
      
      2. Information about incoming interrupts. A device can now have more than
      one interrupt, and they are all handled by one workqueue structure, so we
      need to identify them. The previous patch's gsi_msg_pending_bitmap does this.
      
      3. Mapping from host IRQ to guest gsi, as well as from guest gsi to the real
      MSI/MSI-X message address/data. We use the same entry number for host and
      guest here, so that it is easy to find the corresponding guest gsi.
      
      What we lack for now:
      1. The PCI spec says nothing else may reside in the same MMIO page as the
      MSI-X table, except the pending bits. This patch ignores the pending bits
      as a first step (so they always read 0, meaning nothing is pending).
      
      2. The PCI spec allows the MSI-X table to be changed dynamically. That
      means the OS can enable MSI-X, then mask one MSI-X entry, modify it, and
      unmask it. This patch doesn't support that, and Linux doesn't work this
      way either.
      
      3. This patch doesn't implement MSI-X mask-all or single-entry masking. I
      will implement the former in drivers/pci/msi.c later; handling single-entry
      masking should be userspace's responsibility.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Add MSI-X interrupt injection logic · 2350bd1f
      Committed by Sheng Yang
      We have to handle more than one interrupt with one handler for MSI-X. Avi
      suggested using a flag to indicate pending interrupts, so here it is.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
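      The pending-flag idea can be sketched as a bitmap with one bit per MSI-X
      entry: the interrupt handler only sets a bit, and the work handler later
      injects and clears every set bit. This is a single-threaded model with a
      hypothetical struct model_msix_dev; code where handler and work run
      concurrently, as in the kernel, would need atomic bit operations instead of
      the plain |= and &= used here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_ENTRIES (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-device state: one pending bit per MSI-X entry. */
struct model_msix_dev {
    unsigned long pending;                /* bit i set: entry i fired, not injected */
    unsigned int injected[MAX_ENTRIES];   /* per-entry injection count, for the demo */
};

/* Interrupt handler side: only record which entry fired. */
static void mark_pending(struct model_msix_dev *d, unsigned int entry)
{
    d->pending |= 1UL << entry;
}

/* Work handler side: inject every pending entry, then clear its bit. */
static void inject_pending(struct model_msix_dev *d)
{
    for (unsigned int i = 0; i < MAX_ENTRIES; i++) {
        if (d->pending & (1UL << i)) {
            d->pending &= ~(1UL << i);
            d->injected[i]++;   /* stands in for injecting the entry's guest GSI */
        }
    }
}
```

      Two firings of the same entry before the work handler runs collapse into a
      single injection, which is inherent to a pending flag.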
    • KVM: Ioctls for init MSI-X entry · c1e01514
      Committed by Sheng Yang
      Introduce two ioctls, KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
      
      These two ioctls are used by userspace to specify the number of MSI-X
      entries of a guest device and to correlate each MSI-X entry with a GSI
      during the initialization stage.
      
      MSI-X should be fully initialized before it is enabled.
      
      Changing the number of MSI-X entries is not supported for now.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
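      The initialization rules above (set the entry count first, then bind each
      entry to a GSI, refuse to change the count, enable only when complete) can
      be sketched as a small table. The names set_msix_nr(), set_msix_entry(),
      msix_entry_to_gsi() and msix_ready(), and the MAX_MSIX bound, are
      hypothetical. Indexing directly by entry number reflects the "same entry
      number for host and guest" choice from the MSI-X commit; the kernel's own
      storage may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_MSIX 64   /* hypothetical upper bound on entries */

/* Hypothetical per-device table: MSI-X entry number -> guest GSI. */
struct model_msix_table {
    unsigned int nr;          /* set once, like KVM_SET_MSIX_NR */
    uint32_t gsi[MAX_MSIX];
    bool valid[MAX_MSIX];
};

static int set_msix_nr(struct model_msix_table *t, unsigned int nr)
{
    if (nr == 0 || nr > MAX_MSIX)
        return -EINVAL;
    if (t->nr != 0 && t->nr != nr)
        return -EINVAL;       /* changing the entry count is not supported */
    t->nr = nr;
    return 0;
}

static int set_msix_entry(struct model_msix_table *t, unsigned int entry, uint32_t gsi)
{
    if (t->nr == 0 || entry >= t->nr)
        return -EINVAL;       /* the count must be set first */
    t->gsi[entry] = gsi;
    t->valid[entry] = true;
    return 0;
}

/* Host and guest share entry numbers, so the lookup is a direct index. */
static int msix_entry_to_gsi(const struct model_msix_table *t, unsigned int entry,
                             uint32_t *gsi)
{
    if (entry >= t->nr || !t->valid[entry])
        return -ENOENT;
    *gsi = t->gsi[entry];
    return 0;
}

/* Enabling is allowed only once every entry has been bound to a GSI. */
static bool msix_ready(const struct model_msix_table *t)
{
    if (t->nr == 0)
        return false;
    for (unsigned int i = 0; i < t->nr; i++)
        if (!t->valid[i])
            return false;
    return true;
}
```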
  2. 09 Jun 2009: 1 commit
    • kvm: fix kvm reboot crash when MAXSMP is used · 8437a617
      Committed by Avi Kivity
      One system was found to crash during reboot when KVM and MAXSMP were used:
      Sending all processes the KILL signal...                              done
      Please stand by while rebooting the system...
      [ 1721.856538] md: stopping all md devices.
      [ 1722.852139] kvm: exiting hardware virtualization
      [ 1722.854601] BUG: unable to handle kernel NULL pointer dereference at (null)
      [ 1722.872219] IP: [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
      [ 1722.877955] PGD 0
      [ 1722.880042] Oops: 0000 [#1] SMP
      [ 1722.892548] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:2:0/0:2:0:0/vendor
      [ 1722.900977] CPU 9
      [ 1722.912606] Modules linked in:
      [ 1722.914226] Pid: 0, comm: swapper Not tainted 2.6.30-rc7-tip-01843-g2305324-dirty #299 ...
      [ 1722.932589] RIP: 0010:[<ffffffff8102c6b6>]  [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
      [ 1722.942709] RSP: 0018:ffffc900010b6ed8  EFLAGS: 00010046
      [ 1722.956121] RAX: 0000000000000000 RBX: ffffc9000e253140 RCX: 0000000000000009
      [ 1722.972202] RDX: 000000000000b020 RSI: ffffc900010c3220 RDI: ffffffffffffd790
      [ 1722.977399] RBP: ffffc900010b6f08 R08: 0000000000000000 R09: 0000000000000000
      [ 1722.995149] R10: 00000000000004b8 R11: 966912b6c78fddbd R12: 0000000000000009
      [ 1723.011551] R13: 000000000000b020 R14: 0000000000000009 R15: 0000000000000000
      [ 1723.019898] FS:  0000000000000000(0000) GS:ffffc900010b3000(0000) knlGS:0000000000000000
      [ 1723.034389] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      [ 1723.041164] CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
      [ 1723.056192] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1723.072546] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 1723.080562] Process swapper (pid: 0, threadinfo ffff88107e464000, task ffff88047e5a2550)
      [ 1723.096144] Stack:
      [ 1723.099071]  0000000000000046 ffffc9000e253168 966912b6c78fddbd ffffc9000e253140
      [ 1723.115471]  ffff880c7d4304d0 ffffc9000e253168 ffffc900010b6f28 ffffffff81011022
      [ 1723.132428]  ffffc900010b6f48 966912b6c78fddbd ffffc900010b6f48 ffffffff8100b83b
      [ 1723.141973] Call Trace:
      [ 1723.142981]  <IRQ> <0> [<ffffffff81011022>] kvm_arch_hardware_disable+0x26/0x3c
      [ 1723.158153]  [<ffffffff8100b83b>] hardware_disable+0x3f/0x55
      [ 1723.172168]  [<ffffffff810b95f6>] generic_smp_call_function_interrupt+0x76/0x13c
      [ 1723.178836]  [<ffffffff8104cbea>] smp_call_function_interrupt+0x3a/0x5e
      [ 1723.194689]  [<ffffffff81035bf3>] call_function_interrupt+0x13/0x20
      [ 1723.199750]  <EOI> <0> [<ffffffff814ad3b4>] ? acpi_idle_enter_c1+0xd3/0xf4
      [ 1723.217508]  [<ffffffff814ad3ae>] ? acpi_idle_enter_c1+0xcd/0xf4
      [ 1723.232172]  [<ffffffff814ad4bc>] ? acpi_idle_enter_bm+0xe7/0x2ce
      [ 1723.235141]  [<ffffffff81a8d93f>] ? __atomic_notifier_call_chain+0x0/0xac
      [ 1723.253381]  [<ffffffff818c3dff>] ? menu_select+0x58/0xd2
      [ 1723.258179]  [<ffffffff818c2c9d>] ? cpuidle_idle_call+0xa4/0xf3
      [ 1723.272828]  [<ffffffff81034085>] ? cpu_idle+0xb8/0x101
      [ 1723.277085]  [<ffffffff81a80163>] ? start_secondary+0x1bc/0x1d7
      [ 1723.293708] Code: b0 00 00 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 48 8b 04 cd 30 ee 27 82 49 89 cc 49 89 d5 48 8b 04 10 48 8d b8 90 d7 ff ff <48> 8b 87 70 28 00 00 48 8d 98 90 d7 ff ff eb 16 e8 e9 fe ff ff
      [ 1723.335524] RIP  [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
      [ 1723.342076]  RSP <ffffc900010b6ed8>
      [ 1723.352021] CR2: 0000000000000000
      [ 1723.354348] ---[ end trace e2aec53dae150aa1 ]---
      
      It turns out that we need to clear cpus_hardware_enabled in that case.
      Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
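      A per-CPU "hardware enabled" mask works as a guard: tear-down runs only
      for CPUs still marked enabled, and clearing the bit makes a repeated
      disable (for example from a reboot path) a no-op. The sketch models that
      idea with a plain bitmask; hw_enabled_mask, disable_calls and the two model
      functions are hypothetical stand-ins for cpus_hardware_enabled and the
      kernel's enable/disable paths, and the exact shape of the actual fix may
      differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpus_hardware_enabled: one bit per CPU. */
static unsigned long hw_enabled_mask;
static int disable_calls;   /* counts real tear-down work, for the demo */

static void model_hardware_enable(int cpu)
{
    if (hw_enabled_mask & (1UL << cpu))
        return;             /* already enabled on this CPU */
    hw_enabled_mask |= 1UL << cpu;
}

/*
 * Tear down only if this CPU is still marked enabled, and clear its bit so
 * a second call becomes a no-op instead of touching freed state.
 */
static void model_hardware_disable(int cpu)
{
    if (!(hw_enabled_mask & (1UL << cpu)))
        return;
    hw_enabled_mask &= ~(1UL << cpu);
    disable_calls++;        /* stands in for kvm_arch_hardware_disable() */
}
```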
  3. 08 Jun 2009: 1 commit
  4. 22 Apr 2009: 2 commits
  5. 24 Mar 2009: 8 commits
    • KVM: Get support IRQ routing entry counts · 36463146
      Committed by Sheng Yang
      Report the supported count through the capability probing ioctl.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: fix kvm_vm_ioctl_deassign_device · 4a906e49
      Committed by Weidong Han
      Only assigned_dev_id needs to be set for deassignment; use match->flags
      to decide how to deassign the device.
      Acked-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Weidong Han <weidong.han@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: handle compound pages in kvm_is_mmio_pfn · fc5659c8
      Committed by Joerg Roedel
      The function kvm_is_mmio_pfn is called before put_page is called on a
      page by KVM. This is a problem when this function is called on a
      struct page that is part of a compound page: it does not test the
      reserved flag of the compound page but that of the struct page within the
      compound page. This matters when KVM works with hugepages allocated
      at boot time. These pages have the reserved bit set in all tail pages;
      only the flag in the compound head is cleared. KVM would not put such a
      page, which results in a memory leak.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
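      The difference between testing a tail page and testing its compound head
      can be shown with a much-simplified page descriptor. struct model_page and
      the helper functions are hypothetical; in the kernel the corresponding
      ideas are compound_head() and the reserved page flag, and this sketch does
      not claim to reproduce the patch line by line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified page descriptor. */
struct model_page {
    bool reserved;
    struct model_page *head;   /* NULL for a head page or a non-compound page */
};

static struct model_page *model_compound_head(struct model_page *p)
{
    return p->head ? p->head : p;
}

/* Old behaviour: looks at the tail page's own (stale) reserved flag. */
static bool is_mmio_tail_flag(struct model_page *p)
{
    return p->reserved;
}

/* Fixed behaviour: looks at the reserved flag of the compound head. */
static bool is_mmio_head_flag(struct model_page *p)
{
    return model_compound_head(p)->reserved;
}
```

      For a boot-time hugepage whose head has the reserved flag cleared but whose
      tails still have it set, the old check misclassifies the tail as MMIO (so
      the page is never put), while the head-based check does not.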
    • KVM: Use irq routing API for MSI · 79950e10
      Committed by Sheng Yang
      Merge the MSI userspace interface with the IRQ routing table. Note that the
      API has changed, and the IRQ routing table will be the only interface
      kvm-userspace supports.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Userspace controlled irq routing · 399ec807
      Committed by Avi Kivity
      Currently KVM has a static routing from GSI numbers to interrupts (namely,
      0-15 are mapped 1:1 to both PIC and IOAPIC, and 16-23 are mapped 1:1 to
      the IOAPIC).  This is insufficient for several reasons:
      
      - HPET requires a non-1:1 mapping for the timer interrupt
      - MSIs need a new method to assign interrupt numbers and dispatch them
      - ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the
        ioapics
      
      This patch implements an interrupt routing table (as a linked list, but this
      can be easily changed) and a userspace interface to replace the table.  The
      routing table is initialized according to the current hardwired mapping.
      Signed-off-by: Avi Kivity <avi@redhat.com>
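      A routing table of this kind can be sketched as a linked list of
      (GSI, irqchip, pin) entries, where one GSI may fan out to several pins and
      the default table reproduces the hardwired mapping described above.
      struct route, enum irqchip and the helper functions are hypothetical and
      not KVM's actual types. Replacing the table from userspace would amount to
      building a new list and swapping the head pointer; only the default table
      is shown.

```c
#include <assert.h>
#include <stdlib.h>

enum irqchip { CHIP_PIC, CHIP_IOAPIC };

/* Hypothetical routing entry: one GSI routed to one chip pin. */
struct route {
    unsigned int gsi;
    enum irqchip chip;
    unsigned int pin;
    struct route *next;
};

static struct route *add_route(struct route *head, unsigned int gsi,
                               enum irqchip chip, unsigned int pin)
{
    struct route *r = malloc(sizeof(*r));
    if (!r)
        abort();
    r->gsi = gsi;
    r->chip = chip;
    r->pin = pin;
    r->next = head;
    return r;
}

/* Default table: GSI 0-15 to both PIC and IOAPIC, GSI 16-23 to IOAPIC only. */
static struct route *default_routes(void)
{
    struct route *head = NULL;
    for (unsigned int gsi = 0; gsi < 24; gsi++) {
        if (gsi < 16)
            head = add_route(head, gsi, CHIP_PIC, gsi);
        head = add_route(head, gsi, CHIP_IOAPIC, gsi);
    }
    return head;
}

/* Count the chip pins a GSI is delivered to. */
static int count_targets(const struct route *head, unsigned int gsi)
{
    int n = 0;
    for (const struct route *r = head; r; r = r->next)
        if (r->gsi == gsi)
            n++;
    return n;
}

static void free_routes(struct route *head)
{
    while (head) {
        struct route *next = head->next;
        free(head);
        head = next;
    }
}
```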
    • KVM: Interrupt mask notifiers for ioapic · 75858a84
      Committed by Avi Kivity
      Allow clients to request notifications when the guest masks or unmasks a
      particular irq line.  This complements irq ack notifications, as the guest
      will not ack an irq line that is masked.
      
      Currently implemented for the ioapic only.
      Signed-off-by: Avi Kivity <avi@redhat.com>
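      A mask notifier can be modeled as a list of callbacks keyed by irq line
      that the (modeled) ioapic invokes whenever the guest changes that line's
      mask bit. struct mask_notifier, register_mask_notifier() and
      fire_mask_notifiers() are hypothetical names for this sketch; the
      kernel's own interface may be named and shaped differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical notifier: called when the guest masks or unmasks its irq. */
struct mask_notifier {
    int irq;
    void (*func)(struct mask_notifier *n, bool masked);
    struct mask_notifier *next;
};

static struct mask_notifier *notifiers;

static void register_mask_notifier(struct mask_notifier *n, int irq)
{
    n->irq = irq;
    n->next = notifiers;
    notifiers = n;
}

/* Called by the modeled ioapic when a line's mask bit changes. */
static void fire_mask_notifiers(int irq, bool masked)
{
    for (struct mask_notifier *n = notifiers; n; n = n->next)
        if (n->irq == irq)
            n->func(n, masked);
}

/* Example client that remembers the last mask state it was told about. */
static bool last_masked;
static int calls;

static void record_mask(struct mask_notifier *n, bool masked)
{
    (void)n;
    last_masked = masked;
    calls++;
}
```

      A client such as device assignment could use the masked notification to
      stop expecting an ack on that line, since a masked line will not be acked.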
    • KVM: Add support to disable MSI for assigned device · 17071fe7
      Committed by Sheng Yang
      MSI is always enabled by default for msi2intx=1. But if msi2intx=0, we
      have to disable MSI if the guest requires it.
      
      The patch also discards an unnecessary msi2intx check when the guest wants
      to update the MSI state.
      
      Note that KVM_DEV_IRQ_ASSIGN_MSI_ACTION is a mask which should cover all
      MSI-related operations, though we only have one for now.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: New guest debug interface · d0bfb940
      Committed by Jan Kiszka
      This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
      instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
      part, controlling the "main switch" and the single-step feature. The
      arch specific part adds an x86 interface for intercepting both types of
      debug exceptions separately and re-injecting them when the host was not
      interested. Moreover, the foundation for guest debugging via debug
      registers is laid.
      
      To signal breakpoint events properly back to userland, an arch-specific
      data block is now returned along with KVM_EXIT_DEBUG. For x86, the arch block
      contains the PC, the debug exception, and relevant debug registers to
      tell debug events properly apart.
      
      The availability of this new interface is signaled by
      KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
      provided.
      
      Note that both SVM and VTX are supported, but only the latter has been
      tested so far. Based on the experience with all those VTX corner cases, I
      would be fairly surprised if SVM worked out of the box.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  6. 15 Feb 2009: 5 commits
  7. 03 Jan 2009: 4 commits
  8. 31 Dec 2008: 8 commits