1. 25 Apr, 2010 2 commits
  2. 01 Mar, 2010 6 commits
  3. 08 Dec, 2009 1 commit
  4. 03 Dec, 2009 7 commits
    • KVM: s390: Make psw available on all exits, not just a subset · d7b0b5eb
      Carsten Otte committed
      This patch moves the s390 processor status word into the base kvm_run
      struct and keeps it up to date on all userspace exits.
      
      The userspace ABI is broken by this, however there are no applications
      in the wild using this.  A capability check is provided so users can
      verify the updated API exists.
      
      Cc: stable@kernel.org
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: Add KVM_GET/SET_VCPU_EVENTS · 3cfc3092
      Jan Kiszka committed
      This new IOCTL exports all states related to exceptions, interrupts,
      and NMIs that were so far invisible to user space. Together with
      appropriate user space changes, this fixes sporadic problems of
      vmsave/restore, live migration and system reset.
      
      [avi: future-proof abi by adding a flags field]
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: VMX: Report unexpected simultaneous exceptions as internal errors · 65ac7264
      Avi Kivity committed
      These happen when we trap an exception when another exception is being
      delivered; we only expect these with MCEs and page faults.  If something
      unexpected happens, things probably went south and we're better off reporting
      an internal error and freezing.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Allow internal errors reported to userspace to carry extra data · a9c7399d
      Avi Kivity committed
      Usually userspace will freeze the guest so we can inspect it, but some
      internal state is not available.  Add extra data to internal error
      reporting so we can expose it to the debugger.  Extra data is specific
      to the suberror.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Reorder IOCTLs in main kvm.h · c54d2aba
      Jan Kiszka committed
      Obviously, people tend to extend this header at the bottom - more or
      less blindly. Ensure that deprecated stuff gets its own corner again by
      moving things to the top. Also add some comments and reindent IOCTLs to
      make them more readable and reduce the risk of number collisions.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: allow userspace to adjust kvmclock offset · afbcf7ab
      Glauber Costa committed
      When we migrate a kvm guest that uses pvclock between two hosts, we may
      suffer a large skew. This is because there can be significant differences
      between the monotonic clock of the hosts involved. When a new host with
      a much larger monotonic time starts running the guest, the view of time
      will be significantly impacted.
      
      The situation is much worse when we do the opposite, and migrate to a host
      with a smaller monotonic clock.
      
      This proposed ioctl allows userspace to tell us the monotonic clock value
      of the source host, so we can keep the time skew small and, more
      importantly, never let time go backwards. Userspace may also need to read
      back the current value, since from the first migration onwards, it won't be
      reflected by a simple call to clock_gettime() anymore.
      
      [marcelo: future-proof abi with a flags field]
      [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
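The skew problem in the kvmclock commit can be illustrated with a small sketch. This is not the kernel's implementation: the function names and the simple model "guest clock = host monotonic clock + per-VM offset" are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Hypothetical model: the guest clock is the host monotonic clock plus a
 * per-VM offset. All values are in nanoseconds. */
int64_t guest_clock_ns(int64_t host_monotonic_ns, int64_t offset_ns)
{
    return host_monotonic_ns + offset_ns;
}

/* Offset the destination host would apply so that the guest clock continues
 * from the value it had on the source host instead of jumping. */
int64_t offset_after_migration(int64_t src_guest_clock_ns,
                               int64_t dst_host_monotonic_ns)
{
    return src_guest_clock_ns - dst_host_monotonic_ns;
}
```

With such an offset, a destination host whose monotonic clock is smaller than the source's still presents a guest clock that continues from the source value rather than going backwards.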
    • KVM: Xen PV-on-HVM guest support · ffde22ac
      Ed Swierk committed
      Support for Xen PV-on-HVM guests can be implemented almost entirely in
      userspace, except for handling one annoying MSR that maps a Xen
      hypercall blob into guest address space.
      
      A generic mechanism to delegate MSR writes to userspace seems overkill
      and risks encouraging similar MSR abuse in the future.  Thus this patch
      adds special support for the Xen HVM MSR.
      
      I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
      KVM which MSR the guest will write to, as well as the starting address
      and size of the hypercall blobs (one each for 32-bit and 64-bit) that
      userspace has loaded from files.  When the guest writes to the MSR, KVM
      copies one page of the blob from userspace to the guest.
      
      I've tested this patch with a hacked-up version of Gerd's userspace
      code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
      FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
      
      [jan: fix i386 build warning]
      [avi: future proof abi with a flags field]
      Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  5. 10 Sep, 2009 11 commits
    • KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl · b927a3ce
      Sheng Yang committed
      KVM now allows the guest physical address of the EPT identity-mapping
      page to be modified.
      
      (Changes from v1: discard unnecessary check; change the ioctl to accept the
      address of the parameter rather than its value.)
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: add ioeventfd support · d34e6b17
      Gregory Haskins committed
      ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
      signal when written to by a guest.  Host userspace can register any
      arbitrary IO address with a corresponding eventfd and then pass the eventfd
      to a specific end-point of interest for handling.
      
      Normal IO requires a blocking round-trip since the operation may cause
      side-effects in the emulated model or may return data to the caller.
      Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
      "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
      device model synchronously before returning control back to the vcpu.
      
      However, there is a subclass of IO which acts purely as a trigger for
      other IO (such as to kick off an out-of-band DMA request, etc).  For these
      patterns, the synchronous call is particularly expensive since we really
      only want to get our notification transmitted asynchronously and
      return as quickly as possible.  All the synchronous infrastructure that
      ensures proper data dependencies are met in the normal IO case is just
      unnecessary overhead for signalling.  This adds additional computational
      load on the system, as well as latency to the signalling path.
      
      Therefore, we provide a mechanism for registration of an in-kernel trigger
      point that allows the VCPU to only require a very brief, lightweight
      exit just long enough to signal an eventfd.  This also means that any
      clients compatible with the eventfd interface (which includes userspace
      and kernelspace equally well) can now register to be notified. The end
      result should be a more flexible and higher performance notification API
      for the backend KVM hypervisor and peripheral components.
      
      To test this theory, we built a test-harness called "doorbell".  This
      module has a function called "doorbell_ring()" which simply increments a
      counter for each time the doorbell is signaled.  It supports signalling
      from either an eventfd, or an ioctl().
      
      We then wired up two paths to the doorbell: One via QEMU via a registered
      io region and through the doorbell ioctl().  The other is direct via
      ioeventfd.
      
      You can download this test harness here:
      
      ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
      
      The measured results are as follows:
      
      qemu-mmio:       110000 iops, 9.09us rtt
      ioeventfd-mmio: 200100 iops, 5.00us rtt
      ioeventfd-pio:  367300 iops, 2.72us rtt
      
      I didn't measure qemu-pio, because I have to figure out how to register a
      PIO region with qemu's device model, and I got lazy.  However, for now we
      can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
      and -350ns for HC, we get:
      
      qemu-pio:      153139 iops, 6.53us rtt
      ioeventfd-hc: 412585 iops, 2.37us rtt
      
      these are just for fun, for now, until I can gather more data.
      
      Here is a graph for your convenience:
      
      http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
      
      The conclusion to draw is that we save about 4us by skipping the userspace
      hop.
      
      --------------------
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
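The iops and rtt columns in the ioeventfd results are two views of the same measurement. A minimal sketch of the conversion, assuming back-to-back operations so that rtt = 1e6 / iops in microseconds; the helper names are made up for this example and are not part of the doorbell harness:

```c
#include <stdio.h>

/* Round-trip time in microseconds implied by a rate of back-to-back
 * operations per second: rtt = 1e6 / iops. */
double rtt_us(double iops)
{
    return 1e6 / iops;
}

/* Operations per second implied by a round-trip time in microseconds. */
double iops_from_rtt(double rtt)
{
    return 1e6 / rtt;
}

/* Print one row in roughly the same layout as the measured-results table. */
void print_row(const char *name, double iops)
{
    printf("%-15s %8.0f iops, %.2fus rtt\n", name, iops, rtt_us(iops));
}
```

Under this assumption, the difference between `rtt_us(110000)` and `rtt_us(200100)` is the roughly 4us saving quoted for skipping the userspace hop, and `iops_from_rtt(9.09 - 2.56)` gives the extrapolated qemu-pio rate.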
    • KVM: PIT support for HPET legacy mode · e9f42757
      Beth Kon committed
      When kvm is in hpet_legacy_mode, the hpet is providing the timer
      interrupt and the pit should not be. So in legacy mode, the pit timer
      is destroyed, but the *state* of the pit is maintained. So if kvm or
      the guest tries to modify the state of the pit, this modification is
      accepted, *except* that the timer isn't actually started. When we exit
      hpet_legacy_mode, the current state of the pit (which is up to date
      since we've been accepting modifications) is used to restart the pit
      timer.
      
      The saved_mode code in kvm_pit_load_count temporarily changes mode to
      0xff in order to destroy the timer, but then restores the actual
      value, again maintaining "current" state of the pit for possible later
      reenablement.
      
      [avi: add some reserved storage in the ioctl; make SET_PIT2 IOW]
      [marcelo: fix memory corruption due to reserved storage]
      Signed-off-by: Beth Kon <eak@us.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
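The behaviour described in the HPET legacy mode commit (state is always updated, but the timer only runs outside legacy mode) can be modelled in a few lines. This is a simplified sketch; `struct pit_model` and both functions are hypothetical and do not mirror the kernel's data structures or the saved_mode logic in kvm_pit_load_count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified PIT channel: not the kernel's data structure. */
struct pit_model {
    bool     hpet_legacy_mode; /* true while the HPET owns the timer interrupt */
    uint32_t count;            /* last programmed count (the PIT "state") */
    bool     timer_running;    /* whether a timer is actually armed */
};

/* Programming the count always updates the state, but only arms the timer
 * when the HPET is not in legacy mode. */
void pit_load_count(struct pit_model *pit, uint32_t count)
{
    pit->count = count;
    pit->timer_running = !pit->hpet_legacy_mode;
}

/* Entering legacy mode destroys the timer; leaving it restarts the timer
 * from the state that was kept up to date in the meantime. */
void pit_set_hpet_legacy(struct pit_model *pit, bool enable)
{
    pit->hpet_legacy_mode = enable;
    pit->timer_running = !enable;
}
```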
    • KVM: remove old KVMTRACE support code · 2023a29c
      Marcelo Tosatti committed
      Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Return to userspace on emulation failure · 3f5d18a9
      Avi Kivity committed
      Instead of mindlessly retrying to execute the instruction, report the
      failure to userspace.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Break dependency between vcpu index in vcpus array and vcpu_id. · 73880c80
      Gleb Natapov committed
      Archs are free to use vcpu_id as they see fit. For x86 it is used as the
      vcpu's apic id. A new ioctl is added to configure the boot vcpu id, which
      was assumed to be 0 until now.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Reorder ioctls in kvm.h · 6a4a9839
      Avi Kivity committed
      Somehow the VM ioctls got unsorted; re-sort.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Downsize max support MSI-X entry to 256 · e7333391
      Sheng Yang committed
      We only trap one page for MSI-X entries now, so it's 4k/(128/8) = 256
      entries at most.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
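The 4k/(128/8) arithmetic from the MSI-X commit, spelled out as a small sketch. The macro and function names are illustrative only and do not come from the kernel headers.

```c
#include <stddef.h>

#define TRAPPED_PAGE_BYTES 4096u   /* one 4k page */
#define MSIX_ENTRY_BITS    128u    /* size of one MSI-X table entry */

/* Number of whole MSI-X entries that fit in the single trapped page. */
size_t msix_entries_per_page(void)
{
    size_t entry_bytes = MSIX_ENTRY_BITS / 8;   /* 128 / 8 = 16 bytes */
    return TRAPPED_PAGE_BYTES / entry_bytes;    /* 4096 / 16 = 256 */
}
```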
    • KVM: Allow PIT emulation without speaker port · c5ff41ce
      Jan Kiszka committed
      The in-kernel speaker emulation is only a dummy and also unneeded from
      the performance point of view. Rather, it takes user space support to
      generate sound output on the host, e.g. console beeps.
      
      To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel
      speaker port emulation via a flag passed along the new IOCTL. It also
      leaves room for future extensions of the PIT configuration interface.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: irqfd · 721eecbf
      Gregory Haskins committed
      KVM provides a complete virtual system environment for guests, including
      support for injecting interrupts modeled after the real exception/interrupt
      facilities present on the native platform (such as the IDT on x86).
      Virtual interrupts can come from a variety of sources (emulated devices,
      pass-through devices, etc) but all must be injected to the guest via
      the KVM infrastructure.  This patch adds a new mechanism to inject a specific
      interrupt to a guest using a decoupled eventfd mechanism:  Any legal signal
      on the irqfd (using eventfd semantics from either userspace or kernel) will
      translate into an injected interrupt in the guest at the next available
      interrupt window.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
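The "eventfd semantics" that irqfd builds on can be tried without KVM at all. The sketch below uses only the Linux eventfd(2) interface; the helper names `efd_signal` and `efd_drain` are invented for this example, and registering the descriptor with KVM is deliberately not shown.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal an eventfd: adds n to its 64-bit counter. Returns 0 on success. */
int efd_signal(int efd, uint64_t n)
{
    return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Consume pending signals: reads the counter and resets it to zero.
 * Returns 0 on success and stores the accumulated value in *count. */
int efd_drain(int efd, uint64_t *count)
{
    return read(efd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A descriptor created with `eventfd(0, 0)` accumulates writes, so two `efd_signal(efd, 1)` calls followed by one `efd_drain` report a count of 2. The signaller only needs the file descriptor, which is what allows both userspace and kernel code to act as the interrupt source.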
    • KVM: Add MCE support · 890ca9ae
      Huang Ying committed
      The related MSRs are emulated. MCE capability is exported via
      extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED.  A new
      vcpu ioctl command KVM_X86_SETUP_MCE is used to set up MCE emulation
      such as the mcg_cap. MCE is injected via vcpu ioctl command
      KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
      not implemented.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  6. 10 Jun, 2009 4 commits
    • KVM: Make kvm header C++ friendly · 2f8b9ee1
      nathan binkert committed
      Two things needed fixing: 1) g++ does not allow a named structure type
      within an anonymous union, and 2) a name clash between two padding
      fields within the same struct had to be avoided by giving them different
      names, as is done elsewhere in the header.
      Signed-off-by: Nathan Binkert <nate@binkert.org>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Device assignment framework rework · e56d532f
      Sheng Yang committed
      After discussion with Marcelo, we decided to rework the device assignment
      framework together. The old problem is that the kernel logic is
      unnecessarily complex, so Marcelo suggested splitting it up in a more
      elegant way:
      
      1. Split host IRQ assignment from guest IRQ assignment, and let userspace
      determine the combination. Also discard the msi2intx parameter; userspace
      can specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in
      assigned_irq->flags to enable MSI to INTx conversion.
      
      2. Split IRQ assignment from IRQ deassignment. Introduce two new ioctls:
      KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.
      
      This patch also fixes the reversed _IOR vs _IOW in the definition (by
      deprecating the old interface).
      
      [avi: replace homemade bitcount() by hweight_long()]
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
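The idea of letting userspace pick a host/guest combination through one flags word can be sketched as follows. The bit values, masks, and the validity rule (at most one type per side, at least one type overall) are assumptions for illustration and are not taken from the commit; `bitcount` here merely stands in for a population-count helper such as the hweight_long() mentioned in the commit note.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag layout: host types in the low byte, guest types in the
 * next byte. These values are assumptions for this sketch. */
#define DEV_IRQ_HOST_INTX   (1u << 0)
#define DEV_IRQ_HOST_MSI    (1u << 1)
#define DEV_IRQ_GUEST_INTX  (1u << 8)
#define DEV_IRQ_GUEST_MSI   (1u << 9)
#define DEV_IRQ_HOST_MASK   0x00ffu
#define DEV_IRQ_GUEST_MASK  0xff00u

/* Count the set bits in v by clearing the lowest set bit each iteration. */
static unsigned bitcount(uint32_t v)
{
    unsigned n = 0;
    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Assumed rule: no more than one host type, no more than one guest type,
 * and the request must name at least one of the two. */
bool dev_irq_flags_valid(uint32_t flags)
{
    uint32_t host = flags & DEV_IRQ_HOST_MASK;
    uint32_t guest = flags & DEV_IRQ_GUEST_MASK;

    if (bitcount(host) > 1 || bitcount(guest) > 1)
        return false;
    return host != 0 || guest != 0;
}
```

Under these assumptions, `DEV_IRQ_HOST_MSI | DEV_IRQ_GUEST_INTX` is accepted as the MSI to INTx combination, while a request naming two host types at once is rejected.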
    • KVM: Enable MSI-X for KVM assigned device · d510d6cc
      Sheng Yang committed
      This patch finally enables MSI-X.
      
      What we need for MSI-X:
      1. Intercept one page in the MMIO region of the device, so that we can get
      the MSI-X table the guest wants and set up the real one. This is now done
      by the guest and transferred to the kernel using the ioctls KVM_SET_MSIX_NR
      and KVM_SET_MSIX_ENTRY.
      
      2. Information about the incoming interrupt. One device can now have more
      than one interrupt, and they are all handled by one workqueue structure, so
      we need to identify them. The previous patch enabled gsi_msg_pending_bitmap
      to get this done.
      
      3. Mapping from host IRQ to guest gsi, as well as from guest gsi to the real
      MSI/MSI-X message address/data. We use the same entry number for the host
      and the guest here, so that it's easy to find the correlated guest gsi.
      
      What we lack for now:
      1. The PCI spec says nothing can exist alongside the MSI-X table in the same
      page of the MMIO region, except pending bits. The patch ignores pending bits
      as a first step (so they are always 0 - no pending).
      
      2. The PCI spec allows the MSI-X table to be changed dynamically. That
      means the OS can enable MSI-X, then mask one MSI-X entry, modify it, and
      unmask it. The patch doesn't support this, and Linux doesn't work this way
      either.
      
      3. The patch doesn't implement MSI-X mask all and mask single entry. I will
      implement the former in driver/pci/msi.c later. For single entries,
      userspace should be responsible for handling them.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Ioctls for init MSI-X entry · c1e01514
      Sheng Yang committed
      Introduce two ioctls, KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
      
      These two ioctls are used by userspace to specify the number of MSI-X
      entries of a guest device and to correlate each MSI-X entry with a GSI
      during the initialization stage.
      
      MSI-X should be fully initialized before it is enabled.
      
      Changing the number of MSI-X entries is not supported for now.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  7. 22 Apr, 2009 1 commit
  8. 24 Mar, 2009 8 commits