1. 05 May, 2014 1 commit
    • C
      kvm/irqchip: Speed up KVM_SET_GSI_ROUTING · 719d93cd
      Christian Borntraeger authored
      When starting lots of dataplane devices the bootup takes very long on
      Christian's s390 with irqfd patches. With larger setups he is even
      able to trigger some timeouts in some components.  Turns out that the
      KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
      when having multiple CPUs.  This is caused by the synchronize_rcu and
      the HZ=100 of s390.  By changing the code to use a private srcu we can
      speed things up.  This patch reduces the boot time till mounting root
      from 8 to 2 seconds on my s390 guest with 100 disks.
      
      Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
      are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
      uses rcu_dereference_raw rather than rcu_dereference, and write-sides
      do not do rcu lockdep at all).
      
      Note that we're hardly relying on the "sleepable" part of srcu.  We just
      want SRCU's faster detection of grace periods.
      
      Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
      and RR.  The difference between results "before" and "after" the patch
      has mean -0.2% and standard deviation 0.6%.  Using a paired t-test on the
      data points says that there is a 2.5% probability that the patch is the
      cause of the performance difference (rather than a random fluctuation).
      
      (Restricting the t-test to RR, which is the most likely to be affected,
      changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
      probability that the numbers actually say something about the patch.
      The probability increases mostly because there are fewer data points).
      
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      719d93cd
  2. 28 Apr, 2014 2 commits
  3. 24 Apr, 2014 1 commit
  4. 22 Apr, 2014 1 commit
  5. 18 Apr, 2014 2 commits
    • M
      KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1
      Michael S. Tsirkin authored
      With KVM, MMIO is much slower than PIO, due to the need to
      do page walk and emulation. But with EPT, it does not have to be: we
      know the address from the VMCS so if the address is unique, we can look
      up the eventfd directly, bypassing emulation.
      
      Unfortunately, this only works if userspace does not need to match on
      access length and data.  The implementation adds a separate FAST_MMIO
      bus internally. This serves two purposes:
          - minimize overhead for old userspace that does not use eventfd with length = 0
          - minimize disruption in other code (since we don't know the length,
            devices on the MMIO bus only get a valid address in write, this
            way we don't need to touch all devices to teach them to handle
            an invalid length)
      
      At the moment, this optimization only has effect for EPT on x86.
      
      It will be possible to speed up MMIO for NPT and MMU using the same
      idea in the future.
      
      With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
      I was unable to detect any measurable slowdown to non-eventfd MMIO.
      
      Making MMIO faster is important for the upcoming virtio 1.0 which
      includes an MMIO signalling capability.
      
      The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
      pre-review and suggestions.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      68c3b4d1
    • M
      KVM: support any-length wildcard ioeventfd · f848a5a8
      Michael S. Tsirkin authored
      It is sometimes beneficial to ignore IO size, and only match on address.
      In hindsight this would have been a better default than matching length
      when KVM_IOEVENTFD_FLAG_DATAMATCH is not set.  In particular, this kind
      of access can be optimized on VMX: there is no need to do page lookups.
      This can currently be done with many ioeventfds but in a suboptimal way.
      
      However we can't change kernel/userspace ABI without risk of breaking
      some applications.
      Use len = 0 to mean "ignore length for matching" in a more optimal way.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      f848a5a8
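The matching rule this commit describes can be sketched in plain user-space C. This is a simplified illustration under stated assumptions, not the kernel's implementation: `struct ioeventfd_sketch` and `ioeventfd_matches` are hypothetical names, and data matching is reduced to a single 64-bit compare.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a registered ioeventfd. */
struct ioeventfd_sketch {
    uint64_t addr;      /* guest address to match */
    uint32_t len;       /* 0 means "ignore the access length" */
    bool     datamatch; /* stands in for KVM_IOEVENTFD_FLAG_DATAMATCH */
    uint64_t data;      /* value compared when datamatch is set */
};

/* Returns true if a guest write of `len` bytes of `val` at `addr` hits `e`. */
static bool ioeventfd_matches(const struct ioeventfd_sketch *e,
                              uint64_t addr, uint32_t len, uint64_t val)
{
    if (addr != e->addr)
        return false;      /* the address must always match exactly */
    if (e->len == 0)
        return true;       /* len = 0: match on the address only */
    if (len != e->len)
        return false;      /* otherwise the length must match too */
    if (!e->datamatch)
        return true;       /* no data match requested */
    return val == e->data; /* full match on address, length and data */
}
```

With an entry registered as `len = 0`, a 1-byte and an 8-byte write to the same address both hit, which is the behaviour that previously required registering one ioeventfd per access size.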
  6. 08 Apr, 2014 1 commit
  7. 04 Apr, 2014 2 commits
    • P
      KVM: ioapic: try to recover if pending_eoi goes out of range · 4009b249
      Paolo Bonzini authored
      The RTC tracking code tracks the cardinality of rtc_status.dest_map
      into rtc_status.pending_eoi.  It has some WARN_ONs that trigger if
      pending_eoi ever becomes negative; however, these do not do anything
      to recover, and bad things will happen soon after they trigger.
      
      When the next RTC interrupt is triggered, rtc_check_coalesced() will
      return false, but ioapic_service will find pending_eoi != 0 and
      do a BUG_ON.  To avoid this, should pending_eoi ever be nonzero,
      call kvm_rtc_eoi_tracking_restore_all to recompute a correct
      dest_map and pending_eoi.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4009b249
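The recovery idea in this commit (recompute the counter instead of only warning) can be sketched in user-space C. Everything here is an assumption made for illustration: `struct rtc_status_sketch`, `rtc_status_restore` and `rtc_status_check_valid` are hypothetical names, the destination map is reduced to a 64-bit mask, and the mask is treated as the source of truth, whereas the kernel's kvm_rtc_eoi_tracking_restore_all recomputes dest_map as well.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the RTC tracking state. */
struct rtc_status_sketch {
    uint64_t dest_map;    /* bit i set: vCPU i still owes an EOI */
    int      pending_eoi; /* should equal the number of set bits */
};

/* Recompute the counter from the bitmap. */
static void rtc_status_restore(struct rtc_status_sketch *s)
{
    int count = 0;

    /* m &= m - 1 clears the lowest set bit on every iteration. */
    for (uint64_t m = s->dest_map; m != 0; m &= m - 1)
        count++;
    s->pending_eoi = count;
}

/* Returns 1 if a repair was needed, 0 if the counter was in range. */
static int rtc_status_check_valid(struct rtc_status_sketch *s)
{
    if (s->pending_eoi >= 0)
        return 0;
    rtc_status_restore(s);
    return 1;
}
```

The point of the design is that an out-of-range counter is treated as a recoverable inconsistency: the state is rebuilt once, and later checks see a sane value instead of hitting a fatal assertion.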
    • P
      KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155) · 5678de3f
      Paolo Bonzini authored
      QE reported that they got the BUG_ON in ioapic_service to trigger.
      I cannot reproduce it, but there are two reasons why this could happen.
      
      The less likely but also easiest one, is when kvm_irq_delivery_to_apic
      does not deliver to any APIC and returns -1.
      
      Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
      function is never reached.  However, you can target the similar loop in
      kvm_irq_delivery_to_apic_fast; just program a zero logical destination
      address into the IOAPIC, or an out-of-range physical destination address.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5678de3f
  8. 21 Mar, 2014 4 commits
  9. 19 Mar, 2014 1 commit
    • C
      KVM: eventfd: Fix lock order inversion. · 684a0b71
      Cornelia Huck authored
      When registering a new irqfd, we call its ->poll method to collect any
      event that might have previously been pending so that we can trigger it.
      This is done under the kvm->irqfds.lock, which means the eventfd's ctx
      lock is taken under it.
      
      However, if we get a POLLHUP in irqfd_wakeup, we will be called with the
      ctx lock held before getting the irqfds.lock to deactivate the irqfd,
      causing lockdep to complain.
      
      Calling the ->poll method does not really need the irqfds.lock, so let's
      just move it after we've given up the irqfds.lock in kvm_irqfd_assign().
      Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      684a0b71
  10. 13 Mar, 2014 1 commit
    • G
      kvm: x86: ignore ioapic polarity · 100943c5
      Gabriel L. Somlo authored
      Both QEMU and KVM have already accumulated a significant number of
      optimizations based on the hard-coded assumption that ioapic polarity
      will always use the ActiveHigh convention, where the logical and
      physical states of level-triggered irq lines always match (i.e.,
      active(asserted) == high == 1, inactive == low == 0). QEMU guests
      are expected to follow directions given via ACPI and configure the
      ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
      guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
      QEMU will still use the ActiveHigh signaling convention when
      interfacing with KVM.
      
      This patch modifies KVM to completely ignore ioapic polarity as set by
      the guest OS, enabling misbehaving guests to work alongside those which
      comply with the ActiveHigh polarity specified by QEMU's ACPI tables.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
      [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      100943c5
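The effect of ignoring the polarity bit can be shown with two tiny helpers. This is only an illustrative sketch: both function names are hypothetical, and the line level and polarity are modelled as plain integers rather than IOAPIC redirection-table fields.

```c
#include <stdbool.h>

/* Hypothetical: what honoring the guest's polarity bit would look like. */
static bool line_asserted_honoring_polarity(int level, int polarity)
{
    return (level != 0) != (polarity != 0); /* ActiveLow inverts the level */
}

/* Hypothetical: polarity is ignored, the signalled level alone decides. */
static bool line_asserted_ignoring_polarity(int level, int polarity)
{
    (void)polarity;
    return level != 0; /* asserted == high == 1 */
}
```

Because QEMU always signals with the ActiveHigh convention, a guest that programmed polarity 1 would see every level inverted under the first helper, while the second gives the same answer for compliant and misbehaving guests.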
  11. 27 Feb, 2014 2 commits
  12. 18 Feb, 2014 1 commit
  13. 14 Feb, 2014 2 commits
  14. 04 Feb, 2014 1 commit
  15. 30 Jan, 2014 4 commits
  16. 15 Jan, 2014 2 commits
    • S
      kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub · 4a55dd72
      Scott Wood authored
      Commit 7940876e ("kvm: make local
      functions static") broke KVM PPC builds due to removing (rather than
      moving) the stub version of kvm_vcpu_eligible_for_directed_yield().
      
      This patch reintroduces it.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Alexander Graf <agraf@suse.de>
      [Move the #ifdef inside the function. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4a55dd72
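The "#ifdef inside the function" arrangement noted in the bracketed comment can be sketched as follows. This is a user-space illustration under assumptions: `struct vcpu_sketch`, the macro `SKETCH_HAVE_CPU_RELAX_INTERCEPT` and the placeholder heuristic in the first branch are hypothetical and stand in for the kernel's configuration option and logic.

```c
#include <stdbool.h>

/* Hypothetical, reduced vCPU state for illustration only. */
struct vcpu_sketch {
    bool in_spin_loop;
    bool dy_eligible;
};

static bool vcpu_eligible_for_directed_yield(struct vcpu_sketch *vcpu)
{
#ifdef SKETCH_HAVE_CPU_RELAX_INTERCEPT
    /* Placeholder heuristic for architectures that track spin loops. */
    bool eligible = !vcpu->in_spin_loop || vcpu->dy_eligible;

    if (vcpu->in_spin_loop)
        vcpu->dy_eligible = !vcpu->dy_eligible;
    return eligible;
#else
    /* Stub: every vCPU is considered a valid yield target. */
    (void)vcpu;
    return true;
#endif
}
```

Keeping a single function definition with the conditional inside means there is always exactly one symbol for callers, so a build without the feature cannot lose the stub by accident.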
    • P
      kvm: vfio: silence GCC warning · e81d1ad3
      Paul Bolle authored
      Building vfio.o triggers a GCC warning (when building for 32 bits x86):
          arch/x86/kvm/../../../virt/kvm/vfio.c: In function 'kvm_vfio_set_group':
          arch/x86/kvm/../../../virt/kvm/vfio.c:104:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
            void __user *argp = (void __user *)arg;
                                ^
      
      Silence this warning by casting arg to unsigned long.
      
      argp's current type, "void __user *", is always cast to "int32_t
      __user *". So its type might as well be changed to "int32_t __user *".
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e81d1ad3
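The cast this commit describes can be reproduced in user space. This is a sketch, not the kernel code: `arg_to_fd_ptr` is a hypothetical helper, and the kernel-only `__user` annotation is left out. It assumes a Linux-style ABI where `unsigned long` has the same size as a pointer, so the intermediate cast makes the narrowing from a 64-bit integer explicit on 32-bit builds.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn a 64-bit attribute value into a typed pointer.
 * Casting arg directly to a pointer warns on 32-bit targets because the
 * integer is wider than the pointer; going through unsigned long avoids
 * that and yields the int32_t pointer type the callers want anyway.
 */
static int32_t *arg_to_fd_ptr(uint64_t arg)
{
    return (int32_t *)(unsigned long)arg;
}
```

A caller would pass the address of an `int32_t` file descriptor as the 64-bit value and get the same address back as a typed pointer.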
  17. 09 Jan, 2014 2 commits
  18. 22 Dec, 2013 10 commits
    • C
      KVM: arm-vgic: Support CPU interface reg access · fa20f5ae
      Christoffer Dall authored
      Implement support for the CPU interface register access driven by MMIO
      address offsets from the CPU interface base address.  Useful for user
      space to support save/restore of the VGIC state.
      
      This commit adds support only for the same logic as the current VGIC
      support, and no more.  For example, the active priority registers are
      handled as RAZ/WI, just like setting priorities on the emulated
      distributor.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      fa20f5ae
    • C
      KVM: arm-vgic: Add GICD_SPENDSGIR and GICD_CPENDSGIR handlers · 90a5355e
      Christoffer Dall authored
      Handle MMIO accesses to the two registers which should support both the
      case where the VMs want to read/write either of these registers and the
      case where user space reads/writes these registers to do save/restore of
      the VGIC state.
      
      Note that the added complexity compared to simple set/clear enable
      registers stems from the bookkeeping of source cpu ids.  It may be
      possible to change the underlying data structure to simplify the
      complexity, but since this is not in the critical path at all, this will
      do.
      
      Also note that reading this register from a live guest will not be
      accurate compared to on hardware, because some state may be living on
      the CPU LRs and the only way to give a consistent read would be to force
      stop all the VCPUs and request them to unqueue the LR state onto the
      distributor.  Until we have an actual user of live reading this
      register, we can live with the difference.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      90a5355e
    • C
      KVM: arm-vgic: Support unqueueing of LRs to the dist · cbd333a4
      Christoffer Dall authored
      To properly access the VGIC state from user space it is very impractical
      to have to loop through all the LRs in all register access functions.
      Instead, support moving all pending state from LRs to the distributor,
      but leave active state LRs alone.
      
      Note that to accurately present the active and pending state to VCPUs
      reading these distributor registers from a live VM, we would have to
      stop all other VCPUs than the calling VCPU and ask each CPU to unqueue
      their LR state onto the distributor and add fields to track active state
      on the distributor side as well.  We don't have any users of such
      functionality yet and there are other inaccuracies of the GIC emulation,
      so don't provide accurate synchronized access to this state just yet.
      However, when the time comes, having this function should help.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      cbd333a4
    • C
      KVM: arm-vgic: Add vgic reg access from dev attr · c07a0191
      Christoffer Dall authored
      Add infrastructure to handle distributor and cpu interface register
      accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the
      KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups
      and defining the semantics of the attr field to be the MMIO offset as
      specified in the GICv2 specs.
      
      Missing register accesses or other changes in individual register access
      functions to support save/restore of the VGIC state are added in
      subsequent patches.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      c07a0191
    • C
      KVM: arm-vgic: Make vgic mmio functions more generic · 1006e8cb
      Christoffer Dall authored
      Rename the vgic_ranges array to vgic_dist_ranges to be more specific and
      to prepare for handling CPU interface register access as well (for
      save/restore of VGIC state).
      
      Pass offset from distributor or interface MMIO base to
      find_matching_range function instead of the physical address of the
      access in the VM memory map.  This allows other callers unaware of the
      VM specifics, but with generic VGIC knowledge to reuse the function.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      1006e8cb
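Looking up a register range by offset from a base, rather than by guest physical address, can be sketched like this. It is a simplified user-space illustration: `struct mmio_range_sketch` is a hypothetical descriptor, the table is assumed to end with a zero-length entry, and the function takes the access length directly instead of a kernel MMIO exit structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range descriptor; a zero `len` terminates the table. */
struct mmio_range_sketch {
    uint32_t base; /* offset from the distributor or CPU interface base */
    uint32_t len;  /* size of the register block in bytes */
};

/* Find the range that fully contains [offset, offset + access_len). */
static const struct mmio_range_sketch *
find_matching_range(const struct mmio_range_sketch *ranges,
                    uint32_t offset, uint32_t access_len)
{
    for (const struct mmio_range_sketch *r = ranges; r->len != 0; r++) {
        /* Widen to 64 bits so the end-of-range sums cannot wrap. */
        if (offset >= r->base &&
            (uint64_t)offset + access_len <= (uint64_t)r->base + r->len)
            return r;
    }
    return NULL; /* no register block covers this access */
}
```

Since the function only sees offsets, the same code can serve a caller that knows the VM memory map and one that only knows the register layout, which is the reuse the commit message aims for.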
    • C
      KVM: arm-vgic: Set base addr through device API · ce01e4e8
      Christoffer Dall authored
      Support setting the distributor and cpu interface base addresses in the
      VM physical address space through the KVM_{SET,GET}_DEVICE_ATTR API
      in addition to the ARM specific API.
      
      This has the added benefit of being able to share more code in user
      space and do things in a uniform manner.
      
      Also deprecate the older API at the same time, but backwards
      compatibility will be maintained.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      ce01e4e8
    • C
      KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC · 7330672b
      Christoffer Dall authored
      Support creating the ARM VGIC device through the KVM_CREATE_DEVICE
      ioctl, which can then later be leveraged to use the
      KVM_{GET/SET}_DEVICE_ATTR, which is useful both for setting addresses in
      a more generic API than the ARM-specific one and is useful for
      save/restore of VGIC state.
      
      Adds KVM_CAP_DEVICE_CTRL to ARM capabilities.
      
      Note that we change the check for creating a VGIC from bailing out if
      any VCPUs were created, to bailing out if any VCPUs were ever run.  This
      is an important distinction that shouldn't break anything, but allows
      creating the VGIC after the VCPUs have been created.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      7330672b
    • C
      ARM: KVM: Allow creating the VGIC after VCPUs · e1ba0207
      Christoffer Dall authored
      Rework the VGIC initialization slightly to allow initialization of the
      vgic cpu-specific state even if the irqchip (the VGIC) hasn't been
      created by user space yet.  This is safe, because the vgic data
      structures are already allocated when the CPU is allocated if VGIC
      support is compiled into the kernel.  Further, the init process does not
      depend on any other information and the sacrifice is a slight
      performance degradation for creating VMs in the no-VGIC case.
      
      The reason is that the new device control API doesn't mandate creating
      the VGIC before creating the VCPU and it is unreasonable to require user
      space to create the VGIC before creating the VCPUs.
      
      At the same time move the irqchip_in_kernel check out of
      kvm_vcpu_first_run_init and into the init function to make the per-vcpu
      and global init functions symmetric and add comments on the exported
      functions making it a bit easier to understand the init flow by only
      looking at vgic.c.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      e1ba0207
    • A
      ARM/KVM: save and restore generic timer registers · 39735a3a
      Andre Przywara authored
      For migration to work we need to save (and later restore) the state of
      each core's virtual generic timer.
      Since this is per VCPU, we can use the [gs]et_one_reg ioctl and export
      the three needed registers (control, counter, compare value).
      Though they live in cp15 space, we don't use the existing list, since
      they need special accessor functions and the arch timer is optional.
      Acked-by: Marc Zynger <marc.zyngier@arm.com>
      Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      39735a3a
    • C
      arm/arm64: KVM: arch_timer: Initialize cntvoff at kvm_init · a1a64387
      Christoffer Dall authored
      Initialize the cntvoff at kvm_init_vm time, not before running the VCPUs
      for the first time, because that would overwrite any values potentially
      restored from user space.
      
      Cc: Andre Przywara <andre.przywara@linaro.org>
      Acked-by: Marc Zynger <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      a1a64387