1. 05 8月, 2014 2 次提交
    • P
      KVM: Move all accesses to kvm::irq_routing into irqchip.c · 9957c86d
      Paul Mackerras 提交于
      Now that struct _irqfd does not keep a reference to storage pointed
      to by the irq_routing field of struct kvm, we can move the statement
      that updates it out from under the irqfds.lock and put it in
      kvm_set_irq_routing() instead.  That means we then have to take a
      srcu_read_lock on kvm->irq_srcu around the irqfd_update call in
      kvm_irqfd_assign(), since holding the kvm->irqfds.lock no longer
      ensures that that the routing can't change.
      
      Combined with changing kvm_irq_map_gsi() and kvm_irq_map_chip_pin()
      to take a struct kvm * argument instead of the pointer to the routing
      table, this allows us to to move all references to kvm->irq_routing
      into irqchip.c.  That in turn allows us to move the definition of the
      kvm_irq_routing_table struct into irqchip.c as well.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Tested-by: NEric Auger <eric.auger@linaro.org>
      Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9957c86d
    • P
      KVM: irqchip: Provide and use accessors for irq routing table · 8ba918d4
      Paul Mackerras 提交于
      This provides accessor functions for the KVM interrupt mappings, in
      order to reduce the amount of code that accesses the fields of the
      kvm_irq_routing_table struct, and restrict that code to one file,
      virt/kvm/irqchip.c.  The new functions are kvm_irq_map_gsi(), which
      maps from a global interrupt number to a set of IRQ routing entries,
      and kvm_irq_map_chip_pin, which maps from IRQ chip and pin numbers to
      a global interrupt number.
      
      This also moves the update of kvm_irq_routing_table::chip[][]
      into irqchip.c, out of the various kvm_set_routing_entry
      implementations.  That means that none of the kvm_set_routing_entry
      implementations need the kvm_irq_routing_table argument anymore,
      so this removes it.
      
      This does not change any locking or data lifetime rules.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Tested-by: NEric Auger <eric.auger@linaro.org>
      Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8ba918d4
  2. 28 7月, 2014 1 次提交
  3. 05 6月, 2014 1 次提交
  4. 05 5月, 2014 1 次提交
    • C
      kvm/irqchip: Speed up KVM_SET_GSI_ROUTING · 719d93cd
      Christian Borntraeger 提交于
      When starting lots of dataplane devices the bootup takes very long on
      Christian's s390 with irqfd patches. With larger setups he is even
      able to trigger some timeouts in some components.  Turns out that the
      KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
      when having multiple CPUs.  This is caused by the  synchronize_rcu and
      the HZ=100 of s390.  By changing the code to use a private srcu we can
      speed things up.  This patch reduces the boot time till mounting root
      from 8 to 2 seconds on my s390 guest with 100 disks.
      
      Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
      are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
      uses rcu_dereference_raw rather than rcu_dereference, and write-sides
      do not do rcu lockdep at all).
      
      Note that we're hardly relying on the "sleepable" part of srcu.  We just
      want SRCU's faster detection of grace periods.
      
      Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
      and RR.  The difference between results "before" and "after" the patch
      has mean -0.2% and standard deviation 0.6%.  Using a paired t-test on the
      data points says that there is a 2.5% probability that the patch is the
      cause of the performance difference (rather than a random fluctuation).
      
      (Restricting the t-test to RR, which is the most likely to be affected,
      changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
      probability that the numbers actually say something about the patch.
      The probability increases mostly because there are fewer data points).
      
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      719d93cd
  5. 29 4月, 2014 1 次提交
  6. 24 4月, 2014 1 次提交
  7. 22 4月, 2014 1 次提交
  8. 18 4月, 2014 1 次提交
    • M
      KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1
      Michael S. Tsirkin 提交于
      With KVM, MMIO is much slower than PIO, due to the need to
      do page walk and emulation. But with EPT, it does not have to be: we
      know the address from the VMCS so if the address is unique, we can look
      up the eventfd directly, bypassing emulation.
      
      Unfortunately, this only works if userspace does not need to match on
      access length and data.  The implementation adds a separate FAST_MMIO
      bus internally. This serves two purposes:
          - minimize overhead for old userspace that does not use eventfd with lengtth = 0
          - minimize disruption in other code (since we don't know the length,
            devices on the MMIO bus only get a valid address in write, this
            way we don't need to touch all devices to teach them to handle
            an invalid length)
      
      At the moment, this optimization only has effect for EPT on x86.
      
      It will be possible to speed up MMIO for NPT and MMU using the same
      idea in the future.
      
      With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
      I was unable to detect any measureable slowdown to non-eventfd MMIO.
      
      Making MMIO faster is important for the upcoming virtio 1.0 which
      includes an MMIO signalling capability.
      
      The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
      pre-review and suggestions.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      68c3b4d1
  9. 21 3月, 2014 2 次提交
  10. 18 2月, 2014 1 次提交
  11. 30 1月, 2014 2 次提交
  12. 09 1月, 2014 2 次提交
  13. 22 12月, 2013 1 次提交
    • C
      KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC · 7330672b
      Christoffer Dall 提交于
      Support creating the ARM VGIC device through the KVM_CREATE_DEVICE
      ioctl, which can then later be leveraged to use the
      KVM_{GET/SET}_DEVICE_ATTR, which is useful both for setting addresses in
      a more generic API than the ARM-specific one and is useful for
      save/restore of VGIC state.
      
      Adds KVM_CAP_DEVICE_CTRL to ARM capabilities.
      
      Note that we change the check for creating a VGIC from bailing out if
      any VCPUs were created, to bailing out if any VCPUs were ever run.  This
      is an important distinction that shouldn't break anything, but allows
      creating the VGIC after the VCPUs have been created.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      7330672b
  14. 13 12月, 2013 1 次提交
  15. 01 11月, 2013 1 次提交
  16. 31 10月, 2013 3 次提交
  17. 17 10月, 2013 1 次提交
  18. 15 10月, 2013 1 次提交
  19. 14 10月, 2013 1 次提交
  20. 30 9月, 2013 1 次提交
    • P
      KVM: Convert kvm_lock back to non-raw spinlock · 2f303b74
      Paolo Bonzini 提交于
      In commit e935b837 ("KVM: Convert kvm_lock to raw_spinlock"),
      the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
      function tries to grab the (non-raw) mmu_lock within the scope of
      the raw locked kvm_lock being held.  This leads to the following:
      
      BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
      in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
      Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
      
      Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
      Call Trace:
       [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
       [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
       [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
       [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
       [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
       [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
       [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
       [<ffffffff811185bf>] kswapd+0x18f/0x490
       [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
       [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
       [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
       [<ffffffff81060d2b>] kthread+0xdb/0xe0
       [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
       [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
       [<ffffffff81060c50>] ? __init_kthread_worker+0x
      
      After the previous patch, kvm_lock need not be a raw spinlock anymore,
      so change it back.
      Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: kvm@vger.kernel.org
      Cc: gleb@redhat.com
      Cc: jan.kiszka@siemens.com
      Reviewed-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2f303b74
  21. 25 9月, 2013 1 次提交
  22. 17 9月, 2013 1 次提交
  23. 29 7月, 2013 1 次提交
  24. 18 7月, 2013 2 次提交
  25. 04 6月, 2013 1 次提交
    • A
      kvm: exclude ioeventfd from counting kvm_io_range limit · 6ea34c9b
      Amos Kong 提交于
      We can easily reach the 1000 limit by start VM with a couple
      hundred I/O devices (multifunction=on). The hardcode limit
      already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).
      
      In userspace, we already have maximum file descriptor to
      limit ioeventfd count. But kvm_io_bus devices also are used
      for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
      by maximum file descriptor.
      
      Currently only ioeventfds take too much kvm_io_bus devices,
      so just exclude it from counting kvm_io_range limit.
      
      Also fixed one indent issue in kvm_host.h
      Signed-off-by: NAmos Kong <akong@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      6ea34c9b
  26. 31 5月, 2013 1 次提交
    • F
      kvm: Move guest entry/exit APIs to context_tracking · 521921ba
      Frederic Weisbecker 提交于
      The kvm_host.h header file doesn't handle well
      inclusion when archs don't support KVM.
      
      This results in build crashes for such archs when they
      want to implement context tracking because this subsystem
      includes kvm_host.h in order to implement the
      guest_enter/exit APIs but it doesn't handle KVM off case.
      
      To fix this, move the guest_enter()/guest_exit()
      declarations and generic implementation to the context
      tracking headers. These generic APIs actually belong to
      this subsystem, besides other domains boundary tracking
      like user_enter() et al.
      
      KVM now properly becomes a user of this library, not the
      other buggy way around.
      Reported-by: NKevin Hilman <khilman@linaro.org>
      Reviewed-by: NKevin Hilman <khilman@linaro.org>
      Tested-by: NKevin Hilman <khilman@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      521921ba
  27. 16 5月, 2013 1 次提交
  28. 02 5月, 2013 1 次提交
    • P
      KVM: PPC: Book3S: Add API for in-kernel XICS emulation · 5975a2e0
      Paul Mackerras 提交于
      This adds the API for userspace to instantiate an XICS device in a VM
      and connect VCPUs to it.  The API consists of a new device type for
      the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
      functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
      which is used to assert and deassert interrupt inputs of the XICS.
      
      The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
      Each attribute within this group corresponds to the state of one
      interrupt source.  The attribute number is the same as the interrupt
      source number.
      
      This does not support irq routing or irqfd yet.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5975a2e0
  29. 28 4月, 2013 2 次提交
  30. 27 4月, 2013 3 次提交
    • S
      kvm: destroy emulated devices on VM exit · 07f0a7bd
      Scott Wood 提交于
      The hassle of getting refcounting right was greater than the hassle
      of keeping a list of devices to destroy on VM exit.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      07f0a7bd
    • S
      kvm/ppc/mpic: in-kernel MPIC emulation · 5df554ad
      Scott Wood 提交于
      Hook the MPIC code up to the KVM interfaces, add locking, etc.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5df554ad
    • S
      kvm: add device control API · 852b6d57
      Scott Wood 提交于
      Currently, devices that are emulated inside KVM are configured in a
      hardcoded manner based on an assumption that any given architecture
      only has one way to do it.  If there's any need to access device state,
      it is done through inflexible one-purpose-only IOCTLs (e.g.
      KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
      cumbersome and depletes a limited numberspace.
      
      This API provides a mechanism to instantiate a device of a certain
      type, returning an ID that can be used to set/get attributes of the
      device.  Attributes may include configuration parameters (e.g.
      register base address), device state, operational commands, etc.  It
      is similar to the ONE_REG API, except that it acts on devices rather
      than vcpus.
      
      Both device types and individual attributes can be tested without having
      to create the device or get/set the attribute, without the need for
      separately managing enumerated capabilities.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      852b6d57