1. 23 7月, 2016 2 次提交
  2. 19 7月, 2016 1 次提交
  3. 16 6月, 2016 2 次提交
    • P
      KVM: remove kvm_vcpu_compatible · 557abc40
      Paolo Bonzini 提交于
      The new created_vcpus field makes it possible to avoid the race between
      irqchip and VCPU creation in a much nicer way; just check under kvm->lock
      whether a VCPU has already been created.
      
      We can then remove KVM_APIC_ARCHITECTURE too, because at this point the
      symbol is only governing the default definition of kvm_vcpu_compatible.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      557abc40
    • P
      KVM: introduce kvm->created_vcpus · 6c7caebc
      Paolo Bonzini 提交于
      The race between creating the irqchip and the first VCPU is
      currently fixed by checking the presence of an irqchip before
      updating kvm->online_vcpus, and undoing the whole VCPU creation
      if someone created the irqchip in the meanwhile.
      
      Instead, introduce a new field in struct kvm that will count VCPUs
      under a mutex, without the atomic access and memory ordering that we
      need elsewhere to protect the vcpus array.  This also plugs the race
      and is more easily applicable in all similar circumstances.
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6c7caebc
  4. 25 5月, 2016 1 次提交
    • J
      KVM: Create debugfs dir and stat files for each VM · 536a6f88
      Janosch Frank 提交于
      This patch adds a kvm debugfs subdirectory for each VM, which is named
      after its pid and file descriptor. The directories contain the same
      kind of files that are already in the kvm debugfs directory, but the
      data exported through them is now VM specific.
      
      This makes the debugfs kvm data a convenient alternative to the
      tracepoints which already have per VM data. The debugfs data is easy
      to read and low overhead.
      
      CC: Dan Carpenter <dan.carpenter@oracle.com> [includes fixes by Dan Carpenter]
      Signed-off-by: NJanosch Frank <frankja@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      536a6f88
  5. 19 5月, 2016 1 次提交
  6. 13 5月, 2016 1 次提交
    • C
      KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger 提交于
      Some wakeups should not be considered a sucessful poll. For example on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transactional like workload, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups if they
      should be considered a good/bad wakeups in regard to polls.
      
      For s390 the implementation will fence of halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
      we prefer to not poll, unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3491caf2
  7. 12 5月, 2016 3 次提交
  8. 20 4月, 2016 1 次提交
    • P
      KVM: add missing memory barrier in kvm_{make,check}_request · 2e4682ba
      Paolo Bonzini 提交于
      kvm_make_request and kvm_check_request imply a producer-consumer
      relationship; add implicit memory barriers to them.  There was indeed
      already a place that was adding an explicit smp_mb() to order between
      kvm_check_request and the processing of the request.  That memory
      barrier can be removed (as an added benefit, kvm_check_request can use
      smp_mb__after_atomic which is free on x86).
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2e4682ba
  9. 25 2月, 2016 1 次提交
    • M
      KVM: Use simple waitqueue for vcpu->wq · 8577370f
      Marcelo Tosatti 提交于
      The problem:
      
      On -rt, an emulated LAPIC timer instances has the following path:
      
      1) hard interrupt
      2) ksoftirqd is scheduled
      3) ksoftirqd wakes up vcpu thread
      4) vcpu thread is scheduled
      
      This extra context switch introduces unnecessary latency in the
      LAPIC path for a KVM guest.
      
      The solution:
      
      Allow waking up vcpu thread from hardirq context,
      thus avoiding the need for ksoftirqd to be scheduled.
      
      Normal waitqueues make use of spinlocks, which on -RT
      are sleepable locks. Therefore, waking up a waitqueue
      waiter involves locking a sleeping lock, which
      is not allowed from hard interrupt context.
      
      cyclictest command line:
      
      This patch reduces the average latency in my tests from 14us to 11us.
      
      Daniel writes:
      Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
      benchmark on mainline. The test was run 1000 times on
      tip/sched/core 4.4.0-rc8-01134-g0905f04e:
      
        ./x86-run x86/tscdeadline_latency.flat -cpu host
      
      with idle=poll.
      
      The test seems not to deliver really stable numbers though most of
      them are smaller. Paolo write:
      
      "Anything above ~10000 cycles means that the host went to C1 or
      lower---the number means more or less nothing in that case.
      
      The mean shows an improvement indeed."
      
      Before:
      
                     min             max         mean           std
      count  1000.000000     1000.000000  1000.000000   1000.000000
      mean   5162.596000  2019270.084000  5824.491541  20681.645558
      std      75.431231   622607.723969    89.575700   6492.272062
      min    4466.000000    23928.000000  5537.926500    585.864966
      25%    5163.000000  16132529.750000  5790.132275  16683.745433
      50%    5175.000000  2281919.000000  5834.654000  23151.990026
      75%    5190.000000  2382865.750000  5861.412950  24148.206168
      max    5228.000000  4175158.000000  6254.827300  46481.048691
      
      After
                     min            max         mean           std
      count  1000.000000     1000.00000  1000.000000   1000.000000
      mean   5143.511000  2076886.10300  5813.312474  21207.357565
      std      77.668322   610413.09583    86.541500   6331.915127
      min    4427.000000    25103.00000  5529.756600    559.187707
      25%    5148.000000  1691272.75000  5784.889825  17473.518244
      50%    5160.000000  2308328.50000  5832.025000  23464.837068
      75%    5172.000000  2393037.75000  5853.177675  24223.969976
      max    5222.000000  3922458.00000  6186.720500  42520.379830
      
      [Patch was originaly based on the swait implementation found in the -rt
       tree. Daniel ported it to mainline's version and gathered the
       benchmark numbers for tscdeadline_latency test.]
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8577370f
  10. 16 1月, 2016 1 次提交
    • D
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba049e93
  11. 09 1月, 2016 4 次提交
  12. 17 12月, 2015 3 次提交
    • B
      kvm: Dump guest rIP when the guest tried something unsupported · 671d9ab3
      Borislav Petkov 提交于
      It looks like this in action:
      
        kvm [5197]: vcpu0, guest rIP: 0xffffffff810187ba unhandled rdmsr: 0xc001102
      
      and helps to pinpoint quickly where in the guest we did the unsupported
      thing.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      671d9ab3
    • A
      kvm/x86: Hyper-V SynIC timers · 1f4b34f8
      Andrey Smetanin 提交于
      Per Hyper-V specification (and as required by Hyper-V-aware guests),
      SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
      of MSRs, and signals expiration by delivering a special format message
      to the configured SynIC message slot and triggering the corresponding
      synthetic interrupt.
      
      Note: as implemented by this patch, all periodic timers are "lazy"
      (i.e. if the vCPU wasn't scheduled for more than the timer period the
      timer events are lost), regardless of the corresponding configuration
      MSR.  If deemed necessary, the "catch up" mode (the timer period is
      shortened until the timer catches up) will be implemented later.
      
      Changes v2:
      * Use remainder to calculate periodic timer expiration time
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Vitaly Kuznetsov <vkuznets@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1f4b34f8
    • A
      kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack · 765eaa0f
      Andrey Smetanin 提交于
      The SynIC message protocol mandates that the message slot is claimed
      by atomically setting message type to something other than HVMSG_NONE.
      If another message is to be delivered while the slot is still busy,
      message pending flag is asserted to indicate to the guest that the
      hypervisor wants to be notified when the slot is released.
      
      To make sure the protocol works regardless of where the message
      sources are (kernel or userspace), clear the pending flag on SINT ACK
      notification, and let the message sources compete for the slot again.
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Vitaly Kuznetsov <vkuznets@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      765eaa0f
  13. 30 11月, 2015 2 次提交
  14. 26 11月, 2015 4 次提交
    • Y
      KVM: kvm_is_visible_gfn can be boolean · 33e94154
      Yaowei Bai 提交于
      This patch makes kvm_is_visible_gfn return bool due to this particular
      function only using either one or zero as its return value.
      
      No functional change.
      Signed-off-by: NYaowei Bai <baiyaowei@cmss.chinamobile.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      33e94154
    • A
      kvm/x86: Hyper-V kvm exit · db397571
      Andrey Smetanin 提交于
      A new vcpu exit is introduced to notify the userspace of the
      changes in Hyper-V SynIC configuration triggered by guest writing to the
      corresponding MSRs.
      
      Changes v4:
      * exit into userspace only if guest writes into SynIC MSR's
      
      Changes v3:
      * added KVM_EXIT_HYPERV types and structs notes into docs
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      db397571
    • A
      kvm/x86: Hyper-V synthetic interrupt controller · 5c919412
      Andrey Smetanin 提交于
      SynIC (synthetic interrupt controller) is a lapic extension,
      which is controlled via MSRs and maintains for each vCPU
       - 16 synthetic interrupt "lines" (SINT's); each can be configured to
         trigger a specific interrupt vector optionally with auto-EOI
         semantics
       - a message page in the guest memory with 16 256-byte per-SINT message
         slots
       - an event flag page in the guest memory with 16 2048-bit per-SINT
         event flag areas
      
      The host triggers a SINT whenever it delivers a new message to the
      corresponding slot or flips an event flag bit in the corresponding area.
      The guest informs the host that it can try delivering a message by
      explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
      MSR.
      
      The userspace (qemu) triggers interrupts and receives EOM notifications
      via irqfd with resampler; for that, a GSI is allocated for each
      configured SINT, and irq_routing api is extended to support GSI-SINT
      mapping.
      
      Changes v4:
      * added activation of SynIC by vcpu KVM_ENABLE_CAP
      * added per SynIC active flag
      * added deactivation of APICv upon SynIC activation
      
      Changes v3:
      * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
      docs
      
      Changes v2:
      * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
      * add Hyper-V SynIC vectors into EOI exit bitmap
      * Hyper-V SyniIC SINT msr write logic simplified
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5c919412
    • A
      kvm/irqchip: kvm_arch_irq_routing_update renaming split · abdb080f
      Andrey Smetanin 提交于
      Actually kvm_arch_irq_routing_update() should be
      kvm_arch_post_irq_routing_update() as it's called at the end
      of irq routing update.
      
      This renaming frees kvm_arch_irq_routing_update function name.
      kvm_arch_irq_routing_update() weak function which will be used
      to update mappings for arch-specific irq routing entries
      (in particular, the upcoming Hyper-V synthetic interrupts).
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      abdb080f
  15. 19 11月, 2015 1 次提交
  16. 10 11月, 2015 1 次提交
  17. 04 11月, 2015 2 次提交
  18. 23 10月, 2015 1 次提交
  19. 16 10月, 2015 2 次提交
  20. 01 10月, 2015 6 次提交