1. 22 5月, 2014 1 次提交
  2. 14 5月, 2014 1 次提交
  3. 13 5月, 2014 1 次提交
  4. 08 5月, 2014 1 次提交
    • M
      kvm/x86: implement hv EOI assist · b63cf42f
      Michael S. Tsirkin 提交于
      It seems that it's easy to implement the EOI assist
      on top of the PV EOI feature: simply convert the
      page address to the format expected by PV EOI.
      
      Notes:
      -"No EOI required" is set only if interrupt injected
       is edge triggered; this is true because level interrupts are going
       through IOAPIC which disables PV EOI.
       In any case, if guest triggers EOI the bit will get cleared on exit.
      -For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets
       KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE
       seems sufficient
       In any case, bit is cleared on exit so worst case it's never re-enabled
      -no handling of PV EOI data is performed at HV_X64_MSR_EOI write;
       HV_X64_MSR_EOI is a separate optimization - it's an X2APIC
       replacement that lets you do EOI with an MSR and not IO.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b63cf42f
  5. 06 5月, 2014 2 次提交
  6. 24 4月, 2014 4 次提交
  7. 18 4月, 2014 1 次提交
    • M
      KVM: support any-length wildcard ioeventfd · f848a5a8
      Michael S. Tsirkin 提交于
      It is sometimes benefitial to ignore IO size, and only match on address.
      In hindsight this would have been a better default than matching length
      when KVM_IOEVENTFD_FLAG_DATAMATCH is not set, In particular, this kind
      of access can be optimized on VMX: there no need to do page lookups.
      This can currently be done with many ioeventfds but in a suboptimal way.
      
      However we can't change kernel/userspace ABI without risk of breaking
      some applications.
      Use len = 0 to mean "ignore length for matching" in a more optimal way.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      f848a5a8
  8. 15 4月, 2014 2 次提交
  9. 27 3月, 2014 1 次提交
  10. 20 3月, 2014 1 次提交
    • S
      x86, kvm: Fix CPU hotplug callback registration · 460dd42e
      Srivatsa S. Bhat 提交于
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Instead, the correct and race-free way of performing the callback
      registration is:
      
      	cpu_notifier_register_begin();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* Note the use of the double underscored version of the API */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_notifier_register_done();
      
      Fix the kvm code in x86 by using this latter form of callback registration.
      
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      460dd42e
  11. 17 3月, 2014 2 次提交
    • P
      KVM: x86: handle missing MPX in nested virtualization · 93c4adc7
      Paolo Bonzini 提交于
      When doing nested virtualization, we may be able to read BNDCFGS but
      still not be allowed to write to GUEST_BNDCFGS in the VMCS.  Guard
      writes to the field with vmx_mpx_supported(), and similarly hide the
      MSR from userspace if the processor does not support the field.
      
      We could work around this with the generic MSR save/load machinery,
      but there is only a limited number of MSR save/load slots and it is
      not really worthwhile to waste one for a scenario that should not
      happen except in the nested virtualization case.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      93c4adc7
    • P
      KVM: x86: introduce kvm_supported_xcr0() · 4ff41732
      Paolo Bonzini 提交于
      XSAVE support for KVM is already using host_xcr0 & KVM_SUPPORTED_XCR0 as
      a "dynamic" version of KVM_SUPPORTED_XCR0.
      
      However, this is not enough because the MPX bits should not be presented
      to the guest unless kvm_x86_ops confirms the support.  So, replace all
      instances of host_xcr0 & KVM_SUPPORTED_XCR0 with a new function
      kvm_supported_xcr0() that also has this check.
      
      Note that here:
      
      		if (xstate_bv & ~KVM_SUPPORTED_XCR0)
      			return -EINVAL;
      		if (xstate_bv & ~host_cr0)
      			return -EINVAL;
      
      the code is equivalent to
      
      		if ((xstate_bv & ~KVM_SUPPORTED_XCR0) ||
      		    (xstate_bv & ~host_cr0)
      			return -EINVAL;
      
      i.e. "xstate_bv & (~KVM_SUPPORTED_XCR0 | ~host_cr0)" which is in turn
      equal to "xstate_bv & ~(KVM_SUPPORTED_XCR0 & host_cr0)".  So we should
      also use the new function there.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4ff41732
  12. 13 3月, 2014 1 次提交
    • G
      kvm: x86: ignore ioapic polarity · 100943c5
      Gabriel L. Somlo 提交于
      Both QEMU and KVM have already accumulated a significant number of
      optimizations based on the hard-coded assumption that ioapic polarity
      will always use the ActiveHigh convention, where the logical and
      physical states of level-triggered irq lines always match (i.e.,
      active(asserted) == high == 1, inactive == low == 0). QEMU guests
      are expected to follow directions given via ACPI and configure the
      ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
      guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
      QEMU will still use the ActiveHigh signaling convention when
      interfacing with KVM.
      
      This patch modifies KVM to completely ignore ioapic polarity as set by
      the guest OS, enabling misbehaving guests to work alongside those which
      comply with the ActiveHigh polarity specified by QEMU's ACPI tables.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NGabriel L. Somlo <somlo@cmu.edu>
      [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      100943c5
  13. 11 3月, 2014 4 次提交
    • P
      KVM: x86: Allow the guest to run with dirty debug registers · c77fb5fe
      Paolo Bonzini 提交于
      When not running in guest-debug mode, the guest controls the debug
      registers and having to take an exit for each DR access is a waste
      of time.  If the guest gets into a state where each context switch
      causes DR to be saved and restored, this can take away as much as 40%
      of the execution time from the guest.
      
      After this patch, VMX- and SVM-specific code can set a flag in
      switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
      registers might be dirty and need to be reloaded (syncing will be taken
      care of by a new callback in kvm_x86_ops).  This flag can be set on the
      first access to a debug registers, so that multiple accesses to the
      debug registers only cause one vmexit.
      
      Note that since the guest will be able to read debug registers and
      enable breakpoints in DR7, we need to ensure that they are synchronized
      on entry to the guest---including DR6 that was not synced before.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c77fb5fe
    • P
      KVM: x86: change vcpu->arch.switch_db_regs to a bit mask · 360b948d
      Paolo Bonzini 提交于
      The next patch will add another bit that we can test with the
      same "if".
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      360b948d
    • J
      KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f
      Jan Kiszka 提交于
      It's no longer possible to enter enable_irq_window in guest mode when
      L1 intercepts external interrupts and we are entering L2. This is now
      caught in vcpu_enter_guest. So we can remove the check from the VMX
      version of enable_irq_window, thus the need to return an error code from
      both enable_irq_window and enable_nmi_window.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c9a7953f
    • J
      KVM: nVMX: Rework interception of IRQs and NMIs · b6b8a145
      Jan Kiszka 提交于
      Move the check for leaving L2 on pending and intercepted IRQs or NMIs
      from the *_allowed handler into a dedicated callback. Invoke this
      callback at the relevant points before KVM checks if IRQs/NMIs can be
      injected. The callback has the task to switch from L2 to L1 if needed
      and inject the proper vmexit events.
      
      The rework fixes L2 wakeups from HLT and provides the foundation for
      preemption timer emulation.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b6b8a145
  14. 04 3月, 2014 2 次提交
    • A
      x86: kvm: introduce periodic global clock updates · 332967a3
      Andrew Jones 提交于
      commit 0061d53d introduced a mechanism to execute a global clock
      update for a vm. We can apply this periodically in order to propagate
      host NTP corrections. Also, if all vcpus of a vm are pinned, then
      without an additional trigger, no guest NTP corrections can propagate
      either, as the current trigger is only vcpu cpu migration.
      Signed-off-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      332967a3
    • A
      x86: kvm: rate-limit global clock updates · 7e44e449
      Andrew Jones 提交于
      When we update a vcpu's local clock it may pick up an NTP correction.
      We can't wait an indeterminate amount of time for other vcpus to pick
      up that correction, so commit 0061d53d introduced a global clock
      update. However, we can't request a global clock update on every vcpu
      load either (which is what happens if the tsc is marked as unstable).
      The solution is to rate-limit the global clock updates. Marcelo
      calculated that we should delay the global clock updates no more
      than 0.1s as follows:
      
      Assume an NTP correction c is applied to one vcpu, but not the other,
      then in n seconds the delta of the vcpu system_timestamps will be
      c * n. If we assume a correction of 500ppm (worst-case), then the two
      vcpus will diverge 50us in 0.1s, which is a considerable amount.
      Signed-off-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7e44e449
  15. 28 2月, 2014 2 次提交
    • A
      kvm: x86: fix emulator buffer overflow (CVE-2014-0049) · a08d3b3b
      Andrew Honig 提交于
      The problem occurs when the guest performs a pusha with the stack
      address pointing to an mmio address (or an invalid guest physical
      address) to start with, but then extending into an ordinary guest
      physical address.  When doing repeated emulated pushes
      emulator_read_write sets mmio_needed to 1 on the first one.  On a
      later push when the stack points to regular memory,
      mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0.
      
      As a result, KVM exits to userspace, and then returns to
      complete_emulated_mmio.  In complete_emulated_mmio
      vcpu->mmio_cur_fragment is incremented.  The termination condition of
      vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
      The code bounces back and fourth to userspace incrementing
      mmio_cur_fragment past it's buffer.  If the guest does nothing else it
      eventually leads to a a crash on a memcpy from invalid memory address.
      
      However if a guest code can cause the vm to be destroyed in another
      vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
      can be used by the guest to control the data that's pointed to by the
      call to cancel_work_item, which can be used to gain execution.
      
      Fixes: f78146b0Signed-off-by: NAndrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org (3.5+)
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a08d3b3b
    • T
      KVM: x86: Break kvm_for_each_vcpu loop after finding the VP_INDEX · 684851a1
      Takuya Yoshikawa 提交于
      No need to scan the entire VCPU array.
      Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      684851a1
  16. 26 2月, 2014 3 次提交
  17. 22 2月, 2014 1 次提交
  18. 04 2月, 2014 1 次提交
  19. 30 1月, 2014 1 次提交
  20. 27 1月, 2014 1 次提交
  21. 24 1月, 2014 2 次提交
  22. 17 1月, 2014 3 次提交
    • J
      KVM: SVM: Fix reading of DR6 · 73aaf249
      Jan Kiszka 提交于
      In contrast to VMX, SVM dose not automatically transfer DR6 into the
      VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
      hook to obtain the current value. And as SVM now picks the DR6 state
      from its VMCB, we also need a set callback in order to write updates of
      DR6 back.
      
      Fixes a regression of 020df079.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      73aaf249
    • J
      KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS · 9926c9fd
      Jan Kiszka 提交于
      Whenever we change arch.dr7, we also have to call kvm_update_dr7. In
      case guest debugging is off, this will synchronize the new state into
      hardware.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9926c9fd
    • V
      add support for Hyper-V reference time counter · e984097b
      Vadim Rozenfeld 提交于
      Signed-off: Peter Lieven <pl@kamp.de>
      Signed-off: Gleb Natapov
      Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com>
      
      After some consideration I decided to submit only Hyper-V reference
      counters support this time. I will submit iTSC support as a separate
      patch as soon as it is ready.
      
      v1 -> v2
      1. mark TSC page dirty as suggested by
          Eric Northup <digitaleric@google.com> and Gleb
      2. disable local irq when calling get_kernel_ns,
          as it was done by Peter Lieven <pl@amp.de>
      3. move check for TSC page enable from second patch
          to this one.
      
      v3 -> v4
          Get rid of ref counter offset.
      
      v4 -> v5
          replace __copy_to_user with kvm_write_guest
          when updateing iTSC page.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e984097b
  23. 16 1月, 2014 1 次提交
  24. 15 1月, 2014 1 次提交
    • M
      KVM: x86: fix tsc catchup issue with tsc scaling · f25e656d
      Marcelo Tosatti 提交于
      To fix a problem related to different resolution of TSC and system clock,
      the offset in TSC units is approximated by
      
      delta = vcpu->hv_clock.tsc_timestamp 	- 	vcpu->last_guest_tsc
      
      (Guest TSC value at 			(Guest TSC value at last VM-exit)
      the last kvm_guest_time_update
      call)
      
      Delta is then later scaled using mult,shift pair found in hv_clock
      structure (which is correct against tsc_timestamp in that
      structure).
      
      However, if a frequency change is performed between these two points,
      this delta is measured using different TSC frequencies, but scaled using
      mult,shift pair for one frequency only.
      
      The end result is an incorrect delta.
      
      The bug which this code works around is not the only cause for
      clock backwards events. The global accumulator is still
      necessary, so remove the max_kernel_ns fix and rely on the
      global accumulator for no clock backwards events.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f25e656d