1. 17 Mar 2014 (1 commit)
  2. 13 Mar 2014 (3 commits)
    • kvm: x86: ignore ioapic polarity · 100943c5
      Authored by Gabriel L. Somlo
      Both QEMU and KVM have already accumulated a significant number of
      optimizations based on the hard-coded assumption that ioapic polarity
      will always use the ActiveHigh convention, where the logical and
      physical states of level-triggered irq lines always match (i.e.,
      active(asserted) == high == 1, inactive == low == 0). QEMU guests
      are expected to follow directions given via ACPI and configure the
      ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
      guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
      QEMU will still use the ActiveHigh signaling convention when
      interfacing with KVM.
      
      This patch modifies KVM to completely ignore ioapic polarity as set by
      the guest OS, enabling misbehaving guests to work alongside those which
      comply with the ActiveHigh polarity specified by QEMU's ACPI tables.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
      [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
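
      For concreteness, here is a minimal user-space model of the idea (it is
      not the in-tree ioapic code; the struct and helpers are invented for
      illustration): the polarity bit of a redirection entry is parsed but
      never used to invert the line level, so asserted always means 1.

      /* Illustrative model only, not the in-tree ioapic code. */
      #include <stdbool.h>
      #include <stdio.h>

      struct redir_entry {
              unsigned vector;
              bool polarity_low;      /* "ActiveLow" bit of the redirection entry */
      };

      /* level as passed via KVM_IRQ_LINE: 1 = asserted, 0 = deasserted */
      static bool line_asserted(const struct redir_entry *e, int level)
      {
              (void)e->polarity_low;  /* deliberately ignored, per this patch */
              return level == 1;      /* ActiveHigh convention, always */
      }

      int main(void)
      {
              /* a misbehaving guest programmed ActiveLow; the level is still
               * interpreted as ActiveHigh */
              struct redir_entry e = { .vector = 0x30, .polarity_low = true };
              printf("asserted: %d\n", line_asserted(&e, 1));
              return 0;
      }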
    • KVM: PPC: Book3S HV: Fix register usage when loading/saving VRSAVE · e724f080
      Authored by Paul Mackerras
      Commit 595e4f7e ("KVM: PPC: Book3S HV: Use load/store_fp_state
      functions in HV guest entry/exit") changed the register usage in
      kvmppc_save_fp() and kvmppc_load_fp() but omitted changing the
      instructions that load and save VRSAVE.  The result is that the
      VRSAVE value was loaded from a constant address, and saved to a
      location past the end of the vcpu struct, causing host kernel
      memory corruption and various kinds of host kernel crashes.
      
      This fixes the problem by using register r31, which contains the
      vcpu pointer, instead of r3 and r4.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: PPC: Book3S HV: Remove bogus duplicate code · a5b0ccb0
      Authored by Paul Mackerras
      Commit 7b490411 ("KVM: PPC: Book3S HV: Add new state for
      transactional memory") incorrectly added some duplicate code to the
      guest exit path because I didn't manage to clean up after a rebase
      correctly.  This removes the extraneous material.  The presence of
      this extraneous code causes host crashes whenever a guest is run.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 11 Mar 2014 (11 commits)
    • KVM: svm: Allow the guest to run with dirty debug registers · facb0139
      Authored by Paolo Bonzini
      When not running in guest-debug mode (i.e., the guest controls the debug
      registers), having to take an exit for each DR access is a waste of time.
      If the guest gets into a state where each context switch causes DR to be
      saved and restored, this can take away as much as 40% of the execution
      time from the guest.
      
      If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
      can let it write freely to the debug registers and reload them on the
      next exit.  We still need to exit on the first access, so that the
      KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
      accesses to the debug registers will not cause a vmexit.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: svm: set/clear all DR intercepts in one swoop · 5315c716
      Authored by Paolo Bonzini
      Unlike other intercepts, debug register intercepts will be modified
      in hot paths if the guest OS is bad or otherwise gets tricked into
      doing so.
      
      Avoid calling recalc_intercepts 16 times for debug registers.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
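
      A rough model of the optimization (not the actual arch/x86/kvm/svm.c
      code; the bit layout and helper names are simplified): flip all sixteen
      DR read/write intercept bits in one store and pay for the recalculation
      exactly once.

      #include <stdint.h>

      struct vmcb_control {
              uint32_t intercept_dr;          /* bits for DR0..DR7 read and write */
      };

      static int recalc_calls;                /* stands in for the expensive work */

      static void recalc_intercepts(struct vmcb_control *c)
      {
              (void)c;
              recalc_calls++;
      }

      static void set_dr_intercepts(struct vmcb_control *c)
      {
              c->intercept_dr = 0xffff;       /* intercept every DR read and write */
              recalc_intercepts(c);           /* one recalculation, not sixteen */
      }

      static void clr_dr_intercepts(struct vmcb_control *c)
      {
              c->intercept_dr = 0;            /* the guest may touch DRs freely */
              recalc_intercepts(c);
      }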
    • KVM: nVMX: Allow nested guests to run with dirty debug registers · d16c293e
      Authored by Paolo Bonzini
      When preparing the VMCS02, the CPU-based execution controls are computed
      by vmx_exec_control.  Turn off DR access exits there, too, if the
      KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: Allow the guest to run with dirty debug registers · 81908bf4
      Authored by Paolo Bonzini
      When not running in guest-debug mode (i.e., the guest controls the debug
      registers), having to take an exit for each DR access is a waste of time.
      If the guest gets into a state where each context switch causes DR to be
      saved and restored, this can take away as much as 40% of the execution
      time from the guest.
      
      If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
      can let it write freely to the debug registers and reload them on the
      next exit.  We still need to exit on the first access, so that the
      KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
      accesses to the debug registers will not cause a vmexit.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Allow the guest to run with dirty debug registers · c77fb5fe
      Authored by Paolo Bonzini
      When not running in guest-debug mode, the guest controls the debug
      registers and having to take an exit for each DR access is a waste
      of time.  If the guest gets into a state where each context switch
      causes DR to be saved and restored, this can take away as much as 40%
      of the execution time from the guest.
      
      After this patch, VMX- and SVM-specific code can set a flag in
      switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
      registers might be dirty and need to be reloaded (syncing will be taken
      care of by a new callback in kvm_x86_ops).  This flag can be set on the
      first access to a debug register, so that multiple accesses to the
      debug registers only cause one vmexit.
      
      Note that since the guest will be able to read debug registers and
      enable breakpoints in DR7, we need to ensure that they are synchronized
      on entry to the guest---including DR6 that was not synced before.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
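
      A minimal sketch of the mechanism (a user-space model, not the real
      x86.c; the struct layout, the helper names, and the bit value are
      assumptions): vendor code sets KVM_DEBUGREG_WONT_EXIT on the first DR
      access, and the exit path syncs the possibly dirty registers back
      through the new callback.

      #define KVM_DEBUGREG_WONT_EXIT  (1 << 1)   /* bit value chosen for illustration */

      struct vcpu_sketch {
              unsigned switch_db_regs;           /* now a bit mask, not a plain flag */
              unsigned long db[4], dr6, dr7;
      };

      /* stand-in for the new kvm_x86_ops callback mentioned above */
      static void sync_dirty_debug_regs(struct vcpu_sketch *v)
      {
              (void)v;                           /* would read DR0-DR3 and DR6 from hardware */
      }

      static void first_dr_access(struct vcpu_sketch *v)
      {
              /* vendor code has just cleared the DR intercepts */
              v->switch_db_regs |= KVM_DEBUGREG_WONT_EXIT;
      }

      static void vcpu_exit_path(struct vcpu_sketch *v)
      {
              if (v->switch_db_regs & KVM_DEBUGREG_WONT_EXIT)
                      sync_dirty_debug_regs(v);  /* the guest may have dirtied them */
      }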
    • KVM: x86: change vcpu->arch.switch_db_regs to a bit mask · 360b948d
      Authored by Paolo Bonzini
      The next patch will add another bit that we can test with the
      same "if".
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: we do rely on loading DR7 on entry · c845f9c6
      Authored by Paolo Bonzini
      Currently, this works even if the bit is not in "min", because the bit is always
      set in MSR_IA32_VMX_ENTRY_CTLS.  Mention it for the sake of documentation, and
      to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f
      Authored by Jan Kiszka
      It's no longer possible to enter enable_irq_window in guest mode when
      L1 intercepts external interrupts and we are entering L2. This is now
      caught in vcpu_enter_guest. So we can remove the check from the VMX
      version of enable_irq_window, and with it the need to return an error
      code from both enable_irq_window and enable_nmi_window.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt · 220c5672
      Authored by Jan Kiszka
      According to SDM 27.2.3, IDT vectoring information will not be valid on
      vmexits caused by external NMIs. So we have to avoid creating such
      scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we
      have a pending interrupt because that one would be migrated to L1's IDT
      vectoring info on nested exit.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Fully emulate preemption timer · f4124500
      Authored by Jan Kiszka
      We cannot rely on the hardware-provided preemption timer support because
      we are holding L2 in HLT outside non-root mode. Furthermore, emulating
      the preemption timer will resolve tick rate errata on older Intel CPUs.
      
      The emulation is based on an hrtimer which is started on L2 entry, stopped
      on L2 exit, and evaluated via the new check_nested_events hook. As we no
      longer rely on hardware features, we can enable both the preemption
      timer support and value saving unconditionally.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
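
      A sketch of the shape of the emulation (not the actual vmx.c code; the
      structure and helper names are made up): an hrtimer armed on L2 entry
      with the deadline derived from the VMCS12 preemption timer field,
      cancelled on L2 exit, and polled from check_nested_events.

      #include <linux/hrtimer.h>
      #include <linux/kernel.h>
      #include <linux/ktime.h>
      #include <linux/types.h>

      struct nested_vmx_sketch {
              struct hrtimer preemption_timer;
              bool preemption_timer_expired;     /* consumed by check_nested_events */
      };

      static enum hrtimer_restart preemption_timer_fn(struct hrtimer *t)
      {
              struct nested_vmx_sketch *n =
                      container_of(t, struct nested_vmx_sketch, preemption_timer);

              n->preemption_timer_expired = true;
              /* the real code would also kick the vcpu so HLT in L2 is broken */
              return HRTIMER_NORESTART;
      }

      /* called on L2 entry with the deadline in nanoseconds */
      static void start_preemption_timer_sketch(struct nested_vmx_sketch *n, u64 ns)
      {
              hrtimer_init(&n->preemption_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
              n->preemption_timer.function = preemption_timer_fn;
              n->preemption_timer_expired = false;
              hrtimer_start(&n->preemption_timer, ns_to_ktime(ns), HRTIMER_MODE_REL);
      }

      /* called on L2 exit */
      static void stop_preemption_timer_sketch(struct nested_vmx_sketch *n)
      {
              hrtimer_cancel(&n->preemption_timer);
      }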
    • KVM: nVMX: Rework interception of IRQs and NMIs · b6b8a145
      Authored by Jan Kiszka
      Move the check for leaving L2 on pending and intercepted IRQs or NMIs
      from the *_allowed handler into a dedicated callback. Invoke this
      callback at the relevant points before KVM checks if IRQs/NMIs can be
      injected. The callback's job is to switch from L2 to L1 if needed
      and to inject the proper vmexit events.
      
      The rework fixes L2 wakeups from HLT and provides the foundation for
      preemption timer emulation.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
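
      A simplified model of the reworked flow (not the actual x86.c/vmx.c
      code; the ops structure and the return-value convention below are
      assumptions): before testing whether an IRQ or NMI can be injected,
      the dedicated nested callback gets a chance to perform the L2-to-L1
      vmexit itself.

      #include <stdbool.h>

      struct vcpu_sketch;

      struct nested_ops_sketch {
              /* returns nonzero if we switched to L1; injection is then retried */
              int  (*check_nested_events)(struct vcpu_sketch *v, bool external_intr);
              bool (*interrupt_allowed)(struct vcpu_sketch *v);
              void (*inject_irq)(struct vcpu_sketch *v);
      };

      static void inject_pending_event(struct vcpu_sketch *v,
                                       const struct nested_ops_sketch *ops,
                                       bool irq_pending, bool external_intr)
      {
              /* Let the nested hypervisor intercept the event first: the callback
               * switches from L2 to L1 and queues the vmexit when L1 wants it. */
              if (ops->check_nested_events &&
                  ops->check_nested_events(v, external_intr) != 0)
                      return;

              /* Only now test plain injectability; *_allowed no longer has to
               * worry about leaving L2, which is what fixes L2 wakeup from HLT. */
              if (irq_pending && ops->interrupt_allowed(v))
                      ops->inject_irq(v);
      }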
  4. 04 Mar 2014 (9 commits)
  5. 03 Mar 2014 (13 commits)
  6. 28 Feb 2014 (3 commits)
    • kvm, vmx: Really fix lazy FPU on nested guest · 1b385cbd
      Authored by Paolo Bonzini
      Commit e504c909 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
      highlighted a real problem, but the fix was subtly wrong.
      
      nested_read_cr0 is the CR0 as read by L2, but here we want to look at
      the CR0 value reflecting L1's setup.  In other words, L2 might think
      that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
      running it with TS=1, we should inject the fault into L1.
      
      The effective value of CR0 in L2 is contained in vmcs12->guest_cr0;
      use it.
      
      Fixes: e504c909
      Reported-by: Kashyap Chamarty <kchamart@redhat.com>
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Tested-by: Kashyap Chamarty <kchamart@redhat.com>
      Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
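
      An illustrative fragment (not the actual nested-VMX code; the struct
      and helper are simplified): the test now looks at TS in the CR0 value
      that L1 programmed into vmcs12->guest_cr0, not the CR0 value that L2
      reads.

      #include <stdbool.h>
      #include <stdint.h>

      #define X86_CR0_TS (1UL << 3)            /* task-switched bit of CR0 */

      struct vmcs12_sketch {
              uint64_t guest_cr0;              /* CR0 that L1 set up for L2 */
      };

      /* Per the reasoning above: decide using the CR0 value reflecting L1's
       * setup (vmcs12->guest_cr0), not the value returned by nested_read_cr0. */
      static bool fault_belongs_to_l1(const struct vmcs12_sketch *vmcs12)
      {
              return (vmcs12->guest_cr0 & X86_CR0_TS) != 0;
      }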
    • kvm: x86: fix emulator buffer overflow (CVE-2014-0049) · a08d3b3b
      Authored by Andrew Honig
      The problem occurs when the guest performs a pusha with a stack
      address that starts out pointing to an mmio address (or an invalid
      guest physical address) but then extends into an ordinary guest
      physical address.  When doing repeated emulated pushes,
      emulator_read_write sets mmio_needed to 1 on the first one.  On a
      later push, when the stack points to regular memory,
      mmio_nr_fragments is set to 0, but mmio_needed is not set to 0.
      
      As a result, KVM exits to userspace, and then returns to
      complete_emulated_mmio.  In complete_emulated_mmio,
      vcpu->mmio_cur_fragment is incremented.  The termination condition of
      vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
      The code bounces back and forth to userspace, incrementing
      mmio_cur_fragment past its buffer.  If the guest does nothing else, it
      eventually leads to a crash on a memcpy from an invalid memory address.
      
      However, if guest code can cause the VM to be destroyed in another
      vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
      can be used by the guest to control the data that's pointed to by the
      call to cancel_work_item, which can be used to gain execution.
      
      Fixes: f78146b0
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org (3.5+)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
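
      A user-space model of the failure mode (not the real KVM code; the
      bound check at the top is purely illustrative and the actual patch may
      differ in detail): the completion path only stops when cur_fragment
      reaches nr_fragments, so if nr_fragments is zeroed while the "needed"
      flag stays set, cur_fragment walks past the fragment array.

      #include <assert.h>

      #define NR_FRAGMENTS 2

      struct mmio_state {
              int needed;                      /* models vcpu->mmio_needed */
              int cur_fragment;                /* models vcpu->mmio_cur_fragment */
              int nr_fragments;                /* models vcpu->mmio_nr_fragments */
              char fragments[NR_FRAGMENTS];
      };

      /* returns 1 while another round-trip to userspace is still required */
      static int complete_emulated_mmio_model(struct mmio_state *s)
      {
              /* illustrative guard: without a bound check, cur_fragment keeps
               * growing once nr_fragments has been reset to 0 under our feet */
              assert(s->cur_fragment < NR_FRAGMENTS);

              s->cur_fragment++;
              if (s->cur_fragment >= s->nr_fragments) {   /* '>=' rather than '==' */
                      s->needed = 0;                      /* terminate cleanly */
                      return 0;
              }
              return 1;
      }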
    • arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT · b20c9f29
      Authored by Marc Zyngier
      Commit 1fcf7ce0 (arm: kvm: implement CPU PM notifier) added
      support for CPU power-management, using a cpu_notifier to re-init
      KVM on a CPU that entered CPU idle.
      
      The code assumed that a CPU entering idle would actually be powered
      off, losing its state entirely, and would then need to be
      reinitialized. It turns out that this is not always the case, and
      some HW performs CPU PM without actually killing the core. In this
      case, we try to reinitialize KVM while it is still live. It ends up
      badly, as reported by Andre Przywara (using a Calxeda Midway):
      
      [    3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
      [    3.663897] unexpected data abort in Hyp mode at: 0xc067d150
      [    3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0
      
      The trick here is to detect if we've been through a full re-init or
      not by looking at HVBAR (VBAR_EL2 on arm64). This involves
      implementing the backend for __hyp_get_vectors in the main KVM HYP
      code (rather small), and checking the return value against the
      default one when the CPU notifier is called on CPU_PM_EXIT.
      Reported-by: Andre Przywara <osp@andrep.de>
      Tested-by: Andre Przywara <osp@andrep.de>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Rob Herring <rob.herring@linaro.org>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
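
      A sketch of the detection described above (not the exact arch/arm/kvm
      code; the extern declarations stand in for the real symbols): on
      CPU_PM_EXIT, Hyp mode is re-initialized only if the vector base
      register still points at the default boot-time stub, i.e. the core
      really did lose its state.

      #include <linux/cpu_pm.h>
      #include <linux/notifier.h>

      extern unsigned long __hyp_get_vectors(void);   /* reads HVBAR / VBAR_EL2 */
      extern unsigned long hyp_default_vectors;       /* address of the boot-time stub */
      extern void cpu_init_hyp_mode(void *discard);   /* full KVM Hyp re-init */

      static int hyp_cpu_pm_sketch(struct notifier_block *self,
                                   unsigned long cmd, void *v)
      {
              if (cmd == CPU_PM_EXIT &&
                  __hyp_get_vectors() == hyp_default_vectors) {
                      /* HVBAR is back to the stub: the core was really reset,
                       * so KVM's Hyp-mode state has to be installed again. */
                      cpu_init_hyp_mode(NULL);
                      return NOTIFY_OK;
              }

              return NOTIFY_DONE;
      }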