1. 09 Jan, 2015 3 commits
  2. 04 Dec, 2014 7 commits
  3. 24 Nov, 2014 1 commit
  4. 17 Nov, 2014 3 commits
    • KVM: x86: Fix lost interrupt on irr_pending race · f210f757
      Committed by Nadav Amit
      apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is
      set.  If this assumption is broken and apicv is disabled, the injection of
      interrupts may be deferred until another interrupt is delivered to the guest.
      Ultimately, if no other interrupt should be injected to that vCPU, the pending
      interrupt may be lost.
      
      commit 56cc2406 ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv
      is in use") changed the behavior of apic_clear_irr so irr_pending is cleared
      after setting APIC_IRR vector. After this commit, if apic_set_irr and
      apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR
      vector set, and irr_pending cleared. In the following example, assume a single
      vector is set in IRR prior to calling apic_clear_irr:
      
      apic_set_irr				apic_clear_irr
      ------------				--------------
      apic->irr_pending = true;
      					apic_clear_vector(...);
      					vec = apic_search_irr(apic);
      					// => vec == -1
      apic_set_vector(...);
      					apic->irr_pending = (vec != -1);
      					// => apic->irr_pending == false
      
      Nonetheless, it appears the race might even occur prior to this commit:
      
      apic_set_irr				apic_clear_irr
      ------------				--------------
      apic->irr_pending = true;
      					apic->irr_pending = false;
      					apic_clear_vector(...);
      					if (apic_search_irr(apic) != -1)
      						apic->irr_pending = true;
      					// => apic->irr_pending == false
      apic_set_vector(...);
      
      Fix this issue by:
      1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call
         apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending.
      2. On apic_set_irr: first call apic_set_vector, then set irr_pending.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
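The ordering described in the two fix steps above can be sketched as follows. This is a minimal single-threaded model, not the kernel code: `struct toy_apic` and the `toy_*` helpers are hypothetical names, and the real implementation uses atomic bit operations on the APIC register page. The point is only the order of operations, which appears intended to let `irr_pending` be spuriously true but never false while a vector is still set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Hypothetical, simplified stand-in for the in-kernel APIC state. */
struct toy_apic {
    uint32_t irr[NR_VECTORS / 32]; /* one bit per pending vector */
    bool irr_pending;              /* hint: "IRR may be non-empty" */
};

/* Returns the highest vector set in IRR, or -1 if IRR is empty. */
static int toy_search_irr(const struct toy_apic *apic)
{
    for (int vec = NR_VECTORS - 1; vec >= 0; vec--)
        if (apic->irr[vec / 32] & (1u << (vec % 32)))
            return vec;
    return -1;
}

/* Fix step 2: set the vector first, then raise the hint. */
static void toy_set_irr(struct toy_apic *apic, int vec)
{
    assert(vec >= 0 && vec < NR_VECTORS);
    apic->irr[vec / 32] |= 1u << (vec % 32);
    apic->irr_pending = true;
}

/* Fix step 1: drop the hint, clear the vector, re-raise the hint if needed. */
static void toy_clear_irr(struct toy_apic *apic, int vec)
{
    assert(vec >= 0 && vec < NR_VECTORS);
    apic->irr_pending = false;
    apic->irr[vec / 32] &= ~(1u << (vec % 32));
    if (toy_search_irr(apic) != -1)
        apic->irr_pending = true;
}
```

With this order, a setter that runs between any two steps of the clearer ends by writing `irr_pending = true` after its vector is visible, so the hint cannot be left false with a vector set by that setter.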
    • KVM: compute correct map even if all APICs are software disabled · a3e339e1
      Committed by Paolo Bonzini
      Logical destination mode can be used to send NMI IPIs even when all
      APICs are software disabled, so if all APICs are software disabled we
      should still look at the DFRs.
      
      So the DFRs should all be the same, even if some or all APICs are
      software disabled.  However, the SDM does not say this, so tweak
      the logic as follows:
      
      - if one APIC is enabled and has LDR != 0, use that one to build the map.
      This picks the right DFR in case an OS is only setting it for the
      software-enabled APICs, or in case an OS is using logical addressing
      on some APICs while leaving the rest in reset state (using LDR was
      suggested by Radim).
      
      - if all APICs are disabled, pick a random one to build the map.
      We use the last one with LDR != 0 for simplicity.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
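The selection rule in the two bullets above might look roughly like this. It is a sketch under assumptions: `struct toy_lapic` and `toy_pick_map_source` are hypothetical names, and choosing the first enabled APIC with LDR != 0 (rather than any particular one) is an assumption, since the message only says "one".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU view of the registers that matter here. */
struct toy_lapic {
    bool sw_enabled; /* software-enable bit from SPIV */
    uint32_t ldr;    /* logical destination register */
    uint32_t dfr;    /* destination format register, read by the caller */
};

/*
 * Returns the index of the APIC whose DFR should define the map,
 * or -1 if no APIC has LDR != 0.
 */
static int toy_pick_map_source(const struct toy_lapic *apics, size_t n)
{
    int fallback = -1;

    assert(apics != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (apics[i].ldr == 0)
            continue;          /* still in reset state, says nothing */
        if (apics[i].sw_enabled)
            return (int)i;     /* an enabled APIC with LDR != 0 wins */
        fallback = (int)i;     /* else remember the last disabled one */
    }
    return fallback;
}
```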
    • KVM: x86: Software disabled APIC should still deliver NMIs · 173beedc
      Committed by Nadav Amit
      Currently, the APIC logical map does not consider VCPUs whose local-apic is
      software-disabled.  However, NMIs, INIT, etc. should still be delivered to such
      VCPUs. Therefore, the APIC mode should first be determined, and then the map,
      considering all VCPUs, should be constructed.
      
      To address this issue, first find the APIC mode, and only then construct the
      logical map.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 08 Nov, 2014 1 commit
  6. 07 Nov, 2014 1 commit
  7. 03 Nov, 2014 6 commits
    • KVM: x86: optimize some accesses to LVTT and SPIV · f30ebc31
      Committed by Radim Krčmář
      We mirror a subset of these registers in separate variables.
      Using them directly should be faster.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: detect LVTT changes under APICv · a323b409
      Committed by Radim Krčmář
      APIC-write VM exits are "trap-like": they save CS:RIP values for the
      instruction after the write, and more importantly, the handler will
      already see the new value in the virtual-APIC page.  This means that
      apic_reg_write cannot use kvm_apic_get_reg to omit timer cancelation
      when mode changes.
      
      timer_mode_mask shouldn't be changing as it depends on cpuid.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: detect SPIV changes under APICv · e462755c
      Committed by Radim Krčmář
      APIC-write VM exits are "trap-like": they save CS:RIP values for the
      instruction after the write, and more importantly, the handler will
      already see the new value in the virtual-APIC page.
      
      This caused a bug if you used KVM_SET_IRQCHIP to set the SW-enabled bit
      in the SPIV register.  The chain of events is as follows:
      
      * When the irqchip is added to the destination VM, the apic_sw_disabled
      static key is incremented (1)
      
      * When the KVM_SET_IRQCHIP ioctl is invoked, it is decremented (0)
      
      * When the guest disables the bit in the SPIV register, e.g. as part of
      shutdown, apic_set_spiv does not notice the change and the static key is
      _not_ incremented.
      
      * When the guest is destroyed, the static key is decremented (-1),
      resulting in this trace:
      
        WARNING: at kernel/jump_label.c:81 __static_key_slow_dec+0xa6/0xb0()
        jump label: negative count!
      
        [<ffffffff816bf898>] dump_stack+0x19/0x1b
        [<ffffffff8107c6f1>] warn_slowpath_common+0x61/0x80
        [<ffffffff8107c76c>] warn_slowpath_fmt+0x5c/0x80
        [<ffffffff811931e6>] __static_key_slow_dec+0xa6/0xb0
        [<ffffffff81193226>] static_key_slow_dec_deferred+0x16/0x20
        [<ffffffffa0637698>] kvm_free_lapic+0x88/0xa0 [kvm]
        [<ffffffffa061c63e>] kvm_arch_vcpu_uninit+0x2e/0xe0 [kvm]
        [<ffffffffa05ff301>] kvm_vcpu_uninit+0x21/0x40 [kvm]
        [<ffffffffa067cec7>] vmx_free_vcpu+0x47/0x70 [kvm_intel]
        [<ffffffffa061bc50>] kvm_arch_vcpu_free+0x50/0x60 [kvm]
        [<ffffffffa061ca22>] kvm_arch_destroy_vm+0x102/0x260 [kvm]
        [<ffffffff810b68fd>] ? synchronize_srcu+0x1d/0x20
        [<ffffffffa06030d1>] kvm_put_kvm+0xe1/0x1c0 [kvm]
        [<ffffffffa06036f8>] kvm_vcpu_release+0x18/0x20 [kvm]
        [<ffffffff81215c62>] __fput+0x102/0x310
        [<ffffffff81215f4e>] ____fput+0xe/0x10
        [<ffffffff810ab664>] task_work_run+0xb4/0xe0
        [<ffffffff81083944>] do_exit+0x304/0xc60
        [<ffffffff816c8dfc>] ? _raw_spin_unlock_irq+0x2c/0x50
        [<ffffffff810fd22d>] ?  trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff8108432c>] do_group_exit+0x4c/0xc0
        [<ffffffff810843b4>] SyS_exit_group+0x14/0x20
        [<ffffffff816d33a9>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
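The accounting problem above can be illustrated with a small model. This is a sketch, not the kernel code: `toy_sw_disabled_count` stands in for the `apic_sw_disabled` static key, and `struct toy_apic` and `toy_set_spiv` are hypothetical. The idea shown is to detect a change by comparing the new value against a mirrored `sw_enabled` flag instead of re-reading the register, because with a trap-like APIC-write exit the register already holds the new value when the handler runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SPIV_APIC_ENABLED (1u << 8) /* software-enable bit in SPIV */

/* Stand-in for the static key: number of software-disabled APICs. */
static int toy_sw_disabled_count;

struct toy_apic {
    uint32_t spiv_reg; /* may already hold the new value on a trap-like exit */
    bool sw_enabled;   /* mirrored state, updated only by toy_set_spiv() */
};

static void toy_set_spiv(struct toy_apic *apic, uint32_t val)
{
    bool enabled = (val & TOY_SPIV_APIC_ENABLED) != 0;

    assert(apic != NULL);
    apic->spiv_reg = val;
    /* Compare against the mirror, never against spiv_reg. */
    if (enabled != apic->sw_enabled) {
        apic->sw_enabled = enabled;
        if (enabled)
            toy_sw_disabled_count--;
        else
            toy_sw_disabled_count++;
    }
}
```

If the handler compared `val` with `spiv_reg` instead, a guest write that had already landed in the register would look like "no change", which is the missed increment described in the chain of events.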
    • KVM: x86: fix deadline tsc interrupt injection · 1e0ad70c
      Committed by Radim Krčmář
      The check in kvm_set_lapic_tscdeadline_msr() was trying to prevent a
      situation where we lose a pending deadline timer in an MSR write.
      Losing it is fine, because it effectively occurs before the timer fired,
      so we should be able to cancel or postpone it.
      
      Another problem comes from interaction with QEMU, or other userspace
      that can set deadline MSR without a good reason, when timer is already
      pending:  one guest's deadline request results in more than one
      interrupt because one is injected immediately on MSR write from
      userspace and one through hrtimer later.
      
      The solution is to remove the injection when replacing a pending timer.
      To improve the usual QEMU path, we also inject without a hrtimer when
      the deadline has already passed.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Reported-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: add apic_timer_expired() · 5d87db71
      Committed by Radim Krčmář
      Make the code reusable.
      
      If the timer was already pending, we shouldn't be waiting in a queue,
      so wake_up can be skipped, simplifying the path.
      
      There is no 'reinject' case => the comment is removed.
      Current race behaves correctly.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: some apic broadcast modes does not work · 394457a9
      Committed by Nadav Amit
      KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
      (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
      FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
      physical destination modes."
      
      In addition, the local-apic enables cluster mode broadcast. As Intel SDM
      10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all
      destination bits to one." This patch enables cluster mode broadcast.
      
      The fix tries to handle broadcast in the different modes through unified code.
      
      One rare case occurs when the source of the IPI has its APIC disabled.  In such
      a case, the source can still issue IPIs, but since the source is not obliged to
      have the same LAPIC mode as the enabled ones, we cannot rely on it.
      Since it is a rare case, it is unoptimized and done on the slow path.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      [As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from
       kvm_apic_broadcast. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
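A unified broadcast check along the lines of the message could be sketched like this. The names `toy_apic_broadcast`, `TOY_APIC_BROADCAST` and `TOY_X2APIC_BROADCAST` are hypothetical; the values follow the SDM quotes above (FFFF_FFFFH in x2APIC mode, all destination bits set for the 8-bit xAPIC destination field), and the `x2apic_ipi` parameter is an assumption about how the caller tells the two cases apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_APIC_BROADCAST   0xFFu        /* all 8 xAPIC destination bits set */
#define TOY_X2APIC_BROADCAST 0xFFFFFFFFu  /* unsigned, per the review note */

/*
 * Returns true if dest_id addresses every local APIC. The same test
 * serves physical and logical destination modes; only the width of
 * the destination field differs between xAPIC and x2APIC.
 */
static bool toy_apic_broadcast(bool x2apic_ipi, uint32_t dest_id)
{
    /* An xAPIC destination wider than 8 bits is a caller bug. */
    assert(x2apic_ipi || dest_id <= 0xFFu);
    return dest_id == (x2apic_ipi ? TOY_X2APIC_BROADCAST : TOY_APIC_BROADCAST);
}
```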
  8. 11 Sep, 2014 1 commit
  9. 19 Aug, 2014 2 commits
  10. 05 Aug, 2014 1 commit
    • KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use · 56cc2406
      Committed by Wanpeng Li
      After commit 77b0f5d6 (KVM: nVMX: Ack and write vector info to intr_info
      if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
      emulated. To do so, KVM will ask the APIC for the interrupt vector
      during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set.  With APICv,
      kvm_get_apic_interrupt would return -1 and give the following WARNING:
      
      Call Trace:
       [<ffffffff81493563>] dump_stack+0x49/0x5e
       [<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
       [<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
       [<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
       [<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
       [<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
       [<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
       [<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
       [<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
       [<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]
      
      To fix this, we cannot rely on the processor's virtual interrupt delivery,
      because "acknowledge interrupt on exit" must only update the virtual
      ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR)
      but it should not deliver the interrupt through the IDT.  Thus, KVM has
      to deliver the interrupt "by hand", similar to the treatment of EOI in
      commit fc57ac2c (KVM: lapic: sync highest ISR to hardware apic on
      EOI, 2014-05-14).
      
      The patch modifies kvm_cpu_get_interrupt to always acknowledge an
      interrupt; there are only two callers, and the other is not affected
      because it is never reached with kvm_apic_vid_enabled() == true.  Then it
      modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition
      to the registers.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
      Tested-by: Liu, RongrongX <rongrongx.liu@intel.com>
      Tested-by: Felipe Reyes <freyes@suse.com>
      Fixes: 77b0f5d6
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 10 Jul, 2014 1 commit
  12. 27 May, 2014 1 commit
    • KVM: lapic: sync highest ISR to hardware apic on EOI · fc57ac2c
      Committed by Paolo Bonzini
      When Hyper-V enlightenments are in effect, Windows prefers to issue a
      Hyper-V MSR write to signal an EOI rather than an x2apic MSR write.
      The Hyper-V MSR write is not handled by the processor, and besides
      being slower, this also causes bugs with APIC virtualization.  The
      reason is that on EOI the processor will modify the highest in-service
      interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of
      the SDM; every other step in EOI virtualization is already done by
      apic_send_eoi or on VM entry, but this one is missing.
      
      We need to do the same, and be careful not to muck with the isr_count
      and highest_isr_cache fields that are unused when virtual interrupt
      delivery is enabled.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. 15 Jan, 2014 2 commits
  14. 09 Jan, 2014 1 commit
  15. 31 Dec, 2013 1 commit
  16. 13 Dec, 2013 3 commits
    • KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) · 17d68b76
      Committed by Gleb Natapov
      A guest can cause a BUG_ON() leading to a host kernel crash.
      When the guest writes to the ICR to request an IPI while in x2apic
      mode, the destination is read from ICR2, which is a register that the
      guest can control.
      
      kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
      cluster id.  This triggers a BUG_ON, which guards against an
      out-of-bounds access to map->logical_map and so keeps anything really
      unsafe from occurring.
      
      The logic in the code is correct from real HW point of view. The problem
      is that KVM supports only one cluster with ID 0 in clustered mode, but
      the code that has the bug does not take this into account.
      Reported-by: Lars Bull <larsbull@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
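One way to treat a guest-controlled cluster id without a BUG_ON is to bounds-check it and fail the lookup softly. The sketch below assumes hypothetical names and table dimensions (`struct toy_apic_map`, `TOY_NR_CLUSTERS`, `TOY_CLUSTER_SIZE`, `toy_cluster_lookup`); it is not the actual fix, only an illustration of validating an untrusted index before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CLUSTERS  16 /* hypothetical table dimensions */
#define TOY_CLUSTER_SIZE 16

struct toy_apic_map {
    /* vCPU index per (cluster, bit) slot; contents are illustrative. */
    int logical_map[TOY_NR_CLUSTERS][TOY_CLUSTER_SIZE];
};

/*
 * Returns the row for the cluster encoded in the high 16 bits of the
 * guest-supplied destination, or NULL if the cluster id is out of range.
 */
static const int *toy_cluster_lookup(const struct toy_apic_map *map,
                                     uint32_t icr2_dest)
{
    uint32_t cid = icr2_dest >> 16; /* guest-controlled, so untrusted */

    assert(map != NULL);
    if (cid >= TOY_NR_CLUSTERS)
        return NULL; /* let the caller fall back instead of crashing */
    return map->logical_map[cid];
}
```

A caller that receives NULL could fall back to a slower delivery path; whether the real code does so is not stated in the message.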
    • KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) · fda4e2e8
      Committed by Andy Honig
      In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
      potential to corrupt kernel memory if userspace provides an address that
      is at the end of a page.  This patch converts those functions to use
      kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
      vapic_address specified by userspace during ioctl processing and returns
      an error to userspace if the address is not a valid GPA.
      
      This is generally not guest triggerable, because the required write is
      done by firmware that runs before the guest.  Also, it only affects AMD
      processors and oldish Intel that do not have the FlexPriority feature
      (unless you disable FlexPriority, of course; then newer processors are
      also affected).
      
      Fixes: b93463aa ('KVM: Accelerated apic support')
      Reported-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367) · b963a22e
      Committed by Andy Honig
      Under guest controllable circumstances apic_get_tmcct will execute a
      divide by zero and cause a crash.  If the guest cpuid supports
      tsc deadline timers and the guest performs the following sequence of
      requests, the host will crash.
      - Set the mode to periodic
      - Set the TMICT to 0
      - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline)
      - Set the TMICT to non-zero.
      Then the lapic_timer.period will be 0, but the TMICT will not be.  If the
      guest then reads from the TMCCT then the host will perform a divide by 0.
      
      This patch ensures that if the lapic_timer.period is 0, then the division
      does not occur.
      Reported-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
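The guard described above can be sketched as below. `toy_get_tmcct` and its parameters are hypothetical; in particular, passing the remaining time and the nanoseconds-per-tick value as arguments is an assumption made to keep the example self-contained. The essential part is that a zero period is checked independently of TMICT before the modulo.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns a current-count value for the timer, or 0 if the timer is
 * not running. period_ns can be 0 even when tmict is not, which is
 * the state the request sequence above reaches.
 */
static uint32_t toy_get_tmcct(uint32_t tmict, uint64_t period_ns,
                              uint64_t remaining_ns, uint32_t ns_per_tick)
{
    assert(ns_per_tick != 0); /* set by the host, never guest-controlled here */

    if (tmict == 0 || period_ns == 0)
        return 0; /* avoids remaining_ns % 0 */

    return (uint32_t)((remaining_ns % period_ns) / ns_per_tick);
}
```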
  17. 26 Aug, 2013 1 commit
  18. 25 Jul, 2013 2 commits
  19. 27 Jun, 2013 1 commit
  20. 03 Jun, 2013 1 commit
    • KVM: Fix race in apic->pending_events processing · 299018f4
      Committed by Gleb Natapov
      apic->pending_events processing has a race that may cause INIT and
      SIPI processing to be reordered:
      
      vcpu0:                           vcpu1:
      set INIT
                                     test_and_clear_bit(KVM_APIC_INIT)
                                        process INIT
      set INIT
      set SIPI
                                     test_and_clear_bit(KVM_APIC_SIPI)
                                        process SIPI
      
      At the end INIT is left pending in pending_events. The following patch
      fixes this by latching pending events before processing them.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
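"Latching" here can be read as taking one atomic snapshot of the pending bits and working only from that snapshot. The sketch below uses C11 atomics in user space; `struct toy_vcpu`, the `TOY_APIC_*` bit values and `toy_accept_events` are hypothetical, and the counters merely stand in for real INIT and SIPI handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_APIC_INIT (1ul << 0) /* hypothetical bit assignments */
#define TOY_APIC_SIPI (1ul << 1)

struct toy_vcpu {
    atomic_ulong pending_events; /* set by other vCPUs */
    int inits_handled;
    int sipis_handled;
};

/*
 * Takes every pending bit in a single atomic exchange, then processes
 * INIT before SIPI from the private copy. Bits set by another vCPU
 * after the exchange stay pending for the next call, in order.
 */
static unsigned long toy_accept_events(struct toy_vcpu *vcpu)
{
    unsigned long pe;

    assert(vcpu != NULL);
    pe = atomic_exchange(&vcpu->pending_events, 0ul);

    if (pe & TOY_APIC_INIT)
        vcpu->inits_handled++;
    if (pe & TOY_APIC_SIPI)
        vcpu->sipis_handled++;
    return pe;
}
```

Compared with testing and clearing each bit separately, the receiver can no longer observe a SIPI that was set after an INIT it has not yet seen.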