1. 01 10月, 2015 4 次提交
  2. 25 9月, 2015 1 次提交
    • P
      KVM: svm: do not call kvm_set_cr0 from init_vmcb · 79a8059d
      Paolo Bonzini 提交于
      kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the
      memslots array (SRCU protected).  Using a mini SRCU critical section
      is ugly, and adding it to kvm_arch_vcpu_create doesn't work because
      the VMX vcpu_create callback calls synchronize_srcu.
      
      Fixes this lockdep splat:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.3.0-rc1+ #1 Not tainted
      -------------------------------
      include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-i38/17000:
       #0:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm]
      
      [...]
      Call Trace:
       dump_stack+0x4e/0x84
       lockdep_rcu_suspicious+0xfd/0x130
       kvm_zap_gfn_range+0x188/0x1a0 [kvm]
       kvm_set_cr0+0xde/0x1e0 [kvm]
       init_vmcb+0x760/0xad0 [kvm_amd]
       svm_create_vcpu+0x197/0x250 [kvm_amd]
       kvm_arch_vcpu_create+0x47/0x70 [kvm]
       kvm_vm_ioctl+0x302/0x7e0 [kvm]
       ? __lock_is_held+0x51/0x70
       ? __fget+0x101/0x210
       do_vfs_ioctl+0x2f4/0x560
       ? __fget_light+0x29/0x90
       SyS_ioctl+0x4c/0x90
       entry_SYSCALL_64_fastpath+0x16/0x73
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      79a8059d
  3. 18 9月, 2015 1 次提交
    • I
      kvm: svm: reset mmu on VCPU reset · ebae871a
      Igor Mammedov 提交于
      When INIT/SIPI sequence is sent to VCPU which before that
      was in use by OS, VMRUN might fail with:
      
       KVM: entry failed, hardware error 0xffffffff
       EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
       ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
       EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
       ES =0000 00000000 0000ffff 00009300
       CS =9a00 0009a000 0000ffff 00009a00
       [...]
       CR0=60000010 CR2=b6f3e000 CR3=01942000 CR4=000007e0
       [...]
       EFER=0000000000000000
      
      with corresponding SVM error:
       KVM: FAILED VMRUN WITH VMCB:
       [...]
       cpl:            0                efer:         0000000000001000
       cr0:            0000000080010010 cr2:          00007fd7fe85bf90
       cr3:            0000000187d0c000 cr4:          0000000000000020
       [...]
      
      What happens is that VCPU state right after offlinig:
      CR0: 0x80050033  EFER: 0xd01  CR4: 0x7e0
        -> long mode with CR3 pointing to longmode page tables
      
      and when VCPU gets INIT/SIPI following transition happens
      CR0: 0 -> 0x60000010 EFER: 0x0  CR4: 0x7e0
        -> paging disabled with stale CR3
      
      However SVM under the hood puts VCPU in Paged Real Mode*
      which effectively translates CR0 0x60000010 -> 80010010 after
      
         svm_vcpu_reset()
             -> init_vmcb()
                 -> kvm_set_cr0()
                     -> svm_set_cr0()
      
      but from  kvm_set_cr0() perspective CR0: 0 -> 0x60000010
      only caching bits are changed and
      commit d81135a5
       ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")'
      regressed svm_vcpu_reset() which relied on MMU being reset.
      
      As result VMRUN after svm_vcpu_reset() tries to run
      VCPU in Paged Real Mode with stale MMU context (longmode page tables),
      which causes some AMD CPUs** to bail out with VMEXIT_INVALID.
      
      Fix issue by unconditionally resetting MMU context
      at init_vmcb() time.
      
      	* AMD64 Architecture Programmer’s Manual,
      	    Volume 2: System Programming, rev: 3.25
      	      15.19 Paged Real Mode
      	** Opteron 1216
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Fixes: d81135a5
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ebae871a
  4. 05 8月, 2015 1 次提交
  5. 23 7月, 2015 3 次提交
  6. 10 7月, 2015 3 次提交
  7. 06 7月, 2015 1 次提交
  8. 23 6月, 2015 1 次提交
  9. 19 6月, 2015 1 次提交
  10. 05 6月, 2015 2 次提交
  11. 04 6月, 2015 2 次提交
  12. 20 5月, 2015 1 次提交
  13. 14 5月, 2015 1 次提交
  14. 07 5月, 2015 3 次提交
    • R
      KVM: x86: fix initial PAT value · 74545705
      Radim Krčmář 提交于
      PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT.
      VMX used a wrong value (host's PAT) and while SVM used the right one,
      it never got to arch.pat.
      
      This is not an issue with QEMU as it will force the correct value.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      74545705
    • N
      KVM: x86: INIT and reset sequences are different · d28bc9dd
      Nadav Amit 提交于
      x86 architecture defines differences between the reset and INIT sequences.
      INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
      MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP.
      
      References (from Intel SDM):
      
      "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
      to a specific processor or system wide) do not cause the MP protocol to be
      repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions]
      
      [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]
      
      "If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
      changed." [9.2: X87 FPU INITIALIZATION]
      
      "The state of the local APIC following an INIT reset is the same as it is after
      a power-up or hardware reset, except that the APIC ID and arbitration ID
      registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset
      ("Wait-for-SIPI" State)]
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d28bc9dd
    • N
      KVM: x86: Support for disabling quirks · 90de4a18
      Nadav Amit 提交于
      Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previous
      created in order to overcome QEMU issues. Those issue were mostly result of
      invalid VM BIOS.  Currently there are two quirks that can be disabled:
      
      1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot
      2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot
      
      These two issues are already resolved in recent releases of QEMU, and would
      therefore be disabled by QEMU.
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Message-Id: <1428879221-29996-1-git-send-email-namit@cs.technion.ac.il>
      [Report capability from KVM_CHECK_EXTENSION too. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      90de4a18
  15. 08 4月, 2015 1 次提交
    • N
      KVM: x86: BSP in MSR_IA32_APICBASE is writable · 58d269d8
      Nadav Amit 提交于
      After reset, the CPU can change the BSP, which will be used upon INIT.  Reset
      should return the BSP which QEMU asked for, and therefore handled accordingly.
      
      To quote: "If the MP protocol has completed and a BSP is chosen, subsequent
      INITs (either to a specific processor or system wide) do not cause the MP
      protocol to be repeated."
      [Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions]
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      58d269d8
  16. 19 3月, 2015 1 次提交
  17. 18 3月, 2015 1 次提交
  18. 13 3月, 2015 1 次提交
  19. 11 3月, 2015 2 次提交
  20. 10 3月, 2015 1 次提交
  21. 03 3月, 2015 1 次提交
    • R
      KVM: SVM: fix interrupt injection (apic->isr_count always 0) · f563db4b
      Radim Krčmář 提交于
      In commit b4eef9b3, we started to use hwapic_isr_update() != NULL
      instead of kvm_apic_vid_enabled(vcpu->kvm).  This didn't work because
      SVM had it defined and "apicv" path in apic_{set,clear}_isr() does not
      change apic->isr_count, because it should always be 1.  The initial
      value of apic->isr_count was based on kvm_apic_vid_enabled(vcpu->kvm),
      which is always 0 for SVM, so KVM could have injected interrupts when it
      shouldn't.
      
      Fix it by implicitly setting SVM's hwapic_isr_update to NULL and make the
      initial isr_count depend on hwapic_isr_update() for good measure.
      
      Fixes: b4eef9b3 ("kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv")
      Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      f563db4b
  22. 04 2月, 2015 1 次提交
  23. 09 1月, 2015 1 次提交
  24. 05 12月, 2014 1 次提交
  25. 13 11月, 2014 1 次提交
    • C
      kvm: svm: move WARN_ON in svm_adjust_tsc_offset · d913b904
      Chris J Arges 提交于
      When running the tsc_adjust kvm-unit-test on an AMD processor with the
      IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be
      triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc
      is called; however it may trigger unnecessary warnings.
      
      This patch moves the WARN_ON to trigger only if __scale_tsc will actually be
      called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common
      s64 since this can have signed values.
      Signed-off-by: NChris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d913b904
  26. 03 11月, 2014 1 次提交
    • N
      KVM: vmx: Unavailable DR4/5 is checked before CPL · 16f8a6f9
      Nadav Amit 提交于
      If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD
      should be generated even if CPL>0. This is according to Intel SDM Table 6-2:
      "Priority Among Simultaneous Exceptions and Interrupts".
      
      Note, that this may happen on the first DR access, even if the host does not
      sets debug breakpoints. Obviously, it occurs when the host debugs the guest.
      
      This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr.
      The emulator already checks DR4/5 availability in check_dr_read. Nested
      virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject
      exceptions to the guest.
      
      As for SVM, the patch follows the previous logic as much as possible. Anyhow,
      it appears the DR interception code might be buggy - even if the DR access
      may cause an exception, the instruction is skipped.
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      16f8a6f9
  27. 24 10月, 2014 2 次提交
    • M
      kvm: x86: don't kill guest on unknown exit reason · 2bc19dc3
      Michael S. Tsirkin 提交于
      KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
      triggered by a priveledged application.  Let's not kill the guest: WARN
      and inject #UD instead.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2bc19dc3
    • N
      KVM: x86: Check non-canonical addresses upon WRMSR · 854e8bb1
      Nadav Amit 提交于
      Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
      written to certain MSRs. The behavior is "almost" identical for AMD and Intel
      (ignoring MSRs that are not implemented in either architecture since they would
      anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
      non-canonical address is written on Intel but not on AMD (which ignores the top
      32-bits).
      
      Accordingly, this patch injects a #GP on the MSRs which behave identically on
      Intel and AMD.  To eliminate the differences between the architecutres, the
      value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
      canonical value before writing instead of injecting a #GP.
      
      Some references from Intel and AMD manuals:
      
      According to Intel SDM description of WRMSR instruction #GP is expected on
      WRMSR "If the source register contains a non-canonical address and ECX
      specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
      IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."
      
      According to AMD manual instruction manual:
      LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
      LSTAR and CSTAR registers.  If an RIP written by WRMSR is not in canonical
      form, a general-protection exception (#GP) occurs."
      IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
      base field must be in canonical form or a #GP fault will occur."
      IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
      be in canonical form."
      
      This patch fixes CVE-2014-3610.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      854e8bb1