1. 12 November 2019, 4 commits
    • KVM: VMX: Do not change PID.NDST when loading a blocked vCPU · 132194ff
      Authored by Joao Martins
      When a vCPU enters the block phase, pi_pre_block() inserts the vCPU
      into a per-pCPU linked list of all vCPUs blocked on that pCPU. It then
      changes PID.NV to POSTED_INTR_WAKEUP_VECTOR, whose handler
      (wakeup_handler()) is responsible for kicking (unblocking) any vCPU on
      that linked list that now has pending posted interrupts.
      
      While the vCPU is blocked (in kvm_vcpu_block()), it may be preempted,
      which causes vmx_vcpu_pi_put() to set PID.SN. If the vCPU is later
      scheduled to run on a different pCPU, vmx_vcpu_pi_load() clears PID.SN
      but also *overwrites PID.NDST with this different pCPU*, instead of
      keeping it pointed at the original pCPU on which the vCPU entered the
      block phase.
      
      This causes a problem when a posted interrupt is delivered: the
      wakeup_handler() that runs fails to find the blocked vCPU on its
      per-pCPU linked list of blocked vCPUs, because the vCPU was queued on
      a *different* per-pCPU linked list, namely that of the original pCPU
      on which it entered the block phase.
      
      The regression was introduced by commit c112b5f5 ("KVM: x86:
      Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
      it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
      avoiding changing PID.NDST when loading a blocked vCPU; a sketch of
      the reintroduced check follows the tags below.
      
      Fixes: c112b5f5 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
      Tested-by: Nathan Ni <nathan.ni@oracle.com>
      Co-developed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
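      A minimal sketch of the reintroduced check, simplified from the
      actual patch (the real code falls through to shared cleanup rather
      than returning directly):

          /* In vmx_vcpu_pi_load(), before rewriting PID.NDST. */
          if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) {
                  /*
                   * The vCPU is on a pCPU's wakeup list (or was not
                   * migrated): pi_post_block() owns PID.NDST, and the
                   * wakeup handler expects the vCPU on the blocked list
                   * matching PID.NDST. Clear only PID.SN.
                   */
                  pi_clear_sn(pi_desc);
                  return;
          }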
    • KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts · 9482ae45
      Authored by Joao Martins
      Commit 17e433b5 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
      introduced vmx_dy_apicv_has_pending_interrupt() to determine whether a
      vCPU has a pending posted interrupt. The routine is used by
      kvm_vcpu_on_spin() when searching for a new runnable vCPU to schedule
      on the pCPU, instead of letting the current vCPU busy-loop.
      
      vmx_dy_apicv_has_pending_interrupt() determines whether a vCPU has a
      pending posted interrupt based solely on PID.ON. However, when a vCPU
      is preempted, vmx_vcpu_pi_put() sets PID.SN, which causes subsequently
      raised posted interrupts to only set a bit in PID.PIR, without setting
      PID.ON (and without sending the notification vector), as described in
      VT-d specification section 5.2.3, "Interrupt-Posting Hardware
      Operation".
      
      Therefore, checking PID.ON is insufficient to determine whether a vCPU
      has pending posted interrupts; when PID.SN=1, we should also check
      whether any bit is set in PID.PIR. A sketch of the corrected check
      follows the tags below.
      
      Fixes: 17e433b5 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
      Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
      Co-developed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
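      A sketch of the corrected predicate; pi_is_pir_empty() stands for a
      helper that scans PID.PIR for any set bit:

          static bool vmx_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
          {
                  struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);

                  /*
                   * Pending if PID.ON is set, or if notification was
                   * suppressed (PID.SN) but a vector is posted in PID.PIR.
                   */
                  return pi_test_on(pi_desc) ||
                         (pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
          }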
    • KVM: VMX: Fix comment to specify PID.ON instead of PIR.ON · d9ff2744
      Authored by Liran Alon
      The Outstanding Notification (ON) bit is part of the Posted Interrupt
      Descriptor (PID), not of the Posted Interrupts Register (PIR); the
      latter is a bitmap of pending vectors. A sketch of the descriptor
      layout follows the tags below.
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
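      For reference, a simplified sketch of the Posted Interrupt Descriptor
      as KVM's VMX code declares it (the PIR occupies the first 256 bits,
      with the control fields above it):

          struct pi_desc {
                  u32 pir[8];               /* bitmap of pending vectors */
                  union {
                          struct {
                                  u16 on : 1,     /* Outstanding Notification */
                                      sn : 1,     /* Suppress Notification */
                                      rsvd_1 : 14;
                                  u8  nv;         /* Notification Vector */
                                  u8  rsvd_2;
                                  u32 ndst;       /* Notification Destination */
                          };
                          u64 control;
                  };
                  u32 rsvd[6];
          } __aligned(64);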
    • KVM: X86: Fix initialization of MSR lists · 7a5ee6ed
      Authored by Chenyi Qiang
      The three MSR lists (msrs_to_save[], emulated_msrs[] and
      msr_based_features[]) are global arrays in kvm.ko. They are adjusted
      when kvm-{intel,amd}.ko is loaded (supported MSRs are copied forward
      to overwrite unsupported ones), but they are not reset to their
      initial values when kvm-{intel,amd}.ko is removed. Thus, on the next
      module load, kvm-{intel,amd}.ko operates on the already-modified
      arrays, with some MSRs lost and some duplicated.
      
      So define three constant arrays to hold the initial MSR lists, and
      initialize msrs_to_save[], emulated_msrs[] and msr_based_features[]
      from those constant arrays, as sketched below the tags.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      [Remove now useless conditionals. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
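      A sketch of the approach, with the lists abbreviated;
      msr_is_supported() is a hypothetical stand-in for the per-MSR probing
      the real code performs:

          static const u32 msrs_to_save_all[] = {
                  MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, /* ... */
          };

          static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
          static unsigned num_msrs_to_save;

          static void kvm_init_msr_list(void)
          {
                  unsigned i;

                  /* Rebuild from the constant list on every module load. */
                  num_msrs_to_save = 0;
                  for (i = 0; i < ARRAY_SIZE(msrs_to_save_all); i++) {
                          if (!msr_is_supported(msrs_to_save_all[i]))
                                  continue;
                          msrs_to_save[num_msrs_to_save++] = msrs_to_save_all[i];
                  }
                  /* likewise for emulated_msrs[] and msr_based_features[] */
          }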
  2. 31 October 2019, 1 commit
    • KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active · 9167ab79
      Authored by Paolo Bonzini
      VMX already does so if the host has SMEP, in order to support the
      combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe
      to always do so, and in fact VMX already ends up running with
      EFER.NXE=1 on old processors that lack the "load EFER" controls,
      because doing so may help avoid a slow MSR write. Removing all the
      conditionals simplifies the code.
      
      SVM has no similar code, but it should, since recent AMD processors do
      support SMEP. So this patch also makes the code for the two vendors
      more similar, while fixing the NPT=0, CR0.WP=1, CR4.SMEP=1 case on AMD
      processors. A sketch of the SVM side follows the tags below.
      
      Cc: stable@vger.kernel.org
      Cc: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
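      A sketch of the SVM side, assuming the check lands where EFER is
      propagated to the VMCB (svm_set_efer() in the real tree):

          static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
          {
                  vcpu->arch.efer = efer;

                  if (!npt_enabled) {
                          /* Shadow paging assumes NX to be available. */
                          efer |= EFER_NX;

                          if (!(efer & EFER_LMA))
                                  efer &= ~EFER_LME;
                  }

                  to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
          }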
  3. 23 October 2019, 2 commits
    • KVM: nVMX: Don't leak L1 MMIO regions to L2 · 671ddc70
      Authored by Jim Mattson
      If the "virtualize APIC accesses" VM-execution control is set in the
      VMCS, the APIC virtualization hardware is triggered when a page walk
      in VMX non-root mode terminates at a PTE wherein the address of the 4k
      page frame matches the APIC-access address specified in the VMCS. On
      hardware, the APIC-access address may be any valid 4k-aligned physical
      address.
      
      KVM's nVMX implementation enforces the additional constraint that the
      APIC-access address specified in the vmcs12 must be backed by
      a "struct page" in L1. If not, L0 will simply clear the "virtualize
      APIC accesses" VM-execution control in the vmcs02.
      
      The problem with this approach is that the L1 guest has arranged the
      vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
      VM-execution control is clear in the vmcs12--so that the L2 guest
      physical address(es)--or L2 guest linear address(es)--that reference
      the L2 APIC map to the APIC-access address specified in the
      vmcs12. Without the "virtualize APIC accesses" VM-execution control in
      the vmcs02, the APIC accesses in the L2 guest will directly access the
      APIC-access page in L1.
      
      When there is no mapping whatsoever for the APIC-access address in L1,
      the L2 VM just loses the intended APIC virtualization. However, when
      the APIC-access address is mapped to an MMIO region in L1, the L2
      guest gets direct access to the L1 MMIO device. For example, if the
      APIC-access address specified in the vmcs12 is 0xfee00000, then L2
      gets direct access to L1's APIC.
      
      Since this vmcs12 configuration is something KVM cannot faithfully
      emulate, the appropriate response is to exit to userspace with
      KVM_INTERNAL_ERROR_EMULATION; a sketch of the failure path follows
      the tags below.
      
      Fixes: fe3ef05c ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
      Reported-by: Dan Cross <dcross@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
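      A hedged sketch of the failure path, using KVM's existing internal
      error plumbing (field names per include/uapi/linux/kvm.h):

          /* When the vmcs12 APIC-access address has no struct page in L1. */
          if (!page) {
                  vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
                  vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
                  vcpu->run->internal.ndata = 0;
                  return false;   /* propagated up so vcpu_run() bails out */
          }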
    • KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update · 5c94ac5d
      Authored by Miaohe Lin
      The guest's physical APIC ID may not equal vcpu->vcpu_id in some
      cases, so avic_handle_ldr_update() can set the wrong physical ID
      because it always uses vcpu->vcpu_id. Get the physical APIC ID from
      the vAPIC page instead. Export kvm_xapic_id() and use it here and in
      avic_handle_apic_id_update(), as suggested by Vitaly; the helper is
      shown below the tags.
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
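      The helper, as the patch exports it from KVM's local APIC code; it
      reads the ID out of the vAPIC page instead of trusting vcpu->vcpu_id:

          static inline u32 kvm_xapic_id(struct kvm_lapic *apic)
          {
                  /* In xAPIC mode the APIC ID lives in bits 31:24. */
                  return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
          }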
  4. 22 October 2019, 4 commits
  5. 04 October 2019, 1 commit
  6. 03 October 2019, 2 commits
  7. 01 October 2019, 2 commits
  8. 28 September 2019, 1 commit
    • KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF · 19a36d32
      Authored by Waiman Long
      l1tf_vmx_mitigation is only set to VMENTER_L1D_FLUSH_NOT_REQUIRED when
      the ARCH_CAPABILITIES MSR indicates that an L1D flush is not required.
      However, if the CPU is simply not affected by L1TF,
      l1tf_vmx_mitigation is still left at VMENTER_L1D_FLUSH_AUTO, which is
      certainly not the best choice for a !X86_BUG_L1TF CPU.

      So force l1tf_vmx_mitigation to VMENTER_L1D_FLUSH_NOT_REQUIRED to make
      this explicit for users who check the vmentry_l1d_flush parameter; a
      sketch follows the tags below.
      Signed-off-by: Waiman Long <longman@redhat.com>
      [Patch rewritten according to Borislav Petkov's suggestion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
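      A minimal sketch of the early-out, assuming it sits at the top of the
      vmentry L1D flush setup path:

          /* CPUs not affected by L1TF need no mitigation at all. */
          if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
                  l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
                  return 0;
          }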
  9. 27 September 2019, 2 commits
    • KVM: x86: fix nested guest live migration with PML · 1f4e5fc8
      Authored by Paolo Bonzini
      Shadow paging is fundamentally incompatible with the page-modification
      log, because the GPAs in the log come from the wrong memory map. In
      particular, for the EPT page-modification log, the GPAs in the log
      come from L2 rather than L1. (If there were a non-EPT
      page-modification log, we couldn't use it for shadow paging either,
      because it would log GVAs rather than GPAs.)
      
      Therefore, we need to rely on write protection to record dirty pages.
      This has the side effect of bypassing PML, since writes now result in an
      EPT violation vmexit.
      
      This is relatively easy to add to KVM, because pretty much the only
      place that needs changing is spte_clear_dirty(). The first access to
      the page already goes through the page-fault path and records the
      correct GPA; it is only subsequent accesses that are wrong. Therefore,
      we can equip set_spte() (where the first access happens) to record
      that the SPTE will have to be write-protected, and spte_clear_dirty()
      then uses this information to do the right thing, as sketched below
      the tags.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
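      A condensed sketch of the idea (the real patch splits this across
      __rmap_clear_dirty() and a new spte_wrprot_for_clear_dirty() helper):

          static bool spte_clear_dirty(u64 *sptep)
          {
                  u64 spte = *sptep;

                  /*
                   * SPTEs created under shadow paging + PML were flagged by
                   * set_spte() as needing write protection: drop the
                   * writable bit so the next write faults and is logged,
                   * instead of merely clearing the dirty bit.
                   */
                  if (spte_ad_need_write_protect(spte))
                          return spte_wrprot_for_clear_dirty(sptep);

                  spte &= ~shadow_dirty_mask;
                  return mmu_spte_update(sptep, spte);
          }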
    • KVM: x86: assign two bits to track SPTE kinds · 6eeb4ef0
      Authored by Paolo Bonzini
      Currently, we overload SPTE_SPECIAL_MASK to mean both "A/D bits
      unavailable" and MMIO, where the difference between the two is
      determined by mmio_mask and mmio_value.
      
      However, the next patch will need two bits to distinguish availability
      of A/D bits from write protection. So, while at it, give MMIO its own
      bit pattern, and move the two bits from bit 62 to bits 52..53, since
      Intel is allocating EPT page-table bits from the top. The resulting
      encoding is sketched below the tags.
      Reviewed-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
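      The resulting encoding, roughly as the patch defines it (bits 52..53
      of the SPTE; pattern 2 is what the follow-up patch uses for the
      "write protect instead of A/D" kind):

          #define SPTE_AD_ENABLED_MASK    (0ULL << 52)  /* A/D bits in use */
          #define SPTE_AD_DISABLED_MASK   (1ULL << 52)  /* A/D unavailable */
          #define SPTE_MMIO_MASK          (3ULL << 52)  /* MMIO SPTE */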
  10. 26 September 2019, 8 commits
  11. 25 September 2019, 4 commits
  12. 24 September 2019, 9 commits