1. 05 July 2019, 1 commit
    • KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT · f087a029
      Committed by Sean Christopherson
      KVM does not have 100% coverage of VMX consistency checks, i.e. some
      checks that cause VM-Fail may only be detected by hardware during a
      nested VM-Entry.  In such a case, KVM must restore L1's state to the
      pre-VM-Enter state as L2's state has already been loaded into KVM's
      software model.
      
      L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*.  But
      when EPT is disabled, the associated fields hold KVM's shadow values,
      not L1's "real" values.  Fortunately, when EPT is disabled the PDPTRs
      come from memory, i.e. are not cached in the VMCS.  Which leaves CR3
      as the sole anomaly.
      
      A previously applied workaround to handle CR3 was to force nested early
      checks if EPT is disabled:
      
        commit 2b27924b ("KVM: nVMX: always use early vmcs check when EPT
                               is disabled")
      
      Forcing nested early checks is undesirable as doing so adds hundreds of
      cycles to every nested VM-Entry.  Rather than take this performance hit,
      handle CR3 by overwriting vmcs01.GUEST_CR3 with L1's CR3 during nested
      VM-Entry when EPT is disabled *and* nested early checks are disabled.
      By stuffing vmcs01.GUEST_CR3, nested_vmx_restore_host_state() will
      naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.
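      
      The fix boils down to a single conditional write during nested VM-Entry.
      A minimal sketch of the idea (an editor's approximation of the patch in
      vmx/nested.c; exact placement and comment wording may differ):
      
          /*
           * In nested_vmx_enter_non_root_mode(): when EPT is disabled,
           * vmcs01.GUEST_CR3 holds KVM's shadow CR3, so stash L1's real CR3
           * there so that a late VM-Fail unwind restores the right value.
           */
          if (!enable_ept && !nested_early_check)
              vmcs_writel(GUEST_CR3, vcpu->arch.cr3);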
      
      These shenanigans work because nested_vmx_restore_host_state() does a
      full kvm_mmu_reset_context(), i.e. unloads the current MMU, which
      guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow CR3
      prior to re-entering L1.
      
      vcpu->arch.root_mmu.root_hpa is set to INVALID_PAGE via:
      
          nested_vmx_restore_host_state() ->
              kvm_mmu_reset_context() ->
                  kvm_mmu_unload() ->
                      kvm_mmu_free_roots()
      
      kvm_mmu_unload() has WARN_ON(root_hpa != INVALID_PAGE), i.e. we can bank
      on 'root_hpa == INVALID_PAGE' unless the implementation of
      kvm_mmu_reset_context() is changed.
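      
      For reference, kvm_mmu_unload() in kernels of this era looks roughly
      like the following (approximate sketch, not verbatim), which is where
      the WARN_ON invariant above comes from:
      
          void kvm_mmu_unload(struct kvm_vcpu *vcpu)
          {
              /* Free and invalidate all roots; VALID_PAGE(x) is shorthand
               * for 'x != INVALID_PAGE'.
               */
              kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
              WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
              kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
              WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
          }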
      
      On the way into L1, VMCS.GUEST_CR3 is guaranteed to be written (on a
      successful entry) via:
      
          vcpu_enter_guest() ->
              kvm_mmu_reload() ->
                  kvm_mmu_load() ->
                      kvm_mmu_load_cr3() ->
                          vmx_set_cr3()
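      
      A sketch of kvm_mmu_reload() (approximate for this kernel) shows why the
      invalidated root forces kvm_mmu_load(), and thus the vmx_set_cr3()
      rewrite of GUEST_CR3, on the next entry:
      
          static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
          {
              /* A valid root means nothing to do; an invalidated root,
               * e.g. after kvm_mmu_unload(), forces a full reload.
               */
              if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
                  return 0;
      
              return kvm_mmu_load(vcpu);
          }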
      
      Stuff vmcs01.GUEST_CR3 if and only if nested early checks are disabled,
      as a "late" VM-Fail should never happen in that case (KVM WARNs), and
      the conditional write avoids the need to restore the correct GUEST_CR3
      when nested_vmx_check_vmentry_hw() fails.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20190607185534.24368-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 03 July 2019, 3 commits
    • kvm: nVMX: Remove unnecessary sync_roots from handle_invept · b1190198
      Committed by Jim Mattson
      When L0 is executing handle_invept(), the TDP MMU is active. Emulating
      an L1 INVEPT does require synchronizing the appropriate shadow EPT
      root(s), but a call to kvm_mmu_sync_roots in this context won't do
      that. Similarly, the hardware TLB and paging-structure-cache entries
      associated with the appropriate shadow EPT root(s) must be flushed,
      but requesting a TLB_FLUSH from this context won't do that either.
      
      How did this ever work? KVM always does a sync_roots and TLB flush (in
      the correct context) when transitioning from L1 to L2. That isn't the
      best choice for nested VM performance, but it effectively papers over
      the mistakes here.
      
      Remove the unnecessary operations and leave a comment to try to do
      better in the future.
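      
      The resulting handle_invept() switch looks roughly like this (an
      editor's sketch approximating the patch, not the verbatim diff):
      
          switch (type) {
          case VMX_EPT_EXTENT_GLOBAL:
          case VMX_EPT_EXTENT_CONTEXT:
              /*
               * TODO: sync the shadow EPT root(s) targeted by the INVEPT
               * and flush the matching TLB and paging-structure-cache
               * entries here, rather than relying on the unconditional
               * sync + flush on every emulated L1->L2 transition.
               */
              break;
          default:
              BUG_ON(1);
              break;
          }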
      Reported-by: Junaid Shahid <junaids@google.com>
      Fixes: bfd0a56b ("nEPT: Nested INVEPT")
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Nadav Har'El <nyh@il.ibm.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: Xinhao Xu <xinhao.xu@intel.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Reviewed-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm/nVMX: fix VMCLEAR when Enlightened VMCS is in use · 11e34914
      Committed by Vitaly Kuznetsov
      When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
      according to TLFS, this should "transition an enlightened VMCS from the
      active to the non-active state". It is, however, wrong to assume that
      it is only valid to do VMCLEAR for the eVMCS which is currently active
      on the vCPU performing VMCLEAR.
      
      Currently, the logic in handle_vmclear() is broken: when there is no
      active eVMCS on the vCPU doing VMCLEAR, we treat the argument as a
      'normal' VMCS, and the kvm_vcpu_write_guest() of the 'launch_state'
      field irreversibly corrupts the memory area.
      
      So, when the VMCLEAR argument is not the currently active eVMCS on the
      vCPU, how can we know whether the area it points to is a normal or an
      enlightened VMCS?
      
      Thanks to a bug in Hyper-V (see commit 72aeb60c ("KVM: nVMX: Verify
      eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we
      cannot: the revision id can't be used to distinguish between them. So
      let's assume the area is always enlightened when enlightened VM-Entry
      is enabled in the assist page. Also, check
      vmx->nested.enlightened_vmcs_enabled to minimize the impact on
      'unenlightened' workloads.
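      
      A sketch of the fixed check in handle_vmclear() (approximate; the
      helper names follow the upstream eVMCS code, but treat the exact
      signatures as assumptions): only write 'launch_state' when the area
      cannot be an eVMCS:
      
          /* With enlightened VM-Entry enabled, vmptr may point at an eVMCS,
           * which has no 'launch_state' field; writing it would corrupt the
           * area, so skip the write in that case.
           */
          if (likely(!vmx->nested.enlightened_vmcs_enabled ||
                     !nested_enlightened_vmentry(vcpu, &evmcs_gpa))) {
              if (vmptr == vmx->nested.current_vmptr)
                  nested_release_vmcs12(vcpu);
      
              kvm_vcpu_write_guest(vcpu,
                                   vmptr + offsetof(struct vmcs12, launch_state),
                                   &zero, sizeof(zero));
          }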
      
      Fixes: b8bbab92 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/KVM/nVMX: don't use clean fields data on enlightened VMLAUNCH · a21a39c2
      Committed by Vitaly Kuznetsov
      Apparently, Windows doesn't maintain clean fields data after it does
      VMCLEAR for an enlightened VMCS, so we can only use it on VMRESUME.
      
      The issue went unnoticed because we currently do nested_release_evmcs()
      in handle_vmclear(), and the subsequent enlightened VMPTRLD invalidates
      the clean fields when a new eVMCS is mapped; but we're going to change
      that logic.
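      
      The fix amounts to discarding the clean-fields hints whenever they
      can't be trusted. A minimal sketch, assuming a 'from_launch' flag
      plumbed into the enlightened VMPTRLD path (the flag name is an
      assumption):
      
          /* In nested_vmx_handle_enlightened_vmptrld(): clean fields data
           * can't be used on VMLAUNCH (Windows doesn't maintain it across
           * VMCLEAR) or when switching to a different eVMCS, so force a
           * full copy by marking every field dirty.
           */
          if (from_launch || evmcs_gpa_changed)
              vmx->nested.hv_evmcs->hv_clean_fields &=
                  ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;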
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 02 July 2019, 2 commits
  4. 18 June 2019, 34 commits