1. 29 3月, 2019 5 次提交
    • B
      kvm: mmu: Used range based flushing in slot_handle_level_range · f285c633
      Ben Gardon 提交于
      Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address
      in slot_handle_level_range. When range based flushes are not enabled
      kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs.
      
      This changes the behavior of many functions that indirectly use
      slot_handle_level_range, iff the range based flushes are enabled. The
      only potential problem I see with this is that kvm->tlbs_dirty will be
      cleared less often, however the only caller of slot_handle_level_range that
      checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which
      checks it and does a kvm_flush_remote_tlbs after calling
      kvm_unmap_hva_range anyway.
      
      Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
      	without this patch. The patch introduced no new failures.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f285c633
    • W
      KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() · 4d66623c
      Wei Yang 提交于
      * nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is
        non-zero.
      
      * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages()
        never return zero.
      
      Based on these two reasons, we can merge the two *if* clause and use the
      return value from kvm_mmu_calculate_mmu_pages() directly. This simplify
      the code and also eliminate the possibility for reader to believe
      nr_mmu_pages would be zero.
      Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4d66623c
    • S
      KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) · 05d5a486
      Singh, Brijesh 提交于
      Errata#1096:
      
      On a nested data page fault when CR.SMAP=1 and the guest data read
      generates a SMAP violation, GuestInstrBytes field of the VMCB on a
      VMEXIT will incorrectly return 0h instead the correct guest
      instruction bytes .
      
      Recommend Workaround:
      
      To determine what instruction the guest was executing the hypervisor
      will have to decode the instruction at the instruction pointer.
      
      The recommended workaround can not be implemented for the SEV
      guest because guest memory is encrypted with the guest specific key,
      and instruction decoder will not be able to decode the instruction
      bytes. If we hit this errata in the SEV guest then log the message
      and request a guest shutdown.
      Reported-by: NVenkatesh Srinivas <venkateshs@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      05d5a486
    • S
      KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' · 47c42e6b
      Sean Christopherson 提交于
      The cr4_pae flag is a bit of a misnomer, its purpose is really to track
      whether the guest PTE that is being shadowed is a 4-byte entry or an
      8-byte entry.  Prior to supporting nested EPT, the size of the gpte was
      reflected purely by CR4.PAE.  KVM fudged things a bit for direct sptes,
      but it was mostly harmless since the size of the gpte never mattered.
      Now that a spte may be tracking an indirect EPT entry, relying on
      CR4.PAE is wrong and ill-named.
      
      For direct shadow pages, force the gpte_size to '1' as they are always
      8-byte entries; EPT entries can only be 8-bytes and KVM always uses
      8-byte entries for NPT and its identity map (when running with EPT but
      not unrestricted guest).
      
      Likewise, nested EPT entries are always 8-bytes.  Nested EPT presents a
      unique scenario as the size of the entries are not dictated by CR4.PAE,
      but neither is the shadow page a direct map.  To handle this scenario,
      set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
      to denote a nested EPT shadow page.  Use the information to avoid
      incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
      
      Providing a consistent and accurate gpte_size fixes a bug reported by
      Vitaly where fast_cr3_switch() always fails when switching from L2 to
      L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
      whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      47c42e6b
    • S
      KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT · 552c69b1
      Sean Christopherson 提交于
      Explicitly zero out quadrant and invalid instead of inheriting them from
      the root_mmu.  Functionally, this patch is a nop as we (should) never
      set quadrant for a direct mapped (EPT) root_mmu and nested EPT is only
      allowed if EPT is used for L1, and the root_mmu will never be invalid at
      this point.
      
      Explicitly setting flags sets the stage for repurposing the legacy
      paging bits in role, e.g. nxe, cr0_wp, and sm{a,e}p_andnot_wp, at which
      point 'smm' would be the only flag to be inherited from root_mmu.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      552c69b1
  2. 16 3月, 2019 1 次提交
    • B
      Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" · 92da008f
      Ben Gardon 提交于
      This reverts commit 71883a62.
      
      The above commit contains an optimization to kvm_zap_gfn_range which
      uses gfn-limited TLB flushes, if enabled. If using these limited flushes,
      kvm_zap_gfn_range passes lock_flush_tlb=false to slot_handle_level_range
      which creates a race when the function unlocks to call cond_resched.
      See an example of this race below:
      
      CPU 0                   CPU 1                           CPU 3
      // zap_direct_gfn_range
      mmu_lock()
      // *ptep == pte_1
      *ptep = 0
      if (lock_flush_tlb)
              flush_tlbs()
      mmu_unlock()
                              // In invalidate range
                              // MMU notifier
                              mmu_lock()
                              if (pte != 0)
                                      *ptep = 0
                                      flush = true
                              if (flush)
                                      flush_remote_tlbs()
                              mmu_unlock()
                              return
                              // Host MM reallocates
                              // page previously
                              // backing guest memory.
                                                              // Guest accesses
                                                              // invalid page
                                                              // through pte_1
                                                              // in its TLB!!
      
      Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
      	without this patch. The patch introduced no new failures.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      92da008f
  3. 23 2月, 2019 2 次提交
    • Y
      KVM: MMU: record maximum physical address width in kvm_mmu_extended_role · de3ccd26
      Yu Zhang 提交于
      Previously, commit 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow
      MMU reconfiguration is needed") offered some optimization to avoid
      the unnecessary reconfiguration. Yet one scenario is broken - when
      cpuid changes VM's maximum physical address width, reconfiguration
      is needed to reset the reserved bits.  Also, the TDP may need to
      reset its shadow_root_level when this value is changed.
      
      To fix this, a new field, maxphyaddr, is introduced in the extended
      role structure to keep track of the configured guest physical address
      width.
      Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      de3ccd26
    • V
      x86/kvm/mmu: fix switch between root and guest MMUs · ad7dc69a
      Vitaly Kuznetsov 提交于
      Commit 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu") brought one subtle
      change: previously, when switching back from L2 to L1, we were resetting
      MMU hooks (like mmu->get_cr3()) in kvm_init_mmu() called from
      nested_vmx_load_cr3() and now we do that in nested_ept_uninit_mmu_context()
      when we re-target vcpu->arch.mmu pointer.
      The change itself looks logical: if nested_ept_init_mmu_context() changes
      something than nested_ept_uninit_mmu_context() restores it back. There is,
      however, one thing: the following call chain:
      
       nested_vmx_load_cr3()
        kvm_mmu_new_cr3()
          __kvm_mmu_new_cr3()
            fast_cr3_switch()
              cached_root_available()
      
      now happens with MMU hooks pointing to the new MMU (root MMU in our case)
      while previously it was happening with the old one. cached_root_available()
      tries to stash current root but it is incorrect to read current CR3 with
      mmu->get_cr3(), we need to use old_mmu->get_cr3() which in case we're
      switching from L2 to L1 is guest_mmu. (BTW, in shadow page tables case this
      is a non-issue because we don't switch MMU).
      
      While we could've tried to guess that we're switching between MMUs and call
      the right ->get_cr3() from cached_root_available() this seems to be overly
      complicated. Instead, just stash the corresponding CR3 when setting
      root_hpa and make cached_root_available() use the stashed value.
      
      Fixes: 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu")
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ad7dc69a
  4. 21 2月, 2019 24 次提交
  5. 26 1月, 2019 1 次提交
    • G
      KVM: x86: Mark expected switch fall-throughs · b2869f28
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b2869f28
  6. 21 12月, 2018 7 次提交