1. 02 Apr, 2022 2 commits
    • KVM: x86/mmu: do compare-and-exchange of gPTE via the user address · 2a8859f3
      Committed by Paolo Bonzini
      FNAME(cmpxchg_gpte) is an inefficient mess.  It is at least decent if it
      can go through get_user_pages_fast(), but if it cannot then it tries to
      use memremap(); that is not just terribly slow, it is also wrong because
      it assumes that the VM_PFNMAP VMA is contiguous.
      
      The right way to do it would be to do the same thing as
      hva_to_pfn_remapped() does since commit add6a0cd ("KVM: MMU: try to
      fix up page faults before giving up", 2016-07-05), using follow_pte()
      and fixup_user_fault() to determine the correct address to use for
      memremap().  To do this, one could for example extract hva_to_pfn()
      for use outside virt/kvm/kvm_main.c.  But really there is no reason to
      do that either, because there is already a perfectly valid address to
      do the cmpxchg() on, only it is a userspace address.  That means doing
      user_access_begin()/user_access_end() and writing the code in assembly
      to handle exceptions correctly.  Worse, the guest PTE can be 8-byte
      even on i686 so there is the extra complication of using cmpxchg8b to
      account for.  But at least it is an efficient mess.
      
      (Thanks to Linus for suggesting improvement on the inline assembly).
      Reported-by: Qiuhao Li <qiuhao@sysec.org>
      Reported-by: Gaoning Pan <pgn@zju.edu.cn>
      Reported-by: Yongkang Jia <kangel@zju.edu.cn>
      Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com
      Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org>
      Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: bd53cb35 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
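As a rough userspace illustration of the idea in the commit message above (do the compare-and-exchange directly on the address that already maps the guest PTE), the sketch below uses C11 atomics on an 8-byte value. It is not the kernel code: the name `set_accessed_bit_sketch` is hypothetical, and the real patch operates on a userspace address between user_access_begin() and user_access_end() with exception handling in inline assembly, none of which is modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Accessed bit of an x86 page-table entry (bit 5). */
#define PTE_ACCESSED (1ULL << 5)

/*
 * Hypothetical helper: set the accessed bit only if the PTE still holds
 * the value that the page walk observed.  Returns true if the entry was
 * updated, false if it changed underneath us and the walk should be
 * retried.  The 8-byte atomic mirrors the commit's point that a guest
 * PTE can be 8 bytes wide even on a 32-bit host.
 */
static bool set_accessed_bit_sketch(_Atomic uint64_t *ptep, uint64_t seen)
{
    uint64_t expected = seen;

    assert(ptep != NULL);
    return atomic_compare_exchange_strong(ptep, &expected,
                                          seen | PTE_ACCESSED);
}
```

A caller would pass the PTE value it read during the walk; a false return means another writer got there first, so the stale value must not be used.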
    • KVM: X86: Change the type of access u32 to u64 · 5b22bbe7
      Committed by Lai Jiangshan
      Change the type of access u32 to u64 for FNAME(walk_addr) and
      ->gva_to_gpa().
      
      The kinds of accesses are usually combinations of UWX, and VMX/SVM's
      nested paging adds a new factor of access: whether it is an access for
      a guest page table or for a final guest physical address.
      
      SMAP also relies on a factor for supervisor accesses: explicit or
      implicit.
      
      So it is better for @access in FNAME(walk_addr) and ->gva_to_gpa() to
      include all of this information for the walk.
      
      Although @access(u32) has enough bits to encode all the kinds, this
      patch extends it to u64:
      	o Extra bits will be in the higher 32 bits, so that we can
      	  easily obtain the traditional access mode (UWX) by converting
      	  it to u32.
      	o Reuse the value for the access kind defined by SVM's nested
      	  paging (PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK) as
      	  @error_code in kvm_handle_page_fault().
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Message-Id: <20220311070346.45023-2-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
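The layout described in this commit message (traditional UWX bits in the low 32 bits, extra access kinds in the high 32 bits) can be sketched as follows. The PFERR_* bit positions are the ones I believe the x86 page-fault error code uses; the helper names `make_access`, `traditional_access` and `is_guest_page_table_access` are hypothetical and only illustrate the encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 32 bits: traditional access bits (assumed error-code positions). */
#define PFERR_WRITE_MASK        (1ULL << 1)
#define PFERR_USER_MASK         (1ULL << 2)
#define PFERR_FETCH_MASK        (1ULL << 4)
/* High 32 bits: access kinds reused from SVM's nested paging. */
#define PFERR_GUEST_FINAL_MASK  (1ULL << 32)
#define PFERR_GUEST_PAGE_MASK   (1ULL << 33)

/* Hypothetical: combine UWX bits with a kind that lives above bit 31. */
static uint64_t make_access(uint32_t uwx, uint64_t kind)
{
    assert((kind & 0xffffffffULL) == 0);
    return (uint64_t)uwx | kind;
}

/* Truncating to 32 bits recovers the traditional access mode. */
static uint32_t traditional_access(uint64_t access)
{
    return (uint32_t)access;
}

/* Hypothetical: is this an access to a guest page table? */
static bool is_guest_page_table_access(uint64_t access)
{
    return (access & PFERR_GUEST_PAGE_MASK) != 0;
}
```

Keeping the extra bits above bit 31 means existing code that only cares about UWX can keep working on the truncated value.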
  2. 25 Feb, 2022 1 commit
  3. 19 Feb, 2022 1 commit
  4. 08 Dec, 2021 2 commits
  5. 02 Dec, 2021 1 commit
    • KVM: x86/mmu: Retry page fault if root is invalidated by memslot update · a955cad8
      Committed by Sean Christopherson
      Bail from the page fault handler if the root shadow page was obsoleted by
      a memslot update.  Do the check _after_ acquiring mmu_lock, as the TDP MMU
      doesn't rely on the memslot/MMU generation, and instead relies on the
      root being explicitly marked invalid by kvm_mmu_zap_all_fast(), which takes
      mmu_lock for write.
      
      For the TDP MMU, inserting a SPTE into an obsolete root can leak a SP if
      kvm_tdp_mmu_zap_invalidated_roots() has already zapped the SP, i.e. has
      moved past the gfn associated with the SP.
      
      For other MMUs, the resulting behavior is far more convoluted, though
      unlikely to be truly problematic.  Installing SPs/SPTEs into the obsolete
      root isn't directly problematic, as the obsolete root will be unloaded
      and dropped before the vCPU re-enters the guest.  But because the legacy
      MMU tracks shadow pages by their role, any SP created by the fault can
      be reused in the new post-reload root.  Again, that _shouldn't_ be
      problematic as any leaf child SPTEs will be created for the current/valid
      memslot generation, and kvm_mmu_get_page() will not reuse child SPs from
      the old generation as they will be flagged as obsolete.  But, given that
      continuing with the fault is pointless (the root will be unloaded), apply
      the check to all MMUs.
      
      Fixes: b7cccd39 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120045046.3940942-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 22 Oct, 2021 1 commit
  7. 01 Oct, 2021 18 commits
  8. 30 Sep, 2021 2 commits
  9. 23 Sep, 2021 2 commits
  10. 21 Aug, 2021 2 commits
  11. 15 Jul, 2021 1 commit
    • KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs · fc9bf2e0
      Committed by Sean Christopherson
      Ignore "dynamic" host adjustments to the physical address mask when
      generating the masks for guest PTEs, i.e. the guest PA masks.  The host
      physical address space and guest physical address space are two different
      beasts, e.g. even though SEV's C-bit is the same bit location for both
      host and guest, disabling SME in the host (which clears shadow_me_mask)
      does not affect the guest PTE->GPA "translation".
      
      For non-SEV guests, not dropping bits is the correct behavior.  Assuming
      KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
      that are lost as collateral damage from memory encryption are treated as
      reserved bits, i.e. KVM will never get to the point where it attempts to
      generate a gfn using the affected bits.  And if userspace wants to create
      a bogus vCPU, then userspace gets to deal with the fallout of hardware
      doing odd things with bad GPAs.
      
      For SEV guests, not dropping the C-bit is technically wrong, but it's a
      moot point because KVM can't read SEV guest's page tables in any case
      since they're always encrypted.  Not to mention that the current KVM code
      is also broken since sme_me_mask does not have to be non-zero for SEV to
      be supported by KVM.  The proper fix would be to teach all of KVM to
      correctly handle guest private memory, but that's a task for the future.
      
      Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
      Cc: stable@vger.kernel.org
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210623230552.4027702-5-seanjc@google.com>
      [Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
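To make the host-versus-guest mask distinction in this commit message concrete, here is a small sketch. It assumes a 52-bit architectural physical address limit and a 4 KiB page size; the names `GUEST_PA_MASK`, `gpte_to_gfn` and `gpte_to_gfn_host_masked` are hypothetical, and the encryption bit position used in any call is the caller's choice, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* Guest PTE address bits [51:12], independent of any host adjustment. */
#define GUEST_PA_MASK (((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

/* Behavior the commit argues for: use the guest's own PA mask. */
static uint64_t gpte_to_gfn(uint64_t gpte)
{
    return (gpte & GUEST_PA_MASK) >> PAGE_SHIFT;
}

/*
 * Behavior the commit removes: also strip the host's memory encryption
 * bit, which silently drops a guest address bit at the same position.
 */
static uint64_t gpte_to_gfn_host_masked(uint64_t gpte, uint64_t host_me_mask)
{
    assert((host_me_mask & ~GUEST_PA_MASK) == 0);
    return (gpte & GUEST_PA_MASK & ~host_me_mask) >> PAGE_SHIFT;
}
```

With a host mask at, say, bit 47, the two helpers disagree for any guest PTE that has bit 47 set, which is the kind of divergence the commit message describes.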
  12. 25 Jun, 2021 7 commits
    • KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault · 9a65d0b7
      Committed by Sean Christopherson
      Use the current MMU instead of vCPU state to query CR4.SMEP when handling
      a page fault.  In the nested NPT case, the current CR4.SMEP reflects L2,
      whereas the page fault is shadowing L1's NPT, which uses L1's hCR4.
      Practically speaking, this is a nop as NPT walks are always user faults,
      i.e. this code will never be reached, but fix it up for consistency.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-54-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault · fdaa2935
      Committed by Sean Christopherson
      Use the current MMU instead of vCPU state to query CR0.WP when handling
      a page fault.  In the nested NPT case, the current CR0.WP reflects L2,
      whereas the page fault is shadowing L1's NPT.  Practically speaking, this
      is a nop as NPT walks are always user faults, but fix it up for
      consistency.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-53-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic · 7cd138db
      Committed by Sean Christopherson
      Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
      best confusing.  Per the comment:
      
        Can have large pages at levels 2..last_nonleaf_level-1.
      
      the intent of the variable would appear to be to track what levels can
      _legally_ have large pages, but that intent doesn't align with reality.
      The computed value will be wrong for 5-level paging, or if 1gb pages are
      not supported.
      
      The flawed code is not a problem in practice, because except for 32-bit
      PSE paging, bit 7 is reserved if large pages aren't supported at the
      level.  Take advantage of this invariant and simply omit the level magic
      math for 64-bit page tables (including PAE).
      
      For 32-bit paging (non-PAE), the adjustments are needed purely because
      bit 7 is ignored if PSE=0.  Retain that logic as is, but make
      is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
      PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
      nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.
      
      Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
      FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
      not PTEs and will blow up all the other gpte helpers.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-51-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
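The branch-free check this commit message describes can be sketched roughly as below. This is an illustration under stated assumptions, not the patch itself: the function names `is_last_gpte_64` and `is_last_gpte_32` are hypothetical, levels are numbered from 1 (4K) upward, and bit 7 is taken to be the page-size bit. Both helpers rely on unsigned wraparound to set or clear bit 7 without branching.

```c
#include <assert.h>
#include <stdbool.h>

#define PT_PAGE_SIZE_MASK (1u << 7)  /* bit 7: page-size bit */
#define PG_LEVEL_4K       1u
#define PT32_ROOT_LEVEL   2u

/*
 * 64-bit and PAE style tables: bit 7 set always terminates the walk,
 * and the 4K level terminates unconditionally.  The subtraction wraps
 * to all ones only when level == PG_LEVEL_4K, which forces bit 7 on.
 */
static bool is_last_gpte_64(unsigned int level, unsigned int gpte)
{
    assert(level >= PG_LEVEL_4K);
    gpte |= level - PG_LEVEL_4K - 1;
    return (gpte & PT_PAGE_SIZE_MASK) != 0;
}

/*
 * 32-bit non-PAE tables: bit 7 is ignored when PSE=0.  Adding the PSE
 * bit to the root level makes the right-hand side keep bit 7 only if
 * level < 2 + PSE; otherwise bit 7 is cleared from the gpte.
 */
static bool is_last_gpte_32(unsigned int level, unsigned int gpte,
                            unsigned int cr4_pse)
{
    assert(level >= PG_LEVEL_4K && level <= PT32_ROOT_LEVEL && cr4_pse <= 1);
    gpte &= level - (PT32_ROOT_LEVEL + cr4_pse);
    return is_last_gpte_64(level, gpte);
}
```

Splitting the helper per page-table type keeps the PSE adjustment out of the 64-bit and PAE path entirely, which matches the stated goal of avoiding the PSE check there.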
    • KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk · cd628f0f
      Committed by Sean Christopherson
      Use the NX bit from the MMU's role instead of the MMU itself so that the
      redundant, dedicated "nx" flag can be dropped.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-36-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Add accessors to query mmu_role bits · 60667724
      Committed by Sean Christopherson
      Add accessors via a builder macro for all mmu_role bits that track a CR0,
      CR4, or EFER bit, abstracting whether the bits are in the base or the
      extended role.
      
      Future commits will switch to using mmu_role instead of vCPU state to
      configure the MMU, i.e. there are about to be a large number of users.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-26-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
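A builder macro of the kind this commit message describes might look like the following sketch. The struct layout, the macro name `BUILD_ROLE_ACCESSOR`, and the choice of which bits live in `base` versus `ext` are all assumptions made for illustration; only the idea (one macro stamps out an `is_<reg>_<bit>()` accessor and hides where the bit is stored) comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role layout: some bits in "base", others in "ext". */
struct mmu_role_sketch {
    struct {
        unsigned int cr0_wp  : 1;
        unsigned int efer_nx : 1;
    } base;
    struct {
        unsigned int cr4_smep : 1;
        unsigned int cr4_pse  : 1;
    } ext;
};

struct mmu_sketch {
    struct mmu_role_sketch role;
};

/* Generate is_<reg>_<name>(), hiding whether the bit is base or ext. */
#define BUILD_ROLE_ACCESSOR(base_or_ext, reg, name)                     \
static inline bool is_##reg##_##name(const struct mmu_sketch *mmu)      \
{                                                                       \
    assert(mmu != NULL);                                                \
    return mmu->role.base_or_ext.reg##_##name != 0;                     \
}

BUILD_ROLE_ACCESSOR(base, cr0, wp)
BUILD_ROLE_ACCESSOR(base, efer, nx)
BUILD_ROLE_ACCESSOR(ext, cr4, smep)
BUILD_ROLE_ACCESSOR(ext, cr4, pse)
```

Callers then write `is_cr4_smep(mmu)` without knowing which sub-struct holds the bit, so moving a bit between the base and extended role only touches the macro invocation.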
    • KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches · 2640b086
      Committed by Sean Christopherson
      When synchronizing a shadow page, WARN and zap the page if its mmu role
      isn't compatible with the current MMU context, where "compatible" is an
      exact match sans the bits that have no meaning in the overall MMU context
      or will be explicitly overwritten during the sync.  Many of the helpers
      used by sync_page() are specific to the current context; updating a SMM
      vs. non-SMM shadow page would use the wrong memslots, updating L1 vs. L2
      PTEs might work but would be extremely bizarre, and so on and so forth.
      
      Drop the guard with respect to 8-byte vs. 4-byte PTEs in
      __kvm_sync_page(), it was made useless when kvm_mmu_get_page() stopped
      trying to sync shadow pages irrespective of the current MMU context.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-12-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk · ef318b9e
      Committed by Sean Christopherson
      Use the MMU's role to get its effective SMEP value when injecting a fault
      into the guest.  When walking L1's (nested) NPT while L2 is active, vCPU
      state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
      CR4, EFER, etc...  If L1 and L2 have different settings for SMEP and
      L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
      when injecting #NPF.
      
      Fixes: e57d4a35 ("KVM: Add instruction fetch checking when walking guest page table")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>