1. 08 12月, 2021 1 次提交
  2. 02 12月, 2021 1 次提交
    • S
      KVM: x86/mmu: Retry page fault if root is invalidated by memslot update · a955cad8
      Sean Christopherson 提交于
      Bail from the page fault handler if the root shadow page was obsoleted by
      a memslot update.  Do the check _after_ acuiring mmu_lock, as the TDP MMU
      doesn't rely on the memslot/MMU generation, and instead relies on the
      root being explicit marked invalid by kvm_mmu_zap_all_fast(), which takes
      mmu_lock for write.
      
      For the TDP MMU, inserting a SPTE into an obsolete root can leak a SP if
      kvm_tdp_mmu_zap_invalidated_roots() has already zapped the SP, i.e. has
      moved past the gfn associated with the SP.
      
      For other MMUs, the resulting behavior is far more convoluted, though
      unlikely to be truly problematic.  Installing SPs/SPTEs into the obsolete
      root isn't directly problematic, as the obsolete root will be unloaded
      and dropped before the vCPU re-enters the guest.  But because the legacy
      MMU tracks shadow pages by their role, any SP created by the fault can
      can be reused in the new post-reload root.  Again, that _shouldn't_ be
      problematic as any leaf child SPTEs will be created for the current/valid
      memslot generation, and kvm_mmu_get_page() will not reuse child SPs from
      the old generation as they will be flagged as obsolete.  But, given that
      continuing with the fault is pointess (the root will be unloaded), apply
      the check to all MMUs.
      
      Fixes: b7cccd39 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20211120045046.3940942-5-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a955cad8
  3. 22 10月, 2021 1 次提交
  4. 01 10月, 2021 18 次提交
  5. 30 9月, 2021 2 次提交
  6. 23 9月, 2021 2 次提交
  7. 21 8月, 2021 2 次提交
  8. 15 7月, 2021 1 次提交
    • S
      KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs · fc9bf2e0
      Sean Christopherson 提交于
      Ignore "dynamic" host adjustments to the physical address mask when
      generating the masks for guest PTEs, i.e. the guest PA masks.  The host
      physical address space and guest physical address space are two different
      beasts, e.g. even though SEV's C-bit is the same bit location for both
      host and guest, disabling SME in the host (which clears shadow_me_mask)
      does not affect the guest PTE->GPA "translation".
      
      For non-SEV guests, not dropping bits is the correct behavior.  Assuming
      KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
      that are lost as collateral damage from memory encryption are treated as
      reserved bits, i.e. KVM will never get to the point where it attempts to
      generate a gfn using the affected bits.  And if userspace wants to create
      a bogus vCPU, then userspace gets to deal with the fallout of hardware
      doing odd things with bad GPAs.
      
      For SEV guests, not dropping the C-bit is technically wrong, but it's a
      moot point because KVM can't read SEV guest's page tables in any case
      since they're always encrypted.  Not to mention that the current KVM code
      is also broken since sme_me_mask does not have to be non-zero for SEV to
      be supported by KVM.  The proper fix would be to teach all of KVM to
      correctly handle guest private memory, but that's a task for the future.
      
      Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
      Cc: stable@vger.kernel.org
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210623230552.4027702-5-seanjc@google.com>
      [Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fc9bf2e0
  9. 25 6月, 2021 7 次提交
    • S
      KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault · 9a65d0b7
      Sean Christopherson 提交于
      Use the current MMU instead of vCPU state to query CR4.SMEP when handling
      a page fault.  In the nested NPT case, the current CR4.SMEP reflects L2,
      whereas the page fault is shadowing L1's NPT, which uses L1's hCR4.
      Practically speaking, this is a nop a NPT walks are always user faults,
      i.e. this code will never be reached, but fix it up for consistency.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-54-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9a65d0b7
    • S
      KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault · fdaa2935
      Sean Christopherson 提交于
      Use the current MMU instead of vCPU state to query CR0.WP when handling
      a page fault.  In the nested NPT case, the current CR0.WP reflects L2,
      whereas the page fault is shadowing L1's NPT.  Practically speaking, this
      is a nop a NPT walks are always user faults, but fix it up for
      consistency.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-53-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fdaa2935
    • S
      KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic · 7cd138db
      Sean Christopherson 提交于
      Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
      best confusing.  Per the comment:
      
        Can have large pages at levels 2..last_nonleaf_level-1.
      
      the intent of the variable would appear to be to track what levels can
      _legally_ have large pages, but that intent doesn't align with reality.
      The computed value will be wrong for 5-level paging, or if 1gb pages are
      not supported.
      
      The flawed code is not a problem in practice, because except for 32-bit
      PSE paging, bit 7 is reserved if large pages aren't supported at the
      level.  Take advantage of this invariant and simply omit the level magic
      math for 64-bit page tables (including PAE).
      
      For 32-bit paging (non-PAE), the adjustments are needed purely because
      bit 7 is ignored if PSE=0.  Retain that logic as is, but make
      is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
      PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
      nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.
      
      Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
      FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
      not PTEs and will blow up all the other gpte helpers.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-51-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7cd138db
    • S
      KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk · cd628f0f
      Sean Christopherson 提交于
      Use the NX bit from the MMU's role instead of the MMU itself so that the
      redundant, dedicated "nx" flag can be dropped.
      
      No functional change intended.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-36-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      cd628f0f
    • S
      KVM: x86/mmu: Add accessors to query mmu_role bits · 60667724
      Sean Christopherson 提交于
      Add accessors via a builder macro for all mmu_role bits that track a CR0,
      CR4, or EFER bit, abstracting whether the bits are in the base or the
      extended role.
      
      Future commits will switch to using mmu_role instead of vCPU state to
      configure the MMU, i.e. there are about to be a large number of users.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-26-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      60667724
    • S
      KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches · 2640b086
      Sean Christopherson 提交于
      When synchronizing a shadow page, WARN and zap the page if its mmu role
      isn't compatible with the current MMU context, where "compatible" is an
      exact match sans the bits that have no meaning in the overall MMU context
      or will be explicitly overwritten during the sync.  Many of the helpers
      used by sync_page() are specific to the current context, updating a SMM
      vs. non-SMM shadow page would use the wrong memslots, updating L1 vs. L2
      PTEs might work but would be extremely bizaree, and so on and so forth.
      
      Drop the guard with respect to 8-byte vs. 4-byte PTEs in
      __kvm_sync_page(), it was made useless when kvm_mmu_get_page() stopped
      trying to sync shadow pages irrespective of the current MMU context.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-12-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2640b086
    • S
      KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk · ef318b9e
      Sean Christopherson 提交于
      Use the MMU's role to get its effective SMEP value when injecting a fault
      into the guest.  When walking L1's (nested) NPT while L2 is active, vCPU
      state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
      CR4, EFER, etc...  If L1 and L2 have different settings for SMEP and
      L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
      when injecting #NPF.
      
      Fixes: e57d4a35 ("KVM: Add instruction fetch checking when walking guest page table")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-5-seanjc@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ef318b9e
  10. 09 6月, 2021 1 次提交
    • L
      KVM: X86: MMU: Use the correct inherited permissions to get shadow page · b1bd5cba
      Lai Jiangshan 提交于
      When computing the access permissions of a shadow page, use the effective
      permissions of the walk up to that point, i.e. the logic AND of its parents'
      permissions.  Two guest PxE entries that point at the same table gfn need to
      be shadowed with different shadow pages if their parents' permissions are
      different.  KVM currently uses the effective permissions of the last
      non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
      full ("uwx") permissions, and the effective permissions are recorded only
      in role.access and merged into the leaves, this can lead to incorrect
      reuse of a shadow page and eventually to a missing guest protection page
      fault.
      
      For example, here is a shared pagetable:
      
         pgd[]   pud[]        pmd[]            virtual address pointers
                           /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
              /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
         pgd-|           (shared pmd[] as above)
              \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                           \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
      
        pud1 and pud2 point to the same pmd table, so:
        - ptr1 and ptr3 points to the same page.
        - ptr2 and ptr4 points to the same page.
      
      (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
      
      - First, the guest reads from ptr1 first and KVM prepares a shadow
        page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
        "u--" comes from the effective permissions of pgd, pud1 and
        pmd1, which are stored in pt->access.  "u--" is used also to get
        the pagetable for pud1, instead of "uw-".
      
      - Then the guest writes to ptr2 and KVM reuses pud1 which is present.
        The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
        even though the pud1 pmd (because of the incorrect argument to
        kvm_mmu_get_page in the previous step) has role.access="u--".
      
      - Then the guest reads from ptr3.  The hypervisor reuses pud1's
        shadow pmd for pud2, because both use "u--" for their permissions.
        Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
      
      - At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
        because pud1's shadow page structures included an "uw-" page even though
        its role.access was "u--".
      
      Any kind of shared pagetable might have the similar problem when in
      virtual machine without TDP enabled if the permissions are different
      from different ancestors.
      
      In order to fix the problem, we change pt->access to be an array, and
      any access in it will not include permissions ANDed from child ptes.
      
      The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
      Remember to test it with TDP disabled.
      
      The problem had existed long before the commit 41074d07 ("KVM: MMU:
      Fix inherited permissions for emulated guest pte updates"), and it
      is hard to find which is the culprit.  So there is no fixes tag here.
      Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: cea0f0e7 ("[PATCH] KVM: MMU: Shadow page table caching")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b1bd5cba
  11. 15 3月, 2021 2 次提交
  12. 23 2月, 2021 2 次提交
    • D
      KVM: x86/mmu: Consider the hva in mmu_notifier retry · 4a42d848
      David Stevens 提交于
      Track the range being invalidated by mmu_notifier and skip page fault
      retries if the fault address is not affected by the in-progress
      invalidation. Handle concurrent invalidations by finding the minimal
      range which includes all ranges being invalidated. Although the combined
      range may include unrelated addresses and cannot be shrunk as individual
      invalidation operations complete, it is unlikely the marginal gains of
      proper range tracking are worth the additional complexity.
      
      The primary benefit of this change is the reduction in the likelihood of
      extreme latency when handing a page fault due to another thread having
      been preempted while modifying host virtual addresses.
      Signed-off-by: NDavid Stevens <stevensd@chromium.org>
      Message-Id: <20210222024522.1751719-3-stevensd@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4a42d848
    • S
      KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault · 5f8a7cf2
      Sean Christopherson 提交于
      Don't retry a page fault due to an mmu_notifier invalidation when
      handling a page fault for a GPA that did not resolve to a memslot, i.e.
      an MMIO page fault.  Invalidations from the mmu_notifier signal a change
      in a host virtual address (HVA) mapping; without a memslot, there is no
      HVA and thus no possibility that the invalidation is relevant to the
      page fault being handled.
      
      Note, the MMIO vs. memslot generation checks handle the case where a
      pending memslot will create a memslot overlapping the faulting GPA.  The
      mmu_notifier checks are orthogonal to memslot updates.
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Message-Id: <20210222024522.1751719-2-stevensd@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5f8a7cf2