1. 20 December 2021 (1 commit)
    • KVM: x86: Retry page fault if MMU reload is pending and root has no sp · 18c841e1
      Committed by Sean Christopherson
      Play nice with a NULL shadow page when checking for an obsolete root in
      the page fault handler by flagging the page fault as stale if there's no
      shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending.
      Invalidating memslots, which is the only case where _all_ roots need to
      be reloaded, requests all vCPUs to reload their MMUs while holding
      mmu_lock for write.
      
      The "special" roots, e.g. pae_root when KVM uses PAE paging, are not
      backed by a shadow page.  Running with TDP disabled or with nested NPT
      explodes spectacularly due to dereferencing a NULL shadow page pointer.
      
      Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the
      root.  Zapping shadow pages in response to guest activity, e.g. when the
      guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current
      vCPU isn't using the affected root.  I.e. KVM_REQ_MMU_RELOAD can be seen
      with a completely valid root shadow page.  This is a bit of a moot point
      as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will
      be cleaned up in the future.
      
      Fixes: a955cad8 ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
      Cc: stable@vger.kernel.org
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211209060552.2956723-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
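      A minimal sketch of the resulting check, assuming the
      is_page_fault_stale() helper introduced by a955cad8 below and the
      root_hpa/to_shadow_page() plumbing of that era (illustrative, not the
      verbatim patch):

          static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
                                          struct kvm_page_fault *fault, int mmu_seq)
          {
                  struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root_hpa);

                  /* Special roots, e.g. pae_root, have no backing shadow page. */
                  if (sp && is_obsolete_sp(vcpu->kvm, sp))
                          return true;

                  /*
                   * A root without a shadow page is stale if a reload is
                   * pending.  The request may be a false positive for this
                   * vCPU, but retrying the fault is always safe.
                   */
                  if (!sp && kvm_test_request(KVM_REQ_MMU_RELOAD, vcpu))
                          return true;

                  return fault->slot &&
                         mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
          }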
  2. 08 December 2021 (16 commits)
  3. 02 December 2021 (1 commit)
    • KVM: x86/mmu: Retry page fault if root is invalidated by memslot update · a955cad8
      Committed by Sean Christopherson
      Bail from the page fault handler if the root shadow page was obsoleted by
      a memslot update.  Do the check _after_ acquiring mmu_lock, as the TDP MMU
      doesn't rely on the memslot/MMU generation, and instead relies on the
      root being explicitly marked invalid by kvm_mmu_zap_all_fast(), which takes
      mmu_lock for write.
      
      For the TDP MMU, inserting a SPTE into an obsolete root can leak a SP if
      kvm_tdp_mmu_zap_invalidated_roots() has already zapped the SP, i.e. has
      moved past the gfn associated with the SP.
      
      For other MMUs, the resulting behavior is far more convoluted, though
      unlikely to be truly problematic.  Installing SPs/SPTEs into the obsolete
      root isn't directly problematic, as the obsolete root will be unloaded
      and dropped before the vCPU re-enters the guest.  But because the legacy
      MMU tracks shadow pages by their role, any SP created by the fault can
      be reused in the new post-reload root.  Again, that _shouldn't_ be
      problematic as any leaf child SPTEs will be created for the current/valid
      memslot generation, and kvm_mmu_get_page() will not reuse child SPs from
      the old generation as they will be flagged as obsolete.  But, given that
      continuing with the fault is pointless (the root will be unloaded), apply
      the check to all MMUs.
      
      Fixes: b7cccd39 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211120045046.3940942-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
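      The shape of the fix, sketched from the description above (the lock
      sequence in direct_page_fault() is assumed; TDP MMU faults take
      mmu_lock for read instead):

          r = RET_PF_RETRY;
          write_lock(&vcpu->kvm->mmu_lock);

          /*
           * Check for an obsolete root _after_ acquiring mmu_lock so that
           * the check cannot race with kvm_mmu_zap_all_fast(), which marks
           * roots invalid while holding mmu_lock for write.
           */
          if (is_page_fault_stale(vcpu, fault, mmu_seq))
                  goto out_unlock;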
  4. 30 November 2021 (3 commits)
  5. 26 November 2021 (3 commits)
  6. 18 November 2021 (3 commits)
    • KVM: x86/mmu: Pass parameter flush as false in kvm_tdp_mmu_zap_collapsible_sptes() · 8ed716ca
      Committed by Hou Wenlong
      Since the TLB flush has already been done for the legacy MMU before
      kvm_tdp_mmu_zap_collapsible_sptes() is called, the parameter flush
      should be false for kvm_tdp_mmu_zap_collapsible_sptes().
      
      Fixes: e2209710 ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated")
      Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
      Message-Id: <21453a1d2533afb6e59fb6c729af89e771ff2e76.1637140154.git.houwenlong93@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
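      Roughly, the call site after the fix (the surrounding structure of
      kvm_mmu_zap_collapsible_sptes() is assumed from the description):

          if (is_tdp_mmu_enabled(kvm)) {
                  read_lock(&kvm->mmu_lock);
                  /*
                   * The legacy-MMU pass above already flushed, so start from
                   * false rather than carrying its flush state over.
                   */
                  flush = kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot, false);
                  if (flush)
                          kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
                  read_unlock(&kvm->mmu_lock);
          }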
    • KVM: x86/mmu: Skip tlb flush if it has been done in zap_gfn_range() · c7785d85
      Committed by Hou Wenlong
      If the parameter flush is set, zap_gfn_range() flushes the remote TLB
      when it yields, so no TLB flush is needed afterwards.  Use the return
      value of zap_gfn_range() directly instead of OR-ing it into the caller's
      flush state in kvm_unmap_gfn_range() and kvm_tdp_mmu_unmap_gfn_range().
      
      Fixes: 3039bcc7 ("KVM: Move x86's MMU notifier memslot walkers to generic code")
      Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
      Message-Id: <5e16546e228877a4d974f8c0e448a93d52c7a5a9.1637140154.git.houwenlong93@linux.alibaba.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
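      A sketch of the change in kvm_tdp_mmu_unmap_gfn_range(), with the
      zap_gfn_range() parameters assumed from the 5.16-era tree:

          bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm,
                                           struct kvm_gfn_range *range, bool flush)
          {
                  struct kvm_mmu_page *root;

                  for_each_tdp_mmu_root(kvm, root, range->slot->as_id)
                          /* was: flush |= zap_gfn_range(...) */
                          flush = zap_gfn_range(kvm, root, range->start,
                                                range->end, range->may_block,
                                                flush, false /* shared */);

                  return flush;
          }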
    • KVM: x86/mmu: include EFER.LMA in extended mmu role · b8453cdc
      Committed by Maxim Levitsky
      Incorporate EFER.LMA into kvm_mmu_extended_role, as it is used to compute the
      guest root level and is not reflected in kvm_mmu_page_role.level when TDP
      is in use.  When simply running the guest, it is impossible for EFER.LMA
      and kvm_mmu.root_level to get out of sync, as the guest cannot transition
      from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
      first bouncing through a different MMU context.  And stuffing guest state
      via KVM_SET_SREGS{,2} also ensures a full MMU context reset.
      
      However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
      set guest state when migrating the VM while L2 is active, the vCPU state
      will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
      have been configured using L2's state, despite not being used for L2.  If
      L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
      be configured for guest PAE paging, but will match the mmu_role for 64-bit
      paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.
      
      Alternatively, the root_mmu's role could be invalidated after a successful
      KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
      i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
      tricky, and not even needed if L1 and L2 do have the same role (e.g., they
      are both 64-bit guests and run with the same CR4).
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
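      Sketch of the role change (bitfield layout condensed; the
      kvm_mmu_role_regs helpers are assumed from the tree at the time):

          union kvm_mmu_extended_role {
                  u32 word;
                  struct {
                          unsigned int valid:1;
                          /* ... existing CR0/CR4-derived bits elided ... */
                          unsigned int efer_lma:1;  /* new: guest long mode */
                  };
          };

          /* In kvm_calc_mmu_role_ext(): */
          ext.efer_lma = ____is_efer_lma(regs);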
  7. 11 November 2021 (1 commit)
  8. 22 October 2021 (9 commits)
    • KVM: x86/mmu: Extract zapping of rmaps for gfn range to separate helper · 21fa3246
      Committed by Sean Christopherson
      Extract the zapping of rmaps, a.k.a. legacy MMU, for a gfn range to a
      separate helper to clean up the unholy mess that kvm_zap_gfn_range() has
      become.  In addition to deep nesting, the rmaps zapping spreads out the
      declaration of several variables and is generally a mess.  Clean up the
      mess now so that future work to improve the memslots implementation
      doesn't need to deal with it.
      
      Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211022010005.1454978-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
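      A condensed sketch of the extracted helper (the name and the
      slot_handle_level_range() arguments are assumed; the point is that the
      rmap walk now lives in one place):

          static bool __kvm_zap_rmaps(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
          {
                  const struct kvm_memory_slot *memslot;
                  struct kvm_memslots *slots;
                  bool flush = false;
                  gfn_t start, end;
                  int i;

                  if (!kvm_memslots_have_rmaps(kvm))
                          return flush;

                  for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                          slots = __kvm_memslots(kvm, i);
                          kvm_for_each_memslot(memslot, slots) {
                                  /* Clip the range to this memslot. */
                                  start = max(gfn_start, memslot->base_gfn);
                                  end = min(gfn_end, memslot->base_gfn + memslot->npages);
                                  if (start >= end)
                                          continue;

                                  flush = slot_handle_level_range(kvm, memslot,
                                                  kvm_zap_rmapp, PG_LEVEL_4K,
                                                  KVM_MAX_HUGEPAGE_LEVEL,
                                                  start, end - 1, true, flush);
                          }
                  }

                  return flush;
          }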
    • KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range() · e8be2a5b
      Committed by Sean Christopherson
      Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
      said function holds mmu_lock for write for its entire duration.  The
      flush was added by the now-reverted commit to allow TDP MMU to flush while
      holding mmu_lock for read, as the transition from write=>read required
      dropping the lock and thus a pending flush needed to be serviced.
      
      Fixes: 5a324c24 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211022010005.1454978-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Drop a redundant, broken remote TLB flush · bc3b3c10
      Committed by Sean Christopherson
      A recent commit to fix the calls to kvm_flush_remote_tlbs_with_address()
      in kvm_zap_gfn_range() inadvertently added yet another flush instead of
      fixing the existing flush.  Drop the redundant flush, and fix the params
      for the existing flush.
      
      Cc: stable@vger.kernel.org
      Fixes: 2822da44 ("KVM: x86/mmu: fix parameters to kvm_flush_remote_tlbs_with_address")
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211022010005.1454978-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
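      The corrected flush, sketched from the description (the redundant
      second call is simply deleted; the range math follows from zapping
      [gfn_start, gfn_end)):

          if (flush)
                  kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
                                                     gfn_end - gfn_start);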
    • KVM: X86: Don't unload MMU in kvm_vcpu_flush_tlb_guest() · 61b05a9f
      Committed by Lai Jiangshan
      kvm_mmu_unload() destroys all the PGD caches.  Use the lighter
      kvm_mmu_sync_roots() and kvm_mmu_sync_prev_roots() instead.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20211019110154.4091-5-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
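      Sketch of kvm_vcpu_flush_tlb_guest() after the change (the
      static_call() dispatch is assumed from the tree at the time):

          static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
          {
                  ++vcpu->stat.tlb_flush;

                  if (!tdp_enabled) {
                          /*
                           * Sync the shadow roots instead of unloading the
                           * MMU: the guest asked for a TLB flush, not for
                           * its PGD caches to be thrown away.
                           */
                          kvm_mmu_sync_roots(vcpu);
                          kvm_mmu_sync_prev_roots(vcpu);
                  }

                  static_call(kvm_x86_tlb_flush_guest)(vcpu);
          }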
    • KVM: X86: pair smp_wmb() of mmu_try_to_unsync_pages() with smp_rmb() · 264d3dc1
      Committed by Lai Jiangshan
      The commit 578e1c4d ("kvm: x86: Avoid taking MMU lock
      in kvm_mmu_sync_roots if no sync is needed") added smp_wmb() in
      mmu_try_to_unsync_pages(), but the corresponding smp_load_acquire() isn't
      used on the load of SPTE.W.  smp_load_acquire() orders _subsequent_
      loads after sp->is_unsync; it does not order _earlier_ loads before
      the load of sp->is_unsync.
      
      This has no functional change; smp_rmb() is a NOP on x86, and no
      compiler barrier is required because there is a VMEXIT between the
      load of SPTE.W and kvm_mmu_sync_roots().
      
      Cc: Junaid Shahid <junaids@google.com>
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20211019110154.4091-4-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
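      The added barrier, sketched in context (the surrounding lockless fast
      path of kvm_mmu_sync_roots() is assumed):

          /*
           * Pairs with the smp_wmb() in mmu_try_to_unsync_pages(): the
           * earlier load of SPTE.W must not be reordered after the loads
           * below.  On x86 smp_rmb() is a NOP, so this is primarily
           * documentation.
           */
          smp_rmb();
          if (!sp->unsync && !sp->unsync_children)
                  return;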
    • kvm: x86: mmu: Make NX huge page recovery period configurable · 4dfe4f40
      Committed by Junaid Shahid
      Currently, the NX huge page recovery thread wakes up every minute and
      zaps 1/nx_huge_pages_recovery_ratio of the total number of split NX
      huge pages at a time. This is intended to ensure that only a
      relatively small number of pages get zapped at a time. But for very
      large VMs (or more specifically, VMs with a large number of
      executable pages), a period of 1 minute could still result in this
      number being too high (unless the ratio is changed significantly,
      but that can result in split pages lingering on for too long).
      
      This change makes the period configurable instead of fixing it at
      1 minute. Users of large VMs can then adjust the period and/or the
      ratio to reduce the number of pages zapped at one time while still
      maintaining the same overall duration for cycling through the
      entire list. By default, KVM derives a period from the ratio such
      that a page will remain on the list for 1 hour on average.
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Message-Id: <20211020010627.305925-1-junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
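      The derivation, sketched (nx_huge_pages_recovery_period_ms is the
      parameter added by this patch; the clamp is an assumption): zapping
      1/ratio of the list per wakeup means ratio wakeups cycle the whole
      list, so a one-hour cycle gives period = 1 hour / ratio.

          uint ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
          uint period = READ_ONCE(nx_huge_pages_recovery_period_ms);

          if (!period && ratio) {
                  /* Keep the derived period at one second or more. */
                  ratio = min(ratio, 3600u);
                  period = 60 * 60 * 1000 / ratio;
          }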
    • KVM: x86/mmu: Rename slot_handle_leaf to slot_handle_level_4k · 610265ea
      Committed by David Matlack
      slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
      whereas "leaf" is used to describe any valid terminal SPTE (4K or
      large page). Rename slot_handle_leaf to slot_handle_level_4k to
      avoid confusion.
      
      Making this change makes it more obvious there is a benign discrepancy
      between the legacy MMU and the TDP MMU when it comes to dirty logging.
      The legacy MMU only iterates through 4K SPTEs when zapping for
      collapsing and when clearing D-bits. The TDP MMU, on the other hand,
      iterates through SPTEs on all levels.
      
      The TDP MMU behavior of zapping SPTEs at all levels is technically
      overkill for its current dirty logging implementation, which always
      demotes to 4K SPTEs, but both the TDP MMU and legacy MMU zap if and only
      if the SPTE can be replaced by a larger page, i.e. will not spuriously
      zap 2M (or larger) SPTEs. Opportunistically add comments to explain this
      discrepancy in the code.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20211019162223.3935109-1-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
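      The renamed helper, sketched (the slot_handle_level() signature is
      assumed):

          static __always_inline bool
          slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
                               slot_level_handler fn, bool flush_on_yield)
          {
                  /* Only 4K SPTEs are visited; large leaf SPTEs never are. */
                  return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
                                           PG_LEVEL_4K, flush_on_yield);
          }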
    • KVM: x86/mmu: clean up prefetch/prefault/speculative naming · 2839180c
      Committed by Paolo Bonzini
      "prefetch", "prefault" and "speculative" are used throughout KVM to mean
      the same thing.  Use a single name, standardizing on "prefetch" which
      is already used by various functions such as direct_pte_prefetch,
      FNAME(prefetch_gpte), FNAME(pte_prefetch), etc.
      Suggested-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: cleanup allocation of rmaps and page tracking data · 1e76a3ce
      Committed by David Stevens
      Unify the flags for rmaps and page tracking data, using a
      single flag in struct kvm_arch and a single loop to go
      over all the address spaces and memslots.  This avoids
      code duplication between alloc_all_memslots_rmaps and
      kvm_page_track_enable_mmu_write_tracking.
      Signed-off-by: David Stevens <stevensd@chromium.org>
      [This patch is the delta between David's v2 and v3, with conflicts
       fixed and my own commit message. - Paolo]
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
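      A rough sketch of the unified shape; the helper name, the
      shadow_root_allocated flag, and the allocation callees are assumptions
      based on how this area later looked, not quotes of the patch:

          static int mmu_first_shadow_root_alloc(struct kvm *kvm)
          {
                  struct kvm_memory_slot *slot;
                  struct kvm_memslots *slots;
                  int r = 0, i;

                  if (kvm_shadow_root_allocated(kvm))
                          return 0;

                  mutex_lock(&kvm->slots_arch_lock);

                  /* One loop allocates both rmaps and write-tracking data. */
                  for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                          slots = __kvm_memslots(kvm, i);
                          kvm_for_each_memslot(slot, slots) {
                                  r = memslot_rmap_alloc(slot, slot->npages);
                                  if (!r)
                                          r = kvm_page_track_write_tracking_alloc(slot);
                                  if (r)
                                          goto out_unlock;
                          }
                  }

                  /* Publish the flag only after the allocations are visible. */
                  smp_store_release(&kvm->arch.shadow_root_allocated, true);

          out_unlock:
                  mutex_unlock(&kvm->slots_arch_lock);
                  return r;
          }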
  9. 21 October 2021 (1 commit)
  10. 18 October 2021 (1 commit)
  11. 01 October 2021 (1 commit)
    • KVM: x86: only allocate gfn_track when necessary · deae4a10
      Committed by David Stevens
      Avoid allocating the gfn_track arrays if nothing needs them. If there
      are no users of the API external to KVM (i.e. no GVT-g), then page
      tracking is only needed for shadow page tables. This means that when TDP
      is enabled and there are no external users, the gfn_track arrays
      can be allocated lazily when the shadow MMU is actually used. This avoids
      allocations equal to 0.05% of guest memory when nested virtualization is
      not used, if the kernel is compiled without GVT-g.
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210922045859.2011227-3-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
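      Conceptually, the gate looks like the following sketch;
      CONFIG_KVM_EXTERNAL_WRITE_TRACKING is the GVT-g opt-in, while the
      helper name and the shadow-MMU-in-use check reflect how this area
      looked after the later cleanup (item 8 above), not this exact patch:

          bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
          {
                  /*
                   * External users (GVT-g) always need the arrays; otherwise
                   * they are needed only once shadow paging is actually used.
                   */
                  return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) ||
                         !tdp_enabled || kvm_shadow_root_allocated(kvm);
          }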