1. 25 Jun 2021, 3 commits
    • KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken · 63f5a190
      Sean Christopherson committed
      Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
      instability.  Initialize last_vmentry_cpu to -1 and use it to detect if
      the vCPU has been run at least once when its CPUID model is changed (a
      sketch of the check follows this entry).
      
      KVM does not correctly handle changes to paging related settings in the
      guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc...  KVM
      could theoretically zap all shadow pages, but actually making that happen
      is a mess due to lock inversion (vcpu->mutex is held).  And even then,
      updating paging settings on the fly would only work if all vCPUs are
      stopped, updated in concert with identical settings, then restarted.
      
      To support running vCPUs with different vCPU models (that affect paging),
      KVM would need to track all relevant information in kvm_mmu_page_role.
      Note, that's the _page_ role, not the full mmu_role.  Updating mmu_role
      isn't sufficient as a vCPU can reuse a shadow page translation that was
      created by a vCPU with different settings and thus completely skip the
      reserved bit checks (that are tied to CPUID).
      
      Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
      it would require doubling gfn_track from a u16 to a u32, i.e. would
      increase KVM's memory footprint by 2 bytes for every 4kb of guest memory.
      E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. Intel (1 bit), and the SEV
      C-bit would all need to be tracked.
      
      In practice, there is no remotely sane use case for changing any paging
      related CPUID entries on the fly, so just sweep it under the rug (after
      yelling at userspace).
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
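      For illustration, a minimal sketch of the run-detection idea described in
      this commit; the helper name, the warning text, and the call site are
      assumptions, not the exact upstream diff.

        /*
         * Sketch (names assumed): last_vmentry_cpu is initialized to -1 at vCPU
         * creation and overwritten with the current CPU on every successful
         * VM-Enter, so "!= -1" means the vCPU has run at least once.
         */
        static bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
        {
                return vcpu->arch.last_vmentry_cpu != -1;
        }

        static void kvm_warn_cpuid_after_run(struct kvm_vcpu *vcpu)
        {
                /* Yell at userspace, but let the update proceed. */
                if (kvm_vcpu_has_run(vcpu))
                        pr_warn_ratelimited("KVM: CPUID change after KVM_RUN may cause guest instability\n");
        }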
    • KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified · 49c6f875
      Sean Christopherson committed
      Invalidate all MMUs' roles after a CPUID update to force reinitialization
      of the MMU context/helpers.  Despite the efforts of commit de3ccd26
      ("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
      there are still a handful of CPUID-based properties that affect MMU
      behavior but are not incorporated into mmu_role.  E.g. 1gb hugepage
      support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
      factor into the guest's reserved PTE bits.
      
      The obvious alternative would be to add all such properties to mmu_role,
      but doing so provides no benefit over simply forcing a reinitialization on
      every CPUID update (sketched after this entry), as setting guest CPUID is
      a rare operation.
      
      Note, reinitializing all MMUs after a CPUID update does not fix all of
      KVM's woes.  Specifically, kvm_mmu_page_role doesn't track the CPUID
      properties, which means that a vCPU can reuse shadow pages that should
      not exist for the new vCPU model, e.g. that map GPAs that are now illegal
      (due to MAXPHYADDR changes) or that set bits that are now reserved
      (PAGE_SIZE for 1gb pages), etc...
      
      Tracking the relevant CPUID properties in kvm_mmu_page_role would address
      the majority of problems, but fully tracking that much state in the
      shadow page role comes with an unpalatable cost as it would require a
      non-trivial increase in KVM's memory footprint.  The GBPAGES case is even
      worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
      support in the hardware page walker, i.e. it's a virtualization hole that
      can't be closed when using TDP.
      
      In other words, resetting the MMU after a CPUID update is largely a
      superficial fix.  But, it will allow reverting the tracking of MAXPHYADDR
      in the mmu_role, and that case in particular needs to mostly work because
      KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
      is supported.  For cases where KVM botches guest behavior, the damage is
      limited to that guest.  But for the shadow_root_level, a misconfigured
      MMU can cause KVM to incorrectly access memory, e.g. due to walking off
      the end of its shadow page tables.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
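      A sketch of the forced reinitialization described in this commit, assuming
      a hook called from the CPUID update path; function and field names are
      assumptions rather than a verified copy of the final patch.

        /*
         * Sketch (names assumed): clearing the "valid" bit in each MMU's extended
         * role guarantees the next role comparison fails, so every MMU is rebuilt
         * with the new CPUID-derived state (reserved bits, GBPAGES, C-bit, ...).
         */
        static void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
        {
                vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
                vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
                vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;

                /* Rebuild the current context immediately with the new role. */
                kvm_mmu_reset_context(vcpu);
        }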
    • Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" · f71a53d1
      Sean Christopherson committed
      Restore CR4.LA57 to the mmu_role (see the sketch after this entry) to fix
      an amusing edge case with nested virtualization.  When KVM (L0) is using
      TDP, CR4.LA57 is not reflected in
      mmu_role.base.level because that tracks the shadow root level, i.e. TDP
      level.  Normally, this is not an issue because LA57 can't be toggled
      while long mode is active, i.e. the guest has to first disable paging,
      then toggle LA57, then re-enable paging, thus ensuring an MMU
      reinitialization.
      
      But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
      without having to bounce through an unpaged section.  L1 can also load a
      new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
      single entry PML5 is sufficient.  Such shenanigans are only problematic
      if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
      reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.
      
      Note, in the L2 case with nested TDP, even though L1 can switch between
      L2s with different LA57 settings, thus bypassing the paging requirement,
      in that case KVM's nested_mmu will track LA57 in base.level.
      
      This reverts commit 8053f924.
      
      Fixes: 8053f924 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
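      A sketch of the restored hack, assuming the pre-revert layout of the
      extended role: CR4.LA57 gets its own bit so a bare LA57 toggle changes the
      role and forces an MMU rebuild even when base.level only tracks the TDP
      level. Names below are assumptions for illustration.

        /* Sketch (names assumed): the dedicated LA57 bit in the extended role. */
        union kvm_mmu_extended_role {
                u32 word;
                struct {
                        unsigned int valid:1;
                        unsigned int cr4_la57:1;   /* restored by this revert */
                        /* ... other CR0/CR4/EFER-derived bits ... */
                };
        };

        static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
        {
                union kvm_mmu_extended_role ext = {0};

                /* A crafty L1 toggling LA57 on VM-Exit now invalidates the role. */
                ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
                ext.valid = 1;

                return ext;
        }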
  2. 24 Jun 2021, 1 commit
  3. 18 Jun 2021, 22 commits
  4. 27 May 2021, 1 commit
  5. 07 May 2021, 7 commits
  6. 03 May 2021, 1 commit
    • KVM: nSVM: fix a few bugs in the vmcb02 caching logic · c74ad08f
      Maxim Levitsky committed
      * Define and use an invalid GPA (all ones) as the init value of the last
        and current nested vmcb physical addresses (see the sketch after this
        entry).

      * Reset the current vmcb12 GPA to the invalid value when leaving nested
        mode, similar to what is done on nested VM-Exit.
      
      * Reset the last seen vmcb12 address when nested SVM is disabled, as it
        relies on vmcb02 fields which are freed at that point.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
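      A sketch of the invalid-GPA convention from the list above; the constant
      and field names are assumptions for illustration, not the exact patch.

        /* Sketch (names assumed): an all-ones GPA means "no vmcb12 cached". */
        #define INVALID_GPA     (~0ULL)

        static void nested_svm_reset_vmcb12_cache(struct vcpu_svm *svm)
        {
                /*
                 * Forget both the current and the last-seen vmcb12 so a stale
                 * address can never match once vmcb02 state has been freed or
                 * nested mode has been left.
                 */
                svm->nested.vmcb12_gpa      = INVALID_GPA;
                svm->nested.last_vmcb12_gpa = INVALID_GPA;
        }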
  7. 22 Apr 2021, 1 commit
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Nathan Tempelman committed
      Add a capability for userspace to mirror SEV encryption context from
      one VM to another (a userspace usage sketch follows this entry). On our
      side, this is intended to support a Migration Helper vCPU, but it can
      also be used generically to support other in-guest workloads scheduled
      by the host. The intention is for the primary guest and the mirror to
      have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
         can't accidentally clobber each other.
      2) The VMs can have different memory views, which is necessary for
         post-copy migration (the migration vCPUs on the target need to read
         and write pages on which the primary guest would VM-Exit).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, they could achieve the same effect by simply attaching
      a vCPU to the primary VM.

      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
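      A hedged userspace sketch of consuming the new capability; it assumes the
      capability is named KVM_CAP_VM_COPY_ENC_CONTEXT_FROM and is enabled on the
      mirror VM via KVM_ENABLE_CAP with args[0] holding the source VM's file
      descriptor.

        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        /*
         * Sketch (capability name and argument layout assumed): make mirror_vm_fd
         * share the SEV encryption context of source_vm_fd.  Returns 0 on success,
         * -1 with errno set on failure.
         */
        static int sev_mirror_enc_context(int mirror_vm_fd, int source_vm_fd)
        {
                struct kvm_enable_cap cap = {
                        .cap = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
                        .args[0] = (unsigned long)source_vm_fd,
                };

                return ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap);
        }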
  8. 20 Apr 2021, 4 commits