1. 02 Aug 2021: 2 commits
2. 25 Jun 2021: 10 commits
    • KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled · a01b45e9
      Maxim Levitsky authored
      This better reflects the purpose of the variable on AMD, where the
      AVIC's memory slot can be enabled and disabled dynamically.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210623113002.111448-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a01b45e9
    • kvm: x86: Allow userspace to handle emulation errors · 19238e75
      Aaron Lewis authored
      Add a fallback mechanism to the in-kernel instruction emulator that
      gives userspace the opportunity to process an instruction the emulator
      was unable to handle.  When the in-kernel instruction emulator fails to
      process an instruction, it does not know how to proceed in an
      appropriate manner, so it either injects a #UD into the guest or exits
      to userspace with exit reason KVM_EXIT_INTERNAL_ERROR.  This feature
      lets userspace get involved to see if it can figure out a better path
      forward (a userspace-side sketch follows this entry).
      Signed-off-by: Aaron Lewis <aaronlewis@google.com>
      Reviewed-by: David Edmondson <david.edmondson@oracle.com>
      Message-Id: <20210510144834.658457-2-aaronlewis@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      19238e75
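      A minimal userspace sketch of consuming this fallback, assuming the
      KVM_CAP_EXIT_ON_EMULATION_FAILURE capability and the kvm_run
      emulation_failure fields this series introduces; emulate_in_userspace()
      is a hypothetical VMM helper, not a KVM API:

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          /* Hypothetical VMM helper: try to emulate from raw insn bytes. */
          extern int emulate_in_userspace(const __u8 *insn, __u8 len);

          /* Opt in to emulation-failure exits instead of #UD injection. */
          static int enable_emulation_exits(int vm_fd)
          {
                  struct kvm_enable_cap cap = {
                          .cap = KVM_CAP_EXIT_ON_EMULATION_FAILURE,
                          .args = { 1 },
                  };
                  return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }

          /* Inspect a KVM_RUN exit; return 0 if userspace handled it. */
          static int handle_emulation_failure(struct kvm_run *run)
          {
                  if (run->exit_reason != KVM_EXIT_INTERNAL_ERROR ||
                      run->internal.suberror != KVM_INTERNAL_ERROR_EMULATION)
                          return -1;

                  /* Emulate in userspace if the insn bytes were provided. */
                  if (run->emulation_failure.flags &
                      KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES)
                          return emulate_in_userspace(
                                          run->emulation_failure.insn_bytes,
                                          run->emulation_failure.insn_size);
                  return -1;
          }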
    • KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic · 7cd138db
      Sean Christopherson authored
      Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
      best confusing.  Per the comment:
      
        Can have large pages at levels 2..last_nonleaf_level-1.
      
      the intent of the variable would appear to be to track what levels can
      _legally_ have large pages, but that intent doesn't align with reality.
      The computed value will be wrong for 5-level paging, or if 1gb pages are
      not supported.
      
      The flawed code is not a problem in practice, because except for 32-bit
      PSE paging, bit 7 is reserved if large pages aren't supported at the
      level.  Take advantage of this invariant and simply omit the level magic
      math for 64-bit page tables (including PAE).
      
      For 32-bit paging (non-PAE), the adjustments are needed purely because
      bit 7 is ignored if PSE=0.  Retain that logic as is, but make
      is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
      PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
      nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.
      
      Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
      FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
      not PTEs and will blow up all the other gpte helpers.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-51-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7cd138db
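      A sketch of the reworked helper described above, paraphrased from the
      commit's reasoning rather than copied from the diff (field and macro
      names follow KVM's conventions but should be treated as illustrative):

          /*
           * For 64-bit and PAE gptes, bit 7 (PT_PAGE_SIZE_MASK) is reserved
           * whenever large pages are unsupported at a level, so checking
           * bit 7 alone suffices.  Only 32-bit non-PAE paging needs fixups.
           */
          static bool is_last_gpte(struct kvm_mmu *mmu,
                                   unsigned int level, unsigned int gpte)
          {
          #if PTTYPE == 32
                  /*
                   * Bit 7 is ignored (not reserved) if CR4.PSE=0.  The RHS
                   * has bit 7 set iff level < PT32_ROOT_LEVEL + PSE, i.e.
                   * bit 7 is cleared from the gpte at any level where it is
                   * not the PAGE_SIZE bit; the "last nonleaf level" is
                   * bumped by the PSE bit itself, avoiding a branch.
                   */
                  gpte &= level - (PT32_ROOT_LEVEL + mmu->mmu_role.ext.cr4_pse);
          #endif
                  /* A 4K-level gpte always terminates the walk. */
                  gpte |= level - PG_LEVEL_4K - 1;

                  return gpte & PT_PAGE_SIZE_MASK;
          }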
    • KVM: x86: Enhance comments for MMU roles and nested transition trickiness · 616007c8
      Sean Christopherson authored
      Expand the comments for the MMU roles.  The interactions with gfn_track
      and PGD reuse in particular are hairy.
      
      Regarding PGD reuse, add comments in the nested virtualization flows to
      call out why kvm_init_mmu() is unconditionally called even when nested
      TDP is used.
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-50-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      616007c8
    • KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers · a4c93252
      Sean Christopherson authored
      Drop kvm_mmu.nx as there are no consumers left.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-39-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a4c93252
    • KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans · 167f8a5c
      Sean Christopherson authored
      Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
      <reg>_<bit> for all CR0, CR4, and EFER bits that included in the role.
      Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
      not the NX bit in the corresponding PTE.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-25-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      167f8a5c
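      The <reg>_<bit> naming enables accessor generation along these lines
      (an illustrative sketch of the anticipated macro magic, not the
      verbatim upstream code):

          /*
           * With every role bit named <reg>_<bit>, one macro can stamp out
           * accessors for CR0, CR4, and EFER bits alike.
           */
          #define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)         \
          static inline bool is_##reg##_##name(struct kvm_mmu *mmu)       \
          {                                                               \
                  return !!(mmu->mmu_role.base_or_ext.reg##_##name);      \
          }

          BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);  /* is_efer_nx(mmu)  */
          BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smep);  /* is_cr4_smep(mmu) */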
    • Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role" · 6c032f12
      Sean Christopherson authored
      Drop MAXPHYADDR from mmu_role now that all MMUs have their role
      invalidated after a CPUID update.  Invalidating the role forces all MMUs
      to re-evaluate the guest's MAXPHYADDR, which can be changed only through
      a CPUID update.
      
      This reverts commit de3ccd26.
      
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6c032f12
    • KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken · 63f5a190
      Sean Christopherson authored
      Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
      instability.  Initialize last_vmentry_cpu to -1 and use it to detect if
      the vCPU has been run at least once when its CPUID model is changed.
      
      KVM does not correctly handle changes to paging related settings in the
      guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc...  KVM
      could theoretically zap all shadow pages, but actually making that happen
      is a mess due to lock inversion (vcpu->mutex is held).  And even then,
      updating paging settings on the fly would only work if all vCPUs are
      stopped, updated in concert with identical settings, then restarted.
      
      To support running vCPUs with different vCPU models (that affect paging),
      KVM would need to track all relevant information in kvm_mmu_page_role.
      Note, that's the _page_ role, not the full mmu_role.  Updating mmu_role
      isn't sufficient as a vCPU can reuse a shadow page translation that was
      created by a vCPU with different settings and thus completely skip the
      reserved bit checks (that are tied to CPUID).
      
      Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
      it would require doubling gfn_track from a u16 to a u32, i.e. would
      increase KVM's memory footprint by 2 bytes for every 4kb of guest memory.
      E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT
      would all need to be tracked.
      
      In practice, there is no remotely sane use case for changing any paging
      related CPUID entries on the fly, so just sweep it under the rug (after
      yelling at userspace).
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      63f5a190
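      A sketch of the detection, assuming last_vmentry_cpu is initialized
      to -1 at vCPU creation and set on first VM-Entry (the helper name is
      illustrative):

          /* Called from kvm_vcpu_ioctl_set_cpuid{,2}(). */
          static void kvm_warn_cpuid_after_run(struct kvm_vcpu *vcpu)
          {
                  /*
                   * KVM mishandles CPUID changes that affect paging
                   * (MAXPHYADDR, GBPAGES, ...) once the vCPU has run;
                   * yell so userspace knows the guest is on its own.
                   */
                  if (vcpu->arch.last_vmentry_cpu != -1)
                          pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
          }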
    • KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified · 49c6f875
      Sean Christopherson authored
      Invalidate all MMUs' roles after a CPUID update to force reinitialization
      of the MMU context/helpers.  Despite the efforts of commit de3ccd26
      ("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
      there are still a handful of CPUID-based properties that affect MMU
      behavior but are not incorporated into mmu_role.  E.g. 1gb hugepage
      support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
      factor into the guest's reserved PTE bits.
      
      The obvious alternative would be to add all such properties to mmu_role,
      but doing so provides no benefit over simply forcing a reinitialization
      on every CPUID update, as setting guest CPUID is a rare operation.
      
      Note, reinitializing all MMUs after a CPUID update does not fix all of
      KVM's woes.  Specifically, kvm_mmu_page_role doesn't track the CPUID
      properties, which means that a vCPU can reuse shadow pages that should
      not exist for the new vCPU model, e.g. that map GPAs that are now illegal
      (due to MAXPHYADDR changes) or that set bits that are now reserved
      (PAGE_SIZE for 1gb pages), etc...
      
      Tracking the relevant CPUID properties in kvm_mmu_page_role would address
      the majority of problems, but fully tracking that much state in the
      shadow page role comes with an unpalatable cost as it would require a
      non-trivial increase in KVM's memory footprint.  The GBPAGES case is even
      worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
      support in the hardware page walker, i.e. it's a virtualization hole that
      can't be closed when using TDP.
      
      In other words, resetting the MMU after a CPUID update is largely a
      superficial fix.  But, it will allow reverting the tracking of MAXPHYADDR
      in the mmu_role, and that case in particular needs to mostly work because
      KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
      is supported.  For cases where KVM botches guest behavior, the damage is
      limited to that guest.  But for the shadow_root_level, a misconfigured
      MMU can cause KVM to incorrectly access memory, e.g. due to walking off
      the end of its shadow page tables.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      49c6f875
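      A sketch of the reset itself, assuming each of the three per-vCPU MMUs
      carries a "valid" bit in its extended role (close to the upstream
      change, but paraphrased):

          void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
          {
                  /*
                   * Invalidate all roles; the next kvm_init_mmu() then
                   * recomputes them with the new CPUID-derived properties
                   * (reserved bits, GBPAGES, C-Bit, ...) instead of
                   * reusing a stale context.
                   */
                  vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
                  vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
                  vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
                  kvm_mmu_reset_context(vcpu);
          }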
    • Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" · f71a53d1
      Sean Christopherson authored
      Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
      virtualization.  When KVM (L0) is using TDP, CR4.LA57 is not reflected in
      mmu_role.base.level because that tracks the shadow root level, i.e. TDP
      level.  Normally, this is not an issue because LA57 can't be toggled
      while long mode is active, i.e. the guest has to first disable paging,
      then toggle LA57, then re-enable paging, thus ensuring an MMU
      reinitialization.
      
      But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
      without having to bounce through an unpaged section.  L1 can also load a
      new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
      single entry PML5 is sufficient.  Such shenanigans are only problematic
      if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
      reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.
      
      Note, in the L2 case with nested TDP, L1 can switch between L2s with
      different LA57 settings, bypassing the paging requirement; in that case,
      KVM's nested_mmu will track LA57 in base.level.
      
      This reverts commit 8053f924.
      
      Fixes: 8053f924 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f71a53d1
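      The restored bit lives in the extended role, roughly as below (an
      abridged sketch; the surrounding fields are elided):

          union kvm_mmu_extended_role {
                  u32 word;
                  struct {
                          unsigned int valid:1;
                          /* ... other cr0/cr4 bits ... */
                          unsigned int cr4_la57:1; /* restored by this revert */
                  };
          };

          /* During role computation, so that an L1 toggling LA57 via a
           * VM-Exit CR4 load still forces a fresh mmu_role even though
           * base.level tracks the TDP level: */
          ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);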
3. 24 Jun 2021: 1 commit
4. 18 Jun 2021: 22 commits
5. 27 May 2021: 1 commit
6. 07 May 2021: 4 commits
    • KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging · 03ca4589
      Sean Christopherson authored
      Disallow loading KVM SVM if 5-level paging is supported.  In theory, NPT
      for L1 should simply work, but there are unknowns with respect to how the
      guest's MAXPHYADDR will be handled by hardware.
      
      Nested NPT is more problematic, as running an L1 VMM that is using
      2-level page tables requires stacking single-entry PDP and PML4 tables in
      KVM's NPT for L2, as there are no equivalent entries in L1's NPT to
      shadow.  Barring hardware magic, for 5-level paging, KVM would need to
      stack another layer to handle PML5.
      
      Opportunistically rename the lm_root pointer, which is used for the
      aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to
      call out that it's specifically for PML4.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210505204221.1934471-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      03ca4589
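      The load-time check is small; a sketch assuming the kernel's
      pgtable_l5_enabled() helper (paraphrased from the commit's intent):

          /* In svm_hardware_setup(): refuse to load until KVM can stack
           * a PML5 layer for shadowed 2-level L1 NPT. */
          if (IS_ENABLED(CONFIG_X86_64) && pgtable_l5_enabled()) {
                  pr_info("KVM doesn't yet support 5-level paging on AMD SVM\n");
                  return -EOPNOTSUPP;
          }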
    • KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit · e8ea85fb
      Chenyi Qiang authored
      Bus lock debug exception introduces a new bit, DR6_BUS_LOCK (bit 11 of
      DR6), to indicate that a bus lock #DB exception was generated.  The
      set/clear behavior of DR6_BUS_LOCK mirrors that of DR6_RTM: the
      processor clears DR6_BUS_LOCK when the exception is generated and sets
      the bit to 1 for all other #DB exceptions.  The software #DB handler
      should set this bit before returning to the interrupted task.
      
      In the VMM, to avoid breaking CPUs without bus lock #DB exception
      support, activate DR6_BUS_LOCK conditionally in the DR6_FIXED_1 bits
      (see the sketch after this entry).  When intercepting a #DB exception
      caused by a bus lock, bit 11 of the exit qualification is set to
      identify it.  The VMM should emulate the exception by clearing bit 11
      of the guest's DR6.
      Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20210202090433.13441-3-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e8ea85fb
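      The conditional activation can be expressed as a per-vCPU fixed-bits
      helper, as in this sketch (guest_cpuid_has() and the DR6 defines are
      existing kernel symbols; the helper name is illustrative):

          /* DR6 bits that read as 1 unless the matching feature exists. */
          static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
          {
                  u64 fixed = DR6_FIXED_1;

                  /* RTM: bit 16 reads as 1 on parts without TSX. */
                  if (!guest_cpuid_has(vcpu, X86_FEATURE_RTM))
                          fixed |= DR6_RTM;
                  /* Bus lock detect: bit 11 reads as 1 without the feature. */
                  if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
                          fixed |= DR6_BUS_LOCK;
                  return fixed;
          }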
    • KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model · 61a05d44
      Sean Christopherson authored
      Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to
      the guest CPU model instead of the host CPU behavior.  While not strictly
      necessary to avoid guest breakage, emulating cross-vendor "architecture"
      will provide consistent behavior for the guest, e.g. WRMSR fault behavior
      won't change if the vCPU is migrated to a host with divergent behavior.
      
      Note, the "new" kvm_is_supported_user_return_msr() checks do not add new
      functionality on either SVM or VMX.  On SVM, the equivalent was
      "tsc_aux_uret_slot < 0", and on VMX the check was buried in the
      vmx_find_uret_msr() call at the find_uret_msr label.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-15-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      61a05d44
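      After the consolidation, a single guest-model check gates MSR_TSC_AUX
      accesses for both vendors; a sketch (the helper name is illustrative):

          /* Shared check for RDMSR/WRMSR of MSR_TSC_AUX. */
          static int kvm_check_tsc_aux_access(struct kvm_vcpu *vcpu,
                                              bool host_initiated)
          {
                  if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
                          return 1;       /* MSR unusable on this host. */

                  /* Fault unless the *guest* model has RDTSCP or RDPID. */
                  if (!host_initiated &&
                      !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
                      !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
                          return 1;

                  return 0;
          }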
    • KVM: x86: Move uret MSR slot management to common x86 · e5fda4bb
      Sean Christopherson authored
      Now that SVM and VMX both probe MSRs before "defining" user return slots
      for them, consolidate the code for probe+define into common x86 and
      eliminate the odd behavior of having the vendor code define the slot for
      a given MSR.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-14-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e5fda4bb
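      The consolidated probe-and-define looks roughly like this (close to
      the upstream helper; the probe internals are elided):

          /* Probe the MSR on the host, then claim a user-return slot for
           * it.  Returns the slot index, or -1 if the MSR is unusable. */
          int kvm_add_user_return_msr(u32 msr)
          {
                  BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);

                  if (kvm_probe_user_return_msr(msr))
                          return -1;      /* RDMSR/WRMSR faulted. */

                  kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
                  return kvm_nr_uret_msrs++;
          }

          /* Vendor usage, e.g.:
           *   tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
           */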