1. 18 June 2021 (3 commits)
  2. 25 May 2021 (1 commit)
  3. 10 May 2021 (1 commit)
  4. 07 May 2021 (1 commit)
    • KVM: SVM: Move GHCB unmapping to fix RCU warning · ce7ea0cf
      Committed by Tom Lendacky
      When an SEV-ES guest is running, the GHCB is unmapped as part of the
      vCPU run support. However, kvm_vcpu_unmap() triggers an RCU dereference
      warning with CONFIG_PROVE_LOCKING=y because the SRCU lock is released
      before invoking the vCPU run support.
      
      Move the GHCB unmapping into the prepare_guest_switch callback, which is
      invoked while still holding the SRCU lock, eliminating the RCU dereference
      warning.
      
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <b2f9b79d15166f2c3e4375c0d9bc3268b7696455.1620332081.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
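
      A minimal sketch of the shape of the fix; the callback is the one
      registered as kvm_x86_ops.prepare_guest_switch, and the helper name
      sev_es_unmap_ghcb() is an assumption based on the commit text:

        /* Sketch: the GHCB unmap now happens in prepare_guest_switch,
         * which KVM invokes while the SRCU lock is still held, rather
         * than in the vCPU run support after SRCU has been released. */
        static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
        {
                struct vcpu_svm *svm = to_svm(vcpu);

                /* ... existing host-state save / guest-state setup ... */

                if (sev_es_guest(vcpu->kvm))
                        sev_es_unmap_ghcb(svm);
        }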
  5. 26 April 2021 (5 commits)
  6. 22 April 2021 (1 commit)
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Committed by Nathan Tempelman
      Add a capability for userspace to mirror SEV encryption context from
      one vm to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory views, which is necessary for
      post-copy migration (the migration vCPUs on the target need to read
      and write to pages that would cause the primary guest to VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, they could achieve the same effect by simply attaching
      a vCPU to the primary VM.
      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
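
      A hedged userspace sketch of enabling the new capability on the
      mirror VM; the capability name KVM_CAP_VM_COPY_ENC_CONTEXT_FROM and
      the encoding of the primary VM's fd in args[0] come from this patch.
      Error handling is omitted:

        /* Enable SEV-context mirroring on a second VM. */
        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        static int mirror_sev_context(int mirror_vm_fd, int primary_vm_fd)
        {
                struct kvm_enable_cap cap = {
                        .cap     = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
                        .args[0] = (unsigned long)primary_vm_fd,
                };

                return ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap);
        }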
  7. 20 April 2021 (3 commits)
  8. 17 April 2021 (1 commit)
    • KVM: nSVM: improve SYSENTER emulation on AMD · adc2a237
      Committed by Maxim Levitsky
      Currently, to support Intel->AMD migration, if the CPU vendor is
      GenuineIntel, we emulate the full 64-bit value of the
      MSR_IA32_SYSENTER_{EIP|ESP} MSRs, and we also emulate the
      SYSENTER/SYSEXIT instructions in long mode.
      
      (The emulator still refuses to emulate SYSENTER in 64-bit mode, on
      the grounds that the code for it was never tested and likely has no
      users.)
      
      However, when virtual VMLOAD/VMSAVE is enabled, the VMLOAD
      instruction updates these 32-bit MSRs without triggering their MSR
      intercept, leaving stale values in KVM's shadow copy of these MSRs,
      which relies on the intercept to stay up to date.
      
      Fix/optimize this by doing the following:
      
      1. Enable the MSR intercepts for the SYSENTER MSRs iff
         vendor=GenuineIntel. (This is both a tiny optimization and
         ensures that if the guest CPU vendor is AMD, the MSRs remain
         32 bits wide, as AMD defined.)
      
      2. Store only the high 32-bit part of these MSRs on interception,
         and combine it with the hardware MSR value on intercepted
         reads/writes iff vendor=GenuineIntel (see the sketch after this
         commit message).
      
      3. Disable VMLOAD/VMSAVE virtualization if vendor=GenuineIntel.
         (It is somewhat insane to set vendor=GenuineIntel and still
         enable SVM for the guest, but so be it.) Then zero the high
         32-bit parts when KVM intercepts and emulates VMLOAD.
      
      Thanks a lot to Paolo Bonzini for helping me fix this in the most
      correct way.
      
      This patch fixes nested migration of 32-bit nested guests, which was
      broken because incorrect cached values of the SYSENTER MSRs were
      stored in the migration stream if L1 changed these MSRs with VMLOAD
      prior to L2 entry.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
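
      A minimal sketch of the intercepted-read path from step 2 above,
      assuming field names (sysenter_eip_hi, the VMCB save area layout)
      based on the commit text:

        /* Sketch: an intercepted read of MSR_IA32_SYSENTER_EIP combines
         * KVM's cached high half with the 32-bit value the CPU keeps in
         * the VMCB save area. */
        case MSR_IA32_SYSENTER_EIP:
                msr_info->data = (u32)svm->vmcb->save.sysenter_eip |
                                 ((u64)svm->sysenter_eip_hi << 32);
                break;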
  9. 05 April 2021 (1 commit)
  10. 15 March 2021 (7 commits)
  11. 04 February 2021 (3 commits)
    • KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup · a7fc06dd
      Committed by Michael Roth
      Currently we save host state like user-visible host MSRs, and do some
      initial guest register setup for MSR_TSC_AUX and MSR_AMD64_TSC_RATIO
      in svm_vcpu_load(). Defer this until just before we enter the guest by
      moving the handling to kvm_x86_ops.prepare_guest_switch() similarly to
      how it is done for the VMX implementation.
      
      Additionally, since handling of saving/restoring host user MSRs is the
      same both with/without SEV-ES enabled, move that handling to common
      code.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20210202190126.2185715-4-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
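
      A sketch of the deferral, with hypothetical helper names standing in
      for the moved code (they are not the actual kernel symbols):

        /* Sketch: work formerly done eagerly in svm_vcpu_load() is
         * deferred to the prepare_guest_switch hook, just before entry. */
        static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
        {
                svm_save_host_user_msrs(vcpu);   /* hypothetical: user-visible host MSRs  */
                svm_setup_guest_tsc_msrs(vcpu);  /* hypothetical: MSR_TSC_AUX, TSC_RATIO  */
        }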
    • KVM: SVM: remove unneeded fields from host_save_users_msrs · 553cc15f
      Committed by Michael Roth
      Now that the set of host user MSRs that need to be individually
      saved/restored is the same with/without SEV-ES, we can drop the
      .sev_es_restored flag and just iterate through the list
      unconditionally for both cases. A subsequent patch can then move
      these loops to a common path.
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20210202190126.2185715-3-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: use vmsave/vmload for saving/restoring additional host state · e79b91bb
      Committed by Michael Roth
      Using a guest workload which simply issues 'hlt' in a tight loop to
      generate VMEXITs, it was observed (on a recent EPYC processor) that a
      significant amount of the VMEXIT overhead measured on the host was the
      result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
      perf:
      
        67.49%--kvm_arch_vcpu_ioctl_run
                |
                |--23.13%--vcpu_put
                |          kvm_arch_vcpu_put
                |          |
                |          |--21.31%--native_write_msr
                |          |
                |           --1.27%--svm_set_cr4
                |
                |--16.11%--vcpu_load
                |          |
                |           --15.58%--kvm_arch_vcpu_load
                |                     |
                |                     |--13.97%--svm_set_cr4
                |                     |          |
                |                     |          |--12.64%--native_read_msr
      
      Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and
      can be saved/restored using 'vmsave'/'vmload' instructions rather than
      explicit MSR reads/writes. In doing so there is a significant reduction
      in the svm_vcpu_load/svm_vcpu_put overhead measured for the above
      workload:
      
        50.92%--kvm_arch_vcpu_ioctl_run
                |
                |--19.28%--disable_nmi_singlestep
                |
                |--13.68%--vcpu_load
                |          kvm_arch_vcpu_load
                |          |
                |          |--9.19%--svm_set_cr4
                |          |          |
                |          |           --6.44%--native_read_msr
                |          |
                |           --3.55%--native_write_msr
                |
                |--6.05%--kvm_inject_nmi
                |--2.80%--kvm_sev_es_mmio_read
                |--2.19%--vcpu_put
                |          |
                |           --1.25%--kvm_arch_vcpu_put
                |                     native_write_msr
      
      Quantifying this further, if we look at the raw cycle counts for a
      normal iteration of the above workload (according to 'rdtscp'),
      kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with
      the current behavior. Using 'vmsave'/'vmload', this is reduced to
      ~2800 cycles, a savings of 39%.
      
      While this approach doesn't seem to manifest in any noticeable
      improvement for more realistic workloads like UnixBench, netperf, and
      kernel builds, likely due to their exit paths generally involving IO
      with comparatively high latencies, it does improve overall overhead
      of KVM_RUN significantly, which may still be noticeable for certain
      situations. It also simplifies some aspects of the code.
      
      With this change, explicit save/restore is no longer needed for the
      following host MSRs, since they are documented[1] as being part of the
      VMCB State Save Area:
      
        MSR_STAR, MSR_LSTAR, MSR_CSTAR,
        MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
        MSR_IA32_SYSENTER_CS,
        MSR_IA32_SYSENTER_ESP,
        MSR_IA32_SYSENTER_EIP,
        MSR_FS_BASE, MSR_GS_BASE
      
      and only the following MSR needs individual handling in
      svm_vcpu_put/svm_vcpu_load:
      
        MSR_TSC_AUX
      
      We could drop the host_save_user_msrs array/loop and instead handle
      MSR read/write of MSR_TSC_AUX directly, but we leave that for now as
      a potential follow-up.
      
      Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment
      registers (and associated hidden state)[2], some of the code
      previously used to handle this is no longer needed, so we drop it
      as well.
      
      The first public release of the SVM spec[3] also documents the same
      handling for the host state in question, so we make these changes
      unconditionally.
      
      Also worth noting is that we 'vmsave' to the same page that is
      subsequently used by 'vmrun' to record some additional host state.
      This is okay since, in accordance with the spec[2], the additional
      state written to the page by 'vmrun' does not overwrite any fields
      written by 'vmsave'. This has also been confirmed through testing
      (for the above CPU, at least).
      
      [1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2
      [2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD
      [3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01
      Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20210202190126.2185715-2-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
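
      A minimal sketch of the instruction wrappers that replace the
      explicit MSR read/write loops; 'pa' is the physical address of the
      per-CPU host save area (the same page handed to VMRUN via the
      VM_HSAVE_PA MSR). VMSAVE/VMLOAD take the save-area address
      implicitly in rAX:

        static inline void vmsave(unsigned long pa)
        {
                asm volatile("vmsave %%rax" : : "a" (pa) : "memory");
        }

        static inline void vmload(unsigned long pa)
        {
                asm volatile("vmload %%rax" : : "a" (pa) : "memory");
        }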
  12. 03 February 2021 (1 commit)
    • KVM: x86: cleanup CR3 reserved bits checks · c1c35cf7
      Committed by Paolo Bonzini
      If not in long mode, the low bits of CR3 are reserved but not enforced to
      be zero, so remove those checks.  If in long mode, however, the MBZ bits
      extend down to the highest physical address bit of the guest, excluding
      the encryption bit.
      
      Make the checks consistent with the above, and match them between
      nested_vmcb_checks and KVM_SET_SREGS.
      
      Cc: stable@vger.kernel.org
      Fixes: 761e4169 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
      Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
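
      A sketch of the consistent check, assuming the existing helpers
      is_long_mode() and kvm_vcpu_is_illegal_gpa() (which tests bits above
      the guest's MAXPHYADDR; the encryption bit is assumed to already be
      excluded from the cr3 value being checked):

        if (is_long_mode(vcpu) && kvm_vcpu_is_illegal_gpa(vcpu, cr3))
                return 1;       /* MBZ bits set above guest MAXPHYADDR */
        /* Outside long mode, the low reserved bits are not enforced. */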
  13. 08 January 2021 (1 commit)
    • KVM: SVM: Add support for booting APs in an SEV-ES guest · 647daca2
      Committed by Tom Lendacky
      Typically under KVM, an AP is booted using the INIT-SIPI-SIPI
      sequence, where the guest vCPU register state is updated and the
      vCPU is then run via VMRUN to begin execution of the AP. For an
      SEV-ES guest, this won't work because the guest register state is
      encrypted.
      
      Following the GHCB specification, the hypervisor must not alter the guest
      register state, so KVM must track an AP/vCPU boot. Should the guest want
      to park the AP, it must use the AP Reset Hold exit event in place of, for
      example, a HLT loop.
      
      First AP boot (first INIT-SIPI-SIPI sequence):
        Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
        support. It is up to the guest to transfer control of the AP to the
        proper location.
      
      Subsequent AP boot:
        KVM will expect to receive an AP Reset Hold exit event indicating that
        the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
        awaken it. When the AP Reset Hold exit event is received, KVM will place
        the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
        sequence, KVM will make the vCPU runnable. It is again up to the guest
        to then transfer control of the AP to the proper location.
      
        To differentiate between an actual HLT and an AP Reset Hold, a new MP
        state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
        placed in upon receiving the AP Reset Hold exit event. Additionally, to
        communicate the AP Reset Hold exit event up to userspace (if needed), a
        new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
      
      A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in
      order to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is
      set to the original SIPI delivery function,
      kvm_vcpu_deliver_sipi_vector(). SVM adds a new function that, for
      non-SEV-ES guests, invokes the original SIPI delivery function,
      kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests implements the
      logic above.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
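
      A sketch of SVM's SIPI delivery hook as described above; the
      function and state names follow the commit text, but the body is a
      simplified reconstruction, not the verbatim implementation:

        static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
        {
                if (!sev_es_guest(vcpu->kvm)) {
                        /* Non-SEV-ES: fall through to the common path. */
                        kvm_vcpu_deliver_sipi_vector(vcpu, vector);
                        return;
                }

                /* SEV-ES: register state is encrypted and must not be
                 * altered; just un-park a vCPU in AP Reset Hold. */
                if (vcpu->arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD)
                        vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
        }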
  14. 16 December 2020 (1 commit)
  15. 15 December 2020 (10 commits)
    • KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests · 16809ecd
      Committed by Tom Lendacky
      The run sequence is different for an SEV-ES guest compared to a legacy or
      even an SEV guest. The guest vCPU register state of an SEV-ES guest will
      be restored on VMRUN and saved on VMEXIT. There is no need to restore the
      guest registers directly and through VMLOAD before VMRUN and no need to
      save the guest registers directly and through VMSAVE on VMEXIT.
      
      Update the svm_vcpu_run() function to skip register state saving and
      restoring, and provide an alternative function for running an SEV-ES
      guest in vmenter.S.
      
      Additionally, certain host state is restored across an SEV-ES VMRUN. As
      a result certain register states are not required to be restored upon
      VMEXIT (e.g. FS, GS, etc.), so only do that if the guest is not an SEV-ES
      guest.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <fb1c66d32f2194e171b95fc1a8affd6d326e10c1.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
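
      A sketch of the entry-path selection; __svm_sev_es_vcpu_run is the
      alternative vmenter.S function named by this change, and the
      argument shapes are assumptions based on the commit text:

        /* Sketch: the SEV-ES path skips the explicit guest GPR
         * save/restore done by the legacy path, since hardware restores
         * the encrypted register state from the VMSA on VMRUN. */
        if (sev_es_guest(vcpu->kvm))
                __svm_sev_es_vcpu_run(svm->vmcb_pa);
        else
                __svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&vcpu->arch.regs);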
    • KVM: SVM: Provide support for SEV-ES vCPU loading · 86137773
      Committed by Tom Lendacky
      An SEV-ES vCPU has additional VMCB vCPU load/put requirements.
      SEV-ES hardware will restore certain registers on VMEXIT, but will
      not save them on VMRUN (see Table B-3 and Table B-4 of the AMD64 APM
      Volume 2), so make the following changes:
      
      General vCPU load changes:
        - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
          save the current values of XCR0, XSS and PKRU to the per-CPU SVM save
          area as these registers will be restored on VMEXIT.
      
      General vCPU put changes:
        - Do not attempt to restore registers that SEV-ES hardware has already
          restored on VMEXIT.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
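
      A sketch of the vCPU-load capture described above; 'sd' is the
      per-CPU SVM data and 'hostsa' is assumed to be a VMSA-layout view of
      the per-CPU save area, with host_xss a cached copy of MSR_IA32_XSS:

        /* Stash host state that SEV-ES hardware will restore on VMEXIT. */
        vmsave(page_to_phys(sd->save_area));              /* segment/syscall MSRs */
        hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); /* current XCR0         */
        hostsa->xss  = host_xss;
        hostsa->pkru = read_pkru();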
    • KVM: SVM: Provide support for SEV-ES vCPU creation/loading · 376c6d28
      Committed by Tom Lendacky
      An SEV-ES vCPU has additional VMCB initialization requirements at
      vCPU creation, plus extra vCPU load/put requirements. This includes:
      
      General VMCB initialization changes:
        - Set a VMCB control bit to enable SEV-ES support on the vCPU.
        - Set the VMCB encrypted VM save area address.
        - CRx registers are part of the encrypted register state and cannot be
          updated. Remove the CRx register read and write intercepts and replace
          them with CRx register write traps to track the CRx register values.
        - Certain MSR values are part of the encrypted register state and cannot
          be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.).
        - Remove the #GP intercept (no support for "enable_vmware_backdoor").
        - Remove the XSETBV intercept since the hypervisor cannot modify XCR0.
      
      General vCPU creation changes:
        - Set the initial GHCB gpa value as per the GHCB specification.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <3a8aef366416eddd5556dfa3fdc212aafa1ad0a2.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
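
      A hedged sketch of the VMCB control setup described above; the
      SEV-ES enable bit and the trap-style intercepts follow the commit
      text, but field and constant names are reconstructed from memory:

        svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
        svm->vmcb->control.vmsa_pa = __pa(svm->vmsa);   /* encrypted VM save area */

        /* CRx state is encrypted: trade read/write intercepts for write
         * traps so the tracked CRx value stays current. */
        svm_clr_intercept(svm, INTERCEPT_CR0_READ);
        svm_clr_intercept(svm, INTERCEPT_CR0_WRITE);
        svm_set_intercept(svm, TRAP_CR0_WRITE);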
    • KVM: SVM: Set the encryption mask for the SVM host save area · 85ca8be9
      Committed by Tom Lendacky
      The SVM host save area is used to restore some host state on VMEXIT of an
      SEV-ES guest. After allocating the save area, clear it and add the
      encryption mask to the SVM host save area physical address that is
      programmed into the VM_HSAVE_PA MSR.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <b77aa28af6d7f1a0cb545959e08d6dc75e0c3cba.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
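
      A minimal sketch, assuming the kernel's __sme_set() helper for
      applying the encryption (C) bit and a per-CPU save_area page:

        /* Clear the newly allocated save area, then program its physical
         * address with the encryption mask set. */
        clear_page(page_address(sd->save_area));
        wrmsrl(MSR_VM_HSAVE_PA, __sme_set(page_to_phys(sd->save_area)));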
    • KVM: SVM: Support string IO operations for an SEV-ES guest · 7ed9abfe
      Committed by Tom Lendacky
      For an SEV-ES guest, string-based port IO is performed to a shared
      (un-encrypted) page so that both the hypervisor and guest can read or
      write to it and each see the contents.
      
      For string-based port IO operations, invoke SEV-ES specific routines that
      can complete the operation using common KVM port IO support.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
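
      A hedged sketch of the IOIO dispatch; kvm_sev_es_string_io is the
      SEV-ES routine introduced by this work, but its exact signature and
      the ghcb_sa scratch-buffer fields are reconstructed from memory:

        if (string)
                /* String IO bounces through the shared GHCB scratch
                 * buffer and completes via common KVM port-IO support. */
                ret = kvm_sev_es_string_io(vcpu, size, port,
                                           svm->ghcb_sa, svm->ghcb_sa_len, in);
        else
                ret = kvm_fast_pio(vcpu, size, port, in);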
    • KVM: SVM: Support MMIO for an SEV-ES guest · 8f423a80
      Committed by Tom Lendacky
      For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page
      so that both the hypervisor and guest can read or write to it and each
      see the contents.
      
      The GHCB specification provides software-defined VMGEXIT exit codes to
      indicate a request for an MMIO read or an MMIO write. Add support to
      recognize the MMIO requests and invoke SEV-ES specific routines that
      can complete the MMIO operation. These routines use common KVM support
      to complete the MMIO operation.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <af8de55127d5bcc3253d9b6084a0144c12307d4d.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
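
      A sketch of the dispatch for the software-defined exit codes; the
      0x8000000x values and the use of exit_info_1/exit_info_2 for the GPA
      and size follow the GHCB specification as recalled here, and the
      helper names follow the commit text:

        switch (exit_code) {
        case SVM_VMGEXIT_MMIO_READ:     /* 0x80000001 */
                ret = kvm_sev_es_mmio_read(vcpu, control->exit_info_1,
                                           control->exit_info_2, svm->ghcb_sa);
                break;
        case SVM_VMGEXIT_MMIO_WRITE:    /* 0x80000002 */
                ret = kvm_sev_es_mmio_write(vcpu, control->exit_info_1,
                                            control->exit_info_2, svm->ghcb_sa);
                break;
        }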
    • KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100 · e1d71116
      Committed by Tom Lendacky
      The GHCB specification defines a GHCB MSR protocol using the lower
      12 bits of the GHCB MSR (in the hypervisor this corresponds to the
      GHCB GPA field in the VMCB).
      
      Function 0x100 is a request for termination of the guest. The guest has
      encountered some situation for which it has requested to be terminated.
      The GHCB MSR value contains the reason for the request.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <f3a1f7850c75b6ea4101e15bbb4a3af1a203f1dc.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
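
      A standalone sketch of unpacking a termination request; the
      reason-set (bits 15:12) and reason-code (bits 23:16) positions are
      the GHCB specification's MSR protocol layout as recalled here:

        #include <stdint.h>

        #define GHCB_MSR_INFO_MASK  0xfffULL   /* low 12 bits = function */
        #define GHCB_MSR_TERM_REQ   0x100ULL

        static int ghcb_msr_is_term(uint64_t msr)
        {
                return (msr & GHCB_MSR_INFO_MASK) == GHCB_MSR_TERM_REQ;
        }

        static unsigned int ghcb_msr_term_reason_set(uint64_t msr)
        {
                return (msr >> 12) & 0xf;      /* which reason-code set   */
        }

        static unsigned int ghcb_msr_term_reason(uint64_t msr)
        {
                return (msr >> 16) & 0xff;     /* reason within that set  */
        }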
    • KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004 · d3694667
      Committed by Tom Lendacky
      The GHCB specification defines a GHCB MSR protocol using the lower
      12 bits of the GHCB MSR (in the hypervisor this corresponds to the
      GHCB GPA field in the VMCB).
      
      Function 0x004 is a request for CPUID information. Only a single CPUID
      result register can be sent per invocation, so the protocol defines the
      register that is requested. The GHCB MSR value is set to the CPUID
      register value as per the specification via the VMCB GHCB GPA field.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <fd7ee347d3936e484c06e9001e340bf6387092cd.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
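
      A standalone sketch of the request/response framing: function 0x004
      carries the CPUID leaf in bits 63:32 and the requested result
      register (0=EAX..3=EDX) in bits 31:30, and the response uses
      function 0x005 with the register value in bits 63:32. Layout per the
      GHCB specification, reproduced from memory:

        #include <stdint.h>

        #define GHCB_MSR_CPUID_REQ   0x004ULL
        #define GHCB_MSR_CPUID_RESP  0x005ULL

        static uint32_t cpuid_req_leaf(uint64_t msr) { return msr >> 32; }
        static unsigned cpuid_req_reg(uint64_t msr)  { return (msr >> 30) & 0x3; }

        static uint64_t cpuid_resp(uint32_t reg_value)
        {
                return ((uint64_t)reg_value << 32) | GHCB_MSR_CPUID_RESP;
        }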
    • KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002 · 1edc1459
      Committed by Tom Lendacky
      The GHCB specification defines a GHCB MSR protocol using the lower
      12 bits of the GHCB MSR (in the hypervisor this corresponds to the
      GHCB GPA field in the VMCB).
      
      Function 0x002 is a request to set the GHCB MSR value to the SEV
      INFO response, as per the specification, via the VMCB GHCB GPA
      field.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <c23c163a505290a0d1b9efc4659b838c8c902cbc.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
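
      A standalone sketch of building the SEV INFO response: max/min
      supported GHCB protocol versions in bits 63:48 and 47:32, the
      encryption-bit (C-bit) position in bits 31:24, over response code
      0x001. Layout per the GHCB specification, reproduced from memory:

        #include <stdint.h>

        #define GHCB_MSR_SEV_INFO_RESP  0x001ULL

        static uint64_t ghcb_msr_sev_info(uint16_t max_ver, uint16_t min_ver,
                                          uint8_t cbit_pos)
        {
                return ((uint64_t)max_ver  << 48) |
                       ((uint64_t)min_ver  << 32) |
                       ((uint64_t)cbit_pos << 24) |
                       GHCB_MSR_SEV_INFO_RESP;
        }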
    • KVM: SVM: Add initial support for a VMGEXIT VMEXIT · 291bd20d
      Committed by Tom Lendacky
      SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a
      VMGEXIT includes mapping the GHCB based on the guest GPA, which is
      obtained from a new VMCB field, and then validating the required inputs
      for the VMGEXIT exit reason.
      
      Since many of the VMGEXIT exit reasons correspond to existing VMEXIT
      reasons, the information from the GHCB is copied into the VMCB control
      exit code areas and KVM register areas. The standard exit handlers are
      invoked, similar to standard VMEXIT processing. Before restarting the
      vCPU, the GHCB is updated with any registers that have been updated by
      the hypervisor.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <c6a4ed4294a369bd75c44d03bd7ce0f0c3840e50.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
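
      A sketch of the overall VMGEXIT flow described above; the sync
      helper names follow the commit's description, while the field names
      and error handling are simplified assumptions:

        static int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
        {
                struct vcpu_svm *svm = to_svm(vcpu);
                u64 ghcb_gpa = svm->vmcb->control.ghcb_gpa;  /* new VMCB field */

                /* Map the GHCB by guest GPA. */
                if (kvm_vcpu_map(vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->ghcb_map))
                        return -EINVAL;                      /* unmappable GHCB */

                sev_es_sync_from_ghcb(svm);  /* GHCB -> VMCB exit info + GPRs */
                /* ... validate inputs, dispatch to the standard exit handler,
                 * then before resuming the vCPU: */
                sev_es_sync_to_ghcb(svm);    /* updated GPRs -> GHCB          */
                return 1;
        }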