1. 02 February 2021, 3 commits
    • KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check · 4683d758
      Vitaly Kuznetsov authored
      Commit 7a873e45 ("KVM: selftests: Verify supported CR4 bits can be set
      before KVM_SET_CPUID2") reveals that KVM allows setting X86_CR4_PCIDE even
      when PCID support is missing:
      
      ==== Test Assertion Failure ====
        x86_64/set_sregs_test.c:41: rc
        pid=6956 tid=6956 - Invalid argument
           1	0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41
           2	0x00000000004014fc: main at set_sregs_test.c:119
           3	0x00007f2d9346d041: ?? ??:0
           4	0x000000000040164d: _start at ??:?
        KVM allowed unsupported CR4 bit (0x20000)
      
      Add an X86_FEATURE_PCID check to __cr4_reserved_bits() to make
      kvm_is_valid_cr4() fail (a simplified sketch follows this entry).
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210201142843.108190-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4683d758
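      A minimal sketch of the check described above, assuming a simplified
      helper (the real code is the __cr4_reserved_bits() macro; guest_cpuid_has()
      is KVM's existing CPUID helper):

        /* Hedged sketch, not the literal kernel macro: compute the CR4 bits
         * that are reserved for this vCPU based on its guest CPUID.  The
         * PCID line is the essence of the fix: without X86_FEATURE_PCID,
         * X86_CR4_PCIDE becomes reserved and kvm_is_valid_cr4() rejects
         * attempts to set it. */
        static u64 cr4_reserved_bits_for(struct kvm_vcpu *vcpu)
        {
                u64 reserved = CR4_RESERVED_BITS;

                if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
                        reserved |= X86_CR4_OSXSAVE;
                if (!guest_cpuid_has(vcpu, X86_FEATURE_PCID))
                        reserved |= X86_CR4_PCIDE;      /* the missing check */
                /* ... remaining feature-dependent bits elided ... */
                return reserved;
        }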
    • KVM/x86: assign hva with the right value to vm_munmap the pages · b66f9bab
      Zheng Zhan Liang authored
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Zheng Zhan Liang <zhengzhanliang@huorong.cn>
      Message-Id: <20210201055310.267029-1-zhengzhanliang@huorong.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b66f9bab
    • KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off · 7131636e
      Paolo Bonzini authored
      Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
      will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
      When this happens and the host has tsx=on, it is possible to end up with
      virtual machines that have HLE and RTM disabled, but TSX_CTRL available.
      
      If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
      will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
      use the tsx=off hosts as migration destinations, even though the guests
      do not have TSX enabled.
      
      To allow this migration, allow guests to write to their TSX_CTRL MSR,
      while keeping the host MSR unchanged for the entire life of the guests.
      This ensures that TSX remains disabled and also saves MSR reads and
      writes, and it's okay to do because with tsx=off we know that guests will
      not have the HLE and RTM features in their CPUID.  (If userspace sets
      bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).
      
      Cc: stable@vger.kernel.org
      Fixes: cbbaa272 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7131636e
  2. 29 January 2021, 1 commit
    • Fix unsynchronized access to sev members through svm_register_enc_region · 19a23da5
      Peter Gonda authored
      Grab kvm->lock before pinning memory when registering an encrypted
      region; sev_pin_memory() relies on kvm->lock being held to ensure
      correctness when checking and updating the number of pinned pages.
      
      Add a lockdep assertion to help prevent future regressions (see the
      sketch after this entry).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Fixes: 1e80fdc0 ("KVM: SVM: Pin guest memory when SEV is active")
      Signed-off-by: Peter Gonda <pgonda@google.com>
      
      V2
       - Fix up patch description
       - Correct file paths svm.c -> sev.c
       - Add unlock of kvm->lock on sev_pin_memory error
      
      V1
       - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/
      
      Message-Id: <20210127161524.2832400-1-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      19a23da5
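      A hedged sketch of the locking pattern described above (function and
      struct names simplified, not the literal diff):

        static int svm_register_enc_region_sketch(struct kvm *kvm,
                                                  struct kvm_enc_region *range,
                                                  struct enc_region *region)
        {
                int ret = 0;

                /* Pin with kvm->lock held so the pinned-page accounting that
                 * sev_pin_memory() checks and updates stays consistent. */
                mutex_lock(&kvm->lock);
                region->pages = sev_pin_memory(kvm, range->addr, range->size,
                                               &region->npages, 1);
                if (IS_ERR(region->pages)) {
                        ret = PTR_ERR(region->pages);
                        mutex_unlock(&kvm->lock);
                        return ret;
                }
                /* ... add the region to the SEV region list ... */
                mutex_unlock(&kvm->lock);
                return ret;
        }

        /* Inside sev_pin_memory(), the new assertion documents the rule: */
        lockdep_assert_held(&kvm->lock);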
  3. 28 January 2021, 1 commit
    • KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl · 181f4948
      Michael Roth authored
      Recent commit 255cbecf modified struct kvm_vcpu_arch to make
      'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries
      rather than embedding the array in the struct. KVM_SET_CPUID and
      KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed.
      
      As a result, KVM_GET_CPUID2 currently returns random fields from struct
      kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix
      this by treating 'cpuid_entries' as a pointer when copying its
      contents to the userspace buffer (a sketch follows this entry).
      
      Fixes: 255cbecf ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically")
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com.com>
      Message-Id: <20210128024451.1816770-1-michael.roth@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      181f4948
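      A hedged sketch of the fix (simplified; the real function is
      kvm_vcpu_ioctl_get_cpuid2()):

        static int get_cpuid2_sketch(struct kvm_vcpu *vcpu,
                                     struct kvm_cpuid2 *cpuid,
                                     struct kvm_cpuid_entry2 __user *entries)
        {
                if (cpuid->nent < vcpu->arch.cpuid_nent)
                        return -E2BIG;

                /* 'cpuid_entries' is now a pointer, so copy from its value;
                 * copying from the struct field itself (in effect
                 * &vcpu->arch.cpuid_entries) leaked adjacent kvm_vcpu_arch
                 * data to userspace. */
                if (copy_to_user(entries, vcpu->arch.cpuid_entries,
                                 vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2)))
                        return -EFAULT;

                cpuid->nent = vcpu->arch.cpuid_nent;
                return 0;
        }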
  4. 26 January 2021, 9 commits
    • KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX · 9a78e158
      Paolo Bonzini authored
      VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
      which may need to be loaded outside guest mode.  Therefore we cannot
      WARN in that case.
      
      However, that part of nested_get_vmcs12_pages is _not_ needed at
      vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
      so that both vmentry and migration (and in the latter case, independent
      of is_guest_mode) do the parts that are needed.
      
      Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
      Cc: <stable@vger.kernel.org> # 5.10.x
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9a78e158
    • KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written" · aed89418
      Sean Christopherson authored
      Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
      to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty.  Per
      commit de3cd117 ("KVM: x86: Omit caching logic for always-available
      GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.
      
      This reverts commit 1c04d8c9.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210122235049.3107620-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      aed89418
    • KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest · 25009140
      Sean Christopherson authored
      Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB; the
      GPRs' dirty bits are set from time zero and never cleared, i.e. they will
      always be seen as dirty.  The obvious alternative would be to clear
      the dirty bits when appropriate, but removing the dirty checks is
      desirable as it allows reverting GPR dirty+available tracking, which
      adds overhead to all flavors of x86 VMs (see the sketch after this entry).
      
      Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
      by the GHCB spec, which allows the hypervisor (or guest) to provide
      unnecessary info; it's the guest's responsibility to consume only what
      it needs (the hypervisor is untrusted after all).
      
        The guest and hypervisor can supply additional state if desired but
        must not rely on that additional state being provided.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210122235049.3107620-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      25009140
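      A hedged sketch of the unconditional sync (the ghcb_set_*() accessors
      are the existing GHCB helpers; the surrounding function name and field
      layout are illustrative):

        static void sev_es_sync_to_ghcb_sketch(struct vcpu_svm *svm)
        {
                struct ghcb *ghcb = svm->ghcb;

                /* Write the GHCB-relevant GPRs every time instead of testing
                 * per-GPR dirty bits that are never cleared anyway. */
                ghcb_set_rax(ghcb, svm->vcpu.arch.regs[VCPU_REGS_RAX]);
                ghcb_set_rbx(ghcb, svm->vcpu.arch.regs[VCPU_REGS_RBX]);
                ghcb_set_rcx(ghcb, svm->vcpu.arch.regs[VCPU_REGS_RCX]);
                ghcb_set_rdx(ghcb, svm->vcpu.arch.regs[VCPU_REGS_RDX]);
        }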
    • KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration · d51e1d3f
      Maxim Levitsky authored
      Even when we are outside the nested guest, some vmcs02 fields
      may not be in sync with vmcs12.  This is intentional, even across
      nested VM-exit, because the sync can be delayed until the nested
      hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
      rarely accessed fields.
      
      However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
      be able to restore it.  To fix that, call copy_vmcs02_to_vmcs12_rare()
      before the vmcs12 contents are copied to userspace.
      
      Fixes: 7952d769 ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d51e1d3f
    • kvm: tracing: Fix unmatched kvm_entry and kvm_exit events · d95df951
      Lorenzo Brescia authored
      On VMX, if we exit and then re-enter immediately without leaving
      the vmx_vcpu_run() function, the kvm_entry event is not logged.
      That means we will see one (or more) kvm_exit, without its (their)
      corresponding kvm_entry, as shown here:
      
       CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
       CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
       CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE
      
      It also seems possible for a kvm_entry event to be logged, but then
      we leave vmx_vcpu_run() right away (if vmx->emulation_required is
      true). In this case, we will have a spurious kvm_entry event in the
      trace.
      
      Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
      (where trace_kvm_exit() already is); a sketch follows this entry.
      
      A trace obtained with this patch applied looks like this:
      
       CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
       CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT
      
      Of course, not calling trace_kvm_entry() in common x86 code any
      longer means that we need to adjust the SVM side of things too.
      Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
      Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
      Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d95df951
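      A hedged, heavily abbreviated sketch of where the tracepoint now fires
      (the SVM run function gets the same treatment):

        static fastpath_t vmx_vcpu_run_sketch(struct kvm_vcpu *vcpu)
        {
                /* Moved here from common x86 code: fires for every low-level
                 * VM entry, including immediate re-entries that never leave
                 * this function, so each kvm_exit has a matching kvm_entry. */
                trace_kvm_entry(vcpu);

                /* ... existing vmentry/vmexit handling, including
                 *     trace_kvm_exit(), is unchanged ... */

                return EXIT_FASTPATH_NONE;
        }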
    • KVM: x86: get smi pending status correctly · 1f7becf1
      Jay Zhou authored
      The SMI injection process has two steps:
      
          Qemu                        KVM
      Step1:
          cpu->interrupt_request &= \
              ~CPU_INTERRUPT_SMI;
          kvm_vcpu_ioctl(cpu, KVM_SMI)
      
                                      call kvm_vcpu_ioctl_smi() and
                                      kvm_make_request(KVM_REQ_SMI, vcpu);
      
      Step2:
          kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
      
                                      call process_smi() if
                                      kvm_check_request(KVM_REQ_SMI, vcpu) is
                                      true, mark vcpu->arch.smi_pending = true;
      
      vcpu->arch.smi_pending is only set true in step 2. Unfortunately, if the
      vcpu is paused between step 1 and step 2, kvm_run->immediate_exit will be
      set and the vcpu has to exit to Qemu immediately during step 2, before
      vcpu->arch.smi_pending is marked true.
      During VM migration, Qemu then fetches the SMI pending status from KVM
      with the KVM_GET_VCPU_EVENTS ioctl during the downtime, and the pending
      SMI is lost.
      Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
      Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com>
      Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1f7becf1
    • KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[] · 98dd2f10
      Like Xu authored
      The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
      0x0300 in the intel_perfmon_event_map[]. Correct its usage (see the
      sketch after this entry).
      
      Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      98dd2f10
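      The pseudo-encoding 0x0300 decomposes into unit mask 0x03 and event
      select 0x00; a hedged sketch of the corresponding table entry (struct
      layout simplified, values taken from that encoding):

        static const struct {
                u8 eventsel;
                u8 unit_mask;
                unsigned int event_type;
        } intel_arch_events_sketch[] = {
                /* ... other architectural events ... */
                /* HW_REF_CPU_CYCLES, pseudo-encoded as 0x0300: event select
                 * 0x00 in the low byte, unit mask 0x03 in the high byte. */
                { 0x00, 0x03, PERF_COUNT_HW_REF_CPU_CYCLES },
        };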
    • KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh() · e61ab2a3
      Like Xu authored
      The vPMU will not work properly when (1) the guest bit_width(s) of the
      [gp|fixed] counters are greater than the host ones, or (2) the guest's
      requested architectural events exceed the range supported by the host.
      In those cases, set up a smaller left shift value and refresh the guest
      cpuid entry, fixing the following UBSAN shift-out-of-bounds warning:
      
      shift exponent 197 is too large for 64-bit type 'long long unsigned int'
      
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
       intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
       kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
       kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
       kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
       kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e61ab2a3
    • KVM: x86: Add more protection against undefined behavior in rsvd_bits() · eb79cd00
      Sean Christopherson authored
      Add compile-time asserts in rsvd_bits() to guard against KVM passing in
      garbage hardcoded values, and cap the upper bound at '63' for dynamic
      values to prevent generating a mask that would overflow a u64 (a sketch
      follows this entry).
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210113204515.3473079-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eb79cd00
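      A hedged sketch of the guarded helper, modelled on the description
      above rather than quoted from the patch:

        static inline u64 rsvd_bits_sketch(int s, int e)
        {
                /* Constant inputs with e < s are a KVM bug: fail the build. */
                BUILD_BUG_ON(__builtin_constant_p(e) && __builtin_constant_p(s) && e < s);

                /* Constant upper bounds above 63 are likewise a bug; dynamic
                 * ones are capped so the shift below cannot overflow a u64. */
                if (__builtin_constant_p(e))
                        BUILD_BUG_ON(e > 63);
                else
                        e &= 63;

                if (e < s)
                        return 0;

                return ((2ULL << (e - s)) - 1) << s;
        }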
  5. 08 January 2021, 14 commits
    • KVM: x86: __kvm_vcpu_halt can be static · 872f36eb
      Paolo Bonzini authored
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      872f36eb
    • KVM: SVM: Add support for booting APs in an SEV-ES guest · 647daca2
      Tom Lendacky authored
      Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
      where the guest vCPU register state is updated and then the vCPU is run via
      VMRUN to begin execution of the AP. For an SEV-ES guest, this won't work because
      the guest register state is encrypted.
      
      Following the GHCB specification, the hypervisor must not alter the guest
      register state, so KVM must track an AP/vCPU boot. Should the guest want
      to park the AP, it must use the AP Reset Hold exit event in place of, for
      example, a HLT loop.
      
      First AP boot (first INIT-SIPI-SIPI sequence):
        Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
        support. It is up to the guest to transfer control of the AP to the
        proper location.
      
      Subsequent AP boot:
        KVM will expect to receive an AP Reset Hold exit event indicating that
        the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
        awaken it. When the AP Reset Hold exit event is received, KVM will place
        the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
        sequence, KVM will make the vCPU runnable. It is again up to the guest
        to then transfer control of the AP to the proper location.
      
        To differentiate between an actual HLT and an AP Reset Hold, a new MP
        state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
        placed in upon receiving the AP Reset Hold exit event. Additionally, to
        communicate the AP Reset Hold exit event up to userspace (if needed), a
        new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
      
      A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
      to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
      original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
      a new function that, for non-SEV-ES guests, invokes the original SIPI
      delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests
      implements the logic above (sketched after this entry).
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      647daca2
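      A hedged sketch of the SVM-side hook described above (shape follows the
      description; helper names are illustrative, not the literal patch):

        static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
        {
                /* Non-SEV-ES guests keep the traditional SIPI handling. */
                if (!sev_es_guest(vcpu->kvm))
                        return kvm_vcpu_deliver_sipi_vector(vcpu, vector);

                /* SEV-ES: the vCPU was parked by an AP Reset Hold exit; make
                 * it runnable again and let the guest position the AP itself. */
                sev_vcpu_deliver_sipi_vector(vcpu, vector);
        }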
    • KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit · f2c7ef3b
      Maxim Levitsky authored
      It is possible to exit nested guest mode, entered by svm_set_nested_state,
      prior to the first VM entry into it (e.g. due to a pending event), if the
      nested run was not pending during the migration.
      
      In this case we must not switch to the nested MSR permission bitmap.
      Also add a warning to catch similar cases in the future.
      
      Fixes: a7d5c7ce ("KVM: nSVM: delay MSR permission processing to first nested VM run")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f2c7ef3b
    • KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode · 56fe28de
      Maxim Levitsky authored
      We overwrite most of the vmcb fields while doing so, so we must
      mark it as dirty.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      56fe28de
    • KVM: nSVM: correctly restore nested_run_pending on migration · 81f76ada
      Maxim Levitsky authored
      The code to store it on migration exists, but no code was restoring it.
      
      One of the side effects of fixing this is that L1->L2 injected events
      are no longer lost when migration happens with nested run pending.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      81f76ada
    • KVM: x86/mmu: Ensure TDP MMU roots are freed after yield · a889ea54
      Ben Gardon authored
      Many TDP MMU functions which need to perform some action on all TDP MMU
      roots hold a reference on that root so that they can safely drop the MMU
      lock in order to yield to other threads. However, when releasing the
      reference on the root, there is a bug: the root will not be freed even
      if its reference count (root_count) is reduced to 0.
      
      To simplify acquiring and releasing references on TDP MMU root pages, and
      to ensure that these roots are properly freed, move the get/put operations
      into another TDP MMU root iterator macro (see the sketch after this entry).
      
      Moving the get/put operations into an iterator macro also helps
      simplify control flow when a root does need to be freed. Note that using
      the list_for_each_entry_safe macro would not have been appropriate in
      this situation because it could keep a pointer to the next root across
      an MMU lock release + reacquire, during which time that root could be
      freed.
      Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Fixes: 063afacd ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210107001935.3732070-1-bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a889ea54
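      A hedged sketch of the iterator idea (names follow the description; the
      'next root' helper is assumed to take a reference on the root it returns
      and drop the reference on the previous one, freeing it if the count hits
      zero, so the put happens even when the loop body yields the MMU lock):

        #define for_each_tdp_mmu_root_yield_safe(kvm, root)              \
                for (root = tdp_mmu_next_root(kvm, NULL);                \
                     root;                                               \
                     root = tdp_mmu_next_root(kvm, root))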
    • KVM: x86: change in pv_eoi_get_pending() to make code more readable · de7860c8
      Stephen Zhang authored
      Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
      Message-Id: <1608277897-1932-1-git-send-email-stephenzhangzsd@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      de7860c8
    • KVM: x86: fix shift out of bounds reported by UBSAN · 2f80d502
      Paolo Bonzini authored
      Since we know that e >= s, we can reassociate the left shift,
      changing the shifted number from 1 to 2 in exchange for decreasing
      the right-hand side by 1 (a worked sketch follows this entry).
      
      Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2f80d502
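      A worked sketch of the reassociation, assuming an rsvd_bits()-style mask
      helper with 0 <= s <= e <= 63:

        static inline u64 rsvd_mask_sketch(int s, int e)
        {
                /* Before: ((1ULL << (e - s + 1)) - 1) << s
                 * is undefined when e - s == 63 (1ULL << 64).  After the
                 * reassociation the shift count is at most 63; 2ULL << 63
                 * wraps to 0 in unsigned arithmetic and 0 - 1 yields the
                 * intended all-ones mask. */
                return ((2ULL << (e - s)) - 1) << s;
        }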
    • KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c · 52782d5b
      Uros Bizjak authored
      Commit 16809ecd moved the __svm_vcpu_run prototype to svm.h,
      but forgot to remove the original from svm.c.
      
      Fixes: 16809ecd ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Message-Id: <20201220200339.65115-1-ubizjak@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      52782d5b
    • KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load · f65cf84e
      Nathan Chancellor authored
      When using LLVM's integrated assembler (LLVM_IAS=1) while building
      x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build
      error occurs:
      
       $ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o
       arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction
               asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
                            ^
       arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex'
       #define __ex(x) __kvm_handle_fault_on_reboot(x)
                       ^
       ./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot'
               "666: \n\t"                                                     \
                       ^
       <inline asm>:2:2: note: instantiated into assembly here
               vmsave
               ^
       1 error generated.
      
      This happens because LLVM currently does not support calling vmsave
      without the fixed register operand (%rax for 64-bit and %eax for
      32-bit). This will be fixed in LLVM 12 but the kernel currently supports
      LLVM 10.0.1 and newer so this needs to be handled.
      
      Add the proper register using the _ASM_AX macro, which matches the
      vmsave call in vmenter.S (see the sketch after this entry).
      
      Fixes: 86137773 ("KVM: SVM: Provide support for SEV-ES vCPU loading")
      Link: https://reviews.llvm.org/D93524
      Link: https://github.com/ClangBuiltLinux/linux/issues/1216
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f65cf84e
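      A hedged sketch of the fixed statement (per the description above; the
      exact committed line may differ slightly):

        /* Spell out the fixed register operand so LLVM's integrated
         * assembler accepts it; _ASM_AX expands to rax or eax, matching the
         * "a" constraint and the vmsave invocation in vmenter.S. */
        asm volatile(__ex("vmsave %%" _ASM_AX)
                     : : "a" (__sme_page_pa(sd->save_area)) : "memory");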
    • KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte() · 9aa41879
      Sean Christopherson authored
      Check only the terminal leaf for a "!PRESENT || MMIO" SPTE when looking
      for reserved bits on valid, non-MMIO SPTEs.  The get_walk() helpers
      terminate their walks if a not-present or MMIO SPTE is encountered, i.e.
      the non-terminal SPTEs have already been verified to be regular SPTEs.
      This eliminates an extra check-and-branch in a relatively hot loop.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9aa41879
    • KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array · dde81f94
      Sean Christopherson authored
      Bump the size of the sptes array by one and use the raw level of the
      SPTE to index into the sptes array.  Using the SPTE level directly
      improves readability by eliminating the need to reason out why the level
      is being adjusted when indexing the array.  The array is on the stack
      and is not explicitly initialized; bumping its size is nothing more than
      a superficial adjustment to the stack frame.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-4-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dde81f94
    • KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE · 39b4d43e
      Sean Christopherson authored
      Get the so-called "root" level from the low-level shadow page table
      walkers instead of manually attempting to calculate it higher up the
      stack, e.g. in get_mmio_spte().  When KVM is using PAE shadow paging,
      the starting level of the walk, from the caller's perspective, is not
      the CR3 root but rather the PDPTR "root".  Checking for reserved bits
      from the CR3 root causes get_mmio_spte() to consume uninitialized stack
      data due to indexing into sptes[] for a level that was not filled by
      get_walk().  This can result in false positives and/or negatives
      depending on what garbage happens to be on the stack.
      
      Opportunistically nuke a few extra newlines.
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Reported-by: Richard Herbert <rherbert@sympatico.ca>
      Cc: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      39b4d43e
    • KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() · 2aa07893
      Sean Christopherson authored
      Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
      least one spte, which can theoretically happen if the walk hits a
      not-present PDPTR.  Returning the root level in such a case will cause
      get_mmio_spte() to return garbage (uninitialized stack data).  In
      practice, such a scenario should be impossible as KVM shouldn't get a
      reserved-bit page fault with a not-present PDPTR.
      
      Note, using mmu->root_level in get_walk() is wrong for other reasons,
      too, but that's now a moot point.
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2aa07893
  6. 20 December 2020, 1 commit
  7. 17 December 2020, 1 commit
  8. 16 December 2020, 1 commit
  9. 15 December 2020, 9 commits
    • KVM: SVM: Provide support to launch and run an SEV-ES guest · ad73109a
      Tom Lendacky authored
      An SEV-ES guest is started by invoking a new SEV initialization ioctl,
      KVM_SEV_ES_INIT. This identifies the guest as an SEV-ES guest, which is
      used to drive the appropriate ASID allocation, VMSA encryption, etc.
      
      Before being able to run an SEV-ES vCPU, the vCPU VMSA must be encrypted
      and measured. This is done using the LAUNCH_UPDATE_VMSA command after all
      calls to LAUNCH_UPDATE_DATA have been performed, but before LAUNCH_MEASURE
      has been performed. In order to establish the encrypted VMSA, the current
      (traditional) VMSA and the GPRs are synced to the page that will hold the
      encrypted VMSA and then LAUNCH_UPDATE_VMSA is invoked. The vCPU is then
      marked as having protected guest state.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e9643245adb809caf3a87c09997926d2f3d6ff41.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ad73109a
    • KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests · 16809ecd
      Tom Lendacky authored
      The run sequence is different for an SEV-ES guest compared to a legacy or
      even an SEV guest. The guest vCPU register state of an SEV-ES guest will
      be restored on VMRUN and saved on VMEXIT. There is no need to restore the
      guest registers directly and through VMLOAD before VMRUN and no need to
      save the guest registers directly and through VMSAVE on VMEXIT.
      
      Update the svm_vcpu_run() function to skip register state saving and
      restoring and provide an alternative function for running an SEV-ES guest
      in vmenter.S
      
      Additionally, certain host state is restored across an SEV-ES VMRUN. As
      a result certain register states are not required to be restored upon
      VMEXIT (e.g. FS, GS, etc.), so only do that if the guest is not an SEV-ES
      guest.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <fb1c66d32f2194e171b95fc1a8affd6d326e10c1.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      16809ecd
    • KVM: SVM: Provide support for SEV-ES vCPU loading · 86137773
      Tom Lendacky authored
      An SEV-ES vCPU requires additional VMCB vCPU load/put requirements. SEV-ES
      hardware will restore certain registers on VMEXIT, but not save them on
      VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the
      following changes:
      
      General vCPU load changes:
        - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
          save the current values of XCR0, XSS and PKRU to the per-CPU SVM save
          area as these registers will be restored on VMEXIT.
      
      General vCPU put changes:
        - Do not attempt to restore registers that SEV-ES hardware has already
          restored on VMEXIT.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      86137773
    • KVM: SVM: Provide support for SEV-ES vCPU creation/loading · 376c6d28
      Tom Lendacky authored
      An SEV-ES vCPU has additional VMCB initialization requirements for
      vCPU creation and additional vCPU load/put requirements. This includes:
      
      General VMCB initialization changes:
        - Set a VMCB control bit to enable SEV-ES support on the vCPU.
        - Set the VMCB encrypted VM save area address.
        - CRx registers are part of the encrypted register state and cannot be
          updated. Remove the CRx register read and write intercepts and replace
          them with CRx register write traps to track the CRx register values.
        - Certain MSR values are part of the encrypted register state and cannot
          be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.).
        - Remove the #GP intercept (no support for "enable_vmware_backdoor").
        - Remove the XSETBV intercept since the hypervisor cannot modify XCR0.
      
      General vCPU creation changes:
        - Set the initial GHCB gpa value as per the GHCB specification.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <3a8aef366416eddd5556dfa3fdc212aafa1ad0a2.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      376c6d28
    • KVM: SVM: Update ASID allocation to support SEV-ES guests · 80675b3a
      Tom Lendacky authored
      SEV and SEV-ES guests each have dedicated ASID ranges. Update the ASID
      allocation routine to return an ASID in the respective range.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <d7aed505e31e3954268b2015bb60a1486269c780.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      80675b3a
    • KVM: SVM: Set the encryption mask for the SVM host save area · 85ca8be9
      Tom Lendacky authored
      The SVM host save area is used to restore some host state on VMEXIT of an
      SEV-ES guest. After allocating the save area, clear it and add the
      encryption mask to the SVM host save area physical address that is
      programmed into the VM_HSAVE_PA MSR (see the sketch after this entry).
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <b77aa28af6d7f1a0cb545959e08d6dc75e0c3cba.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      85ca8be9
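      A hedged sketch of the idea (MSR_VM_HSAVE_PA, __sme_page_pa() and the
      per-CPU 'sd->save_area' page are existing symbols; context simplified):

        /* Clear the newly allocated host save area and program its physical
         * address, including the SME encryption mask, into VM_HSAVE_PA. */
        clear_page(page_address(sd->save_area));
        wrmsrl(MSR_VM_HSAVE_PA, __sme_page_pa(sd->save_area));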
    • KVM: SVM: Add NMI support for an SEV-ES guest · 4444dfe4
      Tom Lendacky authored
      The GHCB specification defines how NMIs are to be handled for an SEV-ES
      guest. To detect the completion of an NMI the hypervisor must not
      intercept the IRET instruction (because a #VC while running the NMI will
      issue an IRET) and, instead, must receive an NMI Complete exit event from
      the guest.
      
      Update the KVM support for detecting the completion of NMIs in the guest
      to follow the GHCB specification. When an SEV-ES guest is active, the
      IRET instruction will no longer be intercepted. Now, when the NMI Complete
      exit event is received, the iret_interception() function will be called
      to simulate the completion of the NMI.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <5ea3dd69b8d4396cefdc9048ebc1ab7caa70a847.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4444dfe4
    • KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest · ed02b213
      Tom Lendacky authored
      The guest FPU state is automatically restored on VMRUN and saved on VMEXIT
      by the hardware, so there is no reason to do this in KVM. Eliminate the
      allocation of the guest_fpu save area and key off that to skip operations
      related to the guest FPU state.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <173e429b4d0d962c6a443c4553ffdaf31b7665a4.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ed02b213
    • KVM: SVM: Do not report support for SMM for an SEV-ES guest · 5719455f
      Tom Lendacky authored
      SEV-ES guests do not currently support SMM. Update the has_emulated_msr()
      kvm_x86_ops function to take a struct kvm parameter so that the capability
      can be reported at a VM level.
      
      Since this op is also called during KVM initialization and before a struct
      kvm instance is available, comments will be added to each implementation
      of has_emulated_msr() to indicate that the kvm parameter can be null (see
      the sketch after this entry).
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <75de5138e33b945d2fb17f81ae507bda381808e3.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5719455f
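      A hedged sketch of the SVM implementation shape described above
      (illustrative, not the literal patch):

        static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
        {
                switch (index) {
                case MSR_IA32_SMBASE:
                        /* SEV-ES guests do not support SMM; note that 'kvm'
                         * can be NULL when called during KVM initialization. */
                        if (kvm && sev_es_guest(kvm))
                                return false;
                        break;
                default:
                        break;
                }

                return true;
        }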