1. 20 April 2021, 17 commits
    • KVM: VMX: Add basic handling of VM-Exit from SGX enclave · 3c0c2ad1
      Authored by Sean Christopherson
      Add support for handling VM-Exits that originate from a guest SGX
      enclave.  In SGX, an "enclave" is a new CPL3-only execution environment,
      wherein the CPU and memory state is protected by hardware to make the
      state inaccessible to code running outside of the enclave.  When exiting
      an enclave due to an asynchronous event (from the perspective of the
      enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state
      is automatically saved and scrubbed (the CPU loads synthetic state), and
      then reloaded when re-entering the enclave.  E.g. after an instruction
      based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP
      of the enclave instruction that triggered VM-Exit, but will instead point
      to a RIP in the enclave's untrusted runtime (the guest userspace code
      that coordinates entry/exit to/from the enclave).
      
      To help a VMM recognize and handle exits from enclaves, SGX adds bits to
      existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and
      GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR.  Define the
      new architectural bits, and add a boolean to struct vcpu_vmx to cache
      VMX_EXIT_REASON_FROM_ENCLAVE.  Clear the bit in exit_reason so that
      checks against exit_reason do not need to account for SGX, e.g.
      "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work.
      
      KVM is largely a passive observer of the new bits, e.g. KVM needs to
      account for the bits when propagating information to a nested VMM, but
      otherwise doesn't need to act differently for the majority of VM-Exits
      from enclaves.
      
      The one scenario that is directly impacted is emulation, which is for
      all intents and purposes impossible[1] since KVM does not have access to
      the RIP or instruction stream that triggered the VM-Exit.  The inability
      to emulate is a non-issue for KVM, as most instructions that might
      trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit
      check).  For the few instructions that conditionally #UD, KVM either never
      sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only
      if the feature is not exposed to the guest in order to inject a #UD,
      e.g. RDRAND_EXITING.
      
      But, because it is still possible for a guest to trigger emulation,
      e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
      from an enclave.  This is architecturally accurate for instruction
      VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
      to killing the VM.  In practice, only broken or particularly stupid
      guests should ever encounter this behavior.
      
      Add a WARN in skip_emulated_instruction to detect any attempt to
      modify the guest's RIP during an SGX enclave VM-Exit as all such flows
      should either be unreachable or must handle exits from enclaves before
      getting to skip_emulated_instruction.
      
      [1] Impossible for all practical purposes.  Not truly impossible
          since KVM could implement some form of para-virtualization scheme.
      
      [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
          CPL3, so we also don't need to worry about that interaction.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3c0c2ad1
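      A minimal sketch of the mechanism described above, assuming illustrative
      macro names and an illustrative vcpu_vmx field (the bit positions follow
      the SDM's exit-reason and interruptibility-info layouts, not necessarily
      the exact kernel definitions):

          #include <linux/types.h>

          #define VMX_EXIT_REASON_FROM_ENCLAVE   (1u << 27) /* exit-reason modifier bit */
          #define GUEST_INTR_STATE_ENCLAVE_INTR  (1u << 4)  /* interruptibility-info bit */

          struct vcpu_vmx_sketch {
                  u32  exit_reason;
                  bool sgx_enclave_exit;  /* cached enclave-exit modifier */
          };

          /* Cache the modifier, then strip it so existing checks such as
           * "exit_reason == EXIT_REASON_EXCEPTION_NMI" keep working unchanged. */
          static void vmx_record_exit_reason(struct vcpu_vmx_sketch *vmx, u32 raw)
          {
                  vmx->sgx_enclave_exit = raw & VMX_EXIT_REASON_FROM_ENCLAVE;
                  vmx->exit_reason      = raw & ~VMX_EXIT_REASON_FROM_ENCLAVE;
          }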
    • KVM: x86: Add reverse-CPUID lookup support for scattered SGX features · 01de8682
      Authored by Sean Christopherson
      Define a new KVM-only feature word for advertising and querying SGX
      sub-features in CPUID.0x12.0x0.EAX.  Because SGX1 and SGX2 live in the
      kernel's scattered-features word, they need to be translated so that
      their bit numbers match those defined by hardware.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <e797c533f4c71ae89265bbb15a02aef86b67cbec.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      01de8682
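      A minimal sketch of the KVM-only word idea, assuming hypothetical
      enum and macro names (hardware architecturally places SGX1 in bit 0 and
      SGX2 in bit 1 of CPUID.0x12.0x0.EAX):

          /* Hypothetical KVM-only feature word appended after the kernel's
           * shared capability words; its bit layout matches hardware. */
          enum kvm_only_cpuid_words_sketch {
                  CPUID_12_EAX_SKETCH = 0,
          };

          #define KVM_X86_FEATURE_SKETCH(w, bit)  ((w) * 32 + (bit))
          #define KVM_X86_FEATURE_SGX1  KVM_X86_FEATURE_SKETCH(CPUID_12_EAX_SKETCH, 0)
          #define KVM_X86_FEATURE_SGX2  KVM_X86_FEATURE_SKETCH(CPUID_12_EAX_SKETCH, 1)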
    • KVM: x86: Add support for reverse CPUID lookup of scattered features · 4e66c0cb
      Authored by Sean Christopherson
      Introduce a scheme that allows KVM's CPUID magic to support features
      that are scattered in the kernel's feature words.  To advertise and/or
      query guest support for CPUID-based features, KVM requires the bit
      number of an X86_FEATURE_* to match the bit number in its associated
      CPUID entry.  For scattered features, this does not hold true.
      
      Add a framework to allow defining KVM-only words, stored in
      kvm_cpu_caps after the shared kernel caps, that can be used to gather
      the scattered feature bits by translating X86_FEATURE_* flags into their
      KVM-defined feature.
      
      Note, because reverse_cpuid_check() effectively forces kvm_cpu_caps
      lookups to be resolved at compile time, there is no runtime cost for
      translating from kernel-defined to kvm-defined features.
      
      More details here:  https://lkml.kernel.org/r/X/jxCOLG+HUO4QlZ@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <16cad8d00475f67867fb36701fc7fb7c1ec86ce1.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4e66c0cb
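      A minimal sketch of the translation step, reusing the hypothetical
      KVM_X86_FEATURE_SGX1/SGX2 values from the previous sketch; with constant
      inputs the compiler folds the comparisons away, so the translation adds
      no runtime cost:

          static __always_inline u32 feature_translate_sketch(u32 x86_feature)
          {
                  if (x86_feature == X86_FEATURE_SGX1)
                          return KVM_X86_FEATURE_SGX1;
                  if (x86_feature == X86_FEATURE_SGX2)
                          return KVM_X86_FEATURE_SGX2;

                  /* Non-scattered features already match hardware bit numbers. */
                  return x86_feature;
          }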
    • KVM: x86: Define new #PF SGX error code bit · 00e7646c
      Authored by Sean Christopherson
      Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM),
      as opposed to the traditional IA32/EPT page tables, set an SGX bit in
      the error code to indicate that the #PF was induced by SGX.  KVM will
      need to emulate this behavior as part of its trap-and-execute scheme for
      virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if
      EINIT faults in the host, and to support live migration.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <e170c5175cb9f35f53218a7512c9e3db972b97a2.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      00e7646c
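      A short sketch of the new error-code bit, assuming PFERR_* macro names
      that mirror KVM's existing page-fault error-code convention (the SDM
      defines the SGX bit as bit 15 of the #PF error code):

          #include <linux/bits.h>

          #define PFERR_SGX_BIT   15
          #define PFERR_SGX_MASK  BIT(PFERR_SGX_BIT)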
    • KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) · 54f958cd
      Authored by Sean Christopherson
      Export the gva_to_gpa() helpers for use by SGX virtualization when
      executing ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest.
      To execute ECREATE and EINIT, KVM must obtain the GPA of the target
      Secure Enclave Control Structure (SECS) in order to get its
      corresponding HVA.
      
      Because the SECS must reside in the Enclave Page Cache (EPC), copying
      the SECS's data to a host-controlled buffer via existing exported
      helpers is not a viable option as the EPC is not readable or writable
      by the kernel.
      
      SGX virtualization will also use gva_to_gpa() to obtain HVAs for
      non-EPC pages in order to pass user pointers directly to ECREATE and
      EINIT, which avoids having to copy pages worth of data into the kernel.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <02f37708321bcdfaa2f9d41c8478affa6e84b04d.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      54f958cd
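      A minimal usage sketch, assuming a hypothetical helper name and eliding
      fault injection; it resolves the guest virtual address of the SECS to a
      GPA and then to a host virtual address usable by ENCLS emulation:

          static unsigned long sgx_gva_to_hva_sketch(struct kvm_vcpu *vcpu, gva_t gva)
          {
                  struct x86_exception ex;
                  gpa_t gpa;

                  gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, &ex);
                  if (gpa == UNMAPPED_GVA)
                          return 0;  /* caller would inject the fault held in @ex */

                  return kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa));
          }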
    • KVM: vmx: add mismatched size assertions in vmcs_check32() · 870c575a
      Authored by Haiwei Li
      Add compile-time assertions in vmcs_check32() to disallow accesses to
      64-bit and 64-bit high fields via vmcs_{read,write}32().  Upper level KVM
      code should never do partial accesses to VMCS fields.  KVM handles the
      split accesses automatically in vmcs_{read,write}64() when running as a
      32-bit kernel.
      Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
      Message-Id: <20210409022456.23528-1-lihaiwei.kernel@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      870c575a
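      A sketch of the added assertions, assuming the standard VMCS field
      encoding (bits 14:13 give the field width, bit 0 selects the "high"
      half of a 64-bit field); the exact masks in the kernel may differ:

          static __always_inline void vmcs_check32_sketch(unsigned long field)
          {
                  BUILD_BUG_ON_MSG(__builtin_constant_p(field) &&
                                   ((field) & 0x6001) == 0x2000,
                                   "32-bit accessor invalid for 64-bit field");
                  BUILD_BUG_ON_MSG(__builtin_constant_p(field) &&
                                   ((field) & 0x6001) == 0x2001,
                                   "32-bit accessor invalid for 64-bit high field");
          }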
    • KVM: x86: Remove unused function declaration · d90b15ed
      Authored by Keqian Zhu
      kvm_mmu_slot_largepage_remove_write_access() is declared but not used,
      so remove it.
      Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
      Message-Id: <20210406063504.17552-1-zhukeqian1@huawei.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d90b15ed
    • KVM: SVM: Enhance and clean up the vmcb tracking comment in pre_svm_run() · 44f1b558
      Authored by Sean Christopherson
      Explicitly document why a vmcb must be marked dirty and assigned a new
      asid when it will be run on a different cpu.  The "what" is relatively
      obvious, whereas the "why" requires reading the APM and/or KVM code.
      
      Opportunistically remove a spurious period and several unnecessary
      newlines in the comment.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      44f1b558
    • KVM: SVM: Add a comment to clarify what vcpu_svm.vmcb points at · 554cf314
      Authored by Sean Christopherson
      Add a comment above the declaration of vcpu_svm.vmcb to call out that it
      is simply a shorthand for current_vmcb->ptr.  The myriad accesses to
      svm->vmcb are quite confusing without this crucial detail.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      554cf314
    • KVM: SVM: Drop vcpu_svm.vmcb_pa · d1788191
      Authored by Sean Christopherson
      Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in
      the one path where it is consumed.  Unlike svm->vmcb, use of the current
      vmcb's address is very limited, as evidenced by the fact that its use
      can be trimmed to a single dereference.
      
      Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, as
      at first glance using vmcb01 instead of vmcb_pa looks wrong.
      
      No functional change intended.
      
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d1788191
    • KVM: SVM: Don't set current_vmcb->cpu when switching vmcb · 17e5e964
      Authored by Sean Christopherson
      Do not update the new vmcb's last-run cpu when switching to a different
      vmcb.  If the vCPU is migrated between its last run and a vmcb switch,
      e.g. for nested VM-Exit, then setting the cpu without marking the vmcb
      dirty will lead to KVM running the vCPU on a different physical cpu with
      stale clean bit settings.
      
                                vcpu->cpu    current_vmcb->cpu    hardware
        pre_svm_run()           cpu0         cpu0                 cpu0,clean
        kvm_arch_vcpu_load()    cpu1         cpu0                 cpu0,clean
        svm_switch_vmcb()       cpu1         cpu1                 cpu0,clean
        pre_svm_run()           cpu1         cpu1                 kaboom
      
      Simply delete the offending code; unlike VMX, which needs to update the
      cpu at switch time due to the need to do VMPTRLD, SVM only cares about
      which cpu last ran the vCPU.
      
      Fixes: af18fa77 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
      Cc: Cathy Avery <cavery@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      17e5e964
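      A sketch of the fixed switch path (struct and field names follow the
      descriptions in this and the preceding SVM entries, and are
      illustrative); the last-run cpu is intentionally left alone so that
      pre_svm_run() still sees the mismatch and marks the vmcb dirty with a
      fresh ASID:

          static void svm_switch_vmcb_sketch(struct vcpu_svm *svm,
                                             struct kvm_vmcb_info *target_vmcb)
          {
                  svm->current_vmcb = target_vmcb;
                  svm->vmcb = target_vmcb->ptr;
                  /* Do NOT touch target_vmcb->cpu here; only an actual run
                   * of this vmcb updates its last-run cpu. */
          }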
    • KVM: SVM: Make sure GHCB is mapped before updating · a3ba26ec
      Authored by Tom Lendacky
      Access to the GHCB is mainly in the VMGEXIT path and it is known that the
      GHCB will be mapped. But there are two paths where it is possible the GHCB
      might not be mapped.
      
      The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
      the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
      However, if a SIPI is performed without a corresponding AP Reset Hold,
      then the GHCB might not be mapped (depending on the previous VMEXIT),
      which will result in a NULL pointer dereference.
      
      The svm_complete_emulated_msr() routine will update the GHCB to inform
      the caller of a RDMSR/WRMSR operation about any errors. While it is likely
      that the GHCB will be mapped in this situation, add a safeguard
      in this path to be certain a NULL pointer dereference is not encountered.
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Fixes: 647daca2 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a3ba26ec
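      A sketch of the guard in the SIPI-delivery path, assuming the svm->ghcb
      pointer and the generated GHCB setter used by KVM's SEV-ES code of this
      era:

          static void sev_deliver_sipi_sketch(struct vcpu_svm *svm)
          {
                  /* No preceding AP Reset Hold VMGEXIT mapped the GHCB, so
                   * there is nothing to report; avoid the NULL dereference. */
                  if (!svm->ghcb)
                          return;

                  /* Tell the AP Reset Hold caller that a SIPI was delivered. */
                  ghcb_set_sw_exit_info_2(svm->ghcb, 1);
          }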
    • KVM: X86: Do not yield to self · a1fa4cbd
      Authored by Wanpeng Li
      If the yield target is the current vCPU itself, there is no need to
      yield; skipping this case also keeps a malicious guest from abusing
      the directed-yield hypercall.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1617941911-5338-3-git-send-email-wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a1fa4cbd
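      A sketch of the check, assuming a simplified function shape around
      kvm_vcpu_yield_to():

          static void kvm_sched_yield_sketch(struct kvm_vcpu *self,
                                             struct kvm_vcpu *target)
          {
                  /* Yielding to ourselves accomplishes nothing and is
                   * abusable by the guest; ignore the request. */
                  if (self == target)
                          return;

                  kvm_vcpu_yield_to(target);
          }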
    • KVM: X86: Count attempted/successful directed yield · 4a7132ef
      Authored by Wanpeng Li
      To analyze performance issues involving lock contention and scheduling,
      it is useful to know when directed yields succeed or fail.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1617941911-5338-2-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4a7132ef
    • x86/kvm: Don't bother __pv_cpu_mask when !CONFIG_SMP · 2b519b57
      Authored by Wanpeng Li
      Enabling PV TLB shootdown when !CONFIG_SMP doesn't make sense, so move
      that code inside CONFIG_SMP.  In addition, this avoids defining and
      allocating __pv_cpu_mask when !CONFIG_SMP and gets rid of the 'alloc'
      variable in kvm_alloc_cpumask.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1617941911-5338-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2b519b57
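      A sketch of the intended structure, assuming an illustrative flush-hook
      name; both the per-CPU mask and the PV shootdown path now exist only in
      SMP builds:

          #ifdef CONFIG_SMP
          static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);

          static void kvm_pv_flush_tlb_others_sketch(const struct cpumask *mask)
          {
                  /* PV TLB-shootdown hypercall logic lives here; it is
                   * meaningless on a uniprocessor build. */
          }
          #endif /* CONFIG_SMP */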
    • KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns · 4c6654bd
      Authored by Ben Gardon
      To avoid saddling a vCPU thread with the work of tearing down an entire
      paging structure, take a reference on each root before they become
      obsolete, so that the thread initiating the fast invalidation can tear
      down the paging structure and (most likely) release the last reference.
      As a bonus, this teardown can happen under the MMU lock in read mode so
      as not to block the progress of vCPU threads.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-14-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4c6654bd
    • KVM: x86/mmu: Fast invalidation for TDP MMU · b7cccd39
      Authored by Ben Gardon
      Provide a real mechanism for fast invalidation by marking roots as
      invalid so that their reference count will quickly fall to zero
      and they will be torn down.
      
      One negative side effect of this approach is that a vCPU thread will
      likely drop the last reference to a root and be saddled with the work of
      tearing down an entire paging structure. This issue will be resolved in
      a later commit.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-13-bgardon@google.com>
      [Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b7cccd39
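      A sketch of the invalidation idea (struct and field names are
      illustrative, not the exact TDP MMU API): mark every root invalid so
      that no new references are taken, letting the reference count fall to
      zero and the root be torn down:

          #include <linux/list.h>
          #include <linux/types.h>

          struct tdp_root_sketch {
                  struct list_head link;
                  bool invalid;   /* once set, lookups refuse new references */
          };

          static void tdp_mmu_invalidate_all_roots_sketch(struct list_head *roots)
          {
                  struct tdp_root_sketch *root;

                  list_for_each_entry(root, roots, link)
                          root->invalid = true;
          }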
  2. 19 April 2021, 12 commits
  3. 17 April 2021, 11 commits