1. 18 Jun, 2021 (20 commits)
  2. 07 May, 2021 (1 commit)
    • KVM: nVMX: Always make an attempt to map eVMCS after migration · f5c7e842
      Authored by Vitaly Kuznetsov
      When enlightened VMCS is in use and nested state is migrated with
      vmx_get_nested_state()/vmx_set_nested_state(), KVM can't map the eVMCS
      page right away: the eVMCS GPA is not part of
      'struct kvm_vmx_nested_state_hdr', and it can't be read from the VP
      assist page because userspace may decide to restore
      HV_X64_MSR_VP_ASSIST_PAGE after restoring nested state (and QEMU, for
      example, does exactly that). To make sure the eVMCS gets mapped,
      vmx_set_nested_state() raises a KVM_REQ_GET_NESTED_STATE_PAGES request.
      
      Commit f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
      on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to
      nested_vmx_vmexit() to make sure MSR permission bitmap is not switched
      when an immediate exit from L2 to L1 happens right after migration (caused
      by a pending event, for example). Unfortunately, in the exact same
      situation we still need to have eVMCS mapped so
      nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to eVMCS.
      
      As a band-aid, restore nested_get_evmcs_page() when clearing
      KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far
      from ideal as we can't easily propagate possible failures, and even if
      we could, it is most likely already too late to do so. The whole
      'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping eVMCS after migration
      seems to be fragile as we diverge too much from the 'native' path when
      vmptr loading happens on vmx_set_nested_state().
      
      Fixes: f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 26 Apr, 2021 (4 commits)
  4. 20 Apr, 2021 (2 commits)
    • KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC · 72add915
      Authored by Sean Christopherson
      Enable SGX virtualization now that KVM has the VM-Exit handlers needed
      to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU
      model exposed to the guest.  Add a KVM module param, "sgx", to allow an
      admin to disable SGX virtualization independent of the kernel.
      
      When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX
      LC to userspace via CPUID and wire up the ENCLS_EXITING bitmap based on
      the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an
      SGX-enabled guest.  With the exception of the provision key, all SGX
      attribute bits may be exposed to the guest.  Guest access to the
      provision key, which is controlled via securityfs, will be added in a
      future patch.
      
      Note, KVM does not yet support exposing ENCLS_C leafs or ENCLV leafs.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <a99e9c23310c79f2f4175c1af4c4cbcef913c3e5.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Add basic handling of VM-Exit from SGX enclave · 3c0c2ad1
      Authored by Sean Christopherson
      Add support for handling VM-Exits that originate from a guest SGX
      enclave.  In SGX, an "enclave" is a new CPL3-only execution environment,
      wherein the CPU and memory state is protected by hardware to make the
      state inaccessible to code running outside of the enclave.  When exiting
      an enclave due to an asynchronous event (from the perspective of the
      enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state
      is automatically saved and scrubbed (the CPU loads synthetic state), and
      then reloaded when re-entering the enclave.  E.g. after an instruction
      based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP
      of the enclave instruction that triggered the VM-Exit, but will instead point
      to a RIP in the enclave's untrusted runtime (the guest userspace code
      that coordinates entry/exit to/from the enclave).
      
      To help a VMM recognize and handle exits from enclaves, SGX adds bits to
      existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and
      GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR.  Define the
      new architectural bits, and add a boolean to struct vcpu_vmx to cache
      VMX_EXIT_REASON_FROM_ENCLAVE.  Clear the bit in exit_reason so that
      checks against exit_reason do not need to account for SGX, e.g.
      "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work.
      
      KVM is largely a passive observer of the new bits, e.g. KVM needs to
      account for the bits when propagating information to a nested VMM, but
      otherwise doesn't need to act differently for the majority of VM-Exits
      from enclaves.
      
      The one scenario that is directly impacted is emulation, which is for
      all intents and purposes impossible[1] since KVM does not have access to
      the RIP or instruction stream that triggered the VM-Exit.  The inability
      to emulate is a non-issue for KVM, as most instructions that might
      trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit
      check).  For the few instructions that conditionally #UD, KVM either never
      sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only
      if the feature is not exposed to the guest in order to inject a #UD,
      e.g. RDRAND_EXITING.
      
      But, because it is still possible for a guest to trigger emulation,
      e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
      from an enclave.  This is architecturally accurate for instruction
      VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
      to killing the VM.  In practice, only broken or particularly stupid
      guests should ever encounter this behavior.
      
      Add a WARN in skip_emulated_instruction to detect any attempt to
      modify the guest's RIP during an SGX enclave VM-Exit as all such flows
      should either be unreachable or must handle exits from enclaves before
      getting to skip_emulated_instruction.
      
      [1] Impossible for all practical purposes.  Not truly impossible
          since KVM could implement some form of para-virtualization scheme.
      
      [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
          CPL3, so we also don't need to worry about that interaction.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 17 Apr, 2021 (1 commit)
  6. 22 Mar, 2021 (1 commit)
    • x86: Fix various typos in comments, take #2 · 163b0991
      Authored by Ingo Molnar
      Fix another ~42 single-word typos in arch/x86/ code comments that were
      missed in the first pass, in particular in .S files.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
  7. 15 Mar, 2021 (4 commits)
    • KVM: x86: Handle triple fault in L2 without killing L1 · cb6a32c2
      Authored by Sean Christopherson
      Synthesize a nested VM-Exit if L2 triggers an emulated triple fault
      instead of exiting to userspace, which likely will kill L1.  Any flow
      that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario
      for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that
      is _not_ intercepted by L1.  E.g. if KVM is intercepting #GPs for the
      VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF
      will cause KVM to emulate triple fault.
      
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210302174515.2812275-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move nVMX's consistency check macro to common code · 648fc8ae
      Authored by Sean Christopherson
      Move KVM's CC() macro to x86.h so that it can be reused by nSVM.
      Debugging VM-Enter is as painful on SVM as it is on VMX.
      
      Rename the more visible macro to KVM_NESTED_VMENTER_CONSISTENCY_CHECK
      to avoid any collisions with the uber-concise "CC".
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210204000117.3303214-12-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch · c805f5d5
      Authored by Sean Christopherson
      Defer reloading the MMU until after a successful EPTP switch.  The VMFUNC
      instruction itself is executed in the previous EPTP context; any side
      effects, e.g. updating RIP, should occur in the old context.  Practically
      speaking, this bug is benign as VMX doesn't touch the MMU when skipping
      an emulated instruction, nor does queuing a single-step #DB.  No other
      post-switch side effects exist.
      
      Fixes: 41ab9372 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210305011101.3597423-14-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: to track if L1 is running L2 VM · 43c11d91
      Authored by Dongli Zhang
      A new per-vCPU stat, 'nested_run', is introduced in order to track
      whether an L1 VM is running, or has been used to run, an L2 VM.
      
      An example usage of 'nested_run' is to help the host administrator
      easily track whether any L1 VM has been used to run an L2 VM. If an
      issue arises that may be related to nested virtualization, the
      administrator will be able to easily narrow down and confirm whether it
      is due to nested virtualization via 'nested_run', e.g. whether a fix
      like commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after
      guest reset") is required.
      
      Cc: Joe Jin <joe.jin@oracle.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 19 Feb, 2021 (2 commits)
    • KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging · a85863c2
      Authored by Makarand Sonare
      Currently, if enable_pml=1, PML remains enabled for the entire lifetime
      of the VM irrespective of whether dirty logging is enabled or disabled.
      When dirty logging is disabled, all the pages of the VM are manually
      marked dirty, so that PML is effectively non-operational.  Setting
      the dirty bits is an expensive operation which can cause severe MMU
      lock contention in a performance-sensitive path when dirty logging is
      disabled after a failed or canceled live migration.
      
      Manually setting dirty bits also fails to prevent PML activity if some
      code path clears dirty bits, which can incur unnecessary VM-Exits.
      
      In order to avoid this extra overhead, dynamically enable/disable PML
      when dirty logging gets turned on/off for the first/last memslot.
      Signed-off-by: Makarand Sonare <makarandsonare@google.com>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-12-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Disable PML in hardware when running L2 · c3bb9a20
      Authored by Sean Christopherson
      Unconditionally disable PML in vmcs02; KVM emulates PML purely in the
      MMU, e.g. vmx_flush_pml_buffer() doesn't even try to copy the L2 GPAs
      from vmcs02's buffer to vmcs12.  At best, enabling PML is a nop.  At
      worst, it will cause vmx_flush_pml_buffer() to record bogus GFNs in the
      dirty logs.
      
      Initialize vmcs02.GUEST_PML_INDEX such that PML writes would trigger
      VM-Exit if PML was somehow enabled, skip flushing the buffer for guest
      mode since the index is bogus, and freak out if a PML full exit occurs
      when L2 is active.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 18 Feb, 2021 (1 commit)
  10. 04 Feb, 2021 (4 commits)