1. 15 Jul 2021, 3 commits
  2. 25 Jun 2021, 4 commits
  3. 18 Jun 2021, 7 commits
  4. 07 May 2021, 2 commits
  5. 03 May 2021, 3 commits
  6. 22 Apr 2021, 1 commit
  7. 17 Apr 2021, 4 commits
    •
      KVM: x86: pending exceptions must not be blocked by an injected event · 4020da3b
      Maxim Levitsky authored
      An injected interrupt/NMI should not block a pending exception;
      instead, the exception should either be lost if the nested hypervisor
      doesn't intercept it (as on stock x86), or be delivered in the
      exitintinfo/IDT_VECTORING_INFO field as part of the VMexit
      that corresponds to the pending exception.
      
      The only reason for an exception to be blocked is a pending nested
      run (which can't really happen currently, but is still worth
      checking for).
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4020da3b
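The priority rule described in this commit can be sketched as a small model. This is illustrative only, not KVM's actual code; all struct and function names below are invented:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the rule: an interrupt/NMI that was already injected must
 * not block a *pending* exception. The only thing that defers the
 * exception is a pending nested VMRUN.
 */
struct vcpu_events {
    bool exception_pending;   /* exception waiting to be delivered */
    bool event_injected;      /* interrupt/NMI already injected    */
    bool nested_run_pending;  /* VMRUN emulation not yet complete  */
};

/* Returns true when the pending exception may be delivered now. */
bool exception_deliverable(const struct vcpu_events *ev)
{
    if (!ev->exception_pending)
        return false;
    /* Note: ev->event_injected is deliberately NOT consulted. */
    return !ev->nested_run_pending;
}
```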
    •
      KVM: nSVM: call nested_svm_load_cr3 on nested state load · 232f75d3
      Maxim Levitsky authored
      While KVM's MMU should be fully reset by loading nested CR0/CR3/CR4
      via KVM_SET_SREGS, we are not yet in nested mode when that happens,
      so only the root_mmu is reset.
      
      On regular nested entries we call nested_svm_load_cr3, which updates
      the guest's CR3 in the MMU when needed and also reinitializes the
      MMU, which in turn initializes the walk_mmu when nested paging is
      enabled in both host and guest.
      
      Since we don't call nested_svm_load_cr3 on nested state load, the
      walk_mmu can be left uninitialized. This can lead to a NULL pointer
      dereference if we happen to get a nested page fault right after
      entering the nested guest for the first time after migration and
      decide to emulate it: the emulator then tries to access
      walk_mmu->gva_to_gpa, which is NULL.
      
      Therefore we should call this function on nested state load as well.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      232f75d3
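The bug class here can be illustrated in miniature: a function-pointer table that is only populated on the regular nested-entry path stays NULL on the state-load path. All names below are invented analogues, not KVM code:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long (*gva_to_gpa_t)(unsigned long gva);

struct mmu {
    gva_to_gpa_t gva_to_gpa;
};

unsigned long identity_gva_to_gpa(unsigned long gva)
{
    return gva;   /* placeholder translation */
}

/*
 * Analogue of nested_svm_load_cr3: must run on VMRUN *and* on nested
 * state load, so gva_to_gpa is never NULL when a nested page fault is
 * emulated right after migration.
 */
void init_walk_mmu(struct mmu *walk_mmu)
{
    walk_mmu->gva_to_gpa = identity_gva_to_gpa;
}
```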
    •
      KVM: x86: Account a variety of miscellaneous allocations · eba04b20
      Sean Christopherson authored
      Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
      clearly associated with a single task/VM.
      
      Note, there are several SEV allocations that aren't accounted, but
      those can (hopefully) be fixed by using the local stack for memory.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210331023025.2485960-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eba04b20
    •
      KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit() · 9a7de6ec
      Krish Sadhukhan authored
      According to APM, the #DB intercept for a single-stepped VMRUN must happen
      after the completion of that instruction, when the guest does #VMEXIT to
      the host. However, in the current implementation of KVM, the #DB intercept
      for a single-stepped VMRUN happens after the completion of the instruction
      that follows the VMRUN instruction. When the #DB intercept handler is
      invoked, it shows the RIP of the instruction that follows VMRUN,
      instead of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN
      is concerned.
      
      This patch fixes the problem by checking, in nested_svm_vmexit(), for the
      condition that the VMRUN instruction is being single-stepped and if so,
      queues the pending #DB intercept so that the #DB is accounted for before
      we execute L1's next instruction.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9a7de6ec
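The fix can be modeled as latching the single-step state at VMRUN time and consuming it in the nested-vmexit path. This is a hypothetical sketch; the names are ours, not KVM's:

```c
#include <assert.h>
#include <stdbool.h>

#define X86_EFLAGS_TF (1u << 8)   /* trap flag: single-step */

struct nested_ctx {
    bool vmrun_singlestepped;
    bool db_queued;
};

/* At VMRUN: remember whether the instruction itself was single-stepped. */
void model_vmrun(struct nested_ctx *ctx, unsigned int rflags)
{
    ctx->vmrun_singlestepped = (rflags & X86_EFLAGS_TF) != 0;
    ctx->db_queued = false;
}

/* At nested vmexit: queue the #DB so it is reported with VMRUN's RIP,
 * not after L1's next instruction. */
void model_nested_vmexit(struct nested_ctx *ctx)
{
    if (ctx->vmrun_singlestepped)
        ctx->db_queued = true;
}
```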
  8. 01 Apr 2021, 2 commits
    •
      KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit · 3c346c0c
      Paolo Bonzini authored
      Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit
      harder in released kernels, so do the bare minimum and prevent
      EFER.SVME from being cleared. Clearing it is problematic because
      svm_set_efer frees the data structures for nested virtualization
      when EFER.SVME is cleared.
      
      Also check that EFER.SVME remains set after a nested vmexit;
      clearing it could happen if the bit is zero in the save area
      that is passed to KVM_SET_NESTED_STATE (the save area of the
      nested state corresponds to the nested hypervisor's state
      and is restored on the next nested vmexit).
      
      Cc: stable@vger.kernel.org
      Fixes: 2fcf4876 ("KVM: nSVM: implement on demand allocation of the nested state")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3c346c0c
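The consistency check being added reduces to a single-bit test. A minimal sketch (the function name is ours; EFER.SVME is bit 12 per the AMD APM):

```c
#include <assert.h>
#include <stdint.h>

#define EFER_SVME (1ull << 12)

/*
 * EFER.SVME must remain set both when entering the nested guest and
 * in the save area restored on nested vmexit; a state blob with the
 * bit clear is rejected.
 */
int check_nested_efer(uint64_t efer)
{
    return (efer & EFER_SVME) ? 0 : -1;   /* -1 models -EINVAL */
}
```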
    •
      KVM: SVM: load control fields from VMCB12 before checking them · a58d9166
      Paolo Bonzini authored
      Avoid races between the check and the use of the nested VMCB controls.
      This ensures, for example, that the VMRUN intercept is always reflected
      to the nested hypervisor, instead of being processed by the host.  Without this
      patch, it is possible to end up with svm->nested.hsave pointing to
      the MSR permission bitmap for nested guests.
      
      This bug is CVE-2021-29657.
      Reported-by: Felix Wilhelm <fwilhelm@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 2fcf4876 ("KVM: nSVM: implement on demand allocation of the nested state")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a58d9166
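The copy-before-check pattern this commit applies can be illustrated in isolation. Struct and field names below are invented; the point is the ordering, not the layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct vmcb_ctrl {
    uint32_t intercept_vmrun;   /* must be set for a valid nested VMRUN */
    uint64_t msrpm_base_pa;
};

/*
 * Snapshot the guest-writable control area into private host memory
 * first, then validate and consume only the snapshot, so the guest
 * cannot flip a field between the check and the use.
 */
int enter_nested(const struct vmcb_ctrl *guest_ctrl, struct vmcb_ctrl *cached)
{
    /* 1. Copy while the guest may still be writing the original. */
    memcpy(cached, guest_ctrl, sizeof(*cached));
    /* 2. Check the private copy only. */
    if (!cached->intercept_vmrun)
        return -1;
    /* 3. From here on, use 'cached', never 'guest_ctrl'. */
    return 0;
}
```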
  9. 15 Mar 2021, 14 commits
    •
      KVM: nSVM: Optimize vmcb12 to vmcb02 save area copies · 8173396e
      Cathy Avery authored
      Use the vmcb12 control clean field to determine which vmcb12.save
      registers were marked dirty, in order to minimize register copies
      when switching from L1 to L2. The vmcb12 registers marked as dirty need
      to be copied to L0's vmcb02, as they will be used to update the vmcb
      state cache for the L2 VMRUN.  In the case where we have a different
      vmcb12 from the last L2 VMRUN, all vmcb12.save registers must be
      copied over to L2.save.
      
      Tested:
      kvm-unit-tests
      kvm selftests
      Fedora L1 L2
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20210301200844.2000-1-cavery@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8173396e
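The selection logic can be modeled as a bitmask computation. The bit layout below is invented for illustration; a set clean bit means L1 has not touched that field group since the last L2 VMRUN:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLEAN_SEG (1u << 0)   /* segment registers */
#define CLEAN_DT  (1u << 1)   /* descriptor tables */
#define CLEAN_ALL (CLEAN_SEG | CLEAN_DT)

/*
 * Copy only the dirty field groups, unless a different vmcb12 is seen
 * than on the last L2 VMRUN, in which case everything must be copied.
 */
uint32_t groups_to_copy(uint32_t clean, bool same_vmcb12)
{
    if (!same_vmcb12)
        return CLEAN_ALL;          /* new vmcb12: copy everything */
    return CLEAN_ALL & ~clean;     /* otherwise: only dirty groups */
}
```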
    •
      KVM: SVM: Add support for Virtual SPEC_CTRL · d00b99c5
      Babu Moger authored
      Newer AMD processors have a feature to virtualize the use of the
      SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
      function 0x8000000A_EDX[20]: GuestSpecCtrl. Hypervisors are not
      required to enable this feature since it is automatically enabled on
      processors that support it.
      
      A hypervisor may wish to impose speculation controls on guest
      execution or a guest may want to impose its own speculation controls.
      Therefore, the processor implements both host and guest
      versions of SPEC_CTRL.
      
      When in host mode, the host SPEC_CTRL value is in effect and writes
      update only the host version of SPEC_CTRL. On a VMRUN, the processor
      loads the guest version of SPEC_CTRL from the VMCB. When the guest
      writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
      the guest version is saved into the VMCB and the processor returns
      to only using the host SPEC_CTRL for speculation control. The guest
      SPEC_CTRL is located at offset 0x2E0 in the VMCB.
      
      The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
      with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
      ensure a minimum SPEC_CTRL if desired.
      
      This support also fixes an issue where a guest may sometimes see an
      inconsistent value for the SPEC_CTRL MSR on processors that support
      this feature. With the current SPEC_CTRL support, the first write to
      SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
      MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
      will be 0x0, instead of the actual expected value. There isn't a
      security concern here, because the host SPEC_CTRL value is OR'ed with
      the guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
      KVM writes the guest's virtualized SPEC_CTRL value to the SPEC_CTRL
      MSR just before VMRUN, so the hardware will always have the actual
      value even though it doesn't appear that way in the guest. The guest
      would only see the proper value for the SPEC_CTRL register if it
      wrote to the SPEC_CTRL register again. With Virtual SPEC_CTRL
      support, the save area spec_ctrl is properly saved and restored,
      so the guest will always see the proper value when it is read back.
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <161188100955.28787.11816849358413330720.stgit@bmoger-ubuntu>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d00b99c5
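The effective-value rule described above is a one-liner. The function name is ours; the bit positions follow the architectural SPEC_CTRL MSR layout:

```c
#include <assert.h>
#include <stdint.h>

#define SPEC_CTRL_IBRS  (1ull << 0)
#define SPEC_CTRL_STIBP (1ull << 1)
#define SPEC_CTRL_SSBD  (1ull << 2)

/*
 * Effective SPEC_CTRL = guest value OR'ed with the hypervisor value,
 * so the hypervisor can enforce a minimum speculation control.
 */
uint64_t effective_spec_ctrl(uint64_t host, uint64_t guest)
{
    return host | guest;
}
```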
    •
      KVM: nSVM: always use vmcb01 for vmsave/vmload of guest state · cc3ed80a
      Maxim Levitsky authored
      This avoids copying these fields between vmcb01 and vmcb02
      on nested guest entry/exit.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cc3ed80a
    •
      KVM: nSVM: Add helper to synthesize nested VM-Exit without collateral · 3a87c7e0
      Sean Christopherson authored
      Add a helper to consolidate boilerplate for nested VM-Exits that don't
      provide any data in exit_info_*.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210302174515.2812275-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3a87c7e0
    •
      KVM: x86: Handle triple fault in L2 without killing L1 · cb6a32c2
      Sean Christopherson authored
      Synthesize a nested VM-Exit if L2 triggers an emulated triple fault
      instead of exiting to userspace, which likely will kill L1.  Any flow
      that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario
      for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that
      is _not_ intercepted by L1.  E.g. if KVM is intercepting #GPs for the
      VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF
      will cause KVM to emulate triple fault.
      
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210302174515.2812275-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cb6a32c2
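The behavioral change can be sketched as a two-way decision. The enum and function names are ours, and the model is deliberately simplified (it ignores L1's intercept configuration):

```c
#include <assert.h>
#include <stdbool.h>

enum triple_fault_fate {
    EXIT_TO_USERSPACE,         /* old behavior: likely kills L1 */
    SYNTHESIZE_NESTED_VMEXIT   /* new behavior: L1 handles it   */
};

/*
 * An emulated triple fault raised while running L2 is reflected to L1
 * as a synthesized vmexit instead of exiting to userspace.
 */
enum triple_fault_fate handle_triple_fault(bool is_guest_mode)
{
    return is_guest_mode ? SYNTHESIZE_NESTED_VMEXIT : EXIT_TO_USERSPACE;
}
```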
    •
      KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places) · 63129754
      Paolo Bonzini authored
      Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to
      allow directly invoking common x86 exit handlers (in a future patch).
      Opportunistically convert an absurd number of instances of 'svm->vcpu'
      to direct uses of 'vcpu' to avoid pointless casting.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      63129754
    •
      KVM: nSVM: Trace VM-Enter consistency check failures · 11f0cbf0
      Sean Christopherson authored
      Use trace_kvm_nested_vmenter_failed() and its macro magic to trace
      consistency check failures on nested VMRUN.  Tracing such failures by
      running the buggy VMM as a KVM guest is often the only way to get a
      precise explanation of why VMRUN failed.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210204000117.3303214-13-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      11f0cbf0
    •
      KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state() · 6906e06d
      Krish Sadhukhan authored
      The path for SVM_SET_NESTED_STATE needs to have the same checks for the
      CPU registers as we have in the VMRUN path for a nested guest. This patch
      adds those missing checks to svm_set_nested_state().
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20201006190654.32305-3-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6906e06d
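One representative check of the kind being shared between the two paths (the real patch validates several registers; this example is a CR0 legality check per the AMD APM, not specifically a reserved-bit check):

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR0_NW (1ull << 29)
#define X86_CR0_CD (1ull << 30)

/*
 * CR0 with CD=0 and NW=1 is an illegal combination: VMRUN fails the
 * consistency check, so the state-load path must reject it too.
 */
int check_cr0(uint64_t cr0)
{
    if (!(cr0 & X86_CR0_CD) && (cr0 & X86_CR0_NW))
        return -1;   /* models -EINVAL */
    return 0;
}
```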
    •
      KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state() · c08f390a
      Paolo Bonzini authored
      The VMLOAD/VMSAVE data is not taken from userspace, since it will
      not be restored on VMEXIT (it will be copied from VMCB02 to VMCB01).
      For clarity, replace the wholesale copy of the VMCB save area
      with a copy of that state only.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c08f390a
    •
      KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit · 4bb170a5
      Paolo Bonzini authored
      Since L1 and L2 now use different VMCBs, most of the fields remain the
      same in VMCB02 from one L2 run to the next.  Since KVM itself is not
      looking at VMCB12's clean field, for now not much can be optimized.
      However, in the future we could avoid more copies if the VMCB12's SEG
      and DT sections are clean.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4bb170a5
    •
      KVM: nSVM: do not mark all VMCB01 fields dirty on nested vmexit · 7ca62d13
      Paolo Bonzini authored
      Since L1 and L2 now use different VMCBs, most of the fields remain
      the same from one L1 run to the next.  svm_set_cr0 and other functions
      called by nested_svm_vmexit already take care of clearing the
      corresponding clean bits; only the TSC offset is special.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7ca62d13
    •
      KVM: nSVM: do not copy vmcb01->control blindly to vmcb02->control · 7c3ecfcd
      Paolo Bonzini authored
      Most of these fields would be overwritten by vmcb12 control fields
      anyway, or do not matter at all because they are filled by the processor
      on vmexit. Therefore, we need not copy them from vmcb01 to vmcb02 on
      vmentry.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7c3ecfcd
    •
      KVM: nSVM: rename functions and variables according to vmcbXY nomenclature · 9e8f0fbf
      Paolo Bonzini authored
      Now that SVM is using a separate vmcb01 and vmcb02 (and also uses the vmcb12
      naming) we can give clearer names to functions that write to and read
      from those VMCBs.  Likewise, variables and parameters can be renamed
      from nested_vmcb to vmcb12.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9e8f0fbf
    •
      KVM: SVM: Use a separate vmcb for the nested L2 guest · 4995a368
      Cathy Avery authored
      svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2
      (nested).
      
      The main advantages are removing get_host_vmcb and hsave, in favor of
      concepts that are shared with VMX.
      
      We no longer need to stash the L1 registers in hsave while L2
      runs, but we do need to copy the VMLOAD/VMSAVE registers from VMCB01 to
      VMCB02 and back.  This has more or less the same cost, but code-wise
      nested_svm_vmloadsave can be reused.
      
      This patch omits several optimizations that are possible:
      
      - for simplicity there is some wholesale copying of vmcb.control areas
      which can go away.
      
      - we should be able to better use the VMCB01 and VMCB02 clean bits.
      
      - another possibility is to always use VMCB01 for VMLOAD and VMSAVE,
      thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to
      VMCB02 and back.
      
      Tested:
      kvm-unit-tests
      kvm self tests
      Loaded fedora nested guest on fedora
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20201011184818.3609-3-cavery@redhat.com>
      [Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the
       PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add
       a few more comments. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4995a368
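The restructuring can be modeled in miniature: svm->vmcb becomes a pointer that is switched between vmcb01 (L1) and vmcb02 (L2), with the VMLOAD/VMSAVE state copied between the two at nested entry and exit. The layout below is invented; only the concepts mirror the commit:

```c
#include <assert.h>
#include <stdint.h>

struct vmcb {
    uint64_t fs_base;   /* stands in for the VMLOAD/VMSAVE state */
};

struct svm {
    struct vmcb vmcb01;
    struct vmcb vmcb02;
    struct vmcb *vmcb;  /* currently active VMCB: L1 or L2 */
};

void nested_entry(struct svm *svm)
{
    svm->vmcb02.fs_base = svm->vmcb01.fs_base;  /* vmloadsave copy in  */
    svm->vmcb = &svm->vmcb02;
}

void nested_exit(struct svm *svm)
{
    svm->vmcb01.fs_base = svm->vmcb02.fs_base;  /* vmloadsave copy out */
    svm->vmcb = &svm->vmcb01;
}
```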