1. 15 March 2021 (4 commits)
    • KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb · af18fa77
      Authored by Cathy Avery
      This patch moves the physical cpu tracking from the vcpu
      to the vmcb in svm_switch_vmcb. If either vmcb01 or vmcb02
      changes physical CPUs from one vmrun to the next, the vmcb's
      previous cpu is preserved for comparison with the current
      cpu, and the vmcb is marked dirty if they differ. This prevents
      the processor from using old cached data for a vmcb that may
      have been updated on a prior run on a different processor.
      
      It also moves the physical cpu check from svm_vcpu_load
      to pre_svm_run as the check only needs to be done at run.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20210112164313.4204-2-cavery@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      af18fa77
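
      A minimal sketch of the idea (illustrative only, not the actual patch; the
      struct layout and helper names are assumptions): each vmcb remembers the
      physical CPU it last ran on, and pre_svm_run() marks the vmcb fully dirty
      when that CPU changes, so no stale cached state is reused.

        /* Sketch: per-vmcb tracking of the last physical CPU (names assumed). */
        struct kvm_vmcb_info {
                struct vmcb *ptr;
                unsigned long pa;
                int cpu;                /* physical CPU this vmcb last ran on */
        };

        static void pre_svm_run_sketch(struct vcpu_svm *svm)
        {
                int cpu = raw_smp_processor_id();

                /*
                 * If this vmcb last ran on a different physical CPU, anything the
                 * processor cached for it may be stale: invalidate all clean bits.
                 */
                if (unlikely(svm->current_vmcb->cpu != cpu)) {
                        vmcb_mark_all_dirty(svm->current_vmcb->ptr);
                        svm->current_vmcb->cpu = cpu;
                }
        }
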
    • KVM: SVM: Use a separate vmcb for the nested L2 guest · 4995a368
      Authored by Cathy Avery
      svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2
      (nested).
      
      The main advantages are removing get_host_vmcb and hsave, in favor of
      concepts that are shared with VMX.
      
      We no longer need to stash the L1 registers in hsave while L2
      runs, but we need to copy the VMLOAD/VMSAVE registers from VMCB01 to
      VMCB02 and back.  This more or less has the same cost, but code-wise
      nested_svm_vmloadsave can be reused.
      
      This patch omits several optimizations that are possible:
      
      - for simplicity there is some wholesale copying of vmcb.control areas
      which can go away.
      
      - we should be able to better use the VMCB01 and VMCB02 clean bits.
      
      - another possibility is to always use VMCB01 for VMLOAD and VMSAVE,
      thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to
      VMCB02 and back.
      
      Tested:
      kvm-unit-tests
      kvm self tests
      Loaded fedora nested guest on fedora
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20201011184818.3609-3-cavery@redhat.com>
      [Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the
       PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add
       a few more comments. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4995a368
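
      A rough sketch of the switching scheme (names approximate; the enter/leave
      helpers are illustrative, not the actual functions): svm->vmcb simply follows
      whichever of VMCB01/VMCB02 is active, and the VMLOAD/VMSAVE state is copied
      across on every transition.

        /* Sketch: svm->vmcb tracks whichever VMCB is currently active. */
        static void svm_switch_vmcb_sketch(struct vcpu_svm *svm, struct kvm_vmcb_info *target)
        {
                svm->current_vmcb = target;
                svm->vmcb = target->ptr;
        }

        static void enter_l2_sketch(struct vcpu_svm *svm)
        {
                /* L1's VMLOAD/VMSAVE registers must follow us into VMCB02... */
                nested_svm_vmloadsave(svm->vmcb01.ptr, svm->vmcb02.ptr);
                svm_switch_vmcb_sketch(svm, &svm->vmcb02);
        }

        static void leave_l2_sketch(struct vcpu_svm *svm)
        {
                /* ...and are copied back to VMCB01 on nested #VMEXIT. */
                nested_svm_vmloadsave(svm->vmcb02.ptr, svm->vmcb01.ptr);
                svm_switch_vmcb_sketch(svm, &svm->vmcb01);
        }
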
    • KVM: SVM: Don't strip the C-bit from CR2 on #PF interception · 6d1b867d
      Authored by Sean Christopherson
      Don't strip the C-bit from the faulting address on an intercepted #PF;
      the address is a virtual address, not a physical address.
      
      Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address")
      Cc: stable@vger.kernel.org
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210305011101.3597423-13-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6d1b867d
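
      A minimal sketch of the resulting #PF intercept handler (simplified; the
      decode-assist arguments are omitted): the fault address from exit_info_2 is
      passed through unmodified, with no C-bit masking.

        static int pf_interception_sketch(struct vcpu_svm *svm)
        {
                /* CR2 on #PF is a virtual address: do not mask off the C-bit. */
                u64 fault_address = svm->vmcb->control.exit_info_2;
                u64 error_code = svm->vmcb->control.exit_info_1;

                return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address,
                                             NULL, 0);
        }
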
    • KVM: x86: to track if L1 is running L2 VM · 43c11d91
      Authored by Dongli Zhang
      The new per-cpu stat 'nested_run' is introduced in order to track if L1 VM
      is running or used to run L2 VM.
      
      An example use of 'nested_run' is to help the host administrator easily
      track whether any L1 VM has been used to run an L2 VM. If an issue that
      may be related to nested virtualization shows up, the administrator can
      use 'nested_run' to quickly narrow down and confirm whether nested
      virtualization is involved, for example, whether a fix like
      commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after
      guest reset") is required.
      
      Cc: Joe Jin <joe.jin@oracle.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      43c11d91
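
      Conceptually the change is a one-line counter bump on the nested VMRUN path;
      a hedged sketch (the surrounding function is abbreviated):

        static int nested_svm_vmrun_sketch(struct vcpu_svm *svm)
        {
                /* Record that this L1 vCPU has launched an L2 guest. */
                ++svm->vcpu.stat.nested_run;

                /* ... existing VMRUN emulation continues here ... */
                return 1;
        }

      On the host, the aggregated value is then typically visible through the
      usual KVM statistics interfaces (e.g. debugfs), depending on kernel
      configuration.
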
  2. 05 March 2021 (1 commit)
  3. 03 March 2021 (1 commit)
    • KVM: SVM: Clear the CR4 register on reset · 9e46f6c6
      Authored by Babu Moger
      This problem was reported on an SVM guest while executing kexec.
      Kexec fails to load the new kernel when the PCID feature is enabled.

      When kexec starts loading the new kernel, it starts the process by
      resetting the vCPUs and then bringing each vCPU online one by one.
      The vCPU reset is supposed to reset all the register states before the
      vCPUs are brought online. However, the CR4 register is not reset during
      this process. If this register was already set up during the last boot,
      all the flags can remain intact. The X86_CR4_PCIDE bit can only be
      enabled in long mode, so it must be enabled much later during SMP
      initialization.  Having the X86_CR4_PCIDE bit set during SMP boot can
      cause boot failures.
      
      Fix the issue by resetting the CR4 register in init_vmcb().
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9e46f6c6
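
      The shape of the fix, as a hedged sketch (init_vmcb() is abbreviated to the
      relevant line): CR4 is explicitly cleared so that no stale bits, such as
      X86_CR4_PCIDE, survive a vCPU reset.

        static void init_vmcb_sketch(struct vcpu_svm *svm)
        {
                /* ... other control/save-area initialization ... */

                /* Start from a clean CR4; a stale PCIDE bit breaks early SMP boot. */
                svm_set_cr4(&svm->vcpu, 0);
        }
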
  4. 25 February 2021 (1 commit)
    • KVM: SVM: Fix nested VM-Exit on #GP interception handling · 2df8d380
      Authored by Sean Christopherson
      Fix the interpretation of nested_svm_vmexit()'s return value when
      synthesizing a nested VM-Exit after intercepting an SVM instruction while
      L2 was running.  The helper returns '0' on success, whereas a return
      value of '0' in the exit handler path means "exit to userspace".  The
      incorrect return value causes KVM to exit to userspace without filling
      the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0".
      
      Fixes: 14c2bf81 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210224005627.657028-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2df8d380
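
      The crux is a return-value convention mismatch; a simplified sketch of the
      corrected flow (the handler name is illustrative):

        static int svm_instr_intercept_sketch(struct kvm_vcpu *vcpu)
        {
                if (is_guest_mode(vcpu)) {
                        /* nested_svm_vmexit() returns 0 on success. */
                        int ret = nested_svm_vmexit(to_svm(vcpu));

                        if (ret)
                                return ret;     /* genuine failure */
                        return 1;               /* success: keep running, don't exit to userspace */
                }

                return kvm_emulate_instruction(vcpu, 0);
        }
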
  5. 23 February 2021 (1 commit)
    • KVM: nSVM: prepare guest save area while is_guest_mode is true · d2df592f
      Authored by Paolo Bonzini
      Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and
      nested_prepare_vmcb_control.  This results in is_guest_mode being false
      until the end of nested_prepare_vmcb_control.
      
      This is a problem because nested_prepare_vmcb_save can in turn cause
      changes to the intercepts and these have to be applied to the "host VMCB"
      (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts
      into svm->vmcb.
      
      In particular, without this change we forget to set the CR0 read and CR0
      write intercepts when running a real mode L2 guest with NPT disabled.
      The guest is therefore able to see the CR0.PG bit that KVM sets to
      enable "paged real mode".  This patch fixes the svm.flat mode_switch
      test case with npt=0.  There are no other problematic calls in
      nested_prepare_vmcb_save.
      
      Moving is_guest_mode to the end has been done since commit 06fc7772
      ("KVM: SVM: Activate nested state only when guest state is complete",
      2010-04-25).  However, back then KVM didn't grab a different VMCB
      when updating the intercepts, it had already copied/merged L1's stuff
      to L0's VMCB, and then updated L0's VMCB regardless of is_nested().
      Later recalc_intercepts was introduced in commit 384c6368
      ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12).
      This introduced the bug, because recalc_intercepts now throws away
      the intercept manipulations that svm_set_cr0 had done in the meanwhile
      to svm->vmcb.
      
      [1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d2df592f
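
      A hedged sketch of the reordering (the function body is heavily abbreviated
      and the exact call sequence in the real patch differs in detail): the vCPU is
      put into guest mode before the save area is prepared, so intercept updates
      made along the way are merged correctly.

        static void enter_svm_guest_mode_sketch(struct vcpu_svm *svm, struct vmcb *vmcb12)
        {
                enter_guest_mode(&svm->vcpu);            /* is_guest_mode() is true from here on */

                nested_prepare_vmcb_control(svm);
                nested_prepare_vmcb_save(svm, vmcb12);   /* may adjust CR0 intercepts via svm_set_cr0() */
        }
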
  6. 18 February 2021 (4 commits)
  7. 11 February 2021 (1 commit)
  8. 09 February 2021 (3 commits)
  9. 04 February 2021 (15 commits)
    • KVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode · ca29e145
      Authored by Sean Christopherson
      Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA
      legality checks.  AMD's APM states:
      
        If the C-bit is an address bit, this bit is masked from the guest
        physical address when it is translated through the nested page tables.
      
      Thus, any access that can conceivably be run through NPT should ignore
      the C-bit when checking for validity.
      
      For features that KVM emulates in software, e.g. MTRRs, there is no
      clear direction in the APM for how the C-bit should be handled.  For
      such cases, follow the SME behavior inasmuch as possible, since SEV is
      essentially a VM-specific variant of SME.  For SME, the APM states:
      
        In this case the upper physical address bits are treated as reserved
        when the feature is enabled except where otherwise indicated.
      
      Collecting the various relevant SME snippets in the APM and cross-
      referencing the omissions with Linux kernel code, this leaves MTRRs and
      APIC_BASE as the only flows that KVM emulates that should _not_ ignore
      the C-bit.
      
      Note, this means the reserved bit checks in the page tables are
      technically broken.  This will be remedied in a future patch.
      
      Although the page table checks are technically broken, in practice, it's
      all but guaranteed to be irrelevant.  NPT is required for SEV, i.e.
      shadowing page tables isn't needed in the common case.  Theoretically,
      the checks could be in play for nested NPT, but it's extremely unlikely
      that anyone is running nested VMs on SEV, as doing so would require L1
      to expose sensitive data to L0, e.g. the entire VMCB.  And if anyone is
      running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1
      would need to put its NPT in shared memory, in which case the C-bit will
      never be set.  Or, L1 could use shadow paging, but again, if L0 needs to
      read page tables, e.g. to load PDPTRs, the memory can't be encrypted if
      L1 has any expectation of L0 doing the right thing.
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210204000117.3303214-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ca29e145
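
      A sketch of the direction (helper bodies simplified; the hook where the C-bit
      is carved out is illustrative rather than the exact location in the patch):

        /* One helper for every GPA legality check. */
        static inline bool kvm_vcpu_is_illegal_gpa_sketch(struct kvm_vcpu *vcpu, gpa_t gpa)
        {
                return !!(gpa & vcpu->arch.reserved_gpa_bits);
        }

        /* For SEV guests the C-bit is masked by NPT, so it is never reserved. */
        static void svm_adjust_reserved_gpa_bits_sketch(struct kvm_vcpu *vcpu,
                                                        unsigned int cbit_position)
        {
                if (sev_guest(vcpu->kvm))
                        vcpu->arch.reserved_gpa_bits &= ~BIT_ULL(cbit_position);
        }
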
    • KVM: nSVM: Use common GPA helper to check for illegal CR3 · bbc2c63d
      Authored by Sean Christopherson
      Replace an open coded check for an invalid CR3 with its equivalent
      helper.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210204000117.3303214-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bbc2c63d
    • KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs · 2732be90
      Authored by Sean Christopherson
      Don't clear the SME C-bit when reading a guest PDPTR, as the GPA (CR3) is
      in the guest domain.
      
      Barring a bizarre paravirtual use case, this is likely a benign bug.  SME
      is not emulated by KVM, loading SEV guest PDPTRs is doomed as KVM can't
      use the correct key to read guest memory, and setting guest MAXPHYADDR
      higher than the host, i.e. overlapping the C-bit, would cause faults in
      the guest.
      
      Note, for SEV guests, stripping the C-bit is technically aligned with CPU
      behavior, but for KVM it's the greater of two evils.  Because KVM doesn't
      have access to the guest's encryption key, ignoring the C-bit would at
      best result in KVM reading garbage.  By keeping the C-bit, KVM will
      fail its read (unless userspace creates a memslot with the C-bit set).
      The guest will still undoubtedly die, as KVM will use '0' for the PDPTR
      value, but that's preferable to interpreting encrypted data as a PDPTR.
      
      Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210204000117.3303214-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2732be90
    • KVM: x86: move kvm_inject_gp up from kvm_set_xcr to callers · bbefd4fc
      Authored by Paolo Bonzini
      Push the injection of #GP up to the callers, so that they can just use
      kvm_complete_insn_gp.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bbefd4fc
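
      Illustrative effect on a caller (a sketch, not the exact diff): the exit
      handler no longer injects #GP itself, it just reports the result through
      kvm_complete_insn_gp(), which injects #GP on failure and skips the
      instruction on success.

        static int handle_xsetbv_sketch(struct kvm_vcpu *vcpu)
        {
                int err = kvm_set_xcr(vcpu, kvm_rcx_read(vcpu),
                                      kvm_read_edx_eax(vcpu));

                return kvm_complete_insn_gp(vcpu, err);
        }
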
    • KVM: SVM: Replace hard-coded value with #define · 04548ed0
      Authored by Krish Sadhukhan
      Replace the hard-coded value for bit 1 of EFLAGS with the available
      #define.
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20210203012842.101447-2-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      04548ed0
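
      In other words (sketch): guest RFLAGS is initialized from the named constant
      for the always-one bit 1 rather than a magic number.

        static void svm_init_rflags_sketch(struct vcpu_svm *svm)
        {
                /* X86_EFLAGS_FIXED (bit 1) instead of the literal 2. */
                svm->vmcb->save.rflags = X86_EFLAGS_FIXED;
        }
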
    • KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup · a7fc06dd
      Authored by Michael Roth
      Currently we save host state like user-visible host MSRs, and do some
      initial guest register setup for MSR_TSC_AUX and MSR_AMD64_TSC_RATIO
      in svm_vcpu_load(). Defer this until just before we enter the guest by
      moving the handling to kvm_x86_ops.prepare_guest_switch() similarly to
      how it is done for the VMX implementation.
      
      Additionally, since handling of saving/restoring host user MSRs is the
      same both with/without SEV-ES enabled, move that handling to common
      code.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20210202190126.2185715-4-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a7fc06dd
    • KVM: SVM: remove uneeded fields from host_save_users_msrs · 553cc15f
      Authored by Michael Roth
      Now that the set of host user MSRs that need to be individually
      saved/restored are the same with/without SEV-ES, we can drop the
      .sev_es_restored flag and just iterate through the list unconditionally
      for both cases. A subsequent patch can then move these loops to a
      common path.
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20210202190126.2185715-3-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      553cc15f
    • KVM: SVM: use vmsave/vmload for saving/restoring additional host state · e79b91bb
      Authored by Michael Roth
      Using a guest workload which simply issues 'hlt' in a tight loop to
      generate VMEXITs, it was observed (on a recent EPYC processor) that a
      significant amount of the VMEXIT overhead measured on the host was the
      result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
      perf:
      
        67.49%--kvm_arch_vcpu_ioctl_run
                |
                |--23.13%--vcpu_put
                |          kvm_arch_vcpu_put
                |          |
                |          |--21.31%--native_write_msr
                |          |
                |           --1.27%--svm_set_cr4
                |
                |--16.11%--vcpu_load
                |          |
                |           --15.58%--kvm_arch_vcpu_load
                |                     |
                |                     |--13.97%--svm_set_cr4
                |                     |          |
                |                     |          |--12.64%--native_read_msr
      
      Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and
      can be saved/restored using 'vmsave'/'vmload' instructions rather than
      explicit MSR reads/writes. In doing so there is a significant reduction
      in the svm_vcpu_load/svm_vcpu_put overhead measured for the above
      workload:
      
        50.92%--kvm_arch_vcpu_ioctl_run
                |
                |--19.28%--disable_nmi_singlestep
                |
                |--13.68%--vcpu_load
                |          kvm_arch_vcpu_load
                |          |
                |          |--9.19%--svm_set_cr4
                |          |          |
                |          |           --6.44%--native_read_msr
                |          |
                |           --3.55%--native_write_msr
                |
                |--6.05%--kvm_inject_nmi
                |--2.80%--kvm_sev_es_mmio_read
                |--2.19%--vcpu_put
                |          |
                |           --1.25%--kvm_arch_vcpu_put
                |                     native_write_msr
      
      Quantifying this further, if we look at the raw cycle counts for a
      normal iteration of the above workload (according to 'rdtscp'),
      kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with
      the current behavior. Using 'vmsave'/'vmload', this is reduced to
      ~2800 cycles, a savings of 39%.
      
      While this approach doesn't seem to manifest in any noticeable
      improvement for more realistic workloads like UnixBench, netperf, and
      kernel builds, likely due to their exit paths generally involving IO
      with comparatively high latencies, it does reduce the overall overhead
      of KVM_RUN significantly, which may still be noticeable for certain
      situations. It also simplifies some aspects of the code.
      
      With this change, explicit save/restore is no longer needed for the
      following host MSRs, since they are documented[1] as being part of the
      VMCB State Save Area:
      
        MSR_STAR, MSR_LSTAR, MSR_CSTAR,
        MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
        MSR_IA32_SYSENTER_CS,
        MSR_IA32_SYSENTER_ESP,
        MSR_IA32_SYSENTER_EIP,
        MSR_FS_BASE, MSR_GS_BASE
      
      and only the following MSR needs individual handling in
      svm_vcpu_put/svm_vcpu_load:
      
        MSR_TSC_AUX
      
      We could drop the host_save_user_msrs array/loop and instead handle
      MSR read/write of MSR_TSC_AUX directly, but we leave that for now as
      a potential follow-up.
      
      Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment
      registers (and associated hidden state)[2], some of the code
      previously used to handle this is no longer needed, so we drop it
      as well.
      
      The first public release of the SVM spec[3] also documents the same
      handling for the host state in question, so we make these changes
      unconditionally.
      
      Also worth noting is that we 'vmsave' to the same page that is
      subsequently used by 'vmrun' to record some additional host state. This
      is okay, since, in accordance with the spec[2], the additional state
      written to the page by 'vmrun' does not overwrite any fields written by
      'vmsave'. This has also been confirmed through testing (for the above
      CPU, at least).
      
      [1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2
      [2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD
      [3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01
      Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20210202190126.2185715-2-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e79b91bb
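
      A hedged sketch of the mechanism (the per-CPU save-area plumbing and the
      exact call sites are simplified): VMSAVE/VMLOAD on the host save area
      replaces the per-MSR reads and writes for the state listed above.

        static void svm_save_host_state_sketch(struct page *host_save_area)
        {
                unsigned long pa = page_to_pfn(host_save_area) << PAGE_SHIFT;

                /* Captures FS/GS/TR/LDTR hidden state and the syscall/sysenter MSRs. */
                asm volatile("vmsave %0" : : "a" (pa) : "memory");
        }

        static void svm_restore_host_state_sketch(struct page *host_save_area)
        {
                unsigned long pa = page_to_pfn(host_save_area) << PAGE_SHIFT;

                asm volatile("vmload %0" : : "a" (pa) : "memory");
        }
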
    • KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions · 35a78319
      Authored by Sean Christopherson
      Add svm_asm*() macros, a la the existing vmx_asm*() macros, to handle
      faults on SVM instructions instead of using the generic __ex(), a.k.a.
      __kvm_handle_fault_on_reboot().  Using asm goto generates slightly
      better code as it eliminates the in-line JMP+CALL sequences that are
      needed by __kvm_handle_fault_on_reboot() to avoid triggering BUG()
      from fixup (which generates bad stack traces).
      
      Using SVM-specific macros also drops the last user of __ex() and the
      last asm linkage to kvm_spurious_fault(), and adds a helper for
      VMSAVE, which may gain an additional call site in the future (as part
      of optimizing the SVM context switching).
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201231002702.22237077-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      35a78319
    • KVM: X86: prepend vmx/svm prefix to additional kvm_x86_ops functions · b6a7cc35
      Authored by Jason Baron
      A subsequent patch introduces macros in preparation for simplifying the
      definition for vmx_x86_ops and svm_x86_ops. Making the naming more uniform
      expands the coverage of the macros. Add vmx/svm prefix to the following
      functions: update_exception_bitmap(), enable_nmi_window(),
      enable_irq_window(), update_cr8_intercept() and enable_smi_window().
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Message-Id: <ed594696f8e2c2b2bfc747504cee9bbb2a269300.1610680941.git.jbaron@akamai.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b6a7cc35
    • KVM: SVM: Fix #GP handling for doubly-nested virtualization · 14c2bf81
      Authored by Wei Huang
      In the nested-on-nested case (L0, L1, L2 are all hypervisors), we do
      not support emulation of the vVMLOAD/VMSAVE feature; instead, the
      L0 hypervisor can inject the proper #VMEXIT to inform L1 of what is
      happening, and L1 can avoid invoking the #GP workaround.  For this
      reason we turn on the guest VM's X86_FEATURE_SVME_ADDR_CHK bit for KVM
      running inside a VM, so it receives the notification and changes behavior.

      Similarly, we check whether the vCPU is in guest mode before emulating
      the vmware-backdoor instructions. In the nested-on-nested case, we
      let the guest handle it.
      Co-developed-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Wei Huang <wei.huang2@amd.com>
      Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210126081831.570253-5-wei.huang2@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      14c2bf81
    • KVM: SVM: Add support for SVM instruction address check change · 3b9c723e
      Authored by Wei Huang
      New AMD CPUs have a change that checks #VMEXIT intercept on special SVM
      instructions before checking their EAX against reserved memory region.
      This change is indicated by CPUID_0x8000000A_EDX[28]. If it is 1, #VMEXIT
      is triggered before #GP. KVM doesn't need to intercept and emulate #GP
      faults as #GP is supposed to be triggered.
      Co-developed-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Wei Huang <wei.huang2@amd.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210126081831.570253-4-wei.huang2@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3b9c723e
    • KVM: SVM: Add emulation support for #GP triggered by SVM instructions · 82a11e9c
      Authored by Bandan Das
      While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
      CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
      before checking VMCB's instruction intercept. If EAX falls into such
      memory areas, #GP is triggered before VMEXIT. This causes problems under
      nested virtualization. To solve this problem, KVM needs to trap #GP and
      check the instructions triggering #GP. For VM execution instructions,
      KVM emulates these instructions.
      Co-developed-by: Wei Huang <wei.huang2@amd.com>
      Signed-off-by: Wei Huang <wei.huang2@amd.com>
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Message-Id: <20210126081831.570253-3-wei.huang2@amd.com>
      [Conditionally enable #GP intercept. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      82a11e9c
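
      A simplified sketch of the workaround (the decode helper and emulation entry
      point are assumptions standing in for the real series): #GP with a zero error
      code is inspected, and only VMRUN/VMSAVE/VMLOAD are emulated.

        static int gp_interception_sketch(struct kvm_vcpu *vcpu)
        {
                struct vcpu_svm *svm = to_svm(vcpu);
                u32 error_code = svm->vmcb->control.exit_info_1;

                /* The erratum delivers #GP with a zero error code. */
                if (error_code)
                        goto reinject;

                if (is_svm_vmrun_vmsave_vmload(vcpu))        /* assumed decode helper */
                        return emulate_svm_instr_sketch(vcpu); /* assumed emulation entry */

        reinject:
                kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
                return 1;
        }
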
    • KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW · 9a3ecd5e
      Authored by Chenyi Qiang
      DR6_INIT contains the 1-reserved bits as well as the bit that is cleared
      to 0 when the condition (e.g. RTM) happens. The value can be used to
      initialize dr6 and also be the XOR mask between the #DB exit
      qualification (or payload) and DR6.
      
      Since DR6_INIT is used as an initial value only once, rename it
      to DR6_ACTIVE_LOW and apply it in other places as well, which makes the
      incoming changes for the bus lock debug exception simpler.
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com>
      [Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9a3ecd5e
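
      For reference, a sketch of the constants and the XOR relationship the commit
      describes (values as in the x86 headers of that era; treat them as
      illustrative):

        #define DR6_ACTIVE_LOW  0xffff0ff0UL    /* 1-reserved bits plus active-low bits such as RTM */
        #define DR6_VOLATILE    0x0001e00fUL
        #define DR6_FIXED_1     (DR6_ACTIVE_LOW & ~DR6_VOLATILE)

        /* A #DB payload and DR6 are related by an XOR with the active-low mask. */
        static inline unsigned long dr6_from_payload(unsigned long payload)
        {
                return payload ^ DR6_ACTIVE_LOW;
        }
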
    • KVM/SVM: add support for SEV attestation command · 2c07ded0
      Authored by Brijesh Singh
      The SEV FW version >= 0.23 added a new command that can be used to query
      the attestation report containing the SHA-256 digest of the guest memory
      encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and
      sign the report with the Platform Endorsement Key (PEK).
      
      See the SEV FW API spec section 6.8 for more details.
      
      Note there already exists a command (KVM_SEV_LAUNCH_MEASURE) that can be
      used to get the SHA-256 digest. The main difference between the
      KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter
      can be called while the guest is running and the measurement value is
      signed with PEK.
      
      Cc: James Bottomley <jejb@linux.ibm.com>
      Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: John Allen <john.allen@amd.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: linux-crypto@vger.kernel.org
      Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Tested-by: James Bottomley <jejb@linux.ibm.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Message-Id: <20210104151749.30248-1-brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2c07ded0
  10. 03 February 2021 (2 commits)
    • KVM: x86: cleanup CR3 reserved bits checks · c1c35cf7
      Authored by Paolo Bonzini
      If not in long mode, the low bits of CR3 are reserved but not enforced to
      be zero, so remove those checks.  If in long mode, however, the MBZ bits
      extend down to the highest physical address bit of the guest, excluding
      the encryption bit.
      
      Make the checks consistent with the above, and match them between
      nested_vmcb_checks and KVM_SET_SREGS.
      
      Cc: stable@vger.kernel.org
      Fixes: 761e4169 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
      Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c1c35cf7
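
      A sketch of the consolidated check (simplified; at the time the per-vCPU mask
      was still called cr3_lm_rsvd_bits, later renamed reserved_gpa_bits):

        static bool kvm_is_valid_cr3_sketch(struct kvm_vcpu *vcpu, unsigned long cr3)
        {
                /* Outside long mode the low CR3 bits are reserved but not enforced. */
                if (!is_long_mode(vcpu))
                        return true;

                /*
                 * In long mode, the MBZ bits run from the guest MAXPHYADDR
                 * (minus the encryption bit) up through bit 63.
                 */
                return !(cr3 & vcpu->arch.cr3_lm_rsvd_bits);
        }
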
    • KVM: SVM: Treat SVM as unsupported when running as an SEV guest · ccd85d90
      Authored by Sean Christopherson
      Don't let KVM load when running as an SEV guest, regardless of what
      CPUID says.  Memory is encrypted with a key that is not accessible to
      the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll
      see garbage when reading the VMCB.
      
      Technically, KVM could decrypt all memory that needs to be accessible to
      L0 and use shadow paging so that L0 does not need to shadow NPT, but
      exposing such information to L0 largely defeats the purpose of running as
      an SEV guest.  This can always be revisited if someone comes up with a
      use case for running VMs inside SEV guests.
      
      Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM
      is doomed even if the SEV guest is debuggable and the hypervisor is willing
      to decrypt the VMCB.  This may or may not be fixed on CPUs that have the
      SVME_ADDR_CHK fix.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210202212017.2486595-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ccd85d90
  11. 29 January 2021 (1 commit)
    • Fix unsynchronized access to sev members through svm_register_enc_region · 19a23da5
      Authored by Peter Gonda
      Grab kvm->lock before pinning memory when registering an encrypted
      region; sev_pin_memory() relies on kvm->lock being held to ensure
      correctness when checking and updating the number of pinned pages.
      
      Add a lockdep assertion to help prevent future regressions.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Fixes: 1e80fdc0 ("KVM: SVM: Pin guest memory when SEV is active")
      Signed-off-by: Peter Gonda <pgonda@google.com>
      
      V2
       - Fix up patch description
       - Correct file paths svm.c -> sev.c
       - Add unlock of kvm->lock on sev_pin_memory error
      
      V1
       - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/
      
      Message-Id: <20210127161524.2832400-1-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      19a23da5
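
      A condensed sketch of the locking fix (error handling and region bookkeeping
      trimmed): kvm->lock is held across the pinning, and sev_pin_memory() asserts
      it.

        static int svm_register_enc_region_sketch(struct kvm *kvm,
                                                  struct kvm_enc_region *range)
        {
                struct enc_region *region;
                int ret = 0;

                region = kzalloc(sizeof(*region), GFP_KERNEL_ACCOUNT);
                if (!region)
                        return -ENOMEM;

                mutex_lock(&kvm->lock);
                region->pages = sev_pin_memory(kvm, range->addr, range->size,
                                               &region->npages, 1);
                if (!region->pages) {
                        ret = -ENOMEM;
                        goto e_unlock;          /* per the V2 note: unlock on failure too */
                }
                /* ... record uaddr/size and add the region to the SEV region list ... */
        e_unlock:
                mutex_unlock(&kvm->lock);
                if (ret)
                        kfree(region);
                return ret;
        }

        /* ...and inside sev_pin_memory(): lockdep_assert_held(&kvm->lock); */
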
  12. 26 January 2021 (3 commits)
    • KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX · 9a78e158
      Authored by Paolo Bonzini
      VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
      which may need to be loaded outside guest mode.  Therefore we cannot
      WARN in that case.
      
      However, that part of nested_get_vmcs12_pages is _not_ needed at
      vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
      so that both vmentry and migration (and in the latter case, independent
      of is_guest_mode) do the parts that are needed.
      
      Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
      Cc: <stable@vger.kernel.org> # 5.10.x
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9a78e158
    • KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest · 25009140
      Authored by Sean Christopherson
      Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB; the
      GPRs' dirty bits are set from time zero and never cleared, i.e. they will
      always be seen as dirty.  The obvious alternative would be to clear
      the dirty bits when appropriate, but removing the dirty checks is
      desirable as it allows reverting GPR dirty+available tracking, which
      adds overhead to all flavors of x86 VMs.
      
      Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
      by the GHCB spec, which allows the hypervisor (or guest) to provide
      unnecessary info; it's the guest's responsibility to consume only what
      it needs (the hypervisor is untrusted after all).
      
        The guest and hypervisor can supply additional state if desired but
        must not rely on that additional state being provided.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210122235049.3107620-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      25009140
    • kvm: tracing: Fix unmatched kvm_entry and kvm_exit events · d95df951
      Authored by Lorenzo Brescia
      On VMX, if we exit and then re-enter immediately without leaving
      the vmx_vcpu_run() function, the kvm_entry event is not logged.
      That means we will see one (or more) kvm_exit, without its (their)
      corresponding kvm_entry, as shown here:
      
       CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
       CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
       CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE
      
      It also seems possible for a kvm_entry event to be logged, but then
      we leave vmx_vcpu_run() right away (if vmx->emulation_required is
      true). In this case, we will have a spurious kvm_entry event in the
      trace.
      
      Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
      (where trace_kvm_exit() already is).
      
      A trace obtained with this patch applied looks like this:
      
       CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
       CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
       CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT
      
      Of course, not calling trace_kvm_entry() in common x86 code any
      longer means that we need to adjust the SVM side of things too.
      Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
      Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
      Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d95df951
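
      The essence of the fix, as a sketch of the vendor vcpu_run entry point
      (everything else elided): the entry tracepoint fires exactly once per
      hardware entry, right next to where the exit tracepoint already fires.

        static fastpath_t svm_vcpu_run_sketch(struct kvm_vcpu *vcpu)
        {
                trace_kvm_entry(vcpu);   /* moved here from the common x86 run loop */

                /* ... load guest state, VMRUN, save guest state ... */

                return EXIT_FASTPATH_NONE;
        }
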
  13. 08 January 2021 (3 commits)
    • KVM: SVM: Add support for booting APs in an SEV-ES guest · 647daca2
      Authored by Tom Lendacky
      Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
      where the guest vCPU register state is updated and VMRUN is then issued
      on the vCPU to begin execution of the AP. For an SEV-ES guest, this won't work because
      the guest register state is encrypted.
      
      Following the GHCB specification, the hypervisor must not alter the guest
      register state, so KVM must track an AP/vCPU boot. Should the guest want
      to park the AP, it must use the AP Reset Hold exit event in place of, for
      example, a HLT loop.
      
      First AP boot (first INIT-SIPI-SIPI sequence):
        Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
        support. It is up to the guest to transfer control of the AP to the
        proper location.
      
      Subsequent AP boot:
        KVM will expect to receive an AP Reset Hold exit event indicating that
        the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
        awaken it. When the AP Reset Hold exit event is received, KVM will place
        the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
        sequence, KVM will make the vCPU runnable. It is again up to the guest
        to then transfer control of the AP to the proper location.
      
        To differentiate between an actual HLT and an AP Reset Hold, a new MP
        state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
        placed in upon receiving the AP Reset Hold exit event. Additionally, to
        communicate the AP Reset Hold exit event up to userspace (if needed), a
        new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
      
      A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
      to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
      original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
      a new function that, for non SEV-ES guests, invokes the original SIPI
      delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
      implements the logic above.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      647daca2
    • KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit · f2c7ef3b
      Authored by Maxim Levitsky
      It is possible to exit the nested guest mode entered by
      svm_set_nested_state prior to the first VM entry into it (e.g. due to a
      pending event), if the nested run was not pending during the migration.
      
      In this case we must not switch to the nested msr permission bitmap.
      Also add a warning to catch similar cases in the future.
      
      Fixes: a7d5c7ce ("KVM: nSVM: delay MSR permission processing to first nested VM run")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f2c7ef3b
    • KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode · 56fe28de
      Authored by Maxim Levitsky
      We overwrite most of the vmcb fields while doing so, so we must
      mark it as dirty.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      56fe28de
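
      The fix itself is small; a sketch of the relevant call (surrounding logic
      omitted):

        static void svm_leave_guest_mode_sketch(struct vcpu_svm *svm)
        {
                /*
                 * Most save/control fields were just rewritten from host state,
                 * so none of the clean bits can be trusted any longer.
                 */
                vmcb_mark_all_dirty(svm->vmcb);
        }
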