1. 26 April 2021, 5 commits
    • KVM: SVM: Use default rAX size for INVLPGA emulation · bc9eff67
      Authored by Sean Christopherson
      Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
      outside of 64-bit mode to make KVM's emulation slightly less wrong.  The
      address for INVLPGA is determined by the effective address size, i.e.
      it's not hardcoded to 64/32 bits for a given mode.  Add a FIXME to call
      out that the emulation is wrong.
      
      Opportunistically tweak the ASID handling to make it clear that it's
      defined by ECX, not rCX.
      
      Per the APM:
         The portion of rAX used to form the address is determined by the
         effective address size (current execution mode and optional address
         size prefix). The ASID is taken from ECX.
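      
      A minimal sketch of the intended handling (kvm_rax_read()/kvm_rcx_read()
      and is_long_mode() are existing KVM helpers; the exact diff may differ):
      
         gva_t gva = kvm_rax_read(vcpu);   /* address comes from rAX */
         u32  asid = kvm_rcx_read(vcpu);   /* ASID is defined by ECX */
      
         /* FIXME: honor the effective address size, incl. prefixes. */
         if (!is_long_mode(vcpu))
                 gva = (u32)gva;           /* drop bits 63:32 outside 64-bit mode */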
      
      Fixes: ff092385 ("KVM: SVM: Implement INVLPGA")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode · 0884335a
      Authored by Sean Christopherson
      Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
      in 64-bit mode.  The APM states bits 63:32 are dropped for both DRs and
      CRs:
      
        In 64-bit mode, the operand size is fixed at 64 bits without the need
        for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
        bits and the upper 32 bits of the destination are forced to 0.
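      
      A sketch of the truncation, assuming KVM's generic register accessors
      (kvm_register_read() and is_64_bit_mode() are existing helpers):
      
         unsigned long val = kvm_register_read(vcpu, reg);
      
         if (!is_64_bit_mode(vcpu))
                 val = (u32)val;   /* APM: upper 32 bits forced to 0 */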
      
      Fixes: 7ff76d58 ("KVM: SVM: enhance MOV CR intercept handler")
      Fixes: cae3797a ("KVM: SVM: enhance mov DR intercept handler")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Delay restoration of host MSR_TSC_AUX until return to userspace · 844d69c2
      Authored by Sean Christopherson
      Use KVM's "user return MSRs" framework to defer restoring the host's
      MSR_TSC_AUX until the CPU returns to userspace.  Add/improve comments to
      clarify why MSR_TSC_AUX is intercepted on both RDMSR and WRMSR, and why
      it's safe for KVM to keep the guest's value loaded even if KVM is
      scheduled out.
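      
      Roughly, the user-return MSR flow looks like this (the slot variable is
      illustrative; kvm_add_user_return_msr()/kvm_set_user_return_msr() are the
      framework's entry points):
      
         /* once, at hardware setup: */
         kvm_add_user_return_msr(MSR_TSC_AUX);
      
         /* on guest WRMSR: the host value is restored lazily, on return to userspace */
         kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);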
      
      Cc: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Clear MSR_TSC_AUX[63:32] on write · dbd61273
      Authored by Sean Christopherson
      Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
      CPUs, which completely ignore the upper 32 bits, including dropping them
      on write.  Emulating AMD hardware will also allow migrating a vCPU from
      AMD hardware to Intel hardware without requiring userspace to manually
      clear the upper bits, which are reserved on Intel hardware.
      
      Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
      sadly the APM doesn't say _anything_ about those bits in the context of
      MSR access.  The RDTSCP entry simply states that RCX contains bits 31:0
      of the MSR, zero extended.  And even worse is that the RDPID description
      implies that it can consume all 64 bits of the MSR:
      
        RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
        into the specified destination register. Normal operand size prefixes
        do not apply and the update is either 32 bit or 64 bit based on the
        current mode.
      
      Emulate current hardware behavior to give KVM the best odds of playing
      nice with whatever the behavior of future AMD CPUs happens to be.
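      
      The emulated behavior amounts to a one-line truncation in the WRMSR path
      (a sketch, not the literal diff):
      
         case MSR_TSC_AUX:
                 ...
                 data = (u32)data;   /* AMD hardware drops bits 63:32 on write */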
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-3-seanjc@google.com>
      [Fix broken patch. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported · 6f2b296a
      Authored by Sean Christopherson
      Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
      the guest's CPUID model.
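      
      A sketch of the guard in the MSR handlers, using the existing
      guest_cpuid_has() helper:
      
         case MSR_TSC_AUX:
                 if (!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
                         return 1;   /* non-zero: the caller injects #GP */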
      
      Fixes: 46896c73 ("KVM: svm: add support for RDTSCP")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-2-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 22 April 2021, 1 commit
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Authored by Nathan Tempelman
      Add a capability for userspace to mirror SEV encryption context from
      one vm to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
       2) The VMs can have different memory views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write to
      pages, when the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
       to circumvent SEV, it could achieve the same effect by simply attaching
      a vCPU to the primary VM.
      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
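      
      A hypothetical userspace flow, assuming the capability is named
      KVM_CAP_VM_COPY_ENC_CONTEXT_FROM and takes the source VM fd as args[0]:
      
         struct kvm_enable_cap cap = {
                 .cap = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
                 .args[0] = primary_vm_fd,   /* fd of the SEV "primary" VM */
         };
      
         ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap);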
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 20 April 2021, 5 commits
    • KVM: SVM: Define actual size of IOPM and MSRPM tables · 47903dc1
      Authored by Krish Sadhukhan
      Define the actual size of the IOPM and MSRPM tables so that it can be
      used when initializing them and when checking the consistency of their
      physical addresses.
      These #defines are placed in svm.h so that they can be shared.
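      
      The resulting #defines in svm.h look roughly like:
      
         /* arch/x86/kvm/svm/svm.h (sketch) */
         #define IOPM_SIZE  PAGE_SIZE * 3
         #define MSRPM_SIZE PAGE_SIZE * 2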
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Message-Id: <20210412215611.110095-2-krish.sadhukhan@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Enhance and clean up the vmcb tracking comment in pre_svm_run() · 44f1b558
      Authored by Sean Christopherson
      Explicitly document why a vmcb must be marked dirty and assigned a new
      asid when it will be run on a different cpu.  The "what" is relatively
      obvious, whereas the "why" requires reading the APM and/or KVM code.
      
      Opportunistically remove a spurious period and several unnecessary
      newlines in the comment.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Drop vcpu_svm.vmcb_pa · d1788191
      Authored by Sean Christopherson
      Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in
      the one path where it is consumed.  Unlike svm->vmcb, use of the current
      vmcb's address is very limited, as evidenced by the fact that its use
      can be trimmed to a single dereference.
      
      Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, as
      at first glance using vmcb01 instead of vmcb_pa looks wrong.
      
      No functional change intended.
      
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Don't set current_vmcb->cpu when switching vmcb · 17e5e964
      Authored by Sean Christopherson
      Do not update the new vmcb's last-run cpu when switching to a different
      vmcb.  If the vCPU is migrated between its last run and a vmcb switch,
      e.g. for nested VM-Exit, then setting the cpu without marking the vmcb
      dirty will lead to KVM running the vCPU on a different physical cpu with
      stale clean bit settings.
      
                                vcpu->cpu    current_vmcb->cpu    hardware
        pre_svm_run()           cpu0         cpu0                 cpu0,clean
        kvm_arch_vcpu_load()    cpu1         cpu0                 cpu0,clean
        svm_switch_vmcb()       cpu1         cpu1                 cpu0,clean
        pre_svm_run()           cpu1         cpu1                 kaboom
      
      Simply delete the offending code; unlike VMX, which needs to update the
      cpu at switch time due to the need to do VMPTRLD, SVM only cares about
      which cpu last ran the vCPU.
      
      Fixes: af18fa77 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
      Cc: Cathy Avery <cavery@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Make sure GHCB is mapped before updating · a3ba26ec
      Authored by Tom Lendacky
      Access to the GHCB occurs mainly in the VMGEXIT path, where it is known
      that the GHCB will be mapped. But there are two paths where the GHCB
      might not be mapped.
      
      The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
      the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
      However, if a SIPI is performed without a corresponding AP Reset Hold,
      then the GHCB might not be mapped (depending on the previous VMEXIT),
      which will result in a NULL pointer dereference.
      
      The svm_complete_emulated_msr() routine will update the GHCB to inform
      the caller of a RDMSR/WRMSR operation about any errors. While it is likely
      that the GHCB will be mapped in this situation, add a safeguard
      in this path to be certain a NULL pointer dereference is not encountered.
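      
      The safeguard is essentially an early-out when no GHCB is mapped (field
      and setter names as in the SEV-ES code of this era; a sketch):
      
         if (!svm->ghcb)
                 return;   /* GHCB not mapped: nothing to update */
      
         ghcb_set_sw_exit_info_2(svm->ghcb, 1);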
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Fixes: 647daca2 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 17 April 2021, 1 commit
    • KVM: nSVM: improve SYSENTER emulation on AMD · adc2a237
      Authored by Maxim Levitsky
      Currently, to support Intel->AMD migration, if the guest CPU vendor is
      GenuineIntel we emulate the full 64-bit value of the
      MSR_IA32_SYSENTER_{EIP|ESP} MSRs, and we also emulate the SYSENTER/SYSEXIT
      instructions in long mode.
      
      (The emulator still refuses to emulate SYSENTER in 64-bit mode, on the
      grounds that the code for that wasn't tested and likely has no users.)
      
      However, when virtual VMLOAD/VMSAVE is enabled, the VMLOAD instruction
      updates these 32-bit MSRs without triggering their MSR intercept, which
      leads to stale values in KVM's shadow copy of these MSRs, which relies on
      the intercept to stay up to date.
      
      Fix/optimize this by doing the following:
      
      1. Enable the MSR intercepts for the SYSENTER MSRs iff vendor=GenuineIntel.
         (This is both a tiny optimization and also ensures that in case
         the guest CPU vendor is AMD, the MSRs will be 32 bits wide, as
         AMD defined.)
      
      2. Store only the high 32-bit part of these MSRs on interception, and
         combine it with the hardware MSR value on intercepted reads/writes
         iff vendor=GenuineIntel; see the sketch after this list.
      
      3. Disable VMLOAD/VMSAVE virtualization if vendor=GenuineIntel.
         (It is somewhat insane to set vendor=GenuineIntel and still enable
         SVM for the guest, but well, whatever.)
         Then zero the high 32-bit parts when KVM intercepts and emulates VMLOAD.
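      
      A sketch of the combining logic on an intercepted read (field names such
      as sysenter_eip_hi are illustrative; guest_cpuid_is_intel() is an
      existing helper):
      
         case MSR_IA32_SYSENTER_EIP:
                 msr_info->data = (u32)svm->vmcb01.ptr->save.sysenter_eip;
                 if (guest_cpuid_is_intel(vcpu))
                         msr_info->data |= (u64)svm->sysenter_eip_hi << 32;
                 break;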
      
      Thanks a lot to Paolo Bonzini for helping me fix this in the most
      correct way.
      
      This patch fixes migration of 32-bit nested guests, which was broken
      because incorrect cached values of the SYSENTER MSRs were stored in
      the migration stream if L1 changed these MSRs with
      VMLOAD prior to L2 entry.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 15 March 2021, 21 commits
    • KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging · 4a98623d
      Authored by Sean Christopherson
      Set the PAE roots used as decrypted to play nice with SME when KVM is
      using shadow paging.  Explicitly skip setting the C-bit when loading
      CR3 for PAE shadow paging, even though it's completely ignored by the
      CPU.  The extra documentation is nice to have.
      
      Note, there are several subtleties at play with NPT.  In addition to
      legacy shadow paging, the PAE roots are used for SVM's NPT when either
      KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
      NPT.  However, 32-bit Linux, and thus KVM, doesn't support SME.  And
      64-bit KVM can happily set the C-bit in CR3.  This also means that
      keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
      conceptually wrong, but functionally ok since SME is 64-bit only.
      Leave it as is to avoid unnecessary pollution.
      
      Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
      Cc: stable@vger.kernel.org
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210309224207.1218275-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Get active PCID only when writing a CR3 value · e83bc09c
      Authored by Sean Christopherson
      Retrieve the active PCID only when writing a guest CR3 value, i.e. don't
      get the PCID when using EPT or NPT.  The PCID is especially problematic
      for EPT as the bits have different meanings, and so the PCID must be
      manually stripped, which is annoying and unnecessary.  And on VMX,
      getting the active PCID also involves reading the guest's CR3 and
      CR4.PCIDE, i.e. may add pointless VMREADs.
      
      Opportunistically rename the pgd/pgd_level params to root_hpa and
      root_level to better reflect their new roles.  Keep the function names,
      as "load the guest PGD" is still accurate/correct.
      
      Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an
      unsigned long.  The EPTP holds a 64-bit value, even in 32-bit mode, so
      in theory EPT could support HIGHMEM for 32-bit KVM.  Never mind that
      doing so would require changing the MMU page allocators and reworking
      the MMU to use kmap().
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210305183123.3978098-2-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs · 8120337a
      Authored by Sean Christopherson
      Stop tagging MMIO SPTEs with specific available bits and instead detect
      MMIO SPTEs by checking for their unique SPTE value.  The value is
      guaranteed to be unique on shadow paging and NPT as setting reserved
      physical address bits on any other type of SPTE would constitute a KVM
      bug.  Ditto for EPT, as creating a WX non-MMIO SPTE would also be a bug.
      
      Note, this approach is also future-compatible with TDX, which will need
      to reflect MMIO EPT violations as #VEs into the guest.  To create an EPT
      violation instead of a misconfig, TDX EPTs will need to have RWX=0.  But,
      MMIO SPTEs will also be the only case where KVM clears SUPPRESS_VE, so
      MMIO SPTEs will still be guaranteed to have a unique value within a given
      MMU context.
      
      The main motivation is to make it easier to reason about which types of
      SPTEs use which available bits.  As a happy side effect, this frees up
      two more bits for storing the MMIO generation.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210225204749.1512652-11-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Optimize vmcb12 to vmcb02 save area copies · 8173396e
      Authored by Cathy Avery
      Use the vmcb12 control clean field to determine which vmcb12.save
      registers were marked dirty in order to minimize register copies
      when switching from L1 to L2. Those vmcb12 registers marked as dirty need
      to be copied to L0's vmcb02 as they will be used to update the vmcb
      state cache for the L2 VMRUN.  In the case where we have a different
      vmcb12 from the last L2 VMRUN, all vmcb12.save registers must be
      copied over to L2.save.
      
      Tested:
      kvm-unit-tests
      kvm selftests
      Fedora L1 L2
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20210301200844.2000-1-cavery@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add support for Virtual SPEC_CTRL · d00b99c5
      Authored by Babu Moger
      Newer AMD processors have a feature to virtualize the use of the
      SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
      function 0x8000000A_EDX[20]: GuestSpecCtrl. Hypervisors are not
      required to enable this feature since it is automatically enabled on
      processors that support it.
      
      A hypervisor may wish to impose speculation controls on guest
      execution or a guest may want to impose its own speculation controls.
      Therefore, the processor implements both host and guest
      versions of SPEC_CTRL.
      
      When in host mode, the host SPEC_CTRL value is in effect and writes
      update only the host version of SPEC_CTRL. On a VMRUN, the processor
      loads the guest version of SPEC_CTRL from the VMCB. When the guest
      writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
      the guest version is saved into the VMCB and the processor returns
      to only using the host SPEC_CTRL for speculation control. The guest
      SPEC_CTRL is located at offset 0x2E0 in the VMCB.
      
      The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
      with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
      ensure a minimum SPEC_CTRL if desired.
      
      This support also fixes an issue where a guest may sometimes see an
      inconsistent value for the SPEC_CTRL MSR on processors that support
      this feature. With the current SPEC_CTRL support, the first write to
      SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
      MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
      will be 0x0, instead of the actual expected value. There isn’t a
      security concern here, because the host SPEC_CTRL value is or’ed with
      the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
      KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
      MSR just before the VMRUN, so it will always have the actual value
      even though it doesn’t appear that way in the guest. The guest will
      only see the proper value for the SPEC_CTRL register if the guest was
      to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
      support, the save area spec_ctrl is properly saved and restored.
      So, the guest will always see the proper value when it is read back.
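      
      With the feature present, KVM can let the hardware swap SPEC_CTRL on
      VMRUN/VMEXIT instead of doing it in software (a sketch; the feature flag
      name is assumed to be X86_FEATURE_V_SPEC_CTRL):
      
         /* svm_vcpu_run(): */
         if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
                 x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);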
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <161188100955.28787.11816849358413330720.stgit@bmoger-ubuntu>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: always use vmcb01 for vmsave/vmload of guest state · cc3ed80a
      Authored by Maxim Levitsky
      This allows KVM to avoid copying these fields between vmcb01
      and vmcb02 on nested guest entry/exit.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: move VMLOAD/VMSAVE to C code · fb0c4a4f
      Authored by Paolo Bonzini
      Thanks to the new macros that handle exception handling for SVM
      instructions, it is easier to just do the VMLOAD/VMSAVE in C.
      This is safe, as shown by the fact that the host reload is
      already done outside the assembly source.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Skip intercepted PAUSE instructions after emulation · c8781fea
      Authored by Sean Christopherson
      Skip PAUSE after interception to avoid unnecessarily re-executing the
      instruction in the guest, e.g. after regaining control post-yield.
      This is a benign bug as KVM disables PAUSE interception if filtering is
      off, including the case where pause_filter_count is set to zero.
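      
      The fix boils down to skipping the instruction after yielding (a sketch
      of the handler; kvm_vcpu_on_spin() and kvm_skip_emulated_instruction()
      are existing helpers):
      
         static int pause_interception(struct kvm_vcpu *vcpu)
         {
                 ...
                 kvm_vcpu_on_spin(vcpu, in_kernel);
                 return kvm_skip_emulated_instruction(vcpu);   /* was: return 1; */
         }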
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-10-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Don't manually emulate RDPMC if nrips=0 · 32c23c7d
      Authored by Sean Christopherson
      Remove bizarre code that causes KVM to run RDPMC through the emulator
      when nrips is disabled.  Accelerated emulation of RDPMC doesn't rely on
      any additional data from the VMCB, and SVM has generic handling for
      updating RIP to skip instructions when nrips is disabled.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move RDPMC emulation to common code · c483c454
      Authored by Sean Christopherson
      Move the entirety of the accelerated RDPMC emulation to x86.c, and assign
      the common handler directly to the exit handler array for VMX.  SVM has
      bizarre nrips behavior that prevents it from directly invoking the common
      handler.  The nrips goofiness will be addressed in a future patch.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move trivial instruction-based exit handlers to common code · 5ff3a351
      Authored by Sean Christopherson
      Move the trivial exit handlers, e.g. for instructions that KVM
      "emulates" as nops, to common x86 code.  Assign the common handlers
      directly to the exit handler arrays and drop the vendor trampolines.
      
      Opportunistically use pr_warn_once() where appropriate.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move XSETBV emulation to common code · 92f9895c
      Authored by Sean Christopherson
      Move the entirety of XSETBV emulation to x86.c, and assign the
      function directly to both VMX's and SVM's exit handlers, i.e. drop the
      unnecessary trampolines.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Add VMLOAD/VMSAVE helper to deduplicate code · 2ac636a6
      Authored by Sean Christopherson
      Add another helper layer for VMLOAD+VMSAVE; the code is identical except
      for the one line that determines which VMCB is the source and which is
      the destination.
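      
      The deduplicated helper plausibly looks like this (the helper and
      nested_svm_vmloadsave() names are assumptions based on the surrounding
      series; a sketch):
      
         static int vmload_vmsave_interception(struct kvm_vcpu *vcpu, bool vmload)
         {
                 ...
                 if (vmload)
                         nested_svm_vmloadsave(vmcb12, svm->vmcb);   /* VMLOAD */
                 else
                         nested_svm_vmloadsave(svm->vmcb, vmcb12);   /* VMSAVE */
                 ...
         }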
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Add helper to synthesize nested VM-Exit without collateral · 3a87c7e0
      Authored by Sean Christopherson
      Add a helper to consolidate boilerplate for nested VM-Exits that don't
      provide any data in exit_info_*.
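      
      A sketch of such a helper (the name is illustrative):
      
         static int nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
         {
                 svm->vmcb->control.exit_code   = exit_code;
                 svm->vmcb->control.exit_info_1 = 0;
                 svm->vmcb->control.exit_info_2 = 0;
                 return nested_svm_vmexit(svm);
         }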
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210302174515.2812275-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places) · 63129754
      Authored by Paolo Bonzini
      Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to
      allow directly invoking common x86 exit handlers (in a future patch).
      Opportunistically convert an absurd number of instances of 'svm->vcpu'
      to direct uses of 'vcpu' to avoid pointless casting.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210205005750.3841462-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: merge update_cr0_intercept into svm_set_cr0 · 2a32a77c
      Authored by Paolo Bonzini
      The logic of update_cr0_intercept is pointlessly complicated.
      All svm_set_cr0 needs to do is compute the effective cr0 and compare it
      with the guest value.
      
      Inlining the function and simplifying the condition
      clarifies what it is doing.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: rename functions and variables according to vmcbXY nomenclature · 9e8f0fbf
      Authored by Paolo Bonzini
      Now that SVM is using a separate vmcb01 and vmcb02 (and also uses the vmcb12
      naming) we can give clearer names to functions that write to and read
      from those VMCBs.  Likewise, variables and parameters can be renamed
      from nested_vmcb to vmcb12.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb · 193015ad
      Authored by Cathy Avery
      This patch moves the asid_generation from the vcpu to the vmcb
      in order to track the ASID generation that was active the last
      time the vmcb was run. If sd->asid_generation changes between
      two runs, the old ASID is invalid and must be changed.
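      
      The run-time check then compares the per-vmcb generation (a sketch;
      current_vmcb and new_asid() follow the naming used elsewhere in this
      series):
      
         /* pre_svm_run(): */
         if (unlikely(svm->current_vmcb->asid_generation != sd->asid_generation))
                 new_asid(svm, sd);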
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20210112164313.4204-3-cavery@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb · af18fa77
      Authored by Cathy Avery
      This patch moves the physical cpu tracking from the vcpu
      to the vmcb in svm_switch_vmcb. If either vmcb01 or vmcb02
      changes physical cpus from one vmrun to the next, the vmcb's
      previous cpu is preserved for comparison with the current
      cpu and the vmcb is marked dirty if different. This prevents
      the processor from using old cached data for a vmcb that may
      have been updated on a prior run on a different processor.
      
      It also moves the physical cpu check from svm_vcpu_load
      to pre_svm_run as the check only needs to be done at run.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20210112164313.4204-2-cavery@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Use a separate vmcb for the nested L2 guest · 4995a368
      Authored by Cathy Avery
      svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2
      (nested).
      
      The main advantages are removing get_host_vmcb and hsave, in favor of
      concepts that are shared with VMX.
      
      We no longer need to stash the L1 registers in hsave while L2
      runs, but we need to copy the VMLOAD/VMSAVE registers from VMCB01 to
      VMCB02 and back.  This more or less has the same cost, but code-wise
      nested_svm_vmloadsave can be reused.
      
      This patch omits several optimizations that are possible:
      
      - for simplicity there is some wholesale copying of vmcb.control areas
      which can go away.
      
      - we should be able to better use the VMCB01 and VMCB02 clean bits.
      
      - another possibility is to always use VMCB01 for VMLOAD and VMSAVE,
      thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to
      VMCB02 and back.
      
      Tested:
      kvm-unit-tests
      kvm self tests
      Loaded fedora nested guest on fedora
      Signed-off-by: Cathy Avery <cavery@redhat.com>
      Message-Id: <20201011184818.3609-3-cavery@redhat.com>
      [Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the
       PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add
       a few more comments. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Don't strip the C-bit from CR2 on #PF interception · 6d1b867d
      Authored by Sean Christopherson
      Don't strip the C-bit from the faulting address on an intercepted #PF;
      the address is a virtual address, not a physical address.
      
      Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address")
      Cc: stable@vger.kernel.org
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210305011101.3597423-13-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 05 March 2021, 1 commit
  7. 03 March 2021, 1 commit
    • KVM: SVM: Clear the CR4 register on reset · 9e46f6c6
      Authored by Babu Moger
      This problem was reported on an SVM guest while executing kexec.
      Kexec fails to load the new kernel when the PCID feature is enabled.
      
      When kexec starts loading the new kernel, it starts the process by
      resetting the vCPUs and then bringing each vCPU online one by one.
      The vCPU reset is supposed to reset all the register states before the
      vCPUs are brought online. However, the CR4 register is not reset during
      this process. If this register is already setup during the last boot,
      all the flags can remain intact. The X86_CR4_PCIDE bit can only be
      enabled in long mode. So, it must be enabled much later in SMP
      initialization.  Having the X86_CR4_PCIDE bit set during SMP boot can
      cause boot failures.
      
      Fix the issue by resetting the CR4 register in init_vmcb().
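      
      The fix is a one-liner in vmcb setup (a sketch):
      
         /* init_vmcb(): */
         svm_set_cr4(&svm->vcpu, 0);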
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 25 February 2021, 1 commit
    • KVM: SVM: Fix nested VM-Exit on #GP interception handling · 2df8d380
      Authored by Sean Christopherson
      Fix the interpretation of nested_svm_vmexit()'s return value when
      synthesizing a nested VM-Exit after intercepting an SVM instruction while
      L2 was running.  The helper returns '0' on success, whereas a return
      value of '0' in the exit handler path means "exit to userspace".  The
      incorrect return value causes KVM to exit to userspace without filling
      the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0".
      
      Fixes: 14c2bf81 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210224005627.657028-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 18 February 2021, 2 commits
  10. 11 February 2021, 1 commit
  11. 09 February 2021, 1 commit
    • KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers · 996ff542
      Authored by Paolo Bonzini
      Push the injection of #GP up to the callers, so that they can just use
      kvm_complete_insn_gp. __kvm_set_dr is pretty much what the callers can use
      together with kvm_complete_insn_gp, so rename it to kvm_set_dr and drop
      the old kvm_set_dr wrapper.
      
      This also allows nested VMX code, which really wanted to use __kvm_set_dr,
      to use the right function.
      
      While at it, remove the kvm_require_dr() check from the SVM interception.
      The APM states:
      
        All normal exception checks take precedence over the SVM intercepts.
      
      which includes the CR4.DE=1 #UD.
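      
      After the change, an intercept handler can simply be (a sketch of SVM's
      DR-write path; kvm_complete_insn_gp() injects #GP on a non-zero result
      and otherwise skips the instruction):
      
         err = kvm_set_dr(vcpu, dr, kvm_register_read(vcpu, reg));
         return kvm_complete_insn_gp(vcpu, err);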
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>