1. 31 Mar 2021 (1 commit)
  2. 04 Feb 2021 (5 commits)
  3. 02 Feb 2021 (1 commit)
    • KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check · 4683d758
      Vitaly Kuznetsov committed
      Commit 7a873e45 ("KVM: selftests: Verify supported CR4 bits can be set
      before KVM_SET_CPUID2") reveals that KVM allows setting X86_CR4_PCIDE even
      when PCID support is missing:
      
      ==== Test Assertion Failure ====
        x86_64/set_sregs_test.c:41: rc
        pid=6956 tid=6956 - Invalid argument
           1	0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41
           2	0x00000000004014fc: main at set_sregs_test.c:119
           3	0x00007f2d9346d041: ?? ??:0
           4	0x000000000040164d: _start at ??:?
        KVM allowed unsupported CR4 bit (0x20000)
      
      Add X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make
      kvm_is_valid_cr4() fail.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210201142843.108190-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
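The check being added can be sketched outside the kernel as follows. The helper names and the single-bit mask are illustrative stand-ins for the real __cr4_reserved_bits(), which derives the full reserved mask from guest CPUID:

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PCIDE (1ULL << 17) /* the 0x20000 bit from the assertion above */

/* Sketch: a CR4 bit whose backing CPUID feature is absent becomes
 * reserved, so any attempt to set it must be rejected. */
static uint64_t cr4_reserved_bits(bool guest_has_pcid)
{
    uint64_t reserved = 0;

    if (!guest_has_pcid)
        reserved |= X86_CR4_PCIDE;
    /* ... other feature-gated bits elided ... */
    return reserved;
}

static bool cr4_is_valid(uint64_t cr4, bool guest_has_pcid)
{
    return (cr4 & cr4_reserved_bits(guest_has_pcid)) == 0;
}
```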
  4. 15 Dec 2020 (4 commits)
    • KVM: SVM: Provide support for SEV-ES vCPU loading · 86137773
      Tom Lendacky committed
      An SEV-ES vCPU has additional VMCB vCPU load/put requirements. SEV-ES
      hardware will restore certain registers on VMEXIT, but not save them on
      VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the
      following changes:
      
      General vCPU load changes:
        - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
          save the current values of XCR0, XSS and PKRU to the per-CPU SVM save
          area as these registers will be restored on VMEXIT.
      
      General vCPU put changes:
        - Do not attempt to restore registers that SEV-ES hardware has already
          restored on VMEXIT.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
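A minimal sketch of the load/put asymmetry described above, with hypothetical structure and accessor names (the real code VMSAVEs to the per-CPU SVM save area and reads XCR0, XSS and PKRU from hardware):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU save area mirroring the registers that SEV-ES
 * hardware restores on VMEXIT but does not save on VMRUN. */
struct sev_es_host_save {
    uint64_t xcr0;
    uint64_t xss;
    uint32_t pkru;
};

/* Stand-ins for the real XCR0/XSS/PKRU hardware accessors. */
static uint64_t current_xcr0, current_xss;
static uint32_t current_pkru;

static void sev_es_vcpu_load(struct sev_es_host_save *hostsa)
{
    /* A VMSAVE to the per-CPU area would happen here; additionally
     * snapshot the registers hardware will overwrite on VMEXIT. */
    hostsa->xcr0 = current_xcr0;
    hostsa->xss  = current_xss;
    hostsa->pkru = current_pkru;
}

static void sev_es_vcpu_put(bool sev_es_guest)
{
    if (sev_es_guest)
        return; /* hardware already restored these registers on VMEXIT */
    /* the non-SEV-ES path would restore host register state here */
}
```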
    • KVM: SVM: Support string IO operations for an SEV-ES guest · 7ed9abfe
      Tom Lendacky committed
      For an SEV-ES guest, string-based port IO is performed to a shared
      (un-encrypted) page so that both the hypervisor and guest can read or
      write to it and each see the contents.
      
      For string-based port IO operations, invoke SEV-ES specific routines that
      can complete the operation using common KVM port IO support.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Support MMIO for an SEV-ES guest · 8f423a80
      Tom Lendacky committed
      For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page
      so that both the hypervisor and guest can read or write to it and each
      see the contents.
      
      The GHCB specification provides software-defined VMGEXIT exit codes to
      indicate a request for an MMIO read or an MMIO write. Add support to
      recognize the MMIO requests and invoke SEV-ES specific routines that
      can complete the MMIO operation. These routines use common KVM support
      to complete the MMIO operation.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <af8de55127d5bcc3253d9b6084a0144c12307d4d.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
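The dispatch described above can be sketched as follows. The exit-code values follow the GHCB specification's software-defined VMGEXIT codes; the handler shape itself is illustrative:

```c
#include <stdint.h>

/* Software-defined VMGEXIT exit codes per the GHCB specification. */
#define SVM_VMGEXIT_MMIO_READ   0x80000001ULL
#define SVM_VMGEXIT_MMIO_WRITE  0x80000002ULL

/* Hypothetical dispatcher: recognize the MMIO requests and route them to
 * SEV-ES specific routines that complete the operation using common KVM
 * MMIO emulation. Returns 1 if handled, -1 for an unrecognized code. */
static int handle_vmgexit(uint64_t exit_code)
{
    switch (exit_code) {
    case SVM_VMGEXIT_MMIO_READ:
        /* an sev_es_mmio_read()-style routine would run here */
        return 1;
    case SVM_VMGEXIT_MMIO_WRITE:
        /* an sev_es_mmio_write()-style routine would run here */
        return 1;
    default:
        return -1;
    }
}
```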
    • KVM/VMX/SVM: Move kvm_machine_check function to x86.h · 3f1a18b9
      Uros Bizjak committed
      Move kvm_machine_check to x86.h to avoid two exact copies
      of the same function in kvm.c and svm.c.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Message-Id: <20201029135600.122392-1-ubizjak@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 15 Nov 2020 (1 commit)
  6. 08 Nov 2020 (1 commit)
    • KVM: x86: use positive error values for msr emulation that causes #GP · cc4cb017
      Maxim Levitsky committed
      The recent introduction of userspace MSR filtering added code that uses
      negative error codes for cases that result either in #GP delivery to
      the guest or in handling by the userspace MSR filter.

      This breaks the assumption that a negative error code returned from the
      MSR emulation code is a semi-fatal error which should be returned to
      userspace via the KVM_RUN ioctl and usually kills the guest.
      
      Fix this by reusing the already existing KVM_MSR_RET_INVALID error code,
      and by adding a new KVM_MSR_RET_FILTERED error code for
      userspace-filtered MSRs.
      
      Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace")
      Reported-by: Qian Cai <cai@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
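The resulting sign convention can be sketched like this; the numeric values of the two positive codes are assumptions for illustration (the kernel defines them in its internal headers):

```c
#include <stdbool.h>

/* Positive, non-fatal error codes (values assumed for this sketch). */
#define KVM_MSR_RET_INVALID   2  /* unknown/unsupported MSR -> inject #GP */
#define KVM_MSR_RET_FILTERED  1  /* blocked by userspace MSR filter -> #GP */

/* Only negative returns are semi-fatal and propagate to userspace. */
static bool msr_error_is_fatal(int ret)
{
    return ret < 0; /* e.g. -ENOMEM: exit to userspace via KVM_RUN */
}

/* Positive returns mean the guest gets a #GP and execution continues. */
static bool msr_error_injects_gp(int ret)
{
    return ret > 0; /* KVM_MSR_RET_INVALID, KVM_MSR_RET_FILTERED, ... */
}
```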
  7. 28 Sep 2020 (4 commits)
  8. 11 Jul 2020 (1 commit)
  9. 09 Jul 2020 (6 commits)
  10. 16 May 2020 (2 commits)
  11. 21 Apr 2020 (1 commit)
    • KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID · eeeb4f67
      Sean Christopherson committed
      Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's
      EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g.
      to flush L2's context during nested VM-Enter.
      
      Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where
      the flush is directly associated with vCPU-scoped instruction emulation,
      i.e. MOV CR3 and INVPCID.
      
      Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to
      make it clear that it deliberately requests a flush of all contexts.
      
      Service any pending flush request on nested VM-Exit as it's possible a
      nested VM-Exit could occur after requesting a flush for L2.  Add the
      same logic for nested VM-Enter even though it's _extremely_ unlikely
      for flush to be pending on nested VM-Enter, but theoretically possible
      (in the future) due to RSM (SMM) emulation.
      
      [*] Intel also has an Address Space Identifier (ASID) concept, e.g.
          EPTP+VPID+PCID == ASID, it's just not documented in the SDM because
          the rules of invalidation are different based on which piece of the
          ASID is being changed, i.e. whether the EPTP, VPID, or PCID context
          must be invalidated.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
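A toy model of the two request kinds, with made-up request bits and counters standing in for the real KVM_REQ_* machinery, showing why a pending flush must be serviced across nested transitions rather than dropped:

```c
#include <stdint.h>

/* Hypothetical request bits for this sketch. */
#define REQ_TLB_FLUSH          (1u << 0) /* flush all contexts */
#define REQ_TLB_FLUSH_CURRENT  (1u << 1) /* flush only the current EPTP/VPID context */

struct vcpu {
    uint32_t requests;
    unsigned flushes_all;
    unsigned flushes_current;
};

static void make_request(struct vcpu *v, uint32_t req)
{
    v->requests |= req;
}

/* Service any pending flush, e.g. on nested VM-Enter/VM-Exit, so a flush
 * requested for L2's context is not lost across the transition. */
static void service_tlb_flush(struct vcpu *v)
{
    if (v->requests & REQ_TLB_FLUSH) {
        v->flushes_all++;
        /* a full flush subsumes the current-context flush */
        v->requests &= ~(REQ_TLB_FLUSH | REQ_TLB_FLUSH_CURRENT);
    } else if (v->requests & REQ_TLB_FLUSH_CURRENT) {
        v->flushes_current++;
        v->requests &= ~REQ_TLB_FLUSH_CURRENT;
    }
}
```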
  12. 31 Mar 2020 (1 commit)
  13. 17 Mar 2020 (5 commits)
  14. 05 Feb 2020 (1 commit)
    • KVM: x86: Take a u64 when checking for a valid dr7 value · 9b5e8532
      Sean Christopherson committed
      Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build
      warning on i386 due to right-shifting a 32-bit value by 32 when checking
      for bits being set in dr7[63:32].
      
      Alternatively, the warning could be resolved by rewriting the check to
      use an i386-friendly method, but taking a u64 fixes another oddity on
      32-bit KVM.  Because KVM implements natural width VMCS fields as u64s to
      avoid layout issues between 32-bit and 64-bit, a devious guest can stuff
      vmcs12->guest_dr7 with a 64-bit value even when both the guest and host
      are 32-bit kernels.  KVM eventually drops vmcs12->guest_dr7[63:32] when
      propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely
      on that behavior for correctness.
      
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests")
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
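With a u64 parameter the check reduces to a one-liner whose shift is well defined even when "unsigned long" is 32 bits; a standalone rendering:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 63:32 of DR7 are reserved and must be zero.  Taking a u64 makes
 * "data >> 32" well defined on i386, where shifting an unsigned long by
 * its full width would trigger the build warning described above. */
static bool kvm_dr7_valid(uint64_t data)
{
    return !(data >> 32);
}
```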
  15. 28 Jan 2020 (2 commits)
    • KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests · b91991bf
      Krish Sadhukhan committed
      According to section "Checks on Guest Control Registers, Debug Registers,
      and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
      of nested guests:
      
          If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
          field must be 0.
      
      In KVM, GUEST_DR7 is set prior to the vmcs02 VM-entry by kvm_set_dr() and the
      latter synthesizes a #GP if any bit in the high dword in the former is set.
      Hence this field needs to be checked in software.
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Perform non-canonical checks in 32-bit KVM · de761ea7
      Sean Christopherson committed
      Remove the CONFIG_X86_64 condition from the low level non-canonical
      helpers to effectively enable non-canonical checks on 32-bit KVM.
      Non-canonical checks are performed by hardware if the CPU *supports*
      64-bit mode, whether or not the CPU is actually in 64-bit mode is
      irrelevant.
      
      For the most part, skipping non-canonical checks on 32-bit KVM is ok-ish
      because 32-bit KVM always (hopefully) drops bits 63:32 of whatever value
      it's checking before propagating it to hardware, and architecturally,
      the expected behavior for the guest is a bit of a grey area since the
      vCPU itself doesn't support 64-bit mode.  I.e. a 32-bit KVM guest can
      observe the missed checks in several paths, e.g. INVVPID and VM-Enter,
      but it's debatable whether or not the missed checks constitute a bug
      because technically the vCPU doesn't support 64-bit mode.
      
      The primary motivation for enabling the non-canonical checks is defense
      in depth.  As mentioned above, a guest can trigger a missed check via
      INVVPID or VM-Enter.  INVVPID is straightforward as it takes a 64-bit
      virtual address as part of its 128-bit INVVPID descriptor and fails if
      the address is non-canonical, even if INVVPID is executed in 32-bit PM.
      Nested VM-Enter is a bit more convoluted as it requires the guest to
      write natural width VMCS fields via memory accesses and then VMPTRLD the
      VMCS, but it's still possible.  In both cases, KVM is saved from a true
      bug only because its flows that propagate values to hardware (correctly)
      take "unsigned long" parameters and so drop bits 63:32 of the bad value.
      
      Explicitly performing the non-canonical checks makes it less likely that
      a bad value will be propagated to hardware, e.g. in the INVVPID case,
      if __invvpid() didn't implicitly drop bits 63:32 then KVM would BUG() on
      the resulting unexpected INVVPID failure due to hardware rejecting the
      non-canonical address.
      
      The only downside to enabling the non-canonical checks is that it adds a
      relatively small amount of overhead, but the affected flows are not hot
      paths, i.e. the overhead is negligible.
      
      Note, KVM technically could gate the non-canonical checks on 32-bit KVM
      with static_cpu_has(X86_FEATURE_LM), but on bare metal that's an even
      bigger waste of code for everyone except the 0.00000000000001% of the
      population running on Yonah, and nested 32-bit on 64-bit already fudges
      things with respect to 64-bit CPU behavior.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      [Also do so in nested_vmx_check_host_state as reported by Krish. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
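The underlying canonical check is a sign-extension round trip: a canonical address is unchanged when bit (vaddr_bits - 1) is propagated upward. A standalone sketch (the kernel derives the virtual address width from CPUID; 48 is the common value):

```c
#include <stdbool.h>
#include <stdint.h>

/* An address is canonical iff sign-extending from bit (vaddr_bits - 1)
 * reproduces it.  With the CONFIG_X86_64 guard removed, this check runs
 * on 32-bit KVM too, matching hardware, which performs it whenever the
 * CPU *supports* 64-bit mode regardless of the current mode. */
static bool is_noncanonical_address(uint64_t la, int vaddr_bits)
{
    int shift = 64 - vaddr_bits;

    return (uint64_t)(((int64_t)(la << shift)) >> shift) != la;
}
```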
  16. 24 Jan 2020 (1 commit)
    • KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL · 6441fa61
      Paolo Bonzini committed
      If the guest is configured to have SPEC_CTRL but the host does not
      (a nonsensical configuration, but one that is not explicitly
      forbidden) then a host-initiated MSR write can write vmx->spec_ctrl
      (respectively svm->spec_ctrl) and trigger a #GP when KVM tries to
      restore the host value of the MSR.  Add a more comprehensive check
      for valid bits of SPEC_CTRL, covering host CPUID flags and, since
      it is more correct that way, guest CPUID flags too.
      
      For AMD, remove the unnecessary is_guest_mode check around setting
      the MSR interception bitmap, so that the code looks the same as
      for Intel.
      
      Cc: Jim Mattson <jmattson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
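A sketch of the idea with the CPUID plumbing reduced to booleans. The bit positions are SPEC_CTRL's architectural IBRS/STIBP/SSBD bits, but the helper shapes are illustrative, not the kernel's actual functions:

```c
#include <stdbool.h>
#include <stdint.h>

#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)
#define SPEC_CTRL_SSBD  (1ULL << 2)

/* Build the mask of settable SPEC_CTRL bits from feature flags; a real
 * implementation would consult both host and guest CPUID. */
static uint64_t spec_ctrl_valid_bits(bool has_ibrs, bool has_stibp, bool has_ssbd)
{
    uint64_t bits = 0;

    if (has_ibrs)
        bits |= SPEC_CTRL_IBRS;
    if (has_stibp)
        bits |= SPEC_CTRL_STIBP;
    if (has_ssbd)
        bits |= SPEC_CTRL_SSBD;
    return bits;
}

/* Reject a write that sets any bit outside the valid mask, so KVM never
 * caches a value it cannot later restore without faulting. */
static bool spec_ctrl_write_ok(uint64_t data, uint64_t valid_bits)
{
    return (data & ~valid_bits) == 0;
}
```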
  17. 21 Jan 2020 (2 commits)
    • KVM: x86: Move bit() helper to cpuid.h · a0a2260c
      Sean Christopherson committed
      Move bit() to cpuid.h in preparation for incorporating the reverse_cpuid
      array in bit() build-time assertions.  Opportunistically use the BIT()
      macro instead of open-coding the shift.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath · 1e9e2622
      Wanpeng Li committed
      In our production observations, ICR and TSCDEADLINE MSR writes cause the
      majority of MSR-write vmexits; multicast IPIs are not as common as
      unicast IPIs such as RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR.

      This patch introduces a mechanism to handle certain performance-critical
      WRMSRs at a very early stage of the KVM VMExit handler.

      The mechanism is specifically used to accelerate writes to the x2APIC ICR
      that attempt to send a virtual IPI with physical destination mode, fixed
      delivery mode and a single target, which was found to be one of the main
      causes of VMExits for Linux workloads.

      The mechanism significantly reduces the latency of such virtual IPIs by
      sending the physical IPI to the target vCPU at a very early stage of the
      KVM VMExit handler, before host interrupts are enabled and before
      expensive operations such as reacquiring KVM's SRCU lock.
      Latency is reduced even further when KVM is able to use the APICv
      posted-interrupt mechanism (which delivers the virtual IPI directly to
      the target vCPU without needing to kick it to the host).

      Testing on a Xeon Skylake server:

      The virtual IPI latency from the sender's send to the receiver's receive
      is reduced by more than 200 CPU cycles.
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
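The fastpath predicate can be sketched as below. The MSR index and ICR field masks follow the x2APIC architecture (ICR at MSR 0x830; destination mode bit 11, delivery mode bits 10:8, shorthand bits 19:18), while the function itself is an illustrative stand-in for the real fastpath logic:

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_MSR       0x800u
#define APIC_ICR            0x300u
#define X2APIC_ICR_MSR      (APIC_BASE_MSR + (APIC_ICR >> 4)) /* 0x830 */

#define ICR_DEST_LOGICAL    (1u << 11)  /* clear = physical destination mode */
#define ICR_DELIVERY_MASK   (7u << 8)   /* zero = fixed delivery mode */
#define ICR_SHORTHAND_MASK  (3u << 18)  /* zero = no shorthand (single target) */

/* Handle the WRMSR before host interrupts are enabled only for a
 * fixed-delivery, physical-destination, single-target x2APIC ICR write;
 * everything else falls back to the regular (slow) MSR-write path. */
static bool is_ipi_fastpath_wrmsr(uint32_t msr, uint64_t data)
{
    return msr == X2APIC_ICR_MSR &&
           (data & ICR_DEST_LOGICAL) == 0 &&
           (data & ICR_DELIVERY_MASK) == 0 &&
           (data & ICR_SHORTHAND_MASK) == 0;
}
```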
  18. 09 Jan 2020 (1 commit)
    • KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM · 736c291c
      Sean Christopherson committed
      Convert a plethora of parameters and variables in the MMU and page fault
      flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM.
      
      Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical
      addresses.  When TDP is enabled, the fault address is a guest physical
      address and thus can be a 64-bit value, even when both KVM and its guest
      are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a
      64-bit field, not a natural width field.
      
      Using a gva_t for the fault address means KVM will incorrectly drop the
      upper 32-bits of the GPA.  Ditto for gva_to_gpa() when it is used to
      translate L2 GPAs to L1 GPAs.
      
      Opportunistically rename variables and parameters to better reflect the
      dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain
      "addr" instead of "vaddr" when the address may be either a GVA or an L2
      GPA.  Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid
      a confusing "gpa_t gva" declaration; this also sets the stage for a
      future patch to combine nonpaging_page_fault() and tdp_page_fault() with
      minimal churn.
      
      Sprinkle in a few comments to document flows where an address is known
      to be a GVA and thus can be safely truncated to a 32-bit value.  Add
      WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help
      document such cases and detect bugs.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
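The truncation bug is easy to demonstrate with the two typedefs side by side. This is an illustrative model of a 32-bit build, not the kernel's actual code:

```c
#include <stdint.h>

/* On a 32-bit build, gva_t is "unsigned long" (32 bits), while a guest
 * physical address can exceed 32 bits thanks to PAE/PSE, so a TDP fault
 * address stored in a gva_t silently drops bits 63:32. */
typedef uint32_t gva_t; /* modeling the 32-bit build */
typedef uint64_t gpa_t; /* always 64-bit */

/* The bug: routing a fault address through a gva_t truncates it. */
static gpa_t fault_addr_via_gva(gpa_t cr2_or_gpa)
{
    gva_t truncated = (gva_t)cr2_or_gpa; /* upper 32 bits lost */
    return truncated;
}

/* The fix: carry the full 64-bit value as a gpa_t end to end. */
static gpa_t fault_addr_via_gpa(gpa_t cr2_or_gpa)
{
    return cr2_or_gpa;
}
```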