1. 26 Apr, 2021 16 commits
    • S
      KVM: SVM: Use default rAX size for INVLPGA emulation · bc9eff67
      Sean Christopherson committed
      Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
      outside of 64-bit mode to make KVM's emulation slightly less wrong.  The
      address for INVLPGA is determined by the effective address size, i.e.
      it's not hardcoded to 64/32 bits for a given mode.  Add a FIXME to call
      out that the emulation is wrong.
      
      Opportunistically tweak the ASID handling to make it clear that it's
      defined by ECX, not rCX.
      
      Per the APM:
         The portion of rAX used to form the address is determined by the
         effective address size (current execution mode and optional address
         size prefix). The ASID is taken from ECX.
      
      Fixes: ff092385 ("KVM: SVM: Implement INVLPGA")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bc9eff67
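A minimal sketch of the operand handling described in the INVLPGA commit above. It is not KVM's code: the struct, the function names, and the `is_64bit_mode` flag are hypothetical and only illustrate "drop RAX[63:32] outside 64-bit mode, take the ASID from ECX".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two INVLPGA operands. */
struct invlpga_operands {
    uint64_t gva;  /* address formed from rAX */
    uint32_t asid; /* ASID, always taken from ECX */
};

/*
 * Simplified operand fetch: drop RAX[63:32] outside 64-bit mode.
 * Like the commit, this ignores the address-size prefix, so it is
 * still not a faithful model of the effective address size.
 */
static struct invlpga_operands
invlpga_read_operands(uint64_t rax, uint64_t rcx, bool is_64bit_mode)
{
    struct invlpga_operands ops;

    ops.gva = is_64bit_mode ? rax : (uint32_t)rax;
    ops.asid = (uint32_t)rcx; /* ECX, not rCX */
    return ops;
}

/* Usage example: stale upper bits are ignored in 32-bit mode. */
static void invlpga_example(void)
{
    struct invlpga_operands ops =
        invlpga_read_operands(0xdeadbeef00001000ULL,
                              0xffffffff00000005ULL, false);

    assert(ops.gva == 0x1000);
    assert(ops.asid == 5);
}
```

Keying the truncation on the mode alone is the same approximation the commit message flags with its FIXME.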
    • S
      KVM: x86/xen: Drop RAX[63:32] when processing hypercall · 6b48fd4c
      Sean Christopherson committed
      Truncate RAX to 32 bits, i.e. consume EAX, when retrieving the hypercall
      index for a Xen hypercall.  Per Xen documentation[*], the index is EAX
      when the vCPU is not in 64-bit mode.
      
      [*] http://xenbits.xenproject.org/docs/sphinx-unstable/guest-guide/x86/hypercall-abi.html
      
      Fixes: 23200b7a ("KVM: x86/xen: intercept xen hypercalls if enabled")
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6b48fd4c
    • S
      KVM: nVMX: Truncate base/index GPR value on address calc in !64-bit · 82277eee
      Sean Christopherson committed
      Drop bits 63:32 of the base and/or index GPRs when calculating the
      effective address of a VMX instruction memory operand.  Outside of 64-bit
      mode, memory encodings are strictly limited to E*X and below.
      
      Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      82277eee
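The address calculation in the nVMX commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the KVM implementation: `gpr_read()` and `vmx_operand_offset()` are hypothetical names, and the sketch leaves out the segment base and the final address-size wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read a GPR as the guest sees it: only E*X is visible outside 64-bit mode. */
static uint64_t gpr_read(uint64_t raw, bool is_64bit_mode)
{
    return is_64bit_mode ? raw : (uint32_t)raw;
}

/*
 * offset = displacement + base + (index << scale_shift), with base and
 * index truncated to 32 bits when the vCPU is not in 64-bit mode.
 */
static uint64_t vmx_operand_offset(uint64_t base, uint64_t index,
                                   unsigned int scale_shift, int64_t disp,
                                   bool is_64bit_mode)
{
    assert(scale_shift <= 3); /* x86 scale factors are 1, 2, 4, 8 */

    uint64_t off = (uint64_t)disp;

    off += gpr_read(base, is_64bit_mode);
    off += gpr_read(index, is_64bit_mode) << scale_shift;
    return off;
}
```

With stale upper bits in both registers, e.g. base `0xffffffff00001000` and index `0xffffffff00000010` with a scale of 4 and displacement 8, the 32-bit result is `0x1048`; without the truncation the upper bits would leak into the offset.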
    • S
      KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bit · ee050a57
      Sean Christopherson committed
      Drop bits 63:32 of the VMCS field encoding when checking for a nested
      VM-Exit on VMREAD/VMWRITE in !64-bit mode.  VMREAD and VMWRITE always
      use 32-bit operands outside of 64-bit mode.
      
      The actual emulation of VMREAD/VMWRITE does the right thing; this bug is
      purely limited to incorrectly causing a nested VM-Exit if a GPR happens
      to have bits 63:32 set outside of 64-bit mode.
      
      Fixes: a7cde481 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ee050a57
    • S
      KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit mode · d8971344
      Sean Christopherson committed
      Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
      64-bit mode.  Per the SDM:
      
        The operand size for these instructions is always 32 bits in non-64-bit
        modes, regardless of the operand-size attribute.
      
      CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
      mode, but fix it up for consistency and to allow for future cleanup.
      
      Fixes: 6aa8b732 ("[PATCH] kvm: userspace interface")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d8971344
    • S
      KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode · 0884335a
      Sean Christopherson committed
      Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
      in 64-bit mode.  The APM states bits 63:32 are dropped for both DRs and
      CRs:
      
        In 64-bit mode, the operand size is fixed at 64 bits without the need
        for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
        bits and the upper 32 bits of the destination are forced to 0.
      
      Fixes: 7ff76d58 ("KVM: SVM: enhance MOV CR intercept handler")
      Fixes: cae3797a ("KVM: SVM: enhance mov DR intercept handler")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0884335a
    • S
      KVM: x86: Check CR3 GPA for validity regardless of vCPU mode · 886bbcc7
      Sean Christopherson committed
      Check CR3 for an invalid GPA even if the vCPU isn't in long mode.  For
      bigger emulation flows, notably RSM, the vCPU mode may not be accurate
      if CR0/CR4 are loaded after CR3.  For MOV CR3 and similar flows, the
      caller is responsible for truncating the value.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      886bbcc7
    • S
      KVM: x86: Remove emulator's broken checks on CR0/CR3/CR4 loads · d0fe7b64
      Sean Christopherson committed
      Remove the emulator's checks for illegal CR0, CR3, and CR4 values, as
      the checks are redundant, outdated, and in the case of SEV's C-bit,
      broken.  The emulator manually calculates MAXPHYADDR from CPUID and
      neglects to mask off the C-bit.  For all other checks, kvm_set_cr*() are
      a superset of the emulator checks, e.g. see CR4.LA57.
      
      Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
      Cc: Babu Moger <babu.moger@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422022128.3464144-2-seanjc@google.com>
      Cc: stable@vger.kernel.org
      [Unify check_cr_read and check_cr_write. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d0fe7b64
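To illustrate why a MAXPHYADDR-only check misfires for SEV's C-bit, here is a rough sketch. It is an assumption-laden model, not the emulator's or KVM's code: the helper names are hypothetical, and the MAXPHYADDR of 43 and C-bit position of 47 used in the example are illustrative values only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits lo..hi (inclusive) set. */
static uint64_t rsvd_bits(unsigned int lo, unsigned int hi)
{
    if (hi < lo)
        return 0;
    return ((2ULL << (hi - lo)) - 1) << lo;
}

/*
 * Hypothetical legality check for a CR3 value.  c_bit is the position of
 * the encryption bit, or -1 if the guest has none.  Passing -1 models a
 * check that derives the reserved bits from MAXPHYADDR alone.
 */
static bool cr3_is_legal(uint64_t cr3, unsigned int maxphyaddr, int c_bit)
{
    assert(maxphyaddr >= 1 && maxphyaddr <= 63);
    assert(c_bit < 64);

    uint64_t reserved = rsvd_bits(maxphyaddr, 63);

    if (c_bit >= 0)
        reserved &= ~(1ULL << c_bit); /* the C-bit is not reserved */
    return (cr3 & reserved) == 0;
}

/* Usage example with illustrative values (MAXPHYADDR 43, C-bit 47). */
static void cr3_example(void)
{
    uint64_t cr3 = (1ULL << 47) | 0x1000;

    assert(!cr3_is_legal(cr3, 43, -1)); /* C-bit mistaken for a reserved bit */
    assert(cr3_is_legal(cr3, 43, 47));  /* C-bit masked off first */
}
```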
    • S
      KVM: VMX: Intercept FS/GS_BASE MSR accesses for 32-bit KVM · dbdd096a
      Sean Christopherson committed
      Disable pass-through of the FS and GS base MSRs for 32-bit KVM.  Intel's
      SDM unequivocally states that the MSRs exist if and only if the CPU
      supports x86-64.  FS_BASE and GS_BASE are mostly a non-issue; a clever
      guest could opportunistically use the MSRs without issue.  KERNEL_GS_BASE
      is a bigger problem, as a clever guest would subtly be broken if it were
      migrated, as KVM disallows software access to the MSRs, and unlike the
      direct variants, KERNEL_GS_BASE needs to be explicitly migrated as it's
      not captured in the VMCS.
      
      Fixes: 25c5f225 ("KVM: VMX: Enable MSR Bitmap feature")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422023831.3473491-1-seanjc@google.com>
      [*NOT* for stable kernels. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dbdd096a
    • S
      KVM: SVM: Delay restoration of host MSR_TSC_AUX until return to userspace · 844d69c2
      Sean Christopherson committed
      Use KVM's "user return MSRs" framework to defer restoring the host's
      MSR_TSC_AUX until the CPU returns to userspace.  Add/improve comments to
      clarify why MSR_TSC_AUX is intercepted on both RDMSR and WRMSR, and why
      it's safe for KVM to keep the guest's value loaded even if KVM is
      scheduled out.
      
      Cc: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      844d69c2
    • S
      KVM: SVM: Clear MSR_TSC_AUX[63:32] on write · dbd61273
      Sean Christopherson committed
      Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
      CPUs, which completely ignore the upper 32 bits, including dropping them
      on write.  Emulating AMD hardware will also allow migrating a vCPU from
      AMD hardware to Intel hardware without requiring userspace to manually
      clear the upper bits, which are reserved on Intel hardware.
      
      Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
      sadly the APM doesn't say _anything_ about those bits in the context of
      MSR access.  The RDTSCP entry simply states that RCX contains bits 31:0
      of the MSR, zero extended.  And even worse is that the RDPID description
      implies that it can consume all 64 bits of the MSR:
      
        RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
        into the specified destination register. Normal operand size prefixes
        do not apply and the update is either 32 bit or 64 bit based on the
        current mode.
      
      Emulate current hardware behavior to give KVM the best odds of playing
      nice with whatever the behavior of future AMD CPUs happens to be.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-3-seanjc@google.com>
      [Fix broken patch. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dbd61273
    • S
      KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported · 6f2b296a
      Sean Christopherson committed
      Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
      the guest's CPUID model.
      
      Fixes: 46896c73 ("KVM: svm: add support for RDTSCP")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423223404.3860547-2-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6f2b296a
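The two MSR_TSC_AUX commits above (clear bits 63:32 on write, inject #GP when RDTSCP is not in the guest's CPUID) could be combined in a write handler along these lines. This is a sketch only: the struct, the function name, and the `host_initiated` exemption for userspace writes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state for this sketch. */
struct tsc_aux_vcpu {
    bool guest_has_rdtscp; /* RDTSCP advertised in the guest's CPUID */
    uint64_t tsc_aux;      /* value the guest will read back */
};

/* Returns 0 on success, 1 if the caller should inject #GP. */
static int set_tsc_aux(struct tsc_aux_vcpu *vcpu, uint64_t data,
                       bool host_initiated)
{
    /* Guest accesses fault if RDTSCP is not exposed to the guest. */
    if (!host_initiated && !vcpu->guest_has_rdtscp)
        return 1;

    /* Drop bits 63:32, mirroring what current AMD CPUs do on write. */
    vcpu->tsc_aux = (uint32_t)data;
    return 0;
}

/* Usage example: the upper half of the written value never sticks. */
static void tsc_aux_example(void)
{
    struct tsc_aux_vcpu vcpu = { .guest_has_rdtscp = true, .tsc_aux = 0 };

    assert(set_tsc_aux(&vcpu, 0x1234567800000042ULL, false) == 0);
    assert(vcpu.tsc_aux == 0x42);
}
```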
    • S
      KVM: VMX: Invert the inlining of MSR interception helpers · e23f6d49
      Sean Christopherson committed
      Invert the inline declarations of the MSR interception helpers between
      the wrapper, vmx_set_intercept_for_msr(), and the core implementations,
      vmx_{dis,en}able_intercept_for_msr().  Letting the compiler _not_
      inline the implementation reduces KVM's code footprint by ~3k bytes.
      
      Back when the helpers were added in commit 904e14fb ("KVM: VMX: make
      MSR bitmaps per-VCPU"), both the wrapper and the implementations were
      __always_inline because the end code distilled down to a few conditionals
      and a bit operation.  Today, the implementations involve a variety of
      checks and bit ops in order to support userspace MSR filtering.
      
      Furthermore, the vast majority of calls to manipulate MSR interception
      are not performance sensitive, e.g. vCPU creation and x2APIC toggling.
      On the other hand, the one path that is performance sensitive, dynamic
      LBR passthrough, uses the wrappers, i.e. is largely untouched by
      inverting the inlining.
      
      In short, forcing the low level MSR interception code to be inlined no
      longer makes sense.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210423221912.3857243-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e23f6d49
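The inlining pattern described in the commit above (thin wrapper forced inline, heavier implementations left out of line) looks roughly like this in GCC/Clang C. The bitmap layout, the `SKETCH_` constants, and all function names are hypothetical simplifications; the real helpers operate on the VMX MSR bitmap and honor userspace MSR filters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_TYPE_R 1
#define SKETCH_MSR_TYPE_W 2

/* Toy bitmap: one bit per MSR slot, separate read and write maps. */
struct msr_bitmap_sketch {
    uint64_t read;
    uint64_t write;
};

/* Core implementations: kept out of line so each call site stays small. */
__attribute__((noinline))
static void enable_intercept(struct msr_bitmap_sketch *bm,
                             unsigned int slot, int type)
{
    assert(slot < 64);
    if (type & SKETCH_MSR_TYPE_R)
        bm->read |= 1ULL << slot;
    if (type & SKETCH_MSR_TYPE_W)
        bm->write |= 1ULL << slot;
}

__attribute__((noinline))
static void disable_intercept(struct msr_bitmap_sketch *bm,
                              unsigned int slot, int type)
{
    assert(slot < 64);
    if (type & SKETCH_MSR_TYPE_R)
        bm->read &= ~(1ULL << slot);
    if (type & SKETCH_MSR_TYPE_W)
        bm->write &= ~(1ULL << slot);
}

/* Wrapper: trivial dispatch, so forcing it inline costs almost nothing. */
static inline __attribute__((always_inline))
void set_intercept(struct msr_bitmap_sketch *bm, unsigned int slot,
                   int type, bool value)
{
    if (value)
        enable_intercept(bm, slot, type);
    else
        disable_intercept(bm, slot, type);
}
```

When `value` is a compile-time constant at the call site, the inlined wrapper folds down to a single call to one of the two out-of-line functions.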
    • P
      KVM: documentation: fix sphinx warnings · f82762fb
      Paolo Bonzini committed
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f82762fb
    • W
      KVM: X86: Fix failure to boost kernel lock holder candidate in SEV-ES guests · b86bb11e
      Wanpeng Li committed
      Commit f1c6366e ("KVM: SVM: Add required changes to support intercepts under
      SEV-ES") prevents the hypervisor from accessing guest register state when the
      guest is running under SEV-ES. The initial value of
      vcpu->arch.guest_state_protected is false, and after that commit it is not
      updated in the preemption notifiers, which means that the kernel spinlock
      holder is never considered as a boost candidate. Fix this by always treating
      a preempted vCPU as being in guest kernel mode; a false positive is better
      than skipping the boost completely.
      
      Fixes: f1c6366e (KVM: SVM: Add required changes to support intercepts under SEV-ES)
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1619080459-30032-1-git-send-email-wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b86bb11e
    • V
      KVM: x86: Properly handle APF vs disabled LAPIC situation · 2f15d027
      Vitaly Kuznetsov committed
      An Async PF 'page ready' event may happen when the LAPIC is (temporarily) disabled.
      In particular, Sebastien reports that when the Linux kernel is directly booted
      by Cloud Hypervisor, the LAPIC is 'software disabled' when the APF mechanism is
      initialized. On initialization, KVM tries to inject a 'wakeup all' event and
      puts the corresponding token in the slot. It, however, fails to inject
      an interrupt (kvm_apic_set_irq() -> __apic_accept_irq() -> !apic_enabled()),
      so the guest never gets notified and the whole APF mechanism gets stuck.
      The same issue is likely to happen if the guest temporarily disables the LAPIC
      and a previously unavailable page becomes available.
      
      Do two things to resolve the issue:
      - Avoid dequeuing 'page ready' events from APF queue when LAPIC is
        disabled.
      - Trigger an attempt to deliver pending 'page ready' events when LAPIC
        becomes enabled (SPIV or MSR_IA32_APICBASE).
      Reported-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210422092948.568327-1-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2f15d027
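The two-part fix in the APF commit above can be modeled with a toy queue. This is a sketch under loud assumptions: the struct and function names are hypothetical, events are reduced to a counter, and delivery is modeled as one event per attempt rather than KVM's token/slot protocol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state for this sketch. */
struct apf_vcpu {
    bool lapic_enabled;
    unsigned int pending_ready; /* queued 'page ready' events */
    unsigned int delivered;     /* events the guest was notified about */
};

/* Part 1: never dequeue a 'page ready' event while the LAPIC is disabled. */
static bool apf_deliver_one(struct apf_vcpu *v)
{
    if (!v->lapic_enabled || v->pending_ready == 0)
        return false; /* the event, if any, stays queued */

    v->pending_ready--;
    v->delivered++;
    return true;
}

/* Part 2: retry delivery when the LAPIC goes from disabled to enabled. */
static void lapic_set_enabled(struct apf_vcpu *v, bool enabled)
{
    bool was_enabled = v->lapic_enabled;

    v->lapic_enabled = enabled;
    if (enabled && !was_enabled)
        apf_deliver_one(v);
}

/* Usage example: an event queued while disabled is not lost. */
static void apf_example(void)
{
    struct apf_vcpu v = { .lapic_enabled = false, .pending_ready = 1 };

    assert(!apf_deliver_one(&v));
    assert(v.pending_ready == 1);
    lapic_set_enabled(&v, true);
    assert(v.delivered == 1 && v.pending_ready == 0);
}
```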
  2. 23 Apr, 2021 4 commits
    • S
      KVM: x86: Fix implicit enum conversion goof in scattered reverse CPUID code · 462f8dde
      Sean Christopherson committed
      Take "enum kvm_only_cpuid_leafs" in scattered specific CPUID helpers
      (which is obvious in hindsight), and use "unsigned int" for leafs that
      can be the kernel's standard "enum cpuid_leaf" or the aforementioned
      KVM-only variant.  Loss of the enum params is a bit disappointing, but
      gcc obviously isn't providing any extra sanity checks, and the various
      BUILD_BUG_ON() assertions ensure the input is in range.
      
      This fixes implicit enum conversions that are detected by clang-11:
      
      arch/x86/kvm/cpuid.c:499:29: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
              kvm_cpu_cap_init_scattered(CPUID_12_EAX,
              ~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
      arch/x86/kvm/cpuid.c:837:31: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
                      cpuid_entry_override(entry, CPUID_12_EAX);
                      ~~~~~~~~~~~~~~~~~~~~        ^~~~~~~~~~~~
      2 warnings generated.
      
      Fixes: 4e66c0cb ("KVM: x86: Add support for reverse CPUID lookup of scattered features")
      Cc: Kai Huang <kai.huang@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210421010850.3009718-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      462f8dde
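A reduced model of the enum situation in the commit above. The enums here are hypothetical stand-ins with only a couple of entries, and the runtime `assert` takes the place of the kernel's compile-time `BUILD_BUG_ON()` range checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel-wide leaf enum; the real one is much longer. */
enum cpuid_leafs_sketch {
    SKETCH_CPUID_1_EDX = 0,
    SKETCH_CPUID_7_0_EBX,
    SKETCH_NCAPINTS,
};

/* KVM-only leafs continue numbering where the kernel's enum stops. */
enum kvm_only_cpuid_leafs_sketch {
    SKETCH_CPUID_12_EAX = SKETCH_NCAPINTS,
    SKETCH_NR_KVM_CPU_CAPS,
};

/*
 * A parameter typed 'enum cpuid_leafs_sketch' would draw clang's
 * -Wenum-conversion when handed SKETCH_CPUID_12_EAX.  Taking a plain
 * unsigned int accepts values from either enum; the range check keeps
 * the input sane.
 */
static bool leaf_is_kvm_only(unsigned int leaf)
{
    assert(leaf < SKETCH_NR_KVM_CPU_CAPS);
    return leaf >= SKETCH_NCAPINTS;
}
```

The cost of the `unsigned int` parameter is the loss of type documentation at the call site, which is the trade-off the commit message calls out.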
    • I
      KVM: VMX: use EPT_VIOLATION_GVA_TRANSLATED instead of 0x100 · 10835602
      Isaku Yamahata committed
      Use the symbolic value, EPT_VIOLATION_GVA_TRANSLATED, instead of 0x100
      in handle_ept_violation().
      Signed-off-by: Yao Yuan <yuan.yao@intel.com>
      Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Message-Id: <724e8271ea301aece3eb2afe286a9e2e92a70b18.1619136576.git.isaku.yamahata@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      10835602
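For illustration, the change in the commit above amounts to replacing a magic number with a named mask. The sketch below assumes the constant is built from a bit position (bit 8, which matches the 0x100 in the commit message); the `_BIT` macro and the helper function are assumptions for this example, not a copy of the kernel header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed definition: bit 8 of the exit qualification, i.e. 0x100. */
#define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
#define EPT_VIOLATION_GVA_TRANSLATED \
    (1ULL << EPT_VIOLATION_GVA_TRANSLATED_BIT)

/* Hypothetical helper: reads better than 'exit_qualification & 0x100'. */
static bool ept_violation_gva_translated(uint64_t exit_qualification)
{
    return (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) != 0;
}

/* Usage example: the named mask is the same value as the old literal. */
static void ept_example(void)
{
    assert(EPT_VIOLATION_GVA_TRANSLATED == 0x100);
    assert(ept_violation_gva_translated(0x181));
    assert(!ept_violation_gva_translated(0x081));
}
```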
    • P
      Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · c4f71901
      Paolo Bonzini committed
      KVM/arm64 updates for Linux 5.13
      
      New features:
      
      - Stage-2 isolation for the host kernel when running in protected mode
      - Guest SVE support when running in nVHE mode
      - Force W^X hypervisor mappings in nVHE mode
      - ITS save/restore for guests using direct injection with GICv4.1
      - nVHE panics now produce readable backtraces
      - Guest support for PTP using the ptp_kvm driver
      - Performance improvements in the S2 fault handler
      - Alexandru is now a reviewer (not really a new feature...)
      
      Fixes:
      - Proper emulation of the GICR_TYPER register
      - Handle the complete set of relocations in the nVHE EL2 object
      - Get rid of the oprofile dependency in the PMU code (and of the
        oprofile body parts at the same time)
      - Debug and SPE fixes
      - Fix vcpu reset
      c4f71901
    • P
      Merge branch 'kvm-sev-cgroup' into HEAD · fd49e8ee
      Paolo Bonzini committed
      fd49e8ee
  3. 22 Apr, 2021 20 commits