1. 03 May, 2022 1 commit
    • kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU · 5a1bde46
      Sandipan Das committed
      On some x86 processors, CPUID leaf 0xA provides information
      on Architectural Performance Monitoring features. It
      advertises a PMU version which QEMU uses to determine the
      availability of additional MSRs to manage the PMCs.
      
      Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
      the same, the kernel constructs return values based on the
      x86_pmu_capability irrespective of the vendor.
      
      This leaf and the additional MSRs are not supported on AMD
      and Hygon processors. If AMD PerfMonV2 is detected, the PMU
      version is set to 2 and guest startup breaks because of an
      attempt to access a non-existent MSR. Return zeros to avoid
      this.
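
The gating described above can be sketched as follows. This is a minimal illustration, not the actual KVM code: `struct cpuid_regs`, `struct pmu_caps`, and `fill_leaf_0xa()` are hypothetical stand-ins, and only the EAX field layout follows the architectural definition of leaf 0xA.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types; not the real KVM definitions. */
struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

struct pmu_caps {
    uint8_t version;          /* architectural PMU version */
    uint8_t num_counters_gp;  /* number of general-purpose counters */
    uint8_t bit_width_gp;     /* width of each general-purpose counter */
    uint8_t events_mask_len;  /* length of the EBX event bit vector */
};

/*
 * Build CPUID leaf 0xA for the guest. If the host has no architectural
 * PMU (as on AMD and Hygon), report all zeros so that userspace cannot
 * infer a PMU version and go on to touch MSRs that do not exist.
 */
void fill_leaf_0xa(struct cpuid_regs *out, const struct pmu_caps *cap,
                   bool host_has_arch_perfmon)
{
    memset(out, 0, sizeof(*out));
    if (!host_has_arch_perfmon)
        return;

    out->eax = (uint32_t)cap->version |
               ((uint32_t)cap->num_counters_gp << 8) |
               ((uint32_t)cap->bit_width_gp << 16) |
               ((uint32_t)cap->events_mask_len << 24);
}
```

Zeroing first and returning early keeps the vendor check in one place; only EAX is populated here to keep the sketch short.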
      
      Fixes: a6c06ed1 ("KVM: Expose the architectural performance monitoring CPUID leaf")
      Reported-by: Vasant Hegde <vasant.hegde@amd.com>
      Signed-off-by: Sandipan Das <sandipan.das@amd.com>
      Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 30 March, 2022 1 commit
    • KVM: x86: Fix clang -Wimplicit-fallthrough in do_host_cpuid() · 07ea4ab1
      Nathan Chancellor committed
      Clang warns:
      
        arch/x86/kvm/cpuid.c:739:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
                default:
                ^
        arch/x86/kvm/cpuid.c:739:2: note: insert 'break;' to avoid fall-through
                default:
                ^
                break;
        1 error generated.
      
      Clang is a little more pedantic than GCC, which does not warn when
      falling through to a case that is just break or return. Clang's version
      is more in line with the kernel's own stance in deprecated.rst, which
      states that all switch/case blocks must end in either break,
      fallthrough, continue, goto, or return. Add the missing break to silence
      the warning.
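
As an illustration of the pattern clang expects, here is a small sketch in which every case ends in an explicit `break`. The function `classify_leaf()`, the enum, and the choice of case labels are hypothetical and only loosely modeled on a switch over CPUID functions.

```c
#include <stdint.h>

/* Hypothetical classification, for illustration only. */
enum leaf_kind { LEAF_PLAIN, LEAF_INDEXED, LEAF_SYNTHESIZED };

enum leaf_kind classify_leaf(uint32_t function)
{
    enum leaf_kind kind = LEAF_PLAIN;

    switch (function) {
    case 0x4:
    case 0x7:
    case 0xd:
        kind = LEAF_INDEXED;
        break;
    case 0x80000021:
        kind = LEAF_SYNTHESIZED;
        /*
         * Without this break, control would fall into "default:" and
         * clang's -Wimplicit-fallthrough would flag it, even though
         * the default case does nothing but break.
         */
        break;
    default:
        break;
    }

    return kind;
}
```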
      
      Fixes: f144c49e ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Message-Id: <20220322152906.112164-1-nathan@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 21 March, 2022 3 commits
  4. 17 February, 2022 2 commits
    • x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0 · 988896bb
      Leonardo Bras committed
      kvm_vcpu_arch currently stores the guest-supported features in both the
      guest_supported_xcr0 and guest_fpu.fpstate->user_xfeatures fields.
      
      Currently both fields are set to the same value in
      kvm_vcpu_after_set_cpuid() and are not changed anywhere else after that.
      
      Since it's not good to keep duplicated data, remove guest_supported_xcr0.
      
      To keep the code more readable, introduce kvm_guest_supported_xcr()
      and kvm_guest_supported_xfd() to replace the previous usages of
      guest_supported_xcr0.
      Signed-off-by: Leonardo Bras <leobras@redhat.com>
      Message-Id: <20220217053028.96432-3-leobras@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 · ad856280
      Leonardo Bras committed
      During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
      swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().
      
      When xsave feature is available, the fpu swap is done by:
      - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
        to store the current state of the fpu registers to a buffer.
      - xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
        XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.
      
      For xsave(s) the mask is used to limit what parts of the fpu regs will
      be copied to the buffer. Likewise on xrstor(s), the mask is used to
      limit what parts of the fpu regs will be changed.
      
      The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
      kvm_arch_vcpu_create(), which (in summary) sets it to all features
      supported by the cpu which are enabled on kernel config.
      
      This means that xsave(s) will save to guest buffer all the fpu regs
      contents the cpu has enabled when the guest is paused, even if they
      are not used.
      
      This would not be an issue if xrstor(s) did the same.
      
      xrstor(s)'s mask for host/guest swap is basically every valid feature
      contained in kernel config, except XFEATURE_MASK_PKRU.
      According to the kernel source, PKRU is instead switched in switch_to() and
      flush_thread().
      
      Then, the following happens when a host supporting PKRU starts a
      guest that does not support it:
      1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
      2 - xsave(s) fpu regs to host fpstate (buffer has XFEATURE_MASK_PKRU)
      3 - xrstor(s) guest fpstate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
      4 - guest runs, then switch back to host,
      5 - xsave(s) fpu regs to guest fpstate (buffer now has XFEATURE_MASK_PKRU)
      6 - xrstor(s) host fpstate to fpu regs.
      7 - kvm_vcpu_ioctl_x86_get_xsave() copies guest fpstate to userspace (with
          XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)
      
      In step 5, even though the guest does not support PKRU, the flag is
      set in the guest fpstate, which is transferred to userspace via the vcpu
      ioctl KVM_GET_XSAVE.
      
      This becomes a problem when the user decides to migrate the above guest
      to another machine that does not support PKRU: the new host restores the
      guest's fpu regs to what they were before (xrstor(s)), but since the new
      host doesn't support PKRU, a general-protection exception occurs in
      xrstor(s) and that crashes the guest.
      
      This can be solved by making the guest's fpstate->user_xfeatures hold
      a copy of guest_supported_xcr0. This way, in step 7 the only flags copied
      to userspace will be the ones compatible with the guest's requirements,
      and thus there will be no issue during migration.
      
      As a bonus, KVM_SET_XSAVE will also fail if userspace tries to set fpu
      features that are not compatible with the guest configuration.  Such
      features will never be returned by KVM_GET_XSAVE or KVM_GET_XSAVE2.
      
      Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
      there is no need to set it in kvm_check_cpuid(). So, change
      fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
      non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
      calls it.
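
The effect of limiting user_xfeatures can be sketched with plain bit masks. This is a simplified model, assuming that the features visible to userspace are the intersection of what the buffer holds and what the guest supports; the two helper functions are hypothetical, and only the XFEATURE bit positions match the architecture.

```c
#include <stdint.h>

/* XSAVE state-component bits (architectural positions). */
#define XFEATURE_MASK_FP   (1ULL << 0)
#define XFEATURE_MASK_SSE  (1ULL << 1)
#define XFEATURE_MASK_YMM  (1ULL << 2)
#define XFEATURE_MASK_PKRU (1ULL << 9)

/*
 * Hypothetical model of KVM_GET_XSAVE: report only the features that
 * are both present in the guest buffer and permitted for the guest.
 */
uint64_t xfeatures_for_userspace(uint64_t buffer_xfeatures,
                                 uint64_t user_xfeatures)
{
    return buffer_xfeatures & user_xfeatures;
}

/*
 * Hypothetical model of KVM_SET_XSAVE: reject incoming state that
 * carries a feature the guest is not configured for.
 * Returns 0 on success, -1 on failure.
 */
int validate_incoming_xfeatures(uint64_t incoming_xfeatures,
                                uint64_t user_xfeatures)
{
    return (incoming_xfeatures & ~user_xfeatures) ? -1 : 0;
}
```

With a guest limited to FP, SSE, and YMM, a buffer that also carries PKRU is reported without the PKRU bit, and state containing PKRU is refused on restore.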
      Signed-off-by: Leonardo Bras <leobras@redhat.com>
      Message-Id: <20220217053028.96432-2-leobras@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 11 February, 2022 1 commit
  6. 04 February, 2022 1 commit
    • KVM: x86: Report deprecated x87 features in supported CPUID · e3bcfda0
      Jim Mattson committed
      CPUID.(EAX=7,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6] and
      CPUID.(EAX=7,ECX=0):EBX.ZERO_FCS_FDS[bit 13] are "defeature"
      bits. Unlike most of the other CPUID feature bits, these bits are
      clear if the features are present and set if the features are not
      present. These bits should be reported in KVM_GET_SUPPORTED_CPUID,
      because if these bits are set on hardware, they cannot be cleared in
      the guest CPUID. Doing so would claim guest support for a feature that
      the hardware doesn't support and that can't be efficiently emulated.
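
The inverted polarity can be made concrete with a small check. This is a sketch under assumptions: `guest_defeature_bits_ok()` is a hypothetical helper that a VMM might use, not a KVM function; the bit positions are the ones given above.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bit positions, as given in the commit message. */
#define LEAF7_EBX_FDP_EXCPTN_ONLY (1u << 6)
#define LEAF7_EBX_ZERO_FCS_FDS    (1u << 13)
#define LEAF7_EBX_DEFEATURE_BITS \
    (LEAF7_EBX_FDP_EXCPTN_ONLY | LEAF7_EBX_ZERO_FCS_FDS)

/*
 * Hypothetical userspace-side check: a "defeature" bit that is set on
 * the host must stay set in the guest CPUID, because clearing it would
 * claim a feature that the hardware lacks.
 */
bool guest_defeature_bits_ok(uint32_t host_ebx, uint32_t guest_ebx)
{
    uint32_t must_stay_set = host_ebx & LEAF7_EBX_DEFEATURE_BITS;

    return (guest_ebx & must_stay_set) == must_stay_set;
}
```

Setting one of these bits in the guest while it is clear on the host merely hides a feature that exists, which is why the check only looks in one direction.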
      
      Of course, any software (e.g. WIN87EM.DLL) expecting these features to
      be present likely predates these CPUID feature bits and therefore
      doesn't know to check for them anyway.
      
      Aaron Lewis added the corresponding X86_FEATURE macros in
      commit cbb99c0f ("x86/cpufeatures: Add FDP_EXCPTN_ONLY and
      ZERO_FCS_FDS"), with the intention of reporting these bits in
      KVM_GET_SUPPORTED_CPUID, but I was unable to find a proposed patch on
      the kvm list.
      
      Opportunistically reordered the CPUID_7_0_EBX capability bits from
      least to most significant.
      
      Cc: Aaron Lewis <aaronlewis@google.com>
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20220204001348.2844660-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 02 February, 2022 1 commit
  8. 27 January, 2022 3 commits
    • KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} · 811f95ff
      Sean Christopherson committed
      Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN
      KVM_SET_CPUID{,2} to fix a memory leak; the callers of kvm_set_cpuid()
      free the array only on failure.
      
       BUG: memory leak
       unreferenced object 0xffff88810963a800 (size 2048):
        comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00  ................
          47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00  GenuntelineI....
        backtrace:
          [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline]
          [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580
          [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline]
          [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199
          [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423
          [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251
          [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066
          [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline]
          [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline]
          [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline]
          [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
          [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
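
The ownership rule behind the fix can be sketched in userspace C. This is a simplified model, not the kernel code: `struct vcpu`, `struct cpuid_entry`, and `set_cpuid()` are hypothetical, `malloc`/`free` stand in for the kernel allocators, and `memcmp` is shorthand for the real equality check.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct cpuid_entry { unsigned int function, index, flags, eax, ebx, ecx, edx; };

struct vcpu {
    struct cpuid_entry *entries;  /* array currently owned by the vCPU */
    size_t nent;
    int has_run;                  /* nonzero after the first KVM_RUN */
};

/*
 * Contract: on failure the caller frees e; on success the callee has
 * taken responsibility for e. Returns 0 on success, -1 on failure.
 */
int set_cpuid(struct vcpu *vcpu, struct cpuid_entry *e, size_t nent)
{
    if (vcpu->has_run) {
        if (nent != vcpu->nent ||
            memcmp(e, vcpu->entries, nent * sizeof(*e)) != 0)
            return -1;   /* failure: the caller still owns and frees e */

        free(e);         /* success: nobody else will free the duplicate */
        return 0;
    }

    free(vcpu->entries);
    vcpu->entries = e;   /* success: ownership moves to the vCPU */
    vcpu->nent = nent;
    return 0;
}
```

The leak arises when the post-run success path returns 0 without the `free(e)`: the caller assumes the array was adopted, but the vCPU kept its old copy.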
      
      Fixes: c6617c61 ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+be576ad7655690586eec@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220125210445.2053429-1-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Check .flags in kvm_cpuid_check_equal() too · 033a3ea5
      Vitaly Kuznetsov committed
      kvm_cpuid_check_equal() checks for the (full) equality of the supplied
      CPUID data so .flags need to be checked too.
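
A full-equality comparison of two CPUID arrays might look like the sketch below. The struct and `cpuid_entries_equal()` are hypothetical simplifications, not the kernel's kvm_cpuid_check_equal(); the point is only that `flags` takes part in the comparison alongside every other field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CPUID entry. */
struct cpuid_entry { uint32_t function, index, flags, eax, ebx, ecx, edx; };

bool cpuid_entries_equal(const struct cpuid_entry *a, size_t na,
                         const struct cpuid_entry *b, size_t nb)
{
    if (na != nb)
        return false;

    for (size_t i = 0; i < na; i++) {
        if (a[i].function != b[i].function ||
            a[i].index != b[i].index ||
            a[i].flags != b[i].flags ||   /* the field this fix adds */
            a[i].eax != b[i].eax ||
            a[i].ebx != b[i].ebx ||
            a[i].ecx != b[i].ecx ||
            a[i].edx != b[i].edx)
            return false;
    }
    return true;
}
```
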
      Reported-by: Sean Christopherson <seanjc@google.com>
      Fixes: c6617c61 ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220126131804.2839410-1-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/cpuid: Exclude unpermitted xfeatures sizes at KVM_GET_SUPPORTED_CPUID · 1ffce092
      Like Xu committed
      With the help of xstate_get_guest_group_perm(), KVM can exclude unpermitted
      xfeatures in cpuid.0xd.0.eax, in which case the corresponding xfeatures
      sizes should also be matched to the permitted xfeatures.
      
      To fix this inconsistency, the permitted_xcr0 and permitted_xss are defined
      consistently, which implies 'supported' plus certain permissions for this
      task, and it also fixes cpuid.0xd.1.ebx and later leaf-by-leaf queries.
      
      Fixes: 445ecdf7 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220125115223.33707-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 25 January, 2022 1 commit
  10. 20 January, 2022 1 commit
    • KVM: x86/cpuid: Clear XFD for component i if the base feature is missing · e9737468
      Like Xu committed
      According to Intel extended feature disable (XFD) spec, the sub-function i
      (i > 1) of CPUID function 0DH enumerates "details for state component i.
      ECX[2] enumerates support for XFD support for this state component."
      
      If KVM does not report F(XFD) feature (e.g. due to CONFIG_X86_64),
      then the corresponding XFD support for any state component i
      should also be removed. Translate this dependency into KVM terms.
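
In code, the dependency amounts to masking one bit. A minimal sketch, assuming a hypothetical helper `adjust_xstate_component_ecx()` that post-processes ECX of CPUID.0xD sub-leaf i; only the ECX[2] position comes from the spec text quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=0xD,ECX=i):ECX[2] enumerates XFD support for component i. */
#define XSTATE_COMPONENT_XFD (1u << 2)

/*
 * Hypothetical helper: if the XFD feature itself is not reported to the
 * guest, no individual state component may claim XFD support either.
 */
uint32_t adjust_xstate_component_ecx(uint32_t ecx, bool xfd_reported)
{
    if (!xfd_reported)
        ecx &= ~XSTATE_COMPONENT_XFD;
    return ecx;
}
```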
      
      Fixes: 690a757d ("kvm: x86: Add CPUID support for Intel AMX")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220117074531.76925-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 18 January, 2022 3 commits
    • KVM: x86: Making the module parameter of vPMU more common · 4732f244
      Like Xu committed
      The new module parameter to control PMU virtualization should apply
      to Intel as well as AMD, for situations where userspace is not trusted.
      If the module parameter allows PMU virtualization, there could be a
      new KVM_CAP or guest CPUID bits whereby userspace can enable/disable
      PMU virtualization on a per-VM basis.
      
      If the module parameter does not allow PMU virtualization, there
      should be no userspace override, since we have no precedent for
      authorizing that kind of override. If it's false, other counter-based
      profiling features (such as LBR including the associated CPUID bits
      if any) will not be exposed.
      
      Change its name from "pmu" to "enable_pmu" as we have temporary
      variables with the same name in our code like "struct kvm_pmu *pmu".
      
      Fixes: b1d66dad ("KVM: x86/svm: Add module param to control PMU virtualization")
      Suggested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220111073823.21885-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN · c6617c61
      Vitaly Kuznetsov committed
      Commit feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
      forbade changing CPUID altogether but unfortunately this is not fully
      compatible with existing VMMs. In particular, QEMU reuses vCPU fds for
      CPU hotplug after unplug and it calls KVM_SET_CPUID2. Instead of a full ban,
      check whether the supplied CPUID data is equal to what was previously set.
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Fixes: feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220117150542.2176196-3-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      [Do not call kvm_find_cpuid_entry repeatedly. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entries · ee3a5f9e
      Vitaly Kuznetsov committed
      kvm_update_cpuid_runtime() mangles CPUID data coming from the userspace
      VMM after updating 'vcpu->arch.cpuid_entries', which makes it
      impossible to compare an update with what was previously
      supplied. Introduce a __kvm_update_cpuid_runtime() variant which can be
      used to tweak the input before it goes to 'vcpu->arch.cpuid_entries'
      so the upcoming update check can compare tweaked data.
      
      No functional change intended.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220117150542.2176196-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 15 January, 2022 3 commits
  13. 08 January, 2022 2 commits
  14. 08 December, 2021 1 commit
    • KVM: x86/svm: Add module param to control PMU virtualization · b1d66dad
      Like Xu committed
      For Intel, the guest PMU can be disabled via clearing the PMU CPUID.
      For AMD, all hw implementations support the base set of four
      performance counters, with current mainstream hardware indicating
      the presence of two additional counters via X86_FEATURE_PERFCTR_CORE.
      
      In the virtualized world, the AMD guest driver may detect
      the presence of at least one counter MSR. Most hypervisor
      vendors would introduce a module param (like lbrv for svm)
      to disable PMU for all guests.
      
      Another proposal for per-VM control is to pass PMU disable information
      via MSR_IA32_PERF_CAPABILITIES or one bit in CPUID Fn4000_00[FF:00].
      Both methods require some guest-side changes, so a module
      parameter may not be sufficiently granular, but is practical enough.
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20211117080304.38989-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. 18 November, 2021 1 commit
  16. 11 November, 2021 2 commits
    • KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES · 760849b1
      Paul Durrant committed
      Currently when kvm_update_cpuid_runtime() runs, it assumes that the
      KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
      however, if Hyper-V support is enabled. In this case the KVM leaves will
      be offset.
      
      This patch introduces a new 'kvm_cpuid_base' field into struct
      kvm_vcpu_arch to track the location of the KVM leaves and function
      kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
      leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
      definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
      target the correct leaf.
      
      NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is introduced
            into processor.h to avoid having duplicate code for the iteration
            over possible hypervisor base leaves.
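
Locating the KVM leaves by signature can be sketched as below. This is an illustration under assumptions: the struct and both functions are hypothetical, and the scan range (0x40000000 up to 0x40010000 in steps of 0x100) is assumed here to mirror the iteration that the new macro performs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified CPUID entry. */
struct cpuid_entry { uint32_t function, eax, ebx, ecx, edx; };

/* 12-byte signature carried in EBX, ECX, EDX of the base leaf. */
#define KVM_SIGNATURE "KVMKVMKVM\0\0\0"

const struct cpuid_entry *find_entry(const struct cpuid_entry *e, size_t n,
                                     uint32_t function)
{
    for (size_t i = 0; i < n; i++)
        if (e[i].function == function)
            return &e[i];
    return NULL;
}

/* Returns the base leaf of the KVM range, or 0 if no signature matches. */
uint32_t find_kvm_cpuid_base(const struct cpuid_entry *e, size_t n)
{
    for (uint32_t base = 0x40000000; base < 0x40010000; base += 0x100) {
        const struct cpuid_entry *ent = find_entry(e, n, base);
        uint32_t sig[3];

        if (!ent)
            continue;

        sig[0] = ent->ebx;
        sig[1] = ent->ecx;
        sig[2] = ent->edx;
        if (memcmp(sig, KVM_SIGNATURE, sizeof(sig)) == 0)
            return base;
    }
    return 0;
}
```

When a Hyper-V signature occupies 0x40000000, the scan skips it and reports the next base whose signature matches, so later adjustments can be made relative to that base rather than a fixed leaf number.
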
      Signed-off-by: Paul Durrant <pdurrant@amazon.com>
      Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows · 8b44b174
      Sean Christopherson committed
      Move the core logic of SET_CPUID and SET_CPUID2 to a common helper; the
      only difference between the two ioctls is the format of the userspace
      struct.  A future fix will add yet more code to the core logic.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211105095101.5384-2-pdurrant@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  17. 01 October, 2021 1 commit
    • KVM: x86: Expose Predictive Store Forwarding Disable · b73a5432
      Babu Moger committed
      Predictive Store Forwarding: AMD Zen3 processors feature a new
      technology called Predictive Store Forwarding (PSF).
      
      PSF is a hardware-based micro-architectural optimization designed
      to improve the performance of code execution by predicting address
      dependencies between loads and stores.
      
      How PSF works:
      
      It is very common for a CPU to execute a load instruction to an address
      that was recently written by a store. Modern CPUs implement a technique
      known as Store-To-Load-Forwarding (STLF) to improve performance in such
      cases. With STLF, data from the store is forwarded directly to the load
      without having to wait for it to be written to memory. In a typical CPU,
      STLF occurs after the address of both the load and store are calculated
      and determined to match.
      
      PSF expands on this by speculating on the relationship between loads and
      stores without waiting for the address calculation to complete. With PSF,
      the CPU learns over time the relationship between loads and stores. If
      STLF typically occurs between a particular store and load, the CPU will
      remember this.
      
      In typical code, PSF provides a performance benefit by speculating on
      the load result and allowing later instructions to begin execution
      sooner than they otherwise would be able to.
      
      The security analysis of AMD Predictive Store Forwarding is
      documented here:
      https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf
      
      Predictive Store Forwarding controls:
      There are two hardware control bits which influence the PSF feature:
      - MSR 48h bit 2 – Speculative Store Bypass Disable (SSBD)
      - MSR 48h bit 7 – Predictive Store Forwarding Disable (PSFD)
      
      The PSF feature is disabled if either of these bits is set.  These bits
      are controllable on a per-thread basis in an SMT system. By default, both
      SSBD and PSFD are 0 meaning that the speculation features are enabled.
      
      While the SSBD bit disables PSF and speculative store bypass, PSFD only
      disables PSF.
      
      PSFD may be desirable for software which is concerned with the
      speculative behavior of PSF but desires a smaller performance impact than
      setting SSBD.
      
      Support for PSFD is indicated in CPUID Fn8000_0008 EBX[28].
      All processors that support PSF will also support PSFD.
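
The control bits described above can be expressed as a few predicates. This is only a sketch: the macro and function names are hypothetical, while the bit positions (MSR 48h bits 2 and 7, CPUID Fn8000_0008 EBX[28]) are the ones stated in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of MSR 48h, as listed above. */
#define SPEC_CTRL_SSBD (1ULL << 2)
#define SPEC_CTRL_PSFD (1ULL << 7)

/* CPUID Fn8000_0008 EBX[28] indicates PSFD support. */
#define CPUID_8000_0008_EBX_PSFD (1u << 28)

/* PSF is active only while both SSBD and PSFD are clear. */
bool psf_enabled(uint64_t spec_ctrl)
{
    return (spec_ctrl & (SPEC_CTRL_SSBD | SPEC_CTRL_PSFD)) == 0;
}

/* Speculative store bypass is affected by SSBD only. */
bool ssb_enabled(uint64_t spec_ctrl)
{
    return (spec_ctrl & SPEC_CTRL_SSBD) == 0;
}

bool cpu_has_psfd(uint32_t cpuid_8000_0008_ebx)
{
    return (cpuid_8000_0008_ebx & CPUID_8000_0008_EBX_PSFD) != 0;
}
```

Setting only PSFD leaves `ssb_enabled()` true, which is the smaller-impact option the message describes.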
      
      The Linux kernel does not yet have an interface to enable/disable PSFD.
      The plan here is to expose PSFD to KVM so that guest kernels can
      make use of it if they wish to.
      Signed-off-by: Babu Moger <Babu.Moger@amd.com>
      Message-Id: <163244601049.30292.5855870305350227855.stgit@bmoger-ubuntu>
      [Keep feature private to KVM, as requested by Borislav Petkov. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  18. 30 September, 2021 1 commit
    • KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks · e8a747d0
      Sean Christopherson committed
      Check whether a CPUID entry's index is significant before checking for a
      matching index to hack-a-fix an undefined behavior bug due to consuming
      uninitialized data.  RESET/INIT emulation uses kvm_cpuid() to retrieve
      CPUID.0x1, which does _not_ have a significant index, and fails to
      initialize the dummy variable that doubles as EBX/ECX/EDX output _and_
      ECX, a.k.a. index, input.
      
      Practically speaking, it's _extremely_  unlikely any compiler will yield
      code that causes problems, as the compiler would need to inline the
      kvm_cpuid() call to detect the uninitialized data, and intentionally hose
      the kernel, e.g. insert ud2, instead of simply ignoring the result of
      the index comparison.
      
      Although the sketchy "dummy" pattern was introduced in SVM by commit
      66f7b72e ("KVM: x86: Make register state after reset conform to
      specification"), it wasn't actually broken until commit 7ff6c035
      ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the
      order of operations such that "index" was checked before the significant
      flag.
      
      Avoid consuming uninitialized data by reverting to checking the flag
      before the index purely so that the fix can be easily backported; the
      offending RESET/INIT code has been refactored, moved, and consolidated
      from vendor code to common x86 since the bug was introduced.  A future
      patch will directly address the bad RESET/INIT behavior.
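
The ordering matters because of short-circuit evaluation, as the sketch below shows. The struct, the flag name `FLAG_SIGNIFICANT_INDEX`, and `entry_matches()` are hypothetical simplifications of the lookup, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: the entry's index field is meaningful. */
#define FLAG_SIGNIFICANT_INDEX (1u << 0)

/* Hypothetical, simplified CPUID entry. */
struct cpuid_entry { uint32_t function, index, flags; };

bool entry_matches(const struct cpuid_entry *e, uint32_t function,
                   uint32_t index)
{
    if (e->function != function)
        return false;

    /*
     * Test the flag first: thanks to short-circuit evaluation, the
     * caller-supplied index is only consulted for entries whose index
     * is significant, so a garbage index cannot influence the lookup
     * of a leaf such as CPUID.0x1.
     */
    return !(e->flags & FLAG_SIGNIFICANT_INDEX) || e->index == index;
}
```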
      
      The undefined behavior was detected by syzbot + KernelMemorySanitizer.
      
        BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68
        BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103
        BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline]
         kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline]
         kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
         vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534
         vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788
         kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020
      
        Local variable ----dummy@kvm_vcpu_reset created at:
         kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
      
      Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com
      Reported-by: Alexander Potapenko <glider@google.com>
      Fixes: 2a24be79 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping")
      Fixes: 7ff6c035 ("KVM: x86: Remove stateful CPUID handling")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20210929222426.1855730-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  19. 13 August, 2021 1 commit
    • KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels · 1383279c
      Sean Christopherson committed
      Remove an ancient restriction that disallowed exposing EFER.NX to the
      guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU.
      The motivation of the check, added by commit 2cc51560 ("KVM: VMX:
      Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule
      out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run
      the guest with the host's EFER.NX and thus avoid context switching EFER
      if the only divergence was the NX bit.
      
      Fast forward to today, and KVM has long since stopped running the guest
      with the host's EFER.NX.  Not only does KVM context switch EFER if
      host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 &&
      guest.EFER.NX=1 when using shadow paging (to emulate SMEP).  Furthermore,
      the entire motivation for the restriction was made obsolete over a decade
      ago when Intel added dedicated host and guest EFER fields in the VMCS
      (Nehalem timeframe), which reduced the overhead of context switching EFER
      from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles.
      
      In practice, the removed restriction only affects non-PAE 32-bit kernels,
      as EFER.NX is set during boot if NX is supported and the kernel will use
      PAE paging (32-bit or 64-bit), regardless of whether or not the kernel
      will actually use NX itself (mark PTEs non-executable).
      
      Alternatively and/or complementarily, startup_32_smp() in head_32.S could
      be modified to set EFER.NX=1 regardless of paging mode, thus eliminating
      the scenario where NX is supported but not enabled.  However, that runs
      the risk of breaking non-KVM non-PAE kernels (though the risk is very,
      very low as there are no known EFER.NX errata), and also eliminates an
      easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER
      across nested virtualization transitions.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210805183804.1221554-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. 15 July, 2021 3 commits
  21. 25 June, 2021 1 commit
    • KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified · 49c6f875
      Sean Christopherson committed
      Invalidate all MMUs' roles after a CPUID update to force reinitialization
      of the MMU context/helpers.  Despite the efforts of commit de3ccd26
      ("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
      there are still a handful of CPUID-based properties that affect MMU
      behavior but are not incorporated into mmu_role.  E.g. 1gb hugepage
      support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
      factor into the guest's reserved PTE bits.
      
      The obvious alternative would be to add all such properties to mmu_role,
      but doing so provides no benefit over simply forcing a reinitialization
      on every CPUID update, as setting guest CPUID is a rare operation.
      
      Note, reinitializing all MMUs after a CPUID update does not fix all of
      KVM's woes.  Specifically, kvm_mmu_page_role doesn't track the CPUID
      properties, which means that a vCPU can reuse shadow pages that should
      not exist for the new vCPU model, e.g. that map GPAs that are now illegal
      (due to MAXPHYADDR changes) or that set bits that are now reserved
      (PAGE_SIZE for 1gb pages), etc...
      
      Tracking the relevant CPUID properties in kvm_mmu_page_role would address
      the majority of problems, but fully tracking that much state in the
      shadow page role comes with an unpalatable cost as it would require a
      non-trivial increase in KVM's memory footprint.  The GBPAGES case is even
      worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
      support in the hardware page walker, i.e. it's a virtualization hole that
      can't be closed when using TDP.
      
      In other words, resetting the MMU after a CPUID update is largely a
      superficial fix.  But, it will allow reverting the tracking of MAXPHYADDR
      in the mmu_role, and that case in particular needs to mostly work because
      KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
      is supported.  For cases where KVM botches guest behavior, the damage is
      limited to that guest.  But for the shadow_root_level, a misconfigured
      MMU can cause KVM to incorrectly access memory, e.g. due to walking off
      the end of its shadow page tables.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  22. 10 June, 2021 1 commit
  23. 07 May, 2021 3 commits
    • KVM: X86: Expose bus lock debug exception to guest · 76ea438b
      Paolo Bonzini committed
      The bus lock debug exception notifies the kernel with a #DB
      trap after an instruction that acquires a bus lock is executed at
      CPL>0. This allows the kernel to enforce user application throttling or
      mitigations.
      
      Existence of bus lock debug exception is enumerated via
      CPUID.(EAX=7,ECX=0).ECX[24]. Software can enable these exceptions by
      setting bit 2 of the MSR_IA32_DEBUGCTL. Expose the CPUID to guest and
      emulate the MSR handling when guest enables it.
      
      Support for this feature was originally developed by Xiaoyao Li and
      Chenyi Qiang, but the code has since changed enough that this patch has
      nothing in common with theirs, except for this commit message.
      Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20210202090433.13441-4-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      76ea438b
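The enumeration and enable bits named in this commit message can be sketched in C as follows. The macro and function names are hypothetical (they are not the kernel's identifiers), and the helpers operate on plain register values instead of executing CPUID or WRMSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit positions come from the commit message. */
#define DEMO_CPUID_7_0_ECX_BUS_LOCK_DETECT	(UINT32_C(1) << 24)
#define DEMO_DEBUGCTL_BUS_LOCK_DETECT		(UINT64_C(1) << 2)

/* True if the ECX output of CPUID.(EAX=7,ECX=0) enumerates the feature. */
static inline bool demo_has_bus_lock_detect(uint32_t ecx)
{
	return (ecx & DEMO_CPUID_7_0_ECX_BUS_LOCK_DETECT) != 0;
}

/* Return the DEBUGCTL value with bus lock debug exceptions enabled. */
static inline uint64_t demo_enable_bus_lock_detect(uint64_t debugctl)
{
	return debugctl | DEMO_DEBUGCTL_BUS_LOCK_DETECT;
}

/*
 * Assumed policy for an emulated guest write: accept the enable bit only
 * if the CPUID exposed to the guest enumerates the feature.
 */
static inline bool demo_guest_debugctl_write_ok(uint64_t data,
						uint32_t guest_ecx)
{
	if ((data & DEMO_DEBUGCTL_BUS_LOCK_DETECT) &&
	    !demo_has_bus_lock_detect(guest_ecx))
		return false;
	return true;
}
```

The write check is an assumption about how a hypervisor might tie the MSR bit to the exposed CPUID bit; the commit message itself only says that the MSR handling is emulated when the guest enables the feature.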
    • S
      KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed · 78bba966
      Sean Christopherson committed
      If probing MSR_TSC_AUX failed, hide RDTSCP and RDPID, and WARN if either
      feature was reported as supported.  In theory, such a scenario should
      never happen as both Intel and AMD state that MSR_TSC_AUX is available if
      RDTSCP or RDPID is supported.  But, KVM injects #GP on MSR_TSC_AUX
      accesses if probing failed, faults on WRMSR(MSR_TSC_AUX) may be fatal to
      the guest (because they happen during early CPU bringup), and KVM itself
      has effectively misreported RDPID support in the past.
      
      Note, this also has the happy side effect of omitting MSR_TSC_AUX from
      the list of MSRs that are exposed to userspace if probing the MSR fails.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-16-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      78bba966
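The hide-and-warn behavior described above can be modeled with a small C sketch. The struct and function are hypothetical stand-ins for KVM's capability tracking; the return value marks the case where a warning would be warranted.

```c
#include <stdbool.h>

/* Hypothetical capability flags, not KVM's real data structure. */
struct demo_caps {
	bool rdtscp;
	bool rdpid;
};

/*
 * If probing MSR_TSC_AUX failed, hide both features. Returns true when
 * either feature had been reported as supported, i.e. when the
 * "should never happen" case occurred and a warning is appropriate.
 */
static inline bool demo_hide_tsc_aux_features(struct demo_caps *caps,
					      bool tsc_aux_probe_ok)
{
	bool warn;

	if (tsc_aux_probe_ok)
		return false;

	warn = caps->rdtscp || caps->rdpid;
	caps->rdtscp = false;
	caps->rdpid = false;
	return warn;
}
```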
    • S
      KVM: x86: Emulate RDPID only if RDTSCP is supported · 85d00112
      Sean Christopherson committed
      Do not advertise emulation support for RDPID if RDTSCP is unsupported.
      RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as
      both vmx_get_msr() and svm_get_msr() will return an error if the MSR is
      unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will
      inject a #UD.
      
      Note, RDPID emulation also relies on RDTSCP being enabled in the guest,
      but this is a KVM bug and will eventually be fixed.
      
      Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-3-seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      85d00112
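The dependency described above (RDPID emulation reads MSR_TSC_AUX, and a failed read becomes a #UD) can be sketched in C. The callback type, result enum, and stub MSR readers are hypothetical; only the MSR index is the architectural value.

```c
#include <stdint.h>

#define DEMO_MSR_TSC_AUX	UINT32_C(0xc0000103)

enum demo_emul_result { DEMO_EMUL_OK, DEMO_EMUL_UD };

/* Hypothetical stand-in for an MSR read callback: 0 on success. */
typedef int (*demo_get_msr_fn)(uint32_t index, uint64_t *val);

/* Emulate RDPID by reading MSR_TSC_AUX; a failed read yields #UD. */
static inline enum demo_emul_result demo_emulate_rdpid(demo_get_msr_fn get_msr,
							uint64_t *dest)
{
	uint64_t tsc_aux = 0;

	if (get_msr(DEMO_MSR_TSC_AUX, &tsc_aux))
		return DEMO_EMUL_UD;

	*dest = tsc_aux;
	return DEMO_EMUL_OK;
}

/* Stub reader for a host where the MSR exists and holds the value 3. */
static inline int demo_msr_present(uint32_t index, uint64_t *val)
{
	if (index != DEMO_MSR_TSC_AUX)
		return -1;
	*val = 3;
	return 0;
}

/* Stub reader for a host without RDTSCP, where the MSR is unsupported. */
static inline int demo_msr_missing(uint32_t index, uint64_t *val)
{
	(void)index;
	(void)val;
	return -1;
}
```

With the second stub the emulation can only ever inject #UD, which is why advertising RDPID emulation on such a host would be misleading.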
  24. 26 Apr, 2021 1 commit
    • P
      KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features · d9db0fd6
      Paolo Bonzini committed
      Add a reverse-CPUID entry for the memory encryption word, 0x8000001F.EAX,
      and use it to override the supported CPUID flags reported to userspace.
      Masking the reported CPUID flags avoids over-reporting KVM support, e.g.
      without the mask a SEV-SNP capable CPU may incorrectly advertise SNP
      support to userspace.
      
      Clear SEV/SEV-ES if their corresponding module parameters are disabled,
      and clear the memory encryption leaf completely if SEV is not fully
      supported in KVM.  Advertise SME_COHERENT in addition to SEV and SEV-ES,
      as the guest can use SME_COHERENT to avoid CLFLUSH operations.
      
      Explicitly omit SME and VM_PAGE_FLUSH from the reporting.  These features
      are used by KVM, but are not exposed to the guest, e.g. guest access to
      related MSRs will fault.
      
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210422021125.3417167-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d9db0fd6
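A rough C sketch of the masking policy described above, applied to the EAX word only. The macro names are hypothetical, and the bit positions reflect my understanding of AMD's layout for CPUID 0x8000001F.EAX; treat them as an assumption to check against the AMD manual. The real change also clears the other registers of the leaf when SEV is unavailable, which this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of CPUID 0x8000001F.EAX (hypothetical macro names). */
#define DEMO_SME		(UINT32_C(1) << 0)
#define DEMO_SEV		(UINT32_C(1) << 1)
#define DEMO_VM_PAGE_FLUSH	(UINT32_C(1) << 2)
#define DEMO_SEV_ES		(UINT32_C(1) << 3)
#define DEMO_SEV_SNP		(UINT32_C(1) << 4)
#define DEMO_SME_COHERENT	(UINT32_C(1) << 10)

/*
 * Keep only the features the hypervisor reports (SEV, SEV-ES,
 * SME_COHERENT), drop SEV/SEV-ES when their module parameters are off,
 * and report nothing at all if SEV itself ends up unsupported.
 */
static inline uint32_t demo_mask_mem_encrypt_eax(uint32_t host_eax,
						 bool sev_enabled,
						 bool sev_es_enabled)
{
	uint32_t eax = host_eax &
		       (DEMO_SEV | DEMO_SEV_ES | DEMO_SME_COHERENT);

	if (!sev_enabled)
		eax &= ~(DEMO_SEV | DEMO_SEV_ES);	/* SEV-ES needs SEV */
	if (!sev_es_enabled)
		eax &= ~DEMO_SEV_ES;

	if (!(eax & DEMO_SEV))
		return 0;
	return eax;
}
```

Because the mask is an allow-list, bits such as SME, VM_PAGE_FLUSH, and SEV-SNP never reach userspace even when the host CPU sets them.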
  25. 23 Apr, 2021 1 commit
    • S
      KVM: x86: Fix implicit enum conversion goof in scattered reverse CPUID code · 462f8dde
      Sean Christopherson committed
      Take "enum kvm_only_cpuid_leafs" in scattered specific CPUID helpers
      (which is obvious in hindsight), and use "unsigned int" for leafs that
      can be the kernel's standard "enum cpuid_leaf" or the aforementioned
      KVM-only variant.  Loss of the enum params is a bit disappointing, but
      gcc obviously isn't providing any extra sanity checks, and the various
      BUILD_BUG_ON() assertions ensure the input is in range.
      
      This fixes implicit enum conversions that are detected by clang-11:
      
      arch/x86/kvm/cpuid.c:499:29: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
              kvm_cpu_cap_init_scattered(CPUID_12_EAX,
              ~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
      arch/x86/kvm/cpuid.c:837:31: warning: implicit conversion from enumeration type 'enum kvm_only_cpuid_leafs' to different enumeration type 'enum cpuid_leafs' [-Wenum-conversion]
                      cpuid_entry_override(entry, CPUID_12_EAX);
                      ~~~~~~~~~~~~~~~~~~~~        ^~~~~~~~~~~~
      2 warnings generated.
      
      Fixes: 4e66c0cb ("KVM: x86: Add support for reverse CPUID lookup of scattered features")
      Cc: Kai Huang <kai.huang@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210421010850.3009718-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      462f8dde
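The shape of the fix can be sketched in C with two toy enums. All identifiers below are hypothetical and much smaller than the kernel's real leaf tables, and a runtime range check stands in for the compile-time BUILD_BUG_ON() assertions mentioned in the commit message.

```c
#include <stdint.h>

/* Toy stand-ins for the kernel's standard and KVM-only leaf enums. */
enum demo_cpuid_leafs {
	DEMO_CPUID_1_EDX,
	DEMO_CPUID_7_ECX,
	DEMO_NCAPINTS,
};

enum demo_kvm_only_cpuid_leafs {
	DEMO_CPUID_12_EAX = DEMO_NCAPINTS,
	DEMO_NR_KVM_CPU_CAPS,
};

static uint32_t demo_caps[DEMO_NR_KVM_CPU_CAPS];

/*
 * Shared helper: takes "unsigned int" so that values of either enum can
 * be passed without an implicit conversion between enumeration types.
 */
static inline int demo_cap_set(unsigned int leaf, uint32_t mask)
{
	if (leaf >= (unsigned int)DEMO_NR_KVM_CPU_CAPS)
		return -1;
	demo_caps[leaf] |= mask;
	return 0;
}

/* Scattered-specific helper: takes the KVM-only enum explicitly. */
static inline int demo_cap_init_scattered(enum demo_kvm_only_cpuid_leafs leaf,
					  uint32_t mask)
{
	return demo_cap_set(leaf, mask);
}
```

Had `demo_cap_init_scattered()` been declared with `enum demo_cpuid_leafs`, passing `DEMO_CPUID_12_EAX` would be the kind of cross-enum conversion that clang's -Wenum-conversion reports.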