提交 · 21066101f42cfd86fdd835b70ce0e36c335f5f4d · openeuler / Kernel

20 1月, 2022 9 次提交

selftests: kvm/x86: Introduce is_amd_cpu() · 21066101

由 Jim Mattson 提交于 3年前

Replace the one ad hoc "AuthenticAMD" CPUID vendor string comparison
with a new function, is_amd_cpu().
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220115052431.447232-4-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

21066101

selftests: kvm/x86: Parameterize the CPUID vendor string check · b33b9c40

由 Jim Mattson 提交于 3年前

Refactor is_intel_cpu() to make it easier to reuse the bulk of the
code for other vendors in the future.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220115052431.447232-3-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b33b9c40

KVM: x86/pmu: Use binary search to check filtered events · 7ff775ac

由 Jim Mattson 提交于 3年前

The PMU event filter may contain up to 300 events. Replace the linear
search in reprogram_gp_counter() with a binary search.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220115052431.447232-2-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7ff775ac

kvm: selftests: conditionally build vm_xsave_req_perm() · 1a1d1dbc

由 Wei Wang 提交于 3年前

vm_xsave_req_perm() is currently defined and used by x86_64 only.
Make it compiled into vm_create_with_vcpus() only when on x86_64
machines. Otherwise, it would cause linkage errors, e.g. on s390x.

Fixes: 415a3c33 ("kvm: selftests: Add support for KVM_CAP_XSAVE2")
Reported-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Tested-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Message-Id: <20220118014817.30910-1-wei.w.wang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1a1d1dbc

KVM: x86/cpuid: Clear XFD for component i if the base feature is missing · e9737468

由 Like Xu 提交于 3年前

According to Intel extended feature disable (XFD) spec, the sub-function i
(i > 1) of CPUID function 0DH enumerates "details for state component i.
ECX[2] enumerates support for XFD support for this state component."

If KVM does not report F(XFD) feature (e.g. due to CONFIG_X86_64),
then the corresponding XFD support for any state component i
should also be removed. Translate this dependency into KVM terms.

Fixes: 690a757d ("kvm: x86: Add CPUID support for Intel AMX")
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20220117074531.76925-1-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e9737468

KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access() · 6ff94f27

由 David Matlack 提交于 3年前

Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
why it is safe to flush TLBs outside of the MMU lock after
write-protecting SPTEs for dirty logging. The current comment is a long
run-on sentence that was difficult to understand. In addition it was
specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP
MMU has to handle this as well.

The new comment explains:
 - Why the TLB flush is necessary at all.
 - Why it is desirable to do the TLB flush outside of the MMU lock.
 - Why it is safe to do the TLB flush outside of the MMU lock.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220113233020.3986005-5-dmatlack@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6ff94f27

KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants · 5f16bcac

由 David Matlack 提交于 3年前

SPTEs are tagged with software-only bits to indicate if it is
"MMU-writable" and "Host-writable". These bits are used to determine why
KVM has marked an SPTE as read-only.

Document these bits and their invariants, and enforce the invariants
with new WARNs in spte_can_locklessly_be_made_writable() to ensure they
are not accidentally violated in the future.

Opportunistically move DEFAULT_SPTE_{MMU,HOST}_WRITABLE next to
EPT_SPTE_{MMU,HOST}_WRITABLE since the new documentation applies to
both.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220113233020.3986005-4-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5f16bcac

KVM: x86/mmu: Clear MMU-writable during changed_pte notifier · f082d86e

由 David Matlack 提交于 3年前

When handling the changed_pte notifier and the new PTE is read-only,
clear both the Host-writable and MMU-writable bits in the SPTE. This
preserves the invariant that MMU-writable is set if-and-only-if
Host-writable is set.

No functional change intended. Nothing currently relies on the
aforementioned invariant and technically the changed_pte notifier is
dead code.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220113233020.3986005-3-dmatlack@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f082d86e

KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU · 7c8a4742

由 David Matlack 提交于 3年前

When the TDP MMU is write-protection GFNs for page table protection (as
opposed to for dirty logging, or due to the HVA not being writable), it
checks if the SPTE is already write-protected and if so skips modifying
the SPTE and the TLB flush.

This behavior is incorrect because it fails to check if the SPTE
is write-protected for page table protection, i.e. fails to check
that MMU-writable is '0'.  If the SPTE was write-protected for dirty
logging but not page table protection, the SPTE could locklessly be made
writable, and vCPUs could still be running with writable mappings cached
in their TLB.

Fix this by only skipping setting the SPTE if the SPTE is already
write-protected *and* MMU-writable is already clear.  Technically,
checking only MMU-writable would suffice; a SPTE cannot be writable
without MMU-writable being set.  But check both to be paranoid and
because it arguably yields more readable code.

Fixes: 46044f72 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU")
Cc: stable@vger.kernel.org
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220113233020.3986005-2-dmatlack@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7c8a4742

18 1月, 2022 6 次提交

KVM: x86: Making the module parameter of vPMU more common · 4732f244

由 Like Xu 提交于 3年前

The new module parameter to control PMU virtualization should apply
to Intel as well as AMD, for situations where userspace is not trusted.
If the module parameter allows PMU virtualization, there could be a
new KVM_CAP or guest CPUID bits whereby userspace can enable/disable
PMU virtualization on a per-VM basis.

If the module parameter does not allow PMU virtualization, there
should be no userspace override, since we have no precedent for
authorizing that kind of override. If it's false, other counter-based
profiling features (such as LBR including the associated CPUID bits
if any) will not be exposed.

Change its name from "pmu" to "enable_pmu" as we have temporary
variables with the same name in our code like "struct kvm_pmu *pmu".

Fixes: b1d66dad ("KVM: x86/svm: Add module param to control PMU virtualization")
Suggested-by : Jim Mattson <jmattson@google.com>
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20220111073823.21885-1-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4732f244

KVM: selftests: Test KVM_SET_CPUID2 after KVM_RUN · ecebb966

由 Vitaly Kuznetsov 提交于 3年前

KVM forbids KVM_SET_CPUID2 after KVM_RUN was performed on a vCPU unless
the supplied CPUID data is equal to what was previously set. Test this.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220117150542.2176196-5-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ecebb966

KVM: selftests: Rename 'get_cpuid_test' to 'cpuid_test' · 9e6d484f

由 Vitaly Kuznetsov 提交于 3年前

In preparation to reusing the existing 'get_cpuid_test' for testing
"KVM_SET_CPUID{,2} after KVM_RUN" rename it to 'cpuid_test' to avoid
the confusion.

No functional change intended.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220117150542.2176196-4-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9e6d484f

KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN · c6617c61

由 Vitaly Kuznetsov 提交于 3年前

Commit feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
forbade changing CPUID altogether but unfortunately this is not fully
compatible with existing VMMs. In particular, QEMU reuses vCPU fds for
CPU hotplug after unplug and it calls KVM_SET_CPUID2. Instead of full ban,
check whether the supplied CPUID data is equal to what was previously set.
Reported-by: NIgor Mammedov <imammedo@redhat.com>
Fixes: feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220117150542.2176196-3-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
[Do not call kvm_find_cpuid_entry repeatedly. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c6617c61

KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entries · ee3a5f9e

由 Vitaly Kuznetsov 提交于 3年前

kvm_update_cpuid_runtime() mangles CPUID data coming from userspace
VMM after updating 'vcpu->arch.cpuid_entries', this makes it
impossible to compare an update with what was previously
supplied. Introduce __kvm_update_cpuid_runtime() version which can be
used to tweak the input before it goes to 'vcpu->arch.cpuid_entries'
so the upcoming update check can compare tweaked data.

No functional change intended.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220117150542.2176196-2-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ee3a5f9e

KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event · a2186448

由 Like Xu 提交于 3年前

According to CPUID 0x0A.EBX bit vector, the event [7] should be the
unrealized event "Topdown Slots" instead of the *kernel* generalized
common hardware event "REF_CPU_CYCLES", so we need to skip the cpuid
unavaliblity check in the intel_pmc_perf_hw_id() for the last
REF_CPU_CYCLES event and update the confusing comment.

If the event is marked as unavailable in the Intel guest CPUID
0AH.EBX leaf, we need to avoid any perf_event creation, whether
it's a gp or fixed counter. To distinguish whether it is a rejected
event or an event that needs to be programmed with PERF_TYPE_RAW type,
a new special returned value of "PERF_COUNT_HW_MAX + 1" is introduced.

Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20220105051509.69437-1-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a2186448

15 1月, 2022 21 次提交

x86/fpu: Fix inline prefix warnings · c862dcd1

由 Yang Zhong 提交于 3年前

Fix sparse warnings in xstate and remove inline prefix.

Fixes: 980fe2fd ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions")
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Reported-by: Nkernel test robot <lkp@intel.com>
Message-Id: <20220113180825.322333-1-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c862dcd1

selftest: kvm: Add amx selftest · bf70636d

由 Yang Zhong 提交于 3年前

This selftest covers two aspects of AMX.  The first is triggering #NM
exception and checking the MSR XFD_ERR value.  The second case is
loading tile config and tile data into guest registers and trapping to
the host side for a complete save/load of the guest state.  TMM0
is also checked against memory data after save/restore.
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20211223145322.2914028-4-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bf70636d

selftest: kvm: Move struct kvm_x86_state to header · 6559b4a5

由 Yang Zhong 提交于 3年前

Those changes can avoid dereferencing pointer compile issue
when amx_test.c reference state->xsave.

Move struct kvm_x86_state definition to processor.h.
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20211223145322.2914028-3-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6559b4a5

selftest: kvm: Reorder vcpu_load_state steps for AMX · 551447cf

由 Paolo Bonzini 提交于 3年前

For AMX support it is recommended to load XCR0 after XFD, so
that KVM does not see XFD=0, XCR=1 for a save state that will
eventually be disabled (which would lead to premature allocation
of the space required for that save state).

It is also required to load XSAVE data after XCR0 and XFD, so
that KVM can trigger allocation of the extra space required to
store AMX state.

Adjust vcpu_load_state to obey these new requirements.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20211223145322.2914028-2-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

551447cf

kvm: x86: Disable interception for IA32_XFD on demand · b5274b1b

由 Kevin Tian 提交于 3年前

Always intercepting IA32_XFD causes non-negligible overhead when this
register is updated frequently in the guest.

Disable r/w emulation after intercepting the first WRMSR(IA32_XFD)
with a non-zero value.

Disable WRMSR emulation implies that IA32_XFD becomes out-of-sync
with the software states in fpstate and the per-cpu xfd cache. This
leads to two additional changes accordingly:

  - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring
    software states back in-sync with the MSR, before handle_exit_irqoff()
    is called.

  - Always trap #NM once write interception is disabled for IA32_XFD.
    The #NM exception is rare if the guest doesn't use dynamic
    features. Otherwise, there is at most one exception per guest
    task given a dynamic feature.

p.s. We have confirmed that SDM is being revised to say that
when setting IA32_XFD[18] the AMX register state is not guaranteed
to be preserved. This clarification avoids adding mess for a creative
guest which sets IA32_XFD[18]=1 before saving active AMX state to
its own storage.
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-22-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b5274b1b

x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() · 5429cead

由 Thomas Gleixner 提交于 3年前

KVM can disable the write emulation for the XFD MSR when the vCPU's fpstate
is already correctly sized to reduce the overhead.

When write emulation is disabled the XFD MSR state after a VMEXIT is
unknown and therefore not in sync with the software states in fpstate and
the per CPU XFD cache.

Provide fpu_sync_guest_vmexit_xfd_state() which has to be invoked after a
VMEXIT before enabling interrupts when write emulation is disabled for the
XFD MSR.

It could be invoked unconditionally even when write emulation is enabled
for the price of a pointless MSR read.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-21-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5429cead

kvm: selftests: Add support for KVM_CAP_XSAVE2 · 415a3c33

由 Wei Wang 提交于 3年前

When KVM_CAP_XSAVE2 is supported, userspace is expected to allocate
buffer for KVM_GET_XSAVE2 and KVM_SET_XSAVE using the size returned
by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NGuang Zeng <guang.zeng@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-20-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

415a3c33

kvm: x86: Add support for getting/setting expanded xstate buffer · be50b206

由 Guang Zeng 提交于 3年前

With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set
xstate data from/to KVM. This doesn't work when dynamic xfeatures
(e.g. AMX) are exposed to the guest as they require a larger buffer
size.

Introduce a new capability (KVM_CAP_XSAVE2). Userspace VMM gets the
required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
KVM_SET_XSAVE is extended to work with both legacy and new capabilities
by doing properly-sized memdup_user() based on the guest fpu container.
KVM_GET_XSAVE is kept for backward-compatible reason. Instead,
KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred
interface for getting xstate buffer (4KB or larger size) from KVM
(Link: https://lkml.org/lkml/2021/12/15/510)

Also, update the api doc with the new KVM_GET_XSAVE2 ioctl.
Signed-off-by: NGuang Zeng <guang.zeng@intel.com>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-19-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

be50b206

x86/fpu: Add uabi_size to guest_fpu · c60427dd

由 Thomas Gleixner 提交于 3年前

Userspace needs to inquire KVM about the buffer size to work
with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info
to guest_fpu for KVM to access.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-18-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c60427dd

kvm: x86: Add CPUID support for Intel AMX · 690a757d

由 Jing Liu 提交于 3年前

Extend CPUID emulation to support XFD, AMX_TILE, AMX_INT8 and
AMX_BF16. Adding those bits into kvm_cpu_caps finally activates all
previous logics in this series.

Hide XFD on 32bit host kernels. Otherwise it leads to a weird situation
where KVM tells userspace to migrate MSR_IA32_XFD and then rejects
attempts to read/write the MSR.
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-17-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

690a757d

kvm: x86: Add XCR0 support for Intel AMX · 86aff7a4

由 Jing Liu 提交于 3年前

Two XCR0 bits are defined for AMX to support XSAVE mechanism. Bit 17
is for tilecfg and bit 18 is for tiledata.

The value of XCR0[17:18] is always either 00b or 11b. Also, SDM
recommends that only 64-bit operating systems enable Intel AMX by
setting XCR0[18:17]. 32-bit host kernel never sets the tile bits in
vcpu->arch.guest_supported_xcr0.
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-16-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

86aff7a4

kvm: x86: Disable RDMSR interception of IA32_XFD_ERR · 61f20813

由 Jing Liu 提交于 3年前

This saves one unnecessary VM-exit in guest #NM handler, given that the
MSR is already restored with the guest value before the guest is resumed.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-15-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

61f20813

kvm: x86: Emulate IA32_XFD_ERR for guest · 548e8365

由 Jing Liu 提交于 3年前

Emulate read/write to IA32_XFD_ERR MSR.

Only the saved value in the guest_fpu container is touched in the
emulation handler. Actual MSR update is handled right before entering
the guest (with preemption disabled)
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NZeng Guang <guang.zeng@intel.com>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-14-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

548e8365

kvm: x86: Intercept #NM for saving IA32_XFD_ERR · ec5be88a

由 Jing Liu 提交于 3年前

Guest IA32_XFD_ERR is generally modified in two places:

  - Set by CPU when #NM is triggered;
  - Cleared by guest in its #NM handler;

Intercept #NM for the first case when a nonzero value is written
to IA32_XFD. Nonzero indicates that the guest is willing to do
dynamic fpstate expansion for certain xfeatures, thus KVM needs to
manage and virtualize guest XFD_ERR properly. The vcpu exception
bitmap is updated in XFD write emulation according to guest_fpu::xfd.

Save the current XFD_ERR value to the guest_fpu container in the #NM
VM-exit handler. This must be done with interrupt disabled, otherwise
the unsaved MSR value may be clobbered by host activity.

The saving operation is conducted conditionally only when guest_fpu:xfd
includes a non-zero value. Doing so also avoids misread on a platform
which doesn't support XFD but #NM is triggered due to L1 interception.

Queueing #NM to the guest is postponed to handle_exception_nmi(). This
goes through the nested_vmx check so a virtual vmexit is queued instead
when #NM is triggered in L2 but L1 wants to intercept it.

Restore the host value (always ZERO outside of the host #NM
handler) before enabling interrupt.

Restore the guest value from the guest_fpu container right before
entering the guest (with interrupt disabled).
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-13-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ec5be88a

x86/fpu: Prepare xfd_err in struct fpu_guest · 1df4fd83

由 Jing Liu 提交于 3年前

When XFD causes an instruction to generate #NM, IA32_XFD_ERR
contains information about which disabled state components are
being accessed. The #NM handler is expected to check this
information and then enable the state components by clearing
IA32_XFD for the faulting task (if having permission).

If the XFD_ERR value generated in guest is consumed/clobbered
by the host before the guest itself doing so, it may lead to
non-XFD-related #NM treated as XFD #NM in host (due to non-zero
value in XFD_ERR), or XFD-related #NM treated as non-XFD #NM in
guest (XFD_ERR cleared by the host #NM handler).

Introduce a new field in fpu_guest to save the guest xfd_err value.
KVM is expected to save guest xfd_err before interrupt is enabled
and restore it right before entering the guest (with interrupt
disabled).
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-12-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1df4fd83

kvm: x86: Add emulation for IA32_XFD · 820a6ee9

由 Jing Liu 提交于 3年前

Intel's eXtended Feature Disable (XFD) feature allows the software
to dynamically adjust fpstate buffer size for XSAVE features which
have large state.

Because guest fpstate has been expanded for all possible dynamic
xstates at KVM_SET_CPUID2, emulation of the IA32_XFD MSR is
straightforward. For write just call fpu_update_guest_xfd() to
update the guest fpu container once all the sanity checks are passed.
For read simply return the cached value in the container.
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NZeng Guang <guang.zeng@intel.com>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-11-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

820a6ee9

x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation · 8eb9a48a

由 Kevin Tian 提交于 3年前

Guest XFD can be updated either in the emulation path or in the
restore path.

Provide a wrapper to update guest_fpu::fpstate::xfd. If the guest
fpstate is currently in-use, also update the per-cpu xfd cache and
the actual MSR.
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-10-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8eb9a48a

kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 · 5ab2f45b

由 Jing Liu 提交于 3年前

KVM can request fpstate expansion in two approaches:

  1) When intercepting guest updates to XCR0 and XFD MSR;

  2) Before vcpu runs (e.g. at KVM_SET_CPUID2);

The first option doesn't waste memory for legacy guest if it doesn't
support XFD. However doing so introduces more complexity and also
imposes an order requirement in the restoring path, i.e. XCR0/XFD
must be restored before XSTATE.

Given that the agreement is to do the static approach. This is
considered a better tradeoff though it does waste 8K memory for
legacy guest if its CPUID includes dynamically-enabled xfeatures.

Successful fpstate expansion requires userspace VMM to acquire
guest xstate permissions before calling KVM_SET_CPUID2.

Also take the chance to adjust the indent in kvm_set_cpuid().
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-9-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5ab2f45b

x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM · 0781d60f

由 Sean Christopherson 提交于 3年前

Provide a wrapper for expanding the guest fpstate buffer according
to requested xfeatures. KVM wants to call this wrapper to manage
any dynamic xstate used by the guest.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220105123532.12586-8-yang.zhong@intel.com>
[Remove unnecessary 32-bit check. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0781d60f

x86/fpu: Add guest support to xfd_enable_feature() · c270ce39

由 Thomas Gleixner 提交于 3年前

Guest support for dynamically enabled FPU features requires a few
modifications to the enablement function which is currently invoked from
the #NM handler:

  1) Use guest permissions and sizes for the update

  2) Update fpu_guest state accordingly

  3) Take into account that the enabling can be triggered either from a
     running guest via XSETBV and MSR_IA32_XFD write emulation or from
     a guest restore. In the latter case the guests fpstate is not the
     current tasks active fpstate.

Split the function and implement the guest mechanics throughout the
callchain.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-7-yang.zhong@intel.com>
[Add 32-bit stub for __xfd_enable_feature. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c270ce39

x86/fpu: Make XFD initialization in __fpstate_reset() a function argument · b0237dad

由 Jing Liu 提交于 3年前

vCPU threads are different from native tasks regarding to the initial XFD
value. While all native tasks follow a fixed value (init_fpstate::xfd)
established by the FPU core at boot, vCPU threads need to obey the reset
value (i.e. ZERO) defined by the specification, to meet the expectation of
the guest.

Let the caller supply an argument and adjust the host and guest related
invocations accordingly.
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-6-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b0237dad

08 1月, 2022 4 次提交

kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID · 445ecdf7

由 Jing Liu 提交于 3年前

KVM_GET_SUPPORTED_CPUID should not include any dynamic xstates in
CPUID[0xD] if they have not been requested with prctl. Otherwise
a process which directly passes KVM_GET_SUPPORTED_CPUID to
KVM_SET_CPUID2 would now fail even if it doesn't intend to use a
dynamically enabled feature. Userspace must know that prctl is
required and allocate >4K xstate buffer before setting any dynamic
bit.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-5-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

445ecdf7

kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule · cc04b6a2

由 Jing Liu 提交于 3年前

CPUID.0xD.1.EBX enumerates the size of the XSAVE area (in compacted
format) required by XSAVES. If CPUID.0xD.i.ECX[1] is set for a state
component (i), this state component should be located on the next
64-bytes boundary following the preceding state component in the
compacted layout.

Fix xstate_required_size() to follow the alignment rule. AMX is the
first state component with 64-bytes alignment to catch this bug.
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-4-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc04b6a2

x86/fpu: Prepare guest FPU for dynamically enabled FPU features · 36487e62

由 Thomas Gleixner 提交于 3年前

To support dynamically enabled FPU features for guests prepare the guest
pseudo FPU container to keep track of the currently enabled xfeatures and
the guest permissions.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-3-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

36487e62

x86/fpu: Extend fpu_xstate_prctl() with guest permissions · 980fe2fd

由 Thomas Gleixner 提交于 3年前

KVM requires a clear separation of host user space and guest permissions
for dynamic XSTATE components.

Add a guest permissions member to struct fpu and a separate set of prctl()
arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.

The semantics are equivalent to the host user space permission control
except for the following constraints:

  1) Permissions have to be requested before the first vCPU is created

  2) Permissions are frozen when the first vCPU is created to ensure
     consistency. Any attempt to expand permissions via the prctl() after
     that point is rejected.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-2-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

980fe2fd

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功