Unverified commit 606f08e8, authored by openeuler-ci-bot, committed via Gitee

!275 Intel Advanced Matrix Extensions (AMX) - KVM support

Merge Pull Request from: @Linwang_68f8 
 
 **Content:** 
Intel® Advanced Matrix Extensions (Intel® AMX) is a new 64-bit programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays of a larger 2-dimensional memory image, and an accelerator able to operate on those tiles. The first implementation of the accelerator is called TMUL (tile matrix multiply unit).

This patch set contains 37 patches in total, introducing AMX guest support in openEuler.
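For reviewers unfamiliar with the new flow, the VMM-side enabling sequence introduced by this series is roughly: request guest AMX permission with `arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM)` before creating vCPUs, expose XTILECFG/XTILEDATA through KVM_SET_CPUID2, and size the expanded xstate buffer via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). The sketch below is not part of this patch set; it assumes uapi headers from a kernel carrying this series, and error handling plus the CPUID plumbing are omitted.

```c
/* Minimal VMM-side sketch of the AMX enabling flow (illustrative only). */
#include <fcntl.h>
#include <linux/kvm.h>
#include <asm/prctl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define XFEATURE_XTILE_DATA 18	/* AMX tile data state component number */

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm, vcpu, xsave_size;

	/* 1. Request AMX permission for guest fpstates; must happen before
	 *    the first vCPU is created, which locks guest permissions. */
	syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM, XFEATURE_XTILE_DATA);

	vm = ioctl(kvm, KVM_CREATE_VM, 0);
	vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

	/* 2. KVM_GET_SUPPORTED_CPUID now reports the requested dynamic bits
	 *    in leaf 0xD; pass them to the vCPU with KVM_SET_CPUID2
	 *    (the usual CPUID plumbing is omitted here). */

	/* 3. Size of the expanded xstate buffer used by KVM_GET_XSAVE2. */
	xsave_size = ioctl(vm, KVM_CHECK_EXTENSION, KVM_CAP_XSAVE2);

	return xsave_size >= 4096 ? 0 : 1;
}
```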

 **Intel-kernel issue:** 
https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ

 **Test environment:** 
Host: openEuler 22.09 + backported kernel
Guest: openEuler 22.09 + QEMU 7.0 + backported kernel

 **Test cases:** 
Host:
Kernel selftests, including sigaltstack and AMX state management tests.
TMUL functional testing.
AMX stress.
Context switch testing.
INT8/BF16 online inference.
Guest:
AMX stress.
Context switch testing.
INT8/BF16 online inference.
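The guest-side tests above exercise AMX permission handling and state management inside the VM. A minimal guest-side check (not part of this patch set) of the kind these tests perform before touching tile registers, using the existing host prctl commands:

```c
/* Hedged sketch: request AMX tile data permission inside the guest
 * and read it back, before any tile state is used. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define ARCH_GET_XCOMP_PERM	0x1022
#define ARCH_REQ_XCOMP_PERM	0x1023
#define XFEATURE_XTILE_DATA	18

int main(void)
{
	unsigned long long permitted = 0;

	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILE_DATA))
		return 1;	/* AMX not available or not permitted */

	syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &permitted);
	printf("XTILEDATA permitted: %s\n",
	       (permitted >> XFEATURE_XTILE_DATA) & 1 ? "yes" : "no");
	return 0;
}
```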

 **Known issue:** 
N/A

 **Default config change:** 
N/A 
 
Link: https://gitee.com/openeuler/kernel/pulls/275 
Reviewed-by: Jun Tian <jun.j.tian@intel.com> 
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
Reviewed-by: Liu Chao <liuchao173@huawei.com> 
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
@@ -1514,6 +1514,7 @@ is vcpu 0.
  struct kvm_xsave {
        __u32 region[1024];
+       __u32 extra[0];
  };
This ioctl would copy current vcpu's xsave struct to the userspace.
@@ -1522,7 +1523,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
4.43 KVM_SET_XSAVE
------------------
-:Capability: KVM_CAP_XSAVE
+:Capability: KVM_CAP_XSAVE and KVM_CAP_XSAVE2
:Architectures: x86
:Type: vcpu ioctl
:Parameters: struct kvm_xsave (in)
@@ -1533,9 +1534,18 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
  struct kvm_xsave {
        __u32 region[1024];
+       __u32 extra[0];
  };
-This ioctl would copy userspace's xsave struct to the kernel.
+This ioctl would copy userspace's xsave struct to the kernel. It copies
+as many bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2),
+when invoked on the vm file descriptor. The size value returned by
+KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will always be at least 4096.
+Currently, it is only greater than 4096 if a dynamic feature has been
+enabled with ``arch_prctl()``, but this may change in the future.
+
+The offsets of the state save areas in struct kvm_xsave follow the
+contents of CPUID leaf 0xD on the host.
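A minimal userspace sketch of the restore path described above (not from this patch set): it assumes `vm_fd` and `vcpu_fd` already exist, that `saved` holds state captured earlier, and that the uapi headers carry KVM_CAP_XSAVE2.

```c
/* Hedged sketch: restore vCPU xstate sized by KVM_CAP_XSAVE2. */
#include <linux/kvm.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

int restore_xsave(int vm_fd, int vcpu_fd, const void *saved, size_t saved_size)
{
	int size = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_XSAVE2);
	struct kvm_xsave *xs;
	int ret;

	if (size < (int)sizeof(struct kvm_xsave))
		size = sizeof(struct kvm_xsave);	/* always at least 4096 */

	xs = calloc(1, size);
	if (!xs)
		return -1;
	memcpy(xs, saved, saved_size < (size_t)size ? saved_size : (size_t)size);

	/* KVM_SET_XSAVE reads as many bytes as KVM_CAP_XSAVE2 reports. */
	ret = ioctl(vcpu_fd, KVM_SET_XSAVE, xs);
	free(xs);
	return ret;
}
```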
4.44 KVM_GET_XCRS
@@ -1632,6 +1642,10 @@ userspace capabilities, and with user requirements (for example, the
user may wish to constrain cpuid to emulate older hardware, or for
feature consistency across a cluster).
+Dynamically-enabled feature bits need to be requested with
+``arch_prctl()`` before calling this ioctl. Feature bits that have not
+been requested are excluded from the result.
Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
expose cpuid features (e.g. MONITOR) which are not supported by kvm in
its default configuration. If userspace enables such capabilities, it
@@ -3181,6 +3195,7 @@ number.
:Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
             KVM_CAP_VCPU_ATTRIBUTES for vcpu device
+            KVM_CAP_SYS_ATTRIBUTES for system (/dev/kvm) device (no set)
:Type: device ioctl, vm ioctl, vcpu ioctl
:Parameters: struct kvm_device_attr
:Returns: 0 on success, -1 on error
@@ -3215,7 +3230,8 @@ transferred is defined by the particular attribute.
------------------------
:Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
             KVM_CAP_VCPU_ATTRIBUTES for vcpu device
+            KVM_CAP_SYS_ATTRIBUTES for system (/dev/kvm) device
:Type: device ioctl, vm ioctl, vcpu ioctl
:Parameters: struct kvm_device_attr
:Returns: 0 on success, -1 on error
@@ -4979,6 +4995,33 @@ KVM does guarantee that vCPUs will see either the previous filter or the new
filter, e.g. MSRs with identical settings in both the old and new filter will
have deterministic behavior.
+4.134 KVM_GET_XSAVE2
+--------------------
+
+:Capability: KVM_CAP_XSAVE2
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_xsave (out)
+:Returns: 0 on success, -1 on error
+
+::
+
+  struct kvm_xsave {
+        __u32 region[1024];
+        __u32 extra[0];
+  };
+
+This ioctl would copy current vcpu's xsave struct to the userspace. It
+copies as many bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+when invoked on the vm file descriptor. The size value returned by
+KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will always be at least 4096.
+Currently, it is only greater than 4096 if a dynamic feature has been
+enabled with ``arch_prctl()``, but this may change in the future.
+
+The offsets of the state save areas in struct kvm_xsave follow the contents
+of CPUID leaf 0xD on the host.
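A matching save-side sketch for the new ioctl (not from this patch set; same assumptions as the restore sketch above):

```c
/* Hedged sketch: save vCPU xstate with KVM_GET_XSAVE2. */
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

void *save_xsave(int vm_fd, int vcpu_fd, int *out_size)
{
	int size = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_XSAVE2);
	struct kvm_xsave *xs;

	if (size < (int)sizeof(struct kvm_xsave))
		size = sizeof(struct kvm_xsave);

	xs = calloc(1, size);
	if (!xs)
		return NULL;
	if (ioctl(vcpu_fd, KVM_GET_XSAVE2, xs)) {
		free(xs);
		return NULL;
	}
	*out_size = size;
	return xs;	/* layout follows CPUID leaf 0xD on the host */
}
```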
5. The kvm_run structure
========================
......
@@ -111,10 +111,21 @@ static inline void fpstate_free(struct fpu *fpu) { }
/* fpstate-related functions which are exported to KVM */
extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
+extern u64 xstate_get_guest_group_perm(void);
/* KVM specific functions */
extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu);
extern void fpu_free_guest_fpstate(struct fpu_guest *gfpu);
extern int fpu_swap_kvm_fpstate(struct fpu_guest *gfpu, bool enter_guest);
+extern int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64 xfeatures);
+
+#ifdef CONFIG_X86_64
+extern void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd);
+extern void fpu_sync_guest_vmexit_xfd_state(void);
+#else
+static inline void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd) { }
+static inline void fpu_sync_guest_vmexit_xfd_state(void) { }
+#endif
+
extern void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u32 pkru);
extern int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, u64 xcr0, u32 *vpkru);
......
@@ -389,6 +389,8 @@ struct fpstate {
	/* @regs is dynamically sized! Don't add anything after @regs! */
} __aligned(64);
+#define FPU_GUEST_PERM_LOCKED		BIT_ULL(63)
+
struct fpu_state_perm {
	/*
	 * @__state_perm:
@@ -478,6 +480,13 @@ struct fpu {
	 */
	KABI_EXTEND(struct fpu_state_perm perm)
+	/*
+	 * @guest_perm:
+	 *
+	 * Permission related information for guest pseudo FPUs
+	 */
+	KABI_EXTEND(struct fpu_state_perm guest_perm)
+
	/*
	 * @__fpstate:
	 *
@@ -498,6 +507,29 @@ struct fpu {
 * Guest pseudo FPU container
 */
struct fpu_guest {
+	/*
+	 * @xfeatures:		xfeature bitmap of features which are
+	 *			currently enabled for the guest vCPU.
+	 */
+	u64			xfeatures;
+
+	/*
+	 * @perm:		xfeature bitmap of features which are
+	 *			permitted to be enabled for the guest
+	 *			vCPU.
+	 */
+	u64			perm;
+
+	/*
+	 * @xfd_err:		Save the guest value.
+	 */
+	u64			xfd_err;
+
+	/*
+	 * @uabi_size:		Size required for save/restore
+	 */
+	unsigned int		uabi_size;
+
	/*
	 * @fpstate:		Pointer to the allocated guest fpstate
	 */
......
@@ -575,6 +575,7 @@ struct kvm_vcpu_arch {
	bool at_instruction_boundary;
	bool tpr_access_reporting;
	bool xsaves_enabled;
+	bool xfd_no_write_intercept;
	u64 ia32_xss;
	u64 microcode_version;
	u64 arch_capabilities;
......
@@ -362,9 +362,23 @@ struct kvm_debugregs {
	__u64 reserved[9];
};
-/* for KVM_CAP_XSAVE */
+/* for KVM_CAP_XSAVE and KVM_CAP_XSAVE2 */
struct kvm_xsave {
+	/*
+	 * KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many bytes
+	 * as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+	 * respectively, when invoked on the vm file descriptor.
+	 *
+	 * The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
+	 * will always be at least 4096. Currently, it is only greater
+	 * than 4096 if a dynamic feature has been enabled with
+	 * ``arch_prctl()``, but this may change in the future.
+	 *
+	 * The offsets of the state save areas in struct kvm_xsave follow
+	 * the contents of CPUID leaf 0xD on the host.
+	 */
	__u32 region[1024];
+	__u32 extra[0];
};
#define KVM_MAX_XCRS	16
@@ -427,6 +441,9 @@ struct kvm_sync_regs {
#define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE	0x00000001
+/* attributes for system fd (group 0) */
+#define KVM_X86_XCOMP_GUEST_SUPP	0
+
struct kvm_vmx_nested_state_data {
	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
	__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
......
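The new system attribute defined in the hunk above can be queried before any VM exists. A hedged discovery sketch (not part of this patch set; assumes uapi headers with KVM_X86_XCOMP_GUEST_SUPP from this series):

```c
/* Hedged sketch: read the guest-supported xcomp mask from /dev/kvm. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

uint64_t guest_supported_xcr0(void)
{
	uint64_t mask = 0;
	struct kvm_device_attr attr = {
		.group = 0,
		.attr  = KVM_X86_XCOMP_GUEST_SUPP,
		.addr  = (uint64_t)(unsigned long)&mask,
	};
	int kvm = open("/dev/kvm", O_RDWR);

	if (kvm < 0)
		return 0;
	/* KVM_HAS_DEVICE_ATTR first, then KVM_GET_DEVICE_ATTR (no set). */
	if (!ioctl(kvm, KVM_HAS_DEVICE_ATTR, &attr))
		ioctl(kvm, KVM_GET_DEVICE_ATTR, &attr);
	close(kvm);
	return mask;	/* XTILECFG/XTILEDATA bits (17/18) set when AMX is usable */
}
```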
@@ -2,20 +2,22 @@
#ifndef _ASM_X86_PRCTL_H
#define _ASM_X86_PRCTL_H
#define ARCH_SET_GS			0x1001
#define ARCH_SET_FS			0x1002
#define ARCH_GET_FS			0x1003
#define ARCH_GET_GS			0x1004
#define ARCH_GET_CPUID			0x1011
#define ARCH_SET_CPUID			0x1012
#define ARCH_GET_XCOMP_SUPP		0x1021
#define ARCH_GET_XCOMP_PERM		0x1022
#define ARCH_REQ_XCOMP_PERM		0x1023
+#define ARCH_GET_XCOMP_GUEST_PERM	0x1024
+#define ARCH_REQ_XCOMP_GUEST_PERM	0x1025
#define ARCH_MAP_VDSO_X32		0x2001
#define ARCH_MAP_VDSO_32		0x2002
#define ARCH_MAP_VDSO_64		0x2003
#endif /* _ASM_X86_PRCTL_H */
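The two new commands mirror the existing ARCH_GET_XCOMP_PERM / ARCH_REQ_XCOMP_PERM pair but operate on the guest permission set used when allocating guest fpstates. A hedged sketch (not part of this patch set; assumes an <asm/prctl.h> carrying the defines above):

```c
/* Hedged sketch: request and read back guest xstate permissions. */
#include <asm/prctl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define XFEATURE_XTILE_DATA 18

int main(void)
{
	unsigned long long perm = 0;

	/* Must run before the first vCPU creation locks guest permissions. */
	syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM, XFEATURE_XTILE_DATA);
	syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &perm);

	printf("guest xstate permissions: 0x%llx\n", perm);
	return 0;
}
```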
...@@ -183,7 +183,27 @@ void fpu_reset_from_exception_fixup(void) ...@@ -183,7 +183,27 @@ void fpu_reset_from_exception_fixup(void)
} }
#if IS_ENABLED(CONFIG_KVM) #if IS_ENABLED(CONFIG_KVM)
static void __fpstate_reset(struct fpstate *fpstate); static void __fpstate_reset(struct fpstate *fpstate, u64 xfd);
static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
{
struct fpu_state_perm *fpuperm;
u64 perm;
if (!IS_ENABLED(CONFIG_X86_64))
return;
spin_lock_irq(&current->sighand->siglock);
fpuperm = &current->group_leader->thread.fpu.guest_perm;
perm = fpuperm->__state_perm;
/* First fpstate allocation locks down permissions. */
WRITE_ONCE(fpuperm->__state_perm, perm | FPU_GUEST_PERM_LOCKED);
spin_unlock_irq(&current->sighand->siglock);
gfpu->perm = perm & ~FPU_GUEST_PERM_LOCKED;
}
bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu) bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
{ {
...@@ -195,12 +215,18 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu) ...@@ -195,12 +215,18 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
if (!fpstate) if (!fpstate)
return false; return false;
__fpstate_reset(fpstate); /* Leave xfd to 0 (the reset value defined by spec) */
__fpstate_reset(fpstate, 0);
fpstate_init_user(fpstate); fpstate_init_user(fpstate);
fpstate->is_valloc = true; fpstate->is_valloc = true;
fpstate->is_guest = true; fpstate->is_guest = true;
gfpu->fpstate = fpstate; gfpu->fpstate = fpstate;
gfpu->xfeatures = fpu_user_cfg.default_features;
gfpu->perm = fpu_user_cfg.default_features;
gfpu->uabi_size = fpu_user_cfg.default_size;
fpu_init_guest_permissions(gfpu);
return true; return true;
} }
EXPORT_SYMBOL_GPL(fpu_alloc_guest_fpstate); EXPORT_SYMBOL_GPL(fpu_alloc_guest_fpstate);
...@@ -220,6 +246,64 @@ void fpu_free_guest_fpstate(struct fpu_guest *gfpu) ...@@ -220,6 +246,64 @@ void fpu_free_guest_fpstate(struct fpu_guest *gfpu)
} }
EXPORT_SYMBOL_GPL(fpu_free_guest_fpstate); EXPORT_SYMBOL_GPL(fpu_free_guest_fpstate);
/*
* fpu_enable_guest_xfd_features - Check xfeatures against guest perm and enable
* @guest_fpu: Pointer to the guest FPU container
* @xfeatures: Features requested by guest CPUID
*
* Enable all dynamic xfeatures according to guest perm and requested CPUID.
*
* Return: 0 on success, error code otherwise
*/
int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64 xfeatures)
{
lockdep_assert_preemption_enabled();
/* Nothing to do if all requested features are already enabled. */
xfeatures &= ~guest_fpu->xfeatures;
if (!xfeatures)
return 0;
return __xfd_enable_feature(xfeatures, guest_fpu);
}
EXPORT_SYMBOL_GPL(fpu_enable_guest_xfd_features);
#ifdef CONFIG_X86_64
void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
{
fpregs_lock();
guest_fpu->fpstate->xfd = xfd;
if (guest_fpu->fpstate->in_use)
xfd_update_state(guest_fpu->fpstate);
fpregs_unlock();
}
EXPORT_SYMBOL_GPL(fpu_update_guest_xfd);
/**
* fpu_sync_guest_vmexit_xfd_state - Synchronize XFD MSR and software state
*
* Must be invoked from KVM after a VMEXIT before enabling interrupts when
* XFD write emulation is disabled. This is required because the guest can
* freely modify XFD and the state at VMEXIT is not guaranteed to be the
 * same as the state on VMENTER. So software state has to be updated before
* any operation which depends on it can take place.
*
* Note: It can be invoked unconditionally even when write emulation is
* enabled for the price of a then pointless MSR read.
*/
void fpu_sync_guest_vmexit_xfd_state(void)
{
struct fpstate *fps = current->thread.fpu.fpstate;
lockdep_assert_irqs_disabled();
if (fpu_state_size_dynamic()) {
rdmsrl(MSR_IA32_XFD, fps->xfd);
__this_cpu_write(xfd_state, fps->xfd);
}
}
EXPORT_SYMBOL_GPL(fpu_sync_guest_vmexit_xfd_state);
#endif /* CONFIG_X86_64 */
int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest) int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
{ {
struct fpstate *guest_fps = guest_fpu->fpstate; struct fpstate *guest_fps = guest_fpu->fpstate;
...@@ -414,26 +498,28 @@ void fpstate_init_user(struct fpstate *fpstate) ...@@ -414,26 +498,28 @@ void fpstate_init_user(struct fpstate *fpstate)
fpstate_init_fstate(fpstate); fpstate_init_fstate(fpstate);
} }
static void __fpstate_reset(struct fpstate *fpstate) static void __fpstate_reset(struct fpstate *fpstate, u64 xfd)
{ {
/* Initialize sizes and feature masks */ /* Initialize sizes and feature masks */
fpstate->size = fpu_kernel_cfg.default_size; fpstate->size = fpu_kernel_cfg.default_size;
fpstate->user_size = fpu_user_cfg.default_size; fpstate->user_size = fpu_user_cfg.default_size;
fpstate->xfeatures = fpu_kernel_cfg.default_features; fpstate->xfeatures = fpu_kernel_cfg.default_features;
fpstate->user_xfeatures = fpu_user_cfg.default_features; fpstate->user_xfeatures = fpu_user_cfg.default_features;
fpstate->xfd = init_fpstate.xfd; fpstate->xfd = xfd;
} }
void fpstate_reset(struct fpu *fpu) void fpstate_reset(struct fpu *fpu)
{ {
/* Set the fpstate pointer to the default fpstate */ /* Set the fpstate pointer to the default fpstate */
fpu->fpstate = &fpu->__fpstate; fpu->fpstate = &fpu->__fpstate;
__fpstate_reset(fpu->fpstate); __fpstate_reset(fpu->fpstate, init_fpstate.xfd);
/* Initialize the permission related info in fpu */ /* Initialize the permission related info in fpu */
fpu->perm.__state_perm = fpu_kernel_cfg.default_features; fpu->perm.__state_perm = fpu_kernel_cfg.default_features;
fpu->perm.__state_size = fpu_kernel_cfg.default_size; fpu->perm.__state_size = fpu_kernel_cfg.default_size;
fpu->perm.__user_state_size = fpu_user_cfg.default_size; fpu->perm.__user_state_size = fpu_user_cfg.default_size;
/* Same defaults for guests */
fpu->guest_perm = fpu->perm;
} }
static inline void fpu_inherit_perms(struct fpu *dst_fpu) static inline void fpu_inherit_perms(struct fpu *dst_fpu)
...@@ -444,6 +530,7 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu) ...@@ -444,6 +530,7 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu)
spin_lock_irq(&current->sighand->siglock); spin_lock_irq(&current->sighand->siglock);
/* Fork also inherits the permissions of the parent */ /* Fork also inherits the permissions of the parent */
dst_fpu->perm = src_fpu->perm; dst_fpu->perm = src_fpu->perm;
dst_fpu->guest_perm = src_fpu->guest_perm;
spin_unlock_irq(&current->sighand->siglock); spin_unlock_irq(&current->sighand->siglock);
} }
} }
......
...@@ -1499,29 +1499,6 @@ void fpstate_free(struct fpu *fpu) ...@@ -1499,29 +1499,6 @@ void fpstate_free(struct fpu *fpu)
vfree(fpu->fpstate); vfree(fpu->fpstate);
} }
/**
* fpu_install_fpstate - Update the active fpstate in the FPU
*
* @fpu: A struct fpu * pointer
* @newfps: A struct fpstate * pointer
*
* Returns: A null pointer if the last active fpstate is the embedded
* one or the new fpstate is already installed;
* otherwise, a pointer to the old fpstate which has to
* be freed by the caller.
*/
static struct fpstate *fpu_install_fpstate(struct fpu *fpu,
struct fpstate *newfps)
{
struct fpstate *oldfps = fpu->fpstate;
if (fpu->fpstate == newfps)
return NULL;
fpu->fpstate = newfps;
return oldfps != &fpu->__fpstate ? oldfps : NULL;
}
/** /**
* fpstate_realloc - Reallocate struct fpstate for the requested new features * fpstate_realloc - Reallocate struct fpstate for the requested new features
* *
...@@ -1529,6 +1506,7 @@ static struct fpstate *fpu_install_fpstate(struct fpu *fpu, ...@@ -1529,6 +1506,7 @@ static struct fpstate *fpu_install_fpstate(struct fpu *fpu,
* of that task * of that task
* @ksize: The required size for the kernel buffer * @ksize: The required size for the kernel buffer
* @usize: The required size for user space buffers * @usize: The required size for user space buffers
* @guest_fpu: Pointer to a guest FPU container. NULL for host allocations
* *
* Note vs. vmalloc(): If the task with a vzalloc()-allocated buffer * Note vs. vmalloc(): If the task with a vzalloc()-allocated buffer
* terminates quickly, vfree()-induced IPIs may be a concern, but tasks * terminates quickly, vfree()-induced IPIs may be a concern, but tasks
...@@ -1537,13 +1515,13 @@ static struct fpstate *fpu_install_fpstate(struct fpu *fpu, ...@@ -1537,13 +1515,13 @@ static struct fpstate *fpu_install_fpstate(struct fpu *fpu,
* Returns: 0 on success, -ENOMEM on allocation error. * Returns: 0 on success, -ENOMEM on allocation error.
*/ */
static int fpstate_realloc(u64 xfeatures, unsigned int ksize, static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
unsigned int usize) unsigned int usize, struct fpu_guest *guest_fpu)
{ {
struct fpu *fpu = &current->thread.fpu; struct fpu *fpu = &current->thread.fpu;
struct fpstate *curfps, *newfps = NULL; struct fpstate *curfps, *newfps = NULL;
unsigned int fpsize; unsigned int fpsize;
bool in_use;
curfps = fpu->fpstate;
fpsize = ksize + ALIGN(offsetof(struct fpstate, regs), 64); fpsize = ksize + ALIGN(offsetof(struct fpstate, regs), 64);
newfps = vzalloc(fpsize); newfps = vzalloc(fpsize);
...@@ -1553,28 +1531,56 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize, ...@@ -1553,28 +1531,56 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
newfps->user_size = usize; newfps->user_size = usize;
newfps->is_valloc = true; newfps->is_valloc = true;
/*
* When a guest FPU is supplied, use @guest_fpu->fpstate
* as reference independent whether it is in use or not.
*/
curfps = guest_fpu ? guest_fpu->fpstate : fpu->fpstate;
/* Determine whether @curfps is the active fpstate */
in_use = fpu->fpstate == curfps;
if (guest_fpu) {
newfps->is_guest = true;
newfps->is_confidential = curfps->is_confidential;
newfps->in_use = curfps->in_use;
guest_fpu->xfeatures |= xfeatures;
guest_fpu->uabi_size = usize;
}
fpregs_lock(); fpregs_lock();
/* /*
* Ensure that the current state is in the registers before * If @curfps is in use, ensure that the current state is in the
* swapping fpstate as that might invalidate it due to layout * registers before swapping fpstate as that might invalidate it
* changes. * due to layout changes.
*/ */
if (test_thread_flag(TIF_NEED_FPU_LOAD)) if (in_use && test_thread_flag(TIF_NEED_FPU_LOAD))
fpregs_restore_userregs(); fpregs_restore_userregs();
newfps->xfeatures = curfps->xfeatures | xfeatures; newfps->xfeatures = curfps->xfeatures | xfeatures;
newfps->user_xfeatures = curfps->user_xfeatures | xfeatures; newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
newfps->xfd = curfps->xfd & ~xfeatures; newfps->xfd = curfps->xfd & ~xfeatures;
curfps = fpu_install_fpstate(fpu, newfps);
/* Do the final updates within the locked region */ /* Do the final updates within the locked region */
xstate_init_xcomp_bv(&newfps->regs.xsave, newfps->xfeatures); xstate_init_xcomp_bv(&newfps->regs.xsave, newfps->xfeatures);
xfd_update_state(newfps);
if (guest_fpu) {
guest_fpu->fpstate = newfps;
/* If curfps is active, update the FPU fpstate pointer */
if (in_use)
fpu->fpstate = newfps;
} else {
fpu->fpstate = newfps;
}
if (in_use)
xfd_update_state(fpu->fpstate);
fpregs_unlock(); fpregs_unlock();
vfree(curfps); /* Only free valloc'ed state */
if (curfps && curfps->is_valloc)
vfree(curfps);
return 0; return 0;
} }
...@@ -1595,7 +1601,7 @@ static int validate_sigaltstack(unsigned int usize) ...@@ -1595,7 +1601,7 @@ static int validate_sigaltstack(unsigned int usize)
return 0; return 0;
} }
static int __xstate_request_perm(u64 permitted, u64 requested) static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
{ {
/* /*
* This deliberately does not exclude !XSAVES as we still might * This deliberately does not exclude !XSAVES as we still might
...@@ -1605,9 +1611,10 @@ static int __xstate_request_perm(u64 permitted, u64 requested) ...@@ -1605,9 +1611,10 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
*/ */
bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES); bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
struct fpu *fpu = &current->group_leader->thread.fpu; struct fpu *fpu = &current->group_leader->thread.fpu;
struct fpu_state_perm *perm;
unsigned int ksize, usize; unsigned int ksize, usize;
u64 mask; u64 mask;
int ret; int ret = 0;
/* Check whether fully enabled */ /* Check whether fully enabled */
if ((permitted & requested) == requested) if ((permitted & requested) == requested)
...@@ -1621,15 +1628,18 @@ static int __xstate_request_perm(u64 permitted, u64 requested) ...@@ -1621,15 +1628,18 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
mask &= XFEATURE_MASK_USER_SUPPORTED; mask &= XFEATURE_MASK_USER_SUPPORTED;
usize = xstate_calculate_size(mask, false); usize = xstate_calculate_size(mask, false);
ret = validate_sigaltstack(usize); if (!guest) {
if (ret) ret = validate_sigaltstack(usize);
return ret; if (ret)
return ret;
}
perm = guest ? &fpu->guest_perm : &fpu->perm;
/* Pairs with the READ_ONCE() in xstate_get_group_perm() */ /* Pairs with the READ_ONCE() in xstate_get_group_perm() */
WRITE_ONCE(fpu->perm.__state_perm, mask); WRITE_ONCE(perm->__state_perm, mask);
/* Protected by sighand lock */ /* Protected by sighand lock */
fpu->perm.__state_size = ksize; perm->__state_size = ksize;
fpu->perm.__user_state_size = usize; perm->__user_state_size = usize;
return ret; return ret;
} }
...@@ -1640,7 +1650,7 @@ static const u64 xstate_prctl_req[XFEATURE_MAX] = { ...@@ -1640,7 +1650,7 @@ static const u64 xstate_prctl_req[XFEATURE_MAX] = {
[XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE_DATA, [XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE_DATA,
}; };
static int xstate_request_perm(unsigned long idx) static int xstate_request_perm(unsigned long idx, bool guest)
{ {
u64 permitted, requested; u64 permitted, requested;
int ret; int ret;
...@@ -1661,26 +1671,33 @@ static int xstate_request_perm(unsigned long idx) ...@@ -1661,26 +1671,33 @@ static int xstate_request_perm(unsigned long idx)
return -EOPNOTSUPP; return -EOPNOTSUPP;
/* Lockless quick check */ /* Lockless quick check */
permitted = xstate_get_host_group_perm(); permitted = xstate_get_group_perm(guest);
if ((permitted & requested) == requested) if ((permitted & requested) == requested)
return 0; return 0;
/* Protect against concurrent modifications */ /* Protect against concurrent modifications */
spin_lock_irq(&current->sighand->siglock); spin_lock_irq(&current->sighand->siglock);
permitted = xstate_get_host_group_perm(); permitted = xstate_get_group_perm(guest);
ret = __xstate_request_perm(permitted, requested);
/* First vCPU allocation locks the permissions. */
if (guest && (permitted & FPU_GUEST_PERM_LOCKED))
ret = -EBUSY;
else
ret = __xstate_request_perm(permitted, requested, guest);
spin_unlock_irq(&current->sighand->siglock); spin_unlock_irq(&current->sighand->siglock);
return ret; return ret;
} }
int xfd_enable_feature(u64 xfd_err) int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
{ {
u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC; u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;
struct fpu_state_perm *perm;
unsigned int ksize, usize; unsigned int ksize, usize;
struct fpu *fpu; struct fpu *fpu;
if (!xfd_event) { if (!xfd_event) {
pr_err_once("XFD: Invalid xfd error: %016llx\n", xfd_err); if (!guest_fpu)
pr_err_once("XFD: Invalid xfd error: %016llx\n", xfd_err);
return 0; return 0;
} }
...@@ -1688,14 +1705,16 @@ int xfd_enable_feature(u64 xfd_err) ...@@ -1688,14 +1705,16 @@ int xfd_enable_feature(u64 xfd_err)
spin_lock_irq(&current->sighand->siglock); spin_lock_irq(&current->sighand->siglock);
/* If not permitted let it die */ /* If not permitted let it die */
if ((xstate_get_host_group_perm() & xfd_event) != xfd_event) { if ((xstate_get_group_perm(!!guest_fpu) & xfd_event) != xfd_event) {
spin_unlock_irq(&current->sighand->siglock); spin_unlock_irq(&current->sighand->siglock);
return -EPERM; return -EPERM;
} }
fpu = &current->group_leader->thread.fpu; fpu = &current->group_leader->thread.fpu;
ksize = fpu->perm.__state_size; perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
usize = fpu->perm.__user_state_size; ksize = perm->__state_size;
usize = perm->__user_state_size;
/* /*
* The feature is permitted. State size is sufficient. Dropping * The feature is permitted. State size is sufficient. Dropping
* the lock is safe here even if more features are added from * the lock is safe here even if more features are added from
...@@ -1708,17 +1727,29 @@ int xfd_enable_feature(u64 xfd_err) ...@@ -1708,17 +1727,29 @@ int xfd_enable_feature(u64 xfd_err)
* Try to allocate a new fpstate. If that fails there is no way * Try to allocate a new fpstate. If that fails there is no way
* out. * out.
*/ */
if (fpstate_realloc(xfd_event, ksize, usize)) if (fpstate_realloc(xfd_event, ksize, usize, guest_fpu))
return -EFAULT; return -EFAULT;
return 0; return 0;
} }
int xfd_enable_feature(u64 xfd_err)
{
return __xfd_enable_feature(xfd_err, NULL);
}
#else /* CONFIG_X86_64 */ #else /* CONFIG_X86_64 */
static inline int xstate_request_perm(unsigned long idx) static inline int xstate_request_perm(unsigned long idx, bool guest)
{ {
return -EPERM; return -EPERM;
} }
#endif /* !CONFIG_X86_64 */ #endif /* !CONFIG_X86_64 */
u64 xstate_get_guest_group_perm(void)
{
return xstate_get_group_perm(true);
}
EXPORT_SYMBOL_GPL(xstate_get_guest_group_perm);
/** /**
* fpu_xstate_prctl - xstate permission operations * fpu_xstate_prctl - xstate permission operations
* @tsk: Redundant pointer to current * @tsk: Redundant pointer to current
...@@ -1742,6 +1773,7 @@ long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2) ...@@ -1742,6 +1773,7 @@ long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2)
u64 __user *uptr = (u64 __user *)arg2; u64 __user *uptr = (u64 __user *)arg2;
u64 permitted, supported; u64 permitted, supported;
unsigned long idx = arg2; unsigned long idx = arg2;
bool guest = false;
if (tsk != current) if (tsk != current)
return -EPERM; return -EPERM;
...@@ -1760,11 +1792,20 @@ long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2) ...@@ -1760,11 +1792,20 @@ long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2)
permitted &= XFEATURE_MASK_USER_SUPPORTED; permitted &= XFEATURE_MASK_USER_SUPPORTED;
return put_user(permitted, uptr); return put_user(permitted, uptr);
case ARCH_GET_XCOMP_GUEST_PERM:
permitted = xstate_get_guest_group_perm();
permitted &= XFEATURE_MASK_USER_SUPPORTED;
return put_user(permitted, uptr);
case ARCH_REQ_XCOMP_GUEST_PERM:
guest = true;
fallthrough;
case ARCH_REQ_XCOMP_PERM: case ARCH_REQ_XCOMP_PERM:
if (!IS_ENABLED(CONFIG_X86_64)) if (!IS_ENABLED(CONFIG_X86_64))
return -EOPNOTSUPP; return -EOPNOTSUPP;
return xstate_request_perm(idx); return xstate_request_perm(idx, guest);
default: default:
return -EINVAL; return -EINVAL;
......
@@ -20,10 +20,19 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
	xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
}
-static inline u64 xstate_get_host_group_perm(void)
+static inline u64 xstate_get_group_perm(bool guest)
{
+	struct fpu *fpu = &current->group_leader->thread.fpu;
+	struct fpu_state_perm *perm;
+
	/* Pairs with WRITE_ONCE() in xstate_request_perm() */
-	return READ_ONCE(current->group_leader->thread.fpu.perm.__state_perm);
+	perm = guest ? &fpu->guest_perm : &fpu->perm;
+	return READ_ONCE(perm->__state_perm);
+}
+
+static inline u64 xstate_get_host_group_perm(void)
+{
+	return xstate_get_group_perm(false);
}
enum xstate_copy_mode {
@@ -153,8 +162,14 @@ static inline void xfd_update_state(struct fpstate *fpstate)
		}
	}
}
+
+extern int __xfd_enable_feature(u64 which, struct fpu_guest *guest_fpu);
+
#else
static inline void xfd_update_state(struct fpstate *fpstate) { }
+
+static inline int __xfd_enable_feature(u64 which, struct fpu_guest *guest_fpu) {
+	return -EPERM;
+}
#endif
/*
......
@@ -1003,6 +1003,8 @@ long do_arch_prctl_common(struct task_struct *task, int option,
	case ARCH_GET_XCOMP_SUPP:
	case ARCH_GET_XCOMP_PERM:
	case ARCH_REQ_XCOMP_PERM:
+	case ARCH_GET_XCOMP_GUEST_PERM:
+	case ARCH_REQ_XCOMP_GUEST_PERM:
		return fpu_xstate_prctl(task, option, arg2);
	}
......
...@@ -32,7 +32,7 @@ ...@@ -32,7 +32,7 @@
u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly; u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
EXPORT_SYMBOL_GPL(kvm_cpu_caps); EXPORT_SYMBOL_GPL(kvm_cpu_caps);
static u32 xstate_required_size(u64 xstate_bv, bool compacted) u32 xstate_required_size(u64 xstate_bv, bool compacted)
{ {
int feature_bit = 0; int feature_bit = 0;
u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET; u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
...@@ -42,7 +42,11 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted) ...@@ -42,7 +42,11 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted)
if (xstate_bv & 0x1) { if (xstate_bv & 0x1) {
u32 eax, ebx, ecx, edx, offset; u32 eax, ebx, ecx, edx, offset;
cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx); cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx);
offset = compacted ? ret : ebx; /* ECX[1]: 64B alignment in compacted form */
if (compacted)
offset = (ecx & 0x2) ? ALIGN(ret, 64) : ret;
else
offset = ebx;
ret = max(ret, offset + eax); ret = max(ret, offset + eax);
} }
...@@ -73,9 +77,12 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find( ...@@ -73,9 +77,12 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
return NULL; return NULL;
} }
static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent) static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
struct kvm_cpuid_entry2 *entries,
int nent)
{ {
struct kvm_cpuid_entry2 *best; struct kvm_cpuid_entry2 *best;
u64 xfeatures;
/* /*
* The existing code assumes virtual address is 48-bit or 57-bit in the * The existing code assumes virtual address is 48-bit or 57-bit in the
...@@ -89,7 +96,20 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent) ...@@ -89,7 +96,20 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
return -EINVAL; return -EINVAL;
} }
return 0; /*
* Exposing dynamic xfeatures to the guest requires additional
* enabling in the FPU, e.g. to expand the guest XSAVE state size.
*/
best = cpuid_entry2_find(entries, nent, 0xd, 0);
if (!best)
return 0;
xfeatures = best->eax | ((u64)best->edx << 32);
xfeatures &= XFEATURE_MASK_USER_DYNAMIC;
if (!xfeatures)
return 0;
return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu, xfeatures);
} }
void kvm_update_pv_runtime(struct kvm_vcpu *vcpu) void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
...@@ -275,7 +295,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, ...@@ -275,7 +295,7 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
e2[i].padding[2] = 0; e2[i].padding[2] = 0;
} }
r = kvm_check_cpuid(e2, cpuid->nent); r = kvm_check_cpuid(vcpu, e2, cpuid->nent);
if (r) { if (r) {
kvfree(e2); kvfree(e2);
goto out_free_cpuid; goto out_free_cpuid;
...@@ -311,7 +331,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, ...@@ -311,7 +331,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
return PTR_ERR(e2); return PTR_ERR(e2);
} }
r = kvm_check_cpuid(e2, cpuid->nent); r = kvm_check_cpuid(vcpu, e2, cpuid->nent);
if (r) { if (r) {
kvfree(e2); kvfree(e2);
return r; return r;
...@@ -388,9 +408,11 @@ void kvm_set_cpu_caps(void) ...@@ -388,9 +408,11 @@ void kvm_set_cpu_caps(void)
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
unsigned int f_gbpages = F(GBPAGES); unsigned int f_gbpages = F(GBPAGES);
unsigned int f_lm = F(LM); unsigned int f_lm = F(LM);
unsigned int f_xfd = F(XFD);
#else #else
unsigned int f_gbpages = 0; unsigned int f_gbpages = 0;
unsigned int f_lm = 0; unsigned int f_lm = 0;
unsigned int f_xfd = 0;
#endif #endif
memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps)); memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
...@@ -458,7 +480,8 @@ void kvm_set_cpu_caps(void) ...@@ -458,7 +480,8 @@ void kvm_set_cpu_caps(void)
F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) | F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) | F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16)
); );
/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */ /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
...@@ -477,7 +500,7 @@ void kvm_set_cpu_caps(void) ...@@ -477,7 +500,7 @@ void kvm_set_cpu_caps(void)
); );
kvm_cpu_cap_mask(CPUID_D_1_EAX, kvm_cpu_cap_mask(CPUID_D_1_EAX,
F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
); );
kvm_cpu_cap_init_scattered(CPUID_12_EAX, kvm_cpu_cap_init_scattered(CPUID_12_EAX,
...@@ -583,6 +606,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array, ...@@ -583,6 +606,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
case 0x14: case 0x14:
case 0x17: case 0x17:
case 0x18: case 0x18:
case 0x1d:
case 0x1e:
case 0x1f: case 0x1f:
case 0x8000001d: case 0x8000001d:
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
...@@ -761,12 +786,15 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) ...@@ -761,12 +786,15 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
goto out; goto out;
} }
break; break;
case 0xd: case 0xd: {
entry->eax &= supported_xcr0; u64 permitted_xcr0 = supported_xcr0 & xstate_get_guest_group_perm();
entry->ebx = xstate_required_size(supported_xcr0, false); u64 permitted_xss = supported_xss;
entry->eax &= permitted_xcr0;
entry->ebx = xstate_required_size(permitted_xcr0, false);
entry->ecx = entry->ebx; entry->ecx = entry->ebx;
entry->edx &= supported_xcr0 >> 32; entry->edx &= permitted_xcr0 >> 32;
if (!supported_xcr0) if (!permitted_xcr0)
break; break;
entry = do_host_cpuid(array, function, 1); entry = do_host_cpuid(array, function, 1);
...@@ -775,20 +803,20 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) ...@@ -775,20 +803,20 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
cpuid_entry_override(entry, CPUID_D_1_EAX); cpuid_entry_override(entry, CPUID_D_1_EAX);
if (entry->eax & (F(XSAVES)|F(XSAVEC))) if (entry->eax & (F(XSAVES)|F(XSAVEC)))
entry->ebx = xstate_required_size(supported_xcr0 | supported_xss, entry->ebx = xstate_required_size(permitted_xcr0 | permitted_xss,
true); true);
else { else {
WARN_ON_ONCE(supported_xss != 0); WARN_ON_ONCE(permitted_xss != 0);
entry->ebx = 0; entry->ebx = 0;
} }
entry->ecx &= supported_xss; entry->ecx &= permitted_xss;
entry->edx &= supported_xss >> 32; entry->edx &= permitted_xss >> 32;
for (i = 2; i < 64; ++i) { for (i = 2; i < 64; ++i) {
bool s_state; bool s_state;
if (supported_xcr0 & BIT_ULL(i)) if (permitted_xcr0 & BIT_ULL(i))
s_state = false; s_state = false;
else if (supported_xss & BIT_ULL(i)) else if (permitted_xss & BIT_ULL(i))
s_state = true; s_state = true;
else else
continue; continue;
...@@ -802,16 +830,20 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) ...@@ -802,16 +830,20 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
* invalid sub-leafs. Only valid sub-leafs should * invalid sub-leafs. Only valid sub-leafs should
* reach this point, and they should have a non-zero * reach this point, and they should have a non-zero
* save state size. Furthermore, check whether the * save state size. Furthermore, check whether the
* processor agrees with supported_xcr0/supported_xss * processor agrees with permitted_xcr0/permitted_xss
* on whether this is an XCR0- or IA32_XSS-managed area. * on whether this is an XCR0- or IA32_XSS-managed area.
*/ */
if (WARN_ON_ONCE(!entry->eax || (entry->ecx & 0x1) != s_state)) { if (WARN_ON_ONCE(!entry->eax || (entry->ecx & 0x1) != s_state)) {
--array->nent; --array->nent;
continue; continue;
} }
if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
entry->ecx &= ~BIT_ULL(2);
entry->edx = 0; entry->edx = 0;
} }
break; break;
}
case 0x12: case 0x12:
/* Intel SGX */ /* Intel SGX */
if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) { if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) {
...@@ -856,6 +888,24 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) ...@@ -856,6 +888,24 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
goto out; goto out;
} }
break; break;
/* Intel AMX TILE */
case 0x1d:
if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
break;
}
for (i = 1, max_idx = entry->eax; i <= max_idx; ++i) {
if (!do_host_cpuid(array, function, i))
goto out;
}
break;
case 0x1e: /* TMUL information */
if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
break;
}
break;
case KVM_CPUID_SIGNATURE: { case KVM_CPUID_SIGNATURE: {
static const char signature[12] = "KVMKVMKVM\0\0"; static const char signature[12] = "KVMKVMKVM\0\0";
const u32 *sigptr = (const u32 *)signature; const u32 *sigptr = (const u32 *)signature;
......
@@ -47,6 +47,8 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
	       u32 *ecx, u32 *edx, bool exact_only);
+u32 xstate_required_size(u64 xstate_bv, bool compacted);
+
int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
......
@@ -133,6 +133,11 @@ static inline bool is_machine_check(u32 intr_info)
	return is_exception_n(intr_info, MC_VECTOR);
}
+static inline bool is_nm_fault(u32 intr_info)
+{
+	return is_exception_n(intr_info, NM_VECTOR);
+}
+
/* Undocumented: icebp/int1 */
static inline bool is_icebp(u32 intr_info)
{
......
...@@ -36,6 +36,7 @@ ...@@ -36,6 +36,7 @@
#include <asm/debugreg.h> #include <asm/debugreg.h>
#include <asm/desc.h> #include <asm/desc.h>
#include <asm/fpu/api.h> #include <asm/fpu/api.h>
#include <asm/fpu/xstate.h>
#include <asm/idtentry.h> #include <asm/idtentry.h>
#include <asm/io.h> #include <asm/io.h>
#include <asm/irq_remapping.h> #include <asm/irq_remapping.h>
...@@ -165,6 +166,8 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = { ...@@ -165,6 +166,8 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
MSR_FS_BASE, MSR_FS_BASE,
MSR_GS_BASE, MSR_GS_BASE,
MSR_KERNEL_GS_BASE, MSR_KERNEL_GS_BASE,
MSR_IA32_XFD,
MSR_IA32_XFD_ERR,
#endif #endif
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_CS,
MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_ESP,
...@@ -922,6 +925,14 @@ void update_exception_bitmap(struct kvm_vcpu *vcpu) ...@@ -922,6 +925,14 @@ void update_exception_bitmap(struct kvm_vcpu *vcpu)
vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, mask); vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, mask);
} }
/*
* Disabling xfd interception indicates that dynamic xfeatures
* might be used in the guest. Always trap #NM in this case
* to save guest xfd_err timely.
*/
if (vcpu->arch.xfd_no_write_intercept)
eb |= (1u << NM_VECTOR);
vmcs_write32(EXCEPTION_BITMAP, eb); vmcs_write32(EXCEPTION_BITMAP, eb);
} }
...@@ -2130,6 +2141,24 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) ...@@ -2130,6 +2141,24 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_KERNEL_GS_BASE: case MSR_KERNEL_GS_BASE:
vmx_write_guest_kernel_gs_base(vmx, data); vmx_write_guest_kernel_gs_base(vmx, data);
break; break;
case MSR_IA32_XFD:
ret = kvm_set_msr_common(vcpu, msr_info);
/*
* Always intercepting WRMSR could incur non-negligible
* overhead given xfd might be changed frequently in
* guest context switch. Disable write interception
* upon the first write with a non-zero value (indicating
* potential usage on dynamic xfeatures). Also update
* exception bitmap to trap #NM for proper virtualization
* of guest xfd_err.
*/
if (!ret && data) {
vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD,
MSR_TYPE_RW);
vcpu->arch.xfd_no_write_intercept = true;
update_exception_bitmap(vcpu);
}
break;
#endif #endif
case MSR_IA32_SYSENTER_CS: case MSR_IA32_SYSENTER_CS:
if (is_guest_mode(vcpu)) if (is_guest_mode(vcpu))
...@@ -5076,6 +5105,17 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) ...@@ -5076,6 +5105,17 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
if (is_machine_check(intr_info) || is_nmi(intr_info)) if (is_machine_check(intr_info) || is_nmi(intr_info))
return 1; /* handled by handle_exception_nmi_irqoff() */ return 1; /* handled by handle_exception_nmi_irqoff() */
/*
* Queue the exception here instead of in handle_nm_fault_irqoff().
* This ensures the nested_vmx check is not skipped so vmexit can
* be reflected to L1 (when it intercepts #NM) before reaching this
* point.
*/
if (is_nm_fault(intr_info)) {
kvm_queue_exception(vcpu, NM_VECTOR);
return 1;
}
if (is_invalid_opcode(intr_info)) if (is_invalid_opcode(intr_info))
return handle_ud(vcpu); return handle_ud(vcpu);
...@@ -6759,6 +6799,26 @@ static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, ...@@ -6759,6 +6799,26 @@ static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
kvm_after_interrupt(vcpu); kvm_after_interrupt(vcpu);
} }
static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
{
/*
* Save xfd_err to guest_fpu before interrupt is enabled, so the
* MSR value is not clobbered by the host activity before the guest
* has chance to consume it.
*
* Do not blindly read xfd_err here, since this exception might
* be caused by L1 interception on a platform which doesn't
* support xfd at all.
*
* Do it conditionally upon guest_fpu::xfd. xfd_err matters
* only when xfd contains a non-zero value.
*
* Queuing exception is done in vmx_handle_exit. See comment there.
*/
if (vcpu->arch.guest_fpu.fpstate->xfd)
rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
}
static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
{ {
const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist; const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
...@@ -6767,6 +6827,9 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) ...@@ -6767,6 +6827,9 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
/* if exit due to PF check for async PF */ /* if exit due to PF check for async PF */
if (is_page_fault(intr_info)) if (is_page_fault(intr_info))
vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();
/* if exit due to NM, handle before interrupts are enabled */
else if (is_nm_fault(intr_info))
handle_nm_fault_irqoff(&vmx->vcpu);
/* Handle machine checks before interrupts are enabled */ /* Handle machine checks before interrupts are enabled */
else if (is_machine_check(intr_info)) else if (is_machine_check(intr_info))
kvm_machine_check(); kvm_machine_check();
...@@ -7690,6 +7753,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) ...@@ -7690,6 +7753,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
} }
} }
if (kvm_cpu_cap_has(X86_FEATURE_XFD))
vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
!guest_cpuid_has(vcpu, X86_FEATURE_XFD));
set_cr4_guest_host_mask(vmx); set_cr4_guest_host_mask(vmx);
vmx_write_encls_bitmap(vcpu, NULL); vmx_write_encls_bitmap(vcpu, NULL);
......
@@ -336,7 +336,7 @@ struct vcpu_vmx {
	struct lbr_desc lbr_desc;
	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	13
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS	15
	struct {
		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
......
...@@ -88,6 +88,8 @@ ...@@ -88,6 +88,8 @@
u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P; u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;
EXPORT_SYMBOL_GPL(kvm_mce_cap_supported); EXPORT_SYMBOL_GPL(kvm_mce_cap_supported);
#define ERR_PTR_USR(e) ((void __user *)ERR_PTR(e))
#define emul_to_vcpu(ctxt) \ #define emul_to_vcpu(ctxt) \
((struct kvm_vcpu *)(ctxt)->vcpu) ((struct kvm_vcpu *)(ctxt)->vcpu)
...@@ -199,7 +201,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs; ...@@ -199,7 +201,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
#define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \ #define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \ | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \ | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU) | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
u64 __read_mostly host_efer; u64 __read_mostly host_efer;
EXPORT_SYMBOL_GPL(host_efer); EXPORT_SYMBOL_GPL(host_efer);
...@@ -1020,6 +1022,11 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) ...@@ -1020,6 +1022,11 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512) if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
return 1; return 1;
} }
if ((xcr0 & XFEATURE_MASK_XTILE) &&
((xcr0 & XFEATURE_MASK_XTILE) != XFEATURE_MASK_XTILE))
return 1;
vcpu->arch.xcr0 = xcr0; vcpu->arch.xcr0 = xcr0;
if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND) if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND)
...@@ -1322,6 +1329,7 @@ static const u32 msrs_to_save_all[] = { ...@@ -1322,6 +1329,7 @@ static const u32 msrs_to_save_all[] = {
MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5, MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2, MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5, MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
MSR_IA32_XFD, MSR_IA32_XFD_ERR,
}; };
static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)]; static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
...@@ -3415,6 +3423,30 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) ...@@ -3415,6 +3423,30 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1; return 1;
vcpu->arch.msr_misc_features_enables = data; vcpu->arch.msr_misc_features_enables = data;
break; break;
#ifdef CONFIG_X86_64
case MSR_IA32_XFD:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
return 1;
if (data & ~(XFEATURE_MASK_USER_DYNAMIC &
vcpu->arch.guest_supported_xcr0))
return 1;
fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data);
break;
case MSR_IA32_XFD_ERR:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
return 1;
if (data & ~(XFEATURE_MASK_USER_DYNAMIC &
vcpu->arch.guest_supported_xcr0))
return 1;
vcpu->arch.guest_fpu.xfd_err = data;
break;
#endif
default: default:
if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr)) if (msr && (msr == vcpu->kvm->arch.xen_hvm_config.msr))
return xen_hvm_config(vcpu, data); return xen_hvm_config(vcpu, data);
...@@ -3724,6 +3756,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) ...@@ -3724,6 +3756,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_K7_HWCR: case MSR_K7_HWCR:
msr_info->data = vcpu->arch.msr_hwcr; msr_info->data = vcpu->arch.msr_hwcr;
break; break;
#ifdef CONFIG_X86_64
case MSR_IA32_XFD:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
return 1;
msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
break;
case MSR_IA32_XFD_ERR:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
return 1;
msr_info->data = vcpu->arch.guest_fpu.xfd_err;
break;
#endif
default: default:
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info); return kvm_pmu_get_msr(vcpu, msr_info);
...@@ -3870,6 +3918,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) ...@@ -3870,6 +3918,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
#ifdef CONFIG_X86_SGX_KVM #ifdef CONFIG_X86_SGX_KVM
case KVM_CAP_SGX_ATTRIBUTE: case KVM_CAP_SGX_ATTRIBUTE:
#endif #endif
case KVM_CAP_SYS_ATTRIBUTES:
r = 1; r = 1;
break; break;
case KVM_CAP_SYNC_REGS: case KVM_CAP_SYNC_REGS:
...@@ -3945,6 +3994,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) ...@@ -3945,6 +3994,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
else else
r = 0; r = 0;
break; break;
case KVM_CAP_XSAVE2: {
u64 guest_perm = xstate_get_guest_group_perm();
r = xstate_required_size(supported_xcr0 & guest_perm, false);
if (r < sizeof(struct kvm_xsave))
r = sizeof(struct kvm_xsave);
break;
}
case KVM_CAP_X86_NOTIFY_VMEXIT: case KVM_CAP_X86_NOTIFY_VMEXIT:
r = kvm_has_notify_vmexit; r = kvm_has_notify_vmexit;
break; break;
...@@ -3952,7 +4009,49 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) ...@@ -3952,7 +4009,49 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break; break;
} }
return r; return r;
}
static inline void __user *kvm_get_attr_addr(struct kvm_device_attr *attr)
{
void __user *uaddr = (void __user*)(unsigned long)attr->addr;
if ((u64)(unsigned long)uaddr != attr->addr)
return ERR_PTR_USR(-EFAULT);
return uaddr;
}
static int kvm_x86_dev_get_attr(struct kvm_device_attr *attr)
{
u64 __user *uaddr = kvm_get_attr_addr(attr);
if (attr->group)
return -ENXIO;
if (IS_ERR(uaddr))
return PTR_ERR(uaddr);
switch (attr->attr) {
case KVM_X86_XCOMP_GUEST_SUPP:
if (put_user(supported_xcr0, uaddr))
return -EFAULT;
return 0;
default:
return -ENXIO;
break;
}
}
static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
{
if (attr->group)
return -ENXIO;
switch (attr->attr) {
case KVM_X86_XCOMP_GUEST_SUPP:
return 0;
default:
return -ENXIO;
}
} }
long kvm_arch_dev_ioctl(struct file *filp, long kvm_arch_dev_ioctl(struct file *filp,
...@@ -4040,6 +4139,22 @@ long kvm_arch_dev_ioctl(struct file *filp, ...@@ -4040,6 +4139,22 @@ long kvm_arch_dev_ioctl(struct file *filp,
case KVM_GET_MSRS: case KVM_GET_MSRS:
r = msr_io(NULL, argp, do_get_msr_feature, 1); r = msr_io(NULL, argp, do_get_msr_feature, 1);
break; break;
case KVM_GET_DEVICE_ATTR: {
struct kvm_device_attr attr;
r = -EFAULT;
if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
break;
r = kvm_x86_dev_get_attr(&attr);
break;
}
case KVM_HAS_DEVICE_ATTR: {
struct kvm_device_attr attr;
r = -EFAULT;
if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
break;
r = kvm_x86_dev_has_attr(&attr);
break;
}
default: default:
r = -EINVAL; r = -EINVAL;
break; break;
...@@ -4588,6 +4703,16 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, ...@@ -4588,6 +4703,16 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
vcpu->arch.pkru); vcpu->arch.pkru);
} }
static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
u8 *state, unsigned int size)
{
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return;
fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu,
state, size, vcpu->arch.pkru);
}
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave) struct kvm_xsave *guest_xsave)
{ {
...@@ -4907,6 +5032,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp, ...@@ -4907,6 +5032,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break; break;
} }
case KVM_GET_XSAVE: { case KVM_GET_XSAVE: {
r = -EINVAL;
if (vcpu->arch.guest_fpu.uabi_size > sizeof(struct kvm_xsave))
break;
u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL_ACCOUNT); u.xsave = kzalloc(sizeof(struct kvm_xsave), GFP_KERNEL_ACCOUNT);
r = -ENOMEM; r = -ENOMEM;
if (!u.xsave) if (!u.xsave)
...@@ -4921,7 +5050,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, ...@@ -4921,7 +5050,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break; break;
} }
case KVM_SET_XSAVE: { case KVM_SET_XSAVE: {
u.xsave = memdup_user(argp, sizeof(*u.xsave)); int size = vcpu->arch.guest_fpu.uabi_size;
u.xsave = memdup_user(argp, size);
if (IS_ERR(u.xsave)) { if (IS_ERR(u.xsave)) {
r = PTR_ERR(u.xsave); r = PTR_ERR(u.xsave);
goto out_nofree; goto out_nofree;
...@@ -4930,6 +5061,25 @@ long kvm_arch_vcpu_ioctl(struct file *filp, ...@@ -4930,6 +5061,25 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave); r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave);
break; break;
} }
case KVM_GET_XSAVE2: {
int size = vcpu->arch.guest_fpu.uabi_size;
u.xsave = kzalloc(size, GFP_KERNEL_ACCOUNT);
r = -ENOMEM;
if (!u.xsave)
break;
kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
r = -EFAULT;
if (copy_to_user(argp, u.xsave, size))
break;
r = 0;
break;
}
case KVM_GET_XCRS: { case KVM_GET_XCRS: {
u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL_ACCOUNT); u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL_ACCOUNT);
r = -ENOMEM; r = -ENOMEM;
...@@ -5923,6 +6073,11 @@ static void kvm_init_msr_list(void) ...@@ -5923,6 +6073,11 @@ static void kvm_init_msr_list(void)
min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp)) min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
continue; continue;
break; break;
case MSR_IA32_XFD:
case MSR_IA32_XFD_ERR:
if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
continue;
break;
default: default:
break; break;
} }
...@@ -9158,6 +9313,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) ...@@ -9158,6 +9313,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (test_thread_flag(TIF_NEED_FPU_LOAD)) if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return(); switch_fpu_return();
if (vcpu->arch.guest_fpu.xfd_err)
wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
if (unlikely(vcpu->arch.switch_db_regs)) { if (unlikely(vcpu->arch.switch_db_regs)) {
set_debugreg(0, 7); set_debugreg(0, 7);
set_debugreg(vcpu->arch.eff_db[0], 0); set_debugreg(vcpu->arch.eff_db[0], 0);
...@@ -9202,8 +9360,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) ...@@ -9202,8 +9360,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->mode = OUTSIDE_GUEST_MODE; vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb(); smp_wmb();
/*
* Sync xfd before calling handle_exit_irqoff() which may
* rely on the fact that guest_fpu::xfd is up-to-date (e.g.
* in #NM irqoff handler).
*/
if (vcpu->arch.xfd_no_write_intercept)
fpu_sync_guest_vmexit_xfd_state();
kvm_x86_ops.handle_exit_irqoff(vcpu); kvm_x86_ops.handle_exit_irqoff(vcpu);
if (vcpu->arch.guest_fpu.xfd_err)
wrmsrl(MSR_IA32_XFD_ERR, 0);
/* /*
* Consume any pending interrupts, including the possible source of * Consume any pending interrupts, including the possible source of
* VM-Exit on SVM and any ticks that occur between VM-Exit and now. * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
......
...@@ -1070,6 +1070,8 @@ struct kvm_ppc_resize_hpt { ...@@ -1070,6 +1070,8 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
#define KVM_CAP_X86_BUS_LOCK_EXIT 193 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
#define KVM_CAP_SGX_ATTRIBUTE 196 #define KVM_CAP_SGX_ATTRIBUTE 196
#define KVM_CAP_XSAVE2 208
#define KVM_CAP_SYS_ATTRIBUTES 209
#define KVM_CAP_X86_TRIPLE_FAULT_EVENT 218 #define KVM_CAP_X86_TRIPLE_FAULT_EVENT 218
#define KVM_CAP_X86_NOTIFY_VMEXIT 219 #define KVM_CAP_X86_NOTIFY_VMEXIT 219
...@@ -1558,6 +1560,9 @@ struct kvm_enc_region { ...@@ -1558,6 +1560,9 @@ struct kvm_enc_region {
#define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3) #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3)
#define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4) #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4)
/* Available with KVM_CAP_XSAVE2 */
#define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave)
struct kvm_s390_pv_sec_parm { struct kvm_s390_pv_sec_parm {
__u64 origin; __u64 origin;
__u64 length; __u64 length;
......
...@@ -659,6 +659,8 @@ ...@@ -659,6 +659,8 @@
#define MSR_IA32_BNDCFGS_RSVD 0x00000ffc #define MSR_IA32_BNDCFGS_RSVD 0x00000ffc
#define MSR_IA32_XFD 0x000001c4
#define MSR_IA32_XFD_ERR 0x000001c5
#define MSR_IA32_XSS 0x00000da0 #define MSR_IA32_XSS 0x00000da0
#define MSR_IA32_APICBASE 0x0000001b #define MSR_IA32_APICBASE 0x0000001b
......
...@@ -358,9 +358,23 @@ struct kvm_debugregs { ...@@ -358,9 +358,23 @@ struct kvm_debugregs {
__u64 reserved[9]; __u64 reserved[9];
}; };
/* for KVM_CAP_XSAVE */ /* for KVM_CAP_XSAVE and KVM_CAP_XSAVE2 */
struct kvm_xsave { struct kvm_xsave {
/*
* KVM_GET_XSAVE2 and KVM_SET_XSAVE write and read as many bytes
* as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
* respectively, when invoked on the vm file descriptor.
*
* The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
* will always be at least 4096. Currently, it is only greater
* than 4096 if a dynamic feature has been enabled with
* ``arch_prctl()``, but this may change in the future.
*
* The offsets of the state save areas in struct kvm_xsave follow
* the contents of CPUID leaf 0xD on the host.
*/
__u32 region[1024]; __u32 region[1024];
__u32 extra[0];
}; };
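For illustration, a minimal userspace sketch of the sizing contract described in the comment above (not part of this patch set; vm_fd and vcpu_fd stand for already-opened KVM VM and vCPU file descriptors, <sys/ioctl.h>, <stdlib.h> and <linux/kvm.h> are assumed to be included, and error handling is omitted):

	int size = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_XSAVE2);
	if (size < (int)sizeof(struct kvm_xsave))
		size = sizeof(struct kvm_xsave);	/* never smaller than 4096 */
	struct kvm_xsave *xs = calloc(1, size);
	ioctl(vcpu_fd, KVM_GET_XSAVE2, xs);	/* fills 'size' bytes of xstate */
	ioctl(vcpu_fd, KVM_SET_XSAVE, xs);	/* consumes the same number of bytes */
	free(xs);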
#define KVM_MAX_XCRS 16 #define KVM_MAX_XCRS 16
...@@ -423,6 +437,9 @@ struct kvm_sync_regs { ...@@ -423,6 +437,9 @@ struct kvm_sync_regs {
#define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001 #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001
/* attributes for system fd (group 0) */
#define KVM_X86_XCOMP_GUEST_SUPP 0
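As a hedged example, the new system attribute can be queried from userspace roughly as follows (a sketch only, not part of this patch set; kvm_fd is assumed to be an open /dev/kvm descriptor and <stdio.h>, <stdint.h>, <sys/ioctl.h> and <linux/kvm.h> are included):

	uint64_t xcomp_supp = 0;
	struct kvm_device_attr attr = {
		.group = 0,				/* system-fd attribute group 0 */
		.attr  = KVM_X86_XCOMP_GUEST_SUPP,
		.addr  = (unsigned long)&xcomp_supp,
	};
	if (ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, &attr) == 0)
		printf("guest-supported xstate components: 0x%llx\n",
		       (unsigned long long)xcomp_supp);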
struct kvm_vmx_nested_state_data { struct kvm_vmx_nested_state_data {
__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; __u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; __u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
......
...@@ -2,16 +2,22 @@ ...@@ -2,16 +2,22 @@
#ifndef _ASM_X86_PRCTL_H #ifndef _ASM_X86_PRCTL_H
#define _ASM_X86_PRCTL_H #define _ASM_X86_PRCTL_H
#define ARCH_SET_GS 0x1001 #define ARCH_SET_GS 0x1001
#define ARCH_SET_FS 0x1002 #define ARCH_SET_FS 0x1002
#define ARCH_GET_FS 0x1003 #define ARCH_GET_FS 0x1003
#define ARCH_GET_GS 0x1004 #define ARCH_GET_GS 0x1004
#define ARCH_GET_CPUID 0x1011 #define ARCH_GET_CPUID 0x1011
#define ARCH_SET_CPUID 0x1012 #define ARCH_SET_CPUID 0x1012
#define ARCH_MAP_VDSO_X32 0x2001 #define ARCH_GET_XCOMP_SUPP 0x1021
#define ARCH_MAP_VDSO_32 0x2002 #define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_MAP_VDSO_64 0x2003 #define ARCH_REQ_XCOMP_PERM 0x1023
#define ARCH_GET_XCOMP_GUEST_PERM 0x1024
#define ARCH_REQ_XCOMP_GUEST_PERM 0x1025
#define ARCH_MAP_VDSO_X32 0x2001
#define ARCH_MAP_VDSO_32 0x2002
#define ARCH_MAP_VDSO_64 0x2003
#endif /* _ASM_X86_PRCTL_H */ #endif /* _ASM_X86_PRCTL_H */
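The two new GUEST_PERM requests are meant to be issued before the first vCPU is created; a minimal sketch of that flow, mirroring what vm_xsave_req_perm() does in the selftest library further below (illustrative only; assumes <sys/syscall.h> and <unistd.h> are included and that bit 18 is XTILEDATA):

	unsigned long bitmask = 0;
	long rc;

	rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM, 18 /* XTILEDATA */);
	if (!rc) {
		syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
		/* bitmask should now have bit 18 set, before any KVM_CREATE_VCPU */
	}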
...@@ -1053,6 +1053,8 @@ struct kvm_ppc_resize_hpt { ...@@ -1053,6 +1053,8 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_X86_USER_SPACE_MSR 188 #define KVM_CAP_X86_USER_SPACE_MSR 188
#define KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_X86_MSR_FILTER 189
#define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
#define KVM_CAP_XSAVE2 208
#define KVM_CAP_SYS_ATTRIBUTES 209
#ifdef KVM_CAP_IRQ_ROUTING #ifdef KVM_CAP_IRQ_ROUTING
...@@ -1462,6 +1464,8 @@ struct kvm_s390_ucas_mapping { ...@@ -1462,6 +1464,8 @@ struct kvm_s390_ucas_mapping {
/* Available with KVM_CAP_XSAVE */ /* Available with KVM_CAP_XSAVE */
#define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave) #define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave)
#define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave) #define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave)
/* Available with KVM_CAP_XSAVE2 */
#define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave)
/* Available with KVM_CAP_XCRS */ /* Available with KVM_CAP_XCRS */
#define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs) #define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs)
#define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs) #define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs)
......
...@@ -61,6 +61,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test ...@@ -61,6 +61,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
TEST_GEN_PROGS_x86_64 += x86_64/user_msr_test TEST_GEN_PROGS_x86_64 += x86_64/user_msr_test
TEST_GEN_PROGS_x86_64 += x86_64/max_vcpuid_cap_test TEST_GEN_PROGS_x86_64 += x86_64/max_vcpuid_cap_test
TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test TEST_GEN_PROGS_x86_64 += x86_64/triple_fault_event_test
TEST_GEN_PROGS_x86_64 += x86_64/amx_test
TEST_GEN_PROGS_x86_64 += demand_paging_test TEST_GEN_PROGS_x86_64 += demand_paging_test
TEST_GEN_PROGS_x86_64 += dirty_log_test TEST_GEN_PROGS_x86_64 += dirty_log_test
TEST_GEN_PROGS_x86_64 += dirty_log_perf_test TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
......
...@@ -10,8 +10,10 @@ ...@@ -10,8 +10,10 @@
#include <assert.h> #include <assert.h>
#include <stdint.h> #include <stdint.h>
#include <syscall.h>
#include <asm/msr-index.h> #include <asm/msr-index.h>
#include <asm/prctl.h>
#define X86_EFLAGS_FIXED (1u << 1) #define X86_EFLAGS_FIXED (1u << 1)
...@@ -72,6 +74,21 @@ struct desc_ptr { ...@@ -72,6 +74,21 @@ struct desc_ptr {
uint64_t address; uint64_t address;
} __attribute__((packed)); } __attribute__((packed));
struct kvm_x86_state {
struct kvm_xsave *xsave;
struct kvm_vcpu_events events;
struct kvm_mp_state mp_state;
struct kvm_regs regs;
struct kvm_xcrs xcrs;
struct kvm_sregs sregs;
struct kvm_debugregs debugregs;
union {
struct kvm_nested_state nested;
char nested_[16384];
};
struct kvm_msrs msrs;
};
static inline uint64_t get_desc64_base(const struct desc64 *desc) static inline uint64_t get_desc64_base(const struct desc64 *desc)
{ {
return ((uint64_t)desc->base3 << 32) | return ((uint64_t)desc->base3 << 32) |
...@@ -315,10 +332,10 @@ static inline unsigned long get_xmm(int n) ...@@ -315,10 +332,10 @@ static inline unsigned long get_xmm(int n)
bool is_intel_cpu(void); bool is_intel_cpu(void);
struct kvm_x86_state;
struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid); struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid);
void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_x86_state *state); struct kvm_x86_state *state);
void kvm_x86_state_cleanup(struct kvm_x86_state *state);
struct kvm_msr_list *kvm_get_msr_index_list(void); struct kvm_msr_list *kvm_get_msr_index_list(void);
...@@ -374,6 +391,8 @@ bool set_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 *ent); ...@@ -374,6 +391,8 @@ bool set_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 *ent);
uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
uint64_t a3); uint64_t a3);
void vm_xsave_req_perm(int bit);
/* /*
* Basic CPU control in CR0 * Basic CPU control in CR0
*/ */
...@@ -419,4 +438,11 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2, ...@@ -419,4 +438,11 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
/* VMX_EPT_VPID_CAP bits */ /* VMX_EPT_VPID_CAP bits */
#define VMX_EPT_VPID_CAP_AD_BITS (1ULL << 21) #define VMX_EPT_VPID_CAP_AD_BITS (1ULL << 21)
#define XSTATE_XTILE_CFG_BIT 17
#define XSTATE_XTILE_DATA_BIT 18
#define XSTATE_XTILE_CFG_MASK (1ULL << XSTATE_XTILE_CFG_BIT)
#define XSTATE_XTILE_DATA_MASK (1ULL << XSTATE_XTILE_DATA_BIT)
#define XFEATURE_XTILE_MASK (XSTATE_XTILE_CFG_MASK | \
XSTATE_XTILE_DATA_MASK)
#endif /* SELFTEST_KVM_PROCESSOR_H */ #endif /* SELFTEST_KVM_PROCESSOR_H */
...@@ -66,15 +66,15 @@ int kvm_check_cap(long cap) ...@@ -66,15 +66,15 @@ int kvm_check_cap(long cap)
/* VM Check Capability /* VM Check Capability
* *
* Input Args: * Input Args:
* vm - Virtual Machine * vm - Virtual Machine
* cap - Capability * cap - Capability
* *
* Output Args: None * Output Args: None
* *
* Return: * Return:
* On success, the Value corresponding to the capability (KVM_CAP_*) * On success, the Value corresponding to the capability (KVM_CAP_*)
* specified by the value of cap. On failure a TEST_ASSERT failure * specified by the value of cap. On failure a TEST_ASSERT failure
* is produced. * is produced.
* *
* Looks up and returns the value corresponding to the capability * Looks up and returns the value corresponding to the capability
* (KVM_CAP_*) given by cap. * (KVM_CAP_*) given by cap.
...@@ -85,7 +85,7 @@ int vm_check_cap(struct kvm_vm *vm, long cap) ...@@ -85,7 +85,7 @@ int vm_check_cap(struct kvm_vm *vm, long cap)
ret = ioctl(vm->fd, KVM_CHECK_EXTENSION, cap); ret = ioctl(vm->fd, KVM_CHECK_EXTENSION, cap);
TEST_ASSERT(ret >= 0, "KVM_CHECK_EXTENSION VM IOCTL failed,\n" TEST_ASSERT(ret >= 0, "KVM_CHECK_EXTENSION VM IOCTL failed,\n"
" rc: %i errno: %i", ret, errno); " rc: %i errno: %i", ret, errno);
return ret; return ret;
} }
......
...@@ -580,6 +580,68 @@ static void vcpu_setup(struct kvm_vm *vm, int vcpuid, int pgd_memslot, int gdt_m ...@@ -580,6 +580,68 @@ static void vcpu_setup(struct kvm_vm *vm, int vcpuid, int pgd_memslot, int gdt_m
vcpu_sregs_set(vm, vcpuid, &sregs); vcpu_sregs_set(vm, vcpuid, &sregs);
} }
#define CPUID_XFD_BIT (1 << 4)
static bool is_xfd_supported(void)
{
int eax, ebx, ecx, edx;
const int leaf = 0xd, subleaf = 0x1;
__asm__ __volatile__(
"cpuid"
: /* output */ "=a"(eax), "=b"(ebx),
"=c"(ecx), "=d"(edx)
: /* input */ "0"(leaf), "2"(subleaf));
return !!(eax & CPUID_XFD_BIT);
}
void vm_xsave_req_perm(int bit)
{
int kvm_fd;
u64 bitmask;
long rc;
struct kvm_device_attr attr = {
.group = 0,
.attr = KVM_X86_XCOMP_GUEST_SUPP,
.addr = (unsigned long) &bitmask
};
kvm_fd = open(KVM_DEV_PATH, O_RDONLY);
if (kvm_fd < 0) {
print_skip("%s not available, is KVM loaded? (errno: %d)",
KVM_DEV_PATH, errno);
exit(KSFT_SKIP);
}
rc = ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, &attr);
close(kvm_fd);
if (rc == -1 && (errno == ENXIO || errno == EINVAL))
exit(KSFT_SKIP);
TEST_ASSERT(rc == 0, "KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) error: %ld", rc);
if (!(bitmask & (1ULL << bit)))
exit(KSFT_SKIP);
if (!is_xfd_supported())
exit(KSFT_SKIP);
rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM, bit);
/*
 * Kernels older than v5.15 do not support ARCH_REQ_XCOMP_GUEST_PERM;
 * if the request fails, return without checking the permission bitmask.
 */
if (rc)
return;
rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &bitmask);
TEST_ASSERT(rc == 0, "prctl(ARCH_GET_XCOMP_GUEST_PERM) error: %ld", rc);
TEST_ASSERT(bitmask & (1ULL << bit),
"prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure bitmask=0x%lx",
bitmask);
}
void vm_vcpu_add_default(struct kvm_vm *vm, uint32_t vcpuid, void *guest_code) void vm_vcpu_add_default(struct kvm_vm *vm, uint32_t vcpuid, void *guest_code)
{ {
struct kvm_mp_state mp_state; struct kvm_mp_state mp_state;
...@@ -903,21 +965,6 @@ void vcpu_dump(FILE *stream, struct kvm_vm *vm, uint32_t vcpuid, uint8_t indent) ...@@ -903,21 +965,6 @@ void vcpu_dump(FILE *stream, struct kvm_vm *vm, uint32_t vcpuid, uint8_t indent)
sregs_dump(stream, &sregs, indent + 4); sregs_dump(stream, &sregs, indent + 4);
} }
struct kvm_x86_state {
struct kvm_vcpu_events events;
struct kvm_mp_state mp_state;
struct kvm_regs regs;
struct kvm_xsave xsave;
struct kvm_xcrs xcrs;
struct kvm_sregs sregs;
struct kvm_debugregs debugregs;
union {
struct kvm_nested_state nested;
char nested_[16384];
};
struct kvm_msrs msrs;
};
static int kvm_get_num_msrs_fd(int kvm_fd) static int kvm_get_num_msrs_fd(int kvm_fd)
{ {
struct kvm_msr_list nmsrs; struct kvm_msr_list nmsrs;
...@@ -957,6 +1004,22 @@ struct kvm_msr_list *kvm_get_msr_index_list(void) ...@@ -957,6 +1004,22 @@ struct kvm_msr_list *kvm_get_msr_index_list(void)
return list; return list;
} }
static int vcpu_save_xsave_state(struct kvm_vm *vm, struct vcpu *vcpu,
struct kvm_x86_state *state)
{
int size;
size = vm_check_cap(vm, KVM_CAP_XSAVE2);
if (!size)
size = sizeof(struct kvm_xsave);
state->xsave = malloc(size);
if (size == sizeof(struct kvm_xsave))
return ioctl(vcpu->fd, KVM_GET_XSAVE, state->xsave);
else
return ioctl(vcpu->fd, KVM_GET_XSAVE2, state->xsave);
}
struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid) struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid)
{ {
struct vcpu *vcpu = vcpu_find(vm, vcpuid); struct vcpu *vcpu = vcpu_find(vm, vcpuid);
...@@ -1000,7 +1063,7 @@ struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid) ...@@ -1000,7 +1063,7 @@ struct kvm_x86_state *vcpu_save_state(struct kvm_vm *vm, uint32_t vcpuid)
TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_REGS, r: %i", TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_REGS, r: %i",
r); r);
r = ioctl(vcpu->fd, KVM_GET_XSAVE, &state->xsave); r = vcpu_save_xsave_state(vm, vcpu, state);
TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_XSAVE, r: %i", TEST_ASSERT(r == 0, "Unexpected result from KVM_GET_XSAVE, r: %i",
r); r);
...@@ -1045,24 +1108,25 @@ void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_x86_state *s ...@@ -1045,24 +1108,25 @@ void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_x86_state *s
struct vcpu *vcpu = vcpu_find(vm, vcpuid); struct vcpu *vcpu = vcpu_find(vm, vcpuid);
int r; int r;
r = ioctl(vcpu->fd, KVM_SET_XSAVE, &state->xsave); r = ioctl(vcpu->fd, KVM_SET_SREGS, &state->sregs);
TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XSAVE, r: %i", TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_SREGS, r: %i",
r); r);
r = ioctl(vcpu->fd, KVM_SET_MSRS, &state->msrs);
TEST_ASSERT(r == state->msrs.nmsrs,
"Unexpected result from KVM_SET_MSRS, r: %i (failed at %x)",
r, r == state->msrs.nmsrs ? -1 : state->msrs.entries[r].index);
if (kvm_check_cap(KVM_CAP_XCRS)) { if (kvm_check_cap(KVM_CAP_XCRS)) {
r = ioctl(vcpu->fd, KVM_SET_XCRS, &state->xcrs); r = ioctl(vcpu->fd, KVM_SET_XCRS, &state->xcrs);
TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XCRS, r: %i", TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XCRS, r: %i",
r); r);
} }
r = ioctl(vcpu->fd, KVM_SET_SREGS, &state->sregs); r = ioctl(vcpu->fd, KVM_SET_XSAVE, state->xsave);
TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_SREGS, r: %i", TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_XSAVE, r: %i",
r); r);
r = ioctl(vcpu->fd, KVM_SET_MSRS, &state->msrs);
TEST_ASSERT(r == state->msrs.nmsrs, "Unexpected result from KVM_SET_MSRS, r: %i (failed at %x)",
r, r == state->msrs.nmsrs ? -1 : state->msrs.entries[r].index);
r = ioctl(vcpu->fd, KVM_SET_VCPU_EVENTS, &state->events); r = ioctl(vcpu->fd, KVM_SET_VCPU_EVENTS, &state->events);
TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_VCPU_EVENTS, r: %i", TEST_ASSERT(r == 0, "Unexpected result from KVM_SET_VCPU_EVENTS, r: %i",
r); r);
...@@ -1086,6 +1150,12 @@ void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_x86_state *s ...@@ -1086,6 +1150,12 @@ void vcpu_load_state(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_x86_state *s
} }
} }
void kvm_x86_state_cleanup(struct kvm_x86_state *state)
{
free(state->xsave);
free(state);
}
bool is_intel_cpu(void) bool is_intel_cpu(void)
{ {
int eax, ebx, ecx, edx; int eax, ebx, ecx, edx;
......
// SPDX-License-Identifier: GPL-2.0-only
/*
* amx tests
*
* Copyright (C) 2021, Intel, Inc.
*
* Tests for amx #NM exception and save/restore.
*/
#define _GNU_SOURCE /* for program_invocation_short_name */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
#include "vmx.h"
#ifndef __x86_64__
# error This test is 64-bit only
#endif
#define VCPU_ID 0
#define X86_FEATURE_XSAVE (1 << 26)
#define X86_FEATURE_OSXSAVE (1 << 27)
#define PAGE_SIZE (1 << 12)
#define NUM_TILES 8
#define TILE_SIZE 1024
#define XSAVE_SIZE ((NUM_TILES * TILE_SIZE) + PAGE_SIZE)
/* Tile configuration constants: */
#define MAX_TILES 16
#define RESERVED_BYTES 14
#define XFEATURE_XTILECFG 17
#define XFEATURE_XTILEDATA 18
#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
#define XFEATURE_MASK_XTILEDATA (1 << XFEATURE_XTILEDATA)
#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
#define TILE_CPUID 0x1d
#define XSTATE_CPUID 0xd
#define TILE_PALETTE_CPUID_SUBLEAVE 0x1
#define XSTATE_USER_STATE_SUBLEAVE 0x0
#define XSAVE_HDR_OFFSET 512
struct xsave_data {
u8 area[XSAVE_SIZE];
} __aligned(64);
struct tile_config {
u8 palette_id;
u8 start_row;
u8 reserved[RESERVED_BYTES];
u16 colsb[MAX_TILES];
u8 rows[MAX_TILES];
};
struct tile_data {
u8 data[NUM_TILES * TILE_SIZE];
};
struct xtile_info {
u16 bytes_per_tile;
u16 bytes_per_row;
u16 max_names;
u16 max_rows;
u32 xsave_offset;
u32 xsave_size;
};
static struct xtile_info xtile;
static inline u64 __xgetbv(u32 index)
{
u32 eax, edx;
asm volatile("xgetbv;"
: "=a" (eax), "=d" (edx)
: "c" (index));
return eax + ((u64)edx << 32);
}
static inline void __xsetbv(u32 index, u64 value)
{
u32 eax = value;
u32 edx = value >> 32;
asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index));
}
/* ldtilecfg [rax]: load the tile configuration (encoded as raw opcode bytes) */
static inline void __ldtilecfg(void *cfg)
{
	asm volatile(".byte 0xc4,0xe2,0x78,0x49,0x00"
		     : : "a"(cfg));
}
/* tileloadd tmm0, [rax+rdx]: load tile data into register TMM0 */
static inline void __tileloadd(void *tile)
{
	asm volatile(".byte 0xc4,0xe2,0x7b,0x4b,0x04,0x10"
		     : : "a"(tile), "d"(0));
}
/* tilerelease: return all tile registers to the init state */
static inline void __tilerelease(void)
{
	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0" ::);
}
static inline void __xsavec(struct xsave_data *data, uint64_t rfbm)
{
uint32_t rfbm_lo = rfbm;
uint32_t rfbm_hi = rfbm >> 32;
asm volatile("xsavec (%%rdi)"
: : "D" (data), "a" (rfbm_lo), "d" (rfbm_hi)
: "memory");
}
static inline void cpuid(uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
{
/* ecx is often an input as well as an output. */
asm volatile("cpuid"
: "=a" (*eax),
"=b" (*ebx),
"=c" (*ecx),
"=d" (*edx)
: "0" (*eax), "2" (*ecx)
: "memory");
}
static inline void check_cpuid_xsave(void)
{
uint32_t eax, ebx, ecx, edx;
eax = 1;
ecx = 0;
cpuid(&eax, &ebx, &ecx, &edx);
if (!(ecx & X86_FEATURE_XSAVE))
GUEST_ASSERT(!"cpuid: no CPU xsave support!");
if (!(ecx & X86_FEATURE_OSXSAVE))
GUEST_ASSERT(!"cpuid: no OS xsave support!");
}
static bool check_xsave_supports_xtile(void)
{
return __xgetbv(0) & XFEATURE_MASK_XTILE;
}
static bool enum_xtile_config(void)
{
u32 eax, ebx, ecx, edx;
eax = TILE_CPUID;
ecx = TILE_PALETTE_CPUID_SUBLEAVE;
cpuid(&eax, &ebx, &ecx, &edx);
if (!eax || !ebx || !ecx)
return false;
xtile.max_names = ebx >> 16;
if (xtile.max_names < NUM_TILES)
return false;
xtile.bytes_per_tile = eax >> 16;
if (xtile.bytes_per_tile < TILE_SIZE)
return false;
xtile.bytes_per_row = ebx;
xtile.max_rows = ecx;
return true;
}
static bool enum_xsave_tile(void)
{
u32 eax, ebx, ecx, edx;
eax = XSTATE_CPUID;
ecx = XFEATURE_XTILEDATA;
cpuid(&eax, &ebx, &ecx, &edx);
if (!eax || !ebx)
return false;
xtile.xsave_offset = ebx;
xtile.xsave_size = eax;
return true;
}
static bool check_xsave_size(void)
{
u32 eax, ebx, ecx, edx;
bool valid = false;
eax = XSTATE_CPUID;
ecx = XSTATE_USER_STATE_SUBLEAVE;
cpuid(&eax, &ebx, &ecx, &edx);
if (ebx && ebx <= XSAVE_SIZE)
valid = true;
return valid;
}
static bool check_xtile_info(void)
{
bool ret = false;
if (!check_xsave_size())
return ret;
if (!enum_xsave_tile())
return ret;
if (!enum_xtile_config())
return ret;
if (sizeof(struct tile_data) >= xtile.xsave_size)
ret = true;
return ret;
}
static void set_tilecfg(struct tile_config *cfg)
{
int i;
/* Only palette id 1 */
cfg->palette_id = 1;
for (i = 0; i < xtile.max_names; i++) {
cfg->colsb[i] = xtile.bytes_per_row;
cfg->rows[i] = xtile.max_rows;
}
}
static void set_xstatebv(void *data, uint64_t bv)
{
*(uint64_t *)(data + XSAVE_HDR_OFFSET) = bv;
}
static u64 get_xstatebv(void *data)
{
return *(u64 *)(data + XSAVE_HDR_OFFSET);
}
static void init_regs(void)
{
uint64_t cr4, xcr0;
/* turn on CR4.OSXSAVE */
cr4 = get_cr4();
cr4 |= X86_CR4_OSXSAVE;
set_cr4(cr4);
xcr0 = __xgetbv(0);
xcr0 |= XFEATURE_MASK_XTILE;
__xsetbv(0x0, xcr0);
}
static void __attribute__((__flatten__)) guest_code(struct tile_config *amx_cfg,
struct tile_data *tiledata,
struct xsave_data *xsave_data)
{
init_regs();
check_cpuid_xsave();
GUEST_ASSERT(check_xsave_supports_xtile());
GUEST_ASSERT(check_xtile_info());
/* check xtile configs */
GUEST_ASSERT(xtile.xsave_offset == 2816);
GUEST_ASSERT(xtile.xsave_size == 8192);
GUEST_ASSERT(xtile.max_names == 8);
GUEST_ASSERT(xtile.bytes_per_tile == 1024);
GUEST_ASSERT(xtile.bytes_per_row == 64);
GUEST_ASSERT(xtile.max_rows == 16);
GUEST_SYNC(1);
/* xfd=0, enable amx */
wrmsr(MSR_IA32_XFD, 0);
GUEST_SYNC(2);
GUEST_ASSERT(rdmsr(MSR_IA32_XFD) == 0);
set_tilecfg(amx_cfg);
__ldtilecfg(amx_cfg);
GUEST_SYNC(3);
/* Check save/restore when trap to userspace */
__tileloadd(tiledata);
GUEST_SYNC(4);
__tilerelease();
GUEST_SYNC(5);
/* XTILEDATA is back in init state after tilerelease, so xsavec() leaves bit 18 clear in XSTATE_BV */
set_xstatebv(xsave_data, XFEATURE_MASK_XTILEDATA);
__xsavec(xsave_data, XFEATURE_MASK_XTILEDATA);
GUEST_ASSERT((get_xstatebv(xsave_data) & XFEATURE_MASK_XTILEDATA) == 0);
/* xfd=0x40000, disable amx tiledata */
wrmsr(MSR_IA32_XFD, XFEATURE_MASK_XTILEDATA);
GUEST_SYNC(6);
GUEST_ASSERT(rdmsr(MSR_IA32_XFD) == XFEATURE_MASK_XTILEDATA);
set_tilecfg(amx_cfg);
__ldtilecfg(amx_cfg);
/* Trigger #NM exception */
__tileloadd(tiledata);
GUEST_SYNC(10);
GUEST_DONE();
}
void guest_nm_handler(struct ex_regs *regs)
{
/* Check if #NM is triggered by XFEATURE_MASK_XTILEDATA */
GUEST_SYNC(7);
GUEST_ASSERT(rdmsr(MSR_IA32_XFD_ERR) == XFEATURE_MASK_XTILEDATA);
GUEST_SYNC(8);
GUEST_ASSERT(rdmsr(MSR_IA32_XFD_ERR) == XFEATURE_MASK_XTILEDATA);
/* Clear xfd_err */
wrmsr(MSR_IA32_XFD_ERR, 0);
/* xfd=0, enable amx */
wrmsr(MSR_IA32_XFD, 0);
GUEST_SYNC(9);
}
int main(int argc, char *argv[])
{
struct kvm_cpuid_entry2 *entry;
struct kvm_regs regs1, regs2;
bool amx_supported = false;
struct kvm_vm *vm;
struct kvm_run *run;
struct kvm_x86_state *state;
int xsave_restore_size = 0;
vm_vaddr_t amx_cfg, tiledata, xsavedata;
struct ucall uc;
u32 amx_offset;
int stage, ret;
vm_xsave_req_perm(XSTATE_XTILE_DATA_BIT);
/* Create VM */
vm = vm_create_default(VCPU_ID, 0, guest_code);
entry = kvm_get_supported_cpuid_entry(1);
if (!(entry->ecx & X86_FEATURE_XSAVE)) {
print_skip("XSAVE feature not supported");
exit(KSFT_SKIP);
}
if (kvm_get_cpuid_max_basic() >= 0xd) {
entry = kvm_get_supported_cpuid_index(0xd, 0);
amx_supported = entry && !!(entry->eax & XFEATURE_MASK_XTILE);
if (!amx_supported) {
print_skip("AMX is not supported by the vCPU (eax=0x%x)", entry->eax);
exit(KSFT_SKIP);
}
/* Get xsave/restore max size */
xsave_restore_size = entry->ecx;
}
run = vcpu_state(vm, VCPU_ID);
vcpu_regs_get(vm, VCPU_ID, &regs1);
/* Register #NM handler */
vm_init_descriptor_tables(vm);
vcpu_init_descriptor_tables(vm, VCPU_ID);
vm_handle_exception(vm, NM_VECTOR, guest_nm_handler);
/* amx cfg for guest_code */
amx_cfg = vm_vaddr_alloc(vm, getpagesize(), KVM_UTIL_MIN_VADDR, 0, 0);
memset(addr_gva2hva(vm, amx_cfg), 0x0, getpagesize());
/* amx tiledata for guest_code */
tiledata = vm_vaddr_alloc(vm, 2 * getpagesize(), KVM_UTIL_MIN_VADDR, 0, 0);
memset(addr_gva2hva(vm, tiledata), rand() | 1, 2 * getpagesize());
/* xsave data for guest_code */
xsavedata = vm_vaddr_alloc(vm, 3 * getpagesize(), KVM_UTIL_MIN_VADDR, 0, 0);
memset(addr_gva2hva(vm, xsavedata), 0, 3 * getpagesize());
vcpu_args_set(vm, VCPU_ID, 3, amx_cfg, tiledata, xsavedata);
for (stage = 1; ; stage++) {
_vcpu_run(vm, VCPU_ID);
TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
"Stage %d: unexpected exit reason: %u (%s),\n",
stage, run->exit_reason,
exit_reason_str(run->exit_reason));
switch (get_ucall(vm, VCPU_ID, &uc)) {
case UCALL_ABORT:
TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
__FILE__, uc.args[1]);
/* NOT REACHED */
case UCALL_SYNC:
switch (uc.args[1]) {
case 1:
case 2:
case 3:
case 5:
case 6:
case 7:
case 8:
fprintf(stderr, "GUEST_SYNC(%ld)\n", uc.args[1]);
break;
case 4:
case 10:
fprintf(stderr,
"GUEST_SYNC(%ld), check save/restore status\n", uc.args[1]);
/*
 * Compacted format: the AMX tile data is the last component,
 * so its offset is the xsave area size minus the 8K of tile data.
 */
amx_offset = xsave_restore_size - NUM_TILES*TILE_SIZE;
state = vcpu_save_state(vm, VCPU_ID);
void *amx_start = (void *)state->xsave + amx_offset;
void *tiles_data = (void *)addr_gva2hva(vm, tiledata);
/* Only check TMM0 register, 1 tile */
ret = memcmp(amx_start, tiles_data, TILE_SIZE);
TEST_ASSERT(ret == 0, "memcmp failed, ret=%d\n", ret);
kvm_x86_state_cleanup(state);
break;
case 9:
fprintf(stderr,
"GUEST_SYNC(%ld), #NM exception and enable amx\n", uc.args[1]);
break;
}
break;
case UCALL_DONE:
fprintf(stderr, "UCALL_DONE\n");
goto done;
default:
TEST_FAIL("Unknown ucall %lu", uc.cmd);
}
state = vcpu_save_state(vm, VCPU_ID);
memset(&regs1, 0, sizeof(regs1));
vcpu_regs_get(vm, VCPU_ID, &regs1);
kvm_vm_release(vm);
/* Restore state in a new VM. */
kvm_vm_restart(vm, O_RDWR);
vm_vcpu_add(vm, VCPU_ID);
vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
vcpu_load_state(vm, VCPU_ID, state);
run = vcpu_state(vm, VCPU_ID);
kvm_x86_state_cleanup(state);
memset(&regs2, 0, sizeof(regs2));
vcpu_regs_get(vm, VCPU_ID, &regs2);
TEST_ASSERT(!memcmp(&regs1, &regs2, sizeof(regs2)),
"Unexpected register values after vcpu_load_state; rdi: %lx rsi: %lx",
(ulong) regs2.rdi, (ulong) regs2.rsi);
}
done:
kvm_vm_free(vm);
}
...@@ -148,7 +148,7 @@ int main(int argc, char *argv[]) ...@@ -148,7 +148,7 @@ int main(int argc, char *argv[])
vcpu_enable_evmcs(vm, VCPU_ID); vcpu_enable_evmcs(vm, VCPU_ID);
vcpu_load_state(vm, VCPU_ID, state); vcpu_load_state(vm, VCPU_ID, state);
run = vcpu_state(vm, VCPU_ID); run = vcpu_state(vm, VCPU_ID);
free(state); kvm_x86_state_cleanup(state);
memset(&regs2, 0, sizeof(regs2)); memset(&regs2, 0, sizeof(regs2));
vcpu_regs_get(vm, VCPU_ID, &regs2); vcpu_regs_get(vm, VCPU_ID, &regs2);
......
...@@ -156,7 +156,7 @@ int main(int argc, char *argv[]) ...@@ -156,7 +156,7 @@ int main(int argc, char *argv[])
vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
vcpu_load_state(vm, VCPU_ID, state); vcpu_load_state(vm, VCPU_ID, state);
run = vcpu_state(vm, VCPU_ID); run = vcpu_state(vm, VCPU_ID);
free(state); kvm_x86_state_cleanup(state);
} }
done: done:
......
...@@ -219,7 +219,7 @@ int main(int argc, char *argv[]) ...@@ -219,7 +219,7 @@ int main(int argc, char *argv[])
vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
vcpu_load_state(vm, VCPU_ID, state); vcpu_load_state(vm, VCPU_ID, state);
run = vcpu_state(vm, VCPU_ID); run = vcpu_state(vm, VCPU_ID);
free(state); kvm_x86_state_cleanup(state);
memset(&regs2, 0, sizeof(regs2)); memset(&regs2, 0, sizeof(regs2));
vcpu_regs_get(vm, VCPU_ID, &regs2); vcpu_regs_get(vm, VCPU_ID, &regs2);
......
...@@ -245,7 +245,7 @@ int main(int argc, char *argv[]) ...@@ -245,7 +245,7 @@ int main(int argc, char *argv[])
vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
vcpu_load_state(vm, VCPU_ID, state); vcpu_load_state(vm, VCPU_ID, state);
run = vcpu_state(vm, VCPU_ID); run = vcpu_state(vm, VCPU_ID);
free(state); kvm_x86_state_cleanup(state);
memset(&regs2, 0, sizeof(regs2)); memset(&regs2, 0, sizeof(regs2));
vcpu_regs_get(vm, VCPU_ID, &regs2); vcpu_regs_get(vm, VCPU_ID, &regs2);
......