Unverified commit c19d496b, authored by openeuler-ci-bot, committed by Gitee

!58 Intel Advanced Matrix Extensions (AMX) support on SPR

Merge Pull Request from: @Linwang_68f8 
 
 **Title: Intel Advanced Matrix Extensions (AMX) support on SPR** 

 **Content:** 
Intel® Advanced Matrix Extensions (Intel® AMX) is a new 64-bit programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and an accelerator able to operate on tiles. The first implementation of this accelerator is called TMUL (tile matrix multiply unit).
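
For illustration only (this snippet is not part of the patch set): one TMUL operation expressed through the compiler intrinsics, assuming a toolchain that provides the AMX intrinsics (`-mamx-tile -mamx-int8`) and a process that has already been granted AMX permission via `arch_prctl(ARCH_REQ_XCOMP_PERM, ...)` as described in the new xstate.rst documentation below.

```c
#include <immintrin.h>
#include <stdint.h>

/*
 * 64-byte tile configuration layout as described in the Intel SDM:
 * palette 1, per-tile rows and bytes-per-row.
 */
struct tile_config {
    uint8_t  palette_id;
    uint8_t  start_row;
    uint8_t  reserved[14];
    uint16_t colsb[16];
    uint8_t  rows[16];
};

/* C (16x16 int32) += A (16x64 int8) * B (16x64 int8, 4-element interleaved) */
static void tmul_int8(const int8_t *a, const int8_t *b, int32_t *c)
{
    struct tile_config cfg = { .palette_id = 1 };

    cfg.rows[0] = 16; cfg.colsb[0] = 64;    /* tmm0: accumulator */
    cfg.rows[1] = 16; cfg.colsb[1] = 64;    /* tmm1: A */
    cfg.rows[2] = 16; cfg.colsb[2] = 64;    /* tmm2: B */
    _tile_loadconfig(&cfg);

    _tile_loadd(1, a, 64);      /* load A, 64-byte row stride */
    _tile_loadd(2, b, 64);      /* load B */
    _tile_zero(0);              /* clear the accumulator tile */
    _tile_dpbssd(0, 1, 2);      /* int8 dot-product accumulate */
    _tile_stored(0, c, 64);     /* write the int32 results */
    _tile_release();
}
```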

This patch set consists of 188 patches that enable native AMX support in openEuler.

 **Intel-kernel issue:** 
#I590ZC:SPR: Advanced Matrix Extensions (AMX)

 **Test:** 


- Kernel self-tests, including sigaltstack and AMX state management tests.
- TMUL functional testing.
- AMX stress testing.
- Context switch testing.
- oneDNN/benchdnn.
- INT8/BF16 online inference.


 **Known issue:** 
N/A

 **Default config change:** 



```
@@ -479,6 +494,7 @@ CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
+# CONFIG_STRICT_SIGALTSTACK_SIZE is not set
CONFIG_HAVE_LIVEPATCH_FTRACE=y
CONFIG_HAVE_LIVEPATCH_WO_FTRACE=y

@@ -845,6 +861,7 @@ CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
+CONFIG_DYNAMIC_SIGFRAME=y
```


 
 
Link: https://gitee.com/openeuler/kernel/pulls/58
Reviewed-by: Liu Chao <liuchao173@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
@@ -5384,6 +5384,15 @@
 	stifb=		[HW]
 			Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
+	strict_sas_size=
+			[X86]
+			Format: <bool>
+			Enable or disable strict sigaltstack size checks
+			against the required signal frame size which
+			depends on the supported FPU features. This can
+			be used to filter out binaries which have
+			not yet been made aware of AT_MINSIGSTKSZ.
 	sunrpc.min_resvport=
 	sunrpc.max_resvport=
 			[NFS,SUNRPC]
......
.. SPDX-License-Identifier: GPL-2.0
==================================
x86-specific ELF Auxiliary Vectors
==================================
This document describes the semantics of the x86 auxiliary vectors.
Introduction
============
ELF Auxiliary vectors enable the kernel to efficiently provide
configuration-specific parameters to userspace. In this example, a program
allocates an alternate stack based on the kernel-provided size::
#include <sys/auxv.h>
#include <elf.h>
#include <signal.h>
#include <stdlib.h>
#include <assert.h>
#include <err.h>
#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51
#endif
....
stack_t ss;
ss.ss_size = getauxval(AT_MINSIGSTKSZ) + SIGSTKSZ;
ss.ss_sp = malloc(ss.ss_size);
assert(ss.ss_sp);
ss.ss_flags = 0;
if (sigaltstack(&ss, NULL))
err(1, "sigaltstack");
The exposed auxiliary vectors
=============================
AT_SYSINFO is used for locating the vsyscall entry point. It is not
exported on 64-bit mode.
AT_SYSINFO_EHDR is the start address of the page containing the vDSO.
AT_MINSIGSTKSZ denotes the minimum stack size required by the kernel to
deliver a signal to user-space. AT_MINSIGSTKSZ includes the space
consumed by the kernel to accommodate the user context for the current
hardware configuration. It does not include subsequent user-space stack
consumption, which must be added by the user. (e.g. above, user-space adds
SIGSTKSZ to AT_MINSIGSTKSZ.)
@@ -35,3 +35,5 @@ x86-specific Documentation
   x86_64/index
   sva
   sgx
+  elf_auxvec
+  xstate
Using XSTATE features in user space applications
================================================
The x86 architecture supports floating-point extensions which are
enumerated via CPUID. Applications consult CPUID and use XGETBV to
evaluate which features have been enabled by the kernel XCR0.
Up to AVX-512 and PKRU states, these features are automatically enabled by
the kernel if available. Features like AMX TILE_DATA (XSTATE component 18)
are enabled by XCR0 as well, but the first use of a related instruction is
trapped by the kernel because, by default, the required large XSTATE buffers
are not allocated automatically.
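As a minimal, illustrative sketch (not part of the kernel documentation), user
space can confirm that the kernel has enabled both AMX components in XCR0
before relying on them; the bit numbers follow the XSTATE component numbering
used above::

   #include <immintrin.h>
   #include <stdbool.h>

   /* XTILE_CFG is component 17, XTILE_DATA is component 18. */
   #define XFEATURE_MASK_XTILE ((1ULL << 17) | (1ULL << 18))

   /* Needs OSXSAVE; build with -mxsave so that _xgetbv() is available. */
   static bool amx_enabled_in_xcr0(void)
   {
           return (_xgetbv(0) & XFEATURE_MASK_XTILE) == XFEATURE_MASK_XTILE;
   }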
Using dynamically enabled XSTATE features in user space applications
--------------------------------------------------------------------
The kernel provides an arch_prctl(2) based mechanism for applications to
request the usage of such features. The arch_prctl(2) options related to
this are:
-ARCH_GET_XCOMP_SUPP
arch_prctl(ARCH_GET_XCOMP_SUPP, &features);
ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of
type uint64_t. The second argument is a pointer to that storage.
-ARCH_GET_XCOMP_PERM
arch_prctl(ARCH_GET_XCOMP_PERM, &features);
ARCH_GET_XCOMP_PERM stores the features for which the userspace process
has permission in userspace storage of type uint64_t. The second argument
is a pointer to that storage.
-ARCH_REQ_XCOMP_PERM
arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);
ARCH_REQ_XCOMP_PERM allows to request permission for a dynamically enabled
feature or a feature set. A feature set can be mapped to a facility, e.g.
AMX, and can require one or more XSTATE components to be enabled.
The feature argument is the number of the highest XSTATE component which
is required for a facility to work.
When requesting permission for a feature, the kernel checks the
availability. The kernel ensures that sigaltstacks in the process's tasks
are large enough to accommodate the resulting large signal frame. It
enforces this both during ARCH_REQ_XCOMP_PERM and during any subsequent
sigaltstack(2) calls. If an installed sigaltstack is smaller than the
resulting sigframe size, ARCH_REQ_XCOMP_PERM results in -ENOTSUPP. Also,
sigaltstack(2) results in -ENOMEM if the requested altstack is too small
for the permitted features.
Permission, when granted, is valid per process. Permissions are inherited
on fork(2) and cleared on exec(3).
The first use of an instruction related to a dynamically enabled feature is
trapped by the kernel. The trap handler checks whether the process has
permission to use the feature. If the process has no permission then the
kernel sends SIGILL to the application. If the process has permission then
the handler allocates a larger xstate buffer for the task so the large
state can be context switched. In the unlikely cases that the allocation
fails, the kernel sends SIGSEGV.
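A minimal sketch of the arch_prctl(2) permission handshake described above
(illustrative only; it assumes a uapi <asm/prctl.h> that defines these
options, as added by this series, and uses the raw syscall because glibc has
no arch_prctl wrapper)::

   #include <asm/prctl.h>
   #include <sys/syscall.h>
   #include <unistd.h>
   #include <stdint.h>

   #define XFEATURE_XTILE_DATA 18

   static int request_amx_permission(void)
   {
           uint64_t features = 0;

           /* Ask which dynamically enabled components the kernel offers. */
           if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features))
                   return -1;
           if (!(features & (1ULL << XFEATURE_XTILE_DATA)))
                   return -1;   /* AMX TILE_DATA not offered */
           /* Request permission for the highest required component. */
           return syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILE_DATA);
   }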
Dynamic features in signal frames
---------------------------------
Dynamically enabled features are not written to the signal frame upon signal
entry if the feature is in its initial configuration. This differs from
non-dynamic features which are always written regardless of their
configuration. Signal handlers can examine the XSAVE buffer's XSTATE_BV
field to determine if a feature was written.
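For example, a handler installed with SA_SIGINFO could read XSTATE_BV roughly
as sketched below. This is an assumption-laden illustration, not kernel
documentation: it presumes an x86-64 XSAVE-format frame in which the 64-byte
XSTATE header starts at offset 512 of the fpstate area; a robust handler would
first verify FP_XSTATE_MAGIC1 in the frame's sw_reserved bytes::

   #include <stdint.h>
   #include <string.h>
   #include <ucontext.h>

   static uint64_t sigframe_xstate_bv(const ucontext_t *uc)
   {
           const uint8_t *xsave = (const uint8_t *)uc->uc_mcontext.fpregs;
           uint64_t xstate_bv;

           /* The XSTATE header follows the 512-byte legacy region. */
           memcpy(&xstate_bv, xsave + 512, sizeof(xstate_bv));
           return xstate_bv;
   }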
@@ -1119,6 +1119,9 @@ config ARCH_SPLIT_ARG64
 	  If a 32-bit architecture requires 64-bit arguments to be split into
 	  pairs of 32-bit arguments, select this option.
+config DYNAMIC_SIGFRAME
+	bool
 source "kernel/gcov/Kconfig"
 source "scripts/gcc-plugins/Kconfig"
......
@@ -111,6 +111,7 @@ config X86
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
 	select CLOCKSOURCE_WATCHDOG
 	select DCACHE_WORD_ACCESS
+	select DYNAMIC_SIGFRAME
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
 	select GENERIC_CLOCKEVENTS
@@ -2452,6 +2453,22 @@ config MODIFY_LDT_SYSCALL
 	  Saying 'N' here may make sense for embedded or server kernels.
+config STRICT_SIGALTSTACK_SIZE
+	bool "Enforce strict size checking for sigaltstack"
+	depends on DYNAMIC_SIGFRAME
+	help
+	  For historical reasons MINSIGSTKSZ is a constant which became
+	  already too small with AVX512 support. Add a mechanism to
+	  enforce strict checking of the sigaltstack size against the
+	  real size of the FPU frame. This option enables the check
+	  by default. It can also be controlled via the kernel command
+	  line option 'strict_sas_size' independent of this config
+	  switch. Enabling it might break existing applications which
+	  allocate a too small sigaltstack but 'work' because they
+	  never get a signal delivered.
+	  Say 'N' unless you want to really enforce this check.
 source "kernel/livepatch/Kconfig"
 endmenu
......
@@ -491,7 +491,7 @@ static void intel_pmu_arch_lbr_xrstors(void *ctx)
 {
 	struct x86_perf_task_context_arch_lbr_xsave *task_ctx = ctx;
-	copy_kernel_to_dynamic_supervisor(&task_ctx->xsave, XFEATURE_MASK_LBR);
+	xrstors(&task_ctx->xsave, XFEATURE_MASK_LBR);
 }
 static __always_inline bool lbr_is_reset_in_cstate(void *ctx)
@@ -576,7 +576,7 @@ static void intel_pmu_arch_lbr_xsaves(void *ctx)
 {
 	struct x86_perf_task_context_arch_lbr_xsave *task_ctx = ctx;
-	copy_dynamic_supervisor_to_kernel(&task_ctx->xsave, XFEATURE_MASK_LBR);
+	xsaves(&task_ctx->xsave, XFEATURE_MASK_LBR);
 }
 static void __intel_pmu_lbr_save(void *ctx)
@@ -993,7 +993,7 @@ static void intel_pmu_arch_lbr_read_xsave(struct cpu_hw_events *cpuc)
 		intel_pmu_store_lbr(cpuc, NULL);
 		return;
 	}
-	copy_dynamic_supervisor_to_kernel(&xsave->xsave, XFEATURE_MASK_LBR);
+	xsaves(&xsave->xsave, XFEATURE_MASK_LBR);
 	intel_pmu_store_lbr(cpuc, xsave->lbr.entries);
 }
......
@@ -14,6 +14,7 @@
 #include <linux/perf_event.h>
+#include <asm/fpu/xstate.h>
 #include <asm/intel_ds.h>
 /* To enable MSR tracing please use the generic trace points. */
......
@@ -24,7 +24,6 @@
 #include <linux/syscalls.h>
 #include <asm/ucontext.h>
 #include <linux/uaccess.h>
-#include <asm/fpu/internal.h>
 #include <asm/fpu/signal.h>
 #include <asm/ptrace.h>
 #include <asm/ia32_unistd.h>
@@ -57,8 +56,8 @@ static inline void reload_segments(struct sigcontext_32 *sc)
 /*
  * Do a signal return; undo the signal stack.
  */
-static int ia32_restore_sigcontext(struct pt_regs *regs,
-				   struct sigcontext_32 __user *usc)
+static bool ia32_restore_sigcontext(struct pt_regs *regs,
+				    struct sigcontext_32 __user *usc)
 {
 	struct sigcontext_32 sc;
@@ -66,7 +65,7 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,
 	current->restart_block.fn = do_no_restart_syscall;
 	if (unlikely(copy_from_user(&sc, usc, sizeof(sc))))
-		return -EFAULT;
+		return false;
 	/* Get only the ia32 registers. */
 	regs->bx = sc.bx;
@@ -111,7 +110,7 @@ COMPAT_SYSCALL_DEFINE0(sigreturn)
 	set_current_blocked(&set);
-	if (ia32_restore_sigcontext(regs, &frame->sc))
+	if (!ia32_restore_sigcontext(regs, &frame->sc))
 		goto badframe;
 	return regs->ax;
@@ -135,7 +134,7 @@ COMPAT_SYSCALL_DEFINE0(rt_sigreturn)
 	set_current_blocked(&set);
-	if (ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext))
+	if (!ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext))
 		goto badframe;
 	if (compat_restore_altstack(&frame->uc.uc_stack))
@@ -220,8 +219,8 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 	sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
 	*fpstate = (struct _fpstate_32 __user *) sp;
-	if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
-				     math_size) < 0)
+	if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
+				      math_size))
 		return (void __user *) -1L;
 	sp -= frame_size;
......
@@ -119,28 +119,19 @@
 # define CC_OUT(c) [_cc_ ## c] "=qm"
 #endif
+# include <asm/extable_fixup_types.h>
 /* Exception table entry */
 #ifdef __ASSEMBLY__
-# define _ASM_EXTABLE_HANDLE(from, to, handler) \
+# define _ASM_EXTABLE_TYPE(from, to, type) \
 	.pushsection "__ex_table","a" ; \
 	.balign 4 ; \
 	.long (from) - . ; \
 	.long (to) - . ; \
-	.long (handler) - . ; \
+	.long type ; \
 	.popsection
-# define _ASM_EXTABLE(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)
-# define _ASM_EXTABLE_UA(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
-# define _ASM_EXTABLE_CPY(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
-# define _ASM_EXTABLE_FAULT(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
 # ifdef CONFIG_KPROBES
 # define _ASM_NOKPROBE(entry) \
 	.pushsection "_kprobe_blacklist","aw" ; \
@@ -152,27 +143,15 @@
 # endif
 #else /* ! __ASSEMBLY__ */
-# define _EXPAND_EXTABLE_HANDLE(x) #x
-# define _ASM_EXTABLE_HANDLE(from, to, handler) \
+# define _ASM_EXTABLE_TYPE(from, to, type) \
 	" .pushsection \"__ex_table\",\"a\"\n" \
 	" .balign 4\n" \
 	" .long (" #from ") - .\n" \
 	" .long (" #to ") - .\n" \
-	" .long (" _EXPAND_EXTABLE_HANDLE(handler) ") - .\n" \
+	" .long " __stringify(type) " \n" \
 	" .popsection\n"
-# define _ASM_EXTABLE(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)
-# define _ASM_EXTABLE_UA(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
-# define _ASM_EXTABLE_CPY(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
-# define _ASM_EXTABLE_FAULT(from, to) \
-	_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
 /* For C file, we already have NOKPROBE_SYMBOL macro */
 /*
@@ -185,4 +164,16 @@ register unsigned long current_stack_pointer asm(_ASM_SP);
 #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
 #endif /* __ASSEMBLY__ */
+#define _ASM_EXTABLE(from, to) \
+	_ASM_EXTABLE_TYPE(from, to, EX_TYPE_DEFAULT)
+#define _ASM_EXTABLE_UA(from, to) \
+	_ASM_EXTABLE_TYPE(from, to, EX_TYPE_UACCESS)
+#define _ASM_EXTABLE_CPY(from, to) \
+	_ASM_EXTABLE_TYPE(from, to, EX_TYPE_COPY)
+#define _ASM_EXTABLE_FAULT(from, to) \
+	_ASM_EXTABLE_TYPE(from, to, EX_TYPE_FAULT)
 #endif /* _ASM_X86_ASM_H */
@@ -276,6 +276,7 @@
 #define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC instruction */
 #define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 instruction */
 #define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS instructions */
+#define X86_FEATURE_XFD (10*32+ 4) /* eXtended Feature Disabling */
 /*
  * Extended auxiliary flags: Linux defined - for features scattered in various
@@ -380,7 +381,10 @@
 #define X86_FEATURE_TSXLDTRK (18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_AMX_BF16 (18*32+22) /* AMX bf16 Support */
 #define X86_FEATURE_AVX512_FP16 (18*32+23) /* AVX512 FP16 */
+#define X86_FEATURE_AMX_TILE (18*32+24) /* AMX tile Support */
+#define X86_FEATURE_AMX_INT8 (18*32+25) /* AMX int8 Support */
 #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
......
@@ -314,6 +314,7 @@ do { \
 	NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \
 	NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_CURRENT_BASE); \
 } \
+NEW_AUX_ENT(AT_MINSIGSTKSZ, get_sigframe_size()); \
 } while (0)
 /*
@@ -330,6 +331,7 @@ extern unsigned long task_size_32bit(void);
 extern unsigned long task_size_64bit(int full_addr_space);
 extern unsigned long get_mmap_base(int is_legacy);
 extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
+extern unsigned long get_sigframe_size(void);
 #ifdef CONFIG_X86_32
@@ -351,6 +353,7 @@ do { \
 	if (vdso64_enabled) \
 		NEW_AUX_ENT(AT_SYSINFO_EHDR, \
 			    (unsigned long __force)current->mm->context.vdso); \
+	NEW_AUX_ENT(AT_MINSIGSTKSZ, get_sigframe_size()); \
 } while (0)
 /* As a historical oddity, the x32 and x86_64 vDSOs are controlled together. */
@@ -359,6 +362,7 @@ do { \
 	if (vdso64_enabled) \
 		NEW_AUX_ENT(AT_SYSINFO_EHDR, \
 			    (unsigned long __force)current->mm->context.vdso); \
+	NEW_AUX_ENT(AT_MINSIGSTKSZ, get_sigframe_size()); \
 } while (0)
 #define AT_SYSINFO 32
......
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _ASM_X86_EXTABLE_H
 #define _ASM_X86_EXTABLE_H
+#include <asm/extable_fixup_types.h>
 /*
- * The exception table consists of triples of addresses relative to the
- * exception table entry itself. The first address is of an instruction
- * that is allowed to fault, the second is the target at which the program
- * should continue. The third is a handler function to deal with the fault
- * caused by the instruction in the first field.
+ * The exception table consists of two addresses relative to the
+ * exception table entry itself and a type selector field.
+ *
+ * The first address is of an instruction that is allowed to fault, the
+ * second is the target at which the program should continue.
+ *
+ * The type entry is used by fixup_exception() to select the handler to
+ * deal with the fault caused by the instruction in the first field.
  *
  * All the routines below use bits of fixup code that are out of line
  * with the main instruction path. This means when everything is well,
@@ -15,7 +21,7 @@
  */
 struct exception_table_entry {
-	int insn, fixup, handler;
+	int insn, fixup, type;
 };
 struct pt_regs;
@@ -25,21 +31,27 @@ struct pt_regs;
 do { \
 	(a)->fixup = (b)->fixup + (delta); \
 	(b)->fixup = (tmp).fixup - (delta); \
-	(a)->handler = (b)->handler + (delta); \
-	(b)->handler = (tmp).handler - (delta); \
+	(a)->type = (b)->type; \
+	(b)->type = (tmp).type; \
 } while (0)
-enum handler_type {
-	EX_HANDLER_NONE,
-	EX_HANDLER_FAULT,
-	EX_HANDLER_UACCESS,
-	EX_HANDLER_OTHER
-};
 extern int fixup_exception(struct pt_regs *regs, int trapnr,
 			   unsigned long error_code, unsigned long fault_addr);
 extern int fixup_bug(struct pt_regs *regs, int trapnr);
-extern enum handler_type ex_get_fault_handler_type(unsigned long ip);
+extern int ex_get_fixup_type(unsigned long ip);
 extern void early_fixup_exception(struct pt_regs *regs, int trapnr);
+#ifdef CONFIG_X86_MCE
+extern void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr);
+#else
+static inline void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr) { }
+#endif
+#if defined(CONFIG_BPF_JIT) && defined(CONFIG_X86_64)
+bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs);
+#else
+static inline bool ex_handler_bpf(const struct exception_table_entry *x,
+				  struct pt_regs *regs) { return false; }
+#endif
 #endif
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_EXTABLE_FIXUP_TYPES_H
#define _ASM_X86_EXTABLE_FIXUP_TYPES_H
#define EX_TYPE_NONE 0
#define EX_TYPE_DEFAULT 1
#define EX_TYPE_FAULT 2
#define EX_TYPE_UACCESS 3
#define EX_TYPE_COPY 4
#define EX_TYPE_CLEAR_FS 5
#define EX_TYPE_FPU_RESTORE 6
#define EX_TYPE_WRMSR 7
#define EX_TYPE_RDMSR 8
#define EX_TYPE_BPF 9
#define EX_TYPE_WRMSR_IN_MCE 10
#define EX_TYPE_RDMSR_IN_MCE 11
#define EX_TYPE_DEFAULT_MCE_SAFE 12
#define EX_TYPE_FAULT_MCE_SAFE 13
#endif
@@ -12,6 +12,8 @@
 #define _ASM_X86_FPU_API_H
 #include <linux/bottom_half.h>
+#include <asm/fpu/types.h>
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
@@ -36,9 +38,9 @@ static inline void kernel_fpu_begin(void)
 }
 /*
- * Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
+ * Use fpregs_lock() while editing CPU's FPU registers or fpu->fpstate.
  * A context switch will (and softirq might) save CPU's FPU registers to
- * fpu->state and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
+ * fpu->fpstate.regs and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
  * a random state.
  */
 static inline void fpregs_lock(void)
@@ -81,4 +83,56 @@ extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name);
 static inline void update_pasid(void) { }
/* Trap handling */
extern int fpu__exception_code(struct fpu *fpu, int trap_nr);
extern void fpu_sync_fpstate(struct fpu *fpu);
extern void fpu_reset_from_exception_fixup(void);
/* Boot, hotplug and resume */
extern void fpu__init_cpu(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
extern void fpu__init_check_bugs(void);
extern void fpu__resume_cpu(void);
#ifdef CONFIG_MATH_EMULATION
extern void fpstate_init_soft(struct swregs_state *soft);
#else
static inline void fpstate_init_soft(struct swregs_state *soft) {}
#endif
/* State tracking */
DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
/* Process cleanup */
#ifdef CONFIG_X86_64
extern void fpstate_free(struct fpu *fpu);
#else
static inline void fpstate_free(struct fpu *fpu) { }
#endif
/* fpstate-related functions which are exported to KVM */
extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
/* KVM specific functions */
extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu);
extern void fpu_free_guest_fpstate(struct fpu_guest *gfpu);
extern int fpu_swap_kvm_fpstate(struct fpu_guest *gfpu, bool enter_guest);
extern void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u32 pkru);
extern int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, u64 xcr0, u32 *vpkru);
static inline void fpstate_set_confidential(struct fpu_guest *gfpu)
{
gfpu->fpstate->is_confidential = true;
}
static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
{
return gfpu->fpstate->is_confidential;
}
/* prctl */
struct task_struct;
extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
#endif /* _ASM_X86_FPU_API_H */
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 1994 Linus Torvalds
*
* Pentium III FXSR, SSE support
* General FPU state handling cleanups
* Gareth Hughes <gareth@valinux.com>, May 2000
* x86-64 work by Andi Kleen 2002
*/
#ifndef _ASM_X86_FPU_INTERNAL_H
#define _ASM_X86_FPU_INTERNAL_H
#include <linux/compat.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <asm/user.h>
#include <asm/fpu/api.h>
#include <asm/fpu/xstate.h>
#include <asm/fpu/xcr.h>
#include <asm/cpufeature.h>
#include <asm/trace/fpu.h>
/*
* High level FPU state handling functions:
*/
extern void fpu__prepare_read(struct fpu *fpu);
extern void fpu__prepare_write(struct fpu *fpu);
extern void fpu__save(struct fpu *fpu);
extern int fpu__restore_sig(void __user *buf, int ia32_frame);
extern void fpu__drop(struct fpu *fpu);
extern int fpu__copy(struct task_struct *dst, struct task_struct *src);
extern void fpu__clear_user_states(struct fpu *fpu);
extern void fpu__clear_all(struct fpu *fpu);
extern int fpu__exception_code(struct fpu *fpu, int trap_nr);
/*
* Boot time FPU initialization functions:
*/
extern void fpu__init_cpu(void);
extern void fpu__init_system_xstate(void);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
extern void fpu__init_check_bugs(void);
extern void fpu__resume_cpu(void);
extern u64 fpu__get_supported_xfeatures_mask(void);
/*
* Debugging facility:
*/
#ifdef CONFIG_X86_DEBUG_FPU
# define WARN_ON_FPU(x) WARN_ON_ONCE(x)
#else
# define WARN_ON_FPU(x) ({ (void)(x); 0; })
#endif
/*
* FPU related CPU feature flag helper routines:
*/
static __always_inline __pure bool use_xsaveopt(void)
{
return static_cpu_has(X86_FEATURE_XSAVEOPT);
}
static __always_inline __pure bool use_xsave(void)
{
return static_cpu_has(X86_FEATURE_XSAVE);
}
static __always_inline __pure bool use_fxsr(void)
{
return static_cpu_has(X86_FEATURE_FXSR);
}
/*
* fpstate handling functions:
*/
extern union fpregs_state init_fpstate;
extern void fpstate_init(union fpregs_state *state);
#ifdef CONFIG_MATH_EMULATION
extern void fpstate_init_soft(struct swregs_state *soft);
#else
static inline void fpstate_init_soft(struct swregs_state *soft) {}
#endif
static inline void fpstate_init_xstate(struct xregs_state *xsave)
{
/*
* XRSTORS requires these bits set in xcomp_bv, or it will
* trigger #GP:
*/
xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
}
static inline void fpstate_init_fxstate(struct fxregs_state *fx)
{
fx->cwd = 0x37f;
fx->mxcsr = MXCSR_DEFAULT;
}
extern void fpstate_sanitize_xstate(struct fpu *fpu);
/* Returns 0 or the negated trap number, which results in -EFAULT for #PF */
#define user_insn(insn, output, input...) \
({ \
int err; \
\
might_fault(); \
\
asm volatile(ASM_STAC "\n" \
"1: " #insn "\n" \
"2: " ASM_CLAC "\n" \
".section .fixup,\"ax\"\n" \
"3: negl %%eax\n" \
" jmp 2b\n" \
".previous\n" \
_ASM_EXTABLE_FAULT(1b, 3b) \
: [err] "=a" (err), output \
: "0"(0), input); \
err; \
})
#define kernel_insn_err(insn, output, input...) \
({ \
int err; \
asm volatile("1:" #insn "\n\t" \
"2:\n" \
".section .fixup,\"ax\"\n" \
"3: movl $-1,%[err]\n" \
" jmp 2b\n" \
".previous\n" \
_ASM_EXTABLE(1b, 3b) \
: [err] "=r" (err), output \
: "0"(0), input); \
err; \
})
#define kernel_insn(insn, output, input...) \
asm volatile("1:" #insn "\n\t" \
"2:\n" \
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore) \
: output : input)
static inline int copy_fregs_to_user(struct fregs_state __user *fx)
{
return user_insn(fnsave %[fx]; fwait, [fx] "=m" (*fx), "m" (*fx));
}
static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
else
return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
}
static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
kernel_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
kernel_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int copy_kernel_to_fxregs_err(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
return kernel_insn_err(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int copy_user_to_fxregs(struct fxregs_state __user *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline void copy_kernel_to_fregs(struct fregs_state *fx)
{
kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int copy_kernel_to_fregs_err(struct fregs_state *fx)
{
return kernel_insn_err(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int copy_user_to_fregs(struct fregs_state __user *fx)
{
return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline void copy_fxregs_to_kernel(struct fpu *fpu)
{
if (IS_ENABLED(CONFIG_X86_32))
asm volatile( "fxsave %[fx]" : [fx] "=m" (fpu->state.fxsave));
else
asm volatile("fxsaveq %[fx]" : [fx] "=m" (fpu->state.fxsave));
}
static inline void fxsave(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
asm volatile( "fxsave %[fx]" : [fx] "=m" (*fx));
else
asm volatile("fxsaveq %[fx]" : [fx] "=m" (*fx));
}
/* These macros all use (%edi)/(%rdi) as the single memory argument. */
#define XSAVE ".byte " REX_PREFIX "0x0f,0xae,0x27"
#define XSAVEOPT ".byte " REX_PREFIX "0x0f,0xae,0x37"
#define XSAVES ".byte " REX_PREFIX "0x0f,0xc7,0x2f"
#define XRSTOR ".byte " REX_PREFIX "0x0f,0xae,0x2f"
#define XRSTORS ".byte " REX_PREFIX "0x0f,0xc7,0x1f"
/*
* After this @err contains 0 on success or the negated trap number when
* the operation raises an exception. For faults this results in -EFAULT.
*/
#define XSTATE_OP(op, st, lmask, hmask, err) \
asm volatile("1:" op "\n\t" \
"xor %[err], %[err]\n" \
"2:\n\t" \
".pushsection .fixup,\"ax\"\n\t" \
"3: negl %%eax\n\t" \
"jmp 2b\n\t" \
".popsection\n\t" \
_ASM_EXTABLE_FAULT(1b, 3b) \
: [err] "=a" (err) \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
/*
* If XSAVES is enabled, it replaces XSAVEOPT because it supports a compact
* format and supervisor states in addition to modified optimization in
* XSAVEOPT.
*
* Otherwise, if XSAVEOPT is enabled, XSAVEOPT replaces XSAVE because XSAVEOPT
* supports modified optimization which is not supported by XSAVE.
*
* We use XSAVE as a fallback.
*
* The 661 label is defined in the ALTERNATIVE* macros as the address of the
* original instruction which gets replaced. We need to use it here as the
* address of the instruction where we might get an exception at.
*/
#define XSTATE_XSAVE(st, lmask, hmask, err) \
asm volatile(ALTERNATIVE_2(XSAVE, \
XSAVEOPT, X86_FEATURE_XSAVEOPT, \
XSAVES, X86_FEATURE_XSAVES) \
"\n" \
"xor %[err], %[err]\n" \
"3:\n" \
".pushsection .fixup,\"ax\"\n" \
"4: movl $-2, %[err]\n" \
"jmp 3b\n" \
".popsection\n" \
_ASM_EXTABLE(661b, 4b) \
: [err] "=r" (err) \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
/*
* Use XRSTORS to restore context if it is enabled. XRSTORS supports compact
* XSAVE area format.
*/
#define XSTATE_XRESTORE(st, lmask, hmask) \
asm volatile(ALTERNATIVE(XRSTOR, \
XRSTORS, X86_FEATURE_XSAVES) \
"\n" \
"3:\n" \
_ASM_EXTABLE_HANDLE(661b, 3b, ex_handler_fprestore)\
: \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
/*
* This function is called only during boot time when x86 caps are not set
* up and alternative can not be used yet.
*/
static inline void copy_kernel_to_xregs_booting(struct xregs_state *xstate)
{
u64 mask = -1;
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
WARN_ON(system_state != SYSTEM_BOOTING);
if (boot_cpu_has(X86_FEATURE_XSAVES))
XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
else
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
/*
* We should never fault when copying from a kernel buffer, and the FPU
* state we set at boot time should be valid.
*/
WARN_ON_FPU(err);
}
/*
* Save processor xstate to xsave area.
*/
static inline void copy_xregs_to_kernel(struct xregs_state *xstate)
{
u64 mask = xfeatures_mask_all;
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
WARN_ON_FPU(!alternatives_patched);
XSTATE_XSAVE(xstate, lmask, hmask, err);
/* We should never fault when copying to a kernel buffer: */
WARN_ON_FPU(err);
}
/*
* Restore processor xstate from xsave area.
*/
static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
{
u32 lmask = mask;
u32 hmask = mask >> 32;
XSTATE_XRESTORE(xstate, lmask, hmask);
}
/*
* Save xstate to user space xsave area.
*
* We don't use modified optimization because xrstor/xrstors might track
* a different application.
*
* We don't use compacted format xsave area for
* backward compatibility for old applications which don't understand
* compacted format of xsave area.
*/
static inline int copy_xregs_to_user(struct xregs_state __user *buf)
{
u64 mask = xfeatures_mask_user();
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
/*
* Clear the xsave header first, so that reserved fields are
* initialized to zero.
*/
err = __clear_user(&buf->header, sizeof(buf->header));
if (unlikely(err))
return -EFAULT;
stac();
XSTATE_OP(XSAVE, buf, lmask, hmask, err);
clac();
return err;
}
/*
* Restore xstate from user space xsave area.
*/
static inline int copy_user_to_xregs(struct xregs_state __user *buf, u64 mask)
{
struct xregs_state *xstate = ((__force struct xregs_state *)buf);
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
stac();
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
clac();
return err;
}
/*
* Restore xstate from kernel space xsave area, return an error code instead of
* an exception.
*/
static inline int copy_kernel_to_xregs_err(struct xregs_state *xstate, u64 mask)
{
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
if (static_cpu_has(X86_FEATURE_XSAVES))
XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
else
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
return err;
}
extern int copy_fpregs_to_fpstate(struct fpu *fpu);
static inline void __copy_kernel_to_fpregs(union fpregs_state *fpstate, u64 mask)
{
if (use_xsave()) {
copy_kernel_to_xregs(&fpstate->xsave, mask);
} else {
if (use_fxsr())
copy_kernel_to_fxregs(&fpstate->fxsave);
else
copy_kernel_to_fregs(&fpstate->fsave);
}
}
static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
{
/*
* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
* pending. Clear the x87 state here by setting it to fixed values.
* "m" is a random variable that should be in L1.
*/
if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
asm volatile(
"fnclex\n\t"
"emms\n\t"
"fildl %P[addr]" /* set F?P to defined value */
: : [addr] "m" (fpstate));
}
__copy_kernel_to_fpregs(fpstate, -1);
}
extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
/*
* FPU context switch related helper methods:
*/
DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
/*
* The in-register FPU state for an FPU context on a CPU is assumed to be
* valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
* matches the FPU.
*
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
static inline void __cpu_invalidate_fpregs_state(void)
{
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
fpu->last_cpu = -1;
}
static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
}
/*
* These generally need preemption protection to work,
* do try to avoid using these on their own:
*/
static inline void fpregs_deactivate(struct fpu *fpu)
{
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
trace_x86_fpu_regs_deactivated(fpu);
}
static inline void fpregs_activate(struct fpu *fpu)
{
this_cpu_write(fpu_fpregs_owner_ctx, fpu);
trace_x86_fpu_regs_activated(fpu);
}
/*
* Internal helper, do not use directly. Use switch_fpu_return() instead.
*/
static inline void __fpregs_load_activate(void)
{
struct fpu *fpu = &current->thread.fpu;
int cpu = smp_processor_id();
if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
return;
if (!fpregs_state_valid(fpu, cpu)) {
copy_kernel_to_fpregs(&fpu->state);
fpregs_activate(fpu);
fpu->last_cpu = cpu;
}
clear_thread_flag(TIF_NEED_FPU_LOAD);
}
/*
* FPU state switching for scheduling.
*
* This is a two-stage process:
*
* - switch_fpu_prepare() saves the old state.
* This is done within the context of the old process.
*
* - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
* will get loaded on return to userspace, or when the kernel needs it.
*
* If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
* are saved in the current thread's FPU register state.
*
* If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
* hold current()'s FPU registers. It is required to load the
* registers before returning to userland or using the content
* otherwise.
*
* The FPU context is only stored/restored for a user task and
* PF_KTHREAD is used to distinguish between kernel and user threads.
*/
static inline void switch_fpu_prepare(struct task_struct *prev, int cpu)
{
struct fpu *old_fpu = &prev->thread.fpu;
if (static_cpu_has(X86_FEATURE_FPU) && !(prev->flags & PF_KTHREAD)) {
if (!copy_fpregs_to_fpstate(old_fpu))
old_fpu->last_cpu = -1;
else
old_fpu->last_cpu = cpu;
/* But leave fpu_fpregs_owner_ctx! */
trace_x86_fpu_regs_deactivated(old_fpu);
}
}
/*
* Misc helper functions:
*/
/*
* Load PKRU from the FPU context if available. Delay loading of the
* complete FPU state until the return to userland.
*/
static inline void switch_fpu_finish(struct task_struct *next)
{
u32 pkru_val = init_pkru_value;
struct pkru_state *pk;
struct fpu *next_fpu = &next->thread.fpu;
if (!static_cpu_has(X86_FEATURE_FPU))
return;
set_thread_flag(TIF_NEED_FPU_LOAD);
if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
return;
/*
* PKRU state is switched eagerly because it needs to be valid before we
* return to userland e.g. for a copy_to_user() operation.
*/
if (!(next->flags & PF_KTHREAD)) {
/*
* If the PKRU bit in xsave.header.xfeatures is not set,
* then the PKRU component was in init state, which means
* XRSTOR will set PKRU to 0. If the bit is not set then
* get_xsave_addr() will return NULL because the PKRU value
* in memory is not valid. This means pkru_val has to be
* set to 0 and not to init_pkru_value.
*/
pk = get_xsave_addr(&next_fpu->state.xsave, XFEATURE_PKRU);
pkru_val = pk ? pk->pkru : 0;
}
__write_pkru(pkru_val);
}
#endif /* _ASM_X86_FPU_INTERNAL_H */
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_FPU_SCHED_H
#define _ASM_X86_FPU_SCHED_H
#include <linux/sched.h>
#include <asm/cpufeature.h>
#include <asm/fpu/types.h>
#include <asm/trace/fpu.h>
extern void save_fpregs_to_fpstate(struct fpu *fpu);
extern void fpu__drop(struct fpu *fpu);
extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags);
extern void fpu_flush_thread(void);
/*
* FPU state switching for scheduling.
*
* This is a two-stage process:
*
* - switch_fpu_prepare() saves the old state.
* This is done within the context of the old process.
*
* - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
* will get loaded on return to userspace, or when the kernel needs it.
*
* If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
* are saved in the current thread's FPU register state.
*
* If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
* hold current()'s FPU registers. It is required to load the
* registers before returning to userland or using the content
* otherwise.
*
* The FPU context is only stored/restored for a user task and
* PF_KTHREAD is used to distinguish between kernel and user threads.
*/
static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
{
if (cpu_feature_enabled(X86_FEATURE_FPU) &&
!(current->flags & PF_KTHREAD)) {
save_fpregs_to_fpstate(old_fpu);
/*
* The save operation preserved register state, so the
* fpu_fpregs_owner_ctx is still @old_fpu. Store the
* current CPU number in @old_fpu, so the next return
* to user space can avoid the FPU register restore
* when is returns on the same CPU and still owns the
* context.
*/
old_fpu->last_cpu = cpu;
trace_x86_fpu_regs_deactivated(old_fpu);
}
}
/*
* Delay loading of the complete FPU state until the return to userland.
* PKRU is handled separately.
*/
static inline void switch_fpu_finish(void)
{
if (cpu_feature_enabled(X86_FEATURE_FPU))
set_thread_flag(TIF_NEED_FPU_LOAD);
}
#endif /* _ASM_X86_FPU_SCHED_H */
@@ -5,6 +5,11 @@
 #ifndef _ASM_X86_FPU_SIGNAL_H
 #define _ASM_X86_FPU_SIGNAL_H
+#include <linux/compat.h>
+#include <linux/user.h>
+#include <asm/fpu/types.h>
 #ifdef CONFIG_X86_64
 # include <uapi/asm/sigcontext.h>
 # include <asm/user32.h>
@@ -29,6 +34,14 @@ unsigned long
 fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
 		     unsigned long *buf_fx, unsigned long *size);
-extern void fpu__init_prepare_fx_sw_frame(void);
+unsigned long fpu__get_fpstate_size(void);
+extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
+extern void fpu__clear_user_states(struct fpu *fpu);
+extern bool fpu__restore_sig(void __user *buf, int ia32_frame);
+extern void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask);
+extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
 #endif /* _ASM_X86_FPU_SIGNAL_H */
@@ -120,6 +120,9 @@ enum xfeature {
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
+	XFEATURE_RSRVD_COMP_16,
+	XFEATURE_XTILE_CFG,
+	XFEATURE_XTILE_DATA,
 	XFEATURE_MAX,
 };
@@ -136,12 +139,21 @@ enum xfeature {
 #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID)
 #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
+#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
+#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
 #define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
 			      | XFEATURE_MASK_ZMM_Hi256 \
 			      | XFEATURE_MASK_Hi16_ZMM)
+#ifdef CONFIG_X86_64
+# define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILE_DATA \
+			      | XFEATURE_MASK_XTILE_CFG)
+#else
+# define XFEATURE_MASK_XTILE (0)
+#endif
 #define FIRST_EXTENDED_XFEATURE XFEATURE_YMM
 struct reg_128_bit {
@@ -153,6 +165,9 @@ struct reg_256_bit {
 struct reg_512_bit {
 	u8 regbytes[512/8];
 };
+struct reg_1024_byte {
+	u8 regbytes[1024];
+};
 /*
  * State component 2:
@@ -255,6 +270,23 @@ struct arch_lbr_state {
 	u64 ler_to;
 	u64 ler_info;
 	struct lbr_entry entries[];
+};
+/*
+ * State component 17: 64-byte tile configuration register.
+ */
+struct xtile_cfg {
+	u64 tcfg[8];
+} __packed;
+/*
+ * State component 18: 1KB tile data register.
+ * Each register represents 16 64-byte rows of the matrix
+ * data. But the number of registers depends on the actual
+ * implementation.
+ */
+struct xtile_data {
+	struct reg_1024_byte tmm;
 } __packed;
 /*
@@ -309,6 +341,91 @@ union fpregs_state {
 	u8 __padding[PAGE_SIZE];
 };
struct fpstate {
/* @kernel_size: The size of the kernel register image */
unsigned int size;
/* @user_size: The size in non-compacted UABI format */
unsigned int user_size;
/* @xfeatures: xfeatures for which the storage is sized */
u64 xfeatures;
/* @user_xfeatures: xfeatures valid in UABI buffers */
u64 user_xfeatures;
/* @xfd: xfeatures disabled to trap userspace use. */
u64 xfd;
/* @is_valloc: Indicator for dynamically allocated state */
unsigned int is_valloc : 1;
/* @is_guest: Indicator for guest state (KVM) */
unsigned int is_guest : 1;
/*
* @is_confidential: Indicator for KVM confidential mode.
* The FPU registers are restored by the
* vmentry firmware from encrypted guest
* memory. On vmexit the FPU registers are
* saved by firmware to encrypted guest memory
* and the registers are scrubbed before
* returning to the host. So there is no
* content which is worth saving and restoring.
* The fpstate has to be there so that
* preemption and softirq FPU usage works
* without special casing.
*/
unsigned int is_confidential : 1;
/* @in_use: State is in use */
unsigned int in_use : 1;
/* @regs: The register state union for all supported formats */
union fpregs_state regs;
/* @regs is dynamically sized! Don't add anything after @regs! */
} __aligned(64);
struct fpu_state_perm {
/*
* @__state_perm:
*
* This bitmap indicates the permission for state components, which
* are available to a thread group. The permission prctl() sets the
* enabled state bits in thread_group_leader()->thread.fpu.
*
* All run time operations use the per thread information in the
* currently active fpu.fpstate which contains the xfeature masks
* and sizes for kernel and user space.
*
* This master permission field is only to be used when
* task.fpu.fpstate based checks fail to validate whether the task
* is allowed to expand it's xfeatures set which requires to
* allocate a larger sized fpstate buffer.
*
* Do not access this field directly. Use the provided helper
* function. Unlocked access is possible for quick checks.
*/
u64 __state_perm;
/*
* @__state_size:
*
* The size required for @__state_perm. Only valid to access
* with sighand locked.
*/
unsigned int __state_size;
/*
* @__user_state_size:
*
* The size required for @__state_perm user part. Only valid to
* access with sighand locked.
*/
unsigned int __user_state_size;
};
 /*
  * Highest level per task FPU state data structure that
  * contains the FPU register state plus various FPU
@@ -337,19 +454,100 @@ struct fpu {
 	unsigned long avx512_timestamp;
 	/*
-	 * @state:
+	 * @fpstate:
+	 *
+	 * Pointer to the active struct fpstate. Initialized to
+	 * point at @__fpstate below.
+	 */
+	struct fpstate *fpstate;
+	/*
+	 * @__task_fpstate:
+	 *
+	 * Pointer to an inactive struct fpstate. Initialized to NULL. Is
+	 * used only for KVM support to swap out the regular task fpstate.
+	 */
+	struct fpstate *__task_fpstate;
+	/*
+	 * @perm:
+	 *
+	 * Permission related information
+	 */
+	struct fpu_state_perm perm;
+	/*
+	 * @__fpstate:
 	 *
-	 * In-memory copy of all FPU registers that we save/restore
-	 * over context switches. If the task is using the FPU then
-	 * the registers in the FPU are more recent than this state
-	 * copy. If the task context-switches away then they get
-	 * saved here and represent the FPU state.
+	 * Initial in-memory storage for FPU registers which are saved in
+	 * context switch and when the kernel uses the FPU. The registers
+	 * are restored from this storage on return to user space if they
+	 * are not longer containing the tasks FPU register state.
 	 */
-	union fpregs_state state;
+	struct fpstate __fpstate;
 	/*
-	 * WARNING: 'state' is dynamically-sized. Do not put
+	 * WARNING: '__fpstate' is dynamically-sized. Do not put
 	 * anything after it here.
 	 */
 };
/*
* Guest pseudo FPU container
*/
struct fpu_guest {
/*
* @fpstate: Pointer to the allocated guest fpstate
*/
struct fpstate *fpstate;
};
/*
* FPU state configuration data. Initialized at boot time. Read only after init.
*/
struct fpu_state_config {
/*
* @max_size:
*
* The maximum size of the register state buffer. Includes all
* supported features except independent managed features.
*/
unsigned int max_size;
/*
* @default_size:
*
* The default size of the register state buffer. Includes all
* supported features except independent managed features and
* features which have to be requested by user space before usage.
*/
unsigned int default_size;
/*
* @max_features:
*
* The maximum supported features bitmap. Does not include
* independent managed features.
*/
u64 max_features;
/*
* @default_features:
*
* The default supported features bitmap. Does not include
* independent managed features and features which have to
* be requested by user space before usage.
*/
u64 default_features;
/*
* @legacy_features:
*
* Features which can be reported back to user space
* even without XSAVE support, i.e. legacy features FP + SSE
*/
u64 legacy_features;
};
/* FPU state configuration information */
extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
#endif /* _ASM_X86_FPU_H */
@@ -2,18 +2,8 @@
 #ifndef _ASM_X86_FPU_XCR_H
 #define _ASM_X86_FPU_XCR_H
-/*
- * MXCSR and XCR definitions:
- */
-static inline void ldmxcsr(u32 mxcsr)
-{
-	asm volatile("ldmxcsr %0" :: "m" (mxcsr));
-}
-extern unsigned int mxcsr_feature_mask;
 #define XCR_XFEATURE_ENABLED_MASK 0x00000000
+#define XCR_XFEATURE_IN_USE_MASK 0x00000001
 static inline u64 xgetbv(u32 index)
 {
@@ -31,4 +21,15 @@ static inline void xsetbv(u32 index, u64 value)
 	asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index));
 }
/*
* Return a mask of xfeatures which are currently being tracked
* by the processor as being in the initial configuration.
*
* Callers should check X86_FEATURE_XGETBV1.
*/
static inline u64 xfeatures_in_use(void)
{
return xgetbv(XCR_XFEATURE_IN_USE_MASK);
}
#endif /* _ASM_X86_FPU_XCR_H */
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 #include <asm/processor.h>
+#include <asm/fpu/api.h>
 #include <asm/user.h>
 /* Bit 63 of XCR0 is reserved for future expansion */
@@ -13,6 +14,8 @@
 #define XSTATE_CPUID 0x0000000d
+#define TILE_CPUID 0x0000001d
 #define FXSAVE_SIZE 512
 #define XSAVE_HDR_SIZE 64
@@ -32,7 +35,19 @@
 				  XFEATURE_MASK_Hi16_ZMM | \
 				  XFEATURE_MASK_PKRU | \
 				  XFEATURE_MASK_BNDREGS | \
-				  XFEATURE_MASK_BNDCSR)
+				  XFEATURE_MASK_BNDCSR | \
+				  XFEATURE_MASK_XTILE)
/*
* Features which are restored when returning to user space.
* PKRU is not restored on return to user space because PKRU
* is switched eagerly in switch_to() and flush_thread()
*/
#define XFEATURE_MASK_USER_RESTORE \
(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
/* Features which are dynamically enabled for a process on request */
#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
/* All currently supported supervisor features */ /* All currently supported supervisor features */
#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
...@@ -42,21 +57,21 @@ ...@@ -42,21 +57,21 @@
* and its size may be huge. Saving/restoring such supervisor state components * and its size may be huge. Saving/restoring such supervisor state components
* at each context switch can cause high CPU and space overhead, which should * at each context switch can cause high CPU and space overhead, which should
* be avoided. Such supervisor state components should only be saved/restored * be avoided. Such supervisor state components should only be saved/restored
* on demand. The on-demand dynamic supervisor features are set in this mask. * on demand. The on-demand supervisor features are set in this mask.
* *
* Unlike the existing supported supervisor features, a dynamic supervisor * Unlike the existing supported supervisor features, an independent supervisor
* feature does not allocate a buffer in task->fpu, and the corresponding * feature does not allocate a buffer in task->fpu, and the corresponding
* supervisor state component cannot be saved/restored at each context switch. * supervisor state component cannot be saved/restored at each context switch.
* *
* To support a dynamic supervisor feature, a developer should follow the * To support an independent supervisor feature, a developer should follow the
* dos and don'ts as below: * dos and don'ts as below:
* - Do dynamically allocate a buffer for the supervisor state component. * - Do dynamically allocate a buffer for the supervisor state component.
* - Do manually invoke the XSAVES/XRSTORS instruction to save/restore the * - Do manually invoke the XSAVES/XRSTORS instruction to save/restore the
* state component to/from the buffer. * state component to/from the buffer.
* - Don't set the bit corresponding to the dynamic supervisor feature in * - Don't set the bit corresponding to the independent supervisor feature in
* IA32_XSS at run time, since it has been set at boot time. * IA32_XSS at run time, since it has been set at boot time.
*/ */
#define XFEATURE_MASK_DYNAMIC (XFEATURE_MASK_LBR) #define XFEATURE_MASK_INDEPENDENT (XFEATURE_MASK_LBR)
/* /*
* Unsupported supervisor features. When a supervisor feature in this mask is * Unsupported supervisor features. When a supervisor feature in this mask is
...@@ -66,54 +81,52 @@ ...@@ -66,54 +81,52 @@
/* All supervisor states including supported and unsupported states. */ /* All supervisor states including supported and unsupported states. */
#define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
XFEATURE_MASK_DYNAMIC | \ XFEATURE_MASK_INDEPENDENT | \
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED) XFEATURE_MASK_SUPERVISOR_UNSUPPORTED)
#ifdef CONFIG_X86_64 /*
#define REX_PREFIX "0x48, " * The feature mask required to restore FPU state:
#else * - All user states which are not eagerly switched in switch_to()/exec()
#define REX_PREFIX * - The suporvisor states
#endif */
#define XFEATURE_MASK_FPSTATE (XFEATURE_MASK_USER_RESTORE | \
extern u64 xfeatures_mask_all; XFEATURE_MASK_SUPERVISOR_SUPPORTED)
static inline u64 xfeatures_mask_supervisor(void)
{
return xfeatures_mask_all & XFEATURE_MASK_SUPERVISOR_SUPPORTED;
}
static inline u64 xfeatures_mask_user(void)
{
return xfeatures_mask_all & XFEATURE_MASK_USER_SUPPORTED;
}
static inline u64 xfeatures_mask_dynamic(void)
{
if (!boot_cpu_has(X86_FEATURE_ARCH_LBR))
return XFEATURE_MASK_DYNAMIC & ~XFEATURE_MASK_LBR;
return XFEATURE_MASK_DYNAMIC; /*
} * Features in this mask have space allocated in the signal frame, but may not
* have that space initialized when the feature is in its init state.
*/
#define XFEATURE_MASK_SIGFRAME_INITOPT (XFEATURE_MASK_XTILE | \
XFEATURE_MASK_USER_DYNAMIC)
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS]; extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
extern void __init update_regset_xstate_info(unsigned int size, extern void __init update_regset_xstate_info(unsigned int size,
u64 xstate_mask); u64 xstate_mask);
void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
const void *get_xsave_field_ptr(int xfeature_nr);
int using_compacted_format(void);
int xfeature_size(int xfeature_nr); int xfeature_size(int xfeature_nr);
struct membuf;
void copy_xstate_to_kernel(struct membuf to, struct xregs_state *xsave);
int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
void copy_supervisor_to_kernel(struct xregs_state *xsave);
void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask);
void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask);
void xsaves(struct xregs_state *xsave, u64 mask);
void xrstors(struct xregs_state *xsave, u64 mask);
int xfd_enable_feature(u64 xfd_err);
#ifdef CONFIG_X86_64
DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
#endif
/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */ #ifdef CONFIG_X86_64
int validate_user_xstate_header(const struct xstate_header *hdr); DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
static __always_inline __pure bool fpu_state_size_dynamic(void)
{
return static_branch_unlikely(&__fpu_state_size_dynamic);
}
#else
static __always_inline __pure bool fpu_state_size_dynamic(void)
{
return false;
}
#endif
#endif #endif
...@@ -604,11 +604,10 @@ struct kvm_vcpu_arch { ...@@ -604,11 +604,10 @@ struct kvm_vcpu_arch {
* *
* Note that while the PKRU state lives inside the fpu registers, * Note that while the PKRU state lives inside the fpu registers,
* it is switched out separately at VMENTER and VMEXIT time. The * it is switched out separately at VMENTER and VMEXIT time. The
* "guest_fpu" state here contains the guest FPU context, with the * "guest_fpstate" state here contains the guest FPU context, with the
* host PRKU bits. * host PRKU bits.
*/ */
struct fpu *user_fpu; struct fpu_guest guest_fpu;
struct fpu *guest_fpu;
u64 xcr0; u64 xcr0;
u64 guest_supported_xcr0; u64 guest_supported_xcr0;
......
...@@ -644,6 +644,8 @@ ...@@ -644,6 +644,8 @@
#define MSR_IA32_BNDCFGS_RSVD 0x00000ffc #define MSR_IA32_BNDCFGS_RSVD 0x00000ffc
#define MSR_IA32_XFD 0x000001c4
#define MSR_IA32_XFD_ERR 0x000001c5
#define MSR_IA32_XSS 0x00000da0 #define MSR_IA32_XSS 0x00000da0
#define MSR_IA32_APICBASE 0x0000001b #define MSR_IA32_APICBASE 0x0000001b
......
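MSR_IA32_XFD and MSR_IA32_XFD_ERR are the extended feature disable MSRs on which the lazy AMX buffer allocation in this series relies. A hedged sketch for peeking at IA32_XFD through the generic msr(4) interface (root and the msr module are assumed; this interface is not part of the patch set):

```
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Illustration only: dump IA32_XFD (0x1c4) for CPU 0 via /dev/cpu/0/msr. */
int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0 || pread(fd, &val, sizeof(val), 0x1c4) != sizeof(val)) {
		perror("MSR_IA32_XFD");
		return 1;
	}
	printf("IA32_XFD: %#llx\n", (unsigned long long)val);
	close(fd);
	return 0;
}
```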
...@@ -92,7 +92,7 @@ static __always_inline unsigned long long __rdmsr(unsigned int msr) ...@@ -92,7 +92,7 @@ static __always_inline unsigned long long __rdmsr(unsigned int msr)
asm volatile("1: rdmsr\n" asm volatile("1: rdmsr\n"
"2:\n" "2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe) _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
: EAX_EDX_RET(val, low, high) : "c" (msr)); : EAX_EDX_RET(val, low, high) : "c" (msr));
return EAX_EDX_VAL(val, low, high); return EAX_EDX_VAL(val, low, high);
...@@ -102,7 +102,7 @@ static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high) ...@@ -102,7 +102,7 @@ static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
{ {
asm volatile("1: wrmsr\n" asm volatile("1: wrmsr\n"
"2:\n" "2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe) _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
: : "c" (msr), "a"(low), "d" (high) : "memory"); : : "c" (msr), "a"(low), "d" (high) : "memory");
} }
......
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
#include <asm/x86_init.h> #include <asm/x86_init.h>
#include <asm/fpu/xstate.h> #include <asm/pkru.h>
#include <asm/fpu/api.h> #include <asm/fpu/api.h>
#include <asm-generic/pgtable_uffd.h> #include <asm-generic/pgtable_uffd.h>
...@@ -126,35 +126,6 @@ static inline int pte_dirty(pte_t pte) ...@@ -126,35 +126,6 @@ static inline int pte_dirty(pte_t pte)
return pte_flags(pte) & _PAGE_DIRTY; return pte_flags(pte) & _PAGE_DIRTY;
} }
static inline u32 read_pkru(void)
{
if (boot_cpu_has(X86_FEATURE_OSPKE))
return rdpkru();
return 0;
}
static inline void write_pkru(u32 pkru)
{
struct pkru_state *pk;
if (!boot_cpu_has(X86_FEATURE_OSPKE))
return;
pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
/*
* The PKRU value in xstate needs to be in sync with the value that is
* written to the CPU. The FPU restore on return to userland would
* otherwise load the previous value again.
*/
fpregs_lock();
if (pk)
pk->pkru = pkru;
__write_pkru(pkru);
fpregs_unlock();
}
static inline int pte_young(pte_t pte) static inline int pte_young(pte_t pte)
{ {
return pte_flags(pte) & _PAGE_ACCESSED; return pte_flags(pte) & _PAGE_ACCESSED;
...@@ -1360,32 +1331,6 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) ...@@ -1360,32 +1331,6 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
} }
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
#define PKRU_AD_BIT 0x1u
#define PKRU_WD_BIT 0x2u
#define PKRU_BITS_PER_PKEY 2
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
extern u32 init_pkru_value;
#else
#define init_pkru_value 0
#endif
static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
{
int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits));
}
static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
{
int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
/*
* Access-disable disables writes too so we need to check
* both bits here.
*/
return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits));
}
static inline u16 pte_flags_pkey(unsigned long pte_flags) static inline u16 pte_flags_pkey(unsigned long pte_flags)
{ {
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
......
...@@ -9,14 +9,14 @@ ...@@ -9,14 +9,14 @@
* will be necessary to ensure that the types that store key * will be necessary to ensure that the types that store key
* numbers and masks have sufficient capacity. * numbers and masks have sufficient capacity.
*/ */
#define arch_max_pkey() (boot_cpu_has(X86_FEATURE_OSPKE) ? 16 : 1) #define arch_max_pkey() (cpu_feature_enabled(X86_FEATURE_OSPKE) ? 16 : 1)
extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val); unsigned long init_val);
static inline bool arch_pkeys_enabled(void) static inline bool arch_pkeys_enabled(void)
{ {
return boot_cpu_has(X86_FEATURE_OSPKE); return cpu_feature_enabled(X86_FEATURE_OSPKE);
} }
/* /*
...@@ -26,7 +26,7 @@ static inline bool arch_pkeys_enabled(void) ...@@ -26,7 +26,7 @@ static inline bool arch_pkeys_enabled(void)
extern int __execute_only_pkey(struct mm_struct *mm); extern int __execute_only_pkey(struct mm_struct *mm);
static inline int execute_only_pkey(struct mm_struct *mm) static inline int execute_only_pkey(struct mm_struct *mm)
{ {
if (!boot_cpu_has(X86_FEATURE_OSPKE)) if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
return ARCH_DEFAULT_PKEY; return ARCH_DEFAULT_PKEY;
return __execute_only_pkey(mm); return __execute_only_pkey(mm);
...@@ -37,7 +37,7 @@ extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma, ...@@ -37,7 +37,7 @@ extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma, static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
int prot, int pkey) int prot, int pkey)
{ {
if (!boot_cpu_has(X86_FEATURE_OSPKE)) if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
return 0; return 0;
return __arch_override_mprotect_pkey(vma, prot, pkey); return __arch_override_mprotect_pkey(vma, prot, pkey);
...@@ -124,7 +124,6 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, ...@@ -124,7 +124,6 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val); unsigned long init_val);
extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val); unsigned long init_val);
extern void copy_init_pkru_to_fpregs(void);
static inline int vma_pkey(struct vm_area_struct *vma) static inline int vma_pkey(struct vm_area_struct *vma)
{ {
......
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_PKRU_H
#define _ASM_X86_PKRU_H
#include <asm/cpufeature.h>
#define PKRU_AD_BIT 0x1
#define PKRU_WD_BIT 0x2
#define PKRU_BITS_PER_PKEY 2
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
extern u32 init_pkru_value;
#define pkru_get_init_value() READ_ONCE(init_pkru_value)
#else
#define init_pkru_value 0
#define pkru_get_init_value() 0
#endif
static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
{
int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits));
}
static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
{
int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
/*
* Access-disable disables writes too so we need to check
* both bits here.
*/
return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits));
}
static inline u32 read_pkru(void)
{
if (cpu_feature_enabled(X86_FEATURE_OSPKE))
return rdpkru();
return 0;
}
static inline void write_pkru(u32 pkru)
{
if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
return;
/*
* WRPKRU is relatively expensive compared to RDPKRU.
* Avoid WRPKRU when it would not change the value.
*/
if (pkru != rdpkru())
wrpkru(pkru);
}
static inline void pkru_write_default(void)
{
if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
return;
wrpkru(pkru_get_init_value());
}
#endif
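The helpers above encode the PKRU layout of two bits per protection key: AD (access-disable) in the low bit of each pair, WD (write-disable) in the high bit. A minimal userspace sketch of the same arithmetic, with constants mirroring the header:

```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PKRU_AD_BIT        0x1u
#define PKRU_WD_BIT        0x2u
#define PKRU_BITS_PER_PKEY 2

static bool pkey_allows_write(uint32_t pkru, uint16_t pkey)
{
	int shift = pkey * PKRU_BITS_PER_PKEY;

	/* Access-disable implies write-disable, so test both bits. */
	return !(pkru & ((PKRU_AD_BIT | PKRU_WD_BIT) << shift));
}

int main(void)
{
	uint32_t pkru = PKRU_WD_BIT << (1 * PKRU_BITS_PER_PKEY); /* key 1: write-disabled */

	printf("key 0 writable: %d\n", pkey_allows_write(pkru, 0)); /* 1 */
	printf("key 1 writable: %d\n", pkey_allows_write(pkru, 1)); /* 0 */
	return 0;
}
```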
...@@ -477,9 +477,6 @@ DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary); ...@@ -477,9 +477,6 @@ DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr); DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
#endif /* X86_64 */ #endif /* X86_64 */
extern unsigned int fpu_kernel_xstate_size;
extern unsigned int fpu_user_xstate_size;
struct perf_event; struct perf_event;
struct thread_struct { struct thread_struct {
...@@ -537,6 +534,15 @@ struct thread_struct { ...@@ -537,6 +534,15 @@ struct thread_struct {
unsigned int iopl_warn:1; unsigned int iopl_warn:1;
unsigned int sig_on_uaccess_err:1; unsigned int sig_on_uaccess_err:1;
/*
* Protection Keys Register for Userspace. Loaded immediately on
* context switch. Store it in thread_struct to avoid a lookup in
* the task's FPU xstate buffer. This value is only valid when a
* task is scheduled out. For 'current' the authoritative source of
* PKRU is the hardware itself.
*/
u32 pkru;
/* Floating point and extended processor state */ /* Floating point and extended processor state */
struct fpu fpu; struct fpu fpu;
/* /*
...@@ -545,12 +551,12 @@ struct thread_struct { ...@@ -545,12 +551,12 @@ struct thread_struct {
*/ */
}; };
/* Whitelist the FPU state from the task_struct for hardened usercopy. */ extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
static inline void arch_thread_struct_whitelist(unsigned long *offset, static inline void arch_thread_struct_whitelist(unsigned long *offset,
unsigned long *size) unsigned long *size)
{ {
*offset = offsetof(struct thread_struct, fpu.state); fpu_thread_struct_whitelist(offset, size);
*size = fpu_kernel_xstate_size;
} }
static inline void static inline void
......
...@@ -40,6 +40,6 @@ void x86_report_nx(void); ...@@ -40,6 +40,6 @@ void x86_report_nx(void);
extern int reboot_force; extern int reboot_force;
long do_arch_prctl_common(struct task_struct *task, int option, long do_arch_prctl_common(struct task_struct *task, int option,
unsigned long cpuid_enabled); unsigned long arg2);
#endif /* _ASM_X86_PROTO_H */ #endif /* _ASM_X86_PROTO_H */
...@@ -346,7 +346,7 @@ static inline void __loadsegment_fs(unsigned short value) ...@@ -346,7 +346,7 @@ static inline void __loadsegment_fs(unsigned short value)
"1: movw %0, %%fs \n" "1: movw %0, %%fs \n"
"2: \n" "2: \n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_clear_fs) _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_CLEAR_FS)
: : "rm" (value) : "memory"); : : "rm" (value) : "memory");
} }
......
...@@ -85,4 +85,6 @@ struct rt_sigframe_x32 { ...@@ -85,4 +85,6 @@ struct rt_sigframe_x32 {
#endif /* CONFIG_X86_64 */ #endif /* CONFIG_X86_64 */
void __init init_sigframe_size(void);
#endif /* _ASM_X86_SIGFRAME_H */ #endif /* _ASM_X86_SIGFRAME_H */
...@@ -104,25 +104,13 @@ static inline void wrpkru(u32 pkru) ...@@ -104,25 +104,13 @@ static inline void wrpkru(u32 pkru)
: : "a" (pkru), "c"(ecx), "d"(edx)); : : "a" (pkru), "c"(ecx), "d"(edx));
} }
static inline void __write_pkru(u32 pkru)
{
/*
* WRPKRU is relatively expensive compared to RDPKRU.
* Avoid WRPKRU when it would not change the value.
*/
if (pkru == rdpkru())
return;
wrpkru(pkru);
}
#else #else
static inline u32 rdpkru(void) static inline u32 rdpkru(void)
{ {
return 0; return 0;
} }
static inline void __write_pkru(u32 pkru) static inline void wrpkru(u32 pkru)
{ {
} }
#endif #endif
......
...@@ -22,8 +22,8 @@ DECLARE_EVENT_CLASS(x86_fpu, ...@@ -22,8 +22,8 @@ DECLARE_EVENT_CLASS(x86_fpu,
__entry->fpu = fpu; __entry->fpu = fpu;
__entry->load_fpu = test_thread_flag(TIF_NEED_FPU_LOAD); __entry->load_fpu = test_thread_flag(TIF_NEED_FPU_LOAD);
if (boot_cpu_has(X86_FEATURE_OSXSAVE)) { if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
__entry->xfeatures = fpu->state.xsave.header.xfeatures; __entry->xfeatures = fpu->fpstate->regs.xsave.header.xfeatures;
__entry->xcomp_bv = fpu->state.xsave.header.xcomp_bv; __entry->xcomp_bv = fpu->fpstate->regs.xsave.header.xcomp_bv;
} }
), ),
TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx", TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx",
......
...@@ -12,9 +12,9 @@ ...@@ -12,9 +12,9 @@
/* entries in ARCH_DLINFO: */ /* entries in ARCH_DLINFO: */
#if defined(CONFIG_IA32_EMULATION) || !defined(CONFIG_X86_64) #if defined(CONFIG_IA32_EMULATION) || !defined(CONFIG_X86_64)
# define AT_VECTOR_SIZE_ARCH 2 # define AT_VECTOR_SIZE_ARCH 3
#else /* else it's non-compat x86-64 */ #else /* else it's non-compat x86-64 */
# define AT_VECTOR_SIZE_ARCH 1 # define AT_VECTOR_SIZE_ARCH 2
#endif #endif
#endif /* _ASM_X86_AUXVEC_H */ #endif /* _ASM_X86_AUXVEC_H */
...@@ -10,6 +10,10 @@ ...@@ -10,6 +10,10 @@
#define ARCH_GET_CPUID 0x1011 #define ARCH_GET_CPUID 0x1011
#define ARCH_SET_CPUID 0x1012 #define ARCH_SET_CPUID 0x1012
#define ARCH_GET_XCOMP_SUPP 0x1021
#define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_REQ_XCOMP_PERM 0x1023
#define ARCH_MAP_VDSO_X32 0x2001 #define ARCH_MAP_VDSO_X32 0x2001
#define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_32 0x2002
#define ARCH_MAP_VDSO_64 0x2003 #define ARCH_MAP_VDSO_64 0x2003
......
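ARCH_GET_XCOMP_SUPP, ARCH_GET_XCOMP_PERM and ARCH_REQ_XCOMP_PERM form the new uapi through which a process must request dynamically enabled state before using it. A hedged userspace sketch; the XFEATURE_XTILEDATA value of 18 is taken from the x86 xstate component numbering and is an assumption of this example:

```
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18	/* assumed component number for AMX tile data */

int main(void)
{
	unsigned long mask = 0;

	/* Ask the kernel for permission to use AMX tile data state. */
	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA))
		perror("ARCH_REQ_XCOMP_PERM");

	/* Read back the per-process permission bitmap. */
	if (!syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &mask))
		printf("permitted xfeatures: %#lx\n", mask);

	return 0;
}
```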
...@@ -23,7 +23,7 @@ ...@@ -23,7 +23,7 @@
#include <asm/bugs.h> #include <asm/bugs.h>
#include <asm/processor.h> #include <asm/processor.h>
#include <asm/processor-flags.h> #include <asm/processor-flags.h>
#include <asm/fpu/internal.h> #include <asm/fpu/api.h>
#include <asm/msr.h> #include <asm/msr.h>
#include <asm/vmx.h> #include <asm/vmx.h>
#include <asm/paravirt.h> #include <asm/paravirt.h>
......
...@@ -42,7 +42,7 @@ ...@@ -42,7 +42,7 @@
#include <asm/setup.h> #include <asm/setup.h>
#include <asm/apic.h> #include <asm/apic.h>
#include <asm/desc.h> #include <asm/desc.h>
#include <asm/fpu/internal.h> #include <asm/fpu/api.h>
#include <asm/mtrr.h> #include <asm/mtrr.h>
#include <asm/hwcap2.h> #include <asm/hwcap2.h>
#include <linux/numa.h> #include <linux/numa.h>
...@@ -58,6 +58,7 @@ ...@@ -58,6 +58,7 @@
#include <asm/intel-family.h> #include <asm/intel-family.h>
#include <asm/cpu_device_id.h> #include <asm/cpu_device_id.h>
#include <asm/uv/uv.h> #include <asm/uv/uv.h>
#include <asm/sigframe.h>
#include "cpu.h" #include "cpu.h"
...@@ -467,27 +468,22 @@ static bool pku_disabled; ...@@ -467,27 +468,22 @@ static bool pku_disabled;
static __always_inline void setup_pku(struct cpuinfo_x86 *c) static __always_inline void setup_pku(struct cpuinfo_x86 *c)
{ {
struct pkru_state *pk; if (c == &boot_cpu_data) {
if (pku_disabled || !cpu_feature_enabled(X86_FEATURE_PKU))
return;
/*
* Setting CR4.PKE will cause the X86_FEATURE_OSPKE cpuid
* bit to be set. Enforce it.
*/
setup_force_cpu_cap(X86_FEATURE_OSPKE);
/* check the boot processor, plus compile options for PKU: */ } else if (!cpu_feature_enabled(X86_FEATURE_OSPKE)) {
if (!cpu_feature_enabled(X86_FEATURE_PKU))
return;
/* checks the actual processor's cpuid bits: */
if (!cpu_has(c, X86_FEATURE_PKU))
return;
if (pku_disabled)
return; return;
}
cr4_set_bits(X86_CR4_PKE); cr4_set_bits(X86_CR4_PKE);
pk = get_xsave_addr(&init_fpstate.xsave, XFEATURE_PKRU); /* Load the default PKRU value */
if (pk) pkru_write_default();
pk->pkru = init_pkru_value;
/*
* Seting X86_CR4_PKE will cause the X86_FEATURE_OSPKE
* cpuid bit to be set. We need to ensure that we
* update that bit in this CPU's "cpu_info".
*/
set_cpu_cap(c, X86_FEATURE_OSPKE);
} }
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
...@@ -1378,6 +1374,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) ...@@ -1378,6 +1374,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
fpu__init_system(c); fpu__init_system(c);
init_sigframe_size();
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
/* /*
* Regardless of whether PCID is enumerated, the SDM says * Regardless of whether PCID is enumerated, the SDM says
...@@ -1793,9 +1791,8 @@ void print_cpu_info(struct cpuinfo_x86 *c) ...@@ -1793,9 +1791,8 @@ void print_cpu_info(struct cpuinfo_x86 *c)
} }
/* /*
* clearcpuid= was already parsed in fpu__init_parse_early_param. * clearcpuid= was already parsed in cpu_parse_early_param(). This dummy
* But we need to keep a dummy __setup around otherwise it would * function prevents it from becoming an environment variable for init.
* show up as an environment variable for init.
*/ */
static __init int setup_clearcpuid(char *arg) static __init int setup_clearcpuid(char *arg)
{ {
......
...@@ -75,6 +75,9 @@ static const struct cpuid_dep cpuid_deps[] = { ...@@ -75,6 +75,9 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_SGX_LC, X86_FEATURE_SGX }, { X86_FEATURE_SGX_LC, X86_FEATURE_SGX },
{ X86_FEATURE_SGX1, X86_FEATURE_SGX }, { X86_FEATURE_SGX1, X86_FEATURE_SGX },
{ X86_FEATURE_SGX2, X86_FEATURE_SGX1 }, { X86_FEATURE_SGX2, X86_FEATURE_SGX1 },
{ X86_FEATURE_XFD, X86_FEATURE_XSAVES },
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{} {}
}; };
......
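The new cpuid_deps entries tie AMX tile state to XFD, and XFD to XSAVES/XGETBV1. A hedged userspace check of the corresponding enumeration bits; the leaf and bit positions follow the SDM and are assumptions of this sketch:

```
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=7,ECX=0):EDX[24] = AMX-TILE */
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		printf("AMX-TILE: %s\n", (edx & (1u << 24)) ? "yes" : "no");

	/* CPUID.(EAX=0xd,ECX=1):EAX[4] = XFD */
	if (__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx))
		printf("XFD     : %s\n", (eax & (1u << 4)) ? "yes" : "no");

	return 0;
}
```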
...@@ -382,13 +382,16 @@ static int msr_to_offset(u32 msr) ...@@ -382,13 +382,16 @@ static int msr_to_offset(u32 msr)
return -1; return -1;
} }
__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup, void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr)
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr)
{ {
pr_emerg("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n", if (wrmsr) {
(unsigned int)regs->cx, regs->ip, (void *)regs->ip); pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
(unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
regs->ip, (void *)regs->ip);
} else {
pr_emerg("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
(unsigned int)regs->cx, regs->ip, (void *)regs->ip);
}
show_stack_regs(regs); show_stack_regs(regs);
...@@ -396,8 +399,6 @@ __visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup, ...@@ -396,8 +399,6 @@ __visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
while (true) while (true)
cpu_relax(); cpu_relax();
return true;
} }
/* MSR access wrappers used for error injection */ /* MSR access wrappers used for error injection */
...@@ -429,32 +430,13 @@ static noinstr u64 mce_rdmsrl(u32 msr) ...@@ -429,32 +430,13 @@ static noinstr u64 mce_rdmsrl(u32 msr)
*/ */
asm volatile("1: rdmsr\n" asm volatile("1: rdmsr\n"
"2:\n" "2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_fault) _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR_IN_MCE)
: EAX_EDX_RET(val, low, high) : "c" (msr)); : EAX_EDX_RET(val, low, high) : "c" (msr));
return EAX_EDX_VAL(val, low, high); return EAX_EDX_VAL(val, low, high);
} }
__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr)
{
pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
(unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
regs->ip, (void *)regs->ip);
show_stack_regs(regs);
panic("MCA architectural violation!\n");
while (true)
cpu_relax();
return true;
}
static noinstr void mce_wrmsrl(u32 msr, u64 v) static noinstr void mce_wrmsrl(u32 msr, u64 v)
{ {
u32 low, high; u32 low, high;
...@@ -479,7 +461,7 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v) ...@@ -479,7 +461,7 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
/* See comment in mce_rdmsrl() */ /* See comment in mce_rdmsrl() */
asm volatile("1: wrmsr\n" asm volatile("1: wrmsr\n"
"2:\n" "2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_fault) _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR_IN_MCE)
: : "c" (msr), "a"(low), "d" (high) : "memory"); : : "c" (msr), "a"(low), "d" (high) : "memory");
} }
......
...@@ -61,7 +61,7 @@ static inline void cmci_disable_bank(int bank) { } ...@@ -61,7 +61,7 @@ static inline void cmci_disable_bank(int bank) { }
static inline void intel_init_cmci(void) { } static inline void intel_init_cmci(void) { }
static inline void intel_init_lmce(void) { } static inline void intel_init_lmce(void) { }
static inline void intel_clear_lmce(void) { } static inline void intel_clear_lmce(void) { }
static inline bool intel_filter_mce(struct mce *m) { return false; }; static inline bool intel_filter_mce(struct mce *m) { return false; }
#endif #endif
void mce_timer_kick(unsigned long interval); void mce_timer_kick(unsigned long interval);
...@@ -183,17 +183,7 @@ extern bool filter_mce(struct mce *m); ...@@ -183,17 +183,7 @@ extern bool filter_mce(struct mce *m);
#ifdef CONFIG_X86_MCE_AMD #ifdef CONFIG_X86_MCE_AMD
extern bool amd_filter_mce(struct mce *m); extern bool amd_filter_mce(struct mce *m);
#else #else
static inline bool amd_filter_mce(struct mce *m) { return false; }; static inline bool amd_filter_mce(struct mce *m) { return false; }
#endif #endif
__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr);
__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr);
#endif /* __X86_MCE_INTERNAL_H__ */ #endif /* __X86_MCE_INTERNAL_H__ */
...@@ -269,25 +269,25 @@ static bool is_copy_from_user(struct pt_regs *regs) ...@@ -269,25 +269,25 @@ static bool is_copy_from_user(struct pt_regs *regs)
*/ */
static int error_context(struct mce *m, struct pt_regs *regs) static int error_context(struct mce *m, struct pt_regs *regs)
{ {
enum handler_type t;
if ((m->cs & 3) == 3) if ((m->cs & 3) == 3)
return IN_USER; return IN_USER;
if (!mc_recoverable(m->mcgstatus)) if (!mc_recoverable(m->mcgstatus))
return IN_KERNEL; return IN_KERNEL;
t = ex_get_fault_handler_type(m->ip); switch (ex_get_fixup_type(m->ip)) {
if (t == EX_HANDLER_FAULT) { case EX_TYPE_UACCESS:
m->kflags |= MCE_IN_KERNEL_RECOV; case EX_TYPE_COPY:
return IN_KERNEL_RECOV; if (!regs || !is_copy_from_user(regs))
} return IN_KERNEL;
if (t == EX_HANDLER_UACCESS && regs && is_copy_from_user(regs)) {
m->kflags |= MCE_IN_KERNEL_RECOV;
m->kflags |= MCE_IN_KERNEL_COPYIN; m->kflags |= MCE_IN_KERNEL_COPYIN;
fallthrough;
case EX_TYPE_FAULT_MCE_SAFE:
case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_RECOV;
return IN_KERNEL_RECOV; return IN_KERNEL_RECOV;
default:
return IN_KERNEL;
} }
return IN_KERNEL;
} }
static int mce_severity_amd_smca(struct mce *m, enum context err_ctx) static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
/* /*
* x86 FPU bug checks: * x86 FPU bug checks:
*/ */
#include <asm/fpu/internal.h> #include <asm/fpu/api.h>
/* /*
* Boot time CPU/FPU FDIV bug detection code: * Boot time CPU/FPU FDIV bug detection code:
......
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_CONTEXT_H
#define __X86_KERNEL_FPU_CONTEXT_H
#include <asm/fpu/xstate.h>
#include <asm/trace/fpu.h>
/* Functions related to FPU context tracking */
/*
* The in-register FPU state for an FPU context on a CPU is assumed to be
* valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
* matches the FPU.
*
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
static inline void __cpu_invalidate_fpregs_state(void)
{
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
fpu->last_cpu = -1;
}
static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
}
static inline void fpregs_deactivate(struct fpu *fpu)
{
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
trace_x86_fpu_regs_deactivated(fpu);
}
static inline void fpregs_activate(struct fpu *fpu)
{
__this_cpu_write(fpu_fpregs_owner_ctx, fpu);
trace_x86_fpu_regs_activated(fpu);
}
/* Internal helper for switch_fpu_return() and signal frame setup */
static inline void fpregs_restore_userregs(void)
{
struct fpu *fpu = &current->thread.fpu;
int cpu = smp_processor_id();
if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
return;
if (!fpregs_state_valid(fpu, cpu)) {
/*
* This restores _all_ xstate which has not been
* established yet.
*
* If PKRU is enabled, then the PKRU value is already
* correct because it was either set in switch_to() or in
* flush_thread(). So it is excluded because it might not be
* up to date in current->thread.fpu.xsave state.
*
* XFD state is handled in restore_fpregs_from_fpstate().
*/
restore_fpregs_from_fpstate(fpu->fpstate, XFEATURE_MASK_FPSTATE);
fpregs_activate(fpu);
fpu->last_cpu = cpu;
}
clear_thread_flag(TIF_NEED_FPU_LOAD);
}
#endif
This diff has been collapsed.
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
/* /*
* x86 FPU boot time init code: * x86 FPU boot time init code:
*/ */
#include <asm/fpu/internal.h> #include <asm/fpu/api.h>
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
#include <asm/setup.h> #include <asm/setup.h>
...@@ -10,6 +10,10 @@ ...@@ -10,6 +10,10 @@
#include <linux/sched/task.h> #include <linux/sched/task.h>
#include <linux/init.h> #include <linux/init.h>
#include "internal.h"
#include "legacy.h"
#include "xstate.h"
/* /*
* Initialize the registers found in all CPUs, CR0 and CR4: * Initialize the registers found in all CPUs, CR0 and CR4:
*/ */
...@@ -34,7 +38,7 @@ static void fpu__init_cpu_generic(void) ...@@ -34,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */ /* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION #ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU)) if (!boot_cpu_has(X86_FEATURE_FPU))
fpstate_init_soft(&current->thread.fpu.state.soft); fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
else else
#endif #endif
asm volatile ("fninit"); asm volatile ("fninit");
...@@ -89,7 +93,7 @@ static void fpu__init_system_early_generic(struct cpuinfo_x86 *c) ...@@ -89,7 +93,7 @@ static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
/* /*
* Boot time FPU feature detection code: * Boot time FPU feature detection code:
*/ */
unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu; unsigned int mxcsr_feature_mask __ro_after_init = 0xffffffffu;
EXPORT_SYMBOL_GPL(mxcsr_feature_mask); EXPORT_SYMBOL_GPL(mxcsr_feature_mask);
static void __init fpu__init_system_mxcsr(void) static void __init fpu__init_system_mxcsr(void)
...@@ -121,23 +125,14 @@ static void __init fpu__init_system_mxcsr(void) ...@@ -121,23 +125,14 @@ static void __init fpu__init_system_mxcsr(void)
static void __init fpu__init_system_generic(void) static void __init fpu__init_system_generic(void)
{ {
/* /*
* Set up the legacy init FPU context. (xstate init might overwrite this * Set up the legacy init FPU context. Will be updated when the
* with a more modern format, if the CPU supports it.) * CPU supports XSAVE[S].
*/ */
fpstate_init(&init_fpstate); fpstate_init_user(&init_fpstate);
fpu__init_system_mxcsr(); fpu__init_system_mxcsr();
} }
/*
* Size of the FPU context state. All tasks in the system use the
* same context size, regardless of what portion they use.
* This is inherent to the XSAVE architecture which puts all state
* components into a single, continuous memory block:
*/
unsigned int fpu_kernel_xstate_size;
EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
/* Get alignment of the TYPE. */ /* Get alignment of the TYPE. */
#define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test) #define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
...@@ -162,13 +157,13 @@ static void __init fpu__init_task_struct_size(void) ...@@ -162,13 +157,13 @@ static void __init fpu__init_task_struct_size(void)
* Subtract off the static size of the register state. * Subtract off the static size of the register state.
* It potentially has a bunch of padding. * It potentially has a bunch of padding.
*/ */
task_size -= sizeof(((struct task_struct *)0)->thread.fpu.state); task_size -= sizeof(current->thread.fpu.__fpstate.regs);
/* /*
* Add back the dynamically-calculated register state * Add back the dynamically-calculated register state
* size. * size.
*/ */
task_size += fpu_kernel_xstate_size; task_size += fpu_kernel_cfg.default_size;
/* /*
* We dynamically size 'struct fpu', so we require that * We dynamically size 'struct fpu', so we require that
...@@ -177,7 +172,7 @@ static void __init fpu__init_task_struct_size(void) ...@@ -177,7 +172,7 @@ static void __init fpu__init_task_struct_size(void)
* you hit a compile error here, check the structure to * you hit a compile error here, check the structure to
* see if something got added to the end. * see if something got added to the end.
*/ */
CHECK_MEMBER_AT_END_OF(struct fpu, state); CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu); CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
CHECK_MEMBER_AT_END_OF(struct task_struct, thread); CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
...@@ -192,48 +187,34 @@ static void __init fpu__init_task_struct_size(void) ...@@ -192,48 +187,34 @@ static void __init fpu__init_task_struct_size(void)
*/ */
static void __init fpu__init_system_xstate_size_legacy(void) static void __init fpu__init_system_xstate_size_legacy(void)
{ {
static int on_boot_cpu __initdata = 1; unsigned int size;
WARN_ON_FPU(!on_boot_cpu);
on_boot_cpu = 0;
/* /*
* Note that xstate sizes might be overwritten later during * Note that the size configuration might be overwritten later
* fpu__init_system_xstate(). * during fpu__init_system_xstate().
*/ */
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
if (!boot_cpu_has(X86_FEATURE_FPU)) { size = sizeof(struct swregs_state);
fpu_kernel_xstate_size = sizeof(struct swregs_state); } else if (cpu_feature_enabled(X86_FEATURE_FXSR)) {
size = sizeof(struct fxregs_state);
fpu_user_cfg.legacy_features = XFEATURE_MASK_FPSSE;
} else { } else {
if (boot_cpu_has(X86_FEATURE_FXSR)) size = sizeof(struct fregs_state);
fpu_kernel_xstate_size = fpu_user_cfg.legacy_features = XFEATURE_MASK_FP;
sizeof(struct fxregs_state);
else
fpu_kernel_xstate_size =
sizeof(struct fregs_state);
} }
fpu_user_xstate_size = fpu_kernel_xstate_size; fpu_kernel_cfg.max_size = size;
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
fpstate_reset(&current->thread.fpu);
} }
/* static void __init fpu__init_init_fpstate(void)
* Find supported xfeatures based on cpu features and command-line input.
* This must be called after fpu__init_parse_early_param() is called and
* xfeatures_mask is enumerated.
*/
u64 __init fpu__get_supported_xfeatures_mask(void)
{ {
return XFEATURE_MASK_USER_SUPPORTED | /* Bring init_fpstate size and features up to date */
XFEATURE_MASK_SUPERVISOR_SUPPORTED; init_fpstate.size = fpu_kernel_cfg.max_size;
} init_fpstate.xfeatures = fpu_kernel_cfg.max_features;
/* Legacy code to initialize eager fpu mode. */
static void __init fpu__init_system_ctx_switch(void)
{
static bool on_boot_cpu __initdata = 1;
WARN_ON_FPU(!on_boot_cpu);
on_boot_cpu = 0;
} }
/* /*
...@@ -242,6 +223,7 @@ static void __init fpu__init_system_ctx_switch(void) ...@@ -242,6 +223,7 @@ static void __init fpu__init_system_ctx_switch(void)
*/ */
void __init fpu__init_system(struct cpuinfo_x86 *c) void __init fpu__init_system(struct cpuinfo_x86 *c)
{ {
fpstate_reset(&current->thread.fpu);
fpu__init_system_early_generic(c); fpu__init_system_early_generic(c);
/* /*
...@@ -252,8 +234,7 @@ void __init fpu__init_system(struct cpuinfo_x86 *c) ...@@ -252,8 +234,7 @@ void __init fpu__init_system(struct cpuinfo_x86 *c)
fpu__init_system_generic(); fpu__init_system_generic();
fpu__init_system_xstate_size_legacy(); fpu__init_system_xstate_size_legacy();
fpu__init_system_xstate(); fpu__init_system_xstate(fpu_kernel_cfg.max_size);
fpu__init_task_struct_size(); fpu__init_task_struct_size();
fpu__init_init_fpstate();
fpu__init_system_ctx_switch();
} }
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_INTERNAL_H
#define __X86_KERNEL_FPU_INTERNAL_H
extern struct fpstate init_fpstate;
/* CPU feature check wrappers */
static __always_inline __pure bool use_xsave(void)
{
return cpu_feature_enabled(X86_FEATURE_XSAVE);
}
static __always_inline __pure bool use_fxsr(void)
{
return cpu_feature_enabled(X86_FEATURE_FXSR);
}
#ifdef CONFIG_X86_DEBUG_FPU
# define WARN_ON_FPU(x) WARN_ON_ONCE(x)
#else
# define WARN_ON_FPU(x) ({ (void)(x); 0; })
#endif
/* Used in init.c */
extern void fpstate_init_user(struct fpstate *fpstate);
extern void fpstate_reset(struct fpu *fpu);
#endif
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_LEGACY_H
#define __X86_KERNEL_FPU_LEGACY_H
#include <asm/fpu/types.h>
extern unsigned int mxcsr_feature_mask;
static inline void ldmxcsr(u32 mxcsr)
{
asm volatile("ldmxcsr %0" :: "m" (mxcsr));
}
/*
* Returns 0 on success or the trap number when the operation raises an
* exception.
*/
#define user_insn(insn, output, input...) \
({ \
int err; \
\
might_fault(); \
\
asm volatile(ASM_STAC "\n" \
"1: " #insn "\n" \
"2: " ASM_CLAC "\n" \
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE) \
: [err] "=a" (err), output \
: "0"(0), input); \
err; \
})
#define kernel_insn_err(insn, output, input...) \
({ \
int err; \
asm volatile("1:" #insn "\n\t" \
"2:\n" \
".section .fixup,\"ax\"\n" \
"3: movl $-1,%[err]\n" \
" jmp 2b\n" \
".previous\n" \
_ASM_EXTABLE(1b, 3b) \
: [err] "=r" (err), output \
: "0"(0), input); \
err; \
})
#define kernel_insn(insn, output, input...) \
asm volatile("1:" #insn "\n\t" \
"2:\n" \
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FPU_RESTORE) \
: output : input)
static inline int fnsave_to_user_sigframe(struct fregs_state __user *fx)
{
return user_insn(fnsave %[fx]; fwait, [fx] "=m" (*fx), "m" (*fx));
}
static inline int fxsave_to_user_sigframe(struct fxregs_state __user *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
else
return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
}
static inline void fxrstor(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
kernel_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
kernel_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int fxrstor_safe(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
return kernel_insn_err(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int fxrstor_from_user_sigframe(struct fxregs_state __user *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline void frstor(struct fregs_state *fx)
{
kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int frstor_safe(struct fregs_state *fx)
{
return kernel_insn_err(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int frstor_from_user_sigframe(struct fregs_state __user *fx)
{
return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline void fxsave(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
asm volatile( "fxsave %[fx]" : [fx] "=m" (*fx));
else
asm volatile("fxsaveq %[fx]" : [fx] "=m" (*fx));
}
#endif
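The legacy.h wrappers above hand-roll FXSAVE/FXRSTOR and FNSAVE/FRSTOR with exception fixups. Purely for orientation, a userspace sketch of the 512-byte FXSAVE image the structures describe; the 16-byte alignment requirement and the MXCSR field at offset 24 come from the architectural layout:

```
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	static uint8_t buf[512] __attribute__((aligned(16)));
	uint32_t mxcsr;

	asm volatile("fxsave %0" : "=m" (*(uint8_t (*)[512])buf));

	memcpy(&mxcsr, buf + 24, sizeof(mxcsr));	/* MXCSR field of the image */
	printf("MXCSR: %#x\n", mxcsr);
	return 0;
}
```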
...@@ -2,11 +2,17 @@ ...@@ -2,11 +2,17 @@
/* /*
* FPU register's regset abstraction, for ptrace, core dumps, etc. * FPU register's regset abstraction, for ptrace, core dumps, etc.
*/ */
#include <asm/fpu/internal.h> #include <linux/sched/task_stack.h>
#include <linux/vmalloc.h>
#include <asm/fpu/api.h>
#include <asm/fpu/signal.h> #include <asm/fpu/signal.h>
#include <asm/fpu/regset.h> #include <asm/fpu/regset.h>
#include <asm/fpu/xstate.h>
#include <linux/sched/task_stack.h> #include "context.h"
#include "internal.h"
#include "legacy.h"
#include "xstate.h"
/* /*
* The xstateregs_active() routine is the same as the regset_fpregs_active() routine, * The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
...@@ -26,18 +32,58 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r ...@@ -26,18 +32,58 @@ int regset_xregset_fpregs_active(struct task_struct *target, const struct user_r
return 0; return 0;
} }
/*
* The regset get() functions are invoked from:
*
* - coredump to dump the current task's fpstate. If the current task
* owns the FPU then the memory state has to be synchronized and the
* FPU register state preserved. Otherwise fpstate is already in sync.
*
* - ptrace to dump fpstate of a stopped task, in which case the registers
* have already been saved to fpstate on context switch.
*/
static void sync_fpstate(struct fpu *fpu)
{
if (fpu == &current->thread.fpu)
fpu_sync_fpstate(fpu);
}
/*
* Invalidate cached FPU registers before modifying the stopped target
* task's fpstate.
*
* This forces the target task on resume to restore the FPU registers from
* modified fpstate. Otherwise the task might skip the restore and operate
* with the cached FPU registers which discards the modifications.
*/
static void fpu_force_restore(struct fpu *fpu)
{
/*
* Only stopped child tasks can be used to modify the FPU
* state in the fpstate buffer:
*/
WARN_ON_FPU(fpu == &current->thread.fpu);
__fpu_invalidate_fpregs_state(fpu);
}
int xfpregs_get(struct task_struct *target, const struct user_regset *regset, int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to) struct membuf to)
{ {
struct fpu *fpu = &target->thread.fpu; struct fpu *fpu = &target->thread.fpu;
if (!boot_cpu_has(X86_FEATURE_FXSR)) if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV; return -ENODEV;
fpu__prepare_read(fpu); sync_fpstate(fpu);
fpstate_sanitize_xstate(fpu);
if (!use_xsave()) {
return membuf_write(&to, &fpu->fpstate->regs.fxsave,
sizeof(fpu->fpstate->regs.fxsave));
}
return membuf_write(&to, &fpu->state.fxsave, sizeof(struct fxregs_state)); copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_FX);
return 0;
} }
int xfpregs_set(struct task_struct *target, const struct user_regset *regset, int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
...@@ -45,62 +91,51 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset, ...@@ -45,62 +91,51 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
const void *kbuf, const void __user *ubuf) const void *kbuf, const void __user *ubuf)
{ {
struct fpu *fpu = &target->thread.fpu; struct fpu *fpu = &target->thread.fpu;
struct fxregs_state newstate;
int ret; int ret;
if (!boot_cpu_has(X86_FEATURE_FXSR)) if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV; return -ENODEV;
fpu__prepare_write(fpu); /* No funny business with partial or oversized writes is permitted. */
fpstate_sanitize_xstate(fpu); if (pos != 0 || count != sizeof(newstate))
return -EINVAL;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1);
&fpu->state.fxsave, 0, -1); if (ret)
return ret;
/* /* Do not allow an invalid MXCSR value. */
* mxcsr reserved bits must be masked to zero for security reasons. if (newstate.mxcsr & ~mxcsr_feature_mask)
*/ return -EINVAL;
fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
/* fpu_force_restore(fpu);
* update the header bits in the xsave header, indicating the
* presence of FP and SSE state.
*/
if (boot_cpu_has(X86_FEATURE_XSAVE))
fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
return ret; /* Copy the state */
memcpy(&fpu->fpstate->regs.fxsave, &newstate, sizeof(newstate));
/* Clear xmm8..15 for 32-bit callers */
BUILD_BUG_ON(sizeof(fpu->__fpstate.regs.fxsave.xmm_space) != 16 * 16);
if (in_ia32_syscall())
memset(&fpu->fpstate->regs.fxsave.xmm_space[8*4], 0, 8 * 16);
/* Mark FP and SSE as in use when XSAVE is enabled */
if (use_xsave())
fpu->fpstate->regs.xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
return 0;
} }
int xstateregs_get(struct task_struct *target, const struct user_regset *regset, int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
struct membuf to) struct membuf to)
{ {
struct fpu *fpu = &target->thread.fpu; if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
struct xregs_state *xsave;
if (!boot_cpu_has(X86_FEATURE_XSAVE))
return -ENODEV; return -ENODEV;
xsave = &fpu->state.xsave; sync_fpstate(&target->thread.fpu);
fpu__prepare_read(fpu);
if (using_compacted_format()) { copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_XSAVE);
copy_xstate_to_kernel(to, xsave); return 0;
return 0;
} else {
fpstate_sanitize_xstate(fpu);
/*
* Copy the 48 bytes defined by the software into the xsave
* area in the thread struct, so that we can copy the whole
* area to user using one user_regset_copyout().
*/
memcpy(&xsave->i387.sw_reserved, xstate_fx_sw_bytes, sizeof(xstate_fx_sw_bytes));
/*
* Copy the xstate memory layout.
*/
return membuf_write(&to, xsave, fpu_user_xstate_size);
}
} }
int xstateregs_set(struct task_struct *target, const struct user_regset *regset, int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
...@@ -108,44 +143,34 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, ...@@ -108,44 +143,34 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
const void *kbuf, const void __user *ubuf) const void *kbuf, const void __user *ubuf)
{ {
struct fpu *fpu = &target->thread.fpu; struct fpu *fpu = &target->thread.fpu;
struct xregs_state *xsave; struct xregs_state *tmpbuf = NULL;
int ret; int ret;
if (!boot_cpu_has(X86_FEATURE_XSAVE)) if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
return -ENODEV; return -ENODEV;
/* /*
* A whole standard-format XSAVE buffer is needed: * A whole standard-format XSAVE buffer is needed:
*/ */
if (pos != 0 || count != fpu_user_xstate_size) if (pos != 0 || count != fpu_user_cfg.max_size)
return -EFAULT; return -EFAULT;
xsave = &fpu->state.xsave; if (!kbuf) {
tmpbuf = vmalloc(count);
fpu__prepare_write(fpu); if (!tmpbuf)
return -ENOMEM;
if (using_compacted_format()) { if (copy_from_user(tmpbuf, ubuf, count)) {
if (kbuf) ret = -EFAULT;
ret = copy_kernel_to_xstate(xsave, kbuf); goto out;
else }
ret = copy_user_to_xstate(xsave, ubuf);
} else {
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
if (!ret)
ret = validate_user_xstate_header(&xsave->header);
} }
/* fpu_force_restore(fpu);
* mxcsr reserved bits must be masked to zero for security reasons. ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
*/
xsave->i387.mxcsr &= mxcsr_feature_mask;
/*
* In case of failure, mark all states as init:
*/
if (ret)
fpstate_init(&fpu->state);
out:
vfree(tmpbuf);
return ret; return ret;
} }
...@@ -221,10 +246,10 @@ static inline u32 twd_fxsr_to_i387(struct fxregs_state *fxsave) ...@@ -221,10 +246,10 @@ static inline u32 twd_fxsr_to_i387(struct fxregs_state *fxsave)
* FXSR floating point environment conversions. * FXSR floating point environment conversions.
*/ */
void static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk) struct task_struct *tsk,
struct fxregs_state *fxsave)
{ {
struct fxregs_state *fxsave = &tsk->thread.fpu.state.fxsave;
struct _fpreg *to = (struct _fpreg *) &env->st_space[0]; struct _fpreg *to = (struct _fpreg *) &env->st_space[0];
struct _fpxreg *from = (struct _fpxreg *) &fxsave->st_space[0]; struct _fpxreg *from = (struct _fpxreg *) &fxsave->st_space[0];
int i; int i;
...@@ -258,6 +283,12 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk) ...@@ -258,6 +283,12 @@ convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
memcpy(&to[i], &from[i], sizeof(to[0])); memcpy(&to[i], &from[i], sizeof(to[0]));
} }
void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
__convert_from_fxsr(env, tsk, &tsk->thread.fpu.fpstate->regs.fxsave);
}
void convert_to_fxsr(struct fxregs_state *fxsave, void convert_to_fxsr(struct fxregs_state *fxsave,
const struct user_i387_ia32_struct *env) const struct user_i387_ia32_struct *env)
...@@ -290,25 +321,29 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset, ...@@ -290,25 +321,29 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
{ {
struct fpu *fpu = &target->thread.fpu; struct fpu *fpu = &target->thread.fpu;
struct user_i387_ia32_struct env; struct user_i387_ia32_struct env;
struct fxregs_state fxsave, *fx;
fpu__prepare_read(fpu); sync_fpstate(fpu);
if (!boot_cpu_has(X86_FEATURE_FPU)) if (!cpu_feature_enabled(X86_FEATURE_FPU))
return fpregs_soft_get(target, regset, to); return fpregs_soft_get(target, regset, to);
if (!boot_cpu_has(X86_FEATURE_FXSR)) { if (!cpu_feature_enabled(X86_FEATURE_FXSR)) {
return membuf_write(&to, &fpu->state.fsave, return membuf_write(&to, &fpu->fpstate->regs.fsave,
sizeof(struct fregs_state)); sizeof(struct fregs_state));
} }
fpstate_sanitize_xstate(fpu); if (use_xsave()) {
struct membuf mb = { .p = &fxsave, .left = sizeof(fxsave) };
if (to.left == sizeof(env)) { /* Handle init state optimized xstate correctly */
convert_from_fxsr(to.p, target); copy_xstate_to_uabi_buf(mb, target, XSTATE_COPY_FP);
return 0; fx = &fxsave;
} else {
fx = &fpu->fpstate->regs.fxsave;
} }
convert_from_fxsr(&env, target); __convert_from_fxsr(&env, target, fx);
return membuf_write(&to, &env, sizeof(env)); return membuf_write(&to, &env, sizeof(env));
} }
...@@ -320,31 +355,32 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset, ...@@ -320,31 +355,32 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env; struct user_i387_ia32_struct env;
int ret; int ret;
fpu__prepare_write(fpu); /* No funny business with partial or oversized writes is permitted. */
fpstate_sanitize_xstate(fpu); if (pos != 0 || count != sizeof(struct user_i387_ia32_struct))
return -EINVAL;
if (!boot_cpu_has(X86_FEATURE_FPU)) if (!cpu_feature_enabled(X86_FEATURE_FPU))
return fpregs_soft_set(target, regset, pos, count, kbuf, ubuf); return fpregs_soft_set(target, regset, pos, count, kbuf, ubuf);
if (!boot_cpu_has(X86_FEATURE_FXSR)) ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &env, 0, -1);
return user_regset_copyin(&pos, &count, &kbuf, &ubuf, if (ret)
&fpu->state.fsave, 0, return ret;
-1);
if (pos > 0 || count < sizeof(env)) fpu_force_restore(fpu);
convert_from_fxsr(&env, target);
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &env, 0, -1); if (cpu_feature_enabled(X86_FEATURE_FXSR))
if (!ret) convert_to_fxsr(&fpu->fpstate->regs.fxsave, &env);
convert_to_fxsr(&target->thread.fpu.state.fxsave, &env); else
memcpy(&fpu->fpstate->regs.fsave, &env, sizeof(env));
/* /*
* update the header bit in the xsave header, indicating the * Update the header bit in the xsave header, indicating the
* presence of FP. * presence of FP.
*/ */
if (boot_cpu_has(X86_FEATURE_XSAVE)) if (cpu_feature_enabled(X86_FEATURE_XSAVE))
fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FP; fpu->fpstate->regs.xsave.header.xfeatures |= XFEATURE_MASK_FP;
return ret;
return 0;
} }
#endif /* CONFIG_X86_32 || CONFIG_IA32_EMULATION */ #endif /* CONFIG_X86_32 || CONFIG_IA32_EMULATION */
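These regset changes back the PTRACE_GETREGSET/PTRACE_SETREGSET uABI for NT_X86_XSTATE. A hedged tracer-side sketch; the fixed 8 KiB buffer is an assumption here, real consumers size it from CPUID leaf 0xd:

```
#include <elf.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	static uint8_t buf[8192];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	pid_t child = fork();

	if (child == 0) {
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		raise(SIGSTOP);		/* stop until the tracer has looked */
		_exit(0);
	}

	waitpid(child, NULL, 0);

	if (ptrace(PTRACE_GETREGSET, child, (void *)NT_X86_XSTATE, &iov) == 0)
		printf("xstate image: %zu bytes\n", iov.iov_len);
	else
		perror("PTRACE_GETREGSET");

	kill(child, SIGKILL);
	return 0;
}
```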
This diff has been collapsed.
This diff has been collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_XSTATE_H
#define __X86_KERNEL_FPU_XSTATE_H
#include <asm/cpufeature.h>
#include <asm/fpu/xstate.h>
#include <asm/fpu/xcr.h>
#ifdef CONFIG_X86_64
DECLARE_PER_CPU(u64, xfd_state);
#endif
static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
{
/*
* XRSTORS requires these bits set in xcomp_bv, or it will
* trigger #GP:
*/
if (cpu_feature_enabled(X86_FEATURE_XSAVES))
xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
}
static inline u64 xstate_get_host_group_perm(void)
{
/* Pairs with WRITE_ONCE() in xstate_request_perm() */
return READ_ONCE(current->group_leader->thread.fpu.perm.__state_perm);
}
enum xstate_copy_mode {
XSTATE_COPY_FP,
XSTATE_COPY_FX,
XSTATE_COPY_XSAVE,
};
struct membuf;
extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u32 pkru_val, enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf);
extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system_xstate(unsigned int legacy_size);
extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
static inline u64 xfeatures_mask_supervisor(void)
{
return fpu_kernel_cfg.max_features & XFEATURE_MASK_SUPERVISOR_SUPPORTED;
}
static inline u64 xfeatures_mask_independent(void)
{
if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR))
return XFEATURE_MASK_INDEPENDENT & ~XFEATURE_MASK_LBR;
return XFEATURE_MASK_INDEPENDENT;
}
/* XSAVE/XRSTOR wrapper functions */
#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
#else
#define REX_PREFIX
#endif
/* These macros all use (%edi)/(%rdi) as the single memory argument. */
#define XSAVE ".byte " REX_PREFIX "0x0f,0xae,0x27"
#define XSAVEOPT ".byte " REX_PREFIX "0x0f,0xae,0x37"
#define XSAVES ".byte " REX_PREFIX "0x0f,0xc7,0x2f"
#define XRSTOR ".byte " REX_PREFIX "0x0f,0xae,0x2f"
#define XRSTORS ".byte " REX_PREFIX "0x0f,0xc7,0x1f"
/*
* After this @err contains 0 on success or the trap number when the
* operation raises an exception.
*/
#define XSTATE_OP(op, st, lmask, hmask, err) \
asm volatile("1:" op "\n\t" \
"xor %[err], %[err]\n" \
"2:\n\t" \
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE) \
: [err] "=a" (err) \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
/*
* If XSAVES is enabled, it replaces XSAVEOPT because it supports a compact
* format and supervisor states in addition to modified optimization in
* XSAVEOPT.
*
* Otherwise, if XSAVEOPT is enabled, XSAVEOPT replaces XSAVE because XSAVEOPT
* supports modified optimization which is not supported by XSAVE.
*
* We use XSAVE as a fallback.
*
* The 661 label is defined in the ALTERNATIVE* macros as the address of the
* original instruction which gets replaced. We need to use it here as the
* address of the instruction where we might get an exception.
*/
#define XSTATE_XSAVE(st, lmask, hmask, err) \
asm volatile(ALTERNATIVE_2(XSAVE, \
XSAVEOPT, X86_FEATURE_XSAVEOPT, \
XSAVES, X86_FEATURE_XSAVES) \
"\n" \
"xor %[err], %[err]\n" \
"3:\n" \
".pushsection .fixup,\"ax\"\n" \
"4: movl $-2, %[err]\n" \
"jmp 3b\n" \
".popsection\n" \
_ASM_EXTABLE(661b, 4b) \
: [err] "=r" (err) \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
/*
* Use XRSTORS to restore context if it is enabled. XRSTORS supports compact
* XSAVE area format.
*/
#define XSTATE_XRESTORE(st, lmask, hmask) \
asm volatile(ALTERNATIVE(XRSTOR, \
XRSTORS, X86_FEATURE_XSAVES) \
"\n" \
"3:\n" \
_ASM_EXTABLE_TYPE(661b, 3b, EX_TYPE_FPU_RESTORE) \
: \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
#if defined(CONFIG_X86_64) && defined(CONFIG_X86_DEBUG_FPU)
extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
#else
static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor) { }
#endif
#ifdef CONFIG_X86_64
static inline void xfd_update_state(struct fpstate *fpstate)
{
if (fpu_state_size_dynamic()) {
u64 xfd = fpstate->xfd;
if (__this_cpu_read(xfd_state) != xfd) {
wrmsrl(MSR_IA32_XFD, xfd);
__this_cpu_write(xfd_state, xfd);
}
}
}
#else
static inline void xfd_update_state(struct fpstate *fpstate) { }
#endif
/*
* Save processor xstate to xsave area.
*
* Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
* and command line options. The choice is permanent until the next reboot.
*/
static inline void os_xsave(struct fpstate *fpstate)
{
u64 mask = fpstate->xfeatures;
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
WARN_ON_FPU(!alternatives_patched);
xfd_validate_state(fpstate, mask, false);
XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
/* We should never fault when copying to a kernel buffer: */
WARN_ON_FPU(err);
}
/*
* Restore processor xstate from xsave area.
*
* Uses XRSTORS when XSAVES is used, XRSTOR otherwise.
*/
static inline void os_xrstor(struct fpstate *fpstate, u64 mask)
{
u32 lmask = mask;
u32 hmask = mask >> 32;
xfd_validate_state(fpstate, mask, true);
XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
}
/* Restore of supervisor state. Does not require XFD */
static inline void os_xrstor_supervisor(struct fpstate *fpstate)
{
u64 mask = xfeatures_mask_supervisor();
u32 lmask = mask;
u32 hmask = mask >> 32;
XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
}
/*
* XSAVE itself always writes all requested xfeatures. Removing features
* from the request bitmap reduces the features which are written.
* Generate a mask of features which must be written to a sigframe. The
* unset features can be optimized away and not written.
*
* This optimization is user-visible. Only use for states where
* uninitialized sigframe contents are tolerable, like dynamic features.
*
* Users of buffers produced with this optimization must check XSTATE_BV
* to determine which features have been optimized out.
*/
static inline u64 xfeatures_need_sigframe_write(void)
{
u64 xfeatures_to_write;
/* In-use features must be written: */
xfeatures_to_write = xfeatures_in_use();
/* Also write all non-optimizable sigframe features: */
xfeatures_to_write |= XFEATURE_MASK_USER_SUPPORTED &
~XFEATURE_MASK_SIGFRAME_INITOPT;
return xfeatures_to_write;
}
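
Because this optimization is user-visible, anything that parses the signal frame must consult XSTATE_BV before reading a dynamic state such as AMX tile data. A minimal user-space sketch, assuming the architectural 512-byte offset of the XSAVE header and state-component 18 for XTILEDATA (the helper and struct names are invented):

```c
#include <stdint.h>
#include <stdbool.h>

#define XFEATURE_XTILEDATA	18	/* AMX tile data state-component number */

struct xstate_header_sketch {		/* 64-byte XSAVE header, starts at offset 512 */
	uint64_t xstate_bv;
	uint64_t xcomp_bv;
	uint64_t reserved[6];
};

/* Returns true if the sigframe's XSAVE area actually carries tile data. */
static bool sigframe_has_tiledata(const void *xsave_area)
{
	const struct xstate_header_sketch *hdr =
		(const void *)((const char *)xsave_area + 512);

	return hdr->xstate_bv & (1ULL << XFEATURE_XTILEDATA);
}
```

If the bit is clear, the state was optimized away and must be treated as being in its INIT (all zeroes) configuration.
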
/*
* Save xstate to user space xsave area.
*
* We don't use modified optimization because xrstor/xrstors might track
* a different application.
*
* We don't use compacted format xsave area for backward compatibility for
* old applications which don't understand the compacted format of the
* xsave area.
*
* The caller has to zero buf::header before calling this because XSAVE*
* does not touch the reserved fields in the header.
*/
static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
{
/*
* Include the features which are not xsaved/rstored by the kernel
* internally, e.g. PKRU. That's user space ABI and also required
* to allow the signal handler to modify PKRU.
*/
struct fpstate *fpstate = current->thread.fpu.fpstate;
u64 mask = fpstate->user_xfeatures;
u32 lmask;
u32 hmask;
int err;
/* Optimize away writing unnecessary xfeatures: */
if (fpu_state_size_dynamic())
mask &= xfeatures_need_sigframe_write();
lmask = mask;
hmask = mask >> 32;
xfd_validate_state(fpstate, mask, false);
stac();
XSTATE_OP(XSAVE, buf, lmask, hmask, err);
clac();
return err;
}
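
The "caller has to zero buf::header" contract above is easy to miss. A hedged sketch of the expected calling pattern (the wrapper name is invented; the real caller lives in the FPU signal code):

```c
/*
 * Sketch of a caller honoring the contract: clear the XSAVE header in the
 * user buffer first, because XSAVE* leaves the reserved header fields alone.
 */
static int save_xstate_to_sigframe_sketch(struct xregs_state __user *buf)
{
	/* __clear_user() returns the number of bytes that could not be cleared */
	if (__clear_user(&buf->header, sizeof(buf->header)))
		return -EFAULT;

	return xsave_to_user_sigframe(buf);
}
```
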
/*
* Restore xstate from user space xsave area.
*/
static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64 mask)
{
struct xregs_state *xstate = ((__force struct xregs_state *)buf);
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
xfd_validate_state(current->thread.fpu.fpstate, mask, true);
stac();
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
clac();
return err;
}
/*
* Restore xstate from kernel space xsave area, return an error code instead of
* an exception.
*/
static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
{
struct xregs_state *xstate = &fpstate->regs.xsave;
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
/* Ensure that XFD is up to date */
xfd_update_state(fpstate);
if (cpu_feature_enabled(X86_FEATURE_XSAVES))
XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
else
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
return err;
}
#endif
```
@@ -30,7 +30,9 @@
 #include <asm/apic.h>
 #include <linux/uaccess.h>
 #include <asm/mwait.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
+#include <asm/fpu/sched.h>
+#include <asm/fpu/xstate.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 #include <asm/tlbflush.h>
@@ -93,9 +95,19 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 #ifdef CONFIG_VM86
 	dst->thread.vm86 = NULL;
 #endif
+	/* Drop the copied pointer to current's fpstate */
+	dst->thread.fpu.fpstate = NULL;
 
-	return fpu__copy(dst, src);
+	return 0;
+}
+
+#ifdef CONFIG_X86_64
+void arch_release_task_struct(struct task_struct *tsk)
+{
+	if (fpu_state_size_dynamic())
+		fpstate_free(&tsk->thread.fpu);
 }
+#endif
 
 /*
  * Free thread data structures etc..
@@ -162,13 +174,22 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
+	fpu_clone(p, clone_flags);
+
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {
+		p->thread.pkru = pkru_get_init_value();
 		memset(childregs, 0, sizeof(struct pt_regs));
 		kthread_frame_init(frame, sp, arg);
 		return 0;
 	}
 
+	/*
+	 * Clone current's PKRU value from hardware. tsk->thread.pkru
+	 * is only valid when scheduled out.
+	 */
+	p->thread.pkru = read_pkru();
+
 	frame->bx = 0;
 	*childregs = *current_pt_regs();
 	childregs->ax = 0;
@@ -189,6 +210,15 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	return ret;
 }
 
+static void pkru_flush_thread(void)
+{
+	/*
+	 * If PKRU is enabled the default PKRU value has to be loaded into
+	 * the hardware right here (similar to context switch).
+	 */
+	pkru_write_default();
+}
+
 void flush_thread(void)
 {
 	struct task_struct *tsk = current;
@@ -196,7 +226,8 @@ void flush_thread(void)
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
-	fpu__clear_all(&tsk->thread.fpu);
+	fpu_flush_thread();
+	pkru_flush_thread();
 }
 
 void disable_TSC(void)
@@ -944,13 +975,17 @@ unsigned long get_wchan(struct task_struct *p)
 }
 
 long do_arch_prctl_common(struct task_struct *task, int option,
-			  unsigned long cpuid_enabled)
+			  unsigned long arg2)
 {
 	switch (option) {
 	case ARCH_GET_CPUID:
 		return get_cpuid_mode();
 	case ARCH_SET_CPUID:
-		return set_cpuid_mode(task, cpuid_enabled);
+		return set_cpuid_mode(task, arg2);
+	case ARCH_GET_XCOMP_SUPP:
+	case ARCH_GET_XCOMP_PERM:
+	case ARCH_REQ_XCOMP_PERM:
+		return fpu_xstate_prctl(task, option, arg2);
 	}
 
 	return -EINVAL;
```
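The new ARCH_GET_XCOMP_SUPP/ARCH_GET_XCOMP_PERM/ARCH_REQ_XCOMP_PERM cases routed to fpu_xstate_prctl() above are the user-visible entry point for dynamically-enabled states: a process must request permission for XTILEDATA before executing tile instructions, otherwise the first touch of tile state is expected to fault. A hedged user-space sketch (the prctl and xfeature numbers are assumed from the upstream uapi headers; verify against asm/prctl.h on the target kernel):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#define ARCH_GET_XCOMP_PERM	0x1022	/* assumed values from upstream asm/prctl.h */
#define ARCH_REQ_XCOMP_PERM	0x1023
#define XFEATURE_XTILEDATA	18	/* AMX tile data state-component number */

int main(void)
{
	unsigned long permitted = 0;

	/* Ask the kernel to permit AMX tile state for this process. */
	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)) {
		perror("ARCH_REQ_XCOMP_PERM");
		return 1;
	}

	/* Read back the permitted xfeature bitmap. */
	if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &permitted))
		perror("ARCH_GET_XCOMP_PERM");

	printf("permitted xfeatures bitmap: 0x%lx\n", permitted);
	return 0;
}
```

Even after a successful ARCH_REQ_XCOMP_PERM, the larger per-task fpstate buffer is expected to be allocated lazily, on the thread's first actual use of tile state.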
(3 more file diffs collapsed by the Gitee viewer.)
```
@@ -23,7 +23,7 @@
 #include <asm/stacktrace.h>
 #include <asm/sev-es.h>
 #include <asm/insn-eval.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/xcr.h>
 #include <asm/processor.h>
 #include <asm/realmode.h>
 #include <asm/traps.h>
```
(1 more file diff collapsed by the Gitee viewer.)
```
@@ -69,7 +69,7 @@
 #include <asm/mwait.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
-#include <asm/fpu/internal.h>
+#include <asm/fpu/api.h>
 #include <asm/setup.h>
 #include <asm/uv/uv.h>
 #include <linux/mc146818rtc.h>
```
(23 more file diffs collapsed by the Gitee viewer.)