Unverified commit 0424a4c7, authored by openeuler-ci-bot, committed by Gitee

!50 arm64: support SME (Scalable Matrix Extension)

Merge Pull Request from: @jentlestea 
 
This patchset aims to support the SME feature on the arm64 architecture.

[Testing]
CONFIG_ARM64_SME=y

See https://developer.arm.com/documentation/ddi0616/latest for details.

Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> 
 
Link:https://gitee.com/openeuler/kernel/pulls/50 
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
...@@ -257,6 +257,40 @@ HWCAP2_RPRES ...@@ -257,6 +257,40 @@ HWCAP2_RPRES
Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001. Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
HWCAP2_SME
Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described
by Documentation/arm64/sme.rst.
HWCAP2_SME_I16I64
Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111.
HWCAP2_SME_F64F64
Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1.
HWCAP2_SME_I8I32
Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111.
HWCAP2_SME_F16F32
Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1.
HWCAP2_SME_B16F32
Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1.
HWCAP2_SME_F32F32
Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1.
HWCAP2_SME_FA64
Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1.
4. Unused AT_HWCAP bits 4. Unused AT_HWCAP bits
----------------------- -----------------------
......
...@@ -20,6 +20,7 @@ ARM64 Architecture ...@@ -20,6 +20,7 @@ ARM64 Architecture
perf perf
pointer-authentication pointer-authentication
silicon-errata silicon-errata
sme
sve sve
tagged-address-abi tagged-address-abi
tagged-pointers tagged-pointers
......
===================================================
Scalable Matrix Extension support for AArch64 Linux
===================================================
This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Matrix Extension (SME).
This is an outline of the most important features and issues only and not
intended to be exhaustive. It should be read in conjunction with the SVE
documentation in sve.rst which provides details on the Streaming SVE mode
included in SME.
This document does not aim to describe the SME architecture or programmer's
model. To aid understanding, a minimal description of relevant programmer's
model features for SME is included in Appendix A.
1. General
-----------
* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA
register state and TPIDR2_EL0 are tracked per thread.
* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector
AT_HWCAP2 entry. Presence of this flag implies the presence of the SME
instructions and registers, and the Linux-specific system interfaces
described in this document. SME is reported in /proc/cpuinfo as "sme". A
minimal detection sketch in C is given at the end of this section.
* Support for the execution of SME instructions in userspace can also be
detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
instruction, and checking that the value of the SME field is nonzero. [3]
It does not guarantee the presence of the system interfaces described in the
following sections: software that needs to verify that those interfaces are
present must check for HWCAP2_SME instead.
* There are a number of optional SME features; the presence of these is
reported through AT_HWCAP2 via:
HWCAP2_SME_I16I64
HWCAP2_SME_F64F64
HWCAP2_SME_I8I32
HWCAP2_SME_F16F32
HWCAP2_SME_B16F32
HWCAP2_SME_F32F32
HWCAP2_SME_FA64
This list may be extended over time as the SME architecture evolves.
These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1,
which userspace can read using an MRS instruction. See elf_hwcaps.txt and
cpu-feature-registers.txt for details.
* Debuggers should restrict themselves to interacting with the target via the
NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets. The recommended way
of detecting support for these regsets is to connect to a target process
first and then attempt a
ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
* Whenever ZA register values are exchanged in memory between userspace and
the kernel, the register value is encoded in memory as a series of horizontal
vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
used for SVE vectors.
* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified,
in which case it is set to 0.
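As an illustration of the HWCAP2_SME detection described above, the following
is a minimal sketch. It assumes a libc providing getauxval() and uapi headers
from an SME-capable kernel defining HWCAP2_SME and the optional HWCAP2_SME_*
bits; on older headers the constants would have to be defined locally::

  #include <stdio.h>
  #include <sys/auxv.h>
  #include <asm/hwcap.h>     /* HWCAP2_SME, HWCAP2_SME_* (assumed present) */

  int main(void)
  {
      unsigned long hwcap2 = getauxval(AT_HWCAP2);

      if (!(hwcap2 & HWCAP2_SME)) {
          printf("SME not supported\n");
          return 1;
      }

      printf("SME supported\n");
      if (hwcap2 & HWCAP2_SME_FA64)
          printf("full A64 instruction set available in streaming mode\n");
      return 0;
  }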
2. Vector lengths
------------------
SME defines a second vector length similar to the SVE vector length which
controls the size of the streaming mode SVE vectors and the ZA matrix array.
The ZA matrix is square with each side having as many bytes as a streaming
mode SVE vector.
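For example, on an implementation whose streaming mode vector length is
512 bits (64 bytes), each side of ZA is 64 bytes, so the ZA matrix holds
64 x 64 = 4096 bytes of state.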
3. Sharing of streaming and non-streaming mode SVE state
---------------------------------------------------------
It is implementation defined which (if any) parts of the SVE state are shared
between streaming and non-streaming modes. When switching between modes
via software interfaces such as ptrace, if no register content is provided as
part of the switch then no state will be assumed to be shared and everything
will be zeroed.
4. System call behaviour
-------------------------
* On syscall PSTATE.ZA is preserved; if PSTATE.ZA==1 then the contents of the
ZA matrix are preserved.
* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
as per the standard SVE ABI.
* Neither the SVE registers nor ZA are used to pass arguments to or receive
results from any syscall.
* On process creation (e.g. clone()) the newly created process will have
PSTATE.SM cleared.
* All other SME state of a thread, including the currently configured vector
length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector
length (if any), is preserved across all syscalls, subject to the specific
exceptions for execve() described in section 6.
5. Signal handling
-------------------
* Signal handlers are invoked with streaming mode and ZA disabled.
* A new signal frame record za_context encodes the ZA register contents on
signal delivery. [1]
* The signal frame record for ZA always contains basic metadata, in particular
the thread's vector length (in za_context.vl).
* The ZA matrix may or may not be included in the record, depending on
the value of PSTATE.ZA. The registers are present if and only if:
za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
in which case PSTATE.ZA == 1. A sketch of this check is given at the end of
this section.
* If matrix data is present, the remainder of the record has a vl-dependent
size and layout. Macros ZA_SIG_* are defined [1] to facilitate access to
them.
* The matrix is stored as a series of horizontal vectors in the same format as
is used for SVE vectors.
* If the ZA context is too big to fit in sigcontext.__reserved[], then extra
space is allocated on the stack, and an extra_context record is written in
__reserved[] referencing this space. za_context is then written in the
extra space. Refer to [1] for further details about this mechanism.
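As a sketch of locating the ZA record and applying the size check above,
assuming <asm/sigcontext.h> from an SME-capable kernel (which provides
za_context, ZA_MAGIC and the ZA_SIG_* macros per [1]); for brevity this does
not follow extra_context records::

  #include <stddef.h>
  #include <asm/sigcontext.h>

  /* Walk the __reserved[] area of the signal frame looking for ZA state;
   * a signal handler would pass uc->uc_mcontext.__reserved here. */
  static struct za_context *find_za(void *reserved)
  {
      struct _aarch64_ctx *head = reserved;

      while (head->magic) {
          if (head->magic == ZA_MAGIC)
              return (struct za_context *)head;
          head = (struct _aarch64_ctx *)((char *)head + head->size);
      }
      return NULL;    /* no za_context record in the frame */
  }

  /* True if the record also carries ZA matrix data (PSTATE.ZA was 1). */
  static int za_data_present(const struct za_context *za)
  {
      return za->head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za->vl));
  }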
5. Signal return
-----------------
When returning from a signal handler:
* If there is no za_context record in the signal frame, or if the record is
present but contains no register data as described in the previous section,
then ZA is disabled.
* If za_context is present in the signal frame and contains matrix data then
PSTATE.ZA is set to 1 and ZA is populated with the specified data.
* The vector length cannot be changed via signal return. If za_context.vl in
the signal frame does not match the current vector length, the signal return
attempt is treated as illegal, resulting in a forced SIGSEGV.
6. prctl extensions
--------------------
Some new prctl() calls are added to allow programs to manage the SME vector
length:
prctl(PR_SME_SET_VL, unsigned long arg)
Sets the vector length of the calling thread and related flags, where
arg == vl | flags. Other threads of the calling process are unaffected.
vl is the desired vector length, where sve_vl_valid(vl) must be true.
flags:
PR_SME_VL_INHERIT
Inherit the current vector length across execve(). Otherwise, the
vector length is reset to the system default at execve(). (See
Section 9.)
PR_SME_SET_VL_ONEXEC
Defer the requested vector length change until the next execve()
performed by this thread.
The effect is equivalent to implicit execution of the following
call immediately after the next execve() (if any) by the thread:
prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC)
This allows launching of a new program with a different vector
length, while avoiding runtime side effects in the caller.
Without PR_SME_SET_VL_ONEXEC, the requested change takes effect
immediately.
Return value: a nonnegative value on success, or a negative value on error:
EINVAL: SME not supported, invalid vector length requested, or
invalid flags.
On success:
* Either the calling thread's vector length or the deferred vector length
to be applied at the next execve() by the thread (dependent on whether
PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value
supported by the system that is less than or equal to vl. If vl ==
SVE_VL_MAX, the value set will be the largest value supported by the
system.
* Any previously outstanding deferred vector length change in the calling
thread is cancelled.
* The returned value describes the resulting configuration, encoded as for
PR_SME_GET_VL. The vector length reported in this value is the new
current vector length for this thread if PR_SME_SET_VL_ONEXEC was not
present in arg; otherwise, the reported vector length is the deferred
vector length that will be applied at the next execve() by the calling
thread.
* Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
unspecified, including both streaming and non-streaming SVE state.
Calling PR_SME_SET_VL with vl equal to the thread's current vector
length, or calling PR_SME_SET_VL with the PR_SME_SET_VL_ONEXEC flag,
does not constitute a change to the vector length for this purpose.
* Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
Calling PR_SME_SET_VL with vl equal to the thread's current vector
length, or calling PR_SME_SET_VL with the PR_SME_SET_VL_ONEXEC flag,
does not constitute a change to the vector length for this purpose.
prctl(PR_SME_GET_VL)
Gets the vector length of the calling thread.
The following flag may be OR-ed into the result:
PR_SME_VL_INHERIT
Vector length will be inherited across execve().
There is no way to determine whether there is an outstanding deferred
vector length change (which would only normally be the case between a
fork() or vfork() and the corresponding execve() in typical use).
To extract the vector length from the result, bitwise and it with
PR_SME_VL_LEN_MASK.
Return value: a nonnegative value on success, or a negative value on error:
EINVAL: SME not supported.
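As a usage illustration of the prctls above, the following minimal sketch
requests the largest supported streaming vector length and reads back the
resulting configuration. It assumes <sys/prctl.h> and <asm/sigcontext.h> from
an SME-aware kernel/libc that define the PR_SME_* constants and SVE_VL_MAX;
otherwise these would need to be defined locally::

  #include <stdio.h>
  #include <sys/prctl.h>       /* prctl(), PR_SME_* (assumed present) */
  #include <asm/sigcontext.h>  /* SVE_VL_MAX */

  int main(void)
  {
      int ret;

      /* Request the largest vector length supported by the system. */
      ret = prctl(PR_SME_SET_VL, SVE_VL_MAX);
      if (ret < 0) {
          perror("PR_SME_SET_VL");   /* e.g. EINVAL if SME is absent */
          return 1;
      }

      /* Read back the configuration; the VL is in the low bits. */
      ret = prctl(PR_SME_GET_VL);
      if (ret < 0) {
          perror("PR_SME_GET_VL");
          return 1;
      }

      printf("streaming vector length: %d bytes\n", ret & PR_SME_VL_LEN_MASK);
      if (ret & PR_SME_VL_INHERIT)
          printf("vector length will be inherited across execve()\n");
      return 0;
  }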
7. ptrace extensions
---------------------
* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE
state via PTRACE_GETREGSET and PTRACE_SETREGSET; this is documented in
sve.rst.
* A new regset NT_ARM_ZA is defined for access to ZA state via
PTRACE_GETREGSET and PTRACE_SETREGSET. A usage sketch is given at the end
of this section.
Refer to [2] for definitions.
The regset data starts with struct user_za_header, containing:
size
Size of the complete regset, in bytes.
This depends on vl and possibly on other things in the future.
If a call to PTRACE_GETREGSET requests less data than the value of
size, the caller can allocate a larger buffer and retry in order to
read the complete regset.
max_size
Maximum size in bytes that the regset can grow to for the target
thread. The regset won't grow bigger than this even if the target
thread changes its vector length etc.
vl
Target thread's current streaming vector length, in bytes.
max_vl
Maximum possible streaming vector length for the target thread.
flags
Zero or more of the following flags, which have the same
meaning and behaviour as the corresponding PR_SET_VL_* flags:
SME_PT_VL_INHERIT
SME_PT_VL_ONEXEC (SETREGSET only).
* The effects of changing the vector length and/or flags are equivalent to
those documented for PR_SME_SET_VL.
The caller must make a further GETREGSET call if it needs to know what VL is
actually set by SETREGSET, unless it is known in advance that the requested
VL is supported.
* The size and layout of the payload depends on the header fields. The
SME_PT_ZA_*() macros are provided to facilitate access to the data.
* For SETREGSET it is permissible to omit the payload, in which
case the vector length and flags are changed and PSTATE.ZA is set to 0
(along with any consequences of those changes). If a payload is provided
then PSTATE.ZA will be set to 1.
* For SETREGSET, if the requested VL is not supported, the effect will be the
same as if the payload were omitted, except that an EIO error is reported.
No attempt is made to translate the payload data to the correct layout
for the vector length actually set. It is up to the caller to translate the
payload layout for the actual VL and retry.
* The effect of writing a partial, incomplete payload is unspecified.
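The following sketch shows the header-then-retry pattern described above for
reading NT_ARM_ZA from a stopped tracee. It assumes <linux/elf.h> and
<asm/ptrace.h> from an SME-capable kernel, which provide NT_ARM_ZA and
struct user_za_header; error handling is minimal::

  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/ptrace.h>
  #include <sys/uio.h>
  #include <linux/elf.h>   /* NT_ARM_ZA (assumed present) */
  #include <asm/ptrace.h>  /* struct user_za_header */

  /* Read the complete NT_ARM_ZA regset of a stopped tracee.  Returns a
   * malloc()ed buffer of header.size bytes, or NULL on error. */
  static void *read_za_regset(pid_t pid)
  {
      struct user_za_header header;
      struct iovec iov = { .iov_base = &header, .iov_len = sizeof(header) };
      void *buf;

      /* First read only the header to learn the full regset size. */
      if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_ZA, &iov) < 0)
          return NULL;

      buf = malloc(header.size);
      if (!buf)
          return NULL;

      /* Retry with a buffer large enough for the complete regset; any
       * ZA payload (present only if PSTATE.ZA==1) follows the header. */
      iov.iov_base = buf;
      iov.iov_len = header.size;
      if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_ZA, &iov) < 0) {
          free(buf);
          return NULL;
      }
      return buf;
  }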
8. ELF coredump extensions
---------------------------
* NT_ARM_SSVE notes will be added to each coredump for
each thread of the dumped process. The contents will be equivalent to the
data that would have been read if a PTRACE_GETREGSET of the corresponding
type were executed for each thread when the coredump was generated.
* A NT_ARM_ZA note will be added to each coredump for each thread of the
dumped process. The contents will be equivalent to the data that would have
been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
when the coredump was generated.
9. System runtime configuration
--------------------------------
* To mitigate the ABI impact of expansion of the signal frame, a policy
mechanism is provided for administrators, distro maintainers and developers
to set the default vector length for userspace processes:
/proc/sys/abi/sme_default_vector_length
Writing the text representation of an integer to this file sets the system
default vector length to the specified value, unless the value is greater
than the maximum vector length supported by the system, in which case the
default vector length is set to that maximum.
The result can be determined by reopening the file and reading its
contents. A small sketch of doing this from C is given at the end of this
section.
At boot, the default vector length is initially set to 32 or the maximum
supported vector length, whichever is smaller and supported. This
determines the initial vector length of the init process (PID 1).
Reading this file returns the current system default vector length.
* At every execve() call, the new vector length of the new process is set to
the system default vector length, unless
* PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the
calling thread, or
* a deferred vector length change is pending, established via the
PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC).
* Modifying the system default vector length does not affect the vector length
of any existing process or thread that does not make an execve() call.
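As a small sketch of the interface above, assuming a kernel built with
CONFIG_ARM64_SME and sufficient privilege to write the file::

  #include <stdio.h>

  #define SME_DEFAULT_VL "/proc/sys/abi/sme_default_vector_length"

  int main(void)
  {
      FILE *f;
      int vl = -1;

      /* Request a 64 byte (512 bit) default; values above the maximum
       * supported vector length are clamped to that maximum. */
      f = fopen(SME_DEFAULT_VL, "w");
      if (!f)
          return 1;   /* no SME support, or insufficient privilege */
      fprintf(f, "64\n");
      fclose(f);

      /* Read back the default that was actually set. */
      f = fopen(SME_DEFAULT_VL, "r");
      if (!f)
          return 1;
      if (fscanf(f, "%d", &vl) != 1)
          vl = -1;
      fclose(f);

      printf("system default streaming vector length: %d bytes\n", vl);
      return 0;
  }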
Appendix A. SME programmer's model (informative)
=================================================
This section provides a minimal description of the additions made by SME to the
ARMv8-A programmer's model that are relevant to this document.
Note: This section is for information only and not intended to be complete or
to replace any architectural specification.
A.1. Registers
---------------
In A64 state, SME adds the following:
* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE
features is available. When supported, EL0 software may enter and leave
streaming mode at any time.
For best system performance it is strongly encouraged for software to enable
streaming mode only when it is actively being used.
* A new vector length controlling the size of ZA and the Z registers when in
streaming mode, separately to the vector length used for SVE when not in
streaming mode. There is no requirement that either the currently selected
vector length or the set of vector lengths supported for the two modes in
a given system have any relationship. The streaming mode vector length
is referred to as SVL.
* A new ZA matrix register. This is a square matrix of SVLxSVL bits. Most
operations on ZA require that streaming mode be enabled but ZA can be
enabled without streaming mode in order to load, save and retain data.
For best system performance it is strongly encouraged for software to enable
ZA only when it is actively being used.
* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
SMSTOP instructions or by access to the SVCR system register:
* PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid
data while if it is 0 then ZA can not be accessed. When PSTATE.ZA is
changed from 0 to 1 all bits in ZA are cleared.
* PSTATE.SM, if this is 1 then the PE is in streaming mode. When the value
of PSTATE.SM is changed then it is implementation defined if the subset
of the floating point register bits valid in both modes may be retained.
Any other bits will be cleared.
References
==========
[1] arch/arm64/include/uapi/asm/sigcontext.h
AArch64 Linux signal ABI definitions
[2] arch/arm64/include/uapi/asm/ptrace.h
AArch64 Linux ptrace ABI definitions
[3] Documentation/arm64/cpu-feature-registers.rst
...@@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com> ...@@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
Date: 4 August 2017 Date: 4 August 2017
This document outlines briefly the interface provided to userspace by Linux in This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE). order to support use of the ARM Scalable Vector Extension (SVE), including
interactions with Streaming SVE mode added by the Scalable Matrix Extension
(SME).
This is an outline of the most important features and issues only and not This is an outline of the most important features and issues only and not
intended to be exhaustive. intended to be exhaustive.
...@@ -23,6 +25,10 @@ model features for SVE is included in Appendix A. ...@@ -23,6 +25,10 @@ model features for SVE is included in Appendix A.
* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
tracked per-thread. tracked per-thread.
* In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present
in the system; when it is not supported and these interfaces are used to
access streaming mode, FFR is read and written as zero.
* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
AT_HWCAP entry. Presence of this flag implies the presence of the SVE AT_HWCAP entry. Presence of this flag implies the presence of the SVE
instructions and registers, and the Linux-specific system interfaces instructions and registers, and the Linux-specific system interfaces
...@@ -53,10 +59,19 @@ model features for SVE is included in Appendix A. ...@@ -53,10 +59,19 @@ model features for SVE is included in Appendix A.
which userspace can read using an MRS instruction. See elf_hwcaps.txt and which userspace can read using an MRS instruction. See elf_hwcaps.txt and
cpu-feature-registers.txt for details. cpu-feature-registers.txt for details.
* On hardware that supports the SME extensions, HWCAP2_SME will also be
reported in the AT_HWCAP2 aux vector entry. Among other things SME adds
streaming mode which provides a subset of the SVE feature set using a
separate SME vector length and the same Z/V registers. See sme.rst
for more details.
* Debuggers should restrict themselves to interacting with the target via the * Debuggers should restrict themselves to interacting with the target via the
NT_ARM_SVE regset. The recommended way of detecting support for this regset NT_ARM_SVE regset. The recommended way of detecting support for this regset
is to connect to a target process first and then attempt a is to connect to a target process first and then attempt a
ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is
present and streaming SVE mode is in use the FPSIMD subset of registers
will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
in the target.
* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
between userspace and the kernel, the register value is encoded in memory in between userspace and the kernel, the register value is encoded in memory in
...@@ -126,6 +141,11 @@ the SVE instruction set architecture. ...@@ -126,6 +141,11 @@ the SVE instruction set architecture.
are only present in fpsimd_context. For convenience, the content of V0..V31 are only present in fpsimd_context. For convenience, the content of V0..V31
is duplicated between sve_context and fpsimd_context. is duplicated between sve_context and fpsimd_context.
* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
if set indicates that the thread is in streaming mode and the vector length
and register data (if present) describe the streaming SVE data and vector
length.
* The signal frame record for SVE always contains basic metadata, in particular * The signal frame record for SVE always contains basic metadata, in particular
the thread's vector length (in sve_context.vl). the thread's vector length (in sve_context.vl).
...@@ -170,6 +190,11 @@ When returning from a signal handler: ...@@ -170,6 +190,11 @@ When returning from a signal handler:
the signal frame does not match the current vector length, the signal return the signal frame does not match the current vector length, the signal return
attempt is treated as illegal, resulting in a forced SIGSEGV. attempt is treated as illegal, resulting in a forced SIGSEGV.
* It is permitted to enter or leave streaming mode by setting or clearing
the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
when doing so sve_context.vl and any register data are appropriate for the
vector length in the new mode.
6. prctl extensions 6. prctl extensions
-------------------- --------------------
...@@ -255,7 +280,7 @@ prctl(PR_SVE_GET_VL) ...@@ -255,7 +280,7 @@ prctl(PR_SVE_GET_VL)
vector length change (which would only normally be the case between a vector length change (which would only normally be the case between a
fork() or vfork() and the corresponding execve() in typical use). fork() or vfork() and the corresponding execve() in typical use).
To extract the vector length from the result, and it with To extract the vector length from the result, bitwise and it with
PR_SVE_VL_LEN_MASK. PR_SVE_VL_LEN_MASK.
Return value: a nonnegative value on success, or a negative value on error: Return value: a nonnegative value on success, or a negative value on error:
...@@ -265,8 +290,14 @@ prctl(PR_SVE_GET_VL) ...@@ -265,8 +290,14 @@ prctl(PR_SVE_GET_VL)
7. ptrace extensions 7. ptrace extensions
--------------------- ---------------------
* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and * New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
PTRACE_SETREGSET. PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
streaming mode SVE registers and NT_ARM_SVE describes the
non-streaming mode SVE registers.
In this description a register set is referred to as being "live" when
the target is in the appropriate streaming or non-streaming mode and is
using data beyond the subset shared with the FPSIMD Vn registers.
Refer to [2] for definitions. Refer to [2] for definitions.
...@@ -297,7 +328,7 @@ The regset data starts with struct user_sve_header, containing: ...@@ -297,7 +328,7 @@ The regset data starts with struct user_sve_header, containing:
flags flags
either at most one of
SVE_PT_REGS_FPSIMD SVE_PT_REGS_FPSIMD
...@@ -331,6 +362,10 @@ The regset data starts with struct user_sve_header, containing: ...@@ -331,6 +362,10 @@ The regset data starts with struct user_sve_header, containing:
SVE_PT_VL_ONEXEC (SETREGSET only). SVE_PT_VL_ONEXEC (SETREGSET only).
If neither FPSIMD nor SVE flags are provided then no register
payload is available; this is only possible when SME is implemented.
* The effects of changing the vector length and/or flags are equivalent to * The effects of changing the vector length and/or flags are equivalent to
those documented for PR_SVE_SET_VL. those documented for PR_SVE_SET_VL.
...@@ -346,6 +381,13 @@ The regset data starts with struct user_sve_header, containing: ...@@ -346,6 +381,13 @@ The regset data starts with struct user_sve_header, containing:
case only the vector length and flags are changed (along with any case only the vector length and flags are changed (along with any
consequences of those changes). consequences of those changes).
* In systems supporting SME when in streaming mode a GETREGSET for
NT_ARM_SVE will return only the user_sve_header with no register data;
similarly a GETREGSET for NT_ARM_SSVE will not return any register data
when not in streaming mode.
* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD.
* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the * For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
requested VL is not supported, the effect will be the same as if the requested VL is not supported, the effect will be the same as if the
payload were omitted, except that an EIO error is reported. No payload were omitted, except that an EIO error is reported. No
...@@ -355,17 +397,25 @@ The regset data starts with struct user_sve_header, containing: ...@@ -355,17 +397,25 @@ The regset data starts with struct user_sve_header, containing:
unspecified. It is up to the caller to translate the payload layout unspecified. It is up to the caller to translate the payload layout
for the actual VL and retry. for the actual VL and retry.
* Where SME is implemented it is not possible to GETREGSET the register
state for normal SVE when in streaming mode, nor the streaming mode
register state when in normal mode, regardless of the implementation defined
behaviour of the hardware for sharing data between the two modes.
* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
if the target was not in streaming mode.
* The effect of writing a partial, incomplete payload is unspecified. * The effect of writing a partial, incomplete payload is unspecified.
8. ELF coredump extensions 8. ELF coredump extensions
--------------------------- ---------------------------
* A NT_ARM_SVE note will be added to each coredump for each thread of the * NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
dumped process. The contents will be equivalent to the data that would have each thread of the dumped process. The contents will be equivalent to the
been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread data that would have been read if a PTRACE_GETREGSET of the corresponding
when the coredump was generated. type were executed for each thread when the coredump was generated.
9. System runtime configuration 9. System runtime configuration
-------------------------------- --------------------------------
......
...@@ -1917,6 +1917,17 @@ config ARM64_SVE ...@@ -1917,6 +1917,17 @@ config ARM64_SVE
Thus, you will need to enable CONFIG_ARM64_VHE if you want to support Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
KVM in the same kernel image. KVM in the same kernel image.
config ARM64_SME
bool "ARM Scalable Matrix Extension support"
default y
depends on ARM64_SVE
help
The Scalable Matrix Extension (SME) is an extension to the AArch64
execution state which utilises a substantial subset of the SVE
instruction set, together with the addition of new architectural
register state capable of holding two dimensional matrix tiles to
enable various matrix operations.
config ARM64_MODULE_PLTS config ARM64_MODULE_PLTS
bool "Use PLTs to allow module memory to spill over into vmalloc area" bool "Use PLTs to allow module memory to spill over into vmalloc area"
depends on MODULES depends on MODULES
......
...@@ -32,6 +32,7 @@ struct cpuinfo_arm64 { ...@@ -32,6 +32,7 @@ struct cpuinfo_arm64 {
u64 reg_id_aa64pfr0; u64 reg_id_aa64pfr0;
u64 reg_id_aa64pfr1; u64 reg_id_aa64pfr1;
u64 reg_id_aa64zfr0; u64 reg_id_aa64zfr0;
u64 reg_id_aa64smfr0;
u32 reg_id_dfr0; u32 reg_id_dfr0;
u32 reg_id_dfr1; u32 reg_id_dfr1;
...@@ -58,6 +59,9 @@ struct cpuinfo_arm64 { ...@@ -58,6 +59,9 @@ struct cpuinfo_arm64 {
/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */ /* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */
u64 reg_zcr; u64 reg_zcr;
/* pseudo-SMCR for recording maximum SMCR_EL1 LEN value: */
u64 reg_smcr;
}; };
DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data); DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data);
......
...@@ -72,6 +72,8 @@ ...@@ -72,6 +72,8 @@
#define ARM64_HAS_ECV 64 #define ARM64_HAS_ECV 64
#define ARM64_HAS_EPAN 65 #define ARM64_HAS_EPAN 65
#define ARM64_SPECTRE_BHB 66 #define ARM64_SPECTRE_BHB 66
#define ARM64_SME 67
#define ARM64_SME_FA64 68
#define ARM64_NCAPS 80 #define ARM64_NCAPS 80
......
...@@ -335,6 +335,7 @@ struct arm64_cpu_capabilities { ...@@ -335,6 +335,7 @@ struct arm64_cpu_capabilities {
struct { /* Feature register checking */ struct { /* Feature register checking */
u32 sys_reg; u32 sys_reg;
u8 field_pos; u8 field_pos;
u8 field_width;
u8 min_field_value; u8 min_field_value;
u8 hwcap_type; u8 hwcap_type;
bool sign; bool sign;
...@@ -598,6 +599,13 @@ static inline bool id_aa64pfr0_sve(u64 pfr0) ...@@ -598,6 +599,13 @@ static inline bool id_aa64pfr0_sve(u64 pfr0)
return val > 0; return val > 0;
} }
static inline bool id_aa64pfr1_sme(u64 pfr1)
{
u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_SME_SHIFT);
return val > 0;
}
void __init setup_cpu_features(void); void __init setup_cpu_features(void);
void check_local_cpu_capabilities(void); void check_local_cpu_capabilities(void);
...@@ -717,6 +725,23 @@ static __always_inline bool system_supports_sve(void) ...@@ -717,6 +725,23 @@ static __always_inline bool system_supports_sve(void)
cpus_have_const_cap(ARM64_SVE); cpus_have_const_cap(ARM64_SVE);
} }
static __always_inline bool system_supports_sme(void)
{
return IS_ENABLED(CONFIG_ARM64_SME) &&
cpus_have_const_cap(ARM64_SME);
}
static __always_inline bool system_supports_fa64(void)
{
return IS_ENABLED(CONFIG_ARM64_SME) &&
cpus_have_const_cap(ARM64_SME_FA64);
}
static __always_inline bool system_supports_tpidr2(void)
{
return system_supports_sme();
}
static __always_inline bool system_supports_cnp(void) static __always_inline bool system_supports_cnp(void)
{ {
return IS_ENABLED(CONFIG_ARM64_CNP) && return IS_ENABLED(CONFIG_ARM64_CNP) &&
......
...@@ -37,7 +37,8 @@ ...@@ -37,7 +37,8 @@
#define ESR_ELx_EC_ERET (0x1a) /* EL2 only */ #define ESR_ELx_EC_ERET (0x1a) /* EL2 only */
/* Unallocated EC: 0x1B */ /* Unallocated EC: 0x1B */
#define ESR_ELx_EC_FPAC (0x1C) /* EL1 and above */ #define ESR_ELx_EC_FPAC (0x1C) /* EL1 and above */
/* Unallocated EC: 0x1D - 0x1E */ #define ESR_ELx_EC_SME (0x1D)
/* Unallocated EC: 0x1E */
#define ESR_ELx_EC_IMP_DEF (0x1f) /* EL3 only */ #define ESR_ELx_EC_IMP_DEF (0x1f) /* EL3 only */
#define ESR_ELx_EC_IABT_LOW (0x20) #define ESR_ELx_EC_IABT_LOW (0x20)
#define ESR_ELx_EC_IABT_CUR (0x21) #define ESR_ELx_EC_IABT_CUR (0x21)
...@@ -75,6 +76,7 @@ ...@@ -75,6 +76,7 @@
#define ESR_ELx_IL_SHIFT (25) #define ESR_ELx_IL_SHIFT (25)
#define ESR_ELx_IL (UL(1) << ESR_ELx_IL_SHIFT) #define ESR_ELx_IL (UL(1) << ESR_ELx_IL_SHIFT)
#define ESR_ELx_ISS_MASK (ESR_ELx_IL - 1) #define ESR_ELx_ISS_MASK (ESR_ELx_IL - 1)
#define ESR_ELx_ISS(esr) ((esr) & ESR_ELx_ISS_MASK)
/* ISS field definitions shared by different classes */ /* ISS field definitions shared by different classes */
#define ESR_ELx_WNR_SHIFT (6) #define ESR_ELx_WNR_SHIFT (6)
...@@ -326,6 +328,15 @@ ...@@ -326,6 +328,15 @@
#define ESR_ELx_CP15_32_ISS_SYS_CNTFRQ (ESR_ELx_CP15_32_ISS_SYS_VAL(0, 0, 14, 0) |\ #define ESR_ELx_CP15_32_ISS_SYS_CNTFRQ (ESR_ELx_CP15_32_ISS_SYS_VAL(0, 0, 14, 0) |\
ESR_ELx_CP15_32_ISS_DIR_READ) ESR_ELx_CP15_32_ISS_DIR_READ)
/*
* ISS values for SME traps
*/
#define ESR_ELx_SME_ISS_SME_DISABLED 0
#define ESR_ELx_SME_ISS_ILL 1
#define ESR_ELx_SME_ISS_SM_DISABLED 2
#define ESR_ELx_SME_ISS_ZA_DISABLED 3
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
#include <asm/types.h> #include <asm/types.h>
......
...@@ -58,6 +58,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr, ...@@ -58,6 +58,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr,
struct pt_regs *regs); struct pt_regs *regs);
void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs); void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs);
void do_sve_acc(unsigned int esr, struct pt_regs *regs); void do_sve_acc(unsigned int esr, struct pt_regs *regs);
void do_sme_acc(unsigned int esr, struct pt_regs *regs);
void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs); void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs);
void do_sysinstr(unsigned int esr, struct pt_regs *regs); void do_sysinstr(unsigned int esr, struct pt_regs *regs);
void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs); void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs);
......
...@@ -32,6 +32,18 @@ ...@@ -32,6 +32,18 @@
#define VFP_STATE_SIZE ((32 * 8) + 4) #define VFP_STATE_SIZE ((32 * 8) + 4)
#endif #endif
/*
* When we defined the maximum SVE vector length we defined the ABI so
* that the maximum vector length included all the reserved for future
* expansion bits in ZCR rather than those just currently defined by
* the architecture. While SME follows a similar pattern the fact that
* it includes a square matrix means that any allocations that attempt
* to cover the maximum potential vector length (such as happen with
* the regset used for ptrace) end up being extremely large. Define
* the much lower actual limit for use in such situations.
*/
#define SME_VQ_MAX 16
struct task_struct; struct task_struct;
extern void fpsimd_save_state(struct user_fpsimd_state *state); extern void fpsimd_save_state(struct user_fpsimd_state *state);
...@@ -47,13 +59,25 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state); ...@@ -47,13 +59,25 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
extern void fpsimd_bind_task_to_cpu(void); extern void fpsimd_bind_task_to_cpu(void);
extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state, extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
void *sve_state, unsigned int sve_vl); void *sve_state, unsigned int sve_vl,
void *za_state, unsigned int sme_vl,
u64 *svcr);
extern void fpsimd_flush_task_state(struct task_struct *target); extern void fpsimd_flush_task_state(struct task_struct *target);
extern void fpsimd_save_and_flush_cpu_state(void); extern void fpsimd_save_and_flush_cpu_state(void);
/* Maximum VL that SVE VL-agnostic software can transparently support */ static inline bool thread_sm_enabled(struct thread_struct *thread)
#define SVE_VL_ARCH_MAX 0x100 {
return system_supports_sme() && (thread->svcr & SVCR_SM_MASK);
}
static inline bool thread_za_enabled(struct thread_struct *thread)
{
return system_supports_sme() && (thread->svcr & SVCR_ZA_MASK);
}
/* Maximum VL that SVE/SME VL-agnostic software can transparently support */
#define VL_ARCH_MAX 0x100
/* Offset of FFR in the SVE register dump */ /* Offset of FFR in the SVE register dump */
static inline size_t sve_ffr_offset(int vl) static inline size_t sve_ffr_offset(int vl)
...@@ -63,25 +87,33 @@ static inline size_t sve_ffr_offset(int vl) ...@@ -63,25 +87,33 @@ static inline size_t sve_ffr_offset(int vl)
static inline void *sve_pffr(struct thread_struct *thread) static inline void *sve_pffr(struct thread_struct *thread)
{ {
return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl); unsigned int vl;
if (system_supports_sme() && thread_sm_enabled(thread))
vl = thread_get_sme_vl(thread);
else
vl = thread_get_sve_vl(thread);
return (char *)thread->sve_state + sve_ffr_offset(vl);
} }
extern void sve_save_state(void *state, u32 *pfpsr); extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
extern void sve_load_state(void const *state, u32 const *pfpsr, extern void sve_load_state(void const *state, u32 const *pfpsr,
unsigned long vq_minus_1); int restore_ffr);
extern void sve_flush_live(void); extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
unsigned long vq_minus_1);
extern unsigned int sve_get_vl(void); extern unsigned int sve_get_vl(void);
extern void sve_set_vq(unsigned long vq_minus_1);
extern void sme_set_vq(unsigned long vq_minus_1);
extern void za_save_state(void *state);
extern void za_load_state(void const *state);
struct arm64_cpu_capabilities; struct arm64_cpu_capabilities;
extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused); extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern void fa64_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern u64 read_zcr_features(void); extern u64 read_zcr_features(void);
extern u64 read_smcr_features(void);
extern int __ro_after_init sve_max_vl;
extern int __ro_after_init sve_max_virtualisable_vl;
extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
/* /*
* Helpers to translate bit indices in sve_vq_map to VQ values (and * Helpers to translate bit indices in sve_vq_map to VQ values (and
...@@ -98,11 +130,27 @@ static inline unsigned int __bit_to_vq(unsigned int bit) ...@@ -98,11 +130,27 @@ static inline unsigned int __bit_to_vq(unsigned int bit)
return SVE_VQ_MAX - bit; return SVE_VQ_MAX - bit;
} }
/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
static inline bool sve_vq_available(unsigned int vq) struct vl_info {
{ enum vec_type type;
return test_bit(__vq_to_bit(vq), sve_vq_map); const char *name; /* For display purposes */
}
/* Minimum supported vector length across all CPUs */
int min_vl;
/* Maximum supported vector length across all CPUs */
int max_vl;
int max_virtualisable_vl;
/*
* Set of available vector lengths,
* where length vq encoded as bit __vq_to_bit(vq):
*/
DECLARE_BITMAP(vq_map, SVE_VQ_MAX);
/* Set of vector lengths present on at least one cpu: */
DECLARE_BITMAP(vq_partial_map, SVE_VQ_MAX);
};
#ifdef CONFIG_ARM64_SVE #ifdef CONFIG_ARM64_SVE
...@@ -111,10 +159,11 @@ extern size_t sve_state_size(struct task_struct const *task); ...@@ -111,10 +159,11 @@ extern size_t sve_state_size(struct task_struct const *task);
extern void sve_alloc(struct task_struct *task); extern void sve_alloc(struct task_struct *task);
extern void fpsimd_release_task(struct task_struct *task); extern void fpsimd_release_task(struct task_struct *task);
extern void fpsimd_sync_to_sve(struct task_struct *task); extern void fpsimd_sync_to_sve(struct task_struct *task);
extern void fpsimd_force_sync_to_sve(struct task_struct *task);
extern void sve_sync_to_fpsimd(struct task_struct *task); extern void sve_sync_to_fpsimd(struct task_struct *task);
extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task); extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
extern int sve_set_vector_length(struct task_struct *task, extern int vec_set_vector_length(struct task_struct *task, enum vec_type type,
unsigned long vl, unsigned long flags); unsigned long vl, unsigned long flags);
extern int sve_set_current_vl(unsigned long arg); extern int sve_set_current_vl(unsigned long arg);
...@@ -130,15 +179,84 @@ static inline void sve_user_enable(void) ...@@ -130,15 +179,84 @@ static inline void sve_user_enable(void)
sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_ZEN_EL0EN); sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_ZEN_EL0EN);
} }
#define sve_cond_update_zcr_vq(val, reg) \
do { \
u64 __zcr = read_sysreg_s((reg)); \
u64 __new = __zcr & ~ZCR_ELx_LEN_MASK; \
__new |= (val) & ZCR_ELx_LEN_MASK; \
if (__zcr != __new) \
write_sysreg_s(__new, (reg)); \
} while (0)
/* /*
* Probing and setup functions. * Probing and setup functions.
* Calls to these functions must be serialised with one another. * Calls to these functions must be serialised with one another.
*/ */
extern void __init sve_init_vq_map(void); enum vec_type;
extern void sve_update_vq_map(void);
extern int sve_verify_vq_map(void); extern void __init vec_init_vq_map(enum vec_type type);
extern void vec_update_vq_map(enum vec_type type);
extern int vec_verify_vq_map(enum vec_type type);
extern void __init sve_setup(void); extern void __init sve_setup(void);
extern __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX];
static inline void write_vl(enum vec_type type, u64 val)
{
u64 tmp;
switch (type) {
#ifdef CONFIG_ARM64_SVE
case ARM64_VEC_SVE:
tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK;
write_sysreg_s(tmp | val, SYS_ZCR_EL1);
break;
#endif
#ifdef CONFIG_ARM64_SME
case ARM64_VEC_SME:
tmp = read_sysreg_s(SYS_SMCR_EL1) & ~SMCR_ELx_LEN_MASK;
write_sysreg_s(tmp | val, SYS_SMCR_EL1);
break;
#endif
default:
WARN_ON_ONCE(1);
break;
}
}
static inline int vec_max_vl(enum vec_type type)
{
return vl_info[type].max_vl;
}
static inline int vec_max_virtualisable_vl(enum vec_type type)
{
return vl_info[type].max_virtualisable_vl;
}
static inline int sve_max_vl(void)
{
return vec_max_vl(ARM64_VEC_SVE);
}
static inline int sve_max_virtualisable_vl(void)
{
return vec_max_virtualisable_vl(ARM64_VEC_SVE);
}
/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
static inline bool vq_available(enum vec_type type, unsigned int vq)
{
return test_bit(__vq_to_bit(vq), vl_info[type].vq_map);
}
static inline bool sve_vq_available(unsigned int vq)
{
return vq_available(ARM64_VEC_SVE, vq);
}
size_t sve_state_size(struct task_struct const *task);
#else /* ! CONFIG_ARM64_SVE */ #else /* ! CONFIG_ARM64_SVE */
static inline void sve_alloc(struct task_struct *task) { } static inline void sve_alloc(struct task_struct *task) { }
...@@ -156,16 +274,110 @@ static inline int sve_get_current_vl(void) ...@@ -156,16 +274,110 @@ static inline int sve_get_current_vl(void)
return -EINVAL; return -EINVAL;
} }
static inline int sve_max_vl(void)
{
return -EINVAL;
}
static inline bool sve_vq_available(unsigned int vq) { return false; }
static inline void sve_user_disable(void) { BUILD_BUG(); } static inline void sve_user_disable(void) { BUILD_BUG(); }
static inline void sve_user_enable(void) { BUILD_BUG(); } static inline void sve_user_enable(void) { BUILD_BUG(); }
static inline void sve_init_vq_map(void) { } #define sve_cond_update_zcr_vq(val, reg) do { } while (0)
static inline void sve_update_vq_map(void) { }
static inline int sve_verify_vq_map(void) { return 0; } static inline void vec_init_vq_map(enum vec_type t) { }
static inline void vec_update_vq_map(enum vec_type t) { }
static inline int vec_verify_vq_map(enum vec_type t) { return 0; }
static inline void sve_setup(void) { } static inline void sve_setup(void) { }
static inline size_t sve_state_size(struct task_struct const *task)
{
return 0;
}
#endif /* ! CONFIG_ARM64_SVE */ #endif /* ! CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
static inline void sme_user_disable(void)
{
sysreg_clear_set(cpacr_el1, CPACR_EL1_SMEN_EL0EN, 0);
}
static inline void sme_user_enable(void)
{
sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_SMEN_EL0EN);
}
static inline void sme_smstart_sm(void)
{
asm volatile(__msr_s(SYS_SVCR_SMSTART_SM_EL0, "xzr"));
}
static inline void sme_smstop_sm(void)
{
asm volatile(__msr_s(SYS_SVCR_SMSTOP_SM_EL0, "xzr"));
}
static inline void sme_smstop(void)
{
asm volatile(__msr_s(SYS_SVCR_SMSTOP_SMZA_EL0, "xzr"));
}
extern void __init sme_setup(void);
static inline int sme_max_vl(void)
{
return vec_max_vl(ARM64_VEC_SME);
}
static inline int sme_max_virtualisable_vl(void)
{
return vec_max_virtualisable_vl(ARM64_VEC_SME);
}
extern void sme_alloc(struct task_struct *task);
extern unsigned int sme_get_vl(void);
extern int sme_set_current_vl(unsigned long arg);
extern int sme_get_current_vl(void);
/*
* Return how many bytes of memory are required to store the full SME
* specific state (currently just ZA) for task, given task's currently
* configured vector length.
*/
static inline size_t za_state_size(struct task_struct const *task)
{
unsigned int vl = task_get_sme_vl(task);
return ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));
}
#else
static inline void sme_user_disable(void) { BUILD_BUG(); }
static inline void sme_user_enable(void) { BUILD_BUG(); }
static inline void sme_smstart_sm(void) { }
static inline void sme_smstop_sm(void) { }
static inline void sme_smstop(void) { }
static inline void sme_alloc(struct task_struct *task) { }
static inline void sme_setup(void) { }
static inline unsigned int sme_get_vl(void) { return 0; }
static inline int sme_max_vl(void) { return 0; }
static inline int sme_max_virtualisable_vl(void) { return 0; }
static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; }
static inline int sme_get_current_vl(void) { return -EINVAL; }
static inline size_t za_state_size(struct task_struct const *task)
{
return 0;
}
#endif /* ! CONFIG_ARM64_SME */
/* For use by EFI runtime services calls only */ /* For use by EFI runtime services calls only */
extern void __efi_fpsimd_begin(void); extern void __efi_fpsimd_begin(void);
extern void __efi_fpsimd_end(void); extern void __efi_fpsimd_end(void);
......
...@@ -6,6 +6,8 @@ ...@@ -6,6 +6,8 @@
* Author: Catalin Marinas <catalin.marinas@arm.com> * Author: Catalin Marinas <catalin.marinas@arm.com>
*/ */
#include <asm/assembler.h>
.macro fpsimd_save state, tmpnr .macro fpsimd_save state, tmpnr
stp q0, q1, [\state, #16 * 0] stp q0, q1, [\state, #16 * 0]
stp q2, q3, [\state, #16 * 2] stp q2, q3, [\state, #16 * 2]
...@@ -91,6 +93,12 @@ ...@@ -91,6 +93,12 @@
.endif .endif
.endm .endm
.macro _sme_check_wv v
.if (\v) < 12 || (\v) > 15
.error "Bad vector select register \v."
.endif
.endm
/* SVE instruction encodings for non-SVE-capable assemblers */ /* SVE instruction encodings for non-SVE-capable assemblers */
/* STR (vector): STR Z\nz, [X\nxbase, #\offset, MUL VL] */ /* STR (vector): STR Z\nz, [X\nxbase, #\offset, MUL VL] */
...@@ -171,6 +179,54 @@ ...@@ -171,6 +179,54 @@
| (\np) | (\np)
.endm .endm
/* SME instruction encodings for non-SME-capable assemblers */
/* (pre binutils 2.38/LLVM 13) */
/* RDSVL X\nx, #\imm */
.macro _sme_rdsvl nx, imm
_check_general_reg \nx
_check_num (\imm), -0x20, 0x1f
.inst 0x04bf5800 \
| (\nx) \
| (((\imm) & 0x3f) << 5)
.endm
/*
* STR (vector from ZA array):
* STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
*/
.macro _sme_str_zav nw, nxbase, offset=0
_sme_check_wv \nw
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0xe1200000 \
| (((\nw) & 3) << 13) \
| ((\nxbase) << 5) \
| ((\offset) & 7)
.endm
/*
* LDR (vector to ZA array):
* LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
*/
.macro _sme_ldr_zav nw, nxbase, offset=0
_sme_check_wv \nw
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0xe1000000 \
| (((\nw) & 3) << 13) \
| ((\nxbase) << 5) \
| ((\offset) & 7)
.endm
/*
* Zero the entire ZA array
* ZERO ZA
*/
.macro zero_za
.inst 0xc00800ff
.endm
.macro __for from:req, to:req .macro __for from:req, to:req
.if (\from) == (\to) .if (\from) == (\to)
_for__body %\from _for__body %\from
...@@ -205,36 +261,56 @@ ...@@ -205,36 +261,56 @@
921: 921:
.endm .endm
/* Update SMCR_EL1.LEN with the new VQ */
.macro sme_load_vq xvqminus1, xtmp, xtmp2
mrs_s \xtmp, SYS_SMCR_EL1
bic \xtmp2, \xtmp, SMCR_ELx_LEN_MASK
orr \xtmp2, \xtmp2, \xvqminus1
cmp \xtmp2, \xtmp
b.eq 921f
msr_s SYS_SMCR_EL1, \xtmp2 //self-synchronising
921:
.endm
/* Preserve the first 128-bits of Znz and zero the rest. */ /* Preserve the first 128-bits of Znz and zero the rest. */
.macro _sve_flush_z nz .macro _sve_flush_z nz
_sve_check_zreg \nz _sve_check_zreg \nz
mov v\nz\().16b, v\nz\().16b mov v\nz\().16b, v\nz\().16b
.endm .endm
.macro sve_flush .macro sve_flush_z
_for n, 0, 31, _sve_flush_z \n _for n, 0, 31, _sve_flush_z \n
.endm
.macro sve_flush_p
_for n, 0, 15, _sve_pfalse \n _for n, 0, 15, _sve_pfalse \n
.endm
.macro sve_flush_ffr
_sve_wrffr 0 _sve_wrffr 0
.endm .endm
.macro sve_save nxbase, xpfpsr, nxtmp .macro sve_save nxbase, xpfpsr, save_ffr, nxtmp
_for n, 0, 31, _sve_str_v \n, \nxbase, \n - 34 _for n, 0, 31, _sve_str_v \n, \nxbase, \n - 34
_for n, 0, 15, _sve_str_p \n, \nxbase, \n - 16 _for n, 0, 15, _sve_str_p \n, \nxbase, \n - 16
cbz \save_ffr, 921f
_sve_rdffr 0 _sve_rdffr 0
_sve_str_p 0, \nxbase _sve_str_p 0, \nxbase
_sve_ldr_p 0, \nxbase, -16 _sve_ldr_p 0, \nxbase, -16
b 922f
921:
str xzr, [x\nxbase] // Zero out FFR
922:
mrs x\nxtmp, fpsr mrs x\nxtmp, fpsr
str w\nxtmp, [\xpfpsr] str w\nxtmp, [\xpfpsr]
mrs x\nxtmp, fpcr mrs x\nxtmp, fpcr
str w\nxtmp, [\xpfpsr, #4] str w\nxtmp, [\xpfpsr, #4]
.endm .endm
.macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2 .macro sve_load nxbase, xpfpsr, restore_ffr, nxtmp
sve_load_vq \xvqminus1, x\nxtmp, \xtmp2
_for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34 _for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34
cbz \restore_ffr, 921f
_sve_ldr_p 0, \nxbase _sve_ldr_p 0, \nxbase
_sve_wrffr 0 _sve_wrffr 0
921:
_for n, 0, 15, _sve_ldr_p \n, \nxbase, \n - 16 _for n, 0, 15, _sve_ldr_p \n, \nxbase, \n - 16
ldr w\nxtmp, [\xpfpsr] ldr w\nxtmp, [\xpfpsr]
...@@ -242,3 +318,25 @@ ...@@ -242,3 +318,25 @@
ldr w\nxtmp, [\xpfpsr, #4] ldr w\nxtmp, [\xpfpsr, #4]
msr fpcr, x\nxtmp msr fpcr, x\nxtmp
.endm .endm
.macro sme_save_za nxbase, xvl, nw
mov w\nw, #0
423:
_sme_str_zav \nw, \nxbase
add x\nxbase, x\nxbase, \xvl
add x\nw, x\nw, #1
cmp \xvl, x\nw
bne 423b
.endm
.macro sme_load_za nxbase, xvl, nw
mov w\nw, #0
423:
_sme_ldr_zav \nw, \nxbase
add x\nxbase, x\nxbase, \xvl
add x\nw, x\nw, #1
cmp \xvl, x\nw
bne 423b
.endm
...@@ -108,6 +108,14 @@ ...@@ -108,6 +108,14 @@
#define KERNEL_HWCAP_ECV __khwcap2_feature(ECV) #define KERNEL_HWCAP_ECV __khwcap2_feature(ECV)
#define KERNEL_HWCAP_AFP __khwcap2_feature(AFP) #define KERNEL_HWCAP_AFP __khwcap2_feature(AFP)
#define KERNEL_HWCAP_RPRES __khwcap2_feature(RPRES) #define KERNEL_HWCAP_RPRES __khwcap2_feature(RPRES)
#define KERNEL_HWCAP_SME __khwcap2_feature(SME)
#define KERNEL_HWCAP_SME_I16I64 __khwcap2_feature(SME_I16I64)
#define KERNEL_HWCAP_SME_F64F64 __khwcap2_feature(SME_F64F64)
#define KERNEL_HWCAP_SME_I8I32 __khwcap2_feature(SME_I8I32)
#define KERNEL_HWCAP_SME_F16F32 __khwcap2_feature(SME_F16F32)
#define KERNEL_HWCAP_SME_B16F32 __khwcap2_feature(SME_B16F32)
#define KERNEL_HWCAP_SME_F32F32 __khwcap2_feature(SME_F32F32)
#define KERNEL_HWCAP_SME_FA64 __khwcap2_feature(SME_FA64)
/* /*
* This yields a mask that user programs can use to figure out what * This yields a mask that user programs can use to figure out what
......
...@@ -279,6 +279,7 @@ ...@@ -279,6 +279,7 @@
#define CPTR_EL2_TCPAC (1U << 31) #define CPTR_EL2_TCPAC (1U << 31)
#define CPTR_EL2_TAM (1 << 30) #define CPTR_EL2_TAM (1 << 30)
#define CPTR_EL2_TTA (1 << 20) #define CPTR_EL2_TTA (1 << 20)
#define CPTR_EL2_TSM (1 << 12)
#define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT) #define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT)
#define CPTR_EL2_TZ (1 << 8) #define CPTR_EL2_TZ (1 << 8)
#define CPTR_EL2_RES1 0x000032ff /* known RES1 bits in CPTR_EL2 */ #define CPTR_EL2_RES1 0x000032ff /* known RES1 bits in CPTR_EL2 */
......
...@@ -285,8 +285,11 @@ struct vcpu_reset_state { ...@@ -285,8 +285,11 @@ struct vcpu_reset_state {
struct kvm_vcpu_arch { struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt; struct kvm_cpu_context ctxt;
/* Guest floating point state */
void *sve_state; void *sve_state;
unsigned int sve_max_vl; unsigned int sve_max_vl;
u64 svcr;
/* Stage 2 paging state used by the hardware on next switch */ /* Stage 2 paging state used by the hardware on next switch */
struct kvm_s2_mmu *hw_mmu; struct kvm_s2_mmu *hw_mmu;
...@@ -394,8 +397,10 @@ struct kvm_vcpu_arch { ...@@ -394,8 +397,10 @@ struct kvm_vcpu_arch {
}; };
/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */ /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
#define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \ #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) + \
sve_ffr_offset((vcpu)->arch.sve_max_vl))) sve_ffr_offset((vcpu)->arch.sve_max_vl))
#define vcpu_sve_max_vq(vcpu) sve_vq_from_vl((vcpu)->arch.sve_max_vl)
#define vcpu_sve_state_size(vcpu) ({ \ #define vcpu_sve_state_size(vcpu) ({ \
size_t __size_ret; \ size_t __size_ret; \
...@@ -404,7 +409,7 @@ struct kvm_vcpu_arch { ...@@ -404,7 +409,7 @@ struct kvm_vcpu_arch {
if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) { \ if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) { \
__size_ret = 0; \ __size_ret = 0; \
} else { \ } else { \
__vcpu_vq = sve_vq_from_vl((vcpu)->arch.sve_max_vl); \ __vcpu_vq = vcpu_sve_max_vq(vcpu); \
__size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq); \ __size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq); \
} \ } \
\ \
...@@ -420,6 +425,7 @@ struct kvm_vcpu_arch { ...@@ -420,6 +425,7 @@ struct kvm_vcpu_arch {
#define KVM_ARM64_GUEST_HAS_SVE (1 << 5) /* SVE exposed to guest */ #define KVM_ARM64_GUEST_HAS_SVE (1 << 5) /* SVE exposed to guest */
#define KVM_ARM64_VCPU_SVE_FINALIZED (1 << 6) /* SVE config completed */ #define KVM_ARM64_VCPU_SVE_FINALIZED (1 << 6) /* SVE config completed */
#define KVM_ARM64_GUEST_HAS_PTRAUTH (1 << 7) /* PTRAUTH exposed to guest */ #define KVM_ARM64_GUEST_HAS_PTRAUTH (1 << 7) /* PTRAUTH exposed to guest */
#define KVM_ARM64_HOST_SME_ENABLED (1 << 16) /* SME enabled for EL0 */
#define vcpu_has_sve(vcpu) (system_supports_sve() && \ #define vcpu_has_sve(vcpu) (system_supports_sve() && \
((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE)) ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
......
...@@ -89,6 +89,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu); ...@@ -89,6 +89,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
void __fpsimd_save_state(struct user_fpsimd_state *fp_regs); void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs); void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
void __sve_save_state(void *sve_pffr, u32 *fpsr);
void __sve_restore_state(void *sve_pffr, u32 *fpsr);
#ifndef __KVM_NVHE_HYPERVISOR__ #ifndef __KVM_NVHE_HYPERVISOR__
void activate_traps_vhe_load(struct kvm_vcpu *vcpu); void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
......
...@@ -113,6 +113,12 @@ struct debug_info { ...@@ -113,6 +113,12 @@ struct debug_info {
#endif #endif
}; };
enum vec_type {
ARM64_VEC_SVE = 0,
ARM64_VEC_SME,
ARM64_VEC_MAX,
};
struct cpu_context { struct cpu_context {
unsigned long x19; unsigned long x19;
unsigned long x20; unsigned long x20;
...@@ -158,16 +164,74 @@ struct thread_struct { ...@@ -158,16 +164,74 @@ struct thread_struct {
u64 sctlr_tcf0; u64 sctlr_tcf0;
u64 gcr_user_incl; u64 gcr_user_incl;
#endif #endif
KABI_RESERVE(1) KABI_USE(1, unsigned int vl[ARM64_VEC_MAX])
KABI_RESERVE(2) KABI_USE(2, unsigned int vl_onexec[ARM64_VEC_MAX])
KABI_RESERVE(3) KABI_USE(3, u64 tpidr2_el0)
KABI_RESERVE(4) KABI_USE(4, u64 svcr)
KABI_RESERVE(5) KABI_USE(5, void *za_state) /* ZA register, if any */
KABI_RESERVE(6) KABI_RESERVE(6)
KABI_RESERVE(7) KABI_RESERVE(7)
KABI_RESERVE(8) KABI_RESERVE(8)
}; };
static inline unsigned int thread_get_vl(struct thread_struct *thread,
enum vec_type type)
{
return thread->vl[type];
}
static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
{
return thread_get_vl(thread, ARM64_VEC_SVE);
}
static inline unsigned int thread_get_sme_vl(struct thread_struct *thread)
{
return thread_get_vl(thread, ARM64_VEC_SME);
}
static inline unsigned int thread_get_cur_vl(struct thread_struct *thread)
{
if (system_supports_sme() && (thread->svcr & SVCR_SM_MASK))
return thread_get_sme_vl(thread);
else
return thread_get_sve_vl(thread);
}
unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
void task_set_vl(struct task_struct *task, enum vec_type type,
unsigned long vl);
void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
unsigned long vl);
unsigned int task_get_vl_onexec(const struct task_struct *task,
enum vec_type type);
static inline unsigned int task_get_sve_vl(const struct task_struct *task)
{
return task_get_vl(task, ARM64_VEC_SVE);
}
static inline unsigned int task_get_sme_vl(const struct task_struct *task)
{
return task_get_vl(task, ARM64_VEC_SME);
}
static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
{
task_set_vl(task, ARM64_VEC_SVE, vl);
}
static inline unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
{
return task_get_vl_onexec(task, ARM64_VEC_SVE);
}
static inline void task_set_sve_vl_onexec(struct task_struct *task,
unsigned long vl)
{
task_set_vl_onexec(task, ARM64_VEC_SVE, vl);
}
static inline void arch_thread_struct_whitelist(unsigned long *offset, static inline void arch_thread_struct_whitelist(unsigned long *offset,
unsigned long *size) unsigned long *size)
{ {
...@@ -306,9 +370,11 @@ extern void __init minsigstksz_setup(void); ...@@ -306,9 +370,11 @@ extern void __init minsigstksz_setup(void);
*/ */
#include <asm/fpsimd.h> #include <asm/fpsimd.h>
/* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */ /* Userspace interface for PR_S[MV]E_{SET,GET}_VL prctl()s: */
#define SVE_SET_VL(arg) sve_set_current_vl(arg) #define SVE_SET_VL(arg) sve_set_current_vl(arg)
#define SVE_GET_VL() sve_get_current_vl() #define SVE_GET_VL() sve_get_current_vl()
#define SME_SET_VL(arg) sme_set_current_vl(arg)
#define SME_GET_VL() sme_get_current_vl()
/* PR_PAC_RESET_KEYS prctl */ /* PR_PAC_RESET_KEYS prctl */
#define PAC_RESET_KEYS(tsk, arg) ptrauth_prctl_reset_keys(tsk, arg) #define PAC_RESET_KEYS(tsk, arg) ptrauth_prctl_reset_keys(tsk, arg)
......
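A note on the new prctl hooks above: userspace selects the SME (streaming) vector length the same way it already does for SVE. The sketch below is illustrative only; it assumes the PR_SME_SET_VL/PR_SME_GET_VL constants come from the <linux/prctl.h> change made elsewhere in this series, and that the return value encodes the vector length in its low 16 bits, matching the SVE prctls.

#include <stdio.h>
#include <sys/prctl.h>
#include <linux/prctl.h>

int main(void)
{
	/* Request a 256-bit streaming vector length; the kernel may clamp it. */
	int ret = prctl(PR_SME_SET_VL, 256 / 8);

	if (ret < 0) {
		perror("PR_SME_SET_VL");
		return 1;
	}
	printf("streaming mode VL is now %d bytes\n", ret & 0xffff);

	ret = prctl(PR_SME_GET_VL);
	printf("PR_SME_GET_VL reports %d bytes\n", ret & 0xffff);
	return 0;
}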
...@@ -37,6 +37,7 @@ struct rt_sigframe_user_layout { ...@@ -37,6 +37,7 @@ struct rt_sigframe_user_layout {
unsigned long fpsimd_offset; unsigned long fpsimd_offset;
unsigned long esr_offset; unsigned long esr_offset;
unsigned long sve_offset; unsigned long sve_offset;
unsigned long za_offset;
unsigned long extra_offset; unsigned long extra_offset;
unsigned long end_offset; unsigned long end_offset;
}; };
...@@ -44,6 +45,7 @@ struct rt_sigframe_user_layout { ...@@ -44,6 +45,7 @@ struct rt_sigframe_user_layout {
struct user_ctxs { struct user_ctxs {
struct fpsimd_context __user *fpsimd; struct fpsimd_context __user *fpsimd;
struct sve_context __user *sve; struct sve_context __user *sve;
struct za_context __user *za;
}; };
struct frame_record { struct frame_record {
...@@ -129,6 +131,7 @@ static int get_sigframe(struct rt_sigframe_user_layout *user, ...@@ -129,6 +131,7 @@ static int get_sigframe(struct rt_sigframe_user_layout *user,
return 0; return 0;
} }
extern int restore_za_context(struct user_ctxs *user);
static int restore_sigframe(struct pt_regs *regs, static int restore_sigframe(struct pt_regs *regs,
struct rt_sigframe __user *sf) struct rt_sigframe __user *sf)
{ {
...@@ -170,9 +173,13 @@ static int restore_sigframe(struct pt_regs *regs, ...@@ -170,9 +173,13 @@ static int restore_sigframe(struct pt_regs *regs,
} }
} }
if (err == 0 && system_supports_sme() && user.za)
err = restore_za_context(&user);
return err; return err;
} }
extern int preserve_za_context(struct za_context __user *ctx);
static int setup_sigframe(struct rt_sigframe_user_layout *user, static int setup_sigframe(struct rt_sigframe_user_layout *user,
struct pt_regs *regs, sigset_t *set) struct pt_regs *regs, sigset_t *set)
{ {
...@@ -212,13 +219,21 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, ...@@ -212,13 +219,21 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
&esr_ctx->esr, err); &esr_ctx->esr, err);
} }
/* Scalable Vector Extension state, if present */ /* Scalable Vector Extension state (including streaming), if present */
if (system_supports_sve() && err == 0 && user->sve_offset) { if ((system_supports_sve() || system_supports_sme()) &&
err == 0 && user->sve_offset) {
struct sve_context __user *sve_ctx = struct sve_context __user *sve_ctx =
apply_user_offset(user, user->sve_offset); apply_user_offset(user, user->sve_offset);
err |= preserve_sve_context(sve_ctx); err |= preserve_sve_context(sve_ctx);
} }
/* ZA state if present */
if (system_supports_sme() && err == 0 && user->za_offset) {
struct za_context __user *za_ctx =
apply_user_offset(user, user->za_offset);
err |= preserve_za_context(za_ctx);
}
if (err == 0 && user->extra_offset) if (err == 0 && user->extra_offset)
setup_extra_context((char __user *)user->sigframe, user->size, setup_extra_context((char __user *)user->sigframe, user->size,
(char __user *)apply_user_offset(user, (char __user *)apply_user_offset(user,
......
...@@ -115,6 +115,10 @@ ...@@ -115,6 +115,10 @@
* System registers, organised loosely by encoding but grouped together * System registers, organised loosely by encoding but grouped together
* where the architected name contains an index. e.g. ID_MMFR<n>_EL1. * where the architected name contains an index. e.g. ID_MMFR<n>_EL1.
*/ */
#define SYS_SVCR_SMSTOP_SM_EL0 sys_reg(0, 3, 4, 2, 3)
#define SYS_SVCR_SMSTART_SM_EL0 sys_reg(0, 3, 4, 3, 3)
#define SYS_SVCR_SMSTOP_SMZA_EL0 sys_reg(0, 3, 4, 6, 3)
#define SYS_OSDTRRX_EL1 sys_reg(2, 0, 0, 0, 2) #define SYS_OSDTRRX_EL1 sys_reg(2, 0, 0, 0, 2)
#define SYS_MDCCINT_EL1 sys_reg(2, 0, 0, 2, 0) #define SYS_MDCCINT_EL1 sys_reg(2, 0, 0, 2, 0)
#define SYS_MDSCR_EL1 sys_reg(2, 0, 0, 2, 2) #define SYS_MDSCR_EL1 sys_reg(2, 0, 0, 2, 2)
...@@ -170,6 +174,7 @@ ...@@ -170,6 +174,7 @@
#define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0) #define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0)
#define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1) #define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1)
#define SYS_ID_AA64ZFR0_EL1 sys_reg(3, 0, 0, 4, 4) #define SYS_ID_AA64ZFR0_EL1 sys_reg(3, 0, 0, 4, 4)
#define SYS_ID_AA64SMFR0_EL1 sys_reg(3, 0, 0, 4, 5)
#define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0) #define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0)
#define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1) #define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1)
...@@ -192,6 +197,8 @@ ...@@ -192,6 +197,8 @@
#define SYS_GCR_EL1 sys_reg(3, 0, 1, 0, 6) #define SYS_GCR_EL1 sys_reg(3, 0, 1, 0, 6)
#define SYS_ZCR_EL1 sys_reg(3, 0, 1, 2, 0) #define SYS_ZCR_EL1 sys_reg(3, 0, 1, 2, 0)
#define SYS_SMPRI_EL1 sys_reg(3, 0, 1, 2, 4)
#define SYS_SMCR_EL1 sys_reg(3, 0, 1, 2, 6)
#define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0) #define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0)
#define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1) #define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1)
...@@ -333,6 +340,8 @@ ...@@ -333,6 +340,8 @@
/*** End of Statistical Profiling Extension ***/ /*** End of Statistical Profiling Extension ***/
#define SMPRI_EL1_PRIORITY_MASK 0xf
#define SYS_PMINTENSET_EL1 sys_reg(3, 0, 9, 14, 1) #define SYS_PMINTENSET_EL1 sys_reg(3, 0, 9, 14, 1)
#define SYS_PMINTENCLR_EL1 sys_reg(3, 0, 9, 14, 2) #define SYS_PMINTENCLR_EL1 sys_reg(3, 0, 9, 14, 2)
...@@ -388,8 +397,13 @@ ...@@ -388,8 +397,13 @@
#define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0) #define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0)
#define SYS_CLIDR_EL1 sys_reg(3, 1, 0, 0, 1) #define SYS_CLIDR_EL1 sys_reg(3, 1, 0, 0, 1)
#define SYS_GMID_EL1 sys_reg(3, 1, 0, 0, 4) #define SYS_GMID_EL1 sys_reg(3, 1, 0, 0, 4)
#define SYS_SMIDR_EL1 sys_reg(3, 1, 0, 0, 6)
#define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7) #define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
#define SMIDR_EL1_IMPLEMENTER_SHIFT 24
#define SMIDR_EL1_SMPS_SHIFT 15
#define SMIDR_EL1_AFFINITY_SHIFT 0
#define SYS_CSSELR_EL1 sys_reg(3, 2, 0, 0, 0) #define SYS_CSSELR_EL1 sys_reg(3, 2, 0, 0, 0)
#define SYS_CTR_EL0 sys_reg(3, 3, 0, 0, 1) #define SYS_CTR_EL0 sys_reg(3, 3, 0, 0, 1)
...@@ -398,6 +412,10 @@ ...@@ -398,6 +412,10 @@
#define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0) #define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0)
#define SYS_RNDRRS_EL0 sys_reg(3, 3, 2, 4, 1) #define SYS_RNDRRS_EL0 sys_reg(3, 3, 2, 4, 1)
#define SYS_SVCR sys_reg(3, 3, 4, 2, 2)
#define SVCR_ZA_MASK 2
#define SVCR_SM_MASK 1
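For reference, SVCR is also readable directly from EL0 once SME is present, so a thread can check whether it is currently in streaming mode or has ZA enabled without a syscall. A minimal, hedged sketch follows; the S3_3_C4_C2_2 encoding matches the sys_reg(3, 3, 4, 2, 2) definition above and avoids needing an SME-aware assembler.

#include <stdio.h>

/* Read SVCR via its raw encoding; only valid on a CPU with SME. */
static inline unsigned long read_svcr(void)
{
	unsigned long val;

	asm volatile("mrs %0, S3_3_C4_C2_2" : "=r" (val));	/* SVCR */
	return val;
}

int main(void)
{
	unsigned long svcr = read_svcr();

	/* Bit 0 is SVCR_SM_MASK, bit 1 is SVCR_ZA_MASK, as defined above. */
	printf("SM=%lu ZA=%lu\n", svcr & 1, (svcr >> 1) & 1);
	return 0;
}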
#define SYS_PMCR_EL0 sys_reg(3, 3, 9, 12, 0) #define SYS_PMCR_EL0 sys_reg(3, 3, 9, 12, 0)
#define SYS_PMCNTENSET_EL0 sys_reg(3, 3, 9, 12, 1) #define SYS_PMCNTENSET_EL0 sys_reg(3, 3, 9, 12, 1)
#define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2) #define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2)
...@@ -414,6 +432,7 @@ ...@@ -414,6 +432,7 @@
#define SYS_TPIDR_EL0 sys_reg(3, 3, 13, 0, 2) #define SYS_TPIDR_EL0 sys_reg(3, 3, 13, 0, 2)
#define SYS_TPIDRRO_EL0 sys_reg(3, 3, 13, 0, 3) #define SYS_TPIDRRO_EL0 sys_reg(3, 3, 13, 0, 3)
#define SYS_TPIDR2_EL0 sys_reg(3, 3, 13, 0, 5)
#define SYS_SCXTNUM_EL0 sys_reg(3, 3, 13, 0, 7) #define SYS_SCXTNUM_EL0 sys_reg(3, 3, 13, 0, 7)
...@@ -477,8 +496,17 @@ ...@@ -477,8 +496,17 @@
#define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7) #define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7)
#define SYS_HFGRTR_EL2 sys_reg(3, 4, 1, 1, 4)
#define SYS_HFGWTR_EL2 sys_reg(3, 4, 1, 1, 5)
#define SYS_HFGITR_EL2 sys_reg(3, 4, 1, 1, 6)
#define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0) #define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0)
#define SYS_HCRX_EL2 sys_reg(3, 4, 1, 2, 2)
#define SYS_SMPRIMAP_EL2 sys_reg(3, 4, 1, 2, 5)
#define SYS_SMCR_EL2 sys_reg(3, 4, 1, 2, 6)
#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0) #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
#define SYS_HDFGRTR_EL2 sys_reg(3, 4, 3, 1, 4)
#define SYS_HDFGWTR_EL2 sys_reg(3, 4, 3, 1, 5)
#define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6)
#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0) #define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
#define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1) #define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1)
#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1) #define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
...@@ -534,6 +562,7 @@ ...@@ -534,6 +562,7 @@
#define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0) #define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
#define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2) #define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2)
#define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0) #define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0)
#define SYS_SMCR_EL12 sys_reg(3, 5, 1, 2, 6)
#define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0) #define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0)
#define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1) #define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1)
#define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2) #define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2)
...@@ -557,6 +586,7 @@ ...@@ -557,6 +586,7 @@
#define SYS_CNTV_CVAL_EL02 sys_reg(3, 5, 14, 3, 2) #define SYS_CNTV_CVAL_EL02 sys_reg(3, 5, 14, 3, 2)
/* Common SCTLR_ELx flags. */ /* Common SCTLR_ELx flags. */
#define SCTLR_ELx_ENTP2 (BIT(60))
#define SCTLR_ELx_DSSBS (BIT(44)) #define SCTLR_ELx_DSSBS (BIT(44))
#define SCTLR_ELx_ATA (BIT(43)) #define SCTLR_ELx_ATA (BIT(43))
...@@ -752,6 +782,7 @@ ...@@ -752,6 +782,7 @@
#define ID_AA64PFR0_EL0_32BIT_64BIT 0x2 #define ID_AA64PFR0_EL0_32BIT_64BIT 0x2
/* id_aa64pfr1 */ /* id_aa64pfr1 */
#define ID_AA64PFR1_SME_SHIFT 24
#define ID_AA64PFR1_MPAMFRAC_SHIFT 16 #define ID_AA64PFR1_MPAMFRAC_SHIFT 16
#define ID_AA64PFR1_RASFRAC_SHIFT 12 #define ID_AA64PFR1_RASFRAC_SHIFT 12
#define ID_AA64PFR1_MTE_SHIFT 8 #define ID_AA64PFR1_MTE_SHIFT 8
...@@ -762,6 +793,7 @@ ...@@ -762,6 +793,7 @@
#define ID_AA64PFR1_SSBS_PSTATE_ONLY 1 #define ID_AA64PFR1_SSBS_PSTATE_ONLY 1
#define ID_AA64PFR1_SSBS_PSTATE_INSNS 2 #define ID_AA64PFR1_SSBS_PSTATE_INSNS 2
#define ID_AA64PFR1_BT_BTI 0x1 #define ID_AA64PFR1_BT_BTI 0x1
#define ID_AA64PFR1_SME 1
#define ID_AA64PFR1_MTE_NI 0x0 #define ID_AA64PFR1_MTE_NI 0x0
#define ID_AA64PFR1_MTE_EL0 0x1 #define ID_AA64PFR1_MTE_EL0 0x1
...@@ -789,6 +821,23 @@ ...@@ -789,6 +821,23 @@
#define ID_AA64ZFR0_AES_PMULL 0x2 #define ID_AA64ZFR0_AES_PMULL 0x2
#define ID_AA64ZFR0_SVEVER_SVE2 0x1 #define ID_AA64ZFR0_SVEVER_SVE2 0x1
/* id_aa64smfr0 */
#define ID_AA64SMFR0_FA64_SHIFT 63
#define ID_AA64SMFR0_I16I64_SHIFT 52
#define ID_AA64SMFR0_F64F64_SHIFT 48
#define ID_AA64SMFR0_I8I32_SHIFT 36
#define ID_AA64SMFR0_F16F32_SHIFT 35
#define ID_AA64SMFR0_B16F32_SHIFT 34
#define ID_AA64SMFR0_F32F32_SHIFT 32
#define ID_AA64SMFR0_FA64 0x1
#define ID_AA64SMFR0_I16I64 0xf
#define ID_AA64SMFR0_F64F64 0x1
#define ID_AA64SMFR0_I8I32 0xf
#define ID_AA64SMFR0_F16F32 0x1
#define ID_AA64SMFR0_B16F32 0x1
#define ID_AA64SMFR0_F32F32 0x1
/* id_aa64mmfr0 */ /* id_aa64mmfr0 */
#define ID_AA64MMFR0_ECV_SHIFT 60 #define ID_AA64MMFR0_ECV_SHIFT 60
#define ID_AA64MMFR0_FGT_SHIFT 56 #define ID_AA64MMFR0_FGT_SHIFT 56
...@@ -822,6 +871,7 @@ ...@@ -822,6 +871,7 @@
/* id_aa64mmfr1 */ /* id_aa64mmfr1 */
#define ID_AA64MMFR1_ECBHB_SHIFT 60 #define ID_AA64MMFR1_ECBHB_SHIFT 60
#define ID_AA64MMFR1_HCX_SHIFT 40
#define ID_AA64MMFR1_AFP_SHIFT 44 #define ID_AA64MMFR1_AFP_SHIFT 44
#define ID_AA64MMFR1_ETS_SHIFT 36 #define ID_AA64MMFR1_ETS_SHIFT 36
#define ID_AA64MMFR1_TWED_SHIFT 32 #define ID_AA64MMFR1_TWED_SHIFT 32
...@@ -993,14 +1043,19 @@ ...@@ -993,14 +1043,19 @@
#define DCZID_DZP_SHIFT 4 #define DCZID_DZP_SHIFT 4
#define DCZID_BS_SHIFT 0 #define DCZID_BS_SHIFT 0
/*
* The ZCR_ELx_LEN_* definitions intentionally include bits [8:4] which
* are reserved by the SVE architecture for future expansion of the LEN
* field, with compatible semantics.
*/
#define ZCR_ELx_LEN_SHIFT 0 #define ZCR_ELx_LEN_SHIFT 0
#define ZCR_ELx_LEN_SIZE 9 #define ZCR_ELx_LEN_WIDTH 4
#define ZCR_ELx_LEN_MASK 0x1ff #define ZCR_ELx_LEN_MASK 0xf
#define SMCR_ELx_FA64_SHIFT 31
#define SMCR_ELx_FA64_MASK (1 << SMCR_ELx_FA64_SHIFT)
#define SMCR_ELx_LEN_SHIFT 0
#define SMCR_ELx_LEN_WIDTH 4
#define SMCR_ELx_LEN_MASK 0xf
#define CPACR_EL1_SMEN_EL1EN (BIT(24)) /* enable EL1 access */
#define CPACR_EL1_SMEN_EL0EN (BIT(25)) /* enable EL0 access, if EL1EN set */
#define CPACR_EL1_ZEN_EL1EN (BIT(16)) /* enable EL1 access */ #define CPACR_EL1_ZEN_EL1EN (BIT(16)) /* enable EL1 access */
#define CPACR_EL1_ZEN_EL0EN (BIT(17)) /* enable EL0 access, if EL1EN set */ #define CPACR_EL1_ZEN_EL0EN (BIT(17)) /* enable EL0 access, if EL1EN set */
...@@ -1029,6 +1084,9 @@ ...@@ -1029,6 +1084,9 @@
#define SYS_TFSR_EL1_TF0 (UL(1) << SYS_TFSR_EL1_TF0_SHIFT) #define SYS_TFSR_EL1_TF0 (UL(1) << SYS_TFSR_EL1_TF0_SHIFT)
#define SYS_TFSR_EL1_TF1 (UL(1) << SYS_TFSR_EL1_TF1_SHIFT) #define SYS_TFSR_EL1_TF1 (UL(1) << SYS_TFSR_EL1_TF1_SHIFT)
/* HCRX_EL2 definitions */
#define HCRX_EL2_SMPME_MASK (1 << 5)
/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */ /* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
#define SYS_MPIDR_SAFE_VAL (BIT(31)) #define SYS_MPIDR_SAFE_VAL (BIT(31))
...@@ -1150,4 +1208,10 @@ ...@@ -1150,4 +1208,10 @@
#endif #endif
/* HFG[WR]TR_EL2 bit definitions */
#define HFGxTR_EL2_nTPIDR2_EL0_SHIFT 55
#define HFGxTR_EL2_nTPIDR2_EL0_MASK BIT_MASK(HFGxTR_EL2_nTPIDR2_EL0_SHIFT)
#define HFGxTR_EL2_nSMPRI_EL1_SHIFT 54
#define HFGxTR_EL2_nSMPRI_EL1_MASK BIT_MASK(HFGxTR_EL2_nSMPRI_EL1_SHIFT)
#endif /* __ASM_SYSREG_H */ #endif /* __ASM_SYSREG_H */
...@@ -81,11 +81,13 @@ void arch_release_task_struct(struct task_struct *tsk); ...@@ -81,11 +81,13 @@ void arch_release_task_struct(struct task_struct *tsk);
#define TIF_SINGLESTEP 21 #define TIF_SINGLESTEP 21
#define TIF_32BIT 22 /* AARCH32 process */ #define TIF_32BIT 22 /* AARCH32 process */
#define TIF_SVE 23 /* Scalable Vector Extension in use */ #define TIF_SVE 23 /* Scalable Vector Extension in use */
#define TIF_SVE_VL_INHERIT 24 /* Inherit sve_vl_onexec across exec */ #define TIF_SVE_VL_INHERIT 24 /* Inherit SVE vl_onexec across exec */
#define TIF_SSBD 25 /* Wants SSB mitigation */ #define TIF_SSBD 25 /* Wants SSB mitigation */
#define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */ #define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */
#define TIF_32BIT_AARCH64 27 /* 32 bit process on AArch64(ILP32) */ #define TIF_32BIT_AARCH64 27 /* 32 bit process on AArch64(ILP32) */
#define TIF_PATCH_PENDING 28 /* pending live patching update */ #define TIF_PATCH_PENDING 28 /* pending live patching update */
#define TIF_SME 29 /* SME in use */
#define TIF_SME_VL_INHERIT 30 /* Inherit SME vl_onexec across exec */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
......
...@@ -78,5 +78,13 @@ ...@@ -78,5 +78,13 @@
#define HWCAP2_ECV (1 << 19) #define HWCAP2_ECV (1 << 19)
#define HWCAP2_AFP (1 << 20) #define HWCAP2_AFP (1 << 20)
#define HWCAP2_RPRES (1 << 21) #define HWCAP2_RPRES (1 << 21)
#define HWCAP2_SME (1 << 23)
#define HWCAP2_SME_I16I64 (1 << 24)
#define HWCAP2_SME_F64F64 (1 << 25)
#define HWCAP2_SME_I8I32 (1 << 26)
#define HWCAP2_SME_F16F32 (1 << 27)
#define HWCAP2_SME_B16F32 (1 << 28)
#define HWCAP2_SME_F32F32 (1 << 29)
#define HWCAP2_SME_FA64 (1 << 30)
#endif /* _UAPI__ASM_HWCAP_H */ #endif /* _UAPI__ASM_HWCAP_H */
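To make the intended use of these bits concrete, userspace feature detection goes through the aux vector rather than the ID registers. A minimal sketch, assuming this uapi header is installed; getauxval() and AT_HWCAP2 are the standard glibc/kernel interfaces.

#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

int main(void)
{
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	if (!(hwcap2 & HWCAP2_SME)) {
		puts("SME not supported");
		return 0;
	}

	/* Optional SME features are reported as separate bits. */
	printf("SME supported, FA64 %s\n",
	       (hwcap2 & HWCAP2_SME_FA64) ? "yes" : "no");
	return 0;
}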
...@@ -109,7 +109,7 @@ struct user_hwdebug_state { ...@@ -109,7 +109,7 @@ struct user_hwdebug_state {
} dbg_regs[16]; } dbg_regs[16];
}; };
/* SVE/FP/SIMD state (NT_ARM_SVE) */ /* SVE/FP/SIMD state (NT_ARM_SVE & NT_ARM_SSVE) */
struct user_sve_header { struct user_sve_header {
__u32 size; /* total meaningful regset content in bytes */ __u32 size; /* total meaningful regset content in bytes */
...@@ -220,6 +220,7 @@ struct user_sve_header { ...@@ -220,6 +220,7 @@ struct user_sve_header {
(SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \ (SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \
SVE_PT_SVE_PREGS_OFFSET(vq)) SVE_PT_SVE_PREGS_OFFSET(vq))
/* For streaming mode SVE (SSVE) FFR must be read and written as zero */
#define SVE_PT_SVE_FFR_OFFSET(vq) \ #define SVE_PT_SVE_FFR_OFFSET(vq) \
(SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq)) (SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq))
...@@ -240,10 +241,12 @@ struct user_sve_header { ...@@ -240,10 +241,12 @@ struct user_sve_header {
- SVE_PT_SVE_OFFSET + (__SVE_VQ_BYTES - 1)) \ - SVE_PT_SVE_OFFSET + (__SVE_VQ_BYTES - 1)) \
/ __SVE_VQ_BYTES * __SVE_VQ_BYTES) / __SVE_VQ_BYTES * __SVE_VQ_BYTES)
#define SVE_PT_SIZE(vq, flags) \ #define SVE_PT_SIZE(vq, flags) \
(((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \ (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \
SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \ SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \
: SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags)) : ((((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD ? \
SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags) \
: SVE_PT_REGS_OFFSET)))
/* pointer authentication masks (NT_ARM_PAC_MASK) */ /* pointer authentication masks (NT_ARM_PAC_MASK) */
...@@ -265,6 +268,62 @@ struct user_pac_generic_keys { ...@@ -265,6 +268,62 @@ struct user_pac_generic_keys {
__uint128_t apgakey; __uint128_t apgakey;
}; };
/* ZA state (NT_ARM_ZA) */
struct user_za_header {
__u32 size; /* total meaningful regset content in bytes */
__u32 max_size; /* maximum possible size for this thread */
__u16 vl; /* current vector length */
__u16 max_vl; /* maximum possible vector length */
__u16 flags;
__u16 __reserved;
};
/*
* Common ZA_PT_* flags:
* These must be kept in sync with prctl interface in <linux/prctl.h>
*/
#define ZA_PT_VL_INHERIT ((1 << 17) /* PR_SME_VL_INHERIT */ >> 16)
#define ZA_PT_VL_ONEXEC ((1 << 18) /* PR_SME_SET_VL_ONEXEC */ >> 16)
/*
* The remainder of the ZA state follows struct user_za_header. The
* total size of the ZA state (including header) depends on the
* metadata in the header: ZA_PT_SIZE(vq, flags) gives the total size
* of the state in bytes, including the header.
*
* Refer to <asm/sigcontext.h> for details of how to pass the correct
* "vq" argument to these macros.
*/
/* Offset from the start of struct user_za_header to the register data */
#define ZA_PT_ZA_OFFSET \
((sizeof(struct user_za_header) + (__SVE_VQ_BYTES - 1)) \
/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
/*
* The payload starts at offset ZA_PT_ZA_OFFSET, and is of size
* ZA_PT_ZA_SIZE(vq, flags).
*
* The ZA array is stored as a sequence of horizontal vectors ZAV of SVL/8
* bytes each, starting from vector 0.
*
* Additional data might be appended in the future.
*
* The ZA matrix is represented in memory in an endianness-invariant layout
* which differs from the layout used for the FPSIMD V-registers on big-endian
* systems: see sigcontext.h for more explanation.
*/
#define ZA_PT_ZAV_OFFSET(vq, n) \
(ZA_PT_ZA_OFFSET + ((vq * __SVE_VQ_BYTES) * n))
#define ZA_PT_ZA_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
#define ZA_PT_SIZE(vq) \
(ZA_PT_ZA_OFFSET + ZA_PT_ZA_SIZE(vq))
#endif /* __ASSEMBLY__ */ #endif /* __ASSEMBLY__ */
#endif /* _UAPI__ASM_PTRACE_H */ #endif /* _UAPI__ASM_PTRACE_H */
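The ZA_PT_* macros above define the NT_ARM_ZA regset layout. As a debugger-side illustration only (NT_ARM_ZA itself comes from the <linux/elf.h> change elsewhere in this series, and the tracee is assumed to be stopped already), the header is read first to learn the vector length and whether ZA is live, then the full payload:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/uio.h>
#include <linux/elf.h>
#include <asm/ptrace.h>
#include <asm/sigcontext.h>	/* for sve_vq_from_vl() */

static int read_za(pid_t pid)
{
	struct user_za_header hdr;
	struct iovec iov = { .iov_base = &hdr, .iov_len = sizeof(hdr) };
	unsigned int vq;
	void *buf;

	/* Read just the header to learn the streaming VL and ZA state. */
	if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_ZA, &iov))
		return -1;

	if (hdr.size <= ZA_PT_ZA_OFFSET) {
		printf("ZA not enabled, streaming VL %u bytes\n", hdr.vl);
		return 0;
	}

	vq = sve_vq_from_vl(hdr.vl);
	buf = malloc(ZA_PT_SIZE(vq));
	if (!buf)
		return -1;

	iov.iov_base = buf;
	iov.iov_len = ZA_PT_SIZE(vq);
	if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_ZA, &iov)) {
		free(buf);
		return -1;
	}

	/* Horizontal vector n of the matrix starts at ZA_PT_ZAV_OFFSET(vq, n). */
	printf("read %lu bytes of ZA data\n", (unsigned long)ZA_PT_ZA_SIZE(vq));
	free(buf);
	return 0;
}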
...@@ -132,6 +132,17 @@ struct extra_context { ...@@ -132,6 +132,17 @@ struct extra_context {
#define SVE_MAGIC 0x53564501 #define SVE_MAGIC 0x53564501
struct sve_context { struct sve_context {
struct _aarch64_ctx head;
__u16 vl;
__u16 flags;
__u16 __reserved[2];
};
#define SVE_SIG_FLAG_SM 0x1 /* Context describes streaming mode */
#define ZA_MAGIC 0x54366345
struct za_context {
struct _aarch64_ctx head; struct _aarch64_ctx head;
__u16 vl; __u16 vl;
__u16 __reserved[3]; __u16 __reserved[3];
...@@ -186,9 +197,16 @@ struct sve_context { ...@@ -186,9 +197,16 @@ struct sve_context {
* sve_context.vl must equal the thread's current vector length when * sve_context.vl must equal the thread's current vector length when
* doing a sigreturn. * doing a sigreturn.
* *
* On systems with support for SME the SVE register state may reflect either
* streaming or non-streaming mode. In streaming mode the streaming mode
* vector length will be used and the flag SVE_SIG_FLAG_SM will be set in
* the flags field. It is permitted to enter or leave streaming mode in
* a signal return; applications should take care to ensure that any difference
* in vector length between the two modes is handled, including any resizing
* and movement of context blocks.
* *
* Note: for all these macros, the "vq" argument denotes the SVE * Note: for all these macros, the "vq" argument denotes the vector length
* vector length in quadwords (i.e., units of 128 bits). * in quadwords (i.e., units of 128 bits).
* *
* The correct way to obtain vq is to use sve_vq_from_vl(vl). The * The correct way to obtain vq is to use sve_vq_from_vl(vl). The
* result is valid if and only if sve_vl_valid(vl) is true. This is * result is valid if and only if sve_vl_valid(vl) is true. This is
...@@ -249,4 +267,37 @@ struct sve_context { ...@@ -249,4 +267,37 @@ struct sve_context {
#define SVE_SIG_CONTEXT_SIZE(vq) \ #define SVE_SIG_CONTEXT_SIZE(vq) \
(SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq)) (SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
/*
* If the ZA register is enabled for the thread at signal delivery then,
* za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
* and the register data may be accessed using the ZA_SIG_*() macros.
*
* If za_context.head.size < ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
* then the ZA register was not enabled for the thread and no register data
* was included, in which case the ZA_SIG_*() macros should not be used
* except for this check.
*
* The same convention applies when returning from a signal: a caller
* will need to remove or resize the za_context block if it wants to
* enable the ZA register when it was previously non-live or vice-versa.
* This may require the caller to allocate fresh memory and/or move other
* context blocks in the signal frame.
*
* Changing the vector length during signal return is not permitted:
* za_context.vl must equal the thread's current SME vector length when
* doing a sigreturn.
*/
#define ZA_SIG_REGS_OFFSET \
((sizeof(struct za_context) + (__SVE_VQ_BYTES - 1)) \
/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
#define ZA_SIG_ZAV_OFFSET(vq, n) (ZA_SIG_REGS_OFFSET + \
(SVE_SIG_ZREG_SIZE(vq) * n))
#define ZA_SIG_CONTEXT_SIZE(vq) \
(ZA_SIG_REGS_OFFSET + ZA_SIG_REGS_SIZE(vq))
#endif /* _UAPI__ASM_SIGCONTEXT_H */ #endif /* _UAPI__ASM_SIGCONTEXT_H */
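Putting the comment above into practice, a signal handler can locate the ZA record by walking the _aarch64_ctx list in the signal frame. This is a simplified sketch: it ignores the EXTRA_MAGIC indirection used for oversized frames and assumes the usual arm64 ucontext layout.

#include <signal.h>
#include <ucontext.h>
#include <asm/sigcontext.h>

static struct za_context *find_za_context(ucontext_t *uc)
{
	struct _aarch64_ctx *head =
		(struct _aarch64_ctx *)uc->uc_mcontext.__reserved;

	/* Records are laid out back to back and terminated by a zero magic. */
	while (head->magic) {
		if (head->magic == ZA_MAGIC)
			return (struct za_context *)head;
		head = (struct _aarch64_ctx *)((char *)head + head->size);
	}
	return NULL;
}

static void handler(int sig, siginfo_t *info, void *ucontext)
{
	struct za_context *za = find_za_context(ucontext);

	if (za && za->head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za->vl))) {
		/* ZA was live: register data starts at (char *)za + ZA_SIG_REGS_OFFSET. */
	}
	(void)sig;
	(void)info;
}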
This diff has been collapsed.
...@@ -97,6 +97,14 @@ static const char *const hwcap_str[] = { ...@@ -97,6 +97,14 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_ECV] = "ecv", [KERNEL_HWCAP_ECV] = "ecv",
[KERNEL_HWCAP_AFP] = "afp", [KERNEL_HWCAP_AFP] = "afp",
[KERNEL_HWCAP_RPRES] = "rpres", [KERNEL_HWCAP_RPRES] = "rpres",
[KERNEL_HWCAP_SME] = "sme",
[KERNEL_HWCAP_SME_I16I64] = "smei16i64",
[KERNEL_HWCAP_SME_F64F64] = "smef64f64",
[KERNEL_HWCAP_SME_I8I32] = "smei8i32",
[KERNEL_HWCAP_SME_F16F32] = "smef16f32",
[KERNEL_HWCAP_SME_B16F32] = "smeb16f32",
[KERNEL_HWCAP_SME_F32F32] = "smef32f32",
[KERNEL_HWCAP_SME_FA64] = "smefa64",
}; };
#ifdef CONFIG_AARCH32_EL0 #ifdef CONFIG_AARCH32_EL0
...@@ -374,6 +382,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) ...@@ -374,6 +382,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1); info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1);
info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1); info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1); info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
info->reg_id_aa64smfr0 = read_cpuid(ID_AA64SMFR0_EL1);
/* Update the 32bit ID registers only if AArch32 is implemented */ /* Update the 32bit ID registers only if AArch32 is implemented */
if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) { if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
...@@ -405,6 +414,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) ...@@ -405,6 +414,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
id_aa64pfr0_sve(info->reg_id_aa64pfr0)) id_aa64pfr0_sve(info->reg_id_aa64pfr0))
info->reg_zcr = read_zcr_features(); info->reg_zcr = read_zcr_features();
if (IS_ENABLED(CONFIG_ARM64_SME) &&
id_aa64pfr1_sme(info->reg_id_aa64pfr1))
info->reg_smcr = read_smcr_features();
cpuinfo_detect_icache_policy(info); cpuinfo_detect_icache_policy(info);
} }
......
...@@ -283,6 +283,14 @@ static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr) ...@@ -283,6 +283,14 @@ static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr)
do_sve_acc(esr, regs); do_sve_acc(esr, regs);
} }
static void noinstr el0_sme_acc(struct pt_regs *regs, unsigned long esr)
{
enter_from_user_mode();
local_daif_restore(DAIF_PROCCTX);
do_sme_acc(esr, regs);
exit_to_user_mode();
}
static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr) static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr)
{ {
enter_from_user_mode(); enter_from_user_mode();
...@@ -380,6 +388,9 @@ asmlinkage void noinstr el0_sync_handler(struct pt_regs *regs) ...@@ -380,6 +388,9 @@ asmlinkage void noinstr el0_sync_handler(struct pt_regs *regs)
case ESR_ELx_EC_SVE: case ESR_ELx_EC_SVE:
el0_sve_acc(regs, esr); el0_sve_acc(regs, esr);
break; break;
case ESR_ELx_EC_SME:
el0_sme_acc(regs, esr);
break;
case ESR_ELx_EC_FP_EXC64: case ESR_ELx_EC_FP_EXC64:
el0_fpsimd_exc(regs, esr); el0_fpsimd_exc(regs, esr);
break; break;
......
...@@ -34,12 +34,12 @@ SYM_FUNC_END(fpsimd_load_state) ...@@ -34,12 +34,12 @@ SYM_FUNC_END(fpsimd_load_state)
#ifdef CONFIG_ARM64_SVE #ifdef CONFIG_ARM64_SVE
SYM_FUNC_START(sve_save_state) SYM_FUNC_START(sve_save_state)
sve_save 0, x1, 2 sve_save 0, x1, x2, 3
ret ret
SYM_FUNC_END(sve_save_state) SYM_FUNC_END(sve_save_state)
SYM_FUNC_START(sve_load_state) SYM_FUNC_START(sve_load_state)
sve_load 0, x1, x2, 3, x4 sve_load 0, x1, x2, 4
ret ret
SYM_FUNC_END(sve_load_state) SYM_FUNC_END(sve_load_state)
...@@ -48,27 +48,63 @@ SYM_FUNC_START(sve_get_vl) ...@@ -48,27 +48,63 @@ SYM_FUNC_START(sve_get_vl)
ret ret
SYM_FUNC_END(sve_get_vl) SYM_FUNC_END(sve_get_vl)
SYM_FUNC_START(sve_set_vq)
sve_load_vq x0, x1, x2
ret
SYM_FUNC_END(sve_set_vq)
/* /*
* Load SVE state from FPSIMD state. * Zero all SVE registers but the first 128-bits of each vector
* *
* x0 = pointer to struct fpsimd_state * VQ must already be configured by caller, any further updates of VQ
* x1 = VQ - 1 * will need to ensure that the register state remains valid.
* *
* Each SVE vector will be loaded with the first 128-bits taken from FPSIMD * x0 = include FFR?
* and the rest zeroed. All the other SVE registers will be zeroed. * x1 = VQ - 1
*/ */
SYM_FUNC_START(sve_load_from_fpsimd_state)
sve_load_vq x1, x2, x3
fpsimd_restore x0, 8
_for n, 0, 15, _sve_pfalse \n
_sve_wrffr 0
ret
SYM_FUNC_END(sve_load_from_fpsimd_state)
/* Zero all SVE registers but the first 128-bits of each vector */
SYM_FUNC_START(sve_flush_live) SYM_FUNC_START(sve_flush_live)
sve_flush cbz x1, 1f // A VQ-1 of 0 is 128 bits so no extra Z state
ret sve_flush_z
1: sve_flush_p
tbz x0, #0, 2f
sve_flush_ffr
2: ret
SYM_FUNC_END(sve_flush_live) SYM_FUNC_END(sve_flush_live)
#endif /* CONFIG_ARM64_SVE */ #endif /* CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
SYM_FUNC_START(sme_get_vl)
_sme_rdsvl 0, 1
ret
SYM_FUNC_END(sme_get_vl)
SYM_FUNC_START(sme_set_vq)
sme_load_vq x0, x1, x2
ret
SYM_FUNC_END(sme_set_vq)
/*
* Save the SME state
*
* x0 - pointer to buffer for state
*/
SYM_FUNC_START(za_save_state)
_sme_rdsvl 1, 1 // x1 = VL/8
sme_save_za 0, x1, 12
ret
SYM_FUNC_END(za_save_state)
/*
* Load the SME state
*
* x0 - pointer to buffer for state
*/
SYM_FUNC_START(za_load_state)
_sme_rdsvl 1, 1 // x1 = VL/8
sme_load_za 0, x1, 12
ret
SYM_FUNC_END(za_load_state)
#endif /* CONFIG_ARM64_SME */
This diff has been collapsed.
...@@ -613,6 +613,90 @@ set_hcr: ...@@ -613,6 +613,90 @@ set_hcr:
isb isb
ret ret
/* Disable any fine grained traps */
.macro __init_el2_fgt
mrs x1, id_aa64mmfr0_el1
ubfx x1, x1, #ID_AA64MMFR0_FGT_SHIFT, #4
cbz x1, .Lskip_fgt_\@
mov x0, xzr
mrs x1, id_aa64dfr0_el1
ubfx x1, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
cmp x1, #3
b.lt .Lset_debug_fgt_\@
/* Disable PMSNEVFR_EL1 read and write traps */
orr x0, x0, #(1 << 62)
.Lset_debug_fgt_\@:
msr_s SYS_HDFGRTR_EL2, x0
msr_s SYS_HDFGWTR_EL2, x0
mov x0, xzr
mrs x1, id_aa64pfr1_el1
ubfx x1, x1, #ID_AA64PFR1_SME_SHIFT, #4
cbz x1, .Lset_fgt_\@
/* Disable nVHE traps of TPIDR2 and SMPRI */
orr x0, x0, #HFGxTR_EL2_nSMPRI_EL1_MASK
orr x0, x0, #HFGxTR_EL2_nTPIDR2_EL0_MASK
.Lset_fgt_\@:
msr_s SYS_HFGRTR_EL2, x0
msr_s SYS_HFGWTR_EL2, x0
msr_s SYS_HFGITR_EL2, xzr
mrs x1, id_aa64pfr0_el1 // AMU traps UNDEF without AMU
ubfx x1, x1, #ID_AA64PFR0_AMU_SHIFT, #4
cbz x1, .Lskip_fgt_\@
msr_s SYS_HAFGRTR_EL2, xzr
.Lskip_fgt_\@:
.endm
/* SME register access and priority mapping */
.macro __init_el2_nvhe_sme
mrs x1, id_aa64pfr1_el1
ubfx x1, x1, #ID_AA64PFR1_SME_SHIFT, #4
cbz x1, .Lskip_sme_\@
bic x0, x0, #CPTR_EL2_TSM // Also disable SME traps
msr cptr_el2, x0 // Disable copro. traps to EL2
isb
mrs x1, sctlr_el2
orr x1, x1, #SCTLR_ELx_ENTP2 // Disable TPIDR2 traps
msr sctlr_el2, x1
isb
mov x1, #0 // SMCR controls
mrs_s x2, SYS_ID_AA64SMFR0_EL1
ubfx x2, x2, #ID_AA64SMFR0_FA64_SHIFT, #1 // Full FP in SM?
cbz x2, .Lskip_sme_fa64_\@
orr x1, x1, SMCR_ELx_FA64_MASK
.Lskip_sme_fa64_\@:
orr x1, x1, #SMCR_ELx_LEN_MASK // Enable full SME vector
msr_s SYS_SMCR_EL2, x1 // length for EL1.
mrs_s x1, SYS_SMIDR_EL1 // Priority mapping supported?
ubfx x1, x1, #SMIDR_EL1_SMPS_SHIFT, #1
cbz x1, .Lskip_sme_\@
msr_s SYS_SMPRIMAP_EL2, xzr // Make all priorities equal
mrs x1, id_aa64mmfr1_el1 // HCRX_EL2 present?
ubfx x1, x1, #ID_AA64MMFR1_HCX_SHIFT, #4
cbz x1, .Lskip_sme_\@
mrs_s x1, SYS_HCRX_EL2
orr x1, x1, #HCRX_EL2_SMPME_MASK // Enable priority mapping
msr_s SYS_HCRX_EL2, x1
.Lskip_sme_\@:
.endm
SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL) SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
/* /*
* When VHE is not in use, early init of EL2 and EL1 needs to be * When VHE is not in use, early init of EL2 and EL1 needs to be
...@@ -639,6 +723,9 @@ SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL) ...@@ -639,6 +723,9 @@ SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
msr_s SYS_ZCR_EL2, x1 // length for EL1. msr_s SYS_ZCR_EL2, x1 // length for EL1.
__init_el2_nvhe_sme
__init_el2_fgt
/* Hypervisor stub */ /* Hypervisor stub */
7: adr_l x0, __hyp_stub_vectors 7: adr_l x0, __hyp_stub_vectors
msr vbar_el2, x0 msr vbar_el2, x0
......
...@@ -320,6 +320,9 @@ void show_regs(struct pt_regs * regs) ...@@ -320,6 +320,9 @@ void show_regs(struct pt_regs * regs)
static void tls_thread_flush(void) static void tls_thread_flush(void)
{ {
write_sysreg(0, tpidr_el0); write_sysreg(0, tpidr_el0);
if (system_supports_tpidr2())
write_sysreg_s(0, SYS_TPIDR2_EL0);
if (is_a32_compat_task()) { if (is_a32_compat_task()) {
current->thread.uw.tp_value = 0; current->thread.uw.tp_value = 0;
...@@ -369,16 +372,42 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) ...@@ -369,16 +372,42 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
/* /*
* Detach src's sve_state (if any) from dst so that it does not * Detach src's sve_state (if any) from dst so that it does not
* get erroneously used or freed prematurely. dst's sve_state * get erroneously used or freed prematurely. dst's copies
* will be allocated on demand later on if dst uses SVE. * will be allocated on demand later on if dst uses SVE.
* For consistency, also clear TIF_SVE here: this could be done * For consistency, also clear TIF_SVE here: this could be done
* later in copy_process(), but to avoid tripping up future * later in copy_process(), but to avoid tripping up future
* maintainers it is best not to leave TIF_SVE and sve_state in * maintainers it is best not to leave TIF flags and buffers in
* an inconsistent state, even temporarily. * an inconsistent state, even temporarily.
*/ */
dst->thread.sve_state = NULL; dst->thread.sve_state = NULL;
clear_tsk_thread_flag(dst, TIF_SVE); clear_tsk_thread_flag(dst, TIF_SVE);
/*
* In the unlikely event that we create a new thread with ZA
* enabled we should retain the ZA state so duplicate it here.
* This may be shortly freed if we exec() or if CLONE_SETTLS
* but it's simpler to do it here. To avoid confusing the rest
* of the code ensure that we have a sve_state allocated
* whenever za_state is allocated.
*/
if (thread_za_enabled(&src->thread)) {
dst->thread.sve_state = kzalloc(sve_state_size(src),
GFP_KERNEL);
if (!dst->thread.sve_state)
return -ENOMEM;
dst->thread.za_state = kmemdup(src->thread.za_state,
za_state_size(src),
GFP_KERNEL);
if (!dst->thread.za_state) {
kfree(dst->thread.sve_state);
dst->thread.sve_state = NULL;
return -ENOMEM;
}
} else {
dst->thread.za_state = NULL;
clear_tsk_thread_flag(dst, TIF_SME);
}
/* clear any pending asynchronous tag fault raised by the parent */ /* clear any pending asynchronous tag fault raised by the parent */
clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT); clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
...@@ -414,6 +443,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, ...@@ -414,6 +443,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
* out-of-sync with the saved value. * out-of-sync with the saved value.
*/ */
*task_user_tls(p) = read_sysreg(tpidr_el0); *task_user_tls(p) = read_sysreg(tpidr_el0);
if (system_supports_tpidr2())
p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
if (stack_start) { if (stack_start) {
if (is_a32_compat_thread(task_thread_info(p))) if (is_a32_compat_thread(task_thread_info(p)))
...@@ -424,10 +455,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, ...@@ -424,10 +455,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
/* /*
* If a TLS pointer was passed to clone, use it for the new * If a TLS pointer was passed to clone, use it for the new
* thread. * thread. We also reset TPIDR2 if it's in use.
*/ */
if (clone_flags & CLONE_SETTLS) if (clone_flags & CLONE_SETTLS) {
p->thread.uw.tp_value = tls; p->thread.uw.tp_value = tls;
p->thread.tpidr2_el0 = 0;
}
} else { } else {
/* /*
* A kthread has no context to ERET to, so ensure any buggy * A kthread has no context to ERET to, so ensure any buggy
...@@ -453,6 +486,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, ...@@ -453,6 +486,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
void tls_preserve_current_state(void) void tls_preserve_current_state(void)
{ {
*task_user_tls(current) = read_sysreg(tpidr_el0); *task_user_tls(current) = read_sysreg(tpidr_el0);
if (system_supports_tpidr2() && !is_compat_task())
current->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
} }
static void tls_thread_switch(struct task_struct *next) static void tls_thread_switch(struct task_struct *next)
...@@ -465,6 +500,8 @@ static void tls_thread_switch(struct task_struct *next) ...@@ -465,6 +500,8 @@ static void tls_thread_switch(struct task_struct *next)
write_sysreg(0, tpidrro_el0); write_sysreg(0, tpidrro_el0);
write_sysreg(*task_user_tls(next), tpidr_el0); write_sysreg(*task_user_tls(next), tpidr_el0);
if (system_supports_tpidr2())
write_sysreg_s(next->thread.tpidr2_el0, SYS_TPIDR2_EL0);
} }
/* /*
......
...@@ -715,21 +715,51 @@ static int system_call_set(struct task_struct *target, ...@@ -715,21 +715,51 @@ static int system_call_set(struct task_struct *target,
#ifdef CONFIG_ARM64_SVE #ifdef CONFIG_ARM64_SVE
static void sve_init_header_from_task(struct user_sve_header *header, static void sve_init_header_from_task(struct user_sve_header *header,
struct task_struct *target) struct task_struct *target,
enum vec_type type)
{ {
unsigned int vq; unsigned int vq;
bool active;
bool fpsimd_only;
enum vec_type task_type;
memset(header, 0, sizeof(*header)); memset(header, 0, sizeof(*header));
header->flags = test_tsk_thread_flag(target, TIF_SVE) ? /* Check if the requested registers are active for the task */
SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD; if (thread_sm_enabled(&target->thread))
if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) task_type = ARM64_VEC_SME;
header->flags |= SVE_PT_VL_INHERIT; else
task_type = ARM64_VEC_SVE;
active = (task_type == type);
switch (type) {
case ARM64_VEC_SVE:
if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
header->flags |= SVE_PT_VL_INHERIT;
fpsimd_only = !test_tsk_thread_flag(target, TIF_SVE);
break;
case ARM64_VEC_SME:
if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT))
header->flags |= SVE_PT_VL_INHERIT;
fpsimd_only = false;
break;
default:
WARN_ON_ONCE(1);
return;
}
if (active) {
if (fpsimd_only) {
header->flags |= SVE_PT_REGS_FPSIMD;
} else {
header->flags |= SVE_PT_REGS_SVE;
}
}
header->vl = target->thread.sve_vl; header->vl = task_get_vl(target, type);
vq = sve_vq_from_vl(header->vl); vq = sve_vq_from_vl(header->vl);
header->max_vl = sve_max_vl; header->max_vl = vec_max_vl(type);
header->size = SVE_PT_SIZE(vq, header->flags); header->size = SVE_PT_SIZE(vq, header->flags);
header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl), header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
SVE_PT_REGS_SVE); SVE_PT_REGS_SVE);
...@@ -740,19 +770,17 @@ static unsigned int sve_size_from_header(struct user_sve_header const *header) ...@@ -740,19 +770,17 @@ static unsigned int sve_size_from_header(struct user_sve_header const *header)
return ALIGN(header->size, SVE_VQ_BYTES); return ALIGN(header->size, SVE_VQ_BYTES);
} }
static int sve_get(struct task_struct *target, static int sve_get_common(struct task_struct *target,
const struct user_regset *regset, const struct user_regset *regset,
struct membuf to) struct membuf to,
enum vec_type type)
{ {
struct user_sve_header header; struct user_sve_header header;
unsigned int vq; unsigned int vq;
unsigned long start, end; unsigned long start, end;
if (!system_supports_sve())
return -EINVAL;
/* Header */ /* Header */
sve_init_header_from_task(&header, target); sve_init_header_from_task(&header, target, type);
vq = sve_vq_from_vl(header.vl); vq = sve_vq_from_vl(header.vl);
membuf_write(&to, &header, sizeof(header)); membuf_write(&to, &header, sizeof(header));
...@@ -760,49 +788,61 @@ static int sve_get(struct task_struct *target, ...@@ -760,49 +788,61 @@ static int sve_get(struct task_struct *target,
if (target == current) if (target == current)
fpsimd_preserve_current_state(); fpsimd_preserve_current_state();
/* Registers: FPSIMD-only case */
BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header)); BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD) BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
switch ((header.flags & SVE_PT_REGS_MASK)) {
case SVE_PT_REGS_FPSIMD:
return __fpr_get(target, regset, to); return __fpr_get(target, regset, to);
/* Otherwise: full SVE case */ case SVE_PT_REGS_SVE:
start = SVE_PT_SVE_OFFSET;
end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
membuf_write(&to, target->thread.sve_state, end - start);
BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header)); start = end;
start = SVE_PT_SVE_OFFSET; end = SVE_PT_SVE_FPSR_OFFSET(vq);
end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq); membuf_zero(&to, end - start);
membuf_write(&to, target->thread.sve_state, end - start);
start = end; /*
end = SVE_PT_SVE_FPSR_OFFSET(vq); * Copy fpsr, and fpcr which must follow contiguously in
membuf_zero(&to, end - start); * struct fpsimd_state:
*/
start = end;
end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr,
end - start);
/* start = end;
* Copy fpsr, and fpcr which must follow contiguously in end = sve_size_from_header(&header);
* struct fpsimd_state: return membuf_zero(&to, end - start);
*/
start = end;
end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr, end - start);
start = end; default:
end = sve_size_from_header(&header); return 0;
return membuf_zero(&to, end - start); }
} }
static int sve_set(struct task_struct *target, static int sve_get(struct task_struct *target,
const struct user_regset *regset, const struct user_regset *regset,
unsigned int pos, unsigned int count, struct membuf to)
const void *kbuf, const void __user *ubuf) {
if (!system_supports_sve())
return -EINVAL;
return sve_get_common(target, regset, to, ARM64_VEC_SVE);
}
static int sve_set_common(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf,
enum vec_type type)
{ {
int ret; int ret;
struct user_sve_header header; struct user_sve_header header;
unsigned int vq; unsigned int vq;
unsigned long start, end; unsigned long start, end;
if (!system_supports_sve())
return -EINVAL;
/* Header */ /* Header */
if (count < sizeof(header)) if (count < sizeof(header))
return -EINVAL; return -EINVAL;
...@@ -813,15 +853,39 @@ static int sve_set(struct task_struct *target, ...@@ -813,15 +853,39 @@ static int sve_set(struct task_struct *target,
/* /*
* Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by
* sve_set_vector_length(), which will also validate them for us: * vec_set_vector_length(), which will also validate them for us:
*/ */
ret = sve_set_vector_length(target, header.vl, ret = vec_set_vector_length(target, type, header.vl,
((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16); ((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);
if (ret) if (ret)
goto out; goto out;
/* Actual VL set may be less than the user asked for: */ /* Actual VL set may be less than the user asked for: */
vq = sve_vq_from_vl(target->thread.sve_vl); vq = sve_vq_from_vl(task_get_vl(target, type));
/* Enter/exit streaming mode */
if (system_supports_sme()) {
u64 old_svcr = target->thread.svcr;
switch (type) {
case ARM64_VEC_SVE:
target->thread.svcr &= ~SVCR_SM_MASK;
break;
case ARM64_VEC_SME:
target->thread.svcr |= SVCR_SM_MASK;
break;
default:
WARN_ON_ONCE(1);
return -EINVAL;
}
/*
* If we switched then invalidate any existing SVE
* state and ensure there's storage.
*/
if (target->thread.svcr != old_svcr)
sve_alloc(target);
}
/* Registers: FPSIMD-only case */ /* Registers: FPSIMD-only case */
...@@ -830,10 +894,15 @@ static int sve_set(struct task_struct *target, ...@@ -830,10 +894,15 @@ static int sve_set(struct task_struct *target,
ret = __fpr_set(target, regset, pos, count, kbuf, ubuf, ret = __fpr_set(target, regset, pos, count, kbuf, ubuf,
SVE_PT_FPSIMD_OFFSET); SVE_PT_FPSIMD_OFFSET);
clear_tsk_thread_flag(target, TIF_SVE); clear_tsk_thread_flag(target, TIF_SVE);
if (type == ARM64_VEC_SME)
fpsimd_force_sync_to_sve(target);
goto out; goto out;
} }
/* Otherwise: full SVE case */ /*
* Otherwise: no registers or full SVE case. For backwards
* compatibility reasons we treat empty flags as SVE registers.
*/
/* /*
* If setting a different VL from the requested VL and there is * If setting a different VL from the requested VL and there is
...@@ -846,11 +915,17 @@ static int sve_set(struct task_struct *target, ...@@ -846,11 +915,17 @@ static int sve_set(struct task_struct *target,
} }
sve_alloc(target); sve_alloc(target);
if (!target->thread.sve_state) {
ret = -ENOMEM;
clear_tsk_thread_flag(target, TIF_SVE);
goto out;
}
/* /*
* Ensure target->thread.sve_state is up to date with target's * Ensure target->thread.sve_state is up to date with target's
* FPSIMD regs, so that a short copyin leaves trailing registers * FPSIMD regs, so that a short copyin leaves trailing
* unmodified. * registers unmodified. Always enable SVE even if going into
* streaming mode.
*/ */
fpsimd_sync_to_sve(target); fpsimd_sync_to_sve(target);
set_tsk_thread_flag(target, TIF_SVE); set_tsk_thread_flag(target, TIF_SVE);
...@@ -886,8 +961,181 @@ static int sve_set(struct task_struct *target, ...@@ -886,8 +961,181 @@ static int sve_set(struct task_struct *target,
return ret; return ret;
} }
static int sve_set(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
if (!system_supports_sve())
return -EINVAL;
return sve_set_common(target, regset, pos, count, kbuf, ubuf,
ARM64_VEC_SVE);
}
#endif /* CONFIG_ARM64_SVE */ #endif /* CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
static int ssve_get(struct task_struct *target,
const struct user_regset *regset,
struct membuf to)
{
if (!system_supports_sme())
return -EINVAL;
return sve_get_common(target, regset, to, ARM64_VEC_SME);
}
static int ssve_set(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
if (!system_supports_sme())
return -EINVAL;
return sve_set_common(target, regset, pos, count, kbuf, ubuf,
ARM64_VEC_SME);
}
static int za_get(struct task_struct *target,
const struct user_regset *regset,
struct membuf to)
{
struct user_za_header header;
unsigned int vq;
unsigned long start, end;
if (!system_supports_sme())
return -EINVAL;
/* Header */
memset(&header, 0, sizeof(header));
if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT))
header.flags |= ZA_PT_VL_INHERIT;
header.vl = task_get_sme_vl(target);
vq = sve_vq_from_vl(header.vl);
header.max_vl = sme_max_vl();
header.max_size = ZA_PT_SIZE(vq);
/* If ZA is not active there is only the header */
if (thread_za_enabled(&target->thread))
header.size = ZA_PT_SIZE(vq);
else
header.size = ZA_PT_ZA_OFFSET;
membuf_write(&to, &header, sizeof(header));
BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header));
end = ZA_PT_ZA_OFFSET;
if (target == current)
fpsimd_preserve_current_state();
/* Any register data to include? */
if (thread_za_enabled(&target->thread)) {
start = end;
end = ZA_PT_SIZE(vq);
membuf_write(&to, target->thread.za_state, end - start);
}
/* Zero any trailing padding */
start = end;
end = ALIGN(header.size, SVE_VQ_BYTES);
return membuf_zero(&to, end - start);
}
static int za_set(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
int ret;
struct user_za_header header;
unsigned int vq;
unsigned long start, end;
if (!system_supports_sme())
return -EINVAL;
/* Header */
if (count < sizeof(header))
return -EINVAL;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header,
0, sizeof(header));
if (ret)
goto out;
/*
* All current ZA_PT_* flags are consumed by
* vec_set_vector_length(), which will also validate them for
* us:
*/
ret = vec_set_vector_length(target, ARM64_VEC_SME, header.vl,
((unsigned long)header.flags) << 16);
if (ret)
goto out;
/* Actual VL set may be less than the user asked for: */
vq = sve_vq_from_vl(task_get_sme_vl(target));
/* Ensure there is some SVE storage for streaming mode */
if (!target->thread.sve_state) {
sve_alloc(target);
if (!target->thread.sve_state) {
clear_tsk_thread_flag(target, TIF_SME);
ret = -ENOMEM;
goto out;
}
}
/* Allocate/reinit ZA storage */
sme_alloc(target);
if (!target->thread.za_state) {
ret = -ENOMEM;
clear_tsk_thread_flag(target, TIF_SME);
goto out;
}
/* If there is no data then disable ZA */
if (!count) {
target->thread.svcr &= ~SVCR_ZA_MASK;
goto out;
}
/*
* If setting a different VL from the requested VL and there is
* register data, the data layout will be wrong: don't even
* try to set the registers in this case.
*/
if (vq != sve_vq_from_vl(header.vl)) {
ret = -EIO;
goto out;
}
BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header));
start = ZA_PT_ZA_OFFSET;
end = ZA_PT_SIZE(vq);
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
target->thread.za_state,
start, end);
if (ret)
goto out;
/* Mark ZA as active and let userspace use it */
set_tsk_thread_flag(target, TIF_SME);
target->thread.svcr |= SVCR_ZA_MASK;
out:
fpsimd_flush_task_state(target);
return ret;
}
#endif /* CONFIG_ARM64_SME */
#ifdef CONFIG_ARM64_PTR_AUTH #ifdef CONFIG_ARM64_PTR_AUTH
static int pac_mask_get(struct task_struct *target, static int pac_mask_get(struct task_struct *target,
const struct user_regset *regset, const struct user_regset *regset,
...@@ -1073,6 +1321,10 @@ enum aarch64_regset { ...@@ -1073,6 +1321,10 @@ enum aarch64_regset {
#ifdef CONFIG_ARM64_SVE #ifdef CONFIG_ARM64_SVE
REGSET_SVE, REGSET_SVE,
#endif #endif
#ifdef CONFIG_ARM64_SME
REGSET_SSVE,
REGSET_ZA,
#endif
#ifdef CONFIG_ARM64_PTR_AUTH #ifdef CONFIG_ARM64_PTR_AUTH
REGSET_PAC_MASK, REGSET_PAC_MASK,
#ifdef CONFIG_CHECKPOINT_RESTORE #ifdef CONFIG_CHECKPOINT_RESTORE
...@@ -1152,6 +1404,33 @@ static const struct user_regset aarch64_regsets[] = { ...@@ -1152,6 +1404,33 @@ static const struct user_regset aarch64_regsets[] = {
.set = sve_set, .set = sve_set,
}, },
#endif #endif
#ifdef CONFIG_ARM64_SME
[REGSET_SSVE] = { /* Streaming mode SVE */
.core_note_type = NT_ARM_SSVE,
.n = DIV_ROUND_UP(SVE_PT_SIZE(SME_VQ_MAX, SVE_PT_REGS_SVE),
SVE_VQ_BYTES),
.size = SVE_VQ_BYTES,
.align = SVE_VQ_BYTES,
.regset_get = ssve_get,
.set = ssve_set,
},
[REGSET_ZA] = { /* SME ZA */
.core_note_type = NT_ARM_ZA,
/*
* ZA is a single register but it's variably sized and
* the ptrace core requires that the size of any data
* be an exact multiple of the configured register
* size so report as though we had SVE_VQ_BYTES
* registers. These values aren't exposed to
* userspace.
*/
.n = DIV_ROUND_UP(ZA_PT_SIZE(SME_VQ_MAX), SVE_VQ_BYTES),
.size = SVE_VQ_BYTES,
.align = SVE_VQ_BYTES,
.regset_get = za_get,
.set = za_set,
},
#endif
#ifdef CONFIG_ARM64_PTR_AUTH #ifdef CONFIG_ARM64_PTR_AUTH
[REGSET_PAC_MASK] = { [REGSET_PAC_MASK] = {
.core_note_type = NT_ARM_PAC_MASK, .core_note_type = NT_ARM_PAC_MASK,
......
...@@ -180,11 +180,17 @@ int preserve_sve_context(struct sve_context __user *ctx) ...@@ -180,11 +180,17 @@ int preserve_sve_context(struct sve_context __user *ctx)
{ {
int err = 0; int err = 0;
u16 reserved[ARRAY_SIZE(ctx->__reserved)]; u16 reserved[ARRAY_SIZE(ctx->__reserved)];
unsigned int vl = current->thread.sve_vl; u16 flags = 0;
unsigned int vl = task_get_sve_vl(current);
unsigned int vq = 0; unsigned int vq = 0;
if (test_thread_flag(TIF_SVE)) if (thread_sm_enabled(&current->thread)) {
vl = task_get_sme_vl(current);
vq = sve_vq_from_vl(vl); vq = sve_vq_from_vl(vl);
flags |= SVE_SIG_FLAG_SM;
} else if (test_thread_flag(TIF_SVE)) {
vq = sve_vq_from_vl(vl);
}
memset(reserved, 0, sizeof(reserved)); memset(reserved, 0, sizeof(reserved));
...@@ -192,6 +198,7 @@ int preserve_sve_context(struct sve_context __user *ctx) ...@@ -192,6 +198,7 @@ int preserve_sve_context(struct sve_context __user *ctx)
__put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16), __put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16),
&ctx->head.size, err); &ctx->head.size, err);
__put_user_error(vl, &ctx->vl, err); __put_user_error(vl, &ctx->vl, err);
__put_user_error(flags, &ctx->flags, err);
BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved)); BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved)); err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
...@@ -212,18 +219,28 @@ int preserve_sve_context(struct sve_context __user *ctx) ...@@ -212,18 +219,28 @@ int preserve_sve_context(struct sve_context __user *ctx)
int restore_sve_fpsimd_context(struct user_ctxs *user) int restore_sve_fpsimd_context(struct user_ctxs *user)
{ {
int err; int err;
unsigned int vq; unsigned int vl, vq;
struct user_fpsimd_state fpsimd; struct user_fpsimd_state fpsimd;
struct sve_context sve; struct sve_context sve;
if (__copy_from_user(&sve, user->sve, sizeof(sve))) if (__copy_from_user(&sve, user->sve, sizeof(sve)))
return -EFAULT; return -EFAULT;
if (sve.vl != current->thread.sve_vl) if (sve.flags & SVE_SIG_FLAG_SM) {
if (!system_supports_sme())
return -EINVAL;
vl = task_get_sme_vl(current);
} else {
vl = task_get_sve_vl(current);
}
if (sve.vl != vl)
return -EINVAL; return -EINVAL;
if (sve.head.size <= sizeof(*user->sve)) { if (sve.head.size <= sizeof(*user->sve)) {
clear_thread_flag(TIF_SVE); clear_thread_flag(TIF_SVE);
current->thread.svcr &= ~SVCR_SM_MASK;
goto fpsimd_only; goto fpsimd_only;
} }
...@@ -243,6 +260,11 @@ int restore_sve_fpsimd_context(struct user_ctxs *user) ...@@ -243,6 +260,11 @@ int restore_sve_fpsimd_context(struct user_ctxs *user)
/* From now, fpsimd_thread_switch() won't touch thread.sve_state */ /* From now, fpsimd_thread_switch() won't touch thread.sve_state */
sve_alloc(current); sve_alloc(current);
if (!current->thread.sve_state) {
clear_thread_flag(TIF_SVE);
return -ENOMEM;
}
err = __copy_from_user(current->thread.sve_state, err = __copy_from_user(current->thread.sve_state,
(char __user const *)user->sve + (char __user const *)user->sve +
SVE_SIG_REGS_OFFSET, SVE_SIG_REGS_OFFSET,
...@@ -250,7 +272,10 @@ int restore_sve_fpsimd_context(struct user_ctxs *user) ...@@ -250,7 +272,10 @@ int restore_sve_fpsimd_context(struct user_ctxs *user)
if (err) if (err)
return -EFAULT; return -EFAULT;
set_thread_flag(TIF_SVE); if (sve.flags & SVE_SIG_FLAG_SM)
current->thread.svcr |= SVCR_SM_MASK;
else
set_thread_flag(TIF_SVE);
fpsimd_only: fpsimd_only:
/* copy the FP and status/control registers */ /* copy the FP and status/control registers */
...@@ -269,6 +294,98 @@ int restore_sve_fpsimd_context(struct user_ctxs *user) ...@@ -269,6 +294,98 @@ int restore_sve_fpsimd_context(struct user_ctxs *user)
#endif /* ! CONFIG_ARM64_SVE */ #endif /* ! CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
int preserve_za_context(struct za_context __user *ctx)
{
int err = 0;
u16 reserved[ARRAY_SIZE(ctx->__reserved)];
unsigned int vl = task_get_sme_vl(current);
unsigned int vq;
if (thread_za_enabled(&current->thread))
vq = sve_vq_from_vl(vl);
else
vq = 0;
memset(reserved, 0, sizeof(reserved));
__put_user_error(ZA_MAGIC, &ctx->head.magic, err);
__put_user_error(round_up(ZA_SIG_CONTEXT_SIZE(vq), 16),
&ctx->head.size, err);
__put_user_error(vl, &ctx->vl, err);
BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
if (vq) {
/*
* This assumes that the ZA state has already been saved to
* the task struct by calling the function
* fpsimd_signal_preserve_current_state().
*/
err |= __copy_to_user((char __user *)ctx + ZA_SIG_REGS_OFFSET,
current->thread.za_state,
ZA_SIG_REGS_SIZE(vq));
}
return err ? -EFAULT : 0;
}
int restore_za_context(struct user_ctxs *user)
{
int err;
unsigned int vq;
struct za_context za;
if (__copy_from_user(&za, user->za, sizeof(za)))
return -EFAULT;
if (za.vl != task_get_sme_vl(current))
return -EINVAL;
if (za.head.size <= sizeof(*user->za)) {
current->thread.svcr &= ~SVCR_ZA_MASK;
return 0;
}
vq = sve_vq_from_vl(za.vl);
if (za.head.size < ZA_SIG_CONTEXT_SIZE(vq))
return -EINVAL;
/*
* Careful: we are about to __copy_from_user() directly into
* thread.za_state with preemption enabled, so protection is
* needed to prevent a racing context switch from writing stale
* registers back over the new data.
*/
fpsimd_flush_task_state(current);
/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
sme_alloc(current);
if (!current->thread.za_state) {
current->thread.svcr &= ~SVCR_ZA_MASK;
clear_thread_flag(TIF_SME);
return -ENOMEM;
}
err = __copy_from_user(current->thread.za_state,
(char __user const *)user->za +
ZA_SIG_REGS_OFFSET,
ZA_SIG_REGS_SIZE(vq));
if (err)
return -EFAULT;
set_thread_flag(TIF_SME);
current->thread.svcr |= SVCR_ZA_MASK;
return 0;
}
#else /* ! CONFIG_ARM64_SME */
#endif /* ! CONFIG_ARM64_SME */
int __parse_user_sigcontext(struct user_ctxs *user,
struct sigcontext __user const *sc,
void __user const *sigframe_base)
@@ -282,6 +399,7 @@ int __parse_user_sigcontext(struct user_ctxs *user,
user->fpsimd = NULL;
user->sve = NULL;
user->za = NULL;
if (!IS_ALIGNED((unsigned long)base, 16))
goto invalid;
@@ -335,7 +453,7 @@ int __parse_user_sigcontext(struct user_ctxs *user,
break;
case SVE_MAGIC:
if (!system_supports_sve() && !system_supports_sme())
goto invalid;
if (user->sve)
@@ -347,6 +465,19 @@ int __parse_user_sigcontext(struct user_ctxs *user,
user->sve = (struct sve_context __user *)head;
break;
case ZA_MAGIC:
if (!system_supports_sme())
goto invalid;
if (user->za)
goto invalid;
if (size < sizeof(*user->za))
goto invalid;
user->za = (struct za_context __user *)head;
break;
case EXTRA_MAGIC:
if (have_extra_context)
goto invalid;
@@ -465,11 +596,12 @@ int setup_sigframe_layout(struct rt_sigframe_user_layout *user, bool add_all)
if (system_supports_sve()) {
unsigned int vq = 0;
if (add_all || test_thread_flag(TIF_SVE) ||
thread_sm_enabled(&current->thread)) {
int vl = max(sve_max_vl(), sme_max_vl());
if (!add_all)
vl = thread_get_cur_vl(&current->thread);
vq = sve_vq_from_vl(vl);
}
@@ -480,6 +612,24 @@ int setup_sigframe_layout(struct rt_sigframe_user_layout *user, bool add_all)
return err;
}
if (system_supports_sme()) {
unsigned int vl;
unsigned int vq = 0;
if (add_all)
vl = sme_max_vl();
else
vl = task_get_sme_vl(current);
if (thread_za_enabled(&current->thread))
vq = sve_vq_from_vl(vl);
err = sigframe_alloc(user, &user->za_offset,
ZA_SIG_CONTEXT_SIZE(vq));
if (err)
return err;
}
return sigframe_alloc_end(user);
}
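For a sense of scale of the ZA record sized here: ZA is an SVL-by-SVL byte array, so the payload grows quadratically with the streaming vector length. A rough worked example (my own, assuming the usual ZA_SIG_REGS_SIZE(vq) definition of (vq * 16) * (vq * 16) bytes on top of the fixed 16-byte za_context header):

#include <stdio.h>

/* vq counts the vector length in 16-byte (128-bit) granules */
static unsigned int za_payload_bytes(unsigned int vq)
{
	return (vq * 16) * (vq * 16);
}

int main(void)
{
	/* SVL = 256 bits -> vq = 2 -> 1024 bytes; SVL = 2048 bits -> vq = 16 -> 65536 bytes */
	printf("vq=2: %u bytes, vq=16: %u bytes\n",
	       za_payload_bytes(2), za_payload_bytes(16));
	return 0;
}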
@@ -554,6 +704,13 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
/* TCO (Tag Check Override) always cleared for signal handlers */
regs->pstate &= ~PSR_TCO_BIT;
/* Signal handlers are invoked with ZA and streaming mode disabled */
if (system_supports_sme()) {
current->thread.svcr &= ~(SVCR_ZA_MASK |
SVCR_SM_MASK);
sme_smstop();
}
if (ka->sa.sa_flags & SA_RESTORER)
sigtramp = ka->sa.sa_restorer;
else
......
@@ -171,11 +171,36 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
syscall_trace_exit(regs);
}
/*
* As per the ABI exit SME streaming mode and clear the SVE state not
* shared with FPSIMD on syscall entry.
*/
static inline void fp_user_discard(void)
{
/*
* If SME is active then exit streaming mode. If ZA is active
* then flush the SVE registers but leave userspace access to
* both SVE and SME enabled, otherwise disable SME for the
* task and fall through to disabling SVE too. This means
* that after a syscall we never have any streaming mode
* register state to track; if this changes, the KVM code will
* need updating.
*/
if (system_supports_sme() && test_thread_flag(TIF_SME)) {
u64 svcr = read_sysreg_s(SYS_SVCR);
if (svcr & SVCR_SM_MASK)
sme_smstop_sm();
}
if (!system_supports_sve())
return;
/*
* If SME is not active then disable SVE; the registers will
* be cleared when userspace next attempts to access them and
* we do not need to track the SVE register state until then.
*/
clear_thread_flag(TIF_SVE);
/*
@@ -213,7 +238,7 @@ void do_el0_svc(struct pt_regs *regs)
}
#endif
fp_user_discard();
el0_svc_common(regs, regs->regs[8], __NR_syscalls, t);
}
......
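One practical consequence of fp_user_discard() above, shown as a hedged userspace sketch (my own, not from the patch; HWCAP2_SME requires the uapi asm/hwcap.h from this series): since streaming mode is exited on every syscall entry, streaming-mode register contents must be treated as clobbered across any system call.

#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

int main(void)
{
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	if (!(hwcap2 & HWCAP2_SME)) {
		puts("SME not reported in AT_HWCAP2");
		return 0;
	}

	/*
	 * Any code that entered streaming mode (SMSTART SM) must assume the
	 * mode has been exited, and its Z/P register state discarded, by the
	 * time a syscall such as the write() behind this puts() returns.
	 */
	puts("SME present; streaming mode does not survive a syscall");
	return 0;
}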
@@ -736,6 +736,7 @@ static const char *esr_class_str[] = {
[ESR_ELx_EC_SVE] = "SVE",
[ESR_ELx_EC_ERET] = "ERET/ERETAA/ERETAB",
[ESR_ELx_EC_FPAC] = "FPAC",
[ESR_ELx_EC_SME] = "SME",
[ESR_ELx_EC_IMP_DEF] = "EL3 IMP DEF", [ESR_ELx_EC_IMP_DEF] = "EL3 IMP DEF",
[ESR_ELx_EC_IABT_LOW] = "IABT (lower EL)", [ESR_ELx_EC_IABT_LOW] = "IABT (lower EL)",
[ESR_ELx_EC_IABT_CUR] = "IABT (current EL)", [ESR_ELx_EC_IABT_CUR] = "IABT (current EL)",
......
This diff has been collapsed.
@@ -311,7 +311,7 @@ static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
memset(vqs, 0, sizeof(vqs));
max_vq = vcpu_sve_max_vq(vcpu);
for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
if (sve_vq_available(vq))
vqs[vq_word(vq)] |= vq_mask(vq);
@@ -439,7 +439,7 @@ static int sve_reg_to_region(struct sve_state_reg_region *region,
if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0)
return -ENOENT;
vq = vcpu_sve_max_vq(vcpu);
reqoffset = SVE_SIG_ZREG_OFFSET(vq, reg_num) -
SVE_SIG_REGS_OFFSET;
@@ -449,7 +449,7 @@ static int sve_reg_to_region(struct sve_state_reg_region *region,
if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0)
return -ENOENT;
vq = vcpu_sve_max_vq(vcpu);
reqoffset = SVE_SIG_PREG_OFFSET(vq, reg_num) -
SVE_SIG_REGS_OFFSET;
......
The remaining diffs in this pull request have been collapsed and are not shown.