Commits · 17eed27b02da88560b4592390952b9a71042ab8b · openanolis / cloud-kernel

03 Nov, 2017 27 commits

arm64/sve: KVM: Prevent guests from using SVE · 17eed27b

Committed by Dave Martin on Oct 31, 2017

Until KVM has full SVE support, guests must not be allowed to
execute SVE instructions.

This patch enables the necessary traps, and also ensures that the
traps are disabled again on exit from the guest so that the host
can still use SVE if it wants to.

On guest exit, high bits of the SVE Zn registers may have been
clobbered as a side-effect of the execution of FPSIMD instructions in
the guest.  The existing KVM host FPSIMD restore code is not
sufficient to restore these bits, so this patch explicitly marks
the CPU as not containing cached vector state for any task, thus
forcing a reload on the next return to userspace.  This is an
interim measure, in advance of adding full SVE awareness to KVM.

This marking of cached vector state in the CPU as invalid is done
using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c.  Due
to the repeated use of this rather obscure operation, it makes
sense to factor it out as a separate helper with a clearer name.
This patch factors it out as fpsimd_flush_cpu_state(), and ports
all callers to use it.

As a side effect of this refactoring, a this_cpu_write() in
fpsimd_cpu_pm_notifier() is changed to __this_cpu_write().  This
should be fine, since cpu_pm_enter() is supposed to be called only
with interrupts disabled.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

17eed27b
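
To illustrate the refactoring described in 17eed27b, here is a minimal userspace model, not the kernel code. The per-CPU variable fpsimd_last_state is imitated with a plain array indexed by a CPU number; NR_CPUS, fpsimd_bind_to_cpu() and fpsimd_needs_reload() are hypothetical names invented for this sketch, and the explicit cpu argument is an assumption (the real helper wraps __this_cpu_write() and so presumably acts on the current CPU).

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

struct fpsimd_state { int dummy; };

/* Stand-in for the per-CPU fpsimd_last_state variable. */
static struct fpsimd_state *fpsimd_last_state[NR_CPUS];

/* Record that 'cpu' currently holds the vector registers of 'st'. */
static void fpsimd_bind_to_cpu(int cpu, struct fpsimd_state *st)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	fpsimd_last_state[cpu] = st;
}

/* Named helper replacing the open-coded "last state = NULL" writes. */
static void fpsimd_flush_cpu_state(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	fpsimd_last_state[cpu] = NULL;
}

/* A task must reload its registers unless this CPU still caches them. */
static int fpsimd_needs_reload(int cpu, const struct fpsimd_state *st)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return fpsimd_last_state[cpu] != st;
}
```

In this model, calling fpsimd_flush_cpu_state() after a guest exit makes fpsimd_needs_reload() true for every task, which is the "force a reload on the next return to userspace" behaviour the message describes.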

arm64/sve: Add sysctl to set the default vector length for new processes · 4ffa09a9

Committed by Dave Martin on Oct 31, 2017

Because of the effect of SVE on the size of the signal frame, the
default vector length used for new processes involves a tradeoff
between performance of SVE-enabled software on the one hand, and
reliability of non-SVE-aware software on the other hand.

For this reason, the best choice depends on the repertoire of
userspace software in use and is thus best left up to distro
maintainers, sysadmins and developers.

If CONFIG_SYSCTL and CONFIG_PROC_SYSCTL are enabled, this patch
exposes the default vector length in
/proc/sys/abi/sve_default_vector_length, where boot scripts or the
adventurous can poke it.

In common with other arm64 ABI sysctls, this control is currently
global: setting it requires CAP_SYS_ADMIN in the root user
namespace, but the value set is effective for subsequent execs in
all namespaces.  The control only affects _new_ processes, however:
changing it does not affect the vector length of any existing
process.

The intended usage model is that if userspace is known to be fully
SVE-tolerant (or a developer is curious to find out) then this
parameter can be cranked up during system startup.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

4ffa09a9
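
For the sysctl added in 4ffa09a9, a boot-time tool might read the current default roughly as follows. This is a sketch under assumptions: the file is assumed to contain a single decimal integer, and read_sve_default_vl() and SVE_DEFAULT_VL_PATH are names made up for the example. Only the path itself comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/* Path named in the commit message. */
#define SVE_DEFAULT_VL_PATH "/proc/sys/abi/sve_default_vector_length"

/*
 * Return the integer stored at 'path', or -1 if the file cannot be
 * opened or parsed (for example on a kernel without this sysctl).
 */
static int read_sve_default_vl(const char *path)
{
	FILE *f;
	int vl = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &vl) != 1)
		vl = -1;
	fclose(f);
	return vl;
}
```

A caller would pass SVE_DEFAULT_VL_PATH and treat -1 as "control not available". Writing the value needs the privileges described above, so the sketch only reads.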

arm64/sve: Add prctl controls for userspace vector length management · 2d2123bc

Committed by Dave Martin on Oct 31, 2017

This patch adds two arm64-specific prctls, to permit userspace to
control its vector length:

 * PR_SVE_SET_VL: set the thread's SVE vector length and vector
   length inheritance mode.

 * PR_SVE_GET_VL: get the same information.

Although these prctls resemble instruction set features in the SVE
architecture, they provide additional control: the vector length
inheritance mode is Linux-specific and nothing to do with the
architecture, and the architecture does not permit EL0 to set its
own vector length directly.  Both can be used in portable tools
without requiring the use of SVE instructions.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: Fixed up prctl constants to avoid clash with PDEATHSIG]
Signed-off-by: Will Deacon <will.deacon@arm.com>

2d2123bc

arm64/sve: ptrace and ELF coredump support · 43d4da2c

Committed by Dave Martin on Oct 31, 2017

This patch defines and implements a new regset NT_ARM_SVE, which
describes a thread's SVE register state.  This allows a debugger to
manipulate the SVE state, as well as being included in ELF
coredumps for post-mortem debugging.

Because the regset size and layout are dependent on the thread's
current vector length, it is not possible to define a C struct to
describe the regset contents as is done for existing regsets.
Instead, and for the same reasons, NT_ARM_SVE is based on the
freeform variable-layout approach used for the SVE signal frame.

Additionally, to reduce debug overhead when debugging threads that
might or might not have live SVE register state, NT_ARM_SVE may be
presented in one of two different formats: the old struct
user_fpsimd_state format is embedded for describing the state of a
thread with no live SVE state, whereas a new variable-layout
structure is embedded for describing live SVE state.  This avoids a
debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
allows existing userspace code to handle the non-SVE case without
too much modification.

For this to work, NT_ARM_SVE is defined with a fixed-format header
of type struct user_sve_header, which the recipient can use to
figure out the content, size and layout of the rest of the regset.
Accessor macros are defined to allow the vector-length-dependent
parts of the regset to be manipulated.
Signed-off-by: Alan Hayward <alan.hayward@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Okamoto Takayuki <tokamoto@jp.fujitsu.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

43d4da2c

arm64/sve: Preserve SVE registers around EFI runtime service calls · fdfa976c

Committed by Dave Martin on Oct 31, 2017

The EFI runtime services ABI allows EFI to make free use of the
FPSIMD registers during EFI runtime service calls, subject to the
callee-save requirements of the AArch64 procedure call standard.

However, the SVE architecture allows upper bits of the SVE vector
registers to be zeroed as a side-effect of FPSIMD V-register
writes.  This means that the SVE vector registers must be saved in
their entirety in order to avoid data loss: non-SVE-aware EFI
implementations cannot restore them correctly.

The non-IRQ case is already handled gracefully by
kernel_neon_begin().  For the IRQ case, this patch allocates a
suitable per-CPU stash buffer for the full SVE register state and
uses it to preserve the affected registers around EFI calls.  It is
currently unclear how the EFI runtime services ABI will be
clarified with respect to SVE, so it is safest to assume that the
predicate registers and FFR must be saved and restored too.

No attempt is made to restore the vector length after
a call, for now.  It is deemed rather insane for EFI to change it,
and contemporary EFI implementations certainly won't.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

fdfa976c

arm64/sve: Preserve SVE registers around kernel-mode NEON use · 1bd3f936

Committed by Dave Martin on Oct 31, 2017

Kernel-mode NEON will corrupt the SVE vector registers, due to the
way they alias the FPSIMD vector registers in the hardware.

This patch ensures that any live SVE register content for the task
is saved by kernel_neon_begin().  The data will be restored in the
usual way on return to userspace.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

1bd3f936

arm64/sve: Probe SVE capabilities and usable vector lengths · 2e0f2478

Committed by Dave Martin on Oct 31, 2017

This patch uses the cpufeatures framework to determine common SVE
capabilities and vector lengths, and configures the runtime SVE
support code appropriately.

ZCR_ELx is not really a feature register, but it is convenient to
use it as a template for recording the maximum vector length
supported by a CPU, using the LEN field.  This field is similar to
a feature field in that it is a contiguous bitfield for which we
want to determine the minimum system-wide value.  This patch adds
ZCR as a pseudo-register in cpuinfo/cpufeatures, with appropriate
custom code to populate it.  Finding the minimum supported value of
the LEN field is left to the cpufeatures framework in the usual
way.

The meaning of ID_AA64ZFR0_EL1 is not architecturally defined yet,
so for now we just require it to be zero.

Note that much of this code is dormant and SVE still won't be used
yet, since system_supports_sve() remains hardwired to false.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

2e0f2478
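
The "minimum system-wide value" of the LEN field mentioned in 2e0f2478 can be pictured with a small model. It assumes LEN is a 4-bit field in the low bits of ZCR_ELx and that a LEN value of n selects a vector length of (n + 1) * 128 bits; ZCR_LEN_MASK as defined here, zcr_len_to_vl() and system_min_len() are hypothetical names, and the real work is done by the cpufeatures framework rather than by a loop like this.

```c
#include <assert.h>
#include <stddef.h>

#define ZCR_LEN_MASK 0xfu	/* assumption: LEN occupies bits [3:0] */

/* Vector length in bytes selected by a given ZCR value. */
static unsigned int zcr_len_to_vl(unsigned int zcr)
{
	return ((zcr & ZCR_LEN_MASK) + 1) * 16;
}

/* Smallest LEN value reported by any CPU: the system-wide safe maximum. */
static unsigned int system_min_len(const unsigned int *zcr, size_t ncpus)
{
	unsigned int min = ZCR_LEN_MASK;
	size_t i;

	assert(zcr != NULL && ncpus > 0);
	for (i = 0; i < ncpus; i++) {
		unsigned int len = zcr[i] & ZCR_LEN_MASK;

		if (len < min)
			min = len;
	}
	return min;
}
```

Taking the minimum means a task can migrate to any CPU without its vector length becoming unsupported.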

arm64: cpufeature: Move sys_caps_initialised declarations · 8f1eec57

Committed by Dave Martin on Oct 31, 2017

update_cpu_features() currently cannot tell whether it is being
called during early or late secondary boot.  This doesn't
desperately matter for anything it currently does.

However, SVE will need to know here whether the set of available
vector lengths is known or still to be determined when booting a
CPU, so that it can be updated appropriately.

This patch simply moves the sys_caps_initialised stuff to the top
of the file so that it can be used more widely.  There doesn't seem
to be a more obvious place to put it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

8f1eec57

arm64/sve: Backend logic for setting the vector length · 7582e220

Committed by Dave Martin on Oct 31, 2017

This patch implements the core logic for changing a task's vector
length on request from userspace.  This will be used by the ptrace
and prctl frontends that are implemented in later patches.

The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two.  To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

7582e220
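
The sparse-bitmap check described in 7582e220 might look roughly like the sketch below. The selection policy is an assumption: it picks the largest supported length not exceeding the request and falls back to the smallest supported length if none qualifies. struct vq_map, vq_map_set(), vq_map_test(), find_supported_vl() and the bound SVE_VQ_MAX are illustrative names and values, with lengths tracked in 16-byte quadwords.

```c
#include <assert.h>
#include <limits.h>

#define SVE_VQ_BYTES	16	/* vector lengths are multiples of 128 bits */
#define SVE_VQ_MAX	512	/* hypothetical upper bound for this sketch */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define VQ_WORDS	((SVE_VQ_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Bit (vq - 1) is set when a vector length of vq quadwords is supported. */
struct vq_map { unsigned long bits[VQ_WORDS]; };

static void vq_map_set(struct vq_map *m, unsigned int vq)
{
	assert(vq >= 1 && vq <= SVE_VQ_MAX);
	m->bits[(vq - 1) / BITS_PER_WORD] |= 1UL << ((vq - 1) % BITS_PER_WORD);
}

static int vq_map_test(const struct vq_map *m, unsigned int vq)
{
	return (m->bits[(vq - 1) / BITS_PER_WORD] >> ((vq - 1) % BITS_PER_WORD)) & 1;
}

/*
 * Return the chosen vector length in bytes for a request of 'vl' bytes,
 * or 0 if the map is empty.
 */
static unsigned int find_supported_vl(const struct vq_map *m, unsigned int vl)
{
	unsigned int vq = vl / SVE_VQ_BYTES;
	unsigned int i;

	if (vq > SVE_VQ_MAX)
		vq = SVE_VQ_MAX;
	/* Prefer the largest supported length that does not exceed the request. */
	for (i = vq; i >= 1; i--)
		if (vq_map_test(m, i))
			return i * SVE_VQ_BYTES;
	/* Otherwise fall back to the smallest supported length. */
	for (i = vq + 1; i <= SVE_VQ_MAX; i++)
		if (vq_map_test(m, i))
			return i * SVE_VQ_BYTES;
	return 0;
}
```

With lengths of 16, 32 and 64 bytes marked as supported, a request for 48 bytes resolves to 32, which shows why a simple power-of-two rounding would not be enough for a sparse set.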

arm64/sve: Signal handling support · 8cd969d2

Committed by Dave Martin on Oct 31, 2017

This patch implements support for saving and restoring the SVE
registers around signals.

A fixed-size header struct sve_context is always included in the
signal frame encoding the thread's vector length at the time of
signal delivery, optionally followed by a variable-layout structure
encoding the SVE registers.

Because of the need to preserve backwards compatibility, the FPSIMD
view of the SVE registers is always dumped as a struct
fpsimd_context in the usual way, in addition to any sve_context.

The SVE vector registers are dumped in full, including bits 127:0
of each register which alias the corresponding FPSIMD vector
registers in the hardware.  To avoid any ambiguity about which
alias to restore during sigreturn, the kernel always restores bits
127:0 of each SVE vector register from the fpsimd_context in the
signal frame (which must be present): userspace needs to take this
into account if it wants to modify the SVE vector register contents
on return from a signal.

FPSR and FPCR, which are used by both FPSIMD and SVE, are not
included in sve_context because they are always present in
fpsimd_context anyway.

For signal delivery, a new helper
fpsimd_signal_preserve_current_state() is added to update _both_
the FPSIMD and SVE views in the task struct, to make it easier to
populate this information into the signal frame.  Because of the
redundancy between the two views of the state, only one is updated
otherwise.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

8cd969d2

arm64/sve: Support vector length resetting for new processes · 79ab047c

Committed by Dave Martin on Oct 31, 2017

It's desirable to be able to reset the vector length to some sane
default for new processes, since the new binary and its libraries
may or may not be SVE-aware.

This patch tracks the desired post-exec vector length (if any) in a
new thread member sve_vl_onexec, and adds a new thread flag
TIF_SVE_VL_INHERIT to control whether to inherit or reset the
vector length.  Currently these are inactive.  Subsequent patches
will provide the capability to configure them.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

79ab047c

arm64/sve: Core task context handling · bc0ee476

Committed by Dave Martin on Oct 31, 2017

This patch adds the core support for switching and managing the SVE
architectural state of user tasks.

Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task.  To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.

The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE.  Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.

When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary.  Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate and free this buffer at appropriate
times.

TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected.  This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly.  If such a trap is taken without sufficient system-
wide hardware support, SIGILL is sent to the thread instead as if
an undefined instruction had been executed: this may happen if
userspace tries to use SVE in a system where not all CPUs support
it for example.

The kernel will clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace.  For backwards
compatibility reasons and conformance with the spirit of the base
AArch64 procedure call standard, the subset of the SVE register
state that aliases the FPSIMD registers is still preserved across a
syscall even if this happens.  The remainder of the SVE register
state logically becomes zero at syscall entry, though the actual
zeroing work is currently deferred until the thread next tries to
use SVE, causing another trap to the kernel.  This implementation
is suboptimal: in the future, the fastpath case may be optimised
to zero the registers in-place and leave SVE enabled for the task,
where beneficial.

TIF_SVE is also cleared in the following slowpath cases, which are
taken as reasonable hints that the task may no longer use SVE:
 * exec
 * fork and clone

Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: added #include to fix allnoconfig build]
[will: use enable_daif in do_sve_acc]
Signed-off-by: Will Deacon <will.deacon@arm.com>

bc0ee476
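
The syncing between thread.fpsimd_state and thread.sve_state described in bc0ee476 can be modelled in userspace as below. This is only a sketch: the buffer layout (register Zn stored at byte offset n * vl, with bits 127:0 in the first 16 bytes) is an assumption made for the example, and struct fpsimd_view, fpsimd_to_sve() and sve_to_fpsimd() are simplified stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_VREGS	32	/* V0..V31 alias the low bits of Z0..Z31 */
#define VREG_BYTES	16	/* an FPSIMD V register is 128 bits */

/* Simplified stand-in for the FPSIMD view of the vector registers. */
struct fpsimd_view { uint8_t vregs[NUM_VREGS][VREG_BYTES]; };

/*
 * Enabling SVE: build the Z registers from the FPSIMD view.  The low
 * 128 bits are copied and the remaining bits become zero.
 */
static void fpsimd_to_sve(uint8_t *sve_z, unsigned int vl,
			  const struct fpsimd_view *fp)
{
	unsigned int i;

	assert(vl >= VREG_BYTES && vl % VREG_BYTES == 0);
	memset(sve_z, 0, (size_t)NUM_VREGS * vl);
	for (i = 0; i < NUM_VREGS; i++)
		memcpy(sve_z + (size_t)i * vl, fp->vregs[i], VREG_BYTES);
}

/* Disabling SVE: keep only the low 128 bits of each Z register. */
static void sve_to_fpsimd(struct fpsimd_view *fp, const uint8_t *sve_z,
			  unsigned int vl)
{
	unsigned int i;

	assert(vl >= VREG_BYTES && vl % VREG_BYTES == 0);
	for (i = 0; i < NUM_VREGS; i++)
		memcpy(fp->vregs[i], sve_z + (size_t)i * vl, VREG_BYTES);
}
```

A round trip through sve_to_fpsimd() and fpsimd_to_sve() preserves the FPSIMD-aliased bits and zeroes the rest, which mirrors the syscall behaviour described above.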

arm64/sve: Low-level CPU setup · 22043a3c

Committed by Dave Martin on Oct 31, 2017

To enable the kernel to use SVE, SVE traps from EL1 to EL2 must be
disabled.  To take maximum advantage of the hardware, the full
available vector length also needs to be enabled for EL1 by
programming ZCR_EL2.LEN.  (The kernel will program ZCR_EL1.LEN as
required, but this cannot override the limit set by ZCR_EL2.)

This patch makes the appropriate changes to the EL2 early setup
code.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

22043a3c

arm64/sve: Signal frame and context structure definition · d0b8cd31

Committed by Dave Martin on Oct 31, 2017

This patch defines the representation that will be used for the SVE
register state in the signal frame, and implements support for
saving and restoring the SVE registers around signals.

The same layout will also be used for the in-kernel task state.

Due to the variability of the SVE vector length, it is not possible
to define a fixed C struct to describe all the registers.  Instead,
macros are defined in sigcontext.h to facilitate access to the
parts of the structure.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

d0b8cd31
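
To show why d0b8cd31 uses macros instead of a fixed struct, the following sketch computes register offsets as a function of the vector length. The macro names are invented for the example and are not the ones in sigcontext.h. The sketch assumes the registers are laid out as Z0..Z31, then P0..P15, then FFR, with no header and no padding, and that each predicate register holds one bit per vector byte.

```c
#include <assert.h>
#include <stddef.h>

#define VQ_BYTES	16	/* vector lengths are multiples of 128 bits */
#define NUM_ZREGS	32	/* Z0..Z31, each vl bytes */
#define NUM_PREGS	16	/* P0..P15, each vl / 8 bytes */

/* Hypothetical accessor macros; 'vl' is the vector length in bytes. */
#define ZREG_SIZE(vl)		((size_t)(vl))
#define PREG_SIZE(vl)		((size_t)(vl) / 8)
#define ZREG_OFFSET(vl, n)	(ZREG_SIZE(vl) * (n))
#define PREGS_OFFSET(vl)	(ZREG_SIZE(vl) * NUM_ZREGS)
#define PREG_OFFSET(vl, n)	(PREGS_OFFSET(vl) + PREG_SIZE(vl) * (n))
#define FFR_OFFSET(vl)		PREG_OFFSET(vl, NUM_PREGS)
#define REGS_SIZE(vl)		(FFR_OFFSET(vl) + PREG_SIZE(vl))

/* A vector length is plausible if it is a nonzero multiple of 16 bytes. */
static int vl_valid(unsigned int vl)
{
	return vl >= VQ_BYTES && vl % VQ_BYTES == 0;
}
```

Under these assumptions the register payload is 546 bytes at the minimum 16-byte vector length and 8736 bytes at 256 bytes, so no single C struct can describe every case.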

arm64/sve: Kconfig update and conditional compilation support · ddd25ad1

Committed by Dave Martin on Oct 31, 2017

This patch adds CONFIG_ARM64_SVE to control building of SVE support
into the kernel, and adds a stub predicate system_supports_sve() to
control conditional compilation and runtime SVE support.

system_supports_sve() just returns false for now: it will be
replaced with a non-trivial implementation in a later patch, once
SVE support is complete enough to be enabled safely.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

ddd25ad1

arm64/sve: Low-level SVE architectural state manipulation functions · 1fc5dce7

Committed by Dave Martin on Oct 31, 2017

Manipulating the SVE architectural state, including the vector and
predicate registers, first-fault register and the vector length,
requires the use of dedicated instructions added by SVE.

This patch adds suitable assembly functions for saving and
restoring the SVE registers and querying the vector length.
Setting of the vector length is done as part of register restore.

Since people building kernels may not all get an SVE-enabled
toolchain for a while, this patch uses macros that generate
explicit opcodes in place of assembler mnemonics.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

1fc5dce7

arm64/sve: System register and exception syndrome definitions · 67236564

Committed by Dave Martin on Oct 31, 2017

The SVE architecture adds some system registers, ID register fields
and a dedicated ESR exception class.

This patch adds the appropriate definitions that will be needed by
the kernel.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

67236564

arm64: fpsimd: Simplify uses of {set,clear}_ti_thread_flag() · 9cf5b54f

Committed by Dave Martin on Oct 31, 2017

The existing FPSIMD context switch code contains a couple of
instances of {set,clear}_ti_thread_flag(task_thread_info(task), ...).  Since
there are thread flag manipulators that operate directly on
task_struct, this verbosity isn't strictly needed.

For consistency, this patch simplifies the affected calls.  This
should have no impact on behaviour.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

9cf5b54f

arm64: Port deprecated instruction emulation to new sysctl interface · 38b9aeb3

Committed by Dave Martin on Oct 31, 2017

Currently, armv8_deprecated.c takes charge of the "abi" sysctl
directory, which makes life difficult for other code that wants to
register sysctls in the same directory.

There is a "new" [1] sysctl registration interface that removes the
need to define ctl_tables for parent directories explicitly, which
is ideal here.

This patch ports register_insn_emulation_sysctl() over to the
register_sysctl() interface and removes the redundant ctl_table for
"abi".
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

[1] fea478d4 (sysctl: Add register_sysctl for normal sysctl
users)
The commit message notes an intent to port users of the
pre-existing interfaces over to register_sysctl(), though the
number of users of the new interface currently appears negligible.
Signed-off-by: Will Deacon <will.deacon@arm.com>

38b9aeb3

arm64: efi: Add missing Kconfig dependency on KERNEL_MODE_NEON · b472db6c

Committed by Dave Martin on Oct 31, 2017

The EFI runtime services ABI permits calls to EFI to clobber
certain FPSIMD/NEON registers, as per the AArch64 procedure call
standard.

Saving/restoring the clobbered registers around such calls needs
KERNEL_MODE_NEON, but the dependency is missing from Kconfig.

This patch adds the missing dependency.

This will aid bisection of the patches implementing support for the
ARM Scalable Vector Extension (SVE).
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

b472db6c

arm64: KVM: Hide unsupported AArch64 CPU features from guests · 93390c0a

Committed by Dave Martin on Oct 31, 2017

Currently, a guest kernel sees the true CPU feature registers
(ID_*_EL1) when it reads them using MRS instructions.  This means
that the guest may observe features that are present in the
hardware but the host doesn't understand or doesn't provide support
for.  A guest may legitimately try to use such a feature as per the
architecture, but use of the feature may trap instead of working
normally, triggering undef injection into the guest.

This is not a problem for the host, but the guest may go wrong when
running on newer hardware than the host knows about.

This patch hides from guest VMs any AArch64-specific CPU features
that the host doesn't support, by exposing to the guest the
sanitised versions of the registers computed by the cpufeatures
framework, instead of the true hardware registers.  To achieve
this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
code is added to KVM to report the sanitised versions of the
affected registers in response to MRS and register reads from
userspace.

The affected registers are removed from invariant_sys_regs[] (since
the invariant_sys_regs handling is no longer quite correct for
them) and added to sys_reg_descs[], with appropriate access(),
get_user() and set_user() methods.  No runtime vcpu storage is
allocated for the registers: instead, they are read on demand from
the cpufeatures framework.  This may need modification in the
future if there is a need for userspace to customise the features
visible to the guest.

Attempts by userspace to write the registers are handled similarly
to the current invariant_sys_regs handling: writes are permitted,
but only if they don't attempt to change the value.  This is
sufficient to support VM snapshot/restore from userspace.

Because of the additional registers, restoring a VM on an older
kernel may not work unless userspace knows how to handle the extra
VM registers exposed to the KVM user ABI by this patch.

Under the principle of least damage, this patch makes no attempt to
handle any of the other registers currently in
invariant_sys_regs[], or to emulate registers for AArch32: however,
these could be handled in a similar way in future, as necessary.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

93390c0a

arm64: signal: Verify extra data is user-readable in sys_rt_sigreturn · abf73988

Committed by Dave Martin on Oct 31, 2017

Currently sys_rt_sigreturn() verifies that the base sigframe is
readable, but no similar check is performed on the extra data to
which an extra_context record points.

This matters because the extra data will be read with the
unprotected user accessors.  However, this is not a problem at
present because the extra data base address is required to be
exactly at the end of the base sigframe.  So, there would need to
be a non-user-readable kernel address within about 59K
(SIGFRAME_MAXSZ - sizeof(struct rt_sigframe)) of some address for
which access_ok(VERIFY_READ) returns true, in order for sigreturn
to be able to read kernel memory that should be inaccessible to the
user task.  This is currently impossible due to the untranslatable
address hole between the TTBR0 and TTBR1 address ranges.

Disappearance of the hole between the TTBR0 and TTBR1 mapping
ranges would require the VA size for TTBR0 and TTBR1 to grow to at
least 55 bits, and either the disabling of tagged pointers for
userspace or enabling of tagged pointers for kernel space; none of
which is currently envisaged.

Even so, it is wrong to use the unprotected user accessors without
an accompanying access_ok() check.

To avoid the potential for future surprises, this patch does an
explicit access_ok() check on the extra data space when parsing an
extra_context record.

Fixes: 33f08261 ("arm64: signal: Allow expansion of the signal frame")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

abf73988

arm64: fpsimd: Correctly annotate exception helpers called from asm · 94ef7ecb

Committed by Dave Martin on Oct 31, 2017

A couple of FPSIMD exception handling functions that are called
from entry.S are currently not annotated as such.

This is not a big deal since asmlinkage does nothing on arm/arm64,
but fixing the annotations is more consistent and may help avoid
future surprises.

This patch adds appropriate asmlinkage annotations for
do_fpsimd_acc() and do_fpsimd_exc().
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

94ef7ecb

regset: Add support for dynamically sized regsets · 27e64b4b

Committed by Dave Martin on Oct 31, 2017

Currently the regset API doesn't allow for the possibility that
regsets (or at least, the amount of meaningful data in a regset)
may change in size.

In particular, this results in useless padding being added to
coredumps if a regset's current size is smaller than its
theoretical maximum size.

This patch adds a get_size() function to struct user_regset.
Individual regset implementations can implement this function to
return the current size of the regset data.  A regset_size()
function is added to provide callers with an abstract interface for
determining the size of a regset without needing to know whether
the regset is dynamically sized or not.

The only affected user of this interface is the ELF coredump code:
This patch ports ELF coredump to dump regsets with their actual
size in the coredump.  This has no effect except for new regsets
that are dynamically sized and provide a get_size() implementation.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

27e64b4b
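
The optional get_size() hook from 27e64b4b and its regset_size() wrapper can be sketched as below. The struct is cut down to the fields the example needs, struct task is an opaque stand-in for the kernel's task type, and half_size() is a made-up dynamic regset. Only the names get_size() and regset_size() come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct task;	/* opaque stand-in for the kernel's task type */

/* Reduced model of struct user_regset: only what the sketch needs. */
struct user_regset {
	unsigned int n;		/* number of slots */
	unsigned int size;	/* size of each slot in bytes */
	/* Optional: current size of the regset data, if it can vary. */
	unsigned int (*get_size)(const struct task *t,
				 const struct user_regset *rs);
};

/* Callers use this and need not know whether the regset is dynamic. */
static unsigned int regset_size(const struct task *t,
				const struct user_regset *rs)
{
	assert(rs != NULL);
	if (!rs->get_size)
		return rs->n * rs->size;	/* fixed-size regset */
	return rs->get_size(t, rs);
}

/* Hypothetical dynamic regset that currently uses half of its maximum. */
static unsigned int half_size(const struct task *t,
			      const struct user_regset *rs)
{
	(void)t;
	return rs->n * rs->size / 2;
}
```

A regset that leaves get_size as NULL keeps its old behaviour, which matches the statement that only new dynamically sized regsets are affected.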

arm-ccn: perf: Prevent module unload while PMU is in use · c7f5828b

Committed by Suzuki K Poulose on Nov 03, 2017

When the PMU driver is built as a module, perf expects
pmu->module to be valid, so that the driver is prevented from
being unloaded while it is in use. Fix the CCN PMU driver to
fill in this field.

Fixes: a33b0daa ("bus: ARM CCN PMU driver")
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

c7f5828b

perf: arm_spe: Prevent module unload while the PMU is in use · 19b4aff2

Committed by Suzuki K Poulose on Nov 03, 2017

When the PMU driver is built as a module, perf expects
pmu->module to be valid, so that the driver is prevented from
being unloaded while it is in use. Fix the SPE PMU driver to
fill in this field.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

19b4aff2

arm64: Fix static use of function graph · d125bffc

Committed by Julien Thierry on Nov 03, 2017

Function graph does not currently work when CONFIG_DYNAMIC_FTRACE is not
set. This is because ftrace_trace_function is not always set to ftrace_stub
when function_graph is in use.

Do not skip checking of graph tracer functions when ftrace_trace_function
is set.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

d125bffc

02 Nov, 2017 11 commits

arm64: entry.S: move SError handling into a C function for future expansion · a92d4d14

Committed by Xie XiuQi on Nov 02, 2017

Today SError is taken using the inv_entry macro that ends up in
bad_mode.

SError can be used by the RAS Extensions to notify either the OS or
firmware of CPU problems, some of which may have been corrected.

To allow this handling to be added, add a do_serror() C function
that just panic()s. Add the entry.S boilerplate to save/restore the
CPU registers and unmask debug exceptions. Future patches may change
do_serror() to return if the SError Interrupt was notification of a
corrected error.
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NWang Xiongfeng <wangxiongfengi2@huawei.com>
[Split out of a bigger patch, added compat path, renamed, enabled debug
 exceptions]
Signed-off-by: NJames Morse <james.morse@arm.com>
Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

a92d4d14

arm64: entry.S: convert elX_irq · b282e1ce

Committed by James Morse on Nov 02, 2017

Following our 'dai' order, irqs should be processed with debug and
serror exceptions unmasked.

Add a helper to unmask these two (and fiq for good measure).

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

b282e1ce

arm64: entry.S convert el0_sync · 746647c7

Committed by James Morse on Nov 02, 2017

el0_sync also unmasks exceptions on a case-by-case basis: debug exceptions
are enabled unless this was a debug exception, and irqs are unmasked for
some exception types but not for others.

el0_dbg should run with everything masked to prevent us taking a debug
exception from do_debug_exception. For the other cases we can unmask
everything. This changes the behaviour of fpsimd_{acc,exc} and el0_inv
which previously ran with irqs masked.

This patch removes the last user of enable_dbg_and_irq, so remove that too.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

746647c7

arm64: entry.S: convert el1_sync · b55a5a1b

Committed by James Morse on Nov 02, 2017

el1_sync unmasks exceptions on a case-by-case basis: debug exceptions
are unmasked unless this was a debug exception, and IRQs are unmasked
for instruction and data aborts only if the interrupted context had
irqs unmasked.

Following our 'dai' order, el1_dbg should run with everything masked.
For the other cases we can inherit whatever we interrupted.

Add a macro inherit_daif to set daif based on the interrupted pstate.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

b55a5a1b

arm64: entry.S: Remove disable_dbg · 84d0fb1b

Committed by James Morse on Nov 02, 2017

enable_step_tsk is the only user of disable_dbg, which doesn't respect
our 'dai' order for exception masking. enable_step_tsk may enable
single-step, so previously needed to mask debug exceptions to prevent us
from single-stepping kernel_exit. enable_step_tsk is called at the end
of the ret_to_user loop, which has already masked all exceptions so this
is no longer needed.

Remove disable_dbg, add a comment that enable_step_tsk's caller should
have masked debug.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

84d0fb1b

arm64: Mask all exceptions during kernel_exit · 8d66772e

Committed by James Morse on Nov 02, 2017

To take RAS Exceptions as quickly as possible we need to keep SError
unmasked as much as possible. We need to mask it during kernel_exit
as taking an error from this code will overwrite the exception-registers.

Adding a naked 'disable_daif' to kernel_exit causes a performance problem
for micro-benchmarks that do no real work (e.g. calling getpid() in a
loop). This is because the ret_to_user loop has already masked IRQs so
that the TIF_WORK_MASK thread flags can't change underneath it, so adding
disable_daif is an additional self-synchronising operation.

In the future, the RAS APEI code may need to modify the TIF_WORK_MASK
flags from an SError, in which case the ret_to_user loop must mask SError
while it examines the flags.

Disable all exceptions for return to EL1. For return to EL0, get the
ret_to_user loop to leave all exceptions masked once it has done its
work; this avoids an extra pstate-write.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

8d66772e

arm64: Move the async/fiq helpers to explicitly set process context flags · 41bd5b5d

Committed by James Morse on Nov 02, 2017

Remove the local_{async,fiq}_{en,dis}able macros as they don't respect
our newly defined order and are only used to set the flags for process
context when we bring CPUs online.

Add a helper to do this. The IRQ flag varies as we want it masked on
the boot CPU until we are ready to handle interrupts.
The boot CPU unmasks SError during early boot once it can print an error
message. If we can print an error message about SError, we can do the
same for FIQ. Debug exceptions are already enabled by __cpu_setup(),
which has also configured MDSCR_EL1 to disable MDE and KDE.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

41bd5b5d

arm64: introduce an order for exceptions · 65be7a1b

Committed by James Morse on Nov 02, 2017

Currently SError is always masked in the kernel. To support RAS exceptions
using SError on hardware with the v8.2 RAS Extensions we need to unmask
SError as much as possible.

Let's define an order for masking and unmasking exceptions. 'dai' is
memorable and effectively what we have today.

Disabling debug exceptions should cause all other exceptions to be masked.
Masking SError should mask irq, but not disable debug exceptions.
Masking irqs has no side effects for other flags. Keeping to this order
makes it easier for entry.S to know which exceptions should be unmasked.

FIQ is never expected, but we mask it when we mask debug exceptions, and
unmask it at all other times.

Given masking debug exceptions masks everything, we don't need macros
to save/restore that bit independently. Remove them and switch the last
caller over to use the daif calls.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

65be7a1b

arm64: explicitly mask all exceptions · 0fbeb318

Committed by James Morse on Nov 02, 2017

There are a few places where we want to mask all exceptions. Today we
do this in a piecemeal fashion: typically we expect the caller to
have masked irqs and the arch code masks debug exceptions, ignoring
serror which is probably masked.

Make it clear that 'mask all exceptions' is the intention by adding
helpers to do exactly that.

This will let us unmask SError without having to add 'oh and SError'
to these paths.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

0fbeb318

arm64: suspend: remove useless included file · c10f0d06

Committed by Yisheng Xie on Nov 01, 2017

After commit 9e8e865b ("arm64: unify idmap removal"), we no longer need to
flush the TLB in suspend.c, so the included file tlbflush.h can be removed.

Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

c10f0d06

arm64: Don't walk page table for user faults in do_mem_abort · 80b6eb04

Committed by Will Deacon on Oct 31, 2017

Commit 42dbf54e ("arm64: consistently log ESR and page table")
dumps page table entries for user faults hitting do_bad entries in the
fault handler table. Whilst this shouldn't really happen in practice,
it's not beyond the realms of possibility if e.g. running an old kernel
on a new CPU.

Generally, we want to avoid exposing physical addresses under the control
of userspace (see commit bf396c09 ("arm64: mm: don't print out page
table entries on EL0 faults")), so walk the page tables only on exceptions
from EL1.

Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

80b6eb04

31 Oct, 2017 1 commit

arm64: vdso: fix clock_getres for 4GiB-aligned res · c80ed088

Committed by Mark Rutland on Oct 30, 2017

The vdso tries to check for a NULL res pointer in __kernel_clock_getres,
but only checks the lower 32 bits as it uses CBZ on the W register the
res pointer is held in.

Thus, if the res pointer happened to be aligned to a 4GiB boundary, we'd
spuriously skip storing the timespec to it, while returning a zero error code
to the caller.

Prevent this by checking the whole pointer, using CBZ on the X register
the res pointer is held in.

Fixes: 9031fefd ("arm64: VDSO support")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Andrew Pinski <apinski@cavium.com>
Reported-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

c80ed088

30 Oct, 2017 1 commit

arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27 · fd9dde6a

Committed by Nick Desaulniers on Oct 27, 2017

Upon upgrading to binutils 2.27, we found that our lz4 and gzip
compressed kernel images were significantly larger, resulting in 10ms
boot time regressions.

As noted by Rahul:
"aarch64 binaries uses RELA relocations, where each relocation entry
includes an addend value. This is similar to x86_64.  On x86_64, the
addend values are also stored at the relocation offset for relative
relocations. This is an optimization: in the case where code does not
need to be relocated, the loader can simply skip processing relative
relocations.  In binutils-2.25, both bfd and gold linkers did this for
x86_64, but only the gold linker did this for aarch64.  The kernel build
here is using the bfd linker, which stored zeroes at the relocation
offsets for relative relocations.  Since a set of zeroes compresses
better than a set of non-zero addend values, this behavior was resulting
in much better lz4 compression.

The bfd linker in binutils-2.27 is now storing the actual addend values
at the relocation offsets. The behavior is now consistent with what it
does for x86_64 and what gold linker does for both architectures.  The
change happened in this upstream commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
Since a bunch of zeroes got replaced by non-zero addend values, we see
the side effect of lz4 compressed image being a bit bigger.

To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
flag can be used:
$ LDFLAGS="--no-apply-dynamic-relocs" make
With this flag, the compressed image size is back to what it was with
binutils-2.25.

If the kernel is using ASLR, there aren't additional runtime costs to
--no-apply-dynamic-relocs, as the relocations will need to be applied
again anyway after the kernel is relocated to a random address.

If the kernel is not using ASLR, then presumably the current default
behavior of the linker is better. Since the static linker performed the
dynamic relocs, and the kernel is not moved to a different address at
load time, it can skip applying the relocations all over again."

Some measurements:

$ ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
                    ^
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb

$ ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
                    ^
pre patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb

post patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb

By Siqi's measurement w/ gzip:
binutils 2.27 with this patch (with --no-apply-dynamic-relocs):
Image 41535488
Image.gz 13404067

binutils 2.27 without this patch (without --no-apply-dynamic-relocs):
Image 41535488
Image.gz 14125516

Any compression scheme should be able to get better results from the
longer runs of zeros, not just GZIP and LZ4.

10ms boot time savings isn't anything to get excited about, but users of
arm64+compression+bfd-2.27 should not have to pay a penalty for no
runtime improvement.

Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
Reported-by: Sindhuri Pentyala <spentyala@google.com>
Reported-by: Wei Wang <wvw@google.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
Suggested-by: Siqi Lin <siqilin@google.com>
Suggested-by: Stephen Hines <srhines@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: added comment to Makefile]
Signed-off-by: Will Deacon <will.deacon@arm.com>

fd9dde6a
