1. 01 December 2021, 2 commits
  2. 24 November 2021, 1 commit
    • M
      arm64: uaccess: avoid blocking within critical sections · 94902d84
      Committed by Mark Rutland
      As Vincent reports in:
      
        https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
      
      The put_user() in schedule_tail() can get stuck in a livelock, similar
      to a problem recently fixed on riscv in commit:
      
        285a76bb ("riscv: evaluate put_user() arg before enabling user access")
      
      In __raw_put_user() we have a critical section between
      uaccess_ttbr0_enable() and uaccess_ttbr0_disable() where we cannot
      safely call into the scheduler without having taken an exception, as
      schedule() and other scheduling functions will not save/restore the
      TTBR0 state. If either of the `x` or `ptr` arguments to __raw_put_user()
      contain a blocking call, we may call into the scheduler within the
      critical section. This can result in two problems:
      
      1) The access within the critical section will occur without the
         required TTBR0 tables installed. This will fault, and where the
         required tables permit access, the access will be retried without the
         required tables, resulting in a livelock.
      
      2) When TTBR0 SW PAN is in use, check_and_switch_context() does not
         modify TTBR0, leaving a stale value installed. The mappings of the
         blocked task will erroneously be accessible to regular accesses in
         the context of the new task. Additionally, if the tables are
         subsequently freed, local TLB maintenance required to reuse the ASID
         may be lost, potentially resulting in TLB corruption (e.g. in the
         presence of CnP).
      
      The same issue exists for __raw_get_user() in the critical section
      between uaccess_ttbr0_enable() and uaccess_ttbr0_disable().
      
      A similar issue exists for __get_kernel_nofault() and
      __put_kernel_nofault() for the critical section between
      __uaccess_enable_tco_async() and __uaccess_disable_tco_async(), as the
      TCO state is not context-switched by direct calls into the scheduler.
      Here the TCO state may be lost from the context of the current task,
      resulting in unexpected asynchronous tag check faults. It may also be
      leaked to another task, suppressing expected tag check faults.
      
      To fix all of these cases, we must ensure that we do not directly call
      into the scheduler in their respective critical sections. This patch
      reworks __raw_put_user(), __raw_get_user(), __get_kernel_nofault(), and
      __put_kernel_nofault(), ensuring that parameters are evaluated outside
      of the critical sections. To make this requirement clear, comments are
      added describing the problem, and line spaces added to separate the
      critical sections from other portions of the macros.
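      
      As a rough illustration of the pattern (a simplified sketch, not the
      kernel's exact macro; __store_user_nofault() is a hypothetical helper
      standing in for the real access primitive):
      
      | #define put_user_sketch(x, ptr)						\
      | ({										\
      | 	__typeof__(*(ptr)) __pu_val = (x);	/* may block: evaluated here */	\
      | 	__typeof__(ptr) __pu_ptr = (ptr);	/* may block: evaluated here */	\
      | 	int __pu_err = 0;							\
      | 										\
      | 	uaccess_ttbr0_enable();		/* critical section: no blocking */	\
      | 	__store_user_nofault(__pu_ptr, __pu_val, &__pu_err);			\
      | 	uaccess_ttbr0_disable();						\
      | 										\
      | 	__pu_err;								\
      | })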
      
      For __raw_get_user() and __raw_put_user() the `err` parameter is
      conditionally assigned to, and we must currently evaluate this in the
      critical section. This behaviour is relied upon by the signal code,
      which uses chains of put_user_error() and get_user_error(), checking the
      return value at the end. In all cases, the `err` parameter is a plain
      int rather than a more complex expression with a blocking call, so this
      is safe.
      
      In future we should try to clean up the `err` usage to remove the
      potential for this to be a problem.
      
      Aside from the changes to time of evaluation, there should be no
      functional change as a result of this patch.
      Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Link: https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
      Fixes: f253d827 ("arm64: uaccess: refactor __{get,put}_user")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/20211122125820.55286-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      94902d84
  3. 16 November 2021, 2 commits
    • P
      arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd · d3eb70ea
      Committed by Pingfan Liu
      trans_pgd_create_copy() can hit "VM_BUG_ON(mm != &init_mm)" in the
      function pmd_populate_kernel().
      
      This is the combined consequence of commit 5de59884 ("arm64:
      trans_pgd: pass NULL instead of init_mm to *_populate functions"), which
      replaced &init_mm with NULL and commit 59511cfd ("arm64: mm: use XN
      table mapping attributes for user/kernel mappings"), which introduced
      the VM_BUG_ON.
      
      Since the former sounds reasonable, it is better to work on the latter.
      From the perspective of trans_pgd, two groups of functions in the latter
      commit need to be considered:
      
        pmd_populate_kernel()
          mm == NULL should be fixed, else it hits VM_BUG_ON()
        p?d_populate()
          mm == NULL means PXN, that is OK, since trans_pgd only copies a
          linear map, no execution will happen on the map.
      
      So it is good enough to just relax VM_BUG_ON() to disregard the mm == NULL case.
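      
      A minimal sketch of the relaxation described above (illustrative only;
      the real check lives in the arm64 pmd_populate_kernel() helper):
      
      | static inline void
      | pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
      | {
      | 	/* Tolerate mm == NULL (trans_pgd), but still catch non-init_mm users. */
      | 	VM_BUG_ON(mm && mm != &init_mm);
      | 	__pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE | PMD_TABLE_UXN);
      | }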
      
      Fixes: 59511cfd ("arm64: mm: use XN table mapping attributes for user/kernel mappings")
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Cc: <stable@vger.kernel.org> # 5.13.x
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Matthias Brugger <mbrugger@suse.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Link: https://lore.kernel.org/r/20211112052214.9086-1-kernelfans@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
      d3eb70ea
    • M
      arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR · c6d3cd32
      Committed by Mark Rutland
      When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph
      tracer is in use, unwind_frame() may erroneously associate a traced
      function with an incorrect return address. This can happen when starting
      an unwind from a pt_regs, or when unwinding across an exception
      boundary.
      
      This can be seen when recording with perf while the function graph
      tracer is in use. For example:
      
      | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
      | # perf record -g -e raw_syscalls:sys_enter:k /bin/true
      | # perf report
      
      ... reports the callchain erroneously as:
      
      | el0t_64_sync
      | el0t_64_sync_handler
      | el0_svc_common.constprop.0
      | perf_callchain
      | get_perf_callchain
      | syscall_trace_enter
      | syscall_trace_enter
      
      ... whereas when the function graph tracer is not in use, it reports:
      
      | el0t_64_sync
      | el0t_64_sync_handler
      | el0_svc
      | do_el0_svc
      | el0_svc_common.constprop.0
      | syscall_trace_enter
      | syscall_trace_enter
      
      The underlying problem is that ftrace_graph_get_ret_stack() takes an
      index offset from the most recent entry added to the fgraph return
      stack. We start an unwind at offset 0, and increment the offset each
      time we encounter a rewritten return address (i.e. when we see
      `return_to_handler`). This is broken in two cases:
      
      1) Between creating a pt_regs and starting the unwind, function calls
         may place entries on the stack, leaving an arbitrary offset which we
         can only determine by performing a full unwind from the caller of the
         unwind code (and relying on none of the unwind code being
         instrumented).
      
         This can result in erroneous entries being reported in a backtrace
         recorded by perf or kfence when the function graph tracer is in use.
         Currently show_regs() is unaffected as dump_backtrace() performs an
         initial unwind.
      
      2) When unwinding across an exception boundary (whether continuing an
         unwind or starting a new unwind from regs), we currently always skip
         the LR of the interrupted context. Where this was live and contained
         a rewritten address, we won't consume the corresponding fgraph ret
         stack entry, leaving subsequent entries off-by-one.
      
         This can result in erroneous entries being reported in a backtrace
         performed by any in-kernel unwinder when that backtrace crosses an
         exception boundary, with entries after the boundary being reported
         incorrectly. This includes perf, kfence, show_regs(), panic(), etc.
      
      To fix this, we need to be able to uniquely identify each rewritten
      return address such that we can map this back to the original return
      address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate
      each rewritten return address with a unique location on the stack. As
      the return address is passed in the LR (and so is not guaranteed a
      unique location in memory), we use the FP upon entry to the function
      (i.e. the address of the caller's frame record) as the return address
      pointer. Any nested call will have a different FP value as the caller
      must create its own frame record and update FP to point to this.
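      
      Conceptually, the unwinder can then resolve rewritten addresses with the
      existing helper, passing the frame record address as the unique key (a
      simplified sketch of the idea, not the exact unwind_frame() diff):
      
      | #ifdef CONFIG_FUNCTION_GRAPH_TRACER
      | 	if (tsk->ret_stack && frame->pc == (unsigned long)return_to_handler) {
      | 		/* Use the FP (address of the frame record) as the 'retp' key. */
      | 		frame->pc = ftrace_graph_ret_addr(tsk, NULL, frame->pc,
      | 						  (unsigned long *)frame->fp);
      | 	}
      | #endif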
      
      Since ftrace_graph_ret_addr() requires the return address with the PAC
      stripped, the stripping of the PAC is moved before the fixup of the
      rewritten address. As we would unconditionally strip the PAC, moving
      this earlier is not harmful, and we can avoid a redundant strip in the
      return address fixup code.
      
      I've tested this with the perf case above, the ftrace selftests, and
      a number of ad-hoc unwinder tests. The tests all pass, and I have seen
      no unexpected behaviour as a result of this change. I've tested with
      pointer authentication under QEMU TCG where magic-sysrq+l correctly
      recovers the original return addresses.
      
      Note that this doesn't fix the issue of skipping a live LR at an
      exception boundary, which is a more general problem and requires more
      substantial rework. Were we to consume the LR in all cases this would
      result in warnings where the interrupted context's LR contains
      `return_to_handler`, but the FP has been altered, e.g.
      
      | func:
      |	<--- ftrace entry ---> 	// logs FP & LR, rewrites LR
      | 	STP	FP, LR, [SP, #-16]!
      | 	MOV	FP, SP
      | 	<--- INTERRUPT --->
      
      ... as ftrace_graph_get_ret_stack() will not find a matching entry,
      triggering the WARN_ON_ONCE() in unwind_frame().
      
      Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local
      Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      c6d3cd32
  4. 08 November 2021, 3 commits
    • Y
      KVM: arm64: Change the return type of kvm_vcpu_preferred_target() · 08e873cb
      Committed by YueHaibing
      kvm_vcpu_preferred_target() always returns 0 because kvm_target_cpu()
      never returns a negative error code.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211105011500.16280-1-yuehaibing@huawei.com
      08e873cb
    • M
      KVM: arm64: Extract ESR_ELx.EC only · 8bb08411
      Committed by Mark Rutland
      Since ARMv8.0 the upper 32 bits of ESR_ELx have been RES0, and recently
      some of the upper bits gained a meaning and can be non-zero. For
      example, when FEAT_LS64 is implemented, ESR_ELx[36:32] contain ISS2,
      which for an ST64BV or ST64BV0 can be non-zero. This can be seen in ARM
      DDI 0487G.b, page D13-3145, section D13.2.37.
      
      Generally, we must not rely on RES0 bits remaining zero in future, and
      when extracting ESR_ELx.EC we must mask out all other bits.
      
      All C code uses the ESR_ELx_EC() macro, which masks out the irrelevant
      bits, and therefore no alterations are required to C code to avoid
      consuming irrelevant bits.
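      
      For reference, the C-side extraction is a simple mask-and-shift; a sketch
      in the style of the <asm/esr.h> definitions:
      
      | #define ESR_ELx_EC_SHIFT	(26)
      | #define ESR_ELx_EC_WIDTH	(6)
      | #define ESR_ELx_EC_MASK		(UL(0x3F) << ESR_ELx_EC_SHIFT)
      | #define ESR_ELx_EC(esr)		(((esr) & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT)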
      
      In a couple of places the KVM assembly extracts ESR_ELx.EC using LSR on
      an X register, and so could in theory consume previously RES0 bits. In
      both cases this is for comparison with EC values ESR_ELx_EC_HVC32 and
      ESR_ELx_EC_HVC64, for which the upper bits of ESR_ELx must currently be
      zero, but this could change in future.
      
      This patch adjusts the KVM vectors to use UBFX rather than LSR to
      extract ESR_ELx.EC, ensuring these are robust to future additions to
      ESR_ELx.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexandru Elisei <alexandru.elisei@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211103110545.4613-1-mark.rutland@arm.com
      8bb08411
    • A
      arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions · c7c386fb
      Committed by Arnd Bergmann
      gcc warns about undefined behavior in the vmalloc code when building
      with CONFIG_ARM64_PA_BITS_52, when the 'idx++' in the argument to
      __phys_to_pte_val() is evaluated twice:
      
      mm/vmalloc.c: In function 'vmap_pfn_apply':
      mm/vmalloc.c:2800:58: error: operation on 'data->idx' may be undefined [-Werror=sequence-point]
       2800 |         *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
            |                                                 ~~~~~~~~~^~
      arch/arm64/include/asm/pgtable-types.h:25:37: note: in definition of macro '__pte'
         25 | #define __pte(x)        ((pte_t) { (x) } )
            |                                     ^
      arch/arm64/include/asm/pgtable.h:80:15: note: in expansion of macro '__phys_to_pte_val'
         80 |         __pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
            |               ^~~~~~~~~~~~~~~~~
      mm/vmalloc.c:2800:30: note: in expansion of macro 'pfn_pte'
       2800 |         *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
            |                              ^~~~~~~
      
      I have no idea why this never showed up earlier, but the safest
      workaround appears to be changing those macros into inline functions
      so the arguments get evaluated only once.
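      
      The hazard and the fix are generic; a contrived sketch (not the kernel's
      actual definitions) of why the inline function is safe:
      
      | /* Macro version: 'phys' is expanded twice, so side effects run twice. */
      | #define bad_to_pte_val(phys)	(((phys) | ((phys) >> 36)) & PTE_ADDR_MASK)
      |
      | /* Inline function: the argument expression is evaluated exactly once. */
      | static inline pteval_t good_to_pte_val(phys_addr_t phys)
      | {
      | 	return (phys | (phys >> 36)) & PTE_ADDR_MASK;
      | }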
      
      Cc: Matthew Wilcox <willy@infradead.org>
      Fixes: 75387b92 ("arm64: handle 52-bit physical addresses in page table entries")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20211105075414.2553155-1-arnd@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      c7c386fb
  5. 26 October 2021, 1 commit
  6. 23 October 2021, 1 commit
  7. 22 October 2021, 2 commits
  8. 21 October 2021, 17 commits
    • M
      arm64: extable: add load_unaligned_zeropad() handler · 753b3236
      Committed by Mark Rutland
      For inline assembly, we place exception fixups out-of-line in the
      `.fixup` section such that these are out of the way of the fast path.
      This has a few drawbacks:
      
      * Since the fixup code is anonymous, backtraces will symbolize fixups as
        offsets from the nearest prior symbol, currently
        `__entry_tramp_text_end`. This is confusing, and painful to debug
        without access to the relevant vmlinux.
      
      * Since the exception handler adjusts the PC to execute the fixup, and
        the fixup uses a direct branch back into the function it fixes,
        backtraces of fixups miss the original function. This is confusing,
        and violates requirements for RELIABLE_STACKTRACE (and therefore
        LIVEPATCH).
      
      * Inline assembly and associated fixups are generated from templates,
        and we have many copies of logically identical fixups which only
        differ in which specific registers are written to and which address is
        branched to at the end of the fixup. This is potentially wasteful of
        I-cache resources, and makes it hard to add additional logic to fixups
        without significant bloat.
      
      * In the case of load_unaligned_zeropad(), the logic in the fixup
        requires a temporary register that we must allocate even in the
        fast-path where it will not be used.
      
      This patch addresses all four concerns for load_unaligned_zeropad() fixups
      by adding a dedicated exception handler which performs the fixup logic
      in exception context and subsequently returns to just after the faulting
      instruction. For the moment, the fixup logic is identical to the old
      assembly fixup logic, but in future we could enhance this by taking the
      ESR and FAR into account to constrain the faults we try to fix up, or to
      specialize fixups for MTE tag check faults.
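      
      A heavily simplified sketch of what such a handler does (the register-field
      helpers ex_addr_reg()/ex_data_reg() are illustrative names, not the exact
      implementation):
      
      | static bool
      | ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
      | 				  struct pt_regs *regs)
      | {
      | 	unsigned long addr = pt_regs_read_reg(regs, ex_addr_reg(ex));
      | 	unsigned long offset = addr & 0x7UL;
      | 	unsigned long data;
      |
      | 	/* Reload from the aligned-down address, which is known to be mapped. */
      | 	data = *(unsigned long *)(addr & ~0x7UL);
      |
      | 	/* Shift so bytes past the fault read back as zero (little-endian case). */
      | 	data >>= 8 * offset;
      | 	pt_regs_write_reg(regs, ex_data_reg(ex), data);
      |
      | 	regs->pc = get_ex_fixup(ex);	/* continue after the faulting load */
      | 	return true;
      | }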
      
      Other than backtracing, there should be no functional change as a result
      of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-13-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      753b3236
    • M
      arm64: extable: add a dedicated uaccess handler · 2e77a62c
      Committed by Mark Rutland
      For inline assembly, we place exception fixups out-of-line in the
      `.fixup` section such that these are out of the way of the fast path.
      This has a few drawbacks:
      
      * Since the fixup code is anonymous, backtraces will symbolize fixups as
        offsets from the nearest prior symbol, currently
        `__entry_tramp_text_end`. This is confusing, and painful to debug
        without access to the relevant vmlinux.
      
      * Since the exception handler adjusts the PC to execute the fixup, and
        the fixup uses a direct branch back into the function it fixes,
        backtraces of fixups miss the original function. This is confusing,
        and violates requirements for RELIABLE_STACKTRACE (and therefore
        LIVEPATCH).
      
      * Inline assembly and associated fixups are generated from templates,
        and we have many copies of logically identical fixups which only
        differ in which specific registers are written to and which address is
        branched to at the end of the fixup. This is potentially wasteful of
        I-cache resources, and makes it hard to add additional logic to fixups
        without significant bloat.
      
      This patch addresses all three concerns for inline uaccess fixups by
      adding a dedicated exception handler which updates registers in
      exception context and subsequently returns back into the function which
      faulted, removing the need for fixups specialized to each faulting
      instruction.
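      
      A simplified sketch of the shape of such a handler (the field-extraction
      helpers ex_err_reg()/ex_zero_reg() are illustrative names): on a fault it
      writes -EFAULT into the recorded error register, zeroes the recorded
      destination register, and resumes at the fixup address.
      
      | static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
      | 					struct pt_regs *regs)
      | {
      | 	pt_regs_write_reg(regs, ex_err_reg(ex), -EFAULT);
      | 	pt_regs_write_reg(regs, ex_zero_reg(ex), 0);
      | 	regs->pc = get_ex_fixup(ex);
      | 	return true;
      | }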
      
      Other than backtracing, there should be no functional change as a result
      of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-12-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      2e77a62c
    • M
      arm64: extable: add `type` and `data` fields · d6e2cc56
      Committed by Mark Rutland
      Subsequent patches will add specialized handlers for fixups, in addition
      to the simple PC fixup and BPF handlers we have today. In preparation,
      this patch adds a new `type` field to struct exception_table_entry, and
      uses this to distinguish the fixup and BPF cases. A `data` field is also
      added so that subsequent patches can associate data specific to each
      exception site (e.g. register numbers).
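      
      After this change the entry has the following shape (a sketch of the
      12-byte layout described below):
      
      | struct exception_table_entry {
      | 	int insn, fixup;	/* relative offsets to the insn and its fixup */
      | 	short type, data;	/* handler type + per-site data (e.g. GPR numbers) */
      | };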
      
      Handlers are named ex_handler_*() for consistency, following the example
      of x86. At the same time, get_ex_fixup() is split out into a helper so
      that it can be used by other ex_handler_*() functions in subsequent
      patches.
      
      This patch will increase the size of the exception tables, which will be
      remedied by subsequent patches removing redundant fixup code. There
      should be no functional change as a result of this patch.
      
      Since each entry is now 12 bytes in size, we must reduce the alignment
      of each entry from `.align 3` (i.e. 8 bytes) to `.align 2` (i.e. 4
      bytes), which is the natural alignment of the `insn` and `fixup` fields.
      The current 8-byte alignment is a holdover from when the `insn` and
      `fixup` fields were 8 bytes each, and while not harmful it has not been necessary
      since commit:
      
        6c94f27a ("arm64: switch to relative exception tables")
      
      Similarly, RO_EXCEPTION_TABLE_ALIGN is dropped to 4 bytes.
      
      Concurrently with this patch, x86's exception table entry format is
      being updated (similarly to a 12-byte format, with 32 bits of absolute
      data). Once both have been merged it should be possible to unify the
      sorttable logic for the two.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-11-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      d6e2cc56
    • M
      arm64: extable: make fixup_exception() return bool · e8c328d7
      Committed by Mark Rutland
      The return values of fixup_exception() and arm64_bpf_fixup_exception()
      represent a boolean condition rather than an error code, so for clarity
      it would be better to return `bool` rather than `int`.
      
      This patch adjusts the code accordingly. While we're modifying the
      prototype, we also remove the unnecessary `extern` keyword, so that this
      won't look out of place when we make subsequent additions to the header.
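      
      The resulting declarations are simply (a sketch of the adjusted header,
      not necessarily character-exact):
      
      | bool fixup_exception(struct pt_regs *regs);
      | bool arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
      | 			       struct pt_regs *regs);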
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-9-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      e8c328d7
    • M
      arm64: extable: consolidate definitions · 819771cc
      Committed by Mark Rutland
      In subsequent patches we'll alter the structure and usage of struct
      exception_table_entry. For inline assembly, we create these using the
      `_ASM_EXTABLE()` CPP macro defined in <asm/uaccess.h>, and for plain
      assembly code we use the `_asm_extable()` GAS macro defined in
      <asm/assembler.h>, which are largely identical save for different
      escaping and stringification requirements.
      
      This patch moves the common definitions to a new <asm/asm-extable.h>
      header, so that it's easier to keep the two in sync, and to remove the
      implication that these are only used for uaccess helpers (as e.g.
      load_unaligned_zeropad() is only used on kernel memory, and depends upon
      `_ASM_EXTABLE()`).
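      
      For illustration, the consolidated entry-emitting macro looks roughly like
      this (a sketch of the CPP variant; the GAS variant is structurally
      identical, and the `__ASM_EXTABLE_RAW()` factoring is described below):
      
      | #define __ASM_EXTABLE_RAW(insn, fixup)		\
      | 	".pushsection	__ex_table, \"a\"\n"		\
      | 	".align		3\n"				\
      | 	".long		((" insn ") - .)\n"		\
      | 	".long		((" fixup ") - .)\n"		\
      | 	".popsection\n"
      |
      | #define _ASM_EXTABLE(insn, fixup)	__ASM_EXTABLE_RAW(#insn, #fixup)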
      
      At the same time, a few minor modifications are made for clarity and in
      preparation for subsequent patches:
      
      * The structure creation is factored out into an `__ASM_EXTABLE_RAW()`
        macro. This will make it easier to support different fixup variants in
        subsequent patches without needing to update all users of
        `_ASM_EXTABLE()`, and makes it easier to see that the CPP and GAS
        variants of the macros are structurally identical.
      
        For the CPP macro, the stringification of fields is left to the
        wrapper macro, `_ASM_EXTABLE()`, as in subsequent patches it will be
        necessary to stringify fields in wrapper macros to safely concatenate
        strings which cannot be token-pasted together in CPP.
      
      * The fields of the structure are created separately on their own lines.
        This will make it easier to add/remove/modify individual fields
        clearly.
      
      * Additional parentheses are added around the use of macro arguments in
        field definitions to avoid any potential problems with evaluation due
        to operator precedence, and to make errors upon misuse clearer.
      
      * USER() is moved into <asm/asm-uaccess.h>, as it is not required by all
        assembly code, and is already referred to by comments in that file.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-8-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      819771cc
    • M
      arm64: gpr-num: support W registers · 286fba6c
      Committed by Mark Rutland
      In subsequent patches we'll want to map W registers to their register
      numbers. Update gpr-num.h so that we can do this.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-7-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      286fba6c
    • M
      arm64: factor out GPR numbering helpers · 8ed1b498
      Committed by Mark Rutland
      In <asm/sysreg.h> we have macros to convert the names of general purpose
      registers (GPRs) into integer constants, which we use to manually build
      the encoding for `MRS` and `MSR` instructions where we can't rely on the
      assembler to do so for us.
      
      In subsequent patches we'll need to map the same GPR names to integer
      constants so that we can use this to build metadata for exception
      fixups.
      
      So that we can use the mappings elsewhere, factor out the
      definitions into a new <asm/gpr-num.h> header, renaming the definitions
      to align with this "GPR num" naming for clarity.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-6-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      8ed1b498
    • M
      arm64: kvm: use kvm_exception_table_entry · ae2b2f33
      Committed by Mark Rutland
      In subsequent patches we'll alter `struct exception_table_entry`, adding
      fields that are not needed for KVM exception fixups.
      
      In preparation for this, migrate KVM to its own `struct
      kvm_exception_table_entry`, which is identical to the current format of
      `struct exception_table_entry`. Comments are updated accordingly.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Alexandru Elisei <alexandru.elisei@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211019160219.5202-5-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      ae2b2f33
    • N
      arm64: vdso32: drop test for -march=armv8-a · a517faa9
      Committed by Nick Desaulniers
      As Arnd points out:
        gcc-4.8 already supported -march=armv8, and we require gcc-5.1 now, so
        both this #if/#else construct and the corresponding
        "cc32-option,-march=armv8-a" check should be obsolete now.
      
      Link: https://lore.kernel.org/lkml/CAK8P3a3UBEJ0Py2ycz=rHfgog8g3mCOeQOwO0Gmp-iz6Uxkapg@mail.gmail.com/
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/r/20211019223646.1146945-3-ndesaulniers@google.com
      Signed-off-by: Will Deacon <will@kernel.org>
      a517faa9
    • N
      arm64: vdso32: drop the test for dmb ishld · 1907d3ff
      Committed by Nick Desaulniers
      Binutils added support for this instruction in commit
      e797f7e0b2bedc9328d4a9a0ebc63ca7a2dbbebc which shipped in 2.24 (just
      missing the 2.23 release) but was cherry-picked into 2.23 in commit
      27a50d6755bae906bc73b4ec1a8b448467f0bea1. Thanks to Christian and Simon
      for helping me with the patch archaeology.
      
      According to Documentation/process/changes.rst, the minimum supported
      version of binutils is 2.23. Since all supported versions of GAS support
      this instruction, drop the assembler invocation, preprocessor
      flags/guards, and the cross assembler macro that's now unused.
      
      This also avoids a recursive self reference in a follow up cleanup
      patch.
      
      Cc: Christian Biesinger <cbiesinger@google.com>
      Cc: Simon Marchi <simon.marchi@polymtl.ca>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/r/20211019223646.1146945-2-ndesaulniers@google.com
      Signed-off-by: Will Deacon <will@kernel.org>
      1907d3ff
    • M
      arm64/sve: Track vector lengths for tasks in an array · 5838a155
      Committed by Mark Brown
      As with SVE, we will track a per-task vector length for SME. Convert
      the existing storage for the vector length into an array and update
      fpsimd_flush_task() to initialise this in a helper function.
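      
      A sketch of the resulting storage in thread_struct (names follow the
      approach described, but are illustrative rather than verbatim):
      
      | enum vec_type {
      | 	ARM64_VEC_SVE = 0,
      | 	ARM64_VEC_MAX,		/* SME will add another entry here */
      | };
      |
      | struct thread_struct {
      | 	/* ... */
      | 	unsigned int	vl[ARM64_VEC_MAX];		/* current VL per type */
      | 	unsigned int	vl_onexec[ARM64_VEC_MAX];	/* VL to use on exec */
      | 	/* ... */
      | };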
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211019172247.3045838-10-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      5838a155
    • M
      arm64/sve: Explicitly load vector length when restoring SVE state · ddc806b5
      Committed by Mark Brown
      Currently when restoring the SVE state we supply the SVE vector length
      as an argument to sve_load_state() and the underlying macros. This becomes
      inconvenient with the addition of SME since we may need to restore any
      combination of SVE and SME vector lengths, and we already separately
      restore the vector length in the KVM code. We don't need to know the vector
      length during the actual register load since the SME load instructions can
      index into the data array for us.
      
      Refactor the interface so we explicitly set the vector length separately
      from restoring the SVE registers, in preparation for adding SME support; no
      functional change should be involved.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211019172247.3045838-9-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      ddc806b5
    • M
      arm64/sve: Put system wide vector length information into structs · b5bc00ff
      Committed by Mark Brown
      With the introduction of SME we will have a second vector length in the
      system, enumerated and configured in a very similar fashion to the
      existing SVE vector length.  While there are a few differences in how
      things are handled this is a relatively small portion of the overall
      code, so in order to avoid code duplication we factor out the shared handling.
      
      We create two structs, one vl_info for the static hardware properties
      and one vl_config for the runtime configuration, with an array
      instantiated for each and update all the users to reference these. Some
      accessor functions are provided where helpful for readability, and the
      write to set the vector length is put into a function since the system
      register being updated needs to be chosen at compile time.
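      
      A minimal sketch of the split (fields are condensed for illustration; the
      real structs carry more state):
      
      | struct vl_info {			/* static hardware properties, per vector type */
      | 	enum vec_type	type;
      | 	const char	*name;
      | 	int		min_vl;
      | 	int		max_vl;
      | };
      |
      | struct vl_config {			/* runtime configuration */
      | 	int		__default_vl;	/* set via sysctl, read through an accessor */
      | };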
      
      This is a mostly mechanical replacement, further work will be required
      to actually make things generic, ensuring that we handle those places
      where there are differences properly.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211019172247.3045838-8-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      b5bc00ff
    • M
      arm64/sve: Use accessor functions for vector lengths in thread_struct · 0423eedc
      Committed by Mark Brown
      In a system with SME there are parallel vector length controls for SVE and
      SME vectors which function in much the same way so it is desirable to
      share the code for handling them as much as possible. In order to prepare
      for doing this add a layer of accessor functions for the various VL related
      operations on tasks.
      
      Since almost all current interactions are actually via task->thread rather
      than directly with the thread_info, the accessors use that. Accessors are
      provided for both generic and SVE specific usage, the generic accessors
      should be used for cases where register state is being manipulated since
      the registers are shared between streaming and regular SVE so we know that
      when SME support is implemented we will always have to be in the appropriate
      mode already and hence can generalise now.
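      
      The accessor layer is of roughly this shape (signatures illustrative of
      the approach rather than verbatim):
      
      | /* Generic accessors, keyed by vector type ... */
      | unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
      | void task_set_vl(struct task_struct *task, enum vec_type type,
      | 		 unsigned long vl);
      |
      | /* ... plus SVE-specific wrappers used by existing callers. */
      | unsigned int task_get_sve_vl(const struct task_struct *task);
      | void task_set_sve_vl(struct task_struct *task, unsigned long vl);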
      
      Since we are using task_struct and we don't want to cause widespread
      inclusion of sched.h, the accessors are all out of line; it is hoped that
      none of the uses are in a sufficiently critical path for this to be an
      issue. Those that are most likely to present an issue are in the same
      translation unit so hopefully the compiler may be able to inline anyway.
      
      This is purely adding the layer of abstraction, additional work will be
      needed to support tasks using SME.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211019172247.3045838-7-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      0423eedc
    • M
      arm64/sve: Make access to FFR optional · 9f584866
      Committed by Mark Brown
      SME introduces streaming SVE mode in which FFR is not present and the
      instructions for accessing it UNDEF. In preparation for handling this,
      update the low-level SVE state access functions to take a flag specifying
      if FFR should be handled. When saving the register state we store a zero
      for FFR to guard against uninitialized data being read. No behaviour change
      should be introduced by this patch.
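      
      A sketch of the flagged low-level interface (prototypes illustrative of
      the change, not necessarily character-exact):
      
      | void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
      | void sve_load_state(const void *state, const u32 *pfpsr, int restore_ffr);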
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211019172247.3045838-5-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      9f584866
    • M
      arm64/sve: Make sve_state_size() static · 12cc2352
      Committed by Mark Brown
      There are no users outside fpsimd.c so make sve_state_size() static.
      KVM open codes an equivalent.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211019172247.3045838-4-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      12cc2352
    • M
      arm64/sve: Remove sve_load_from_fpsimd_state() · b53223e0
      Committed by Mark Brown
      Following optimisations of the SVE register handling we no longer load the
      SVE state from a saved copy of the FPSIMD registers; instead we convert
      directly in registers or from one saved state to another. Remove the function so we
      don't need to update it during further refactoring.
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211019172247.3045838-3-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      b53223e0
  9. 19 October 2021, 3 commits
  10. 18 October 2021, 7 commits
  11. 17 October 2021, 1 commit
    • M
      KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible · 0924729b
      Committed by Marc Zyngier
      On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg
      accesses from the guest. From a performance perspective, this is OK
      as long as the guest doesn't hammer the GICv3 CPU interface.
      
      In most cases, this is fine, unless the guest actively uses
      priorities and switches PMR_EL1 very often. Which is exactly what
      happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1.
      In these conditions, the performance plummets as we hit PMR each time
      we mask/unmask interrupts. Not good.
      
      There is however an opportunity for improvement. Careful reading
      of the architecture specification indicates that the only GICv3
      sysreg belonging to the common group (which contains the SGI
      registers, PMR, DIR, CTLR and RPR) that is allowed to generate
      a SError is DIR. Everything else is safe.
      
      It is thus possible to substitute the trapping of all the common
      group with just that of DIR, if it is supported by the implementation.
      Yes, that's yet another optional bit of the architecture.
      So let's just do that, as it leads to some impressive results on
      the M1:
      
      Without this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 56.596
      
      With this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 8.649
      
      which is a pretty convincing result.
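      
      Conceptually, the trap configuration becomes the following (a sketch using
      hedged names; the variables and the feature test stand in for the real
      ICH_VTR_EL2/ICH_HCR_EL2 plumbing):
      
      | if (group_traps_needed_for_seis) {
      | 	if (dir_only_trapping_supported)	/* optional architecture feature */
      | 		vgic_hcr |= ICH_HCR_TDIR;	/* trap only ICV_DIR_EL1 */
      | 	else
      | 		vgic_hcr |= ICH_HCR_TC;		/* trap the whole common group */
      | }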
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
      0924729b