Commits · b282e1ce29bb677224ba8fb38e94f5e94e2656d5 · openanolis / cloud-kernel

02 Nov, 2017 10 commits

arm64: entry.S: convert elX_irq · b282e1ce

Committed by James Morse on Nov 02, 2017

Following our 'dai' order, irqs should be processed with debug and
serror exceptions unmasked.

Add a helper to unmask these two, (and fiq for good measure).
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: entry.S convert el0_sync · 746647c7

Committed by James Morse on Nov 02, 2017

el0_sync also unmasks exceptions on a case-by-case basis, debug exceptions
are enabled, unless this was a debug exception. Irqs are unmasked for
some exception types but not for others.

el0_dbg should run with everything masked to prevent us taking a debug
exception from do_debug_exception. For the other cases we can unmask
everything. This changes the behaviour of fpsimd_{acc,exc} and el0_inv
which previously ran with irqs masked.

This patch removed the last user of enable_dbg_and_irq, remove it.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: entry.S: convert el1_sync · b55a5a1b

Committed by James Morse on Nov 02, 2017

el1_sync unmasks exceptions on a case-by-case basis, debug exceptions
are unmasked, unless this was a debug exception. IRQs are unmasked
for instruction and data aborts only if the interrupted context had
irqs unmasked.

Following our 'dai' order, el1_dbg should run with everything masked.
For the other cases we can inherit whatever we interrupted.

Add a macro inherit_daif to set daif based on the interrupted pstate.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: entry.S: Remove disable_dbg · 84d0fb1b

Committed by James Morse on Nov 02, 2017

enable_step_tsk is the only user of disable_dbg, which doesn't respect
our 'dai' order for exception masking. enable_step_tsk may enable
single-step, so previously needed to mask debug exceptions to prevent us
from single-stepping kernel_exit. enable_step_tsk is called at the end
of the ret_to_user loop, which has already masked all exceptions so this
is no longer needed.

Remove disable_dbg, add a comment that enable_step_tsk's caller should
have masked debug.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Mask all exceptions during kernel_exit · 8d66772e

Committed by James Morse on Nov 02, 2017

To take RAS Exceptions as quickly as possible we need to keep SError
unmasked as much as possible. We need to mask it during kernel_exit
as taking an error from this code will overwrite the exception-registers.

Adding a naked 'disable_daif' to kernel_exit causes a performance problem
for micro-benchmarks that do no real work, (e.g. calling getpid() in a
loop). This is because the ret_to_user loop has already masked IRQs so
that the TIF_WORK_MASK thread flags can't change underneath it, adding
disable_daif is an additional self-synchronising operation.

In the future, the RAS APEI code may need to modify the TIF_WORK_MASK
flags from an SError, in which case the ret_to_user loop must mask SError
while it examines the flags.

Disable all exceptions for return to EL1. For return to EL0 get the
ret_to_user loop to leave all exceptions masked once it has done its
work, this avoids an extra pstate-write.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Move the async/fiq helpers to explicitly set process context flags · 41bd5b5d

Committed by James Morse on Nov 02, 2017

Remove the local_{async,fiq}_{en,dis}able macros as they don't respect
our newly defined order and are only used to set the flags for process
context when we bring CPUs online.

Add a helper to do this. The IRQ flag varies as we want it masked on
the boot CPU until we are ready to handle interrupts.
The boot CPU unmasks SError during early boot once it can print an error
message. If we can print an error message about SError, we can do the
same for FIQ. Debug exceptions are already enabled by __cpu_setup(),
which has also configured MDSCR_EL1 to disable MDE and KDE.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: introduce an order for exceptions · 65be7a1b

Committed by James Morse on Nov 02, 2017

Currently SError is always masked in the kernel. To support RAS exceptions
using SError on hardware with the v8.2 RAS Extensions we need to unmask
SError as much as possible.

Let's define an order for masking and unmasking exceptions. 'dai' is
memorable and effectively what we have today.

Disabling debug exceptions should cause all other exceptions to be masked.
Masking SError should mask irq, but not disable debug exceptions.
Masking irqs has no side effects for other flags. Keeping to this order
makes it easier for entry.S to know which exceptions should be unmasked.

FIQ is never expected, but we mask it when we mask debug exceptions, and
unmask it at all other times.

Given masking debug exceptions masks everything, we don't need macros
to save/restore that bit independently. Remove them and switch the last
caller over to use the daif calls.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
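
As an illustration only (not part of the patch), the 'dai' order described in commit 65be7a1b can be modelled as a rule over the four mask bits. The sketch uses the 4-bit immediate layout of `msr daifset` (D=8, A=4, I=2, F=1). The function name `follows_dai_order` is hypothetical and does not exist in the kernel.

```python
# Mask bits as encoded in the 4-bit immediate of "msr daifset/daifclr".
D, A, I, F = 0x8, 0x4, 0x2, 0x1


def follows_dai_order(mask: int) -> bool:
    """Return True if a DAIF mask respects the 'dai' order (hypothetical helper)."""
    d, a, i, f = (bool(mask & bit) for bit in (D, A, I, F))
    # Masking debug masks everything else.
    if d and not (a and i):
        return False
    # Masking SError also masks IRQ, but does not require debug to be masked.
    if a and not i:
        return False
    # FIQ is masked exactly when debug is masked, and unmasked otherwise.
    return f == d


# Enumerate all 16 combinations and keep the ones the order allows.
VALID_MASKS = [m for m in range(16) if follows_dai_order(m)]
print([f"{m:#x}" for m in VALID_MASKS])
```

Under this model only four states survive: nothing masked, IRQ only, SError plus IRQ, and everything masked. That small set is what makes it easy for entry.S to decide which exceptions to unmask.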

arm64: explicitly mask all exceptions · 0fbeb318

Committed by James Morse on Nov 02, 2017

There are a few places where we want to mask all exceptions. Today we
do this in a piecemeal fashion, typically we expect the caller to
have masked irqs and the arch code masks debug exceptions, ignoring
serror which is probably masked.

Make it clear that 'mask all exceptions' is the intention by adding
helpers to do exactly that.

This will let us unmask SError without having to add 'oh and SError'
to these paths.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: suspend: remove useless included file · c10f0d06

Committed by Yisheng Xie on Nov 01, 2017

After commit 9e8e865b ("arm64: unify idmap removal"), we no longer need to
flush the TLB in suspend.c, so the included file tlbflush.h can be removed.
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Don't walk page table for user faults in do_mem_abort · 80b6eb04

Committed by Will Deacon on Oct 31, 2017

Commit 42dbf54e ("arm64: consistently log ESR and page table")
dumps page table entries for user faults hitting do_bad entries in the
fault handler table. Whilst this shouldn't really happen in practice,
it's not beyond the realms of possibility if e.g. running an old kernel
on a new CPU.

Generally, we want to avoid exposing physical addresses under the control
of userspace (see commit bf396c09 ("arm64: mm: don't print out page
table entries on EL0 faults")), so walk the page tables only on exceptions
from EL1.
Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

31 Oct, 2017 1 commit

arm64: vdso: fix clock_getres for 4GiB-aligned res · c80ed088

Committed by Mark Rutland on Oct 30, 2017

The vdso tries to check for a NULL res pointer in __kernel_clock_getres,
but only checks the lower 32 bits as it uses CBZ on the W register the
res pointer is held in.

Thus, if the res pointer happened to be aligned to a 4GiB boundary, we'd
spuriously skip storing the timespec to it, while returning a zero error code
to the caller.

Prevent this by checking the whole pointer, using CBZ on the X register
the res pointer is held in.

Fixes: 9031fefd ("arm64: VDSO support")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Andrew Pinski <apinski@cavium.com>
Reported-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
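
The bug fixed by commit c80ed088 can be reproduced in a small model. The sketch below is an assumption-laden illustration, not the vdso code: `cbz_w` and `cbz_x` are hypothetical helpers that mimic the "branch if zero" test on the 32-bit and 64-bit views of a register, and the pointer value is made up.

```python
def cbz_w(reg: int) -> bool:
    """Model of CBZ on a W register: only the low 32 bits are tested."""
    return (reg & 0xFFFF_FFFF) == 0


def cbz_x(reg: int) -> bool:
    """Model of CBZ on an X register: all 64 bits are tested."""
    return (reg & 0xFFFF_FFFF_FFFF_FFFF) == 0


# Hypothetical non-NULL res pointer that happens to sit on a 4 GiB boundary.
res = 0x0000_0001_0000_0000

# The W-register check wrongly treats the pointer as NULL and skips the store.
print("W check says NULL:", cbz_w(res))
# The X-register check sees the upper bits and performs the store.
print("X check says NULL:", cbz_x(res))
```

For a genuinely NULL pointer both checks agree, which is why the bug only shows up for pointers whose low 32 bits are all zero.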

30 Oct, 2017 2 commits

arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27 · fd9dde6a

Committed by Nick Desaulniers on Oct 27, 2017

Upon upgrading to binutils 2.27, we found that our lz4 and gzip
compressed kernel images were significantly larger, resulting in 10ms
boot time regressions.

As noted by Rahul:
"aarch64 binaries uses RELA relocations, where each relocation entry
includes an addend value. This is similar to x86_64.  On x86_64, the
addend values are also stored at the relocation offset for relative
relocations. This is an optimization: in the case where code does not
need to be relocated, the loader can simply skip processing relative
relocations.  In binutils-2.25, both bfd and gold linkers did this for
x86_64, but only the gold linker did this for aarch64.  The kernel build
here is using the bfd linker, which stored zeroes at the relocation
offsets for relative relocations.  Since a set of zeroes compresses
better than a set of non-zero addend values, this behavior was resulting
in much better lz4 compression.

The bfd linker in binutils-2.27 is now storing the actual addend values
at the relocation offsets. The behavior is now consistent with what it
does for x86_64 and what gold linker does for both architectures.  The
change happened in this upstream commit:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613
Since a bunch of zeroes got replaced by non-zero addend values, we see
the side effect of lz4 compressed image being a bit bigger.

To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs"
flag can be used:
$ LDFLAGS="--no-apply-dynamic-relocs" make
With this flag, the compressed image size is back to what it was with
binutils-2.25.

If the kernel is using ASLR, there aren't additional runtime costs to
--no-apply-dynamic-relocs, as the relocations will need to be applied
again anyway after the kernel is relocated to a random address.

If the kernel is not using ASLR, then presumably the current default
behavior of the linker is better. Since the static linker performed the
dynamic relocs, and the kernel is not moved to a different address at
load time, it can skip applying the relocations all over again."

Some measurements:

$ ld -v
GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117
                    ^
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb

$ ld -v
GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315
                    ^
pre patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb

post patch:
$ ls -l vmlinux
-rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux
$ ls -l Image.lz4-dtb
-rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb

By Siqi's measurement w/ gzip:
binutils 2.27 with this patch (with --no-apply-dynamic-relocs):
Image 41535488
Image.gz 13404067

binutils 2.27 without this patch (without --no-apply-dynamic-relocs):
Image 41535488
Image.gz 14125516

Any compression scheme should be able to get better results from the
longer runs of zeros, not just GZIP and LZ4.

10ms boot time savings isn't anything to get excited about, but users of
arm64+compression+bfd-2.27 should not have to pay a penalty for no
runtime improvement.
Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com>
Reported-by: Sindhuri Pentyala <spentyala@google.com>
Reported-by: Wei Wang <wvw@google.com>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com>
Suggested-by: Siqi Lin <siqilin@google.com>
Suggested-by: Stephen Hines <srhines@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: added comment to Makefile]
Signed-off-by: Will Deacon <will.deacon@arm.com>
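
The claim in commit fd9dde6a that runs of zeros compress better than stored addend values can be checked with a toy experiment. This is a rough sketch on synthetic data: the slot count and the pseudo-random "addends" are assumptions for illustration, and zlib stands in for the gzip and LZ4 compressors used in the measurements above.

```python
import random
import zlib

ENTRIES = 4096  # hypothetical number of relative relocation slots


def relocation_slots(apply_addends: bool) -> bytes:
    """Build the bytes found at the relocation offsets (synthetic data)."""
    rng = random.Random(0)  # fixed seed so the result is reproducible
    out = bytearray()
    for _ in range(ENTRIES):
        # With --no-apply-dynamic-relocs the linker leaves zeros in place;
        # otherwise it stores the addend value at the relocation offset.
        addend = rng.getrandbits(32) if apply_addends else 0
        out += addend.to_bytes(8, "little")
    return bytes(out)


zeroed = zlib.compress(relocation_slots(apply_addends=False))
applied = zlib.compress(relocation_slots(apply_addends=True))

print("compressed size, zeroed slots: ", len(zeroed))
print("compressed size, applied slots:", len(applied))
```

Real addends are far less random than this, so the gap in an actual kernel image is smaller, but the direction of the effect is the same.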

arm64: Implement arch-specific pte_access_permitted() · 6218f96c

Committed by Catalin Marinas on Oct 26, 2017

The generic pte_access_permitted() implementation only checks for
pte_present() (together with the write permission where applicable).
However, for both kernel ptes and PROT_NONE mappings pte_present() also
returns true on arm64 even though such mappings are not user accessible.
Additionally, arm64 now supports execute-only user permission
(PROT_EXEC) which is implemented by clearing the PTE_USER bit.

With this patch the arm64 implementation of pte_access_permitted()
checks for the PTE_VALID and PTE_USER bits together with writable access
if applicable.

Cc: <stable@vger.kernel.org>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
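
The difference between the generic check and the arm64 check described in commit 6218f96c can be sketched as follows. This is a simplified model, not the kernel implementation: the bit positions are assumptions chosen for illustration, and the function names `generic_access_permitted` and `arm64_access_permitted` are hypothetical.

```python
# Assumed bit positions, for illustration only.
PTE_VALID = 1 << 0
PTE_USER = 1 << 6
PTE_WRITE = 1 << 51
PTE_PROT_NONE = 1 << 58


def pte_present(pte: int) -> bool:
    """Present covers both valid entries and PROT_NONE mappings."""
    return bool(pte & (PTE_VALID | PTE_PROT_NONE))


def generic_access_permitted(pte: int, write: bool) -> bool:
    """Generic rule: present, plus write permission where applicable."""
    return pte_present(pte) and (not write or bool(pte & PTE_WRITE))


def arm64_access_permitted(pte: int, write: bool) -> bool:
    """arm64 rule from the commit: PTE_VALID and PTE_USER, plus write."""
    valid_user = (pte & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)
    return valid_user and (not write or bool(pte & PTE_WRITE))


kernel_pte = PTE_VALID | PTE_WRITE   # kernel mapping, no PTE_USER
prot_none_pte = PTE_PROT_NONE        # PROT_NONE mapping
user_ro_pte = PTE_VALID | PTE_USER   # read-only user mapping

for name, pte in [("kernel", kernel_pte), ("prot_none", prot_none_pte),
                  ("user_ro", user_ro_pte)]:
    print(name, generic_access_permitted(pte, False),
          arm64_access_permitted(pte, False))
```

In this model the generic rule accepts the kernel and PROT_NONE entries for reads, while the arm64 rule rejects both. An execute-only user mapping, which clears PTE_USER, looks like the kernel entry here and is rejected as well.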

27 Oct, 2017 4 commits

arm64: uapi: Remove PSR_Q_BIT · d7b1d22d

Committed by Will Deacon on Oct 19, 2017

PSTATE.Q only exists for AArch32, which can be referred to using
COMPAT_PSR_Q_BIT. Remove PSR_Q_BIT, since the native bit doesn't exist
in the architecture.
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: traps: Pretty-print pstate in register dumps · b7300d4c

Committed by Will Deacon on Oct 19, 2017

We can decode the PSTATE easily enough, so pretty-print it in register
dumps.
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: traps: Don't print stack or raw PC/LR values in backtraces · a25ffd3a

Committed by Will Deacon on Oct 19, 2017

Printing raw pointer values in backtraces has potential security
implications and is of questionable value anyway.

This patch follows x86's lead and removes the "Exception stack:" dump
from kernel backtraces, as well as converting PC/LR values to symbols
such as "sysrq_handle_crash+0x20/0x30".
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: consistently log ESR and page table · 42dbf54e

Committed by Mark Rutland on Oct 19, 2017

When we take a fault we can't handle, we try to dump some relevant
information, but we're not consistent about doing so.

In do_mem_abort(), we log the full ESR, but don't dump a page table
walk. In __do_kernel_fault, we dump an attempted decoding of the ESR
(but not the ESR itself) along with a page table walk.

Let's try to make things more consistent by dumping the full ESR in
mem_abort_decode(), and having do_mem_abort dump a page table walk. The
existing dump of the ESR in do_mem_abort() is rendered redundant, and
removed.
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

25 Oct, 2017 3 commits

arm64: asm-bug: Renumber macro local labels to avoid clashes · fa3eb71d

Committed by Dave Martin on Oct 24, 2017

Currently ASM_BUG() and its constituent macros define local
assembler labels 0, 1 and 2 internally, which carries a high risk
of clash with callers' labels and consequent mis-assembly.

This patch gives the labels a big random offset to minimise the
chance of such errors.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Fix single stepping in kernel traps · 6436beee

Committed by Julien Thierry on Oct 25, 2017

Software Step exception is missing after stepping a trapped instruction.

Ensure SPSR.SS gets set to 0 after emulating/skipping a trapped instruction
before doing ERET.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[will: replaced AARCH32_INSN_SIZE with 4]
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: Use existing defines for mdscr · e28cc025

Committed by Julien Thierry on Oct 25, 2017

Literal values are being used to set single stepping in mdscr from assembly
code. There are already existing defines representing those values, use
those instead of the literal values.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

24 Oct, 2017 4 commits

arm64: Avoid aligning normal memory pointers in __memcpy_{to,from}io · 9ca255bf

Committed by Mark Salyzyn on Oct 24, 2017

__memcpy_{to,from}io fall back to byte-at-a-time copying if both the
source and destination pointers are not 8-byte aligned. Since one of the
pointers always points at normal memory, this is unnecessary and
detrimental to performance, so only do byte copying until we hit an 8-byte
boundary for the device pointer.

This change was motivated by performance issues in the pstore driver.
On a test platform, measuring probe time for pstore, console buffer
size of 1/4MB and pmsg of 1/2MB, was in the 90-107ms region. Change
managed to reduce it to 10-25ms, an improvement in boot time.

Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
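
To see why the alignment change in commit 9ca255bf matters, the copy loop can be modelled by counting accesses. This is a simplified sketch, not the kernel routine: `copy_plan` and its `align_both` switch are hypothetical, and the addresses are made up. With `align_both=True` it mimics the old behaviour of byte copying until both pointers are 8-byte aligned; with `align_both=False` only the device pointer has to be aligned.

```python
def copy_plan(device_addr: int, memory_addr: int, count: int, align_both: bool):
    """Return (byte_accesses, qword_accesses) for a modelled I/O copy."""
    byte_ops = qword_ops = 0
    # Leading byte copies until the required pointer(s) hit an 8-byte boundary.
    while count and (device_addr % 8 or (align_both and memory_addr % 8)):
        device_addr += 1
        memory_addr += 1
        count -= 1
        byte_ops += 1
    # Bulk of the transfer in 8-byte accesses.
    while count >= 8:
        device_addr += 8
        memory_addr += 8
        count -= 8
        qword_ops += 1
    # Whatever is left is copied a byte at a time.
    byte_ops += count
    return byte_ops, qword_ops


# Device pointer aligned, normal-memory pointer off by one: the two pointers
# can never be aligned at the same time.
print("old:", copy_plan(0x1000, 0x2001, 64, align_both=True))
print("new:", copy_plan(0x1000, 0x2001, 64, align_both=False))
```

When the two pointers are misaligned relative to each other, the old rule degenerates into a byte-at-a-time copy of the whole buffer, which is consistent with the pstore probe times reported in the commit message.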

Merge branch 'for-next/perf' into aarch64/for-next/core · 1e0c661f

Committed by Will Deacon on Oct 24, 2017

Merge in ARM PMU and perf updates for 4.15:

  - Support for the Statistical Profiling Extension
  - Support for Hisilicon's SoC PMU
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm/arm64: pmu: Distinguish percpu irq and percpu_devid irq · 611479c7

Committed by Julien Thierry on Oct 13, 2017

arm_pmu interrupts are marked as PERCPU even when these are not local
physical interrupts to a single CPU. When using non-local interrupts,
interrupts marked as PERCPU will not get freed nor disabled properly
by the PMU driver.

Check if interrupts are local to a single CPU with PERCPU_DEVID since
this is what the PMU driver really needs to know.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

irqdesc: Add function to identify percpu_devid irqs · 08395c7f

Committed by Julien Thierry on Oct 13, 2017

irq_is_percpu indicates whether an irq should only target a single cpu.
PERCPU_DEVID flag indicates that an irq can be configured differently on
each cpu it can target.

Provide a function to check whether an irq is PERCPU_DEVID.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>

20 Oct, 2017 7 commits

arm64: Fix the feature type for ID register fields · 5bdecb79

Committed by Suzuki K Poulose on Oct 19, 2017

Now that the ARM ARM clearly specifies the rules for inferring
the values of the ID register fields, fix the types of the
feature bits we have in the kernel.

As per ARM ARM DDI0487B.b, section D10.1.4 "Principles of the
ID scheme for fields in ID registers" lists the registers to
which the scheme applies along with the exceptions.

This patch changes the relevant feature bits from FTR_EXACT
to FTR_LOWER_SAFE to select the safer value. This will enable
an older kernel running on a new CPU to detect the safer option
rather than completely disabling the feature.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
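
The effect of switching a field from FTR_EXACT to FTR_LOWER_SAFE in commit 5bdecb79 can be sketched with a small model of how a system-wide value is chosen when two CPUs report different field values. This is an illustrative approximation, not the kernel code: `FtrType` and `safe_value` are hypothetical names, and the sample field values are made up.

```python
from enum import Enum


class FtrType(Enum):
    EXACT = "exact"              # any mismatch falls back to a fixed safe value
    LOWER_SAFE = "lower_safe"    # the smaller of the two values is safe
    HIGHER_SAFE = "higher_safe"  # the larger of the two values is safe


def safe_value(ftr_type: FtrType, cur: int, new: int, safe_val: int = 0) -> int:
    """Pick the system-wide value for one ID register field (modelled)."""
    if cur == new:
        return cur
    if ftr_type is FtrType.EXACT:
        return safe_val
    if ftr_type is FtrType.LOWER_SAFE:
        return min(cur, new)
    return max(cur, new)


# Hypothetical case: the kernel knows field value 1, a newer CPU reports 2.
print("EXACT:     ", safe_value(FtrType.EXACT, cur=1, new=2))
print("LOWER_SAFE:", safe_value(FtrType.LOWER_SAFE, cur=1, new=2))
```

With the exact rule a mismatch collapses to the fallback value, which in this model means the feature is treated as absent. With the lower-safe rule the system keeps the smaller value both CPUs support.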

arm64: MAINTAINERS: hisi: Add HiSilicon SoC PMU support · 07141342

Committed by Shaokun Zhang on Oct 19, 2017

Add support for the HiSilicon SoC uncore PMU driver.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

perf: hisi: Add support for HiSilicon SoC DDRC PMU driver · 904dcf03

Committed by Shaokun Zhang on Oct 19, 2017

This patch adds support for the DDRC PMU driver in HiSilicon SoC chips. Each
DDRC has its own control, counter and interrupt registers and is a separate
PMU. Each DDRC PMU has 8 fixed-purpose counters which have been mapped to
8 events by hardware, so the driver assumes that the counter index is equal
to the event code (0 - 7). An interrupt is supported to handle counter
(32-bit) overflow.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Anurup M <anurup.m@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
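
Handling overflow of a fixed-width hardware counter, as mentioned in commit 904dcf03, usually comes down to computing the delta between two raw readings modulo the counter width. The sketch below shows that general technique. It is not taken from the driver, and `counter_delta` is a hypothetical name.

```python
def counter_delta(prev: int, new: int, bits: int) -> int:
    """Events counted between two raw readings of a wrapping counter."""
    mask = (1 << bits) - 1
    # Subtract and wrap to the counter width, so a reading that has
    # rolled over past zero still yields the correct positive delta.
    return (new - prev) & mask


# A 32-bit counter that wrapped between the two reads.
print(hex(counter_delta(0xFFFF_FFF0, 0x0000_0010, bits=32)))
# The same helper with a 48-bit width.
print(counter_delta((1 << 48) - 1, 0, bits=48))
```

This only gives the right answer if the counter wraps at most once between reads, which is one reason an overflow interrupt is useful.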

perf: hisi: Add support for HiSilicon SoC HHA PMU driver · 2bab3cf9

Committed by Shaokun Zhang on Oct 19, 2017

L3 cache coherence is maintained by the Hydra Home Agent (HHA) in HiSilicon
SoCs. This patch adds support for the HHA PMU driver. Each HHA has its own
control, counter and interrupt registers and is a separate PMU. Each HHA
PMU has 16 programmable counters and each counter is free-running. An
interrupt is supported to handle counter (48-bit) overflow.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Anurup M <anurup.m@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

perf: hisi: Add support for HiSilicon SoC L3C PMU driver · 2940bc43

Committed by Shaokun Zhang on Oct 19, 2017

This patch adds support for the L3C PMU driver in HiSilicon SoC chips. Each
L3C has its own control, counter and interrupt registers and is a separate
PMU. Each L3C PMU has 8 programmable counters and each counter is
free-running. An interrupt is supported to handle counter (48-bit)
overflow.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Anurup M <anurup.m@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

perf: hisi: Add support for HiSilicon SoC uncore PMU driver · 6ce4ef94

Committed by Shaokun Zhang on Oct 19, 2017

This patch adds support for the HiSilicon SoC uncore PMU driver framework
and interfaces.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Anurup M <anurup.m@huawei.com>
[will: Fix leader accounting in uncore group validation]
Signed-off-by: Will Deacon <will.deacon@arm.com>

Documentation: perf: hisi: Documentation for HiSilicon SoC PMU driver · 3125b5b2

Committed by Shaokun Zhang on Oct 19, 2017

This patch adds documentation for the uncore PMUs on HiSilicon SoC.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Anurup M <anurup.m@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

19 Oct, 2017 1 commit

arm64: Update fault_info table with new exception types · 3f7c86b2

Committed by Julien Thierry on Oct 17, 2017

Based on: ARM Architecture Reference Manual, ARMv8 (DDI 0487B.b).

ARMv8.1 introduces the optional feature ARMv8.1-TTHM which can trigger a
new type of memory abort. This exception is triggered when hardware update
of page table flags is not atomic in regards to other memory accesses.
Replace the corresponding unknown entry with a more accurate one.

Cf: Section D10.2.28 ESR_ELx, Exception Syndrome Register (p D10-2381),
section D4.4.11 Restriction on memory types for hardware updates on page
tables (p D4-2116 - D4-2117).

ARMv8.2 does not add new exception types, however it is worth mentioning
that when obligatory feature RAS (optional for ARMv8.{0,1}) is implemented,
exceptions related to "Synchronous parity or ECC error on memory access,
not on translation table walk" become reserved and should not occur.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

18 Oct, 2017 7 commits

drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension · d5d9696b

Committed by Will Deacon on Sep 22, 2016

The ARMv8.2 architecture introduces the optional Statistical Profiling
Extension (SPE).

SPE can be used to profile a population of operations in the CPU pipeline
after instruction decode. These are either architected instructions (i.e.
a dynamic instruction trace) or CPU-specific uops and the choice is fixed
statically in the hardware and advertised to userspace via caps/. Sampling
is controlled using a sampling interval, similar to a regular PMU counter,
but also with an optional random perturbation to avoid falling into patterns
where you continuously profile the same instruction in a hot loop.

After each operation is decoded, the interval counter is decremented. When
it hits zero, an operation is chosen for profiling and tracked within the
pipeline until it retires. Along the way, information such as TLB lookups,
cache misses, time spent to issue etc is captured in the form of a sample.
The sample is then filtered according to certain criteria (e.g. load
latency) that can be specified in the event config (described under
format/) and, if the sample satisfies the filter, it is written out to
memory as a record, otherwise it is discarded. Only one operation can
be sampled at a time.

The in-memory buffer is linear and virtually addressed, raising an
interrupt when it fills up. The PMU driver handles these interrupts to
give the appearance of a ring buffer, as expected by the AUX code.

The in-memory trace-like format is self-describing (though not parseable
in reverse) and written as a series of records, with each record
corresponding to a sample and consisting of a sequence of packets. These
packets are defined by the architecture, although some have CPU-specific
fields for recording information specific to the microarchitecture.

As a simple example, a record generated for a branch instruction may
consist of the following packets:

  0 (Address) : Virtual PC of the branch instruction
  1 (Type)    : Conditional direct branch
  2 (Counter) : Number of cycles taken from Dispatch to Issue
  3 (Address) : Virtual branch target + condition flags
  4 (Counter) : Number of cycles taken from Dispatch to Complete
  5 (Events)  : Mispredicted as not-taken
  6 (END)     : End of record

It is also possible to toggle properties such as timestamp packets in
each record.

This patch adds support for SPE in the form of a new perf driver.

Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
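
The interval-based selection described in commit d5d9696b can be sketched with a toy model: a counter is decremented for every decoded operation, and the operation that takes it to zero is chosen for profiling. This is a deliberately simplified illustration. It ignores the optional random perturbation, the filtering step and sample collisions, and `sampled_indices` is a hypothetical name.

```python
def sampled_indices(num_ops: int, interval: int) -> list:
    """Indices of the operations chosen for profiling (modelled)."""
    counter = interval
    picked = []
    for index in range(num_ops):
        # The interval counter is decremented after each decoded operation.
        counter -= 1
        if counter == 0:
            # This operation is selected; reload the counter for the next one.
            picked.append(index)
            counter = interval
    return picked


# Ten decoded operations with a sampling interval of four.
print(sampled_indices(10, 4))
```

With a fixed interval the same positions in a tight loop are picked on every pass, which is the pattern the optional random perturbation is meant to break.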

dt-bindings: Document devicetree binding for ARM SPE · 4b8b77a4

Committed by Will Deacon on Sep 22, 2016

This patch documents the devicetree binding in use for ARM SPE.

Cc: Rob Herring <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: head: Init PMSCR_EL2.{PA,PCT} when entered at EL2 without VHE · b0c57e10

Committed by Will Deacon on Jul 07, 2017

When booting at EL2, ensure that we permit the EL1 host to sample
physical addresses and physical counter values using SPE.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: sysreg: Move SPE registers and PSB into common header files · a173c390

Committed by Will Deacon on Sep 20, 2017

SPE is part of the v8.2 architecture, so move its system register and
field definitions into sysreg.h and the new PSB barrier into barrier.h.

Finally, move KVM over to using the generic definitions so that it
doesn't have to open-code its own versions.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

perf/core: Add PERF_AUX_FLAG_COLLISION to report colliding samples · 085b3062

Committed by Will Deacon on Sep 23, 2016

The ARM SPE architecture permits an implementation to ignore a sample
if the sample is due to be taken whilst another sample is already being
produced. In this case, it is desirable to report the collision to
userspace, as they may want to lower the sample period.

This patch adds a PERF_AUX_FLAG_COLLISION flag, so that such events can
be relayed to userspace.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

perf/core: Export AUX buffer helpers to modules · bc1d2020

Committed by Will Deacon on Aug 16, 2016

Perf PMU drivers using AUX buffers cannot be built as modules unless
the AUX helpers are exported.

This patch exports perf_aux_output_{begin,end,skip} and perf_get_aux to
modules.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

genirq: export irq_get_percpu_devid_partition to modules · 5ffeb050

Committed by Will Deacon on Jul 25, 2016

Any modular driver using cluster-affine PPIs needs to be able to call
irq_get_percpu_devid_partition so that it can enable the IRQ on the
correct subset of CPUs.

This patch exports the symbol so that it can be called from within a
module.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>

17 Oct, 2017 1 commit

Merge tag 'acpi/iort-for-v4.15' of... · 0515ce0f

Committed by Will Deacon on Oct 17, 2017

Merge tag 'acpi/iort-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux into aarch64/for-next/core

Pull arm64 ACPI IORT updates from Lorenzo Pieralisi:

- Code clean-ups (A.Yadav, L.Pieralisi)
- Platform devices initialization rework in preparation for IORT PMCG
  handling (L.Pieralisi)
- Mapping API rework to enable MSIs for IORT components as defined in
  IORT specification issue C (H.Guo, L.Pieralisi)
Signed-off-by: Will Deacon <will.deacon@arm.com>

openanolis / cloud-kernel synced successfully nearly 2 years ago
