- 27 December 2019, 40 commits
-
Committed by Will Deacon
mainline inclusion from mainline-v4.20-rc1 commit 8f04e8e6 category: feature bugzilla: 20806 CVE: NA ------------------------------------------------- On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD state without needing to call into firmware. This patch hooks into the existing SSBD infrastructure so that SSBS is used on CPUs that support it, but it's all made horribly complicated by the very real possibility of big/little systems that don't uniformly provide the new capability. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Conflicts: arch/arm64/kernel/process.c arch/arm64/kernel/ssbd.c arch/arm64/kernel/cpufeature.c [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
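On the userspace side, this SSBD machinery is the one reached through the speculation-control prctl(); the sketch below is an illustrative standalone program (not part of the patch) showing a task querying and requesting the Speculative Store Bypass mitigation, which an SSBS-capable CPU can now satisfy by toggling PSTATE.SSBS instead of trapping to firmware.

```c
/* Illustrative only: query/request the per-task Store Bypass mitigation via prctl(). */
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SPEC_STORE_BYPASS		/* provided by <linux/prctl.h> on recent systems */
#define PR_GET_SPECULATION_CTRL	52
#define PR_SET_SPECULATION_CTRL	53
#define PR_SPEC_STORE_BYPASS	0
#define PR_SPEC_DISABLE		(1UL << 2)
#endif

int main(void)
{
	long state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);

	printf("speculative store bypass state: 0x%lx\n", state);

	/* PR_SPEC_DISABLE = disable store bypass, i.e. enable the mitigation. */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0))
		perror("prctl(PR_SET_SPECULATION_CTRL)");
	return 0;
}
```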
-
Committed by Will Deacon
mainline inclusion from mainline-v4.20-rc1 commit d71be2b6 category: feature bugzilla: 20806 CVE: NA ------------------------------------------------- Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass Safe (SSBS) which can be used as a mitigation against Spectre variant 4. Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS directly, so that userspace can toggle the SSBS control without trapping to the kernel. This patch probes for the existence of SSBS and advertise the new instructions to userspace if they exist. Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Conflicts: arch/arm64/kernel/cpufeature.c arch/arm64/include/asm/cpucaps.h [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
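Once this probing is in place, userspace can tell whether it may toggle PSTATE.SSBS directly by checking the corresponding hwcap; a minimal check (HWCAP_SSBS is bit 28 in recent arm64 headers, defined here in case the installed headers predate it):

```c
#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP_SSBS
#define HWCAP_SSBS (1 << 28)	/* assumption: bit used by recent arm64 uapi headers */
#endif

int main(void)
{
	unsigned long hwcap = getauxval(AT_HWCAP);

	if (hwcap & HWCAP_SSBS)
		printf("PSTATE.SSBS can be toggled from userspace (MSR/MRS SSBS)\n");
	else
		printf("SSBS not advertised; fall back to the prctl() interface\n");
	return 0;
}
```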
-
Committed by Will Deacon
mainline inclusion from mainline-v4.20-rc1 commit ca7f686a category: feature bugzilla: 20806 CVE: NA ------------------------------------------------- I was passing through and figured I'd fix this up: featuer -> feature Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Nathan Chancellor
mainline inclusion from mainline-5.0 commit 0738c8b5 category: bugfix bugzilla: 11024 CVE: NA ------------------------------------------------- After commit cc9f8349 ("arm64: crypto: add NEON accelerated XOR implementation"), Clang builds for arm64 started failing with the following error message. arch/arm64/lib/xor-neon.c:58:28: error: incompatible pointer types assigning to 'const unsigned long *' from 'uint64_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types] v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6)); ^~~~~~~~ /usr/lib/llvm-9/lib/clang/9.0.0/include/arm_neon.h:7538:47: note: expanded from macro 'vld1q_u64' __ret = (uint64x2_t) __builtin_neon_vld1q_v(__p0, 51); \ ^~~~ There has been quite a bit of debate and triage that has gone into figuring out what the proper fix is, viewable at the link below, which is still ongoing. Ard suggested disabling this warning with Clang with a pragma so no neon code will have this type of error. While this is not at all an ideal solution, this build error is the only thing preventing KernelCI from having successful arm64 defconfig and allmodconfig builds on linux-next. Getting continuous integration running is more important so new warnings/errors or boot failures can be caught and fixed quickly. Link: https://github.com/ClangBuiltLinux/linux/issues/283 Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> (cherry picked from commit 0738c8b5) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
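The pragma technique itself is generic; a minimal sketch of the approach (not the exact hunk that was applied) is simply:

```c
/* Silence Clang's pointer-type complaints for the NEON intrinsic code only. */
#ifdef __clang__
#pragma clang diagnostic ignored "-Wincompatible-pointer-types"
#endif

#include <arm_neon.h>
```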
-
Committed by Jackie Liu
mainline inclusion from mainline-5.0-rc1 commit: cc9f8349 category: feature feature: NEON accelerated XOR bugzilla: 11024 CVE: NA -------------------------------------------------- This is a NEON acceleration method that can improve performance by approximately 20%. I got the following data from the centos 7.5 on Huawei's HISI1616 chip: [ 93.837726] xor: measuring software checksum speed [ 93.874039] 8regs : 7123.200 MB/sec [ 93.914038] 32regs : 7180.300 MB/sec [ 93.954043] arm64_neon: 9856.000 MB/sec [ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec) I believe this code can bring some optimization for all arm64 platform. thanks for Ard Biesheuvel's suggestions. Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
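The core of such a routine is just a load/veor/store loop over 128-bit vectors; here is a userspace-flavoured sketch using the same intrinsics (illustrative only, not the kernel's xor-neon.c, which unrolls further and plugs into the xor template benchmarking shown above):

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* XOR p2[] into p1[], 16 bytes at a time; bytes must be a multiple of 16. */
static void xor_blocks_neon(size_t bytes, uint64_t *p1, const uint64_t *p2)
{
	size_t lines = bytes / 16;

	while (lines--) {
		uint64x2_t a = vld1q_u64(p1);
		uint64x2_t b = vld1q_u64(p2);

		vst1q_u64(p1, veorq_u64(a, b));
		p1 += 2;
		p2 += 2;
	}
}
```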
-
Committed by Jackie Liu
mainline inclusion from mainline-5.0-rc1 commit: 21e28547 category: feature feature: NEON accelerated XOR bugzilla: 11024 CVE: NA -------------------------------------------------- In a way similar to ARM commit 09096f6a ("ARM: 7822/1: add workaround for ambiguous C99 stdint.h types"), this patch redefines the macros that are used in stdint.h so its definitions of uint64_t and int64_t are compatible with those of the kernel. This patch comes from: https://patchwork.kernel.org/patch/3540001/ Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org> We mark this file as a private file and don't have to override asm/types.h Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
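The trick being described is to override the predefined macros that the compiler's <stdint.h> builds its 64-bit types from, before <arm_neon.h> pulls them in, so that uint64_t/int64_t match the kernel's long long based u64/s64. A sketch of the idea (not the exact private header):

```c
/*
 * Redefine the predefined macros <stdint.h> uses for its 64-bit types so
 * they agree with the kernel's definitions before the NEON header is seen.
 */
#ifdef __INT64_TYPE__
#undef __INT64_TYPE__
#define __INT64_TYPE__		long long
#endif

#ifdef __UINT64_TYPE__
#undef __UINT64_TYPE__
#define __UINT64_TYPE__		unsigned long long
#endif

#include <arm_neon.h>
```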
-
Committed by Ard Biesheuvel
mainline inclusion from mainline-4.20-rc1 commit: 86d0dd34 category: feature feature: accelerated crc32 routines bugzilla: 13702 CVE: NA -------------------------------------------------- Add a CRC32 feature bit and wire it up to the CPU id register so we will be able to use alternatives patching for CRC32 operations. Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Conflicts: arch/arm64/include/asm/cpucaps.h arch/arm64/kernel/cpufeature.c Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
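Userspace sees the same capability as HWCAP_CRC32 and can reach the instructions through the ACLE intrinsics; a small detect-then-use sketch (illustrative, and assuming the toolchain is invoked with -march=armv8-a+crc so <arm_acle.h> exposes the CRC intrinsics):

```c
/* build: gcc -O2 -march=armv8-a+crc crc_demo.c */
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>
#include <arm_acle.h>

#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1 << 7)	/* assumption: arm64 hwcap bit for the CRC32 instructions */
#endif

int main(void)
{
	uint32_t crc = ~0u;

	if (!(getauxval(AT_HWCAP) & HWCAP_CRC32)) {
		puts("CRC32 instructions not advertised; use a software fallback");
		return 0;
	}

	crc = __crc32cd(crc, 0x1234567890abcdefULL);	/* CRC32CX */
	printf("crc32c = 0x%x\n", ~crc);
	return 0;
}
```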
-
Committed by Jeremy Linton
mainline inclusion from mainline-5.3-rc1 commit d24a0c70 category: feature bugzilla: 16072 CVE: NA --------------------------- ACPI 6.3 adds additional fields to the MADT GICC structure to describe SPE PPI's. We pick these out of the cached reference to the madt_gicc structure similarly to the core PMU code. We then create a platform device referring to the IRQ and let the user/module loader decide whether to load the SPE driver. Tested-by: NHanjun Gou <gouhanjun@huawei.com> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Reviewed-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: NJeremy Linton <jeremy.linton@arm.com> Signed-off-by: NHongbo Yao <yaohongbo@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Hongbo Yao
hulk inclusion category: feature bugzilla: 16072 CVE: NA --------------------------- This reverts commit 556b16f5ad7e910c3784bb02b33c2af6ca9c9a4b. In Linux 5.3.0, SPE ACPI enablement has been upstreamed. SPE patches in hulk-4.19 are the old version, and they need to be reverted to the mainline version. Signed-off-by: NHongbo Yao <yaohongbo@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-5.2 commit: 01d57485 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Since commit 3d65b6bb ("arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to perform more than PTRS_PER_PTE invalidation instructions in a single call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather code does not ensure that the end address of the range is rounded-up to the stride when freeing intermediate page tables in pXX_free_tlb(), which defeats our range checking. Align the bounds passed into __flush_tlb_range(). Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Reported-by: NHanjun Guo <guohanjun@huawei.com> Tested-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.21 commit: 3d65b6bb category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- In order to reduce the possibility of soft lock-ups, we bound the maximum number of TLBI operations performed by a single call to flush_tlb_range() to an arbitrary constant of 1024. Whilst this does the job of avoiding lock-ups, we can actually be a bit smarter by defining this as PTRS_PER_PTE. Due to the structure of our page tables, using PTRS_PER_PTE means that an outer loop calling flush_tlb_range() for entire table entries will end up performing just a single TLBI operation for each entry. As an example, mremap()ing a 1GB range mapped using 4k pages now requires only 512 TLBI operations when moving the page tables as opposed to 262144 operations (512*512) when using the current threshold of 1024. Cc: Joel Fernandes <joel@joelfernandes.org> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
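The heuristic is easy to state in code: if invalidating a range at the given stride would take more than MAX_TLBI_OPS (now PTRS_PER_PTE) operations, it is cheaper to nuke the whole ASID. A standalone sketch of that arithmetic for a 4k-page configuration (illustrative constants, not kernel code):

```c
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PTRS_PER_PTE	512UL		/* 4k pages */
#define MAX_TLBI_OPS	PTRS_PER_PTE

/* Number of TLBI operations __flush_tlb_range()-style code would issue. */
static unsigned long tlbi_ops(unsigned long start, unsigned long end,
			      unsigned long stride)
{
	unsigned long entries = (end - start) / stride;

	if (entries > MAX_TLBI_OPS)
		return 1;	/* over the limit: one ASID-wide invalidation instead */
	return entries;		/* otherwise one TLBI per stride-sized chunk */
}

int main(void)
{
	/* Invalidate 1GB: per 4k page this would be 262144 ops, so the ASID is
	 * flushed instead; per 2MB table entry it is 512 ops, under the limit. */
	printf("per-page:  %lu op(s)\n", tlbi_ops(0, 1UL << 30, PAGE_SIZE));
	printf("per-table: %lu op(s)\n", tlbi_ops(0, 1UL << 30, 1UL << 21));
	return 0;
}
```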
-
Committed by Alex Van Brunt
mainline inclusion from mainline-4.21 commit: 3403e56b category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- When transitioning a PTE from young to old as part of page aging, we can avoid waiting for the TLB invalidation to complete and therefore drop the subsequent DSB instruction. Whilst this opens up a race with page reclaim, where a PTE in active use via a stale, young TLB entry does not update the underlying descriptor, the worst thing that happens is that the page is reclaimed and then immediately faulted back in. Given that we have a DSB in our context-switch path, the window for a spurious reclaim is fairly limited and eliding the barrier claims to boost NVMe/SSD accesses by over 10% on some platforms. A similar optimisation was made for x86 in commit b13b1d2d ("x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB"). Signed-off-by: NAlex Van Brunt <avanbrunt@nvidia.com> Signed-off-by: NAshish Mhetre <amhetre@nvidia.com> [will: rewrote patch] Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 7f088727 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Peter Z asked me to justify the barrier usage in asm/tlbflush.h, but actually that whole block comment needs to be rewritten. Reported-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: ace8cb75 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- By selecting HAVE_RCU_TABLE_INVALIDATE, we can rely on tlb_flush() being called if we fail to batch table pages for freeing. This in turn allows us to postpone walk-cache invalidation until tlb_finish_mmu(), which avoids lots of unnecessary DSBs and means we can shoot down the ASID if the range is large enough. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: f270ab88 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Now that the core mmu_gather code keeps track of both the levels of page table cleared and also whether or not these entries correspond to intermediate entries, we can use this in our tlb_flush() callback to reduce the number of invalidations we issue as well as their scope. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 07212cd4 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- If there's one thing the RCU-based table freeing doesn't need, it's more ifdeffery. Remove the redundant !CONFIG_HAVE_RCU_TABLE_FREE code, since this option is unconditionally selected in our Kconfig. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 67a902ac category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- When we are unmapping intermediate page-table entries or huge pages, we don't need to issue a TLBI instruction for every PAGE_SIZE chunk in the VA range being unmapped. Allow the invalidation stride to be passed to __flush_tlb_range(), and adjust our "just nuke the ASID" heuristic to take this into account. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: d8289d3a category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Add a comment to explain why we can't get away with last-level invalidation in flush_tlb_range() Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 0795edaf category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Now that our walk-cache invalidation routines imply a DSB before the invalidation, we no longer need one when we are clearing an entry during unmap. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 45a284bc category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- __flush_tlb_[kernel_]pgtable() rely on set_pXd() having a DSB after writing the new table entry and therefore avoid the barrier prior to the TLBI instruction. In preparation for delaying our walk-cache invalidation on the unmap() path, move the DSB into the TLB invalidation routines. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 6899a4c8 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- flush_tlb_kernel_range() is only ever used to invalidate last-level entries, so we can restrict the scope of the TLB invalidation instruction. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
commit 147b9635 upstream. If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have their architecturally maximum values, which defeats the use of FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous machines. Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively saturate at zero. Fixes: 3c739b57 ("arm64: Keep track of CPU feature registers") Cc: <stable@vger.kernel.org> # 4.4.x- Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Acked-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
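Said another way, the new policy must saturate at zero when combining values across CPUs: a report of 0 means "architectural maximum", so it has to win over any explicit value. A small standalone sketch of that combination rule:

```c
#include <stdio.h>

/* FTR_HIGHER_OR_ZERO_SAFE-style combination of one ID register field. */
static unsigned int higher_or_zero_safe(unsigned int newval, unsigned int cur)
{
	if (newval == 0 || cur == 0)
		return 0;			/* 0 means "architectural maximum" */
	return newval > cur ? newval : cur;	/* otherwise the higher value is safe */
}

int main(void)
{
	printf("%u\n", higher_or_zero_safe(4, 3));	/* 4 */
	printf("%u\n", higher_or_zero_safe(0, 3));	/* 0: saturate at zero */
	return 0;
}
```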
-
Committed by Will Deacon
commit 24951465 upstream. arch/arm/ defines a SIGMINSTKSZ of 2k, so we should use the same value for compat tasks. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: NDave Martin <Dave.Martin@arm.com> Reported-by: NSteve McIntyre <steve.mcintyre@arm.com> Tested-by: NSteve McIntyre <93sam@debian.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by James Morse
[ Upstream commit 2b68a2a9 ] The ESB-instruction is a nop on CPUs that don't implement the RAS extensions. This lets us use it in places like the vectors without having to use alternatives. If someone disables CONFIG_ARM64_RAS_EXTN, this instruction still has its RAS extensions behaviour, but we no longer read DISR_EL1 as this register does depend on alternatives. This could go wrong if we want to synchronize an SError from a KVM guest. On a CPU that has the RAS extensions, but the KConfig option was disabled, we consume the pending SError with no chance of ever reading it. Hide the ESB-instruction behind the CONFIG_ARM64_RAS_EXTN option, outputting a regular nop if the feature has been disabled. Reported-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Shaokun Zhang
mainline inclusion from mainline-v5.2 commit: 7b8c87b2 category: performance bugzilla: NA CVE: NA -------------------------------------------------- Add coherency_max_size variable to record the maximum cache line size. cache_line_size is derived from CTR_EL0.CWG field and is called mostly for I/O device drivers. For some platforms like the HiSilicon Kunpeng920 server SoC, cache line sizes are different between L1/2 cache and L3 cache while L1 cache line size is 64-byte and L3 is 128-byte, but CTR_EL0.CWG is misreporting using L1 cache line size. We shall correct the right value which is important for I/O performance. Let's update the cache line size if it is detected from DT or PPTT information. Cc: Will Deacon <will.deacon@arm.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Zhenfa Qiu <qiuzhenfa@hisilicon.com> Reported-by: NZhenfa Qiu <qiuzhenfa@hisilicon.com> Suggested-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
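Since Linux normally leaves CTR_EL0 readable from EL0, the CWG-derived value that cache_line_size() starts from is easy to inspect; a small sketch (assumes an arm64 host and that CTR_EL0 access is not trapped):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t ctr;
	unsigned int cwg, cwg_bytes;

	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));

	cwg = (ctr >> 24) & 0xf;		/* Cache Writeback Granule field */
	cwg_bytes = cwg ? 4U << cwg : 128;	/* 0 means "not provided": assume the maximum */

	printf("CTR_EL0.CWG reports a %u-byte granule\n", cwg_bytes);
	printf("compare with /sys/devices/system/cpu/cpu0/cache/*/coherency_line_size\n");
	return 0;
}
```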
-
Committed by Jean-Philippe Brucker
commit c5e2edeb upstream. GCC 8.1.0 reports that the ldadd instruction encoding, recently added to insn.c, doesn't match the mask and couldn't possibly be identified: linux/arch/arm64/include/asm/insn.h: In function 'aarch64_insn_is_ldadd': linux/arch/arm64/include/asm/insn.h:280:257: warning: bitwise comparison always evaluates to false [-Wtautological-compare] Bits [31:30] normally encode the size of the instruction (1 to 8 bytes) and the current instruction value only encodes the 4- and 8-byte variants. At the moment only the BPF JIT needs this instruction, and doesn't require the 1- and 2-byte variants, but to be consistent with our other ldr and str instruction encodings, clear the size field in the insn value. Fixes: 34b8ab09 ("bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd") Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Reported-by: NKuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: NYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Daniel Borkmann
commit 34b8ab09 upstream. Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016, lets add support for STADD and use that in favor of LDXR / STXR loop for the XADD mapping if available. STADD is encoded as an alias for LDADD with XZR as the destination register, therefore add LDADD to the instruction encoder along with STADD as special case and use it in the JIT for CPUs that advertise LSE atomics in CPUID register. If immediate offset in the BPF XADD insn is 0, then use dst register directly instead of temporary one. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
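The same LDXR/STXR-versus-STADD trade-off is visible from plain C: when the compiler may assume LSE atomics, a fetch-add whose result is ignored typically becomes a single STADD (LDADD with XZR as the destination). A tiny sketch for comparing the generated code:

```c
/* gcc -O2 -march=armv8.1-a -S xadd.c  -> typically a single stadd
 * gcc -O2 -march=armv8-a   -S xadd.c  -> an ldxr/stxr retry loop   */
#include <stdint.h>

void xadd32(uint32_t *ptr, uint32_t val)
{
	__atomic_fetch_add(ptr, val, __ATOMIC_RELAXED);
}
```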
-
Committed by Will Deacon
commit 8e4e0ac0 upstream. Returning an error code from futex_atomic_cmpxchg_inatomic() indicates that the caller should not make any use of *uval, and should instead act upon on the value of the error code. Although this is implemented correctly in our futex code, we needlessly copy uninitialised stack to *uval in the error case, which can easily be avoided. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Sami Tolvanen
[ Upstream commit 0e358bd7 ] Although a syscall defined using SYSCALL_DEFINE0 doesn't accept parameters, use the correct function type to avoid indirect call type mismatches with Control-Flow Integrity checking. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Sami Tolvanen
[ Upstream commit 8ef8f368 ] Syscall wrappers in <asm/syscall_wrapper.h> use const struct pt_regs * as the argument type. Use const in syscall_fn_t as well to fix indirect call type mismatches with Control-Flow Integrity checking. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
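The const qualifier matters because CFI compares the type of the function pointer used at the call site with the type of the target; every entry reached through the syscall table has to be declared with exactly the same signature. A reduced standalone sketch of the idea (stand-in types, not the kernel's wrappers):

```c
struct pt_regs { unsigned long regs[31]; };	/* stand-in for the kernel's pt_regs */

typedef long (*syscall_fn_t)(const struct pt_regs *regs);

/* Even a zero-argument syscall keeps this signature, so the indirect call
 * through syscall_fn_t matches under Control-Flow Integrity checking. */
static long sys_example(const struct pt_regs *regs)
{
	(void)regs;
	return 0;
}

static const syscall_fn_t syscall_table[] = { sys_example };

long invoke_syscall(unsigned int nr, const struct pt_regs *regs)
{
	if (nr >= sizeof(syscall_table) / sizeof(syscall_table[0]))
		return -38;	/* -ENOSYS */
	return syscall_table[nr](regs);
}
```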
-
Committed by Xiongfeng Wang
hulk inclusion category: bugfix bugzilla: 16572 CVE: NA ------------------------------------------------- When I compile the kernel with CONFIG_SDEI_WATCHDOG enabled and without CONFIG_HARDLOCKUP_DETECTOR, I get the following error. ./include/linux/nmi.h: In function 'touch_nmi_watchdog': ./include/linux/nmi.h:141:2: error: implicit declaration of function 'arch_touch_nmi_watchdog' [-Werror=implicit-function-declaration] arch_touch_nmi_watchdog(); ^~~~~~~~~~~~~~~~~~~~~~~ This is because CONFIG_SDEI_WATCHDOG selects HAVE_NMI_WATCHDOG, but arch_touch_nmi_watchdog is not provided for ARM64 in 'asm/nmi.h'. This patch implements arch_touch_nmi_watchdog() for ARM64 when CONFIG_HARDLOCKUP_DETECTOR is not chosen. Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Vincenzo Frascino
mainline inclusion from mainline-5.2 commit 81fb8736 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- clock_getres() in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; where 'hrtimer_resolution' depends on whether or not high resolution timers are enabled, which is a runtime decision. The vDSO incorrectly returns the constant CLOCK_REALTIME_RES. Fix this by exposing 'hrtimer_resolution' in the vDSO datapage and returning that instead. Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com> [will: Use WRITE_ONCE(), move adr off COARSE path, renumber labels, use 'w' reg] Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
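The user-visible effect is that clock_getres() from the vDSO now matches the syscall: with high-resolution timers enabled both report 1 ns, rather than the vDSO returning a compile-time constant. A quick check from userspace:

```c
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec res;

	if (clock_getres(CLOCK_MONOTONIC, &res)) {
		perror("clock_getres");
		return 1;
	}

	/* With hrtimers this is 1 ns; with hrtimers disabled it is the tick period. */
	printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n",
	       (long)res.tv_sec, res.tv_nsec);
	return 0;
}
```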
-
Committed by Qian Cai
[ Upstream commit 74dd022f ] When building with -Wunused-but-set-variable, the compiler shouts about a number of pte_unmap() users, since this expands to an empty macro on arm64: | mm/gup.c: In function 'gup_pte_range': | mm/gup.c:1727:16: warning: variable 'ptem' set but not used | [-Wunused-but-set-variable] | mm/gup.c: At top level: | mm/memory.c: In function 'copy_pte_range': | mm/memory.c:821:24: warning: variable 'orig_dst_pte' set but not used | [-Wunused-but-set-variable] | mm/memory.c:821:9: warning: variable 'orig_src_pte' set but not used | [-Wunused-but-set-variable] | mm/swap_state.c: In function 'swap_ra_info': | mm/swap_state.c:641:15: warning: variable 'orig_pte' set but not used | [-Wunused-but-set-variable] | mm/madvise.c: In function 'madvise_free_pte_range': | mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used | [-Wunused-but-set-variable] Rewrite pte_unmap() as a static inline function, which silences the warnings. Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
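The underlying trick is generic: an empty function-like macro makes its argument vanish before the compiler sees a use, while a static inline function always counts as a use. A standalone illustration of the warning and the fix (stand-in pte type, not the arm64 header):

```c
/* build: gcc -O2 -Wunused-but-set-variable -c pte_unmap_demo.c */
typedef struct { unsigned long pte; } pte_t;

#define pte_unmap_as_macro(pte)	do { } while (0)	/* argument never "used" */

static inline void pte_unmap_as_inline(pte_t *pte)	/* argument is always used */
{
	(void)pte;
}

int demo(pte_t *base)
{
	pte_t *ptem = base;		/* warns if only the macro version is called ... */

	pte_unmap_as_inline(ptem);	/* ... but not when pte_unmap is a real function */
	return 0;
}
```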
-
Committed by Will Deacon
commit 969f5ea6 upstream. Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum that can prevent interrupts from being taken when single-stepping. This patch implements a software workaround to prevent userspace from effectively being able to disable interrupts. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: arch/arm64/include/asm/cpucaps.h [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-5.2 commit 75a19a02 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- When executing clock_gettime(), either in the vDSO or via a system call, we need to ensure that the read of the counter register occurs within the seqlock reader critical section. This ensures that updates to the clocksource parameters (e.g. the multiplier) are consistent with the counter value and therefore avoids the situation where time appears to go backwards across multiple reads. Extend the vDSO logic so that the seqlock critical section covers the read of the counter register as well as accesses to the data page. Since reads of the counter system registers are not ordered by memory barrier instructions, introduce dependency ordering from the counter read to a subsequent memory access so that the seqlock memory barriers apply to the counter access in both the vDSO and the system call paths. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Tested-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/ Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
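What the message describes is the standard seqcount retry loop with the counter read pulled inside it; a simplified reader sketch in portable C11 (generic atomics and a stand-in counter read in place of the vDSO's barriers and CNTVCT_EL0 access):

```c
#include <stdatomic.h>
#include <stdint.h>

struct vdso_data_sketch {
	atomic_uint seq;	/* odd while the kernel is updating the parameters */
	uint64_t    mult;
	uint64_t    epoch_ns;
	uint64_t    epoch_cycles;
};

extern uint64_t read_counter(void);	/* e.g. CNTVCT_EL0; assumed to be provided */

uint64_t gettime_ns(const struct vdso_data_sketch *vd)
{
	uint64_t cycles, mult, epoch_ns, epoch_cycles;
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&vd->seq, memory_order_acquire);
		/* The counter *and* the clocksource parameters are read inside
		 * the critical section, so they are consistent with each other. */
		cycles       = read_counter();
		mult         = vd->mult;
		epoch_ns     = vd->epoch_ns;
		epoch_cycles = vd->epoch_cycles;
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 seq != atomic_load_explicit(&vd->seq, memory_order_relaxed));

	return epoch_ns + (cycles - epoch_cycles) * mult;
}
```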
-
Committed by Yang Yingliang
hulk inclusion category: performance bugzilla: 16082 CVE: NA ------------------------------------------------- This reverts commit 47819486652f2dc95ad1fe6a1a862a3c2971d657. 487f18a5f1cf ("arm64: vdso: do cntvct workaround in the VDSO") and 47819486652f ("arm64: arch_timer: Disable CNTVCT_EL0 trap if workaround is enabled") are not needed for now; these two patches will be applied again later. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Vincenzo Frascino
commit d2631193 upstream. Currently, compat tasks running on arm64 can allocate memory up to TASK_SIZE_32 (UL(0x100000000)). This means that mmap() allocations, if we treat them as returning an array, are not compliant with the sections 6.5.8 of the C standard (C99) which states that: "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P". Redefine TASK_SIZE_32 to address the issue. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com> [will: fixed typo in comment] Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Yang Yingliang
hulk inclusion category: performance bugzilla: 16082 CVE: NA ------------------------------------------------- Reading CNTVCT_EL0 takes a long time when a cntvct workaround is in effect and the CNTVCT_EL0 trap is enabled. To reduce the read time, disable the CNTVCT_EL0 trap and perform the cntvct workaround in the VDSO by adding vdso_fix. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Jean-Philippe Brucker
hulk inclusion category: feature bugzilla: 14369 CVE: NA ------------------- To enable address space sharing with the IOMMU, introduce mm_context_get() and mm_context_put(), that pin down a context and ensure that it will keep its ASID after a rollover. Pinning is necessary because a device constantly needs a valid ASID, unlike tasks that only require one when running. Without pinning, we would need to notify the IOMMU when we're about to use a new ASID for a task, and it would get complicated when a new task is assigned a shared ASID. Consider the following scenario with no ASID pinned: 1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1) 2. Task t2 is scheduled on CPUx, gets ASID (1, 2) 3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1) We would now have to immediately generate a new ASID for t1, notify the IOMMU, and finally enable task tn. We are holding the lock during all that time, since we can't afford having another CPU trigger a rollover. The IOMMU issues invalidation commands that can take tens of milliseconds. It gets needlessly complicated. All we wanted to do was schedule task tn, that has no business with the IOMMU. By letting the IOMMU pin tasks when needed, we avoid stalling the slow path, and let the pinning fail when we're out of shareable ASIDs. After a rollover, the allocator expects at least one ASID to be available in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS - 1) is the maximum number of ASIDs that can be shared with the IOMMU. Cc: catalin.marinas@arm.com Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: NFang Lijun <fanglijun3@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NZhen Lei <thunder.leizhen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
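From the IOMMU driver's point of view the interface described above would be used roughly as follows. This is a hypothetical usage sketch: mm_context_get()/mm_context_put() are the functions introduced by the patch, everything else is a stand-in, and it assumes (as described) that pinning can fail when no shareable ASIDs are left, signalled here by a zero return.

```c
struct mm_struct;
struct my_iommu_dev;					/* stand-in device context */

unsigned long mm_context_get(struct mm_struct *mm);	/* introduced by this patch */
void mm_context_put(struct mm_struct *mm);

void my_iommu_write_asid(struct my_iommu_dev *dev, unsigned long asid);	/* stand-in */

#define ENOSPC 28

int my_iommu_sva_bind(struct my_iommu_dev *dev, struct mm_struct *mm)
{
	unsigned long asid = mm_context_get(mm);	/* pin: ASID survives rollover */

	if (!asid)
		return -ENOSPC;				/* out of shareable ASIDs */

	my_iommu_write_asid(dev, asid);
	return 0;
}

void my_iommu_sva_unbind(struct my_iommu_dev *dev, struct mm_struct *mm)
{
	my_iommu_write_asid(dev, 0);
	mm_context_put(mm);				/* unpin: ASID may be recycled again */
}
```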
-
Committed by Lijun Fang
euler inclusion category: feature bugzilla: 11082 CVE: NA --------------------- When the kernel boots, we need to determine whether memory is DDR or HBM and add it to the nodes by parsing the cmdline, instead of relying on memory hotplug. Signed-off-by: NLijun Fang <fanglijun3@huawei.com> Reviewed-by: Nzhong jiang <zhongjiang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-