提交 · 11d04d0fdaa39936842ed3133eb94152082734be · openeuler / Kernel

27 12月, 2019 40 次提交

KVM: arm64: Don't write junk to sysregs on reset · 11d04d0f

由 Marc Zyngier 提交于 9月 02, 2019

[ Upstream commit 03fdfb26 ]

At the moment, the way we reset system registers is mildly insane:
We write junk to them, call the reset functions, and then check that
we have something else in them.

The "fun" thing is that this can happen while the guest is running
(PSCI, for example). If anything in KVM has to evaluate the state
of a system register while junk is in there, bad thing may happen.

Let's stop doing that. Instead, we track that we have called a
reset function for that register, and assume that the reset
function has done something. This requires fixing a couple of
sysreg refinition in the trap table.

In the end, the very need of this reset check is pretty dubious,
as it doesn't check everything (a lot of the sysregs leave outside of
the sys_regs[] array). It may well be axed in the near future.
Tested-by: NZenghui Yu <yuzenghui@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

11d04d0f

arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side · c3e4cd3f

由 Will Deacon 提交于 9月 02, 2019

commit b6143d10d23ebb4a77af311e8b8b7f019d0163e6 upstream.

The initial support for dynamic ftrace trampolines in modules made use
of an indirect branch which loaded its target from the beginning of
a special section (e71a4e1b ("arm64: ftrace: add support for far
branches to dynamic ftrace")). Since no instructions were being patched,
no cache maintenance was needed. However, later in be0f272b ("arm64:
ftrace: emit ftrace-mod.o contents through code") this code was reworked
to output the trampoline instructions directly into the PLT entry but,
unfortunately, the necessary cache maintenance was overlooked.

Add a call to __flush_icache_range() after writing the new trampoline
instructions but before patching in the branch to the trampoline.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: be0f272b ("arm64: ftrace: emit ftrace-mod.o contents through code")
Signed-off-by: NWill Deacon <will@kernel.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

c3e4cd3f

arm64: KVM: regmap: Fix unexpected switch fall-through · 63d4142f

由 Anders Roxell 提交于 9月 02, 2019

commit 3d584a3c upstream.

When fall-through warnings was enabled by default, commit d93512ef0f0e
("Makefile: Globally enable fall-through warning"), the following
warnings was starting to show up:

In file included from ../arch/arm64/include/asm/kvm_emulate.h:19,
                 from ../arch/arm64/kvm/regmap.c:13:
../arch/arm64/kvm/regmap.c: In function ‘vcpu_write_spsr32’:
../arch/arm64/include/asm/kvm_hyp.h:31:3: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
   asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"), \
   ^~~
../arch/arm64/include/asm/kvm_hyp.h:46:31: note: in expansion of macro ‘write_sysreg_elx’
 #define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
                               ^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:180:3: note: in expansion of macro ‘write_sysreg_el1’
   write_sysreg_el1(v, SYS_SPSR);
   ^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:181:2: note: here
  case KVM_SPSR_ABT:
  ^~~~
In file included from ../arch/arm64/include/asm/cputype.h:132,
                 from ../arch/arm64/include/asm/cache.h:8,
                 from ../include/linux/cache.h:6,
                 from ../include/linux/printk.h:9,
                 from ../include/linux/kernel.h:15,
                 from ../include/asm-generic/bug.h:18,
                 from ../arch/arm64/include/asm/bug.h:26,
                 from ../include/linux/bug.h:5,
                 from ../include/linux/mmdebug.h:5,
                 from ../include/linux/mm.h:9,
                 from ../arch/arm64/kvm/regmap.c:11:
../arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall
 through [-Wimplicit-fallthrough=]
  asm volatile("msr " __stringify(r) ", %x0"  \
  ^~~
../arch/arm64/kvm/regmap.c:182:3: note: in expansion of macro ‘write_sysreg’
   write_sysreg(v, spsr_abt);
   ^~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:183:2: note: here
  case KVM_SPSR_UND:
  ^~~~

Rework to add a 'break;' in the swich-case since it didn't have that,
leading to an interresting set of bugs.

Cc: stable@vger.kernel.org # v4.17+
Fixes: a8928195 ("KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers")
Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
[maz: reworked commit message, fixed stable range]
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

63d4142f

arm64/mm: fix variable 'pud' set but not used · 2b0684b6

由 Qian Cai 提交于 9月 02, 2019

[ Upstream commit 7d4e2dcf ]

GCC throws a warning,

arch/arm64/mm/mmu.c: In function 'pud_free_pmd_page':
arch/arm64/mm/mmu.c:1033:8: warning: variable 'pud' set but not used
[-Wunused-but-set-variable]
  pud_t pud;
        ^~~

because pud_table() is a macro and compiled away. Fix it by making it a
static inline function and for pud_sect() as well.
Signed-off-by: NQian Cai <cai@lca.pw>
Signed-off-by: NWill Deacon <will@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

2b0684b6

arm64: unwind: Prohibit probing on return_address() · 1f6771f7

由 Masami Hiramatsu 提交于 9月 02, 2019

[ Upstream commit ee07b93e ]

Prohibit probing on return_address() and subroutines which
is called from return_address(), since the it is invoked from
trace_hardirqs_off() which is also kprobe blacklisted.
Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NWill Deacon <will@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1f6771f7

arm64/efi: fix variable 'si' set but not used · 88f03e4a

由 Qian Cai 提交于 9月 02, 2019

[ Upstream commit f1d48362 ]

GCC throws out this warning on arm64.

drivers/firmware/efi/libstub/arm-stub.c: In function 'efi_entry':
drivers/firmware/efi/libstub/arm-stub.c:132:22: warning: variable 'si'
set but not used [-Wunused-but-set-variable]

Fix it by making free_screen_info() a static inline function.
Acked-by: NWill Deacon <will@kernel.org>
Signed-off-by: NQian Cai <cai@lca.pw>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

88f03e4a

config: enable CONFIG_NUMA_AWARE_SPINLOCKS by default · 77a2fc07

由 Yang Yingliang 提交于 9月 02, 2019

hulk inclusion
category: feature
bugzilla: 13227
CVE: NA
---------------------------

enable CONFIG_NUMA_AWARE_SPINLOCKS in hulk_defconfig and
storage_ci_defconfig
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

77a2fc07

numa-aware qspinlock: using boot option to enable it · 1ca5d55d

由 Hanjun Guo 提交于 8月 31, 2019

hulk inclusion
category: feature
bugzilla: 13227
CVE: NA

-------------------------------------------------

Set numa-aware qspinlock default off and enable it by passing
using_numa_aware_qspinlock in the boot cmdline.
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1ca5d55d

qspinlock: numaware: Add ARM64 support · e2ef2690

由 Hanjun Guo 提交于 8月 31, 2019

hulk inclusion
category: feature
bugzilla: 13227
CVE: NA

-------------------------------------------------

Enabling CNA is controlled via a new configuration option
(NUMA_AWARE_SPINLOCKS). Add it for arm64 support.
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

e2ef2690

arm64: Force SSBS on context switch · c2f1181d

由 Marc Zyngier 提交于 7月 22, 2019

mainline inclusion
from mainline-v5.3-rc2
commit cbdf8a189a66001c36007bf0f5c975d0376c5c3a
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

On a CPU that doesn't support SSBS, PSTATE[12] is RES0.  In a system
where only some of the CPUs implement SSBS, we end-up losing track of
the SSBS bit across task migration.

To address this issue, let's force the SSBS bit on context switch.

Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
[will: inverted logic and added comments]
Signed-off-by: NWill Deacon <will@kernel.org>
Conflicts:
  arch/arm64/kernel/process.c
[yyl: adjust context]
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

c2f1181d

arm64: fix SSBS sanitization · 0c708103

由 Mark Rutland 提交于 2月 15, 2019

mainline inclusion
from mainline-v5.0-rc8
commit f54dada8274643e3ff4436df0ea124aeedc43cae
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

In valid_user_regs() we treat SSBS as a RES0 bit, and consequently it is
unexpectedly cleared when we restore a sigframe or fiddle with GPRs via
ptrace.

This patch fixes valid_user_regs() to account for this, updating the
function to refer to the latest ARM ARM (ARM DDI 0487D.a). For AArch32
tasks, SSBS appears in bit 23 of SPSR_EL1, matching its position in the
AArch32-native PSR format, and we don't need to translate it as we have
to for DIT.

There are no other bit assignments that we need to account for today.
As the recent documentation describes the DIT bit, we can drop our
comment regarding DIT.

While removing SSBS from the RES0 masks, existing inconsistent
whitespace is corrected.

Fixes: d71be2b6c0e19180 ("arm64: cpufeature: Detect SSBS and advertise to userspace")
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

0c708103

arm64: cpu: Move errata and feature enable callbacks closer to callers · a37ad0cd

由 Will Deacon 提交于 8月 07, 2018

mainline inclusion
from mainline-v4.20-rc1
commit b8925ee2
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

The cpu errata and feature enable callbacks are only called via their
respective arm64_cpu_capabilities structure and therefore shouldn't
exist in the global namespace.

Move the PAN, RAS and cache maintenance emulation enable callbacks into
the same files as their corresponding arm64_cpu_capabilities structures,
making them static in the process.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Conflicts:
  arch/arm64/kernel/cpu_errata.c
  arch/arm64/kernel/cpufeature.c
[yyl: adjust context]
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

a37ad0cd

KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe · 1eea4492

由 Will Deacon 提交于 8月 08, 2018

mainline inclusion
from mainline-v4.20-rc1
commit 7c36447a
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

When running without VHE, it is necessary to set SCTLR_EL2.DSSBS if SSBD
has been forcefully disabled on the kernel command-line.
Acked-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1eea4492

arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3 · 09bb7481

由 Will Deacon 提交于 8月 07, 2018

mainline inclusion
from mainline-v4.20-rc1
commit 8f04e8e6
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD
state without needing to call into firmware.

This patch hooks into the existing SSBD infrastructure so that SSBS is
used on CPUs that support it, but it's all made horribly complicated by
the very real possibility of big/little systems that don't uniformly
provide the new capability.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

Conflicts:
  arch/arm64/kernel/process.c
  arch/arm64/kernel/ssbd.c
  arch/arm64/kernel/cpufeature.c
[yyl: adjust context]
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

09bb7481

arm64: ssbd: Drop #ifdefs for PR_SPEC_STORE_BYPASS · 1438602c

由 Will Deacon 提交于 6月 15, 2018

mainline inclusion
from mainline-v4.20-rc1
commit 2d1b2a91d56b19636b740ea70c8399d1df249f20
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

Now that we're all merged nicely into mainline, there's no need to check
to see if PR_SPEC_STORE_BYPASS is defined.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1438602c

arm64: cpufeature: Detect SSBS and advertise to userspace · be185032

由 Will Deacon 提交于 6月 15, 2018

mainline inclusion
from mainline-v4.20-rc1
commit d71be2b6c0e19180b5f80a6d42039cc074a693a2
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass
Safe (SSBS) which can be used as a mitigation against Spectre variant 4.

Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS
directly, so that userspace can toggle the SSBS control without trapping
to the kernel.

This patch probes for the existence of SSBS and advertise the new instructions
to userspace if they exist.
Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Conflicts:
  arch/arm64/kernel/cpufeature.c
  arch/arm64/include/asm/cpucaps.h
[yyl: adjust context]
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

be185032

arm64: Fix silly typo in comment · 9f3c5929

由 Will Deacon 提交于 6月 15, 2018

mainline inclusion
from mainline-v4.20-rc1
commit ca7f686a
category: feature
bugzilla: 20806
CVE: NA

-------------------------------------------------

I was passing through and figuered I'd fix this up:

	featuer -> feature
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

9f3c5929

arm64/neon: Disable -Wincompatible-pointer-types when building with Clang · e9fabe0c

由 Nathan Chancellor 提交于 8月 31, 2019

mainline inclusion
from mainline-5.0
commit 0738c8b5
category: bugfix
bugzilla: 11024
CVE: NA

-------------------------------------------------

After commit cc9f8349 ("arm64: crypto: add NEON accelerated XOR
implementation"), Clang builds for arm64 started failing with the
following error message.

arch/arm64/lib/xor-neon.c:58:28: error: incompatible pointer types
assigning to 'const unsigned long *' from 'uint64_t *' (aka 'unsigned
long long *') [-Werror,-Wincompatible-pointer-types]
                v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 + 6));
                                         ^~~~~~~~
/usr/lib/llvm-9/lib/clang/9.0.0/include/arm_neon.h:7538:47: note:
expanded from macro 'vld1q_u64'
  __ret = (uint64x2_t) __builtin_neon_vld1q_v(__p0, 51); \
                                              ^~~~

There has been quite a bit of debate and triage that has gone into
figuring out what the proper fix is, viewable at the link below, which
is still ongoing. Ard suggested disabling this warning with Clang with a
pragma so no neon code will have this type of error. While this is not
at all an ideal solution, this build error is the only thing preventing
KernelCI from having successful arm64 defconfig and allmodconfig builds
on linux-next. Getting continuous integration running is more important
so new warnings/errors or boot failures can be caught and fixed quickly.

Link: https://github.com/ClangBuiltLinux/linux/issues/283Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
(cherry picked from commit 0738c8b5)
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

e9fabe0c

arm64: crypto: add NEON accelerated XOR implementation · 2b99d865

由 Jackie Liu 提交于 8月 31, 2019

mainline inclusion
from mainline-5.0-rc1
commit: cc9f8349
category: feature
feature: NEON accelerated XOR
bugzilla: 11024
CVE: NA

--------------------------------------------------

This is a NEON acceleration method that can improve
performance by approximately 20%. I got the following
data from the centos 7.5 on Huawei's HISI1616 chip:

[ 93.837726] xor: measuring software checksum speed
[ 93.874039]   8regs  : 7123.200 MB/sec
[ 93.914038]   32regs : 7180.300 MB/sec
[ 93.954043]   arm64_neon: 9856.000 MB/sec
[ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)

I believe this code can bring some optimization for
all arm64 platform. thanks for Ard Biesheuvel's suggestions.
Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

2b99d865

arm64/neon: add workaround for ambiguous C99 stdint.h types · 28777f93

由 Jackie Liu 提交于 8月 31, 2019

mainline inclusion
from mainline-5.0-rc1
commit: 21e28547
category: feature
feature: NEON accelerated XOR
bugzilla: 11024
CVE: NA

--------------------------------------------------

In a way similar to ARM commit 09096f6a ("ARM: 7822/1: add workaround
for ambiguous C99 stdint.h types"), this patch redefines the macros that
are used in stdint.h so its definitions of uint64_t and int64_t are
compatible with those of the kernel.

This patch comes from: https://patchwork.kernel.org/patch/3540001/
Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

We mark this file as a private file and don't have to override asm/types.h
Signed-off-by: NJackie Liu <liuyun01@kylinos.cn>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

28777f93

arm64/lib: improve CRC32 performance for deep pipelines · fce68364

由 Ard Biesheuvel 提交于 8月 31, 2019

mainline inclusion
from mainline-5.0
commit: efdb25efc7645b326cd5eb82be5feeabe167c24e
category: perf
bugzilla: 20886
CVE: NA

lib/crc32test result:

[root@localhost build]# rmmod crc32test && insmod lib/crc32test.ko &&
dmesg | grep cycles
[83170.153209] CPU7: use cycles 26243990
[83183.122137] CPU7: use cycles 26151290
[83309.691628] CPU7: use cycles 26122830
[83312.415559] CPU7: use cycles 26232600
[83313.191479] CPU8: use cycles 26082350

rmmod crc32test && insmod lib/crc32test.ko && dmesg | grep cycles
[ 1023.539931] CPU25: use cycles 12256730
[ 1024.850360] CPU24: use cycles 12249680
[ 1025.463622] CPU25: use cycles 12253330
[ 1025.862925] CPU25: use cycles 12269720
[ 1026.376038] CPU26: use cycles 12222480

Based on 13702:
arm64/lib: improve CRC32 performance for deep pipelines
crypto: arm64/crc32 - remove PMULL based CRC32 driver
arm64/lib: add accelerated crc32 routines
arm64: cpufeature: add feature for CRC32 instructions
lib/crc32: make core crc32() routines weak so they can be overridden

----------------------------------------------

Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.

Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.

Tested using the following test program:

  #include <stdlib.h>

  extern void crc32_le(unsigned short, char const*, int);

  int main(void)
  {
    static const char buf[4096];

    srand(20181126);

    for (int i = 0; i < 100 * 1000 * 1000; i++)
      crc32_le(0, buf, rand() % 1024);

    return 0;
  }

On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from

  $ time ./crc32

  real  0m10.149s
  user  0m10.149s
  sys   0m0.000s

to

  $ time ./crc32

  real  0m7.915s
  user  0m7.915s
  sys   0m0.000s

Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

fce68364

crypto: arm64/crc32 - remove PMULL based CRC32 driver · 0e584ca5

由 Ard Biesheuvel 提交于 8月 31, 2019

mainline inclusion
from mainline-4.20-rc1
commit: 598b7d41
category: feature
feature: accelerated crc32 routines
bugzilla: 13702
CVE: NA

--------------------------------------------------

Now that the scalar fallbacks have been moved out of this driver into
the core crc32()/crc32c() routines, we are left with a CRC32 crypto API
driver for arm64 that is based only on 64x64 polynomial multiplication,
which is an optional instruction in the ARMv8 architecture, and is less
and less likely to be available on cores that do not also implement the
CRC32 instructions, given that those are mandatory in the architecture
as of ARMv8.1.

Since the scalar instructions do not require the special handling that
SIMD instructions do, and since they turn out to be considerably faster
on some cores (Cortex-A53) as well, there is really no point in keeping
this code around so let's just remove it.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

0e584ca5

arm64/lib: add accelerated crc32 routines · 11fa09f2

由 Ard Biesheuvel 提交于 8月 31, 2019

mainline inclusion
from mainline-4.20-rc1
commit: 7481cddf29ede204b475facc40e6f65459939881
category: feature
feature: accelerated crc32 routines
bugzilla: 13702
CVE: NA

--------------------------------------------------

Unlike crc32c(), which is wired up to the crypto API internally so the
optimal driver is selected based on the platform's capabilities,
crc32_le() is implemented as a library function using a slice-by-8 table
based C implementation. Even though few of the call sites may be
bottlenecks, calling a time variant implementation with a non-negligible
D-cache footprint is a bit of a waste, given that ARMv8.1 and up mandates
support for the CRC32 instructions that were optional in ARMv8.0, but are
already widely available, even on the Cortex-A53 based Raspberry Pi.

So implement routines that use these instructions if available, and fall
back to the existing generic routines otherwise. The selection is based
on alternatives patching.

Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
this just codifies the status quo.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

11fa09f2

arm64: cpufeature: add feature for CRC32 instructions · 3a8984a9

由 Ard Biesheuvel 提交于 8月 31, 2019

mainline inclusion
from mainline-4.20-rc1
commit: 86d0dd34eafffbc76a81aba6ae2d71927d3835a8
category: feature
feature: accelerated crc32 routines
bugzilla: 13702
CVE: NA

--------------------------------------------------

Add a CRC32 feature bit and wire it up to the CPU id register so we
will be able to use alternatives patching for CRC32 operations.
Acked-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

Conflicts:
	arch/arm64/include/asm/cpucaps.h
	arch/arm64/kernel/cpufeature.c
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

3a8984a9

config: enable CONFIG_KTASK in hulk_defconfig and storage_ci_defconfig · 58622cdd

由 Hongbo Yao 提交于 8月 30, 2019

hulk inclusion
category: feature
bugzilla: 13228
CVE: NA
---------------------------

enable CONFIG_KTASK in hulk_defconfig and storage_ci_defconfig
Signed-off-by: NHongbo Yao <yaohongbo@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

58622cdd

arm64: set all the CPU as present for dtb booting with 'CONFIG_ACPI' enabled · 015f6d7e

由 Xiongfeng Wang 提交于 8月 28, 2019

hulk inclusion
category: bugfix
bugzilla: 20799
CVE: NA
---------------------------

The following patch didn't consider the situation when we boot the
system using device tree but 'CONFIG_ACPI' is enabled. In this
situation, we also need to set all the CPU as present CPU.

Fixes: 280637f70ab5 ("arm64: mark all the GICC nodes in MADT as possible cpu")
Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

015f6d7e

arm64/numa: Report correct memblock range for the dummy node · 65246909

由 Anshuman Khandual 提交于 7月 24, 2019

mainline inclusion
from mainline-v4.20-rc5
commit 77cfe950
category: bugfix
bugzilla: 5611
CVE: NA

The dummy node ID is marked into all memory ranges on the system. So the
dummy node really extends the entire memblock.memory. Hence report correct
extent information for the dummy node using memblock range helper functions
instead of the range [0LLU, PFN_PHYS(max_pfn) - 1)].

Fixes: 1a2db300 ("arm64, numa: Add NUMA support for arm64 platforms")
Acked-by: NPunit Agrawal <punit.agrawal@arm.com>
Signed-off-by: NAnshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NZhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

65246909

bpf, arm64: fix getting subprog addr from aux for calls · f7dcdf17

由 Daniel Borkmann 提交于 7月 24, 2019

mainline inclusion
from mainline-v4.20-rc5
commit 8c11ea5c
category: bugfix
bugzilla: 5654
CVE: NA

The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF
to BPF call offset can be too far away from core kernel in that relative
encoding into imm is not sufficient and could potentially be truncated,
see also fd045f6c ("arm64: add support for module PLTs") which adds
spill-over space for module_alloc() and therefore bpf_jit_binary_alloc().
Therefore, use the recently added bpf_jit_get_func_addr() helper for
properly fetching the address through prog->aux->func[off]->bpf_func
instead. This also has the benefit to optimize normal helper calls since
their address can use the optimized emission. Tested on Cavium ThunderX
CN8890.

Fixes: db496944 ("bpf: arm64: add JIT support for multi-function programs")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

f7dcdf17

pcie: enable hisi pcie dfx driver · 3cbc9ab3

由 liuyanshi 提交于 8月 23, 2019

driver inclusion
category: feature
bugzilla: NA
CVE: NA

enable hisi pcie debug driver for hiarmtool.
Signed-off-by: Nliuyanshi <liuyanshi@huawei.com>
Reviewed-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

3cbc9ab3

arm64: Add CPU hotplug support · 2875d85b

由 Xiongfeng Wang 提交于 8月 22, 2019

hulk inclusion
category: feature
bugzilla: 20208
CVE: NA
---------------------------

To support CPU hotplug, we need to implement 'acpi_(un)map_cpu()' and
'arch_(un)register_cpu()' for ARM64. These functions are called in
'acpi_processor_hotadd_init()/acpi_processor_remove()' when the CPU is hot
added into or hot removed from the system.

Note: This patch only support core hotplug and does not support socket
hotplug because we don't support live configuration of GIC.
Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

2875d85b

arm64: mark all the GICC nodes in MADT as possible cpu · fbcc048a

由 Xiongfeng Wang 提交于 8月 22, 2019

hulk inclusion
category: feature
bugzilla: 20208
CVE: NA
---------------------------

We set 'cpu_possible_mask' based on the enabled GICC node in MADT. If
the GICC node is disabled, we will skip initializing the kernel data
structure for that CPU.

To support CPU hotplug, we need to initialize some CPU related data
structure in advance. This patch mark all the GICC nodes as possible CPU
and only these enabled GICC nodes as present CPU.
Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

fbcc048a

config: disable CONFIG_ISCSI_IBFT by default · c5f7f11c

由 Yang Yingliang 提交于 8月 20, 2019

hulk inclusion
category: bugfix
bugzilla: 4979
CVE: NA

-------------------------------------------------

Disable CONFIG_ISCSI_IBFT by default.

c5f7f11c

arm_pmu: acpi: spe: Add initial MADT/SPE probing · 09436382

由 Jeremy Linton 提交于 8月 19, 2019

mainline inclusion
from mainline-5.3-rc1
commit d24a0c7099b3
category: feature
bugzilla: 16072
CVE: NA
---------------------------

ACPI 6.3 adds additional fields to the MADT GICC
structure to describe SPE PPI's. We pick these out
of the cached reference to the madt_gicc structure
similarly to the core PMU code. We then create a platform
device referring to the IRQ and let the user/module loader
decide whether to load the SPE driver.
Tested-by: NHanjun Gou <gouhanjun@huawei.com>
Reviewed-by: NSudeep Holla <sudeep.holla@arm.com>
Reviewed-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NHongbo Yao <yaohongbo@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

09436382

Revert "arm_pmu: acpi: spe: Add initial MADT/SPE probing" · d44538a7

由 Hongbo Yao 提交于 8月 19, 2019

hulk inclusion
category: feature
bugzilla: 16072
CVE: NA
---------------------------

This reverts commit 556b16f5ad7e910c3784bb02b33c2af6ca9c9a4b.
In Linux 5.3.0, SPE ACPI enablement has been upstreamed. SPE patches
in hulk-4.19 are the old version, and they need to be reverted
to the mainline version.
Signed-off-by: NHongbo Yao <yaohongbo@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

d44538a7

arm64: tlbflush: Ensure start/end of address range are aligned to stride · bcfdec50

由 Will Deacon 提交于 8月 13, 2019

mainline inclusion
from mainline-5.2
commit: 01d57485fcdb9f9101a10a18e32d5f8b023cab86
category: feature
feature: Reduce synchronous TLB invalidation on ARM64
bugzilla: NA
CVE: NA

--------------------------------------------------

Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
perform more than PTRS_PER_PTE invalidation instructions in a single
call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
code does not ensure that the end address of the range is rounded-up
to the stride when freeing intermediate page tables in pXX_free_tlb(),
which defeats our range checking.

Align the bounds passed into __flush_tlb_range().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: NHanjun Guo <guohanjun@huawei.com>
Tested-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

bcfdec50

arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE · c19a7104

由 Will Deacon 提交于 8月 13, 2019

mainline inclusion
from mainline-4.21
commit: 3d65b6bbc01ecece8142e62a8a5f1d48ba41a240
category: feature
feature: Reduce synchronous TLB invalidation on ARM64
bugzilla: NA
CVE: NA

--------------------------------------------------

In order to reduce the possibility of soft lock-ups, we bound the
maximum number of TLBI operations performed by a single call to
flush_tlb_range() to an arbitrary constant of 1024.

Whilst this does the job of avoiding lock-ups, we can actually be a bit
smarter by defining this as PTRS_PER_PTE. Due to the structure of our
page tables, using PTRS_PER_PTE means that an outer loop calling
flush_tlb_range() for entire table entries will end up performing just a
single TLBI operation for each entry. As an example, mremap()ing a 1GB
range mapped using 4k pages now requires only 512 TLBI operations when
moving the page tables as opposed to 262144 operations (512*512) when
using the current threshold of 1024.

Cc: Joel Fernandes <joel@joelfernandes.org>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

c19a7104

arm64: mm: Don't wait for completion of TLB invalidation when page aging · 0cd4b474

由 Alex Van Brunt 提交于 8月 13, 2019

mainline inclusion
from mainline-4.21
commit: 3403e56b
category: feature
feature: Reduce synchronous TLB invalidation on ARM64
bugzilla: NA
CVE: NA

--------------------------------------------------

When transitioning a PTE from young to old as part of page aging, we
can avoid waiting for the TLB invalidation to complete and therefore
drop the subsequent DSB instruction. Whilst this opens up a race with
page reclaim, where a PTE in active use via a stale, young TLB entry
does not update the underlying descriptor, the worst thing that happens
is that the page is reclaimed and then immediately faulted back in.

Given that we have a DSB in our context-switch path, the window for a
spurious reclaim is fairly limited and eliding the barrier claims to
boost NVMe/SSD accesses by over 10% on some platforms.

A similar optimisation was made for x86 in commit b13b1d2d ("x86/mm:
In the PTE swapout page reclaim case clear the accessed bit instead of
flushing the TLB").
Signed-off-by: NAlex Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: NAshish Mhetre <amhetre@nvidia.com>
[will: rewrote patch]
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

0cd4b474

arm64: tlb: Rewrite stale comment in asm/tlbflush.h · d45ff356

由 Will Deacon 提交于 8月 13, 2019

mainline inclusion
from mainline-4.20-rc1
commit: 7f08872774eb971693ba79eeb2d4db364c9f5bfb
category: feature
feature: Reduce synchronous TLB invalidation on ARM64
bugzilla: NA
CVE: NA

--------------------------------------------------

Peter Z asked me to justify the barrier usage in asm/tlbflush.h, but
actually that whole block comment needs to be rewritten.
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

d45ff356

arm64: tlb: Avoid synchronous TLBIs when freeing page tables · f51b9fe4

由 Will Deacon 提交于 8月 13, 2019

mainline inclusion
from mainline-4.20-rc1
commit: ace8cb754539077ed75f3f15b77b2b51b5b7a431
category: feature
feature: Reduce synchronous TLB invalidation on ARM64
bugzilla: NA
CVE: NA

--------------------------------------------------

By selecting HAVE_RCU_TABLE_INVALIDATE, we can rely on tlb_flush() being
called if we fail to batch table pages for freeing. This in turn allows
us to postpone walk-cache invalidation until tlb_finish_mmu(), which
avoids lots of unnecessary DSBs and means we can shoot down the ASID if
the range is large enough.
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

f51b9fe4

arm64: tlb: Adjust stride and type of TLBI according to mmu_gather · 0943e8a3

由 Will Deacon 提交于 8月 13, 2019

mainline inclusion
from mainline-4.20-rc1
commit: f270ab88fdf205be1a7a46ccb61f4a343be543a2
category: feature
feature: Reduce synchronous TLB invalidation on ARM64
bugzilla: NA
CVE: NA

--------------------------------------------------

Now that the core mmu_gather code keeps track of both the levels of page
table cleared and also whether or not these entries correspond to
intermediate entries, we can use this in our tlb_flush() callback to
reduce the number of invalidations we issue as well as their scope.
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NHanjun Guo <guohanjun@huawei.com>
Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

0943e8a3

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功