- 27 December 2019, 40 commits
-
Committed by Will Deacon
mainline inclusion from mainline-v4.20-rc1 commit 8f04e8e6 category: feature bugzilla: 20806 CVE: NA ------------------------------------------------- On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD state without needing to call into firmware. This patch hooks into the existing SSBD infrastructure so that SSBS is used on CPUs that support it, but it's all made horribly complicated by the very real possibility of big/little systems that don't uniformly provide the new capability. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Conflicts: arch/arm64/kernel/process.c arch/arm64/kernel/ssbd.c arch/arm64/kernel/cpufeature.c [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
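On the userspace side, this SSBD machinery is the one reached through the speculation-control prctl(); the sketch below is an illustrative standalone program (not part of the patch) showing a task querying and requesting the Speculative Store Bypass mitigation, which an SSBS-capable CPU can now satisfy by toggling PSTATE.SSBS instead of trapping to firmware.

```c
/* Illustrative only: query/request the per-task Store Bypass mitigation via prctl(). */
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SPEC_STORE_BYPASS		/* provided by <linux/prctl.h> on recent systems */
#define PR_GET_SPECULATION_CTRL	52
#define PR_SET_SPECULATION_CTRL	53
#define PR_SPEC_STORE_BYPASS	0
#define PR_SPEC_DISABLE		(1UL << 2)
#endif

int main(void)
{
	long state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);

	printf("speculative store bypass state: 0x%lx\n", state);

	/* PR_SPEC_DISABLE = disable store bypass, i.e. enable the mitigation. */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0))
		perror("prctl(PR_SET_SPECULATION_CTRL)");
	return 0;
}
```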
-
Committed by Will Deacon
mainline inclusion from mainline-v4.20-rc1 commit d71be2b6 category: feature bugzilla: 20806 CVE: NA ------------------------------------------------- Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass Safe (SSBS) which can be used as a mitigation against Spectre variant 4. Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS directly, so that userspace can toggle the SSBS control without trapping to the kernel. This patch probes for the existence of SSBS and advertise the new instructions to userspace if they exist. Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Conflicts: arch/arm64/kernel/cpufeature.c arch/arm64/include/asm/cpucaps.h [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
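Once this probing is in place, userspace can tell whether it may toggle PSTATE.SSBS directly by checking the corresponding hwcap; a minimal check (HWCAP_SSBS is bit 28 in recent arm64 headers, defined here in case the installed headers predate it):

```c
#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP_SSBS
#define HWCAP_SSBS (1 << 28)	/* assumption: bit used by recent arm64 uapi headers */
#endif

int main(void)
{
	unsigned long hwcap = getauxval(AT_HWCAP);

	if (hwcap & HWCAP_SSBS)
		printf("PSTATE.SSBS can be toggled from userspace (MSR/MRS SSBS)\n");
	else
		printf("SSBS not advertised; fall back to the prctl() interface\n");
	return 0;
}
```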
-
Committed by Will Deacon
mainline inclusion from mainline-v4.20-rc1 commit ca7f686a category: feature bugzilla: 20806 CVE: NA ------------------------------------------------- I was passing through and figured I'd fix this up: featuer -> feature Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Nathan Chancellor
mainline inclusion from mainline-5.0 commit 0738c8b5 category: bugfix bugzilla: 11024 CVE: NA ------------------------------------------------- After commit cc9f8349 ("arm64: crypto: add NEON accelerated XOR implementation"), Clang builds for arm64 started failing with the following error message. arch/arm64/lib/xor-neon.c:58:28: error: incompatible pointer types assigning to 'const unsigned long *' from 'uint64_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types] v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6)); ^~~~~~~~ /usr/lib/llvm-9/lib/clang/9.0.0/include/arm_neon.h:7538:47: note: expanded from macro 'vld1q_u64' __ret = (uint64x2_t) __builtin_neon_vld1q_v(__p0, 51); \ ^~~~ There has been quite a bit of debate and triage that has gone into figuring out what the proper fix is, viewable at the link below, which is still ongoing. Ard suggested disabling this warning with Clang with a pragma so no neon code will have this type of error. While this is not at all an ideal solution, this build error is the only thing preventing KernelCI from having successful arm64 defconfig and allmodconfig builds on linux-next. Getting continuous integration running is more important so new warnings/errors or boot failures can be caught and fixed quickly. Link: https://github.com/ClangBuiltLinux/linux/issues/283 Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> (cherry picked from commit 0738c8b5) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
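The pragma technique itself is generic; a minimal sketch of the approach (not the exact hunk that was applied) is simply:

```c
/* Silence Clang's pointer-type complaints for the NEON intrinsic code only. */
#ifdef __clang__
#pragma clang diagnostic ignored "-Wincompatible-pointer-types"
#endif

#include <arm_neon.h>
```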
-
Committed by Jackie Liu
mainline inclusion from mainline-5.0-rc1 commit: cc9f8349 category: feature feature: NEON accelerated XOR bugzilla: 11024 CVE: NA -------------------------------------------------- This is a NEON acceleration method that can improve performance by approximately 20%. I got the following data from the centos 7.5 on Huawei's HISI1616 chip: [ 93.837726] xor: measuring software checksum speed [ 93.874039] 8regs : 7123.200 MB/sec [ 93.914038] 32regs : 7180.300 MB/sec [ 93.954043] arm64_neon: 9856.000 MB/sec [ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec) I believe this code can bring some optimization for all arm64 platform. thanks for Ard Biesheuvel's suggestions. Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
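The core of such a routine is just a load/veor/store loop over 128-bit vectors; here is a userspace-flavoured sketch using the same intrinsics (illustrative only, not the kernel's xor-neon.c, which unrolls further and plugs into the xor template benchmarking shown above):

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* XOR p2[] into p1[], 16 bytes at a time; bytes must be a multiple of 16. */
static void xor_blocks_neon(size_t bytes, uint64_t *p1, const uint64_t *p2)
{
	size_t lines = bytes / 16;

	while (lines--) {
		uint64x2_t a = vld1q_u64(p1);
		uint64x2_t b = vld1q_u64(p2);

		vst1q_u64(p1, veorq_u64(a, b));
		p1 += 2;
		p2 += 2;
	}
}
```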
-
Committed by Jackie Liu
mainline inclusion from mainline-5.0-rc1 commit: 21e28547 category: feature feature: NEON accelerated XOR bugzilla: 11024 CVE: NA -------------------------------------------------- In a way similar to ARM commit 09096f6a ("ARM: 7822/1: add workaround for ambiguous C99 stdint.h types"), this patch redefines the macros that are used in stdint.h so its definitions of uint64_t and int64_t are compatible with those of the kernel. This patch comes from: https://patchwork.kernel.org/patch/3540001/ Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org> We mark this file as a private file and don't have to override asm/types.h Signed-off-by: NJackie Liu <liuyun01@kylinos.cn> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
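The trick being described is to override the predefined macros that the compiler's <stdint.h> builds its 64-bit types from, before <arm_neon.h> pulls them in, so that uint64_t/int64_t match the kernel's long long based u64/s64. A sketch of the idea (not the exact private header):

```c
/*
 * Redefine the predefined macros <stdint.h> uses for its 64-bit types so
 * they agree with the kernel's definitions before the NEON header is seen.
 */
#ifdef __INT64_TYPE__
#undef __INT64_TYPE__
#define __INT64_TYPE__		long long
#endif

#ifdef __UINT64_TYPE__
#undef __UINT64_TYPE__
#define __UINT64_TYPE__		unsigned long long
#endif

#include <arm_neon.h>
```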
-
Committed by Ard Biesheuvel
mainline inclusion from mainline-4.20-rc1 commit: 86d0dd34 category: feature feature: accelerated crc32 routines bugzilla: 13702 CVE: NA -------------------------------------------------- Add a CRC32 feature bit and wire it up to the CPU id register so we will be able to use alternatives patching for CRC32 operations. Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Conflicts: arch/arm64/include/asm/cpucaps.h arch/arm64/kernel/cpufeature.c Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
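Userspace sees the same capability as HWCAP_CRC32 and can reach the instructions through the ACLE intrinsics; a small detect-then-use sketch (illustrative, and assuming the toolchain is invoked with -march=armv8-a+crc so <arm_acle.h> exposes the CRC intrinsics):

```c
/* build: gcc -O2 -march=armv8-a+crc crc_demo.c */
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>
#include <arm_acle.h>

#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1 << 7)	/* assumption: arm64 hwcap bit for the CRC32 instructions */
#endif

int main(void)
{
	uint32_t crc = ~0u;

	if (!(getauxval(AT_HWCAP) & HWCAP_CRC32)) {
		puts("CRC32 instructions not advertised; use a software fallback");
		return 0;
	}

	crc = __crc32cd(crc, 0x1234567890abcdefULL);	/* CRC32CX */
	printf("crc32c = 0x%x\n", ~crc);
	return 0;
}
```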
-
Committed by Jeremy Linton
mainline inclusion from mainline-5.3-rc1 commit d24a0c70 category: feature bugzilla: 16072 CVE: NA --------------------------- ACPI 6.3 adds additional fields to the MADT GICC structure to describe SPE PPI's. We pick these out of the cached reference to the madt_gicc structure similarly to the core PMU code. We then create a platform device referring to the IRQ and let the user/module loader decide whether to load the SPE driver. Tested-by: NHanjun Gou <gouhanjun@huawei.com> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Reviewed-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: NJeremy Linton <jeremy.linton@arm.com> Signed-off-by: NHongbo Yao <yaohongbo@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Hongbo Yao
hulk inclusion category: feature bugzilla: 16072 CVE: NA --------------------------- This reverts commit 556b16f5ad7e910c3784bb02b33c2af6ca9c9a4b. In Linux 5.3.0, SPE ACPI enablement has been upstreamed. SPE patches in hulk-4.19 are the old version, and they need to be reverted to the mainline version. Signed-off-by: NHongbo Yao <yaohongbo@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-5.2 commit: 01d57485 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Since commit 3d65b6bb ("arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to perform more than PTRS_PER_PTE invalidation instructions in a single call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather code does not ensure that the end address of the range is rounded-up to the stride when freeing intermediate page tables in pXX_free_tlb(), which defeats our range checking. Align the bounds passed into __flush_tlb_range(). Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Reported-by: NHanjun Guo <guohanjun@huawei.com> Tested-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.21 commit: 3d65b6bb category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- In order to reduce the possibility of soft lock-ups, we bound the maximum number of TLBI operations performed by a single call to flush_tlb_range() to an arbitrary constant of 1024. Whilst this does the job of avoiding lock-ups, we can actually be a bit smarter by defining this as PTRS_PER_PTE. Due to the structure of our page tables, using PTRS_PER_PTE means that an outer loop calling flush_tlb_range() for entire table entries will end up performing just a single TLBI operation for each entry. As an example, mremap()ing a 1GB range mapped using 4k pages now requires only 512 TLBI operations when moving the page tables as opposed to 262144 operations (512*512) when using the current threshold of 1024. Cc: Joel Fernandes <joel@joelfernandes.org> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
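The heuristic is easy to state in code: if invalidating a range at the given stride would take more than MAX_TLBI_OPS (now PTRS_PER_PTE) operations, it is cheaper to nuke the whole ASID. A standalone sketch of that arithmetic for a 4k-page configuration (illustrative constants, not kernel code):

```c
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PTRS_PER_PTE	512UL		/* 4k pages */
#define MAX_TLBI_OPS	PTRS_PER_PTE

/* Number of TLBI operations __flush_tlb_range()-style code would issue. */
static unsigned long tlbi_ops(unsigned long start, unsigned long end,
			      unsigned long stride)
{
	unsigned long entries = (end - start) / stride;

	if (entries > MAX_TLBI_OPS)
		return 1;	/* over the limit: one ASID-wide invalidation instead */
	return entries;		/* otherwise one TLBI per stride-sized chunk */
}

int main(void)
{
	/* Invalidate 1GB: per 4k page this would be 262144 ops, so the ASID is
	 * flushed instead; per 2MB table entry it is 512 ops, under the limit. */
	printf("per-page:  %lu op(s)\n", tlbi_ops(0, 1UL << 30, PAGE_SIZE));
	printf("per-table: %lu op(s)\n", tlbi_ops(0, 1UL << 30, 1UL << 21));
	return 0;
}
```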
-
Committed by Alex Van Brunt
mainline inclusion from mainline-4.21 commit: 3403e56b category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- When transitioning a PTE from young to old as part of page aging, we can avoid waiting for the TLB invalidation to complete and therefore drop the subsequent DSB instruction. Whilst this opens up a race with page reclaim, where a PTE in active use via a stale, young TLB entry does not update the underlying descriptor, the worst thing that happens is that the page is reclaimed and then immediately faulted back in. Given that we have a DSB in our context-switch path, the window for a spurious reclaim is fairly limited and eliding the barrier claims to boost NVMe/SSD accesses by over 10% on some platforms. A similar optimisation was made for x86 in commit b13b1d2d ("x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB"). Signed-off-by: NAlex Van Brunt <avanbrunt@nvidia.com> Signed-off-by: NAshish Mhetre <amhetre@nvidia.com> [will: rewrote patch] Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 7f088727 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Peter Z asked me to justify the barrier usage in asm/tlbflush.h, but actually that whole block comment needs to be rewritten. Reported-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: ace8cb75 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- By selecting HAVE_RCU_TABLE_INVALIDATE, we can rely on tlb_flush() being called if we fail to batch table pages for freeing. This in turn allows us to postpone walk-cache invalidation until tlb_finish_mmu(), which avoids lots of unnecessary DSBs and means we can shoot down the ASID if the range is large enough. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: f270ab88 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Now that the core mmu_gather code keeps track of both the levels of page table cleared and also whether or not these entries correspond to intermediate entries, we can use this in our tlb_flush() callback to reduce the number of invalidations we issue as well as their scope. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 07212cd4 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- If there's one thing the RCU-based table freeing doesn't need, it's more ifdeffery. Remove the redundant !CONFIG_HAVE_RCU_TABLE_FREE code, since this option is unconditionally selected in our Kconfig. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 67a902ac category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- When we are unmapping intermediate page-table entries or huge pages, we don't need to issue a TLBI instruction for every PAGE_SIZE chunk in the VA range being unmapped. Allow the invalidation stride to be passed to __flush_tlb_range(), and adjust our "just nuke the ASID" heuristic to take this into account. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: d8289d3a category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Add a comment to explain why we can't get away with last-level invalidation in flush_tlb_range() Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 0795edaf category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- Now that our walk-cache invalidation routines imply a DSB before the invalidation, we no longer need one when we are clearing an entry during unmap. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 45a284bc category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- __flush_tlb_[kernel_]pgtable() rely on set_pXd() having a DSB after writing the new table entry and therefore avoid the barrier prior to the TLBI instruction. In preparation for delaying our walk-cache invalidation on the unmap() path, move the DSB into the TLB invalidation routines. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-4.20-rc1 commit: 6899a4c8 category: feature feature: Reduce synchronous TLB invalidation on ARM64 bugzilla: NA CVE: NA -------------------------------------------------- flush_tlb_kernel_range() is only ever used to invalidate last-level entries, so we can restrict the scope of the TLB invalidation instruction. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
commit 147b9635 upstream. If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have their architecturally maximum values, which defeats the use of FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous machines. Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively saturate at zero. Fixes: 3c739b57 ("arm64: Keep track of CPU feature registers") Cc: <stable@vger.kernel.org> # 4.4.x- Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Acked-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
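Said another way, the new policy must saturate at zero when combining values across CPUs: a report of 0 means "architectural maximum", so it has to win over any explicit value. A small standalone sketch of that combination rule:

```c
#include <stdio.h>

/* FTR_HIGHER_OR_ZERO_SAFE-style combination of one ID register field. */
static unsigned int higher_or_zero_safe(unsigned int newval, unsigned int cur)
{
	if (newval == 0 || cur == 0)
		return 0;			/* 0 means "architectural maximum" */
	return newval > cur ? newval : cur;	/* otherwise the higher value is safe */
}

int main(void)
{
	printf("%u\n", higher_or_zero_safe(4, 3));	/* 4 */
	printf("%u\n", higher_or_zero_safe(0, 3));	/* 0: saturate at zero */
	return 0;
}
```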
-
Committed by Will Deacon
commit 24951465 upstream. arch/arm/ defines a SIGMINSTKSZ of 2k, so we should use the same value for compat tasks. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: NDave Martin <Dave.Martin@arm.com> Reported-by: NSteve McIntyre <steve.mcintyre@arm.com> Tested-by: NSteve McIntyre <93sam@debian.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by James Morse
[ Upstream commit 2b68a2a9 ] The ESB-instruction is a nop on CPUs that don't implement the RAS extensions. This lets us use it in places like the vectors without having to use alternatives. If someone disables CONFIG_ARM64_RAS_EXTN, this instruction still has its RAS extensions behaviour, but we no longer read DISR_EL1 as this register does depend on alternatives. This could go wrong if we want to synchronize an SError from a KVM guest. On a CPU that has the RAS extensions, but the KConfig option was disabled, we consume the pending SError with no chance of ever reading it. Hide the ESB-instruction behind the CONFIG_ARM64_RAS_EXTN option, outputting a regular nop if the feature has been disabled. Reported-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Shaokun Zhang
mainline inclusion from mainline-v5.2 commit: 7b8c87b2 category: performance bugzilla: NA CVE: NA -------------------------------------------------- Add coherency_max_size variable to record the maximum cache line size. cache_line_size is derived from CTR_EL0.CWG field and is called mostly for I/O device drivers. For some platforms like the HiSilicon Kunpeng920 server SoC, cache line sizes are different between L1/2 cache and L3 cache while L1 cache line size is 64-byte and L3 is 128-byte, but CTR_EL0.CWG is misreporting using L1 cache line size. We shall correct the right value which is important for I/O performance. Let's update the cache line size if it is detected from DT or PPTT information. Cc: Will Deacon <will.deacon@arm.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Zhenfa Qiu <qiuzhenfa@hisilicon.com> Reported-by: NZhenfa Qiu <qiuzhenfa@hisilicon.com> Suggested-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
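Since Linux normally leaves CTR_EL0 readable from EL0, the CWG-derived value that cache_line_size() starts from is easy to inspect; a small sketch (assumes an arm64 host and that CTR_EL0 access is not trapped):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t ctr;
	unsigned int cwg, cwg_bytes;

	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));

	cwg = (ctr >> 24) & 0xf;		/* Cache Writeback Granule field */
	cwg_bytes = cwg ? 4U << cwg : 128;	/* 0 means "not provided": assume the maximum */

	printf("CTR_EL0.CWG reports a %u-byte granule\n", cwg_bytes);
	printf("compare with /sys/devices/system/cpu/cpu0/cache/*/coherency_line_size\n");
	return 0;
}
```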
-
Committed by Jean-Philippe Brucker
commit c5e2edeb upstream. GCC 8.1.0 reports that the ldadd instruction encoding, recently added to insn.c, doesn't match the mask and couldn't possibly be identified: linux/arch/arm64/include/asm/insn.h: In function 'aarch64_insn_is_ldadd': linux/arch/arm64/include/asm/insn.h:280:257: warning: bitwise comparison always evaluates to false [-Wtautological-compare] Bits [31:30] normally encode the size of the instruction (1 to 8 bytes) and the current instruction value only encodes the 4- and 8-byte variants. At the moment only the BPF JIT needs this instruction, and doesn't require the 1- and 2-byte variants, but to be consistent with our other ldr and str instruction encodings, clear the size field in the insn value. Fixes: 34b8ab09 ("bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd") Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Reported-by: NKuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: NYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Daniel Borkmann
commit 34b8ab09 upstream. Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016, lets add support for STADD and use that in favor of LDXR / STXR loop for the XADD mapping if available. STADD is encoded as an alias for LDADD with XZR as the destination register, therefore add LDADD to the instruction encoder along with STADD as special case and use it in the JIT for CPUs that advertise LSE atomics in CPUID register. If immediate offset in the BPF XADD insn is 0, then use dst register directly instead of temporary one. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
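The same LDXR/STXR-versus-STADD trade-off is visible from plain C: when the compiler may assume LSE atomics, a fetch-add whose result is ignored typically becomes a single STADD (LDADD with XZR as the destination). A tiny sketch for comparing the generated code:

```c
/* gcc -O2 -march=armv8.1-a -S xadd.c  -> typically a single stadd
 * gcc -O2 -march=armv8-a   -S xadd.c  -> an ldxr/stxr retry loop   */
#include <stdint.h>

void xadd32(uint32_t *ptr, uint32_t val)
{
	__atomic_fetch_add(ptr, val, __ATOMIC_RELAXED);
}
```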
-
Committed by Will Deacon
commit 8e4e0ac0 upstream. Returning an error code from futex_atomic_cmpxchg_inatomic() indicates that the caller should not make any use of *uval, and should instead act upon on the value of the error code. Although this is implemented correctly in our futex code, we needlessly copy uninitialised stack to *uval in the error case, which can easily be avoided. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Sami Tolvanen
[ Upstream commit 0e358bd7 ] Although a syscall defined using SYSCALL_DEFINE0 doesn't accept parameters, use the correct function type to avoid indirect call type mismatches with Control-Flow Integrity checking. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Sami Tolvanen
[ Upstream commit 8ef8f368 ] Syscall wrappers in <asm/syscall_wrapper.h> use const struct pt_regs * as the argument type. Use const in syscall_fn_t as well to fix indirect call type mismatches with Control-Flow Integrity checking. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
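The const qualifier matters because CFI compares the type of the function pointer used at the call site with the type of the target; every entry reached through the syscall table has to be declared with exactly the same signature. A reduced standalone sketch of the idea (stand-in types, not the kernel's wrappers):

```c
struct pt_regs { unsigned long regs[31]; };	/* stand-in for the kernel's pt_regs */

typedef long (*syscall_fn_t)(const struct pt_regs *regs);

/* Even a zero-argument syscall keeps this signature, so the indirect call
 * through syscall_fn_t matches under Control-Flow Integrity checking. */
static long sys_example(const struct pt_regs *regs)
{
	(void)regs;
	return 0;
}

static const syscall_fn_t syscall_table[] = { sys_example };

long invoke_syscall(unsigned int nr, const struct pt_regs *regs)
{
	if (nr >= sizeof(syscall_table) / sizeof(syscall_table[0]))
		return -38;	/* -ENOSYS */
	return syscall_table[nr](regs);
}
```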
-
Committed by Xiongfeng Wang
hulk inclusion category: bugfix bugzilla: 16572 CVE: NA ------------------------------------------------- When I compile the kernel with CONFIG_SDEI_WATCHDOG enabled and without CONFIG_HARDLOCKUP_DETECTOR, I get the following error. ./include/linux/nmi.h: In function 'touch_nmi_watchdog': ./include/linux/nmi.h:141:2: error: implicit declaration of function 'arch_touch_nmi_watchdog' [-Werror=implicit-function-declaration] arch_touch_nmi_watchdog(); ^~~~~~~~~~~~~~~~~~~~~~~ This is because CONFIG_SDEI_WATCHDOG selects HAVE_NMI_WATCHDOG, but arch_touch_nmi_watchdog is not provided for ARM64 in 'asm/nmi.h'. This patch implements arch_touch_nmi_watchdog() for ARM64 when CONFIG_HARDLOCKUP_DETECTOR is not chosen. Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Vincenzo Frascino
mainline inclusion from mainline-5.2 commit 81fb8736 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- clock_getres() in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; where 'hrtimer_resolution' depends on whether or not high resolution timers are enabled, which is a runtime decision. The vDSO incorrectly returns the constant CLOCK_REALTIME_RES. Fix this by exposing 'hrtimer_resolution' in the vDSO datapage and returning that instead. Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com> [will: Use WRITE_ONCE(), move adr off COARSE path, renumber labels, use 'w' reg] Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
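The user-visible effect is that clock_getres() from the vDSO now matches the syscall: with high-resolution timers enabled both report 1 ns, rather than the vDSO returning a compile-time constant. A quick check from userspace:

```c
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec res;

	if (clock_getres(CLOCK_MONOTONIC, &res)) {
		perror("clock_getres");
		return 1;
	}

	/* With hrtimers this is 1 ns; with hrtimers disabled it is the tick period. */
	printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n",
	       (long)res.tv_sec, res.tv_nsec);
	return 0;
}
```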
-
Committed by Qian Cai
[ Upstream commit 74dd022f ] When building with -Wunused-but-set-variable, the compiler shouts about a number of pte_unmap() users, since this expands to an empty macro on arm64: | mm/gup.c: In function 'gup_pte_range': | mm/gup.c:1727:16: warning: variable 'ptem' set but not used | [-Wunused-but-set-variable] | mm/gup.c: At top level: | mm/memory.c: In function 'copy_pte_range': | mm/memory.c:821:24: warning: variable 'orig_dst_pte' set but not used | [-Wunused-but-set-variable] | mm/memory.c:821:9: warning: variable 'orig_src_pte' set but not used | [-Wunused-but-set-variable] | mm/swap_state.c: In function 'swap_ra_info': | mm/swap_state.c:641:15: warning: variable 'orig_pte' set but not used | [-Wunused-but-set-variable] | mm/madvise.c: In function 'madvise_free_pte_range': | mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used | [-Wunused-but-set-variable] Rewrite pte_unmap() as a static inline function, which silences the warnings. Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
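The underlying trick is generic: an empty function-like macro makes its argument vanish before the compiler sees a use, while a static inline function always counts as a use. A standalone illustration of the warning and the fix (stand-in pte type, not the arm64 header):

```c
/* build: gcc -O2 -Wunused-but-set-variable -c pte_unmap_demo.c */
typedef struct { unsigned long pte; } pte_t;

#define pte_unmap_as_macro(pte)	do { } while (0)	/* argument never "used" */

static inline void pte_unmap_as_inline(pte_t *pte)	/* argument is always used */
{
	(void)pte;
}

int demo(pte_t *base)
{
	pte_t *ptem = base;		/* warns if only the macro version is called ... */

	pte_unmap_as_inline(ptem);	/* ... but not when pte_unmap is a real function */
	return 0;
}
```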
-
Committed by Will Deacon
commit 969f5ea6 upstream. Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum that can prevent interrupts from being taken when single-stepping. This patch implements a software workaround to prevent userspace from effectively being able to disable interrupts. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: arch/arm64/include/asm/cpucaps.h [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Will Deacon
mainline inclusion from mainline-5.2 commit 75a19a02 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- When executing clock_gettime(), either in the vDSO or via a system call, we need to ensure that the read of the counter register occurs within the seqlock reader critical section. This ensures that updates to the clocksource parameters (e.g. the multiplier) are consistent with the counter value and therefore avoids the situation where time appears to go backwards across multiple reads. Extend the vDSO logic so that the seqlock critical section covers the read of the counter register as well as accesses to the data page. Since reads of the counter system registers are not ordered by memory barrier instructions, introduce dependency ordering from the counter read to a subsequent memory access so that the seqlock memory barriers apply to the counter access in both the vDSO and the system call paths. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Tested-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/ Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NXuefeng Wang <wxf.wang@hisilicon.com> Reviewed-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
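What the message describes is the standard seqcount retry loop with the counter read pulled inside it; a simplified reader sketch in portable C11 (generic atomics and a stand-in counter read in place of the vDSO's barriers and CNTVCT_EL0 access):

```c
#include <stdatomic.h>
#include <stdint.h>

struct vdso_data_sketch {
	atomic_uint seq;	/* odd while the kernel is updating the parameters */
	uint64_t    mult;
	uint64_t    epoch_ns;
	uint64_t    epoch_cycles;
};

extern uint64_t read_counter(void);	/* e.g. CNTVCT_EL0; assumed to be provided */

uint64_t gettime_ns(const struct vdso_data_sketch *vd)
{
	uint64_t cycles, mult, epoch_ns, epoch_cycles;
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&vd->seq, memory_order_acquire);
		/* The counter *and* the clocksource parameters are read inside
		 * the critical section, so they are consistent with each other. */
		cycles       = read_counter();
		mult         = vd->mult;
		epoch_ns     = vd->epoch_ns;
		epoch_cycles = vd->epoch_cycles;
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 seq != atomic_load_explicit(&vd->seq, memory_order_relaxed));

	return epoch_ns + (cycles - epoch_cycles) * mult;
}
```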
-
Committed by Yang Yingliang
hulk inclusion category: performance bugzilla: 16082 CVE: NA ------------------------------------------------- This reverts commit 47819486652f2dc95ad1fe6a1a862a3c2971d657. 487f18a5f1cf ("arm64: vdso: do cntvct workaround in the VDSO") and 47819486652f ("arm64: arch_timer: Disable CNTVCT_EL0 trap if workaround is enabled") are not needed for now; these two patches will be applied again later. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Vincenzo Frascino
commit d2631193 upstream. Currently, compat tasks running on arm64 can allocate memory up to TASK_SIZE_32 (UL(0x100000000)). This means that mmap() allocations, if we treat them as returning an array, are not compliant with the sections 6.5.8 of the C standard (C99) which states that: "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P". Redefine TASK_SIZE_32 to address the issue. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com> [will: fixed typo in comment] Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Yang Yingliang
hulk inclusion category: performance bugzilla: 16082 CVE: NA ------------------------------------------------- Reading CNTVCT_EL0 takes a long time when a cntvct workaround is in effect and the CNTVCT_EL0 trap is enabled. To reduce the read time, disable the CNTVCT_EL0 trap and perform the cntvct workaround in the VDSO by adding vdso_fix. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Committed by Jean-Philippe Brucker
hulk inclusion category: feature bugzilla: 14369 CVE: NA ------------------- To enable address space sharing with the IOMMU, introduce mm_context_get() and mm_context_put(), that pin down a context and ensure that it will keep its ASID after a rollover. Pinning is necessary because a device constantly needs a valid ASID, unlike tasks that only require one when running. Without pinning, we would need to notify the IOMMU when we're about to use a new ASID for a task, and it would get complicated when a new task is assigned a shared ASID. Consider the following scenario with no ASID pinned: 1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1) 2. Task t2 is scheduled on CPUx, gets ASID (1, 2) 3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1) We would now have to immediately generate a new ASID for t1, notify the IOMMU, and finally enable task tn. We are holding the lock during all that time, since we can't afford having another CPU trigger a rollover. The IOMMU issues invalidation commands that can take tens of milliseconds. It gets needlessly complicated. All we wanted to do was schedule task tn, that has no business with the IOMMU. By letting the IOMMU pin tasks when needed, we avoid stalling the slow path, and let the pinning fail when we're out of shareable ASIDs. After a rollover, the allocator expects at least one ASID to be available in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS - 1) is the maximum number of ASIDs that can be shared with the IOMMU. Cc: catalin.marinas@arm.com Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: NFang Lijun <fanglijun3@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Reviewed-by: NZhen Lei <thunder.leizhen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
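From the IOMMU driver's point of view the interface described above would be used roughly as follows. This is a hypothetical usage sketch: mm_context_get()/mm_context_put() are the functions introduced by the patch, everything else is a stand-in, and it assumes (as described) that pinning can fail when no shareable ASIDs are left, signalled here by a zero return.

```c
struct mm_struct;
struct my_iommu_dev;					/* stand-in device context */

unsigned long mm_context_get(struct mm_struct *mm);	/* introduced by this patch */
void mm_context_put(struct mm_struct *mm);

void my_iommu_write_asid(struct my_iommu_dev *dev, unsigned long asid);	/* stand-in */

#define ENOSPC 28

int my_iommu_sva_bind(struct my_iommu_dev *dev, struct mm_struct *mm)
{
	unsigned long asid = mm_context_get(mm);	/* pin: ASID survives rollover */

	if (!asid)
		return -ENOSPC;				/* out of shareable ASIDs */

	my_iommu_write_asid(dev, asid);
	return 0;
}

void my_iommu_sva_unbind(struct my_iommu_dev *dev, struct mm_struct *mm)
{
	my_iommu_write_asid(dev, 0);
	mm_context_put(mm);				/* unpin: ASID may be recycled again */
}
```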
-
Committed by Lijun Fang
euler inclusion category: feature bugzilla: 11082 CVE: NA --------------------- When the kernel boots, we need to determine whether memory is DDR or HBM and add it to the nodes by parsing the cmdline, instead of relying on memory hotplug. Signed-off-by: NLijun Fang <fanglijun3@huawei.com> Reviewed-by: Nzhong jiang <zhongjiang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-