1. 20 Dec 2021, 2 commits
  2. 17 Dec 2021, 1 commit
  3. 08 Dec 2021, 1 commit
  4. 06 Dec 2021, 1 commit
  5. 01 Dec 2021, 1 commit
  6. 25 Nov 2021, 1 commit
  7. 24 Nov 2021, 1 commit
    • arm64: uaccess: avoid blocking within critical sections · 94902d84
      Mark Rutland authored
      As Vincent reports in:
      
        https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
      
      The put_user() in schedule_tail() can get stuck in a livelock, similar
      to a problem recently fixed on riscv in commit:
      
        285a76bb ("riscv: evaluate put_user() arg before enabling user access")
      
      In __raw_put_user() we have a critical section between
      uaccess_ttbr0_enable() and uaccess_ttbr0_disable() where we cannot
      safely call into the scheduler without having taken an exception, as
      schedule() and other scheduling functions will not save/restore the
      TTBR0 state. If either of the `x` or `ptr` arguments to __raw_put_user()
      contains a blocking call, we may call into the scheduler within the
      critical section. This can result in two problems:
      
      1) The access within the critical section will occur without the
         required TTBR0 tables installed. This will fault, and where the
         required tables permit access, the access will be retried without the
         required tables, resulting in a livelock.
      
      2) When TTBR0 SW PAN is in use, check_and_switch_context() does not
         modify TTBR0, leaving a stale value installed. The mappings of the
         blocked task will erroneously be accessible to regular accesses in
         the context of the new task. Additionally, if the tables are
         subsequently freed, local TLB maintenance required to reuse the ASID
         may be lost, potentially resulting in TLB corruption (e.g. in the
         presence of CnP).
      
      The same issue exists for __raw_get_user() in the critical section
      between uaccess_ttbr0_enable() and uaccess_ttbr0_disable().
      
      A similar issue exists for __get_kernel_nofault() and
      __put_kernel_nofault() for the critical section between
      __uaccess_enable_tco_async() and __uaccess_disable_tco_async(), as the
      TCO state is not context-switched by direct calls into the scheduler.
      Here the TCO state may be lost from the context of the current task,
      resulting in unexpected asynchronous tag check faults. It may also be
      leaked to another task, suppressing expected tag check faults.
      
      To fix all of these cases, we must ensure that we do not directly call
      into the scheduler in their respective critical sections. This patch
      reworks __raw_put_user(), __raw_get_user(), __get_kernel_nofault(), and
      __put_kernel_nofault(), ensuring that parameters are evaluated outside
      of the critical sections. To make this requirement clear, comments are
      added describing the problem, and line spaces added to separate the
      critical sections from other portions of the macros.
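
      As a rough sketch of the reworked shape (simplified here; helper names
      such as __raw_put_mem() follow the existing arm64 uaccess macros, and
      the real patch treats __raw_get_user() and the *_kernel_nofault()
      helpers in the same way):

          #define __raw_put_user(x, ptr, err)                             \
          do {                                                            \
                  /* Evaluate 'ptr' and 'x', which may block, up front */ \
                  __typeof__(*(ptr)) __user *__rpu_ptr = (ptr);           \
                  __typeof__(*(ptr)) __rpu_val = (x);                     \
                  __chk_user_ptr(__rpu_ptr);                              \
                                                                          \
                  /* ...so nothing in the uaccess window can schedule */  \
                  uaccess_ttbr0_enable();                                 \
                  __raw_put_mem("sttr", __rpu_val, __rpu_ptr, err);       \
                  uaccess_ttbr0_disable();                                \
          } while (0)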
      
      For __raw_get_user() and __raw_put_user() the `err` parameter is
      conditionally assigned to, and we must currently evaluate this in the
      critical section. This behaviour is relied upon by the signal code,
      which uses chains of put_user_error() and get_user_error(), checking the
      return value at the end. In all cases, the `err` parameter is a plain
      int rather than a more complex expression with a blocking call, so this
      is safe.
      
      In future we should try to clean up the `err` usage to remove the
      potential for this to be a problem.
      
      Aside from the changes to time of evaluation, there should be no
      functional change as a result of this patch.
      Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Link: https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
      Fixes: f253d827 ("arm64: uaccess: refactor __{get,put}_user")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/20211122125820.55286-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
  8. 18 Nov 2021, 1 commit
  9. 16 Nov 2021, 2 commits
    • arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd · d3eb70ea
      Pingfan Liu authored
      trans_pgd_create_copy() can hit "VM_BUG_ON(mm != &init_mm)" in the
      function pmd_populate_kernel().
      
      This is the combined consequence of commit 5de59884 ("arm64:
      trans_pgd: pass NULL instead of init_mm to *_populate functions"), which
      replaced &init_mm with NULL, and commit 59511cfd ("arm64: mm: use XN
      table mapping attributes for user/kernel mappings"), which introduced
      the VM_BUG_ON.
      
      Since the former sounds reasonable, it is better to work on the latter.
      From the perspective of trans_pgd, two groups of functions are
      considered in the latter commit:
      
        pmd_populate_kernel()
          mm == NULL should be fixed, else it hits VM_BUG_ON()
        p?d_populate()
          mm == NULL means PXN, that is OK, since trans_pgd only copies a
          linear map, no execution will happen on the map.
      
      So it is good enough to just relax VM_BUG_ON() to disregard mm == NULL.
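
      As a rough sketch of that relaxation (the exact guard in the patch may
      differ slightly), the arm64 pmd_populate_kernel() helper would tolerate
      a NULL mm while still catching any other non-init_mm caller:

          static inline void
          pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
          {
                  /* NULL mm (e.g. trans_pgd) is allowed; user mms are not. */
                  VM_BUG_ON(mm && mm != &init_mm);
                  __pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE | PMD_TABLE_UXN);
          }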
      
      Fixes: 59511cfd ("arm64: mm: use XN table mapping attributes for user/kernel mappings")
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Cc: <stable@vger.kernel.org> # 5.13.x
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Matthias Brugger <mbrugger@suse.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Link: https://lore.kernel.org/r/20211112052214.9086-1-kernelfans@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR · c6d3cd32
      Mark Rutland authored
      When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph
      tracer is in use, unwind_frame() may erroneously associate a traced
      function with an incorrect return address. This can happen when starting
      an unwind from a pt_regs, or when unwinding across an exception
      boundary.
      
      This can be seen when recording with perf while the function graph
      tracer is in use. For example:
      
      | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
      | # perf record -g -e raw_syscalls:sys_enter:k /bin/true
      | # perf report
      
      ... reports the callchain erroneously as:
      
      | el0t_64_sync
      | el0t_64_sync_handler
      | el0_svc_common.constprop.0
      | perf_callchain
      | get_perf_callchain
      | syscall_trace_enter
      | syscall_trace_enter
      
      ... whereas when the function graph tracer is not in use, it reports:
      
      | el0t_64_sync
      | el0t_64_sync_handler
      | el0_svc
      | do_el0_svc
      | el0_svc_common.constprop.0
      | syscall_trace_enter
      | syscall_trace_enter
      
      The underlying problem is that ftrace_graph_get_ret_stack() takes an
      index offset from the most recent entry added to the fgraph return
      stack. We start an unwind at offset 0, and increment the offset each
      time we encounter a rewritten return address (i.e. when we see
      `return_to_handler`). This is broken in two cases:
      
      1) Between creating a pt_regs and starting the unwind, function calls
         may place entries on the stack, leaving an arbitrary offset which we
         can only determine by performing a full unwind from the caller of the
         unwind code (and relying on none of the unwind code being
         instrumented).
      
         This can result in erroneous entries being reported in a backtrace
         recorded by perf or kfence when the function graph tracer is in use.
         Currently show_regs() is unaffected as dump_backtrace() performs an
         initial unwind.
      
      2) When unwinding across an exception boundary (whether continuing an
         unwind or starting a new unwind from regs), we currently always skip
         the LR of the interrupted context. Where this was live and contained
         a rewritten address, we won't consume the corresponding fgraph ret
         stack entry, leaving subsequent entries off-by-one.
      
         This can result in erroneous entries being reported in a backtrace
         performed by any in-kernel unwinder when that backtrace crosses an
         exception boundary, with entries after the boundary being reported
         incorrectly. This includes perf, kfence, show_regs(), panic(), etc.
      
      To fix this, we need to be able to uniquely identify each rewritten
      return address such that we can map this back to the original return
      address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate
      each rewritten return address with a unique location on the stack. As
      the return address is passed in the LR (and so is not guaranteed a
      unique location in memory), we use the FP upon entry to the function
      (i.e. the address of the caller's frame record) as the return address
      pointer. Any nested call will have a different FP value as the caller
      must create its own frame record and update FP to point to this.
      
      Since ftrace_graph_ret_addr() requires the return address with the PAC
      stripped, the stripping of the PAC is moved before the fixup of the
      rewritten address. As we would unconditionally strip the PAC, moving
      this earlier is not harmful, and we can avoid a redundant strip in the
      return address fixup code.
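
      For illustration, a minimal sketch of the resulting unwind-side fixup
      (field and symbol names follow the arm64 unwinder; the exact code in
      the patch may differ):

          frame->pc = ptrauth_strip_insn_pac(frame->pc);

          #ifdef CONFIG_FUNCTION_GRAPH_TRACER
                  if (tsk->ret_stack &&
                      (frame->pc == (unsigned long)return_to_handler)) {
                          unsigned long orig_pc;

                          /* The caller's FP (the address of its frame record)
                           * uniquely identifies this fgraph ret stack entry. */
                          orig_pc = ftrace_graph_ret_addr(tsk, NULL, frame->pc,
                                                          (void *)frame->fp);
                          if (WARN_ON_ONCE(frame->pc == orig_pc))
                                  return -EINVAL;
                          frame->pc = orig_pc;
                  }
          #endif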
      
      I've tested this with the perf case above, the ftrace selftests, and
      a number of ad-hoc unwinder tests. The tests all pass, and I have seen
      no unexpected behaviour as a result of this change. I've tested with
      pointer authentication under QEMU TCG where magic-sysrq+l correctly
      recovers the original return addresses.
      
      Note that this doesn't fix the issue of skipping a live LR at an
      exception boundary, which is a more general problem and requires more
      substantial rework. Were we to consume the LR in all cases this would
      result in warnings where the interrupted context's LR contains
      `return_to_handler`, but the FP has been altered, e.g.
      
      | func:
      |	<--- ftrace entry ---> 	// logs FP & LR, rewrites LR
      | 	STP	FP, LR, [SP, #-16]!
      | 	MOV	FP, SP
      | 	<--- INTERRUPT --->
      
      ... as ftrace_graph_get_ret_stack() will not find a matching entry,
      triggering the WARN_ON_ONCE() in unwind_frame().
      
      Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local
      Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
  10. 12 Nov 2021, 1 commit
  11. 09 Nov 2021, 1 commit
    • KVM: arm64: Fix host stage-2 finalization · 50a8d331
      Quentin Perret authored
      We currently walk the hypervisor stage-1 page-table towards the end of
      hyp init in nVHE protected mode and adjust the host page ownership
      attributes in its stage-2 in order to get a consistent state from both
      points of view. The walk is done on the entire hyp VA space, and expects
      to only ever find page-level mappings. While this expectation is
      reasonable in the half of hyp VA space that maps memory with a fixed
      offset (see the loop in pkvm_create_mappings_locked()), it can be
      incorrect in the other half where nothing prevents the usage of block
      mappings. For instance, on systems where memory is physically aligned at
      an address that happens to map to a PMD-aligned VA in the hyp_vmemmap,
      kvm_pgtable_hyp_map() will install block mappings when backing the
      hyp_vmemmap, which will later cause finalize_host_mappings() to fail.
      Furthermore, it should be noted that all pages backing the hyp_vmemmap
      are also mapped in the 'fixed offset range' of the hypervisor, which
      implies that finalize_host_mappings() will walk both aliases and update
      the host stage-2 attributes twice. The order in which this happens is
      unpredictable, though, since the hyp VA layout is highly dependent on
      the position of the idmap page, hence resulting in a fragile mess at
      best.
      
      In order to fix all of this, let's restrict the finalization walk to
      only cover memory regions in the 'fixed-offset range' of the hyp VA
      space and nothing else. This not only fixes a correctness issue, but
      will also result in a slightly faster hyp initialization overall.
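
      As a sketch of the reworked walk (assuming the usual pKVM symbols such
      as hyp_memory[], hyp_memblock_nr, hyp_phys_to_virt() and pkvm_pgtable;
      details in the actual patch may differ), finalization iterates over the
      fixed-offset aliases of the memory regions instead of the whole hyp VA
      space:

          static int finalize_host_mappings(void)
          {
                  struct kvm_pgtable_walker walker = {
                          .cb     = finalize_host_mappings_walker,
                          .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
                  };
                  int i, ret;

                  /* Only walk the fixed-offset alias of each memory region. */
                  for (i = 0; i < hyp_memblock_nr; i++) {
                          struct memblock_region *reg = &hyp_memory[i];
                          u64 start = (u64)hyp_phys_to_virt(reg->base);

                          ret = kvm_pgtable_walk(&pkvm_pgtable, start, reg->size, &walker);
                          if (ret)
                                  return ret;
                  }

                  return 0;
          }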
      
      Fixes: 2c50166c ("KVM: arm64: Mark host bss and rodata section as shared")
      Signed-off-by: Quentin Perret <qperret@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211108154636.393384-1-qperret@google.com
  12. 08 Nov 2021, 7 commits
  13. 07 Nov 2021, 3 commits
  14. 05 Nov 2021, 1 commit
  15. 02 Nov 2021, 1 commit
  16. 28 Oct 2021, 3 commits
  17. 27 Oct 2021, 5 commits
  18. 26 Oct 2021, 7 commits