- 30 November 2021, 1 commit
-
-
Committed by Nicholas Piggin
Allow the LPID bit width and partition table size to be set at runtime from the device tree. Move the PID bit width detection into the same place. KVM does not support using the extra bits yet; this is mainly required to get the PTCR register values correct (so KVM will run, but it will not allocate > 4096 LPIDs). OPAL firmware provides this property for POWER10 CPUs since skiboot commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for POWER10 CPUs"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
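A minimal sketch of how the runtime detection might look, assuming the device-tree property names follow the skiboot commit quoted above and that the fallback defaults shown are illustrative:

    #include <linux/of.h>

    /* Defaults used when firmware provides no override (illustrative values). */
    static u32 mmu_pid_bits = 20;
    static u32 mmu_lpid_bits = 12;

    /* of_property_read_u32() leaves the value untouched on error, so a
     * missing property simply keeps the default. */
    static void __init early_read_mmu_bit_widths(struct device_node *cpu)
    {
            of_property_read_u32(cpu, "ibm,mmu-pid-bits", &mmu_pid_bits);
            of_property_read_u32(cpu, "ibm,mmu-lpid-bits", &mmu_lpid_bits);
    }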
-
- 24 November 2021, 1 commit
-
-
Committed by Nicholas Piggin
This register controls supervisor SPR modifications, and as such is only relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never restores it coming back out to the host, so it can be kept constant, avoiding the mtspr in KVM guest entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-10-npiggin@gmail.com
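The idea in one line, sketched (SPRN_AMOR is the real SPR constant; doing this once per CPU at setup is the assumption):

    #include <asm/reg.h>

    /* Sketch: set AMOR to all-ones once at CPU setup so the KVM guest
     * entry path no longer needs an mtspr on every entry. */
    static void __init cpu_setup_amor(void)
    {
            mtspr(SPRN_AMOR, ~0UL);
    }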
-
- 22 October 2021, 1 commit
-
-
Committed by Joel Stanley
The page_alloc.c code will call into __kernel_map_pages() when DEBUG_PAGEALLOC is configured and enabled. As the implementation assumes hash, this should crash spectacularly if not for a bit of luck in __kernel_map_pages(): in this function linear_map_hash_count is always zero, so the for loop exits without doing any damage. There are no other platforms that determine whether they support debug_pagealloc at runtime. Instead of adding code to mm/page_alloc.c to do that, this change turns the map/unmap into a no-op when in radix mode and prints a warning once. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Reformat if per Christophe's suggestion] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211013213438.675095-1-joel@jms.id.au
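A sketch of the dispatch this describes, assuming the hash implementation is factored out under a name like hash__kernel_map_pages():

    #include <linux/mm.h>
    #include <linux/printk.h>

    void __kernel_map_pages(struct page *page, int numpages, int enable)
    {
            if (radix_enabled()) {
                    /* No hash linear map to manipulate: warn once, do nothing. */
                    pr_warn_once("DEBUG_PAGEALLOC not supported in radix mode\n");
                    return;
            }
            hash__kernel_map_pages(page, numpages, enable);
    }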
-
- 26 August 2021, 1 commit
-
-
Committed by Christophe Leroy
40x and BOOKE don't have MSR_RI, therefore all tests involving MSR_RI may be problematic on those platforms. Create helpers to check or set MSR_RI in regs, and use them in common code. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c2fb93708196734f4176dda334aaa3055f213b89.1629707037.git.christophe.leroy@csgroup.eu
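A minimal sketch of such a helper; the name is illustrative, and the config test stands in for however the real patch distinguishes the platforms:

    #include <asm/ptrace.h>
    #include <asm/reg.h>

    /* 40x and BOOKE have no MSR_RI, so the test compiles away there. */
    static inline bool regs_is_unrecoverable(struct pt_regs *regs)
    {
    #if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
            return false;
    #else
            return unlikely(!(regs->msr & MSR_RI));
    #endif
    }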
-
- 13 August 2021, 2 commits
-
-
Committed by Aneesh Kumar K.V
No functional change in this patch. arch_debugfs_dir is the generic kernel name declared in linux/debugfs.h for the arch-specific debugfs directory. Architectures like x86/s390 already use the name. Rename the powerpc-specific powerpc_debugfs_root to arch_debugfs_dir. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
-
Committed by Aneesh Kumar K.V
Similar to x86/s390, add a debugfs file to tune tlb_single_page_flush_ceiling. Also add a debugfs entry for tlb_local_single_page_flush_ceiling. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-1-aneesh.kumar@linux.ibm.com
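A sketch of the wiring, assuming the ceilings are plain u32 variables in radix_tlb.c (default values illustrative); arch_debugfs_dir is the directory introduced by the previous commit:

    #include <linux/debugfs.h>

    static u32 tlb_single_page_flush_ceiling __read_mostly = 33;
    static u32 tlb_local_single_page_flush_ceiling __read_mostly = 100;

    /* Expose the tunables under the arch debugfs directory. */
    static int __init radix_tlb_debugfs_init(void)
    {
            debugfs_create_u32("tlb_single_page_flush_ceiling", 0600,
                               arch_debugfs_dir, &tlb_single_page_flush_ceiling);
            debugfs_create_u32("tlb_local_single_page_flush_ceiling", 0600,
                               arch_debugfs_dir, &tlb_local_single_page_flush_ceiling);
            return 0;
    }
    late_initcall(radix_tlb_debugfs_init);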
-
- 26 July 2021, 1 commit
-
-
Committed by Hari Bathini
As kprobe does not handle events happening in real mode, blacklist the functions that only get called in real mode or in the kexec sequence with the MMU turned off. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/162626687834.155313.4692863392927831843.stgit@hbathini-workstation.ibm.com
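The mechanism is the standard NOKPROBE_SYMBOL() annotation; sketched here with a hypothetical real-mode function:

    #include <linux/kprobes.h>

    /* Hypothetical example: anything that runs with the MMU off must
     * not be probed, since kprobes cannot handle real-mode events. */
    static void realmode_helper(void)
    {
            /* ... executes in real mode ... */
    }
    NOKPROBE_SYMBOL(realmode_helper);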
-
- 09 July 2021, 3 commits
-
-
Committed by Aneesh Kumar K.V
flush_tlb_range is special in that we don't specify the page size used for the translation. Hence when flushing the TLB we flush the translation cache for all possible page sizes. The kernel also uses the same interface when moving page tables around. Such a move requires us to flush the page walk cache. Instead of adding another interface to force a page walk cache flush, update flush_tlb_range to flush the page walk cache if the range flushed is more than the PMD range. A page table move will always involve an invalidate range more than PMD_SIZE. Running a microbenchmark with mprotect and parallel memory access didn't show any observable performance impact. Link: https://lkml.kernel.org/r/20210616045735.374532-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
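The decision it describes, sketched; the internal helper and its flush_pwc parameter are assumptions about the patch's plumbing:

    #include <linux/mm.h>

    /* A range larger than PMD_SIZE may be a page-table move, so also
     * flush the page walk cache rather than just the TLB entries. */
    void radix__flush_tlb_range(struct vm_area_struct *vma,
                                unsigned long start, unsigned long end)
    {
            bool flush_pwc = (end - start) > PMD_SIZE;

            __radix__flush_tlb_range(vma->vm_mm, start, end, flush_pwc);
    }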
-
Committed by Aneesh Kumar K.V
No functional change in this patch. [aneesh.kumar@linux.ibm.com: m68k build error reported by kernel robot] Link: https://lkml.kernel.org/r/87tulxnb2v.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Aneesh Kumar K.V
No functional change in this patch. [aneesh.kumar@linux.ibm.com: fix] Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com [sfr@canb.auug.org.au: another fix] Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 June 2021, 1 commit
-
-
Committed by Nicholas Piggin
The early bad fault or key fault test in do_hash_fault() ends up calling into ___do_page_fault without having gone through an interrupt handler wrapper (except the initial _RAW one). This can end up calling local irq functions while the interrupt has not been reconciled, which will likely cause crashes, and it trips up on a later patch that adds more assertions. pkey_exec_prot from selftests causes this path to be executed. There is no real reason the in_nmi() test needs to be performed after the key fault check. In fact, if a perf interrupt in the hash fault code did a stack walk that was somehow made to take a key fault, then running ___do_page_fault could possibly cause another hash fault, causing problems. Move the in_nmi() test first, and then do everything else inside the regular interrupt handler function. Fixes: 3a96570f ("powerpc: convert interrupt handlers to use wrappers") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-2-npiggin@gmail.com
-
- 22 June 2021, 1 commit
-
-
Committed by Bharata B Rao
Enable support for process-scoped invalidations from nested guests and partition-scoped invalidations for nested guests. Process-scoped invalidations for any level of nested guests are handled by implementing an H_RPT_INVALIDATE handler in the nested guest exit path in L0. Partition-scoped invalidation requests are forwarded to the right nested guest, handled there, and passed down to L0 for eventual handling. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [aneesh: Nested guest partition-scoped invalidation changes] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Squash in fixup patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-5-bharata@linux.ibm.com
-
- 21 June 2021, 2 commits
-
-
Committed by Bharata B Rao
H_RPT_INVALIDATE does two types of TLB invalidations:
1. Process-scoped invalidations for guests when LPCR[GTSE]=0. This is currently not used in KVM, as GTSE is not usually disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on behalf of an L2 guest. This is currently handled by the H_TLB_INVALIDATE hcall, which this new hcall replaces.
This commit enables process-scoped invalidations for L1 guests. Support for process-scoped and partition-scoped invalidations from/for nested guests will be added separately. Process-scoped tlbie invalidations from L1 and nested guests need the RS register of the TLBIE instruction to contain both PID and LPID. This patch introduces primitives that execute the tlbie instruction with both PID and LPID set, in preparation for the H_RPT_INVALIDATE hcall. A description of H_RPT_INVALIDATE follows:

int64 /* H_Success: Return code on successful completion */
      /* H_Busy: repeat the call with the same parameters */
      /* H_Parameter, H_P2, H_P3, H_P4, H_P5: invalid parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
      uint64 id,       /* PID/LPID to invalidate */
      uint64 target,   /* Invalidation target */
      uint64 type,     /* Type of lookaside information */
      uint64 pg_sizes, /* Page sizes */
      uint64 start,    /* Start of Effective Address (EA) range (inclusive) */
      uint64 end)      /* End of EA range (exclusive) */

Invalidation targets (target):
Core MMU       0x01 /* All virtual processors in the partition */
Core local MMU 0x02 /* Current virtual processor */
Nest MMU       0x04 /* All nest/accelerator agents in use by the partition */
A combination of the above can be specified, except core and core local.

Type of translation to invalidate (type):
NESTED 0x0001 /* Invalidate nested guest partition-scope */
TLB    0x0002 /* Invalidate TLB */
PWC    0x0004 /* Invalidate Page Walk Cache */
PRT    0x0008 /* Invalidate caching of Process Table Entries if NESTED is clear */
PAT    0x0008 /* Invalidate caching of Partition Table Entries if NESTED is set */
A combination of the above can be specified.

Page size mask (pg_sizes):
4K  0x01
64K 0x02
2M  0x04
1G  0x08
All sizes: -1UL
A combination of the above can be specified; all page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information matching the parameters given.
* Return H_P2, H_P3 or H_P4 if the target, type, or pg_sizes parameters differ from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested LPID allocated to this partition.
* Return H_P5 if (start, end) doesn't form a valid range; start and end should be a valid quadrant address and end > start.
* Return H_NotSupported if the partition is not running in radix translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses. Else start and end should be aligned to 4kB (lower 11 bits clear).
* If NESTED is clear, then invalidate process-scoped lookaside information. Else pid specifies a nested LPID, and the invalidation is performed on the nested guest partition table and nested guest partition-scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and quadrant 0 spaces. Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated. Those which are partially covered are considered outside the invalidation range, which allows a caller to optimally invalidate ranges that may contain mixed page sizes.
* Return H_SUCCESS on success.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com
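A sketch of how an L1 guest might issue the hcall through the pseries plpar_hcall_norets() wrapper, using the encodings listed above (the helper itself is hypothetical):

    #include <asm/hvcall.h>

    /* Flush all TLB and PWC entries for one process (PID), all page
     * sizes, over the full effective-address range. */
    static long rpt_invalidate_pid(unsigned long pid)
    {
            return plpar_hcall_norets(H_RPT_INVALIDATE, pid,
                                      0x01 | 0x04,  /* target: core MMU + nest MMU */
                                      0x02 | 0x04,  /* type: TLB + PWC */
                                      -1UL,         /* pg_sizes: all */
                                      0, -1UL);     /* start = 0, end = -1: all EAs */
    }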
-
Committed by Bharata B Rao
Add a field to mmu_psize_def to store the page size encodings of the H_RPT_INVALIDATE hcall. Initialize this while scanning the radix AP encodings. This will be used when invalidating with the required page size encoding in the hcall. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-3-bharata@linux.ibm.com
-
- 17 June 2021, 1 commit
-
-
Committed by Aneesh Kumar K.V
Update _tlbiel_pid() such that we can avoid build errors like the below when using this function in other places: arch/powerpc/mm/book3s64/radix_tlb.c: In function ‘__radix__flush_tlb_range_psize’: arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably does not match constraints 114 | asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) | ^~~ arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in ‘asm’ make[4]: *** [scripts/Makefile.build:271: arch/powerpc/mm/book3s64/radix_tlb.o] Error 1 With this fix, we can also drop the __always_inline in __radix__flush_tlb_range_psize, which was added by commit e12d6d7d ("powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline"). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210610083639.387365-1-aneesh.kumar@linux.ibm.com
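The usual shape of such a fix, heavily hedged since the actual patch may differ: PPC_TLBIEL expands to inline asm with "i" (immediate) operands, so the always-inlined worker must only ever see compile-time-constant RIC arguments:

    /* __tlbiel_pid() is the existing always-inlined worker whose asm
     * needs immediate operands; dispatch with literal RIC values so
     * the constraint is always satisfiable (set-loop details elided). */
    static inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
    {
            if (ric == RIC_FLUSH_TLB)
                    __tlbiel_pid(pid, 0, RIC_FLUSH_TLB);
            else if (ric == RIC_FLUSH_PWC)
                    __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
            else
                    __tlbiel_pid(pid, 0, RIC_FLUSH_ALL);
    }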
-
- 10 June 2021, 2 commits
-
-
Committed by Nicholas Piggin
Rather than partitioning the guest PID space and flushing a rogue guest PID to work around this problem, fix it by always disabling the MMU when switching in or out of guest MMU context in HV mode. This may be a bit less efficient, but it is a lot less complicated and allows the P9 path to trivially implement the workaround too. Newer CPUs are not subject to this issue. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com
-
Committed by Marc Zyngier
A bunch of PPC files are missing the inclusion of linux/of.h and linux/irqdomain.h, relying on transitive inclusion from another file. As we are about to break this dependency, make sure these dependencies are explicit. Signed-off-by: Marc Zyngier <maz@kernel.org>
-
- 01 May 2021, 2 commits
-
-
Committed by Nicholas Piggin
This allows unsupported levels to be constant-folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Link: https://lkml.kernel.org/r/20210317062402.533919-8-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Nicholas Piggin
This changes the awkward approach where architectures provide init functions to determine which levels they can provide large mappings for, to one where the arch is queried for each call. This removes code and indirection, and allows constant-folding of dead code for unsupported levels. This also adds a prot argument to the arch query. This is unused currently but could help with some architectures (e.g., some powerpc processors can't map uncacheable memory with large pages). Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
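The per-call query API looks roughly like this (inline fallbacks shown; per-arch overrides return true where a level is supported):

    #include <linux/pgtable.h>

    #ifndef CONFIG_HAVE_ARCH_HUGE_VMAP
    /* Unsupported levels constant-fold to false, letting the compiler
     * drop the dead huge-mapping paths entirely. */
    static inline bool arch_vmap_pud_supported(pgprot_t prot)
    {
            return false;
    }
    static inline bool arch_vmap_pmd_supported(pgprot_t prot)
    {
            return false;
    }
    #endif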
-
- 17 April 2021, 1 commit
-
-
Committed by Xiongwei Song
Define macros to list ppc interrupt types in interrupt.h, replacing references to the trap hex values with these macros. Referred to the hex numbers in arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
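A few of the resulting macros, sketched with the architected book3s trap vectors (the exact macro set varies per sub-architecture):

    /* Named trap vectors instead of bare hex values. */
    #define INTERRUPT_MACHINE_CHECK  0x200
    #define INTERRUPT_DATA_STORAGE   0x300
    #define INTERRUPT_INST_STORAGE   0x400
    #define INTERRUPT_ALIGNMENT      0x600
    #define INTERRUPT_PROGRAM        0x700

    /* e.g. "if (TRAP(regs) == 0x300)" becomes:
     *      if (TRAP(regs) == INTERRUPT_DATA_STORAGE) */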
-
- 14 April 2021, 3 commits
-
-
Committed by Michael Ellerman
The lkp bot pointed out that with W=1 we get: arch/powerpc/mm/book3s64/radix_pgtable.c:183:6: error: no previous prototype for 'radix__change_memory_range' Which is really saying that it could be static; make it so. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Nicholas Piggin
search_exception_tables + __bad_page_fault can be substituted with bad_page_fault, do_page_fault no longer needs to return a value to asm for any sub-architecture, and __bad_page_fault can be static. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210316104206.407354-10-npiggin@gmail.com
-
Committed by Vaibhav Jain
While removing a large number of mappings from hash page tables for large memory systems, a soft-lockup is reported because of the time spent inside htab_remove_mapping(), like the one below: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! <snip> NIP plpar_hcall+0x38/0x58 LR pSeries_lpar_hpte_invalidate+0x68/0xb0 Call Trace: 0x1fffffffffff000 (unreliable) pSeries_lpar_hpte_removebolted+0x9c/0x230 hash__remove_section_mapping+0xec/0x1c0 remove_section_mapping+0x28/0x3c arch_remove_memory+0xfc/0x150 devm_memremap_pages_release+0x180/0x2f0 devm_action_release+0x30/0x50 release_nodes+0x28c/0x300 device_release_driver_internal+0x16c/0x280 unbind_store+0x124/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd4/0x270 ksys_write+0xdc/0x130 system_call+0x5c/0x70 Fix this by adding a cond_resched() to the loop in htab_remove_mapping() that issues the hcall to remove the hpte mapping. The call to cond_resched() is issued every HZ jiffies, which should prevent the soft-lockup from being reported. Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210404163148.321346-1-vaibhav@linux.ibm.com
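The throttle pattern described, sketched (the loop body is schematic; time_after() and cond_resched() are the real kernel primitives):

    #include <linux/jiffies.h>
    #include <linux/sched.h>

    static void remove_bolted_mappings(unsigned long start, unsigned long end,
                                       unsigned long step)
    {
            unsigned long addr, next_resched = jiffies + HZ;

            for (addr = start; addr < end; addr += step) {
                    /* mmu_hash_ops.hpte_removebolted(addr, ...); */
                    if (time_after(jiffies, next_resched)) {
                            cond_resched();     /* keep the watchdog quiet */
                            next_resched = jiffies + HZ;
                    }
            }
    }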
-
- 08 April 2021, 5 commits
-
-
Committed by Michael Ellerman
When we enabled STRICT_KERNEL_RWX we received some reports of boot failures when using the Hash MMU and running under phyp. The crashes are intermittent, and often exhibit as a completely unresponsive system, or possibly an oops. One example, which was caught in xmon: [ 14.068327][ T1] devtmpfs: mounted [ 14.069302][ T1] Freeing unused kernel memory: 5568K [ 14.142060][ T347] BUG: Unable to handle kernel instruction fetch [ 14.142063][ T1] Run /sbin/init as init process [ 14.142074][ T347] Faulting instruction address: 0xc000000000004400 cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0] pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80 lr: c0000000001862d4: update_rq_clock+0x44/0x110 sp: c00000000c747880 msr: 8000000040001031 current = 0xc00000000c60d380 paca = 0xc00000001ec9de80 irqmask: 0x03 irq_happened: 0x01 pid = 347, comm = kworker/2:1 ... enter ? for help [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable) [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0 [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0 [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390 [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610 [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480 [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950 [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120 [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640 [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0 [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c This shows that CPU 2, which was idle, woke up and then appears to randomly take an instruction fault on a completely valid area of kernel text. The cause turns out to be the call to hash__mark_rodata_ro(), late in boot. Due to the way we layout text and rodata, that function actually changes the permissions for all of text and rodata to read-only plus execute. To do the permission change we use a hypervisor call, H_PROTECT. On phyp that appears to be implemented by briefly removing the mapping of the kernel text, before putting it back with the updated permissions. If any other CPU is executing during that window, it will see spurious faults on the kernel text and/or data, leading to crashes. To fix it we use stop machine to collect all other CPUs, and then have them drop into real mode (MMU off) while we change the mapping. That way they are unaffected by the mapping temporarily disappearing. We don't see this bug on KVM because KVM always uses VPM=1, where faults are directed to the hypervisor, and the fault will be serialised vs the h_protect() by HPTE_V_HVLOCK. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-5-mpe@ellerman.id.au
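The stop-machine shape described above, sketched; the params struct and the two helpers are hypothetical, only stop_machine() itself is the real API:

    #include <linux/stop_machine.h>

    static int change_memory_range_fn(void *data)
    {
            struct change_params *params = data;          /* hypothetical */

            if (smp_processor_id() != params->master_cpu) {
                    /* Everyone else drops to real mode (MMU off) and spins,
                     * immune to the mapping briefly disappearing. */
                    spin_in_real_mode(params);            /* hypothetical */
                    return 0;
            }
            do_hpte_updateboltedpp(params);               /* hypothetical */
            return 0;
    }

    /* Invoked as: stop_machine(change_memory_range_fn, &params, cpu_online_mask); */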
-
Committed by Michael Ellerman
Pull the loop calling hpte_updateboltedpp() out of hash__change_memory_range() into a helper function. We need it to be a separate function for the next patch. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-4-mpe@ellerman.id.au
-
Committed by Michael Ellerman
In hash__mark_rodata_ro() we pass the raw PP_RXXX value to hash__change_memory_range(). That has the effect of setting the key to zero, because PP_RXXX contains no key value. Fix it by using htab_convert_pte_flags(), which knows how to convert a pgprot into a pp value, including the key. Fixes: d94b827e ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20210331003845.216246-3-mpe@ellerman.id.au
-
Committed by Jordan Niethe
When adding a PTE, a ptesync is needed to order the update of the PTE with subsequent accesses, otherwise a spurious fault may be raised. radix__set_pte_at() does not do this for performance gains. For non-kernel memory this is not an issue as any faults of this kind are corrected by the page fault handler. For kernel memory these faults are not handled. The current solution is that there is a ptesync in flush_cache_vmap(), which should be called when mapping from the vmalloc region. However, map_kernel_page() does not call flush_cache_vmap(). This is troublesome in particular for code patching with Strict RWX on radix. In do_patch_instruction() the page frame that contains the instruction to be patched is mapped and then immediately patched. With no ordering or synchronization between setting up the PTE and writing to the page, it is possible for spurious faults to occur. As the code patching is done using __put_user_asm_goto() the resulting fault is obscured - but using a normal store instead it can be seen: BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c Faulting instruction address: 0xc00000000008bd74 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: nop_module(PO+) [last unloaded: nop_module] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 This results in the kind of issue reported here: https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/ Chris Riedl suggested a reliable way to reproduce the issue: $ mount -t debugfs none /sys/kernel/debug $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & Turning ftrace on and off does a large amount of code patching, which will usually crash within 5 minutes, giving a trace like: ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) ------------[ ftrace bug ]------------ ftrace failed to modify [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 actual: 11:3b:47:4b Setting ftrace call site to call ftrace function ftrace record flags: 80000001 (1) expected tramp: c00000000006c96c ------------[ cut here ]------------ WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 Modules linked in: nop_module(PO-) [last unloaded: nop_module] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 CFAR: c0000000001a9c98 IRQMASK: 0 GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0 NIP ftrace_bug+0x28c/0x2e8 LR ftrace_bug+0x288/0x2e8 Call Trace: ftrace_bug+0x288/0x2e8 (unreliable) ftrace_modify_all_code+0x168/0x210 arch_ftrace_update_code+0x18/0x30 ftrace_run_update_code+0x44/0xc0 ftrace_startup+0xf8/0x1c0 register_ftrace_function+0x4c/0xc0 function_trace_init+0x80/0xb0 tracing_set_tracer+0x2a4/0x4f0 tracing_set_trace_write+0xd4/0x130 vfs_write+0xf0/0x330 ksys_write+0x84/0x140 system_call_exception+0x14c/0x230 system_call_common+0xf0/0x27c Fix this by using ptesync when updating kernel memory PTEs. Fixes: f1cb8f9b ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tidy up change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
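The essence of the fix, sketched against the radix__set_pte_at() signature; treat the exact placement as an assumption:

    #include <asm/page.h>

    static inline void radix__set_pte_at(struct mm_struct *mm, unsigned long addr,
                                         pte_t *ptep, pte_t pte, int percpu)
    {
            __set_pte_at(mm, addr, ptep, pte, percpu);
            /* Kernel mappings have no fault handler to fix up a spurious
             * fault, so order the PTE update against later accesses. */
            if (is_kernel_addr(addr))
                    asm volatile("ptesync" : : : "memory");
    }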
-
Committed by Bhaskar Chowdhury
Various spelling/typo fixes. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 29 March 2021, 3 commits
-
-
Committed by Nicholas Piggin
Hash faults use the trap vector to decide whether this is an instruction or data fault. This should use the TRAP accessor rather than open-coded access to regs->trap. This won't cause a problem at the moment because 64s only uses trap flags for system call interrupts (the norestart flag), but that could change if any other trap flags get used in future. Fixes: a4922f54 ("powerpc/64s: move the hash fault handling logic to C") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210316105205.407767-1-npiggin@gmail.com
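The shape of the change, sketched inside the hash fault path (the access bits shown are illustrative):

    /* TRAP() masks any flag bits out of regs->trap, so the comparison
     * stays correct even if 64s starts using trap flags here. */
    static unsigned long hash_fault_access(struct pt_regs *regs)
    {
            if (TRAP(regs) == 0x400)        /* instruction storage interrupt */
                    return _PAGE_PRESENT | _PAGE_EXEC;
            return _PAGE_PRESENT | _PAGE_READ;
    }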
-
Committed by Michael Ellerman
lkp reported warnings in some configurations due to update_current_thread_amr() being unused: arch/powerpc/mm/book3s64/pkeys.c:284:20: error: unused function 'update_current_thread_amr' That's because its only use is inside an ifdef. We could move it inside the ifdef, but it's a single-line function and only has one caller, so just fold it in. Similarly update_current_thread_iamr() is small and only called once, so fold it in also. Fixes: 48a8ab4e ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210314093320.132331-1-mpe@ellerman.id.au
-
Committed by Bhaskar Chowdhury
s/detalis/details/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210312112537.4585-1-unixbhaskar@gmail.com
-
- 11 February 2021, 2 commits
-
-
Committed by Aneesh Kumar K.V
This just adds a better name for PG_arch_1. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-2-aneesh.kumar@linux.ibm.com
-
Committed by Michael Ellerman
As reported by lkp: arch/powerpc/mm/book3s64/radix_tlb.c:646:6: warning: no previous prototype for function 'exit_lazy_flush_tlb' Fix it by moving the prototype into the existing header. Fixes: 032b7f08 ("powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210210130804.3190952-2-mpe@ellerman.id.au
-
- 08 February 2021, 7 commits
-
-
Committed by Nicholas Piggin
serialize_against_pte_lookup() performs IPIs to all CPUs in mm_cpumask. Take this opportunity to try to trim the CPU out of mm_cpumask. This can reduce the cost of future serialize_against_pte_lookup() calls and/or the cost of future TLB flushes. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-7-npiggin@gmail.com
-
Committed by Nicholas Piggin
A single-threaded process that is flushing its own address space is so far the only case where a trim of the mm_cpumask is attempted. This patch expands that to attempt the trim in other situations: multi-threaded processes and flushes from external sources. For now it's a relatively simple occasional trim attempt. The main aim is to add the mechanism; tweaking and tuning can come with more data. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-6-npiggin@gmail.com
-
Committed by Nicholas Piggin
mm_cpumask trimming is currently restricted to being issued by the current thread of a single-threaded mm. This patch relaxes that and allows the mask to be trimmed from any context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-5-npiggin@gmail.com
-
Committed by Nicholas Piggin
If there are no CPUs in mm_cpumask, no TLB flush is required at all. This patch adds a check for this case. Currently it's not tested for; in fact mm_is_thread_local() returns false if the current CPU is not in mm_cpumask, so it's treated as a global flush. This can come up in some cases like exec failure before the new mm has ever been switched to. This patch reduces the TLBIE instructions required to build a kernel from about 120,000 to 45,000. Another situation it could help is page reclaim, KSM, THP, etc. (i.e., asynchronous operations external to the process) where the process is sleeping and has had all TLBs flushed out of all CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-4-npiggin@gmail.com
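The check itself is tiny; sketched at the top of a flush routine:

    void radix__flush_tlb_mm(struct mm_struct *mm)
    {
            /* No CPU ever ran this mm (e.g. failed exec): nothing to flush. */
            if (cpumask_empty(mm_cpumask(mm)))
                    return;
            /* ... existing local/global flush logic ... */
    }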
-
Committed by Nicholas Piggin
The logic to decide what kind of TLB flush is required (local, global, or IPI) is repeated multiple times over the several kinds of TLB flushes. Move it all into a single function which may issue IPIs if necessary, and which also returns a flush type that is to be used. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-3-npiggin@gmail.com
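A sketch of the consolidated decision; the enum and function names follow the radix_tlb.c style, but treat the details as assumptions:

    enum tlb_flush_type {
            FLUSH_TYPE_NONE,
            FLUSH_TYPE_LOCAL,
            FLUSH_TYPE_GLOBAL,
    };

    /* One place decides local vs. global (and is where any mm_cpumask
     * trimming IPIs would be issued); callers act on the result. */
    static enum tlb_flush_type flush_type_needed(struct mm_struct *mm)
    {
            if (cpumask_empty(mm_cpumask(mm)))
                    return FLUSH_TYPE_NONE;
            if (atomic_read(&mm->context.active_cpus) == 1 && current->mm == mm)
                    return FLUSH_TYPE_LOCAL;
            return FLUSH_TYPE_GLOBAL;
    }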
-
Committed by Nicholas Piggin
Add a comment explaining part of the logic for mm_cpumask trimming, and add a (hopefully graceful) check and warning in case something gets it wrong. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-2-npiggin@gmail.com
-
Committed by Nicholas Piggin
This moves exception_enter/exit calls to wrapper functions for synchronous interrupts. More interrupt handlers are covered by this than previously. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-33-npiggin@gmail.com
-