提交 · ac94ac79dc0fc473aebcbe2ba4c26a5341b866be · openeuler / Kernel

01 5月, 2016 30 次提交

powerpc/mm: Add radix callbacks to pte accessors · ac94ac79

由 Aneesh Kumar K.V 提交于 4月 29, 2016

For those pte accessors, that operate on a different set of pte bits
between hash/radix, we add a generic variant that does a conditional
to hash linux or radix variant.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ac94ac79

powerpc/mm/radix: Add dummy radix_enabled() · 566ca99a

由 Aneesh Kumar K.V 提交于 4月 29, 2016

In this patch we add the radix Kconfig and conditional check.
radix_enabled() is written to always return 0 here. Once we have all
needed radix changes added, we will update this to an mmu_feature check.

We need to add this early so that we can get it all build in the early
stage.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

566ca99a

powerpc/mm/radix: Add radix pte #defines · b0b5e9b1

由 Aneesh Kumar K.V 提交于 4月 29, 2016

This adds Power ISA 3.0 specific pte defines. We share most of the
details with hash Linux page table format. This patch indicates only
things where we differ.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b0b5e9b1

powerpc/mm: Move pte related functions together · 34fbadd8

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Only code movement. No functionality change.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

34fbadd8

powerpc/mm: Move page table index and and vaddr to pgtable.h · aba480e1

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Now that the page table size is a variable, we can move these to
generic pgtable.h.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aba480e1

powerpc/mm: Make page table size a variable · dd1842a2

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Radix and hash MMU models support different page table sizes. Make
the #defines a variable so that existing code can work with variable
sizes.

Slice related code is only used by hash, so use hash constants there. We
will replicate some of the boundary conditions with resepct to TASK_SIZE
using radix values too. Right now we do boundary condition check using
hash constants.

Swapper pgdir size is initialized in asm code. We select the max pgd
size to keep it simple. For now we select hash pgdir. When adding radix
we will switch that to radix pgdir which is 64K.

BUILD_BUG_ON check which is removed is already done in hugepage_init()
using MAYBE_BUILD_BUG_ON().
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

dd1842a2

powerpc/mm: Move pte accessors that operate on common pte bits to pgtable.h · 13f829a5

由 Aneesh Kumar K.V 提交于 4月 29, 2016

These pte functions will remain the same between radix and hash. Move
them to pgtable.h.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

13f829a5

powerpc/mm: Move common pte bits and accessors to book3s/64/pgtable.h · 2e873519

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Now that we have moved book3s hash64 Linux pte bits to match Power ISA
3.0 radix pte bit positions, we move the matching pte bits to a common
header.

Only code movement in this patch. No functionality change.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2e873519

powerpc/mm: Handle _PTE_NONE_MASK · d2cf0050

由 Aneesh Kumar K.V 提交于 4月 29, 2016

I am splitting this as a separate patch to get better review. If ok
we should merge this with previous patch.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d2cf0050

powerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix · 945537df

由 Aneesh Kumar K.V 提交于 4月 29, 2016

This helps to make following hash only pte bits easier.

We have kept _PAGE_CHG_MASK, _HPAGE_CHG_MASK and _PAGE_PROT_BITS as it
is in this patch eventhough they use hash specific bits. Using them in
radix as it is should be ok, because with radix we expect those bit
positions to be zero.

Only renames in this patch, no change in functionality.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

945537df

powerpc/mm: Move hash and no hash code to separate files · eee24b5a

由 Aneesh Kumar K.V 提交于 4月 29, 2016

This patch reduces the number of #ifdefs in C code and will also help in
adding radix changes later. Only code movement in this patch.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Propagate copyrights and update GPL text]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

eee24b5a

powerpc/mm/hash: Add support for Power9 Hash · 50de596d

由 Aneesh Kumar K.V 提交于 4月 29, 2016

PowerISA 3.0 adds a parition table indexed by LPID. Parition table
allows us to specify the MMU model that will be used for guest and host
translation.

This patch adds support with SLB based hash model (UPRT = 0). What is
required with this model is to support the new hash page table entry
format and also setup partition table such that we use hash table for
address translation.

We don't have segment table support yet.

In order to make sure we don't load KVM module on Power9 (since we don't
have kvm support yet) this patch also disables KVM on Power9.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

50de596d

powerpc/mm/radix: Add partition table format & callback · e9983344

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Add structs and #defines related to the radix MMU partition table
format. We also add a ppc_md callback for updating a partition table
entry.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e9983344

powerpc/mm: Move radix/hash common data structures to book3s64 headers · 11a6f6ab

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Start moving code that is generic between radix and hash to book3s64
specific headers from the book3s64 hash specific one.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

11a6f6ab

powerpc/mm: Use generic version of ptep_clear_flush_young() · 33d336d9

由 Aneesh Kumar K.V 提交于 4月 29, 2016

The radix variant is going to require a flush_tlb_range(). With
flush_tlb_range() added, ptep_clear_flush_young() is the same as the
generic version. So drop the powerpc specific variant.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

33d336d9

powerpc/mm: Use generic version of pmdp_clear_flush_young() · ff844b74

由 Aneesh Kumar K.V 提交于 4月 29, 2016

The radix variant is going to require a flush_pmd_tlb_range(). With
flush_pmd_tlb_range() added, pmdp_clear_flush_young() is the same as the
generic version. So drop the powerpc specific variant.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ff844b74

powerpc/mm: Drop WIMG in favour of new constants · 30bda41a

由 Aneesh Kumar K.V 提交于 4月 29, 2016

PowerISA 3.0 introduces two pte bits with the below meaning for radix:
  00 -> Normal Memory
  01 -> Strong Access Order (SAO)
  10 -> Non idempotent I/O (Cache inhibited and guarded)
  11 -> Tolerant I/O (Cache inhibited)

We drop the existing WIMG bits in the Linux page table in favour of the
above constants. We loose _PAGE_WRITETHRU with this conversion. We only
use writethru via pgprot_cached_wthru() which is used by
fbdev/controlfb.c which is Apple control display and also PPC32.

With respect to _PAGE_COHERENCE, we have been marking hpte always
coherent for some time now. htab_convert_pte_flags() always added
HPTE_R_M.

NOTE: KVM changes need closer review.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

30bda41a

powerpc/mm: Use a helper for finding pte bits mapping I/O area · 72176dd0

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Use a helper instead of open coding with constants. A later patch will
drop the WIMG bits and use PowerISA 3.0 defines.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

72176dd0

powerpc/mm: Update _PAGE_KERNEL_RO · e58e87ad

由 Aneesh Kumar K.V 提交于 4月 29, 2016

PS3 had used a PPP bit hack to implement a read only mapping in the
kernel area. Since we are bolting the ioremap area, it used the pte
flags _PAGE_PRESENT | _PAGE_USER to get a PPP value of 0x3 there by
resulting in a read only mapping. This means the area can be accessed by
user space, but kernel will never return such an address to user space.

But we can do better by implementing a read only kernel mapping using
PPP bits 0b110.

This also allows us to do read only kernel mapping for radix in later
patches.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e58e87ad

powerpc/mm: Remove RPN_SHIFT and RPN_SIZE · 96270b1f

由 Aneesh Kumar K.V 提交于 4月 29, 2016

PTE_RPN_SHIFT is actually page size dependent. Even though PowerISA 3.0
expects only the lower 12 bits to be zero, we will always find the pages
to be PAGE_SHIFT aligned. In case of hash config, this also allows us to
use the additional 3 bits to track pte specific information. We need
to make sure we use these bits only for hash specific pte flags.

For both 4K and 64K config, pte now can hold 57 bits address.

Inorder to keep things simpler, drop PTE_RPN_SHIFT and PTE_RPN_SIZE and
specify the 57 bit detail explicitly.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

96270b1f

powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED · ac29c640

由 Aneesh Kumar K.V 提交于 4月 29, 2016

_PAGE_PRIVILEGED means the page can be accessed only by the kernel. This
is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User
pages are now marked by clearing _PAGE_PRIVILEGED bit.

Previously we allowed the kernel to have a privileged page in the lower
address range (USER_REGION). With this patch such access is denied.

We also prevent a kernel access to a non-privileged page in higher
address range (ie, REGION_ID != 0).

Both the above access scenarios should never happen.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ac29c640

powerpc/mm: Use pte_user() instead of open coding · e7bfc462

由 Aneesh Kumar K.V 提交于 4月 29, 2016

We have a common declaration in pte-common.h Add a book3s specific one
and switch to pte_user() in callchain.c. In a subsequent patch we will
switch _PAGE_USER to _PAGE_PRIVILEGED in the book3s version only.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e7bfc462

powerpc/mm: Convert pte_user() to static inline · 7e1e63c5

由 Michael Ellerman 提交于 4月 29, 2016

In a subsequent patch we want to add a second definition of pte_user().
Before we do that, make the signature clear, ie. it takes a pte_t and
returns bool.

We move it up inside the existing #ifndef __ASSEMBLY__ block, but
otherwise it's a straight conversion.

Convert the call in settlbcam(), which passes an unsigned long, to pass
a pte_t.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7e1e63c5

powerpc/mm/subpage: Clear RWX bit to indicate no access · 73a1441a

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Subpage protection used to depend on the _PAGE_USER bit to implement no
access mode. This patch switches that to use _PAGE_RWX. We clear Read,
Write and Execute access from the pte instead of clearing _PAGE_USER
now. This was done so that we can switch to _PAGE_PRIVILEGED in a later
patch.

subpage_protection() returns pte bits that need to be cleared. Instead
of updating the interface to handle no-access in a separate way, it
appears simpler to clear RWX acecss to indicate no access.

We still don't insert hash ptes for no access implied by !_PAGE_RWX.
Hence we should not get PROT_FAULT with change.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

73a1441a

powerpc/mm: Use _PAGE_READ to indicate Read access · c7d54842

由 Aneesh Kumar K.V 提交于 4月 29, 2016

This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also
removes the dependency on _PAGE_USER for implying read only. Few things
to note here is that, we have read implied with write and execute
permission. Hence we should always find _PAGE_READ set on hash pte
fault.

We still can't switch PROT_NONE to !(_PAGE_RWX). Auto numa depends on
marking a prot none pte _PAGE_WRITE. (For more details look at
b191f9b1 "mm: numa: preserve PTE write permissions across a NUMA
hinting fault")

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c7d54842

powerpc/mm: Use pte_raw() in pte_same()/pmd_same() · ee3caed3

由 Michael Ellerman 提交于 4月 29, 2016

We can avoid doing endian conversions by using pte_raw() in pxx_same().
The swap of the constant (_PAGE_HPTEFLAGS) should be done at compile
time by the compiler.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ee3caed3

powerpc/mm: Use big endian Linux page tables for book3s 64 · 5dc1ef85

由 Aneesh Kumar K.V 提交于 4月 29, 2016

Traditionally Power server machines have used the Hashed Page Table MMU
mode. In this mode Linux manages its own tree of nested page tables,
aka. "the Linux page tables", which are not used by the hardware
directly, and software loads translations into the hash page table for
use by the hardware.

Power ISA 3.0 defines a new MMU mode, known as Radix Tree Translation,
where the hardware can directly operate on the Linux page tables.
However the hardware requires that the page tables be in big endian
format.

To accommodate this, switch the pgtable types to __be64 and add
appropriate endian conversions.

Because we will be supporting a single kernel binary that boots using
either radix or hash mode, we always store the Linux page tables big
endian, even in hash mode where they are not actually used by the
hardware.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix sparse errors, flesh out change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5dc1ef85

powerpc/mm: Add pte_xchg() helper · 3910a7f4

由 Michael Ellerman 提交于 4月 29, 2016

We have five locations in 64-bit hash MMU code that do a cmpxchg() of a
PTE. Currently doing it inline OK, but in a future patch we will be
converting the PTEs to __be64 in some configs. In that case we will need
casts at every cmpxchg() site in order to keep sparse happy.

So move the logic into a helper, this is a reasonably nice cleanup on
its own.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3910a7f4

powerpc/mm: Drop PTE_ATOMIC_UPDATES from pmd_hugepage_update() · 4bece39b

由 Aneesh Kumar K.V 提交于 4月 29, 2016

pmd_hugepage_update() is inside #ifdef CONFIG_TRANSPARENT_HUGEPAGE. THP
can only be enabled if PPC_BOOK3S_64=y && PPC_64K_PAGES=y, aka. hash64.

On hash64 we always define PTE_ATOMIC_UPDATES to 1, meaning the #ifdef
in pmd_hugepage_update() is unnecessary, so drop it.

That is also the only use of PTE_ATOMIC_UPDATES in any of the hash code,
meaning we no longer need to #define it at all in the hash headers.

Note it's still #defined and used in the nohash code.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4bece39b

powerpc/mm: Always use STRICT_MM_TYPECHECKS · 670eea92

由 Michael Ellerman 提交于 4月 29, 2016

Testing done by Paul Mackerras has shown that with a modern compiler
there is no negative effect on code generation from enabling
STRICT_MM_TYPECHECKS.

So remove the option, and always use the strict type definitions.
Acked-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

670eea92

27 4月, 2016 6 次提交

powerpc/perf: Replace raw event hex values with #defines · 5bcca743

由 Madhavan Srinivasan 提交于 4月 21, 2016

Minor cleanup patch to replace the raw event hex values in
power8-pmu.c with #defines.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5bcca743

ftrace: Match dot symbols when searching functions on ppc64 · 7132e2d6

由 Thiago Jung Bauermann 提交于 4月 25, 2016

In the ppc64 big endian ABI, function symbols point to function
descriptors. The symbols which point to the function entry points
have a dot in front of the function name. Consequently, when the
ftrace filter mechanism searches for the symbol corresponding to
an entry point address, it gets the dot symbol.

As a result, ftrace filter users have to be aware of this ABI detail on
ppc64 and prepend a dot to the function name when setting the filter.

The perf probe command insulates the user from this by ignoring the dot
in front of the symbol name when matching function names to symbols,
but the sysfs interface does not. This patch makes the ftrace filter
mechanism do the same when searching symbols.

Fixes the following failure in ftracetest's kprobe_ftrace.tc:

  .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument

That failure is on this line of kprobe_ftrace.tc:

  echo _do_fork > set_ftrace_filter

This is because there's no _do_fork entry in the functions list:

  # cat available_filter_functions | grep _do_fork
  ._do_fork

This change introduces no regressions on the perf and ftracetest
testsuite results.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7132e2d6

powerpc: rework sparse for lib/xor_vmx.c · 8fe08885

由 Daniel Axtens 提交于 4月 26, 2016

Sparse doesn't seem to be passing -maltivec around properly, leading
to lots of errors:

.../include/altivec.h:34:2: error: Use the "-maltivec" flag to enable PowerPC AltiVec support
arch/powerpc/lib/xor_vmx.c:27:16: error: Expected ; at end of declaration
arch/powerpc/lib/xor_vmx.c:27:16: error: got signed
arch/powerpc/lib/xor_vmx.c:60:9: error: No right hand side of '*'-expression
arch/powerpc/lib/xor_vmx.c:60:9: error: Expected ; at end of statement
arch/powerpc/lib/xor_vmx.c:60:9: error: got v1_in
...
arch/powerpc/lib/xor_vmx.c:87:9: error: too many errors

Only include the altivec.h header for non-__CHECKER__ builds.
For builds with __CHECKER__, make up some stubs instead, as
suggested by Balbir. (The vector size of 16 is arbitrary.)
Suggested-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Tested-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8fe08885

powerpc: Add support for userspace P9 copy paste · 8a649045

由 Chris Smart 提交于 4月 26, 2016

The copy paste facility introduced in POWER9 provides an optimised
mechanism for a userspace application to copy a cacheline. This is
provided by a pair of instructions, copy and paste, while a third,
cp_abort (copy paste abort), provides a clean up of the state in case of
a failure.

The copy instruction will read a 128 byte cacheline and store it in an
internal buffer. The subsequent paste instruction will store this
internal buffer to memory and set a CR field if the paste succeeds.

Since the state of the copy paste buffer is internal (and not
architecturally visible), in the unlikely event of a context switch, the
state cannot be stored and the paste should therefore fail.

The cp_abort instruction exists to fail and clean up any such
interrupted copy paste sequence and is to be called by the kernel as
part of the context switch. Doing so prevents data from a preceding copy
in one process leaking into the paste of another.

This code enables use of the cp_abort instruction if a supported
processor is detected.

NOTE: this is for userspace only, not in kernel, and does not deal
with KVM guests.

Patch created with much assistance from Michael Neuling
<mikey@neuling.org>
Signed-off-by: NChris Smart <chris@distroguy.com>
Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8a649045

powerpc/mpic: handle subsys_system_register() failure · 4ad5e883

由 Andrew Donnellan 提交于 4月 26, 2016

mpic_init_sys() currently doesn't check whether
subsys_system_register() succeeded or not. Check the return code of
subsys_system_register() and clean up if there's an error.
Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4ad5e883

powerpc/eeh: fix misleading indentation · 2d521784

由 Andrew Donnellan 提交于 4月 26, 2016

Found by smatch.
Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2d521784

21 4月, 2016 4 次提交

powerpc/perf: Add support for sampling interrupt register state · ed4a4ef8

由 Anju T 提交于 2月 20, 2016

The perf infrastructure uses a bit mask to find out valid registers to
display. Define a register mask for supported registers defined in
uapi/asm/perf_regs.h. The bit positions also correspond to register IDs
which is used by perf infrastructure to fetch the register values.
CONFIG_HAVE_PERF_REGS enables sampling of the interrupted machine state.
Signed-off-by: NAnju T <anju@linux.vnet.ibm.com>
[mpe: Add license, use CONFIG_PPC64, fix 32-bit build]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ed4a4ef8

powerpc/perf: Assign an id to each powerpc register · 1bfadabf

由 Anju T 提交于 2月 20, 2016

The enum definition assigns an 'id' to each register in "struct pt_regs"
of arch/powerpc. The order of these values in the enum definition are
based on the order of members in pt_regs.
Signed-off-by: NAnju T <anju@linux.vnet.ibm.com>
[mpe: Rename LNK to LINK, use _UAPI_ASM for include guards]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1bfadabf

powerpc/book3s64: Remove __end_handlers marker · 057b6d7e

由 Hari Bathini 提交于 4月 08, 2016

The __end_handlers marker was intended to mark down upto code that gets
called from exception prologs. But that hasn't kept pace with code
changes. Case in point, slb_miss_realmode being called from exception
prolog code but isn't below __end_handlers marker. So, __end_handlers
marker is as good as a comment but could be misleading at times if it
isn't in sync with the code, as is the case now. So, let us avoid this
confusion by having a better comment and removing __end_handlers marker
altogether.
Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

057b6d7e

powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel · 8ed8ab40

由 Hari Bathini 提交于 4月 15, 2016

Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (ie. OOL handlers) outside this
section must be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.

However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.

But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accomodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (eg. kdump
case). While this has been the case for sometime now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:

  1. In almost all cases, production kernel (relocatable) is used for
     kdump as well, which would mean that crashed kernel's OOL handler
     would be at the same place where we end up branching to, from short
     interrupt vector of kdump kernel.
  2. Also, OOL handler was unlikely the reason for crash in almost all
     the kdump scenarios, which meant we had a sane OOL handler from
     crashed kernel that we branched to.
  3. On most 64-bit POWER server processors, page size is large enough
     that marking interrupt vector code as executable (see commit
     429d2e83) leads to marking OOL handler code from crashed kernel,
     that sits right below interrupt vector code from kdump kernel, as
     executable as well.

Let us fix this by moving the __end_interrupts marker down past OOL
handlers to make sure that we also copy OOL handlers to real address
0x100 when running a relocatable kernel.

This fix has been tested successfully in kdump scenario, on an LPAR with
4K page size by using different default/production kernel and kdump
kernel.

Also tested by manually corrupting the OOL handlers in the first kernel
and then kdump'ing, and then causing the OOL handlers to fire - mpe.

Fixes: c1fb6816 ("powerpc: Add relocation on exception vector handlers")
Cc: stable@vger.kernel.org
Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8ed8ab40

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功