- 15 June 2017, 2 commits
-
-
Submitted by Nicholas Piggin
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug") hard disabled interrupts over the low level context switch, because the SLB management can't cope with a PMU interrupt accessing the stack in that window.

Radix based kernel mapping does not use the SLB, so it does not require interrupts hard disabled here. This is worth 1-2% in context switch performance on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
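A minimal C sketch of the condition this change encodes; the real change lives in the entry_64.S feature sections, and the helper name here is illustrative:

```c
#include <asm/mmu.h>        /* radix_enabled() */
#include <asm/hw_irq.h>     /* hard_irq_disable() */

/*
 * Sketch only: hard-disable interrupts around the low-level context
 * switch only when the hash MMU (and therefore the SLB) is in use;
 * radix has no SLB to protect from a PMU interrupt in that window.
 */
static inline void ctx_switch_hard_disable(void)
{
	if (!radix_enabled())
		hard_irq_disable();
}
```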
-
Submitted by Nicholas Piggin
The syscall exit code that branches to restore_math is quite heavy on Book3S, consisting of 2 mtmsr instructions. Threads that don't use both FP and vector can get caught here if the kernel ever uses FP or vector. Lazy FP/vec context switching also trips this case.

So check for lazy FP and vector before switching RI for restore_math. Move most of this case out of line.

For threads that do want to restore math registers, the MSR switches are still suboptimal. A future direction may be to use a soft-RI bit to avoid MSR switches in the kernel (similar to soft-EE), but for now at least the no-restore POWER9 context switch rate increases by about 5% due to sched_yield(2) return performance. I haven't constructed a test to measure the syscall cost.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
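A hedged C rendering of the fast path being described; the actual code is assembly in entry_64.S, restore_math() is a real kernel function, and the wrapper and its placement are illustrative:

```c
#include <asm/reg.h>        /* MSR_FP, MSR_VEC */
#include <asm/switch_to.h>  /* restore_math() */

/*
 * Sketch: skip the expensive RI/MSR dance entirely when the thread has
 * no lazy FP/vector state to bring back; only fall into the out-of-line
 * restore path when something actually needs restoring.
 */
static inline void syscall_exit_restore_math(struct pt_regs *regs)
{
	if ((regs->msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC))
		return;		/* nothing lazy to restore: fast path */

	restore_math(regs);	/* out-of-line slow path */
}
```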
-
- 27 April 2017, 1 commit
-
-
Submitted by Naveen N. Rao
entry_*.S now includes a lot more than just kernel entry/exit code. As a first step toward cleaning this up, let's split out the ftrace bits into separate files. Also move all related tracing code into a new trace/ subdirectory. No functional changes.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 24 April 2017, 1 commit
-
-
Submitted by Naveen N. Rao
Pass the real LR to the ftrace handler. This is needed for KPROBES_ON_FTRACE for the pre handlers. Also, with KPROBES_ON_FTRACE, the link register may be updated by the pre handlers or by a registered kretprobe. Honor the updated LR by restoring it from pt_regs, rather than from the stack save area.

Live patch and function graph continue to work fine with this change.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 April 2017, 1 commit
-
-
Submitted by Naveen N. Rao
Move the stack setup and teardown code into ftrace_graph_caller(). This way, we don't incur the cost of setting it up unless function graph is enabled for this function.

Also, remove the extraneous LR restore code after the function graph stub. LR has previously been restored and neither livepatch_handler() nor ftrace_graph_caller() return back here.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Drop bad change to non-mprofile-kernel version of ftrace_graph_caller]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 18 April 2017, 1 commit
-
-
Submitted by Ravi Bangoria
If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel OOPS:

  Bad kernel stack pointer cd93c840 at c000000000009868
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  ...
  GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840
  ...
  NIP [c000000000009868] resume_kernel+0x2c/0x58
  LR [c000000000006208] program_check_common+0x108/0x180

On a 64-bit system, when the user probes on a 'stdu' instruction, the kernel does not emulate the actual store in emulate_step() because it may corrupt the exception frame. So the kernel does the actual store operation in the exception return code, i.e. resume_kernel(). resume_kernel() loads the saved stack pointer from memory using lwz, which only loads the low 32 bits of the address, causing the kernel crash. Fix this by loading the 64-bit value instead.

Fixes: be96f633 ("powerpc: Split out instruction analysis part of emulate_step()")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
[mpe: Change log massage, add stable tag]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
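For illustration only (userspace code, not the kernel fix): the truncation a 32-bit load causes, using the presumed full stack pointer from the oops above.

```c
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
	/* Presumed full r1 value behind the "Bad kernel stack pointer cd93c840" oops */
	uint64_t saved_sp = 0xc000001fcd93c840ULL;

	/* What 'lwz' effectively returned vs. what 'ld' would return */
	uint32_t low_word = (uint32_t)saved_sp;

	printf("64-bit load (ld):  0x%016" PRIx64 "\n", saved_sp);
	printf("32-bit load (lwz): 0x%08" PRIx32 "\n", low_word);
	return 0;
}
```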
-
- 20 September 2016, 1 commit
-
-
Submitted by Nicholas Piggin
mtmsrd with L=1 only affects the MSR_EE and MSR_RI bits, and we always know what state those bits are in, so the kernel MSR does not need to be loaded when modifying them.

mtmsrd is often in the critical execution path, so avoiding a dependency on even an L1 load is noticeable. On a POWER8 this saves about 3 cycles from the syscall path, and possibly a few from other exception returns (not measured).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 29 August 2016, 1 commit
-
-
Submitted by Nicholas Piggin
tabort_syscall runs with RI=1, so a nested recoverable machine check will load the paca into r13 and overwrite what we loaded it with, because exceptions returning to privileged mode do not restore r13.

Fixes: b4b56f9e ("powerpc/tm: Abort syscalls in active transactions")
Cc: stable@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 8 August 2016, 1 commit
-
-
Submitted by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 1 August 2016, 1 commit
-
-
Submitted by Aneesh Kumar K.V
MMU feature bits are defined such that we use the lower half to present MMU family features. Remove the strict split of the halves and also move Radix to an MMU family feature. Radix introduces a new MMU model and, strictly speaking, it is a new MMU family. This also frees up bits which can be used for individual features later.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 9 July 2016, 1 commit
-
-
Submitted by Christophe Leroy
This patch provides VIRT_CPU_ACCOUNTING to the PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data.

In order to reuse the PPC64 functions on PPC32, all u64 data has been replaced by 'unsigned long', so that it is u32 on PPC32 and u64 on PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
-
- 14 June 2016, 1 commit
-
-
Submitted by Michael Ellerman
We're approaching 20 locations where we need to check for ELF ABI v2. That's fine, except the logic is a bit awkward, because we have to check that _CALL_ELF is defined and then what its value is.

So check it once in asm/types.h and define PPC64_ELF_ABI_v2 when ELF ABI v2 is detected.

We also have a few places where what we're really trying to check is that we are using the 64-bit v1 ABI, ie. function descriptors. So also add a #define for that, which simplifies several checks.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
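A hedged sketch of the check this introduces; the macro names come from the commit text, and the exact header contents may differ slightly:

```c
/* Decide the 64-bit ELF ABI once, in one place (asm/types.h). */
#ifdef __powerpc64__
# if defined(_CALL_ELF) && _CALL_ELF == 2
#  define PPC64_ELF_ABI_v2
# else
#  define PPC64_ELF_ABI_v1	/* the v1 ABI, i.e. function descriptors */
# endif
#endif

/* Call sites then reduce to a single readable test: */
#ifdef PPC64_ELF_ABI_v2
	/* ELFv2-specific code */
#endif
```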
-
- 11 May 2016, 1 commit
-
-
Submitted by Aneesh Kumar K.V
We also use MMU_FTR_RADIX to branch out from code paths specific to hash. No functional change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 27 April 2016, 1 commit
-
-
Submitted by Chris Smart
The copy paste facility introduced in POWER9 provides an optimised mechanism for a userspace application to copy a cacheline. This is provided by a pair of instructions, copy and paste, while a third, cp_abort (copy paste abort), provides a clean up of the state in case of a failure.

The copy instruction will read a 128 byte cacheline and store it in an internal buffer. The subsequent paste instruction will store this internal buffer to memory and set a CR field if the paste succeeds.

Since the state of the copy paste buffer is internal (and not architecturally visible), in the unlikely event of a context switch the state cannot be stored and the paste should therefore fail.

The cp_abort instruction exists to fail and clean up any such interrupted copy paste sequence and is to be called by the kernel as part of the context switch. Doing so prevents data from a preceding copy in one process leaking into the paste of another.

This code enables use of the cp_abort instruction if a supported processor is detected.

NOTE: this is for userspace only, not in kernel, and does not deal with KVM guests.

Patch created with much assistance from Michael Neuling <mikey@neuling.org>

Signed-off-by: Chris Smart <chris@distroguy.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
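A hedged C sketch of where this lands in the context switch; the real change issues the instruction from the assembly switch path, the helper name is illustrative, and the feature check shown (CPU_FTR_ARCH_300 for ISA 3.0 / POWER9) is my assumption of the gating used:

```c
#include <asm/cputable.h>	/* cpu_has_feature(), CPU_FTR_ARCH_300 */
#include <asm/ppc-opcode.h>	/* PPC_CP_ABORT */

/*
 * Sketch: on ISA 3.0 CPUs, clear any in-flight copy-paste buffer when
 * switching tasks so a copy started by the previous task cannot be
 * pasted by the next one.
 */
static inline void clear_copy_paste_state(void)
{
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		asm volatile(PPC_CP_ABORT);
}
```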
-
- 14 April 2016, 1 commit
-
-
Submitted by Michael Ellerman
Add the kconfig logic & assembly support for handling live patched functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn depends on the new -mprofile-kernel ftrace ABI, which is only supported currently on ppc64le.

Live patching is handled by a special ftrace handler. This means it runs from ftrace_caller(). The live patch handler modifies the NIP so as to redirect the return from ftrace_caller() to the new patched function.

However there is one particularly tricky case we need to handle. If a function A calls another function B, and it is known at link time that they share the same TOC, then A will not save or restore its TOC, and will call the local entry point of B. When we live patch B, we replace it with a new function C, which may not have the same TOC as A. At live patch time it's too late to modify A to do the TOC save/restore, so the live patching code must interpose itself between A and C, and do the TOC save/restore that A omitted.

An additional complication is that the livepatch code cannot create a stack frame in order to save the TOC. That is because if C takes > 8 arguments, or is varargs, A will have written the arguments for C in A's stack frame.

To solve this, we introduce a "livepatch stack" which grows upward from the base of the regular stack, and is used to store the TOC & LR when calling a live patched function. When the patched function returns, we retrieve the real LR & TOC from the livepatch stack, restore them, and pop the livepatch "stack frame".

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
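A hedged C model of the livepatch stack bookkeeping described above; the real push/pop is a handful of assembly instructions in the livepatch handler, and the struct and function names here are illustrative:

```c
/* One entry per live-patched call in flight on this thread. */
struct lp_frame {
	unsigned long toc;	/* caller's r2 */
	unsigned long lr;	/* real return address into the caller */
};

/* "livepatch_sp" grows upward from the base of the regular stack. */
static inline void lp_push(struct lp_frame **livepatch_sp,
			   unsigned long toc, unsigned long lr)
{
	(*livepatch_sp)->toc = toc;
	(*livepatch_sp)->lr = lr;
	(*livepatch_sp)++;
}

static inline void lp_pop(struct lp_frame **livepatch_sp,
			  unsigned long *toc, unsigned long *lr)
{
	(*livepatch_sp)--;
	*toc = (*livepatch_sp)->toc;
	*lr = (*livepatch_sp)->lr;
}
```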
-
- 16 March 2016, 1 commit
-
-
Submitted by Cyril Bur
Commit 70fe3d98 "powerpc: Restore FPU/VEC/VSX if previously used" introduces a call to restore_math() late in the syscall return path, after MSR_RI has been cleared. The MSR_RI flag is used to indicate whether the kernel can take another exception or not. A cleared MSR_RI flag indicates that the kernel cannot.

Unfortunately when a machine is under SLB pressure an SLB miss can occur in restore_math() which (with MSR_RI cleared) leads to an unrecoverable exception:

  Unrecoverable exception 4100 at c0000000000088d8
  cpu 0x0: Vector: 4100 at [c0000003fa473b20]
      pc: c0000000000088d8: .load_vr_state+0x70/0x110
      lr: c00000000000f710: .restore_math+0x130/0x188
      sp: c0000003fa473da0
     msr: 9000000002003030
    current = 0xc0000007f876f180
    paca    = 0xc00000000fff0000
    softe: 0  irq_happened: 0x01
    pid = 1944, comm = K08umountfs
  [link register   ] c00000000000f710 .restore_math+0x130/0x188
  [c0000003fa473da0] c0000003fa473e30 (unreliable)
  [c0000003fa473e30] c000000000007b6c system_call+0x84/0xfc

The clearing of MSR_RI is actually an optimisation to avoid multiple MSR writes; what must be disabled are interrupts. See the comment in entry_64.S:

  /*
   * For performance reasons we clear RI the same time that we
   * clear EE. We only need to clear RI just before we restore r13
   * below, but batching it with EE saves us one expensive mtmsrd call.
   * We have to be careful to restore RI if we branch anywhere from
   * here (eg syscall_exit_work).
   */

At the point of calling restore_math() r13 has not been restored; as such, the quick fix of turning MSR_RI back on for the call to restore_math() will eliminate the occurrence of an unrecoverable exception. We'd like to do a better fix in future.

Fixes: 70fe3d98 ("powerpc: Restore FPU/VEC/VSX if previously used")
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 7 March 2016, 1 commit
-
-
Submitted by Torsten Duwe
The gcc switch -mprofile-kernel defines a new ABI for calling _mcount() very early in the function with minimal overhead.

Although mprofile-kernel has been available since GCC 3.4, there were bugs which were only fixed recently. Currently it is known to work in GCC 4.9, 5 and 6.

Additionally there are two possible code sequences generated by the flag: the first uses mflr/std/bl and the second is optimised to omit the std. Currently only gcc 6 has the optimised sequence. This patch supports both sequences.

Initial work started by Vojtech Pavlik, used with permission.

Key changes:
- rework _mcount() to work for both the old and new ABIs.
- implement new versions of ftrace_caller() and ftrace_graph_caller() which deal with the new ABI.
- updates to __ftrace_make_nop() to recognise the new mcount calling sequence.
- updates to __ftrace_make_call() to recognise the nop'ed sequence.
- implement ftrace_modify_call().
- updates to the module loader to suppress the toc save in the module stub when calling mcount with the new ABI.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 2 March 2016, 1 commit
-
-
Submitted by Cyril Bur
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a problem unless a process is using these facilities.

Modern versions of GCC are very good at automatically vectorising code, new and modernised workloads make use of floating point and vector facilities, and even the kernel makes use of vectorised memcpy. All this combined greatly increases the cost of a syscall, since the kernel uses the facilities sometimes even in the syscall fast path, making it increasingly common for a thread to take an *_unavailable exception soon after a syscall, not to mention potentially taking all three.

The obvious overcompensation to this problem is to simply always load all the facilities on every exit to userspace. Loading up all FPU, VEC and VSX registers every time can be expensive, and if a workload does avoid using them it should not be forced to incur this penalty.

An 8-bit counter is used to detect if the registers have been used in the past, and the registers are always loaded until the value wraps back to zero.

Several versions of the assembly in entry_64.S were tested:
1. Always calling C.
2. Performing a common case check and then calling C.
3. A complex check in asm.

After some benchmarking it was determined that avoiding C in the common case is a performance benefit (option 2). The full check in asm (option 3) greatly complicated that codepath for a negligible performance gain and the trade-off was deemed not worth it.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Move load_vec in the struct to fill an existing hole, reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
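A hedged C sketch of the counter scheme described above; load_fp is the thread_struct field the mpe note refers to, but the helper and its exact placement are illustrative:

```c
#include <asm/processor.h>	/* struct thread_struct, load_fp_state() */
#include <asm/reg.h>		/* MSR_FP */

/*
 * Sketch: while the 8-bit counter is non-zero the thread is treated as
 * an FP user and gets its registers loaded eagerly on exit to userspace;
 * once the counter wraps back to zero the thread falls back to the old
 * lazy behaviour (the next FP use takes an fp_unavailable exception).
 */
static inline void maybe_restore_fp(struct thread_struct *t, struct pt_regs *regs)
{
	if (!t->load_fp)
		return;			/* not a recent FP user: stay lazy */

	t->load_fp++;			/* u8: eventually wraps back to 0 */
	load_fp_state(&t->fp_state);	/* reload registers from thread state */
	regs->msr |= MSR_FP;		/* hand FP straight back to userspace */
}
```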
-
- 17 December 2015, 2 commits
-
-
Submitted by Michael Ellerman
This is only used in one location, open code it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Michael Ellerman
HMT_MEDIUM_LOW_HAS_PPR is only used in one place, open code it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 1 December 2015, 3 commits
-
-
Submitted by Anton Blanchard
No need to execute mflr twice.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Anton Blanchard
Move all our context switch SPR save and restore code into two helpers. We do a few optimisations:

- Group all mfsprs and all mtsprs. In many cases an mtspr sets a scoreboarding bit that an mfspr waits on, so the current practice of mfspr A; mtspr A; mfspr B; mtspr B is the worst scheduling we can do.

- SPR writes are slow, so check that the value is changing before writing it.

A context switch microbenchmark using yield():

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield 0 0

shows an improvement of almost 10% on POWER8.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
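A hedged sketch of the two helpers and the "only write if changed" rule; save_sprs/restore_sprs match the naming used for this work, but the register list here is trimmed and illustrative:

```c
#include <asm/processor.h>	/* struct thread_struct */
#include <asm/reg.h>		/* mfspr()/mtspr(), SPRN_* */

/* Group all the reads together... */
static inline void save_sprs(struct thread_struct *t)
{
	t->dscr = mfspr(SPRN_DSCR);
	t->tar  = mfspr(SPRN_TAR);
}

/* ...and all the writes, skipping writes whose value is unchanged,
 * since SPR writes are slow. */
static inline void restore_sprs(struct thread_struct *old, struct thread_struct *new)
{
	if (old->dscr != new->dscr)
		mtspr(SPRN_DSCR, new->dscr);
	if (old->tar != new->tar)
		mtspr(SPRN_TAR, new->tar);
}
```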
-
Submitted by Anton Blanchard
Writing the MSR is slow, so we want to avoid it whenever possible. A subsequent patch will add a debug option that strictly manages the FP/VMX/VSX unavailable bits. For now just remove it, matching what we do in other areas of the kernel (eg enable_kernel_altivec()).

A context switch microbenchmark using yield():

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield --fp 0 0

shows an improvement of almost 3% on POWER8.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 29 July 2015, 2 commits
-
-
Submitted by Michael Ellerman
The API for calling do_syscall_trace_enter() is currently sensible enough: it just returns the (modified) syscall number. However once we enable seccomp filter it will get more complicated.

When seccomp filter runs, the seccomp kernel code (via SECCOMP_RET_ERRNO), or a ptracer (via SECCOMP_RET_TRACE), may reject the syscall and *may* or may *not* set a return value in r3. That means the assembler that calls do_syscall_trace_enter() cannot blindly return ENOSYS; it needs to return ENOSYS only if a return value has not already been set. There is no way to implement that logic with the current API.

So change the do_syscall_trace_enter() API to make it deal with the return code juggling, and the assembler can then just return whatever return code it is given.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
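A hedged sketch of the reworked contract; do_syscall_trace_enter() is the real function, while the rejection helper is a stand-in for the seccomp/ptrace hooks, not an actual kernel API:

```c
#include <asm/ptrace.h>

/*
 * Sketch: the C side decides the disposition. A return of -1 means
 * "skip the syscall and leave r3 alone" (seccomp or a tracer may
 * already have placed an errno there); any other value is the syscall
 * number for the assembler to dispatch as usual.
 */
long do_syscall_trace_enter(struct pt_regs *regs)
{
	if (syscall_rejected_by_seccomp_or_tracer(regs))	/* illustrative */
		return -1;

	return regs->gpr[0];	/* possibly modified syscall number */
}
```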
-
Submitted by Michael Ellerman
Currently on powerpc we have our own #define for the highest (negative) errno value, called _LAST_ERRNO. This is defined to be 516, for reasons which are not clear.

The generic code, and x86, use MAX_ERRNO, which is defined to be 4095. In particular seccomp uses MAX_ERRNO to restrict the value that a seccomp filter can return.

Currently, with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp tracer wanting to return 600, expecting it to be seen as an error, would instead find on powerpc that userspace sees a successful syscall with a return value of 600.

To avoid this inconsistency, switch powerpc to use MAX_ERRNO.

We are somewhat confident that generic syscalls that can return a non-error value above negative MAX_ERRNO have already been updated to use force_successful_syscall_return(). I have also checked all the powerpc specific syscalls, and believe that none of them expect to return a non-error value between -MAX_ERRNO and -516. So this change should be safe ...

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
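A small illustration of the convention this moves powerpc to (the same rule used by the generic error-pointer checks; MAX_ERRNO is 4095 as the commit says):

```c
#include <stdbool.h>

#define MAX_ERRNO 4095

/* A raw syscall return value in [-MAX_ERRNO, -1] (i.e. -4095..-1) is an
 * error to be negated into errno; anything else is a successful result. */
static inline bool syscall_return_is_error(unsigned long ret)
{
	return ret >= (unsigned long)-MAX_ERRNO;
}
```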
-
- 19 June 2015, 1 commit
-
-
Submitted by Sam Bobroff
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made, and to return very early without performing the syscall, keeping side effects to a minimum (no CPU accounting or system call tracing is performed). Also included is a new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this behaviour to userspace.

Currently, the system call instruction automatically suspends an active transaction, which causes side effects to persist when an active transaction fails.

This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality as syscalls will still be performed after tsuspend; it just requires that the transaction be explicitly suspended. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390).

Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a normal (non-aborted) system call increases by about 0.25%.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
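A hedged C rendering of the behaviour; the real check is in the syscall entry assembly, MSR_TM_TRANSACTIONAL() and TM_CAUSE_SYSCALL are existing kernel symbols, and the helper wrapping them is illustrative:

```c
#include <asm/ptrace.h>
#include <asm/reg.h>	/* MSR_TM_TRANSACTIONAL() */
#include <asm/tm.h>	/* TM_CAUSE_SYSCALL */

/*
 * Sketch: if the task enters the kernel with a transaction active (not
 * merely suspended), doom it and bail out before accounting, tracing or
 * the syscall body ever run. Userspace sees the transaction fail with
 * TM_CAUSE_SYSCALL recorded as the failure reason.
 */
static inline bool abort_transactional_syscall(struct pt_regs *regs)
{
	if (!MSR_TM_TRANSACTIONAL(regs->msr))
		return false;		/* normal path: run the syscall */

	tm_abort_with_cause(TM_CAUSE_SYSCALL);	/* illustrative helper */
	return true;			/* return to userspace immediately */
}
```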
-
- 7 June 2015, 1 commit
-
-
Submitted by Anshuman Khandual
The PACA_DSCR offset macro tracks the dscr_default element in the paca structure. Change the name of this macro to match that of the data element it tracks, which makes the code more readable.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 30 April 2015, 1 commit
-
-
Submitted by Michael Ellerman
This reverts commit feba4036.

Although the principle of this change is good, the implementation has a few issues.

Firstly we can sometimes fail to abort a syscall because r12 may have been clobbered by C code if we went down the virtual CPU accounting path, or if syscall tracing was enabled.

Secondly we have decided that it is safer to abort the syscall even earlier in the syscall entry path, so that we avoid the syscall tracing path when we are transactional.

So that we have time to thoroughly test those changes we have decided to revert this for this merge window and will merge the fixed version in the next window.

NB. Rather than reverting the selftest we just drop tm-syscall from TEST_PROGS so that it's not run by default.

Fixes: feba4036 ("powerpc/tm: Abort syscalls in active transactions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 11 April 2015, 1 commit
-
-
Submitted by Sam Bobroff
This patch changes the syscall handler to doom (tabort) active transactions when a syscall is made, and return immediately without performing the syscall.

Currently, the system call instruction automatically suspends an active transaction, which causes side effects to persist when an active transaction fails.

This does change the kernel's behaviour, but in a way that was documented as unsupported. It doesn't reduce functionality because syscalls will still be performed after tsuspend. It also provides a consistent interface and makes the behaviour of user code substantially the same across powerpc and platforms that do not support suspended transactions (e.g. x86 and s390).

Performance measurements using http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of a system call increases by about 0.5%.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 28 March 2015, 1 commit
-
-
Submitted by Michael Ellerman
We currently have a "special" syscall for switching endianness. This is syscall number 0x1ebe, which is handled explicitly in the 64-bit syscall exception entry.

That has a few problems. Firstly, the syscall number is outside of the usual range, which confuses various tools. For example strace doesn't recognise the syscall at all. Secondly, it's handled explicitly as a special case in the syscall exception entry, which is complicated enough without it.

As a first step toward removing the special syscall, we need to add a regular syscall that implements the same functionality.

The logic is simple: it just toggles the MSR_LE bit in the userspace MSR. This is the same as the special syscall, with the caveat that the special syscall clobbers fewer registers. This version clobbers r9-r12, XER, CTR, and CR0-1,5-7.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
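A hedged sketch of the new syscall's core logic; the MSR_LE toggle in the saved user MSR is exactly what the commit describes, while the surrounding syscall plumbing here is a plausible shape rather than a verbatim copy:

```c
#include <linux/syscalls.h>
#include <asm/reg.h>		/* MSR_LE */

/* Flip the endianness the task will resume in by toggling MSR_LE in the
 * saved userspace MSR; the register clobbers noted above happen in the
 * assembly return path, not here. */
SYSCALL_DEFINE0(switch_endian)
{
	struct pt_regs *regs = current_pt_regs();

	regs->msr ^= MSR_LE;
	return 0;
}
```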
-
- 2 February 2015, 2 commits
-
-
Submitted by Michael Ellerman
We have code to do syscall tracing which is disabled at compile time by default. It's not been touched since the dawn of time (ie. v2.6.12).

There are now better ways to do syscall tracing, ie. using the raw_syscall or syscall tracepoints.

For the specific case of tracing syscalls at boot on a system that doesn't get to userspace, you can boot with:

  trace_event=syscalls tp_printk=on

which will trace syscalls from boot, and echo all output to the console.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Michael Ellerman
Currently when we back trace something that is in a syscall we see something like this:

  [c000000000000000] [c000000000000000] SyS_read+0x6c/0x110
  [c000000000000000] [c000000000000000] syscall_exit+0x0/0x98

Although it's entirely correct, seeing syscall_exit at the bottom can be confusing - we were exiting from a syscall and then called SyS_read()?

If we instead change syscall_exit to be a local label we get something more intuitive:

  [c0000001fa46fde0] [c00000000026719c] SyS_read+0x6c/0x110
  [c0000001fa46fe30] [c000000000009264] system_call+0x38/0xd0

ie. we were handling a system call, and it was SyS_read().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 January 2015, 1 commit
-
-
Submitted by Michael Ellerman
Once upon a time, at least 9 years ago (< 2.6.12), _TIF_SYSCALL_T_OR_A meant "TRACE or AUDIT". But these days it means TRACE or AUDIT or SECCOMP or TRACEPOINT or NOHZ.

All of those are implemented via syscall_dotrace(), so rename the flag to that to try and clarify things.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 10 November 2014, 2 commits
-
-
Submitted by Anton Blanchard
Instead of passing in the stack address of the link register to be modified, just pass in the old value and return the new value, and rely on ftrace_graph_caller to do the modification.

This removes the exception handling around the stack update - it isn't needed and we weren't consistent about it. Later on we would do an unprotected modification:

  if (!ftrace_graph_entry(&trace)) {
          *parent = old;

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Submitted by Anton Blanchard
mod_return_to_handler is the same as return_to_handler, except it handles the change of the TOC (r2). Add this into return_to_handler and remove mod_return_to_handler.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 31 October 2014, 1 commit
-
-
Submitted by Anton Blanchard
Back in 7230c564 ("powerpc: Rework lazy-interrupt handling") we added a call out to restore_interrupts() (written in C) before calling do_notify_resume:

  bl	restore_interrupts
  addi	r3,r1,STACK_FRAME_OVERHEAD
  bl	do_notify_resume

Unfortunately do_notify_resume takes two arguments, the second one being the thread_info flags:

  void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)

We do populate r4 (the second argument) earlier, but restore_interrupts() is free to muck it up all it wants. My guess is the gcc compiler gods shone down on us and its register allocator never used r4. Sometimes, rarely, luck is on our side.

LLVM on the other hand did trample r4.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 5 August 2014, 1 commit
-
-
Submitted by Mahesh Salgaonkar
Handle Hypervisor Maintenance Interrupts (HMIs) in Linux. This patch implements the basic infrastructure to handle an HMI in a Linux host. The design is to invoke the OPAL HMI handler in real mode for recovery and set irq_pending when we hit an HMI. During check_irq_replay, pull the OPAL HMI event and print the HMI information on the console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 July 2014, 1 commit
-
-
Submitted by Michael Ellerman
We now only support CPUs that use an SLB, so we don't need an MMU feature to indicate that.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 11 June 2014, 1 commit
-
-
Submitted by Sam Bobroff
Correct the DSCR SPR becoming temporarily corrupted if a task is context switched during a transaction.

The problem occurs while suspending the task and is caused by saving the DSCR to thread.dscr after it has already been set to the CPU's default value:

  __switch_to() calls __switch_to_tm()
    which calls tm_reclaim_task()
    which calls tm_reclaim_thread()
    which calls tm_reclaim()
      where the DSCR is set to the CPU's default

  __switch_to() calls _switch()
      where thread.dscr is set to the DSCR

When the task is resumed, its transaction will be doomed (as usual) and the DSCR SPR will be corrupted, although the checkpointed value will be correct. Therefore the DSCR will be immediately corrected by the transaction aborting, unless it has been suspended. In that case the incorrect value can be seen by the task until it resumes the transaction.

The fix is to treat the DSCR similarly to the TAR and save it early in __switch_to().

A program exposing the problem is added to the kernel self tests as tools/testing/selftests/powerpc/tm/tm-resched-dscr.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
CC: <stable@vger.kernel.org> [v3.10+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 May 2014, 1 commit
-
-
Submitted by Sam Bobroff
Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)" it is no longer possible to set the DSCR on a per-CPU basis. The old behaviour was to manipulate the DSCR SPR directly, but this is no longer sufficient: the value is quickly overwritten by context switching.

This patch stores the per-CPU DSCR value in a kernel variable rather than directly in the SPR, and it is used whenever a process has not set the DSCR itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.

Writes to the old global default (/sys/devices/system/cpu/dscr_default) now set all of the per-CPU values and reads return the last written value.

The new per-CPU default is added to the paca_struct and is used everywhere outside of sysfs.c instead of the old global default.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-