提交 · 0428491cba9277db42d66eb245d74255bd3dbfe7 · openanolis / cloud-kernel

23 6月, 2017 1 次提交

powerpc/mm: Trace tlbie(l) instructions · 0428491c

由 Balbir Singh 提交于 4月 11, 2017

Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
Entry (Local)) instructions.

The tlbie instruction has changed over the years, so not all versions
accept the same operands. Use the ISA v3 field operands because they are
the most verbose, we may change them in future.

Example output:

  qemu-system-ppc-5371  [016]  1412.369519: tlbie:
  	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0
Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
[mpe: Add some missing trace_tlbie()s, reword change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0428491c

22 6月, 2017 1 次提交

powerpc: Convert VDSO update function to use new update_vsyscall interface · d4cfb113

由 Paul Mackerras 提交于 5月 27, 2017

This converts the powerpc VDSO time update function to use the new
interface introduced in commit 576094b7 ("time: Introduce new
GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
us the time as of the last update in seconds and whole nanoseconds,
with the new interface we get the nanoseconds part effectively in
a binary fixed-point format with tk->tkr_mono.shift bits to the
right of the binary point.

With the old interface, the fractional nanoseconds got truncated,
meaning that the value returned by the VDSO clock_gettime function
would have about 1ns of jitter in it compared to the value computed
by the generic timekeeping code in the kernel.

The powerpc VDSO time functions (clock_gettime and gettimeofday)
already work in units of 2^-32 seconds, or 0.23283 ns, because that
makes it simple to split the result into seconds and fractional
seconds, and represent the fractional seconds in either microseconds
or nanoseconds.  This is good enough accuracy for now, so this patch
avoids changing how the VDSO works or the interface in the VDSO data
page.

This patch converts the powerpc update_vsyscall_old to be called
update_vsyscall and use the new interface.  We convert the fractional
second to units of 2^-32 seconds without truncating to whole nanoseconds.
(There is still a conversion to whole nanoseconds for any legacy users
of the vdso_data/systemcfg stamp_xtime field.)

In addition, this improves the accuracy of the computation of tb_to_xs
for those systems with high-frequency timebase clocks (>= 268.5 MHz)
by doing the right shift in two parts, one before the multiplication and
one after, rather than doing the right shift before the multiplication.
(We can't do all of the right shift after the multiplication unless we
use 128-bit arithmetic.)
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d4cfb113

21 6月, 2017 4 次提交

powerpc/time: Fix tracing in time.c · 6b847d79

由 Santosh Sivaraj 提交于 6月 20, 2017

Since trace_clock is in a different file and already marked with notrace,
enable tracing in time.c by removing it from the disabled list in Makefile.
Also annotate clocksource read functions and sched_clock with notrace.

Testing: Timer and ftrace selftests run with different trace clocks.
Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NSantosh Sivaraj <santosh@fossix.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6b847d79

powerpc/64s: Rename slb_allocate_realmode() to slb_allocate() · fd88b945

由 Michael Ellerman 提交于 6月 19, 2017

As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid
confusion over whether it runs in real or virtual mode - it runs in
both.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>

fd88b945

powerpc/64s: Rename slb_miss_realmode() to slb_miss_common() · 442b6e8e

由 Michael Ellerman 提交于 6月 19, 2017

slb_miss_realmode() doesn't always runs in real mode, which is what the
name implies. So rename it to avoid confusing people.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>

442b6e8e

powerpc/64s: Use BRANCH_TO_COMMON() for slb_miss_realmode · b102063b

由 Michael Ellerman 提交于 6月 19, 2017

All the callers of slb_miss_realmode currently open code the #ifndef
CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case.
We have a macro to do this, BRANCH_TO_COMMON(), so use it.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>

b102063b

20 6月, 2017 9 次提交

N
powerpc/64s/paca: EX_CTR is not used with RELOCATABLE=n, remove it · 8568f1e0
由 Nicholas Piggin 提交于 5月 21, 2017
```
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
```
8568f1e0

powerpc/64s/paca: EX_R3 can be merged with EX_DAR · 635942ae

由 Nicholas Piggin 提交于 5月 21, 2017

EX_R3 is used only for a small section of the bad stack handler.
Merge it with EX_DAR.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

635942ae

powerpc/64s/paca: EX_LR can be merged with EX_DAR · dbeea1d6

由 Nicholas Piggin 提交于 5月 21, 2017

EX_LR is used only for a small section of the SLB miss handler.
Merge it with EX_DAR.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

dbeea1d6

powerpc/64s/paca: EX_SRR0 is unused, remove it · 36670fcf

由 Nicholas Piggin 提交于 5月 21, 2017

Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

36670fcf

powerpc/64s: Add EX_SIZE definition for paca exception save areas · 8c388514

由 Nicholas Piggin 提交于 5月 21, 2017

Rather than open-coding it 4 times.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
[mpe: Move __ASSEMBLY__ guards into head-64.h where they're really needed]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8c388514

powerpc/64s: Avoid r3 save/restore in SLB miss handler · 4d7cd3b9

由 Nicholas Piggin 提交于 5月 21, 2017

The SLB miss handler uses r3 for the faulting address but r12 is
mostly able to be freed up to save r3 in. It just requires SRR1
be reloaded again on error.

It would be more conventional to use r12 for SRR1 (and use r11 to
save r3), but slb_allocate_realmode clobbers r11 and not r12.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4d7cd3b9

powerpc/64s: SLB miss already has CTR saved for relocatable kernel · fe5482c0

由 Nicholas Piggin 提交于 5月 21, 2017

The EXCEPTION_PROLOG_1 used by SLB miss already saves CTR when the
kernel is built with CONFIG_RELOCATABLE. So it does not have to be
saved and reloaded when branching to slb_miss_realmode. It can be
restored from the PACA as usual.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fe5482c0

powerpc/64s: Avoid saving faulting address into EX_DAR in SLB miss · 7c28f048

由 Nicholas Piggin 提交于 5月 21, 2017

The EX_DAR save area is only used in exceptional cases. With r3 no
longer clobbered by slb_allocate_realmode, saving faulting address to
EX_DAR can be deferred to those cases.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7c28f048

powerpc/64s: Preserve r3 in slb_allocate_realmode() · d59afffd

由 Nicholas Piggin 提交于 5月 21, 2017

One fewer registers clobbered by this function means the SLB miss
handler can save one fewer.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d59afffd

19 6月, 2017 9 次提交

powerpc/64s/idle: Run latch switch is done with MSR[EE]=0 · 40d24343

由 Nicholas Piggin 提交于 6月 13, 2017

In the idle sleep/wake code we know that MSR[EE] is clear, so we can
avoid 2 x mfmsr and 2 x mtmsr by calling the double-underscore
versions of the run latch routines which assume interrupts are already
disabled.
Acked-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

40d24343

powerpc/64s/idle: Predict HMI wakeup as unlikely · 95acdc07

由 Nicholas Piggin 提交于 6月 13, 2017

In a busy system, idle wakeups can be expected from IPIs and device
interrupts.
Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

95acdc07

powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths · 9d292501

由 Nicholas Piggin 提交于 6月 13, 2017

Idle code now always runs at the 0xc... effective address whether
in real or virtual mode. This means rfid can be ditched, along
with a lot of SRR manipulations.

In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
MSR states as required.

This also balances the return prediction for the idle call, by
doing blr rather than rfid to return to the idle caller.

On POWER9, 2-process context switch on different cores, with snooze
disabled, increases performance by 2%.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
[mpe: Incorporate v2 fixes from Nick]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9d292501

powerpc/64s/idle: Branch to handler with virtual mode offset · b51351e2

由 Nicholas Piggin 提交于 6月 13, 2017

Have the system reset idle wakeup handlers branched to in real mode
with the 0xc... kernel address applied. This allows simplifications of
avoiding rfid when switching to virtual mode in the wakeup handler.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b51351e2

powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt() · b48bbb82

由 Nicholas Piggin 提交于 6月 13, 2017

The __replay_interrupt() code is branched to with bl, but the caller is
returned to directly with rfid from the interrupt.

Instead, rfid to a stub that returns to the caller with blr, which
should keep the return branch predictor balanced.
Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b48bbb82

powerpc/64s: msgclr when handling doorbell exceptions from system reset · a9af97aa

由 Nicholas Piggin 提交于 6月 13, 2017

msgsnd doorbell exceptions are cleared when the doorbell interrupt is
taken. However if a doorbell exception causes a system reset interrupt
wake from power saving state, the message is not cleared. Processing
the doorbell from the system reset interrupt requires msgclr to avoid
taking the exception again.

Testing this plus the previous wakup direct patch gives:

original wakeup direct msgclr
Different threads, same core: 315k/s 264k/s 345k/s
Different cores: 235k/s 242k/s 242k/s

Net speedup is +10% for same core, and +3% for different core.
Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a9af97aa

powerpc/64s/idle: Process interrupts from system reset wakeup · 771d4304

由 Nicholas Piggin 提交于 6月 13, 2017

When the CPU wakes from low power state, it begins at the system reset
interrupt with the exception that caused the wakeup encoded in SRR1.

Today, powernv idle wakeup ignores the wakeup reason (except a special
case for HMI), and the regular interrupt corresponding to the
exception will fire after the idle wakeup exits.

Change this to replay the interrupt from the idle wakeup before
interrupts are hard-enabled.

Test on POWER8 of context_switch selftests benchmark with polling idle
disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
results:

                                original         wakeup direct
Different threads, same core:   315k/s           264k/s
Different cores:                235k/s           242k/s

There is a slowdown for doorbell IPI (same core) case because system
reset wakeup does not clear the message and the doorbell interrupt
fires again needlessly.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

771d4304

powerpc/powernv: Simplify lazy IRQ handling in CPU offline · 2525db04

由 Nicholas Piggin 提交于 6月 13, 2017

Rather than concern ourselves with any soft-mask logic in the CPU
hotplug handler, just hard disable interrupts. This ensures there
are no lazy-irqs pending, which means we can call directly to idle
instruction in order to sleep.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2525db04

powerpc/64s/idle: Move soft interrupt mask logic into C code · 2201f994

由 Nicholas Piggin 提交于 6月 13, 2017

This simplifies the asm and fixes irq-off tracing over sleep
instructions.

Also move powersave_nap check for POWER8 into C code, and move
PSSCR register value calculation for POWER9 into C.
Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2201f994

15 6月, 2017 8 次提交

powerpc/64s: Avoid cpabort in context switch when possible · 07d2a628

由 Nicholas Piggin 提交于 6月 09, 2017

The ISA v3.0B copy-paste facility only requires cpabort when switching
to a process that has foreign real addresses mapped (direct access to
accelerators), to clear a potential copy buffer filled by a previous
thread. There is no accelerator driver implemented yet, so cpabort can
be removed. It can be be re-added when a driver is implemented.

POWER9 DD1 requires the copy buffer to always be cleared on context
switch, but if accelerators are not in use, then an unpaired copy from
a dummy region is sufficient to clear data out of the copy buffer.

This increases context switch performance by about 5% on POWER9.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

07d2a628

powerpc/64: Drop explicit hwsync in context switch · 9145effd

由 Nicholas Piggin 提交于 6月 09, 2017

The sync (aka. hwsync, aka. heavyweight sync) in the context switch
code to prevent MMIO access being reordered from the point of view of
a single process if it gets migrated to a different CPU is not
required because there is an hwsync performed earlier in the context
switch path.

Comment this so it's clear enough if anything changes on the scheduler
or the powerpc sides. Remove the hwsync from _switch.

This improves context switch performance by 2-3% on POWER8.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9145effd

powerpc/64: Drop reservation-clearing ldarx in context switch · 837e72f7

由 Nicholas Piggin 提交于 6月 09, 2017

There is no need to explicitly break the reservation in _switch,
because we are guaranteed that the context switch path will include a
larx/stcx.

Comment the guarantee and remove the reservation clear from _switch.

This is worth 1-2% in context switch performance.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

837e72f7

powerpc/64s: Leave interrupts hard enabled in context switch for radix · e4c0fc5f

由 Nicholas Piggin 提交于 6月 09, 2017

Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug")
hard disabled interrupts over the low level context switch, because
the SLB management can't cope with a PMU interrupt accesing the stack
in that window.

Radix based kernel mapping does not use the SLB so it does not require
interrupts hard disabled here.

This is worth 1-2% in context switch performance on POWER9.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e4c0fc5f

powerpc/64: Avoid restore_math call if possible in syscall exit · bc4f65e4

由 Nicholas Piggin 提交于 6月 09, 2017

The syscall exit code that branches to restore_math is quite heavy on
Book3S, consisting of 2 mtmsr instructions. Threads that don't use both
FP and vector can get caught here if the kernel ever uses FP or vector.
Lazy-FP/vec context switching also trips this case.

So check for lazy FP and vector before switching RI for restore_math.
Move most of this case out of line.

For threads that do want to restore math registers, the MSR switches are
still suboptimal. Future direction may be to use a soft-RI bit to avoid
MSR switches in kernel (similar to soft-EE), but for now at least the
no-restore

POWER9 context switch rate increases by about 5% due to sched_yield(2)
return performance. I haven't constructed a test to measure the syscall
cost.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bc4f65e4

powerpc/64s: Optimize hypercall/syscall entry · acd7d8ce

由 Nicholas Piggin 提交于 6月 09, 2017

After bc355125 ("powerpc/64: Allow for relocation-on interrupts from
guest to host"), a getppid() system call goes from 307 cycles to 358
cycles (+17%) on POWER8. This is due significantly to the scratch SPR
used by the hypercall check.

It turns out there are a some volatile registers common to both system
call and hypercall (in particular, r12, cr0, ctr), which can be used to
avoid the SPR and some other overheads. This brings getppid to 320 cycles
(+4%).

Testing hcall entry performance by running "sc 1" in guest userspace
before this patch is 854 cycles, afterwards is 826. Also a small win
there.

POWER9 syscall is improved by about the same amount, hcall not tested.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

acd7d8ce

powerpc/mm/radix: Only add X for pages overlapping kernel text · 9abcc981

由 Michael Ellerman 提交于 6月 06, 2017

Currently we map the whole linear mapping with PAGE_KERNEL_X. Instead we
should check if the page overlaps the kernel text and only then add
PAGE_KERNEL_X.

Note that we still use 1G pages if they're available, so this will
typically still result in a 1G executable page at KERNELBASE. So this fix is
primarily useful for catching stray branches to high linear mapping addresses.

Without this patch, we can execute at 1G in xmon using:

  0:mon> m c000000040000000
  c000000040000000  00 l
  c000000040000000  00000000 01006038
  c000000040000004  00000000 2000804e
  c000000040000008  00000000 x
  0:mon> di c000000040000000
  c000000040000000  38600001      li      r3,1
  c000000040000004  4e800020      blr
  0:mon> p c000000040000000
  return value is 0x1

After we get a 400 as expected:

  0:mon> p c000000040000000
  *** 400 exception occurred

Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NBalbir Singh <bsingharora@gmail.com>

9abcc981

Revert "powerpc: Handle simultaneous interrupts at once" · 0edc2ca9

由 Michael Ellerman 提交于 6月 15, 2017

This reverts commit 45cb08f4.

For some reason this is causing IRQ problems on Freescale Book3E
machines, eg on my p5020ds:

  irq 25: nobody cared (try booting with the "irqpoll" option)
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc3-gcc-6.3.1-00037-g45cb08f4 #624
  Call Trace:
  [c0000000fffdbb10] [c00000000049962c] .dump_stack+0xa8/0xe8 (unreliable)
  [c0000000fffdbba0] [c0000000000babf4] .__report_bad_irq+0x54/0x140
  [c0000000fffdbc40] [c0000000000bb11c] .note_interrupt+0x324/0x380
  [c0000000fffdbd00] [c0000000000b7110] .handle_irq_event_percpu+0x68/0x88
  [c0000000fffdbd90] [c0000000000b718c] .handle_irq_event+0x5c/0xa8
  [c0000000fffdbe10] [c0000000000bc01c] .handle_fasteoi_irq+0xe4/0x298
  [c0000000fffdbe90] [c0000000000b59c4] .generic_handle_irq+0x50/0x74
  [c0000000fffdbf10] [c0000000000075d8] .__do_irq+0x74/0x1f0
  [c0000000fffdbf90] [c0000000000189f8] .call_do_irq+0x14/0x24
  [c0000000f7173060] [c0000000000077e4] .do_IRQ+0x90/0x120
  [c0000000f7173100] [c00000000001d93c] exc_0x500_common+0xfc/0x100
  --- interrupt: 501 at .prepare_to_wait_event+0xc/0x14c
      LR = .fsl_elbc_run_command+0xc8/0x23c
  [c0000000f71734d0] [c00000000065f418] .nand_reset+0xb8/0x168
  [c0000000f7173560] [c00000000065fec4] .nand_scan_ident+0x2b0/0x1638
  [c0000000f7173650] [c000000000666cd8] .fsl_elbc_nand_probe+0x34c/0x5f0
  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  [c0000000f7173750] [c0000000005a3c60] .platform_drv_probe+0x64/0xb0
  [c0000000f71737d0] [c0000000005a12e0] .really_probe+0x290/0x334
  [c0000000f7173870] [c0000000005a14a0] .__driver_attach+0x11c/0x120
  [c0000000f7173900] [c00000000059e6a0] .bus_for_each_dev+0x98/0xfc
  [c0000000f71739a0] [c0000000005a0b3c] .driver_attach+0x34/0x4c
  [c0000000f7173a20] [c0000000005a04b0] .bus_add_driver+0x1ac/0x2e0
  [c0000000f7173ac0] [c0000000005a2170] .driver_register+0x94/0x160
  [c0000000f7173b40] [c0000000005a3be0] .__platform_driver_register+0x60/0x7c
  [c0000000f7173bc0] [c000000000d6aab4] .fsl_elbc_nand_driver_init+0x24/0x38
  [c0000000f7173c30] [c000000000001934] .do_one_initcall+0x68/0x1b8
  [c0000000f7173d00] [c000000000d210f8] .kernel_init_freeable+0x260/0x338
  [c0000000f7173db0] [c0000000000021b0] .kernel_init+0x20/0xe70
  [c0000000f7173e30] [c0000000000009bc] .ret_from_kernel_thread+0x58/0x9c
  handlers:
  [<c000000000ed85c8>] .fsl_lbc_ctrl_irq
  Disabling IRQ #25

Ben also had concerns with the implementation being potentially slow on
some PICs, so revert it for now.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0edc2ca9

06 6月, 2017 1 次提交

powerpc/64s: Machine check handle ifetch from foreign real address for POWER9 · 90df4bfb

由 Nicholas Piggin 提交于 5月 29, 2017

The i-side 0111b machine check, which is "Instruction Fetch to foreign
address space", was missed by 7b9f71f9 ("powerpc/64s: POWER9 machine
check handler").

    The POWER9 processor core considers host real addresses with a
    nonzero value in RA(8:12) as foreign address space, accessible only
    by the copy and paste instructions. The copy and paste instruction
    pair can be used to invoke the Nest accelerators via the Virtual
    Accelerator Switchboard (VAS).

It is an error for any regular load/store or ifetch to go to a foreign
addresses. When relocation is on, this causes an MMU exception. When
relocation is off, a machine check exception. It is possible to trigger
this machine check by branching to a foreign address with MSR[IR]=0.

Fixes: 7b9f71f9 ("powerpc/64s: POWER9 machine check handler")
Reported-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

90df4bfb

05 6月, 2017 6 次提交

powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit · 4386c096

由 Christophe Leroy 提交于 5月 29, 2017

These two functions implement the same semantics, so unify their naming so we
can share code that calls them. The longer name is more descriptive so use it.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4386c096

powerpc/mm/hugetlb: Add support for page accounting · d2485644

由 Balbir Singh 提交于 5月 02, 2017

Add __GFP_ACCOUNT to __hugepte_alloc()
Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d2485644

powerpc/mm/book(e)(3s)/32: Add page table accounting · abd667be

由 Balbir Singh 提交于 5月 02, 2017

Add support in pte_alloc_one() and pgd_alloc() by
passing __GFP_ACCOUNT in the flags
Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

abd667be

powerpc/mm/book(e)(3s)/64: Add page table accounting · de3b8761

由 Balbir Singh 提交于 5月 02, 2017

Introduce a helper pgtable_gfp_flags() which
just returns the current gfp flags and adds
__GFP_ACCOUNT to account for page table allocation.
The generic helper is added to include/asm/pgalloc.h
and has two variants - WARNING ugly bits ahead

1. If the header is included from a module, no check
for mm == &init_mm is done, since init_mm is not
exported
2. For kernel includes, the check is done and required
see (3e79ec7d arch: x86: charge page tables to kmemcg)

The fundamental assumption is that no module should be
doing pgd/pud/pmd and pte alloc's on behalf of init_mm
directly.

NOTE: This adds an overhead to pmd/pud/pgd allocations
similar to x86.  The other alternative was to implement
pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
with their offset variants.

For 4k page size, pte_alloc_one no longer calls
pte_alloc_one_kernel.
Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

de3b8761

powerpc/mm/hash: Do a local flush if possible when no batch is active · c5cee642

由 Balbir Singh 提交于 5月 25, 2017

Currently in hpte_need_flush() if there is no batch pending we always do a
global TLB flush, which is inefficient if the mm has never run on another
thread.

Instead do the same check that __flush_tlb_pending() does and check if a local
flush is sufficient when batch->active is false. Instead of open-coding it we
use mm_is_thread_local().
Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
[mpe: Don't use a local, just inline mm_is_thread_local()]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c5cee642

powerpc: Fix some spelling mistakes · b802ab46

由 Colin Ian King 提交于 6月 05, 2017

Collation of some spelling fixes from Colin.

 Attemping   -> Attempting
 intialized  -> initialized
 missmanaged -> mismanaged
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b802ab46

02 6月, 2017 1 次提交

powerpc/lib/xor_vmx: Ensure no altivec code executes before enable_kernel_altivec() · f718d426

由 Matt Brown 提交于 5月 24, 2017

The xor_vmx.c file is used for the RAID5 xor operations. In these functions
altivec is enabled to run the operation and then disabled.

The code uses enable_kernel_altivec() around the core of the algorithm, however
the whole file is built with -maltivec, so the compiler is within its rights to
generate altivec code anywhere. This has been seen at least once in the wild:

  0:mon> di $xor_altivec_2
  c0000000000b97d0  3c4c01d9	addis   r2,r12,473
  c0000000000b97d4  3842db30	addi    r2,r2,-9424
  c0000000000b97d8  7c0802a6	mflr    r0
  c0000000000b97dc  f8010010	std     r0,16(r1)
  c0000000000b97e0  60000000	nop
  c0000000000b97e4  7c0802a6	mflr    r0
  c0000000000b97e8  faa1ffa8	std     r21,-88(r1)
  ...
  c0000000000b981c  f821ff41	stdu    r1,-192(r1)
  c0000000000b9820  7f8101ce	stvx    v28,r1,r0		<-- POP
  c0000000000b9824  38000030	li      r0,48
  c0000000000b9828  7fa101ce	stvx    v29,r1,r0
  ...
  c0000000000b984c  4bf6a06d	bl      c0000000000238b8 # enable_kernel_altivec

This patch splits the non-altivec code into xor_vmx_glue.c which calls the
altivec functions in xor_vmx.c. By compiling xor_vmx_glue.c without
-maltivec we can guarantee that altivec instruction will not be executed
outside of the enable/disable block.
Signed-off-by: NMatt Brown <matthew.brown.dev@gmail.com>
[mpe: Rework change log and include disassembly]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f718d426

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功