- 13 September 2016, 2 commits
-
-
Committed by Michael Ellerman
The LOAD_HANDLER macro requires that you have previously loaded "reg" with PACAKBASE. Although that gives callers flexibility to get PACAKBASE in some interesting way, none of the callers actually do that. So fold the load of PACAKBASE into the macro, making it simpler for callers to use correctly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
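For illustration, a minimal sketch of the folded macro as it might appear in the exception header; the low-bits calculation via _stext is an assumption, not the exact upstream definition:

    /* Hedged sketch: load the kernel base from the PACA, then OR in
     * the low bits of the handler's address. The exact expression for
     * the low bits is illustrative. */
    #define LOAD_HANDLER(reg, label)                                      \
            ld      reg,PACAKBASE(r13);     /* kernel base address */     \
            ori     reg,reg,((label) - _stext)@l    /* low bits of &label */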
-
Committed by Paul Mackerras
Currently, if userspace or the kernel accesses a completely bogus address, for example with any of bits 46-59 set, we first take an SLB miss interrupt, install a corresponding SLB entry with VSID 0, retry the instruction, then take a DSI/ISI interrupt because there is no HPT entry mapping the address. However, by the time of the second interrupt, the Come-From Address Register (CFAR) has been overwritten by the rfid instruction at the end of the SLB miss interrupt handler. Since bogus accesses can often be caused by a function return after the stack has been overwritten, the CFAR value would be very useful as it could indicate which function it was whose return had led to the bogus address.

This patch adds code to create a full exception frame in the SLB miss handler in the case of a bogus address, rather than inserting an SLB entry with a zero VSID field. Then we call a new slb_miss_bad_addr() function in C code, which delivers a signal for a user access or creates an oops for a kernel access. In the latter case the oops message will show the CFAR value at the time of the access.

In the case of the radix MMU, a segment miss interrupt indicates an access outside the ranges mapped by the page tables. Previously this was handled by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is not really correct. With this patch, we now handle these interrupts with slb_miss_bad_addr(), which is much more consistent.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
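A hedged C sketch of what the new slb_miss_bad_addr() plausibly looks like, based on the description above; _exception() and bad_page_fault() are existing kernel facilities, but the exact body is an assumption:

    /* Sketch only: user accesses get a signal, kernel accesses oops
     * (and the oops prints the CFAR captured in the exception frame). */
    void slb_miss_bad_addr(struct pt_regs *regs)
    {
            if (user_mode(regs))
                    _exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
            else
                    bad_page_fault(regs, regs->dar, SIGSEGV);
    }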
-
- 22 August 2016, 2 commits
-
-
Committed by Nicholas Piggin
MCE must not enable MSR_RI until PACA_EXMC is no longer being used.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Nicholas Piggin
MCE must not use PACA_EXGEN. When a general exception enables MSR_RI, that means SPRN_SRR[01] and SPRN_SPRG are no longer used. However the PACA save area is still in use.

Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 August 2016, 1 commit
-
-
Committed by Mahesh Salgaonkar
The current implementation of MCE early handling modifies the CR0/1 registers without saving their old values. Fix this by moving the early check for power-saving mode to machine_check_handle_early().

The Power architecture 2.06 and later allows a machine check to be taken while in nap/sleep/winkle. The last bit of HSPRG0 is set to 1 if the thread is woken up from winkle, so clear the last bit of HSPRG0 (r13) before the MCE handler starts using it as the paca pointer.

Also, the current code always puts the thread into nap state irrespective of which idle state it woke up from. Fix that by looking at paca->thread_idle_state and putting the thread back into the same state it came from.

Fixes: 1c51089f ("powerpc/book3s: Return from interrupt if coming from evil context.")
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
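A C-flavoured sketch of the "return to the same state" logic; the real fix is in assembly, and the PNV_THREAD_* values and power7_* entry points are assumed from kernel code of that era:

    /* Sketch: re-enter whichever idle state the thread was in before
     * the machine check, instead of unconditionally napping. */
    switch (local_paca->thread_idle_state) {
    case PNV_THREAD_NAP:
            power7_nap(0);
            break;
    case PNV_THREAD_SLEEP:
            power7_sleep();
            break;
    case PNV_THREAD_WINKLE:
            power7_winkle();
            break;
    }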
-
- 01 August 2016, 1 commit
-
-
Committed by Aneesh Kumar K.V
MMU feature bits are defined such that we use the lower half to represent MMU family features. Remove the strict half-and-half split and move Radix to an MMU family feature: Radix introduces a new MMU model and, strictly speaking, it is a new MMU family. This also frees up bits which can be used for individual features later.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
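As a sketch of the scheme; the bit values here are made up for illustration, not the real definitions:

    /* Hypothetical values: MMU family bits, now including radix, share
     * the feature word with individual feature bits. */
    #define MMU_FTR_HPTE_TABLE      0x00000001      /* hash family */
    #define MMU_FTR_TYPE_RADIX      0x00000040      /* radix family */
    #define MMU_FTR_USE_TLBIE       0x00010000      /* individual feature */

    static inline int radix_enabled(unsigned long mmu_features)
    {
            return (mmu_features & MMU_FTR_TYPE_RADIX) != 0;
    }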
-
- 17 July 2016, 2 commits
-
-
Committed by Benjamin Herrenschmidt
This will be delivering external interrupts from the XIVE to the Hypervisor. We treat it as a normal external interrupt for the lazy irq disable code (so it will be replayed as a 0x500) and route it to do_IRQ.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
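A hedged sketch of the lazy-disable treatment described above; the function name and soft-disable test are assumptions, only the shape of the logic is taken from the commit message:

    /* Sketch: deliver the hypervisor virtualization interrupt like a
     * normal external interrupt. */
    static void hvirt_interrupt(struct pt_regs *regs, bool soft_disabled)
    {
            if (soft_disabled) {
                    /* latch it; replayed later as a 0x500 external irq */
                    local_paca->irq_happened |= PACA_IRQ_EE;
                    return;
            }
            do_IRQ(regs);
    }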
-
Committed by Benjamin Herrenschmidt
This moves the CBE RAS and facility unavailable "common" handlers down to after the FWNMI page. This frees up some space in the much-contended area before the relocation-on vectors and before the FWNMI page. They are still within 64K of __start, so CONFIG_RELOCATABLE should still work.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 15 July 2016, 2 commits
-
-
Committed by Shreyas B. Prabhu
Functions like power7_wakeup_loss, power7_wakeup_noloss and power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can also be used by POWER9. Hence rename these functions to hardware-agnostic names.

Suggested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Shreyas B. Prabhu
In the current code, when the thread wakes up in the reset vector, some of the state restore code and the check for whether a thread needs to branch to kvm are duplicated. Reorder the code so that this duplication is avoided. At a higher level this is what the change looks like:

Before this patch -

    power7_wakeup_tb_loss:
            restore hypervisor state
            if (thread needed by kvm)
                    goto kvm_start_guest
            restore nvgprs, cr, pc
            rfid to process context

    power7_wakeup_loss:
            restore nvgprs, cr, pc
            rfid to process context

    reset vector:
            if (waking from deep idle states)
                    goto power7_wakeup_tb_loss
            else
                    if (thread needed by kvm)
                            goto kvm_start_guest
                    goto power7_wakeup_loss

After this patch -

    power7_wakeup_tb_loss:
            restore hypervisor state
            return

    power7_restore_hyp_resource():
            if (waking from deep idle states)
                    goto power7_wakeup_tb_loss
            return

    power7_wakeup_loss:
            restore nvgprs, cr, pc
            rfid to process context

    reset vector:
            power7_restore_hyp_resource()
            if (thread needed by kvm)
                    goto kvm_start_guest
            goto power7_wakeup_loss

Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 June 2016, 1 commit
-
-
Committed by Michael Ellerman
As part of the Radix MMU support we added some feature sections in the SLB miss handler. These are intended to catch the case that we incorrectly take an SLB miss when Radix is enabled, and instead of crashing weirdly they bail out to a well-defined exit path and trigger an oops.

However the way they were written meant the bailout case was enabled by default until we did CPU feature patching. On powermacs the early debug prints in setup_system() can cause an SLB miss, which happens before code patching, and so the SLB miss handler would incorrectly bail out and crash during boot.

Fix it by inverting the sense of the feature section, so that the code which is in place at boot is correct for the hash case. Once we determine we are using Radix - which will never happen on a powermac - only then do we patch in the bailout case which unconditionally jumps.

Fixes: caca285e ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Tested-by: Denis Kirjanov <kda@linux-powerpc.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 20 June 2016, 1 commit
-
-
Committed by Mahesh Salgaonkar
When a guest is assigned to a core it converts the host Timebase (TB) into guest TB by adding the guest timebase offset before entering the guest. During guest exit it restores the guest TB to host TB. This means that under certain conditions (guest migration) host TB and guest TB can differ.

When we get an HMI for TB related issues, the OPAL HMI handler tries to fix the errors and restore the correct host TB value. With no guest running we don't have any issues, but with a guest running on the core we run into TB corruption issues.

If we get an HMI while in the guest, the current HMI handler invokes the OPAL HMI handler before forcing the guest to exit. The guest exit path subtracts the guest TB offset from the current TB value, which may have already been restored to the host value by the OPAL HMI handler. This leads to incorrect host and guest TB values.

With split-core, things become more complex: TB also gets split and each subcore gets its own TB register. When an HMI handler fixes a TB error and restores the TB value, it affects all the TB values of sibling subcores on the same core. On TB errors all the threads in the core get an HMI. With the existing code, the individual threads call the OPAL HMI handler independently, which can easily throw the TB out of sync if we have a guest running on subcores. Hence we need to co-ordinate all the threads before making the OPAL HMI handler call, followed by a TB resync.

This patch introduces a sibling subcore state structure (shared by all threads in the core) in the paca, which holds information about whether sibling subcores are in guest mode or host mode. An array in_guest[] of size MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore, with the subcore id as the index into the in_guest[] array. Only the primary thread entering/exiting the guest is responsible for setting/unsetting its designated array element.

On a TB error, we get an HMI interrupt on every thread of the core. Upon HMI, this patch will now force the guest to vacate the core/subcore. The primary thread from each subcore then turns off its respective bit in the above bitmap during the guest exit path, just after the guest->host partition switch is complete. All other threads that have just exited the guest, or were already in the host, wait until all subcores clear their respective bits. Once all the subcores have turned off their respective bits, all threads make the call to the OPAL HMI handler.

It is not necessary for the OPAL HMI handler to resync the TB value for every HMI interrupt; it does so only for HMIs caused by TB errors and does not touch the TB value otherwise. Hence, to make things simpler, the primary thread calls the TB resync explicitly, once per core, immediately after the OPAL HMI handler, instead of subtracting the guest offset from the TB. The TB resync call restores the TB to the host value, so we can be sure about the TB state.

One of the primary threads exiting the guest takes up the responsibility of calling the TB resync. It uses one of the top bits (bit 63) of the subcore state flags bitmap to make the decision: the first primary thread (among the subcores) that is able to set the bit calls the TB resync. All other threads wait until the TB resync is complete, then proceed.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
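The shared structure described above, sketched from the commit message; the layout follows the description, the exact types are assumptions:

    /* One instance shared by all threads of a core. in_guest[] is
     * indexed by subcore id; bit 63 of flags arbitrates which primary
     * thread performs the TB resync after the OPAL HMI handler runs. */
    #define MAX_SUBCORE_PER_CORE    4

    struct sibling_subcore_state {
            unsigned long   flags;
            u8              in_guest[MAX_SUBCORE_PER_CORE];
    };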
-
- 11 May 2016, 2 commits
-
-
Committed by Mahesh Salgaonkar
The routine machine_check_pSeries_early() is only used on powernv, not pseries. Hence rename machine_check_pSeries_early() to machine_check_powernv_early().

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Aneesh Kumar K.V
We also use MMU_FTR_RADIX to branch out of code paths specific to hash. No functional change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 21 April 2016, 2 commits
-
-
Committed by Hari Bathini
The __end_handlers marker was intended to mark the end of the code that gets called from the exception prologs. But it hasn't kept pace with code changes. Case in point: slb_miss_realmode is called from exception prolog code but isn't below the __end_handlers marker. So the __end_handlers marker is as good as a comment, but could be misleading at times if it isn't in sync with the code, as is the case now. Let us avoid this confusion by having a better comment and removing the __end_handlers marker altogether.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Hari Bathini
Some of the interrupt vectors on 64-bit POWER server processors are only 32 bytes long (8 instructions), which is not enough for the full first-level interrupt handler. For these we need to branch to an out-of-line (OOL) handler. But when we are running a relocatable kernel, interrupt vectors up to the __end_interrupts marker are copied down to real address 0x100. So branching to labels (ie. OOL handlers) outside this section must be handled differently (see LOAD_HANDLER()) when considering a relocatable kernel, which needs at least 4 instructions.

However, branching from an interrupt vector means that we corrupt the CFAR (come-from address register) on POWER7 and later processors, as mentioned in commit 1707dd16. So EXCEPTION_PROLOG_0 (6 instructions), which contains the part up to the point where the CFAR is saved in the PACA, should be part of the short interrupt vectors before we branch out to the OOL handlers.

But as mentioned already, there are interrupt vectors on 64-bit POWER server processors that are only 32 bytes long (like vectors 0x4f00, 0x4f20, etc.), which cannot accommodate the above two cases at the same time owing to the space constraint. Currently, in these interrupt vectors, we simply branch out to OOL handlers without using LOAD_HANDLER(), which leaves us vulnerable when running a relocatable kernel (eg. the kdump case). While this has been the case for some time now and kdump is used widely, we were fortunate not to see any problems so far, for three reasons:

1. In almost all cases, the production kernel (relocatable) is used for kdump as well, which means that the crashed kernel's OOL handler is at the same place where we end up branching to from the short interrupt vector of the kdump kernel.

2. The OOL handler was unlikely to be the reason for the crash in almost all kdump scenarios, which meant we had a sane OOL handler from the crashed kernel that we branched to.

3. On most 64-bit POWER server processors, the page size is large enough that marking the interrupt vector code as executable (see commit 429d2e83) also marks the OOL handler code from the crashed kernel, which sits right below the interrupt vector code from the kdump kernel, as executable.

Let us fix this by moving the __end_interrupts marker down past the OOL handlers, to make sure that we also copy the OOL handlers to real address 0x100 when running a relocatable kernel. This fix has been tested successfully in the kdump scenario, on an LPAR with 4K page size, using different default/production and kdump kernels. Also tested by manually corrupting the OOL handlers in the first kernel and then kdump'ing, and then causing the OOL handlers to fire - mpe.

Fixes: c1fb6816 ("powerpc: Add relocation on exception vector handlers")
Cc: stable@vger.kernel.org
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 11 April 2016, 1 commit
-
-
Committed by Michael Ellerman
We have a bunch of SLB related code in the tree which is there to handle dynamic VSIDs - but currently it's all disabled at compile time. The comments say "Keep that around for when we re-implement dynamic VSIDs". But that was over 10 years ago (commit 3c726f8d ("[PATCH] ppc64: support 64k pages")). The chance that it would still work unchanged is minimal, and in the meantime it's confusing to folks browsing/grepping the code. If we ever want to re-instate it, it's in the git history.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
-
- 17 December 2015, 2 commits
-
-
Committed by Michael Ellerman
The STD_EXCEPTION_PSERIES macro takes both a vector number, and a location (memory address). However both are always identical, so combine them to save repeating ourselves. This does mean an exception handler must always exist at the location in memory that matches its vector number. But that's OK because this is the "STD" macro (standard), which does exactly that. We have other macros for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
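An illustrative before/after of a call site; vector 0x300 and the data_access label are chosen purely as an example:

    /* Before: location and vector passed separately, always equal. */
    STD_EXCEPTION_PSERIES(0x300, 0x300, data_access)

    /* After: a single argument serves as both location and vector. */
    STD_EXCEPTION_PSERIES(0x300, data_access)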
-
Committed by Michael Ellerman
HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most of our first level exception handlers. It conditionally executes a HMT_MEDIUM instruction, which sets the processor priority to medium. On modern systems, ie. Power7 and later, it is nop'ed out at boot. All it does is make the exception vectors more cramped, and consume 4 bytes of icache.

On old systems it has the effect of boosting the processor priority at the start of exception processing. If we were previously in the idle loop for example, we may be at low or very low priority. This is desirable as we want to process the exception as fast as possible.

However, looking closely at the generated code, we see that in all cases we execute another HMT_MEDIUM just four instructions later. With code patching applied, the final code on an old (Power6) system will look like, eg:

    c000000000000300 <data_access_pSeries>:
    c000000000000300:   7c 42 13 78     mr      r2,r2        <-
    c000000000000304:   7d b2 43 a6     mtsprg  2,r13
    c000000000000308:   7d b1 42 a6     mfsprg  r13,1
    c00000000000030c:   f9 2d 00 80     std     r9,128(r13)
    c000000000000310:   60 00 00 00     nop
    c000000000000314:   7c 42 13 78     mr      r2,r2        <-

So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is not justified by the benefit of boosting the processor priority for the duration of four instructions, and therefore we drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 14 December 2015, 1 commit
-
-
Committed by Aneesh Kumar K.V
We should not depend on PTE bit positions in asm code. Fix this by moving that part of the code to C.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 01 December 2015, 1 commit
-
-
Committed by Paul Mackerras
Currently, if HV KVM is configured but PR KVM isn't, we don't include a test to see whether we were interrupted in KVM guest context for the set of interrupts which get delivered directly to the guest by hardware if they occur in the guest. This includes things like program interrupts.

However, the recent bug where userspace could set the MSR for a VCPU to have an illegal value in the TS field, and thus cause a TM Bad Thing type of program interrupt on the hrfid that enters the guest, showed that we can never be completely sure that these interrupts can never occur in the guest entry/exit code. If one of these interrupts does happen and we have HV KVM configured but not PR KVM, then we end up trying to run the handler in the host with the MMU set to the guest MMU context, which generally ends badly.

Thus, for robustness it is better to have the test in every interrupt vector, so that if some way is found to trigger some interrupt in the guest entry/exit path, we can handle it without immediately crashing the host. This means that the distinction between KVMTEST and KVMTEST_PR goes away. Thus we delete KVMTEST_PR and the associated macros and use KVMTEST everywhere that we previously used either KVMTEST_PR or KVMTEST. It also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR, so we delete SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 02 June 2015, 2 commits
-
-
Committed by Anton Blanchard
We need to use a trampoline when using LOAD_HANDLER(), because the destination needs to be in the first 64kB. An absolute branch has no such limitations, so just jump there.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Anton Blanchard
We had some code to restore the LR in the relocatable system call path, back when we used the LR to do an indirect branch. Commit 6a404806 ("powerpc: Avoid link stack corruption in MMU on syscall entry path") changed this to use the CTR, which is volatile across system calls so does not need restoring. Remove the stale comment and the restore of the LR.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 March 2015, 1 commit
-
-
Committed by Mahesh Salgaonkar
Commit 2ba9f0d8 changed CONFIG_KVM_BOOK3S_64_HV to a tristate to allow the HV/PR bits to be built as modules. But the MCE code still depends on CONFIG_KVM_BOOK3S_64_HV, which is wrong. When the user selects CONFIG_KVM_BOOK3S_64_HV=m to build the HV/PR bits as separate modules, the relevant MCE code gets excluded.

This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This makes sure that the relevant MCE code is included when the HV/PR bits are built as separate modules.

Fixes: 2ba9f0d8 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 15 December 2014, 2 commits
-
-
Committed by Shreyas B. Prabhu
Winkle is a deep idle state supported in POWER8 chips. A core enters winkle when all the threads of the core enter winkle. In this state, power supply to the entire chiplet, i.e. core, private L2 and private L3, is turned off. As a result it gives higher power savings compared to sleep.

But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up.

The Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the contents of a few specific registers. This patch uses PORE to restore register state wherever possible and uses the stack to save and restore the rest of the necessary registers.

With hypervisor state restore, things fall into three categories: per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distinguish the first thread in the core and subcore.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Shreyas B. Prabhu
Deep idle states like sleep and winkle are per-core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks, like the fastsleep hardware bug workaround and hypervisor core state save, which have to be done only by the last thread of the core entering a deep idle state, and similarly tasks like timebase resync and hypervisor core register restore that have to be done only by the first thread waking up from these states.

The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This not only is suboptimal, but can cause functionality issues when subcores and kvm are involved.

This patch adds the necessary infrastructure to track the idle states of threads in a per-core structure. It uses this info to perform tasks like the fastsleep workaround and timebase resync only once per core, as sketched after this entry.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
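A hedged C sketch of the first/last-thread bookkeeping this enables; the real code is assembly and serialises on a per-core lock bit, and the constant name here is assumed:

    /* Each thread owns one bit in the low byte of a per-core word.
     * Clearing your bit on entry and finding the byte empty means you
     * are the last thread to idle; the converse on wakeup identifies
     * the first thread, which alone does the timebase resync. */
    #define PNV_CORE_IDLE_THREAD_BITS       0xff

    static bool idle_entry_is_last_thread(unsigned long *core_state,
                                          int thread_in_core)
    {
            *core_state &= ~(1UL << thread_in_core);
            return (*core_state & PNV_CORE_IDLE_THREAD_BITS) == 0;
    }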
-
- 08 December 2014, 1 commit
-
-
Committed by Paul Mackerras
When a secondary hardware thread has finished running a KVM guest, we currently put that thread into nap mode using a nap instruction in the KVM code. This change makes the call to power7_nap() that put the thread into nap mode return, instead of executing a nap instruction directly. The reason for doing this is to avoid the KVM code having to know what low-power mode to put the thread into.

In the case of a secondary thread used to run a KVM guest, the thread will be offline from the point of view of the host kernel, and the relevant power7_nap() call is the one in pnv_smp_cpu_disable(). In this case we don't want to clear pending IPIs in the offline loop in that function, since that might cause us to miss the wakeup for the next time the thread needs to run a guest. To tell whether or not to clear the interrupt, we use the SRR1 value returned from power7_nap() and check if it indicates an external interrupt. We arrange that the return from power7_nap() when we have finished running a guest returns 0, so pending interrupts don't get flushed in that case.

Note that it is important that a secondary thread that has finished executing in the guest, or that didn't have a guest to run, should not return to power7_nap's caller while the kvm_hstate.hwthread_req flag in the PACA is non-zero, because the return from power7_nap will re-enable the MMU, and the MMU might still be in guest context. In this situation we spin at low priority in real mode waiting for hwthread_req to become zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
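A sketch of the caller-side check in the offline loop; the constant names follow that era's headers and the loop itself is simplified:

    /* Only flush a pending IPI when SRR1 says the wakeup really was an
     * external interrupt; a 0 return means we came back from running a
     * guest, so the interrupt must stay pending for the next guest. */
    unsigned long srr1 = power7_nap(1);

    if ((srr1 & SRR1_WAKEMASK) == SRR1_WAKEEE)
            icp_native_flush_interrupt();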
-
- 05 December 2014, 1 commit
-
-
Committed by Aneesh Kumar K.V
updatepp can get called for a nohpte fault when we find from the linux page table that the translation was hashed before. In that case we are sure that there is no existing translation, hence we can avoid doing the tlbie.

We could possibly race with a parallel fault filling the TLB. But that should be ok because updatepp is only ever relaxing permissions. We also look at the linux pte permission bits when filling the hash pte permission bits. We also hold the linux pte busy bits while inserting/updating a hashpte entry, hence a parallel update of the linux pte is not possible. On the other hand, mprotect involves ptep_modify_prot_start, which causes a hpte invalidate and not updatepp.

Performance numbers: we use random_access_bench written by Anton, on a kernel with THP disabled and a smaller hash page table size.

Without the fix:

    86.60%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_updatepp
     2.10%  random_access_b  random_access_bench  [.] doit
     1.99%  random_access_b  [kernel.kallsyms]    [k] .do_raw_spin_lock
     1.85%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_insert
     1.26%  random_access_b  [kernel.kallsyms]    [k] .native_flush_hash_range
     1.18%  random_access_b  [kernel.kallsyms]    [k] .__delay
     0.69%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_remove
     0.37%  random_access_b  [kernel.kallsyms]    [k] .clear_user_page
     0.34%  random_access_b  [kernel.kallsyms]    [k] .__hash_page_64K
     0.32%  random_access_b  [kernel.kallsyms]    [k] fast_exception_return
     0.30%  random_access_b  [kernel.kallsyms]    [k] .hash_page_mm

With the fix:

    27.54%  random_access_b  random_access_bench  [.] doit
    22.90%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_insert
     5.76%  random_access_b  [kernel.kallsyms]    [k] .native_hpte_remove
     5.20%  random_access_b  [kernel.kallsyms]    [k] fast_exception_return
     5.12%  random_access_b  [kernel.kallsyms]    [k] .__hash_page_64K
     4.80%  random_access_b  [kernel.kallsyms]    [k] .hash_page_mm
     3.31%  random_access_b  [kernel.kallsyms]    [k] data_access_common
     1.84%  random_access_b  [kernel.kallsyms]    [k] .trace_hardirqs_on_caller

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
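A hedged sketch of the fast path; the flag name is an assumption based on the description, and the real change threads it down from the fault path:

    /* Sketch: callers that saw from the linux PTE that no HPTE was
     * ever inserted pass a hint, letting updatepp skip the tlbie. */
    #define HPTE_NOHPTE_UPDATE      0x2

    if (!(flags & HPTE_NOHPTE_UPDATE))
            tlbie(vpn, bpsize, apsize, ssize, local);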
-
- 02 December 2014, 1 commit
-
-
Committed by Mahesh Salgaonkar
Clean up the OpalMCE_* definitions/declarations and other related code which is not used anymore.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 12 November 2014, 1 commit
-
-
Committed by Suresh E. Warrier
The system call FLIH (first-level interrupt handler) at 0xc00 unconditionally sets the hardware priority to medium. For hypercalls, this means we lose the guest OS priority. The front end (do_kvm_0x**) to the KVM interrupt handler always assumes that the PPR priority is saved in the PACA exception save area, so it copies this to the kvm_hstate structure. For hypercalls, this would be the saved priority from any previous exception. Eventually, the guest gets resumed with an incorrect priority.

The fix is to save the PPR priority in the PACA exception save area before switching HMT priorities in the FLIH, so that the existing code described above in the KVM interrupt handler can copy it from there into the VCPU's saved context.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[mpe: Dropped HMT_MEDIUM_PPR_DISCARD and reworded comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 10 October 2014, 1 commit
-
-
Committed by Mahesh Salgaonkar
In the HMI interrupt handler we don't touch SRR0/SRR1; instead we touch HSRR0/HSRR1. Hence we don't need to clear the MSR_RI bit.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 13 August 2014, 1 commit
-
-
Committed by Guenter Roeck
Once again, we see

    arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
    arch/powerpc/kernel/exceptions-64s.S:865: Error: attempt to move .org backwards
    arch/powerpc/kernel/exceptions-64s.S:866: Error: attempt to move .org backwards
    arch/powerpc/kernel/exceptions-64s.S:890: Error: attempt to move .org backwards

when compiling ppc:allmodconfig. This time the problem has been caused by commit 0869b6fd ("powerpc/book3s: Add basic infrastructure to handle HMI in Linux"), which adds the functions hmi_exception_early and hmi_exception_after_realmode into a critical (size-limited) code area, even though that does not appear to be necessary. Move those functions to a non-critical area of the file.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 05 August 2014, 1 commit
-
-
Committed by Mahesh Salgaonkar
Handle the Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements the basic infrastructure to handle HMIs in the Linux host. The design is to invoke the OPAL HMI handler in real mode for recovery and set irq_pending when we hit an HMI. During check_irq_replay(), pull the OPAL HMI event and print the HMI info on the console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
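A hedged sketch of the replay half; the PACA_IRQ_HMI flag and 0xe60 vector follow the kernel's lazy-irq scheme as described, but treat the details as assumptions:

    /* In check_irq_replay(): if an HMI was latched while interrupts
     * were soft-disabled, clear the latch and replay its vector. */
    if (happened & PACA_IRQ_HMI) {
            local_paca->irq_happened &= ~PACA_IRQ_HMI;
            return 0xe60;   /* HMI vector */
    }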
-
- 28 July 2014, 3 commits
-
-
Committed by Michael Ellerman
DISABLE_INTS has a long and storied history, but for some time now it has not actually disabled interrupts. For the open-coded exception handlers, just stop using it; instead call RECONCILE_IRQ_STATE directly. This has the benefit of removing a level of indirection, and making it clear that r10 & r11 are used at that point. For the addition case we still need a macro, so rename it to clarify what it actually does.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Michael Ellerman
At the moment the allmodconfig build is failing because we run out of space between altivec_assist() at 0x5700 and the fwnmi_data_area at 0x7000. Fixing it permanently will take some more work, but a quick fix is to move bad_stack() below the fwnmi_data_area. That gives us just enough room with everything enabled. bad_stack() is called from the common exception handlers, but it's a non-conditional branch, so we have plenty of scope to move it further away.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Michael Ellerman
Old cpus didn't have a Segment Lookaside Buffer (SLB); instead they had a Segment Table (STAB). Now that we've dropped support for those cpus, we can remove the STAB support entirely.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 12 June 2014, 1 commit
-
-
Committed by Anton Blanchard
Commit 2749a2f2 ("powerpc/book3s: Fix machine check handling for unhandled errors") introduced a few ABIv2 issues. We can maintain ABIv1 and ABIv2 compatibility by branching to the function rather than the dot symbol.

Fixes: 2749a2f2 ("powerpc/book3s: Fix machine check handling for unhandled errors")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 11 June 2014, 2 commits
-
-
Committed by Mahesh Salgaonkar
Currently the machine check handler does not check for stack overflow on nested machine checks. If we hit another MCE while inside the machine check handler, repeatedly from the same address, then we run the risk of a stack overflow, which can cause huge memory corruption. This patch limits the nested MCE level to 4 and panics when we cross level 4.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
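A C-flavoured sketch of the bound; the real check is in assembly, and the PACA field and constant names are assumptions based on the description:

    /* Sketch: count machine-check nesting in the PACA and refuse to
     * recurse more than four levels deep. */
    #define MAX_MCE_DEPTH   4

    local_paca->in_mce++;
    if (local_paca->in_mce > MAX_MCE_DEPTH)
            panic("Stack overflow from nested machine checks");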
-
Committed by Mahesh Salgaonkar
The current code does not check for unhandled/unrecovered errors, and returns from the interrupt if it is a recoverable exception, which in turn triggers the same machine check exception in a loop, causing the hypervisor to become unresponsive. This patch fixes this situation and forces the hypervisor to panic for unhandled/unrecovered errors.

This patch also fixes another issue where the unrecoverable_exception routine was called in real mode in the case of an unrecoverable exception (MSR_RI = 0). This causes another exception, vector 0x300 (data access), during a system crash, leading to confusion while debugging the cause of the crash. Also turn the ME bit off while going down, so that when another MCE is hit during the panic path, the system will checkstop and the hypervisor will get restarted cleanly by the SP.

With the above fixes we now print correct console messages (see below) while crashing the system in the case of unhandled/unrecoverable machine checks.

    --------------
    Severe Machine check interrupt [Not recovered]
      Initiator: CPU
      Error type: UE [Instruction fetch]
        Effective address: 0000000030002864
    Oops: Machine check, sig: 7 [#1]
    SMP NR_CPUS=2048 NUMA PowerNV
    Modules linked in: bork(O) bridge stp llc kvm [last unloaded: bork]
    CPU: 36 PID: 55162 Comm: bash Tainted: G O 3.14.0mce #1
    task: c000002d72d022d0 ti: c000000007ec0000 task.ti: c000002d72de4000
    NIP: 0000000030002864 LR: 00000000300151a4 CTR: 000000003001518c
    REGS: c000000007ec3d80 TRAP: 0200 Tainted: G O (3.14.0mce)
    MSR: 9000000000041002 <SF,HV,ME,RI> CR: 28222848 XER: 20000000
    CFAR: 0000000030002838 DAR: d0000000004d0000 DSISR: 00000000 SOFTE: 1
    GPR00: 000000003001512c 0000000031f92cb0 0000000030078af0 0000000030002864
    GPR04: d0000000004d0000 0000000000000000 0000000030002864 ffffffffffffffc9
    GPR08: 0000000000000024 0000000030008af0 000000000000002c c00000000150e728
    GPR12: 9000000000041002 0000000031f90000 0000000010142550 0000000040000000
    GPR16: 0000000010143cdc 0000000000000000 00000000101306fc 00000000101424dc
    GPR20: 00000000101424e0 000000001013c6f0 0000000000000000 0000000000000000
    GPR24: 0000000010143ce0 00000000100f6440 c000002d72de7e00 c000002d72860250
    GPR28: c000002d72860240 c000002d72ac0038 0000000000000008 0000000000040000
    NIP [0000000030002864] 0x30002864
    LR [00000000300151a4] 0x300151a4
    Call Trace:
    Instruction dump:
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    ---[ end trace 7285f0beac1e29d3 ]---
    Sending IPI to other CPUs
    IPI complete
    OPAL V3 detected !
    --------------

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 23 April 2014, 1 commit
-
-
Committed by Anton Blanchard
STD_EXCEPTION_COMMON, STD_EXCEPTION_COMMON_ASYNC and MASKABLE_EXCEPTION branch to the handler, so we can remove the explicit dot symbol and binutils will do the right thing.

Signed-off-by: Anton Blanchard <anton@samba.org>
-