提交 · 17bb312f4c75e59ef6f18015011c4efbb545a748 · openanolis / cloud-kernel

30 1月, 2015 4 次提交

powerpc/8xx: Take benefit of aligned PGDIR · 17bb312f

由 LEROY Christophe 提交于 1月 20, 2015

L1 base address is now aligned so we can insert L1 index into r11 directly and
then preserve r10
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <scottwood@freescale.com>

17bb312f

powerpc/8xx: remove tests on PGDIR entry validity · 5ddb75ce

由 LEROY Christophe 提交于 1月 20, 2015

Kernel MMU handling code handles validity of entries via _PMD_PRESENT which
corresponds to V bit in MD_TWC and MI_TWC. When the V bit is not set, MPC8xx
triggers TLBError exception. So we don't have to check that and branch ourself
to TLBError. We can set TLB entries with non present entries, remove all those
tests and let the 8xx handle it. This reduce the number of cycle when the
entries are valid which is the case most of the time, and doesn't significantly
increase the time for handling invalid entries.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <scottwood@freescale.com>

5ddb75ce

powerpc/8xx: remove remaining unnecessary code in FixupDAR · 2374d0af

由 LEROY Christophe 提交于 1月 20, 2015

Since commit 33fb845a ("powerpc/8xx: Don't use MD_TWC for walk"), MD_EPN and
MD_TWC are not writen anymore in FixupDAR so saving r3 has become useless.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <scottwood@freescale.com>

2374d0af

powerpc/8xx: use _PAGE_RO instead of _PAGE_RW · cadbfd01

由 LEROY Christophe 提交于 1月 19, 2015

On powerpc 8xx, in TLB entries, 0x400 bit is set to 1 for read-only pages
and is set to 0 for RW pages. So we should use _PAGE_RO instead of _PAGE_RW
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NScott Wood <scottwood@freescale.com>

cadbfd01

28 1月, 2015 1 次提交

powerpc: Remove some unused functions · 8aa989b8

由 Michael Ellerman 提交于 1月 27, 2015

Remove slice_set_psize() which is not used.

It was added in 3a8247cc "powerpc: Only demote individual slices
rather than whole process" but was never used.

Remove vsx_assist_exception() which is not used.

It was added in ce48b210 "powerpc: Add VSX context save/restore,
ptrace and signal support" but was never used.

Remove generic_mach_cpu_die() which is not used.

Its last caller was removed in 375f561a "powerpc/powernv: Always go
into nap mode when CPU is offline".

Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
unused.

These were introduced in c5d56332 "[POWERPC] Add general support for
mpc7448hpc2 (Taiga) platform" but were never used.

This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: NRickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[mpe: Update changelog with details on when/why they are unused]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8aa989b8

27 1月, 2015 1 次提交

powerpc/pseries: Fix endian problems with LE migration · 3df76a9d

由 Cyril Bur 提交于 1月 21, 2015

RTAS events require arguments be passed in big endian while hypercalls
have their arguments passed in registers and the values should therefore
be in CPU endian.

The "ibm,suspend_me" 'RTAS' call makes a sequence of hypercalls to setup
one true RTAS call. This means that "ibm,suspend_me" is handled
specially in the ppc_rtas() syscall.

The ppc_rtas() syscall has its arguments in big endian and can therefore
pass these arguments directly to the RTAS call. "ibm,suspend_me" is
handled specially from within ppc_rtas() (by calling rtas_ibm_suspend_me())
which has left an endian bug on little endian systems due to the
requirement of hypercalls. The return value from rtas_ibm_suspend_me()
gets returned in cpu endian, and is left unconverted, also a bug on
little endian systems.

rtas_ibm_suspend_me() does not actually make use of the rtas_args that
it is passed. This patch removes the convoluted use of the rtas_args
struct to pass params to rtas_ibm_suspend_me() in favour of passing what
it needs as actual arguments. This patch also ensures the two callers of
rtas_ibm_suspend_me() pass function parameters in cpu endian and in the
case of ppc_rtas(), converts the return value.

migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
sysfs file which deals with everything in cpu endian so this function
only underwent cleanup.

This patch has been tested with KVM both LE and BE and on PowerVM both
LE and BE. Under QEMU/KVM the migration happens without touching these
code pathes.

For PowerVM there is no obvious regression on BE and the LE code path
now provides the correct parameters to the hypervisor.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3df76a9d

23 1月, 2015 6 次提交

powerpc/eeh: Allow to set maximal frozen times · 1b28f170

由 Gavin Shan 提交于 12月 11, 2014

When PE's frozen count hits maximal allowed frozen times, which is
5 currently, it will be forced to be offline permanently. Once the
PE is removed permanently, rebooting machine is required to bring
the PE back. It's not convienent when testing EEH functionality.

The patch exports the maximal allowed frozen times through debugfs
entry (/sys/kernel/debug/powerpc/eeh_max_freezes).
Requested-by: NRyan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1b28f170

powerpc/eeh: Introduce flag EEH_PE_REMOVED · 432227e9

由 Gavin Shan 提交于 12月 11, 2014

The conditions that one specific PE's frozen count exceeds the maximal
allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
state indicate the PE was removed permanently implicitly. The patch
introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
don't depend on the fixed maximal allowed times, which can be varied as
we do in subsequent patch.

Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
count exceeds the maximal allowed times, or just failed from recovery.
Requested-by: NRyan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

432227e9

powerpc/eeh: Fix missed PE#0 on P7IOC · 2aa5cf9e

由 Gavin Shan 提交于 11月 25, 2014

PE#0 should be regarded as valid for P7IOC, while it's invalid for
PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
two cases. Without the patch, we possibly see frozen PE#0 state is
cleared without EEH recovery taken on P7IOC as following kernel logs
indicate:

[root@ltcfbl8eb ~]# dmesg
       :
pci 0000:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0000:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0001:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0001:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0002:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0002:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0003:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:20     : [PE# 002] Secondary bus 32..63 associated with PE#2
       :
EEH: Clear non-existing PHB#3-PE#0
EEH: PHB location: U78AE.001.WZS00M9-P1-002
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2aa5cf9e

powerpc/kernel: Avoid memory corruption at early stage · 6f20e7f2

由 Gavin Shan 提交于 1月 08, 2015

When calling to early_setup(), we pick "boot_paca" up for the master CPU
and initialize that with initialise_paca(). At that point, the SLB
shadow buffer isn't populated yet. Updating the SLB shadow buffer should
corrupt what we had in physical address 0 where the trap instruction is
usually stored.

This hasn't been observed to cause any trouble in practice, but is
obviously fishy.

Fixes: 6f4441ef ("powerpc: Dynamically allocate slb_shadow from memblock")
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6f20e7f2

powerpc: Rename _TIF_SYSCALL_T_OR_A to _TIF_SYSCALL_DOTRACE · 10ea8343

由 Michael Ellerman 提交于 1月 15, 2015

Once upon a time, at least 9 years ago (< 2.6.12), _TIF_SYSCALL_T_OR_A
meant "TRACE or AUDIT". But these days it means TRACE or AUDIT or
SECCOMP or TRACEPOINT or NOHZ.

All of those are implemented via syscall_dotrace() so rename the flag to
that to try and clarify things.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

10ea8343

powerpc/pci: remove the multi-init for pci_dn->phb · 145a2d04

由 Wei Yang 提交于 12月 15, 2014

pci_dn->phb is set to phb in update_dn_pci_info(), if succeed.

This patch removes the duplication of pci_dn->phb initialization.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

145a2d04

29 12月, 2014 1 次提交

powerpc/kvm: Create proper names for the kvm_host_state PMU fields · 9a4fc4ea

由 Michael Ellerman 提交于 7月 10, 2014

We have two arrays in kvm_host_state that contain register values for
the PMU. Currently we only create an asm-offsets symbol for the base of
the arrays, and do the array offset in the assembly code.

Creating an asm-offsets symbol for each field individually makes the
code much nicer to read, particularly for the MMCRx/SIxR/SDAR fields, and
might have helped us notice the recent double restore bug we had in this
code.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Acked-by: NAlexander Graf <agraf@suse.de>

9a4fc4ea

17 12月, 2014 2 次提交

KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61

由 Paul Mackerras 提交于 12月 03, 2014

There are two ways in which a guest instruction can be obtained from
the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
instruction), the offending instruction is in the HEIR register
(Hypervisor Emulation Instruction Register).  If the exit was caused
by a load or store to an emulated MMIO device, we load the instruction
from the guest by turning data relocation on and loading the instruction
with an lwz instruction.

Unfortunately, in the case where the guest has opposite endianness to
the host, these two methods give results of different endianness, but
both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
using guest endianness, whereas the lwz will load the instruction using
host endianness.  The rest of the code that uses vcpu->arch.last_inst
assumes it was loaded using host endianness.

To fix this, we define a new vcpu field to store the HEIR value.  Then,
in kvmppc_handle_exit_hv(), we transfer the value from this new field to
vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
differ.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4a157d61

KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf

由 Paul Mackerras 提交于 12月 03, 2014

This removes the code that was added to enable HV KVM to work
on PPC970 processors.  The PPC970 is an old CPU that doesn't
support virtualizing guest memory.  Removing PPC970 support also
lets us remove the code for allocating and managing contiguous
real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
case, the code for pinning pages of guest memory when first
accessed and keeping track of which pages have been pinned, and
the code for handling H_ENTER hypercalls in virtual mode.

Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
The KVM_CAP_PPC_RMA capability now always returns 0.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c17b98cf

15 12月, 2014 3 次提交

powernv/powerpc: Add winkle support for offline cpus · 77b54e9f

由 Shreyas B. Prabhu 提交于 12月 10, 2014

Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contests of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.

With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distingush first thread in core and subcore.
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

77b54e9f

powernv/cpuidle: Redesign idle states management · 7cba160a

由 Shreyas B. Prabhu 提交于 12月 10, 2014

Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
state.

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only is
suboptimal, but can cause functionality issues when subcores and kvm is
involved.

This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: NPreeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7cba160a

powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode · 8117ac6a

由 Paul Mackerras 提交于 12月 10, 2014

Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on.  This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context.  If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.

Cc: stable@vger.kernel.org
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8117ac6a

09 12月, 2014 1 次提交

powerpc: Secondary CPUs must set cpu_callin_map after setting active and online · 7c5c92ed

由 Anton Blanchard 提交于 12月 09, 2014

I have a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:

  BUG_ON(td->cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

  CPU: 0
  Comm: watchdog/130

The problem is that we aren't ensuring the CPU active and online bits are set
before allowing the master to continue on. The master unparks the secondary
CPUs kthreads and the scheduler looks for a CPU to run on. It calls
select_task_rq and realises the suggested CPU is not in the cpus_allowed
mask. It then ends up in select_fallback_rq, and since the active and
online bits aren't set we choose some other CPU to run on.

Cc: stable@vger.kernel.org
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7c5c92ed

08 12月, 2014 1 次提交

powerpc/powernv: Return to cpu offline loop when finished in KVM guest · 56548fc0

由 Paul Mackerras 提交于 12月 03, 2014

When a secondary hardware thread has finished running a KVM guest, we
currently put that thread into nap mode using a nap instruction in
the KVM code.  This changes the code so that instead of doing a nap
instruction directly, we instead cause the call to power7_nap() that
put the thread into nap mode to return.  The reason for doing this is
to avoid having the KVM code having to know what low-power mode to
put the thread into.

In the case of a secondary thread used to run a KVM guest, the thread
will be offline from the point of view of the host kernel, and the
relevant power7_nap() call is the one in pnv_smp_cpu_disable().
In this case we don't want to clear pending IPIs in the offline loop
in that function, since that might cause us to miss the wakeup for
the next time the thread needs to run a guest.  To tell whether or
not to clear the interrupt, we use the SRR1 value returned from
power7_nap(), and check if it indicates an external interrupt.  We
arrange that the return from power7_nap() when we have finished running
a guest returns 0, so pending interrupts don't get flushed in that
case.

Note that it is important a secondary thread that has finished
executing in the guest, or that didn't have a guest to run, should
not return to power7_nap's caller while the kvm_hstate.hwthread_req
flag in the PACA is non-zero, because the return from power7_nap
will reenable the MMU, and the MMU might still be in guest context.
In this situation we spin at low priority in real mode waiting for
hwthread_req to become zero.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

56548fc0

05 12月, 2014 2 次提交

powerpc/book3s: Fix partial invalidation of TLBs in MCE code. · 682e77c8

由 Mahesh Salgaonkar 提交于 12月 05, 2014

The existing MCE code calls flush_tlb hook with IS=0 (single page) resulting
in partial invalidation of TLBs which is not right. This patch fixes
that by passing IS=0xc00 to invalidate whole TLB for successful recovery
from TLB and ERAT errors.

Cc: stable@vger.kernel.org
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

682e77c8

powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault · aefa5688

由 Aneesh Kumar K.V 提交于 12月 04, 2014

upatepp can get called for a nohpte fault when we find from the linux
page table that the translation was hashed before. In that case
we are sure that there is no existing translation, hence we could
avoid doing tlbie.

We could possibly race with a parallel fault filling the TLB. But
that should be ok because updatepp is only ever relaxing permissions.
We also look at linux pte permission bits when filling hash pte
permission bits. We also hold the linux pte busy bits while
inserting/updating a hashpte entry, hence a paralle update of
linux pte is not possible. On the other hand mprotect involves
ptep_modify_prot_start which cause a hpte invalidate and not updatepp.

Performance number:
We use randbox_access_bench written by Anton.

Kernel with THP disabled and smaller hash page table size.

86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp
2.10% random_access_b random_access_bench [.] doit
1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock
1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range
1.18% random_access_b [kernel.kallsyms] [k] .__delay
0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page
0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return
0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm

With Fix:

27.54% random_access_b random_access_bench [.] doit
22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return
5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm
3.31% random_access_b [kernel.kallsyms] [k] data_access_common
1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aefa5688

02 12月, 2014 6 次提交

powerpc: Drop useless warning in eeh_init() · 221195fb

由 Greg Kurz 提交于 11月 25, 2014

This is what we get in dmesg when booting a pseries guest and
the hypervisor doesn't provide EEH support.

[    0.166655] EEH functionality not supported
[    0.166778] eeh_init: Failed to call platform init function (-22)

Since both powernv_eeh_init() and pseries_eeh_init() already complain when
hitting an error, it is not needed to print more (especially such an
uninformative message).
Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

221195fb

powerpc/powernv: Cleanup unused MCE definitions/declarations. · 6d626c5e

由 Mahesh Salgaonkar 提交于 11月 24, 2014

Cleanup OpalMCE_* definitions/declarations and other related code which
is not used anymore.
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: NBenjamin Herrrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

6d626c5e

powerpc/eeh: Dump PHB diag-data early · a450e8f5

由 Gavin Shan 提交于 11月 22, 2014

On PowerNV platform, PHB diag-data is dumped after stopping device
drivers. In case of recursive EEH errors, the kernel is usually
crashed before dumping PHB diag-data for the second EEH error. It's
hard to locate the root cause of the second EEH error without PHB
diag-data.

The patch adds one more EEH option "eeh=early_log", which helps
dumping PHB diag-data immediately once frozen PE is detected, in
order to get the PHB diag-data for the second EEH error.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a450e8f5

powerpc/eeh: Recover EEH error on ownership change for BCM5719 · b1d76a7d

由 Gavin Shan 提交于 11月 14, 2014

In PCI passthrou scenario, we need simulate EEH recovery for Emulex
adapters when their ownership changes, as we did in commit 5cfb20b9
("powerpc/eeh: Emulate EEH recovery for VFIO devices"). Broadcom
BCM5719 adpaters are facing same problem and needs same cure.
Reported-by: NRajeshkumar Subramanian <rajeshkumars@in.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b1d76a7d

powerpc/eeh: Set EEH_PE_RESET on PE reset · 28bf36f9

由 Gavin Shan 提交于 11月 14, 2014

The patch introduces additional flag EEH_PE_RESET to indicate the
corresponding PE is under reset. In turn, the PE retrieval bakcend
on PowerNV platform can return unfrozen state for the EEH core to
moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for
the purpose.

In PCI passthrou case, the problem is more worse: Guest doesn't
recover 6th EEH error. The PE is left in isolated (frozen) and
config blocked state on Broadcom adapters. We can't retrieve the
PE's state correctly any more, even from the host side via sysfs
/sys/bus/pci/devices/xxx/eeh_pe_state.
Reported-by: NRajeshkumar Subramanian <rajeshkumars@in.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

28bf36f9

powerpc/eeh: Refactor eeh_reset_pe() · b85743ee

由 Gavin Shan 提交于 11月 14, 2014

The patch refactors eeh_reset_pe() in order for:

   * Varied return values for different failure cases.
   * Replace pr_err() with pr_warn() and print function name.
   * Coding style cleanup.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b85743ee

27 11月, 2014 2 次提交

powerpc: 32 bit getcpu VDSO function uses 64 bit instructions · 152d44a8

由 Anton Blanchard 提交于 11月 27, 2014

I used some 64 bit instructions when adding the 32 bit getcpu VDSO
function. Fix it.

Fixes: 18ad51dd ("powerpc: Add VDSO version of getcpu")
Cc: stable@vger.kernel.org
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

152d44a8

powerpc/eeh: Fix PE state format · 7531473c

由 Gavin Shan 提交于 11月 25, 2014

Obviously I had wrong format given to the PE state output from
/sys/bus/pci/devices/xxxx/eeh_pe_state with some typoes, which
was introduced by commit 2013add4. The patch fixes it up.

Fixes: 2013add4 ("powerpc/eeh: Show hex prefix for PE state sysfs")
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7531473c

24 11月, 2014 1 次提交

powerpc/pci: Remove unused force_32bit_msi quirk · 31345e1a

由 Benjamin Herrenschmidt 提交于 10月 07, 2014

This is now fully replaced with the generic "no_64bit_msi" one
that is set by the respective drivers directly.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

31345e1a

19 11月, 2014 1 次提交

powerpc: Remove more traces of bootmem · e39f223f

由 Michael Ellerman 提交于 11月 18, 2014

Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.

Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:

  p = (struct foo *)alloc_bootmem(x);

Becomes:

  p = memblock_virt_alloc(x, 0);

We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.

We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.

The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().

Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e39f223f

18 11月, 2014 1 次提交

powerpc/iommu: Rename iommu_[un]map_sg functions · 0690cbd2

由 Joerg Roedel 提交于 11月 05, 2014

The IOMMU-API gained support for a new iommu_map_sg
function. This causes compile failures on powerpc because
the function name is already globally used there.
This patch renames adds a ppc_ prefix to these functions to
solve the compile problem.
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

0690cbd2

17 11月, 2014 2 次提交

rtc/tpo: Driver to support rtc and wakeup on PowerNV platform · 16b1d26e

由 Neelesh Gupta 提交于 10月 14, 2014

The patch implements the OPAL rtc driver that binds with the rtc
driver subsystem. The driver uses the platform device infrastructure
to probe the rtc device and register it to rtc class framework. The
'wakeup' is supported depending upon the property 'has-tpo' present
in the OF node. It provides a way to load the generic rtc driver in
in the absence of an OPAL driver.

The patch also moves the existing OPAL rtc get/set time interfaces to the
new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL.

Test results:
-------------
Host:
[root@tul169p1 ~]# ls -l /sys/class/rtc/
total 0
lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0
[root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time
08:10:07
[root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm
[root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm
1413274345
[root@tul169p1 ~]#

FSP:
$ smgr mfgState
standby
$ rtim timeofday

System time is valid: 2014/10/14 08:12:04.225115

$ smgr mfgState
ipling
$

CC: devicetree@vger.kernel.org
CC: tglx@linutronix.de
CC: rtc-linux@googlegroups.com
CC: a.zummo@towertech.it
Signed-off-by: NNeelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

16b1d26e

powerpc: Use generic PIE randomization · 59994fb0

由 Vineeth Vijayan 提交于 11月 14, 2014

Back in 2009 we merged 501cb16d "Randomise PIEs", which added support for
randomizing PIE (Position Independent Executable) binaries.

That commit added randomize_et_dyn(), which correctly randomized the addresses,
but failed to honor PF_RANDOMIZE. That means it was not possible to disable PIE
randomization via the personality flag, or /proc/sys/kernel/randomize_va_space.

Since then there has been generic support for PIE randomization added to
binfmt_elf.c, selectable via ARCH_BINFMT_ELF_RANDOMIZE_PIE.

Enabling that allows us to drop randomize_et_dyn(), which means we start
honoring PF_RANDOMIZE correctly.

It also causes a fairly major change to how we layout PIE binaries.

Currently we will place the binary at 512MB-520MB for 32 bit binaries, or
512MB-1.5GB for 64 bit binaries, eg:

    $ cat /proc/$$/maps
    4e550000-4e580000 r-xp 00000000 08:02 129813       /bin/dash
    4e580000-4e590000 rw-p 00020000 08:02 129813       /bin/dash
    10014110000-10014140000 rw-p 00000000 00:00 0      [heap]
    3fffaa3f0000-3fffaa5a0000 r-xp 00000000 08:02 921  /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fffaa5a0000-3fffaa5b0000 rw-p 001a0000 08:02 921  /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fffaa5c0000-3fffaa5d0000 rw-p 00000000 00:00 0
    3fffaa5d0000-3fffaa5f0000 r-xp 00000000 00:00 0    [vdso]
    3fffaa5f0000-3fffaa620000 r-xp 00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
    3fffaa620000-3fffaa630000 rw-p 00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
    3ffffc340000-3ffffc370000 rw-p 00000000 00:00 0    [stack]

With this commit applied we don't do any special randomisation for the binary,
and instead rely on mmap randomisation. This means the binary ends up at high
addresses, eg:

    $ cat /proc/$$/maps
    3fff99820000-3fff999d0000 r-xp 00000000 08:02 921    /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fff999d0000-3fff999e0000 rw-p 001a0000 08:02 921    /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fff999f0000-3fff99a00000 rw-p 00000000 00:00 0
    3fff99a00000-3fff99a20000 r-xp 00000000 00:00 0      [vdso]
    3fff99a20000-3fff99a50000 r-xp 00000000 08:02 1246   /lib/powerpc64le-linux-gnu/ld-2.19.so
    3fff99a50000-3fff99a60000 rw-p 00020000 08:02 1246   /lib/powerpc64le-linux-gnu/ld-2.19.so
    3fff99a60000-3fff99a90000 r-xp 00000000 08:02 129813 /bin/dash
    3fff99a90000-3fff99aa0000 rw-p 00020000 08:02 129813 /bin/dash
    3fffc3de0000-3fffc3e10000 rw-p 00000000 00:00 0      [stack]
    3fffc55e0000-3fffc5610000 rw-p 00000000 00:00 0      [heap]

Although this should be OK, it's possible it might break badly written
binaries that make assumptions about the address space layout.
Signed-off-by: NVineeth Vijayan <vvijayan@mvista.com>
[mpe: Rewrite changelog]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

59994fb0

14 11月, 2014 1 次提交

powerpc: Disable CPU_FTR_TM if TM is disabled by firmware · 9e819963

由 Aneesh Kumar K.V 提交于 11月 02, 2014

Firmware is allowed to communicate to us via the "ibm,pa-features" property
that TM (Transactional Memory) support is disabled.

Currently this doesn't happen on any platform we're aware of, but we should
honor it anyway.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9e819963

12 11月, 2014 3 次提交

powerpc: Save/restore PPR for KVM hypercalls · 8b91a255

由 Suresh E. Warrier 提交于 11月 03, 2014

The system call FLIH (first-level interrupt handler) at 0xc00
unconditionally sets hardware priority to medium. For hypercalls, this
means we lose guest OS priority. The front end (do_kvm_0x**) to the
KVM interrupt handler always assumes that PPR priority is saved in
PACA exception save area, so it copies this to the kvm_hstate
structure. For hypercalls, this would be the saved priority from any
previous exception. Eventually, the guest gets resumed with an
incorrect priority.

The fix is to save the PPR priority in PACA exception save area before
switching HMT priorities in the FLIH so that existing code described above
in the KVM interrupt handler can copy it from there into the VCPU's saved
context.
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
[mpe: Dropped HMT_MEDIUM_PPR_DISCARD and reworded comment]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8b91a255

powerpc: Fix bad NULL pointer check in udbg_uart_getc_poll() · cd32e2dc

由 Anton Blanchard 提交于 11月 11, 2014

We have some code in udbg_uart_getc_poll() that tries to protect
against a NULL udbg_uart_in, but gets it all wrong.

Found with the LLVM static analyzer (scan-build).

Fixes: 30925748 ("powerpc: Cleanup udbg_16550 and add support for LPC PIO-only UARTs")
Signed-off-by: NAnton Blanchard <anton@samba.org>
[mpe: Add some newlines for readability while we're here]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cd32e2dc

ftrace: Add more information to ftrace_bug() output · 4fd3279b

由 Steven Rostedt (Red Hat) 提交于 10月 24, 2014

With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.

Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NBorislav Petkov <bp@suse.de>
Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4fd3279b

10 11月, 2014 1 次提交

powerpc: LLVM complains about forward declaration of struct rtas_sensors · ecaf5fa0

由 Anton Blanchard 提交于 10月 31, 2014

Move the declaration up to silence the warning.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ecaf5fa0

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功