Commits · 82eae1afbbdcaf2d716f88025736dc2d6f7afbf0 · openeuler / Kernel

28 Apr, 2017 · 6 commits

powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc · 82eae1af

Committed by Alexey Kardashevskiy on Mar 27, 2017

pnv_pci_table_alloc() ignores a possible failure from kzalloc_node();
this adds a check. There are 2 callers of pnv_pci_table_alloc():
one already checks for tbl != NULL, and this adds a WARN_ON() to the other
path, which only runs at boot time on IODA1 and is not expected to fail.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
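
The check described in 82eae1af can be sketched in user-space C. This is only an illustration under stated assumptions: `demo_table`, `demo_table_alloc()` and `demo_setup()` are hypothetical names, and `calloc()` stands in for `kzalloc_node()`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct iommu_table; only two fields are modelled. */
struct demo_table {
	unsigned long it_offset;
	unsigned long it_size;
};

/* Allocate a zeroed table; calloc() stands in for kzalloc_node(). */
struct demo_table *demo_table_alloc(void)
{
	struct demo_table *tbl = calloc(1, sizeof(*tbl));

	if (!tbl)
		return NULL;	/* report the failure instead of touching tbl */

	return tbl;
}

/* Caller-side check, mirroring the "tbl != NULL" test mentioned above. */
int demo_setup(struct demo_table **out)
{
	struct demo_table *tbl = demo_table_alloc();

	if (!tbl)
		return -1;	/* kernel code would return -ENOMEM or WARN_ON() */

	*out = tbl;
	return 0;
}
```

The point of the sketch is that both the allocator and its caller agree on NULL as the failure signal, so neither path dereferences an unallocated table.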

powerpc/pseries: Implement NMI IPI with H_SIGNAL_SYS_RESET · 102c05e8

Committed by Nicholas Piggin on Dec 20, 2016

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Add struct smp_ops_t.cause_nmi_ipi operation · c64af645

Committed by Nicholas Piggin on Dec 20, 2016

Have the NMI IPI code use this op when the platform defines it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Add NMI IPI infrastructure · ddd703ca

Committed by Nicholas Piggin on Dec 20, 2016

Add a simple NMI IPI system that handles concurrency and reentrancy.

The platform does not have to implement a true non-maskable interrupt,
the default is to simply use the debugger break IPI message. This has
now been co-opted for a general IPI message, and users (debugger and
crash) have been reimplemented on top of the NMI system.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Incorporate incremental fixes from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/cbe: Do not process external or decremeter interrupts from sreset · 6e83985b

Committed by Nicholas Piggin on Mar 20, 2017

Cell will wake from low power state at the system reset interrupt,
with the event encoded in SRR1, rather than waking at the interrupt
vector that corresponds to that event.

The system reset handler for this platform decodes SRR1 event reason
and calls the interrupt handler to process it directly from the system
reset handler.

A subsequent change will treat the system reset interrupt as a Linux NMI
with its own per-CPU stack, and this will no longer work. Remove the
external and decrementer handlers from the system reset handler.

- The external exception remains raised and will fire again at the
  EE interrupt vector when system reset returns.

- The decrementer is set to 1 so it will be raised again and fire when
  the system reset returns.

It is possible to branch to an idle handler from the system reset
interrupt (like POWER does), then restore a normal stack and restore
this optimisation. But simplicity wins for now.
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/pasemi: Do not process external or decrementer interrupts from sreset · 461e96a3

Committed by Nicholas Piggin on Mar 20, 2017

PA Semi will wake from low power state at the system reset interrupt,
with the event encoded in SRR1, rather than waking at the interrupt
vector that corresponds to that event.

The system reset handler for this platform decodes SRR1 event reason
and calls the interrupt handler to process it directly from the system
reset handler.

A subsequent change will treat the system reset interrupt as a Linux NMI
with its own per-CPU stack, and this will no longer work. Remove the
external and decrementer handlers from the system reset handler.

- The external exception remains raised and will fire again at the
  EE interrupt vector when system reset returns.

- The decrementer is set to 1 so it will be raised again and fire when
  the system reset returns.

It is possible to branch to an idle handler from the system reset
interrupt (like POWER does), then restore a normal stack and restore
this optimisation. But simplicity wins for now.
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

26 Apr, 2017 · 2 commits

powerpc/powernv: Fix oops on P9 DD1 in cause_ipi() · 45b21cfe

Committed by Michael Ellerman on Apr 26, 2017

Recently we merged the native xive support for Power9, and then separately some
reworks for doorbell IPI support. In isolation both series were OK, but the
merged result had a bug in one case.

On P9 DD1 we use pnv_p9_dd1_cause_ipi() which tries to use doorbells, and then
falls back to the interrupt controller. However the fallback is implemented by
calling icp_ops->cause_ipi. But now that xive support is merged we might be
using xive, in which case icp_ops is not initialised, it's a xics specific
structure. This leads to an oops such as:

  Unable to handle kernel paging request for data at address 0x00000028
  Oops: Kernel access of bad area, sig: 11 [#1]
  NIP pnv_p9_dd1_cause_ipi+0x74/0xe0
  LR smp_muxed_ipi_message_pass+0x54/0x70

To fix it, rather than using icp_ops which might be NULL, have both xics and
xive set smp_ops->cause_ipi, and then in the powernv code we save that as
ic_cause_ipi before overriding smp_ops->cause_ipi. For paranoia add a WARN_ON()
to check if somehow smp_ops->cause_ipi is NULL.

Fixes: b866cc21 ("powerpc: Change the doorbell IPI calling convention")
Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
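
The save-then-override pattern described in 45b21cfe can be modelled in plain C. This is a hedged sketch, not the kernel code: every `demo_` name is hypothetical, the core size of 4 threads is an assumption, and counters replace the real doorbell and interrupt-controller operations.

```c
#include <stddef.h>

#define DEMO_THREADS_PER_CORE 4	/* assumed core size for the sketch */

typedef void (*demo_cause_ipi_fn)(int cpu);

/* Hypothetical, cut-down stand-in for struct smp_ops_t. */
struct demo_smp_ops {
	demo_cause_ipi_fn cause_ipi;
};

struct demo_smp_ops demo_smp_ops;
demo_cause_ipi_fn demo_ic_cause_ipi;	/* saved interrupt-controller hook */

int demo_this_cpu;		/* CPU that sends the IPI */
int demo_doorbell_ipis;		/* IPIs delivered by doorbell */
int demo_controller_ipis;	/* IPIs delivered via the interrupt controller */

/* Stands in for the hook that xics or xive would install. */
void demo_controller_cause_ipi(int cpu)
{
	(void)cpu;
	demo_controller_ipis++;
}

/* Platform hook: doorbell within the core, otherwise the saved hook. */
void demo_platform_cause_ipi(int cpu)
{
	if (cpu / DEMO_THREADS_PER_CORE == demo_this_cpu / DEMO_THREADS_PER_CORE) {
		demo_doorbell_ipis++;
		return;
	}
	demo_ic_cause_ipi(cpu);
}

/* Save the controller's hook first, then override it with the platform hook. */
int demo_smp_probe(void)
{
	demo_ic_cause_ipi = demo_smp_ops.cause_ipi;
	if (demo_ic_cause_ipi == NULL)
		return -1;	/* the commit adds a WARN_ON() for this case */

	demo_smp_ops.cause_ipi = demo_platform_cause_ipi;
	return 0;
}
```

Because the fallback goes through the saved pointer rather than a controller-specific structure, it works no matter which controller installed the original hook.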

powerpc/powernv: Fix missing attr initialisation in opal_export_attrs() · 83c49190

Committed by Michael Ellerman on Apr 26, 2017

In opal_export_attrs() we dynamically allocate some bin_attributes. They're
allocated with kmalloc() and although we initialise most of the fields, we don't
initialise write() or mmap(), and in particular we don't initialise the lockdep
related fields in the embedded struct attribute.

This leads to a lockdep warning at boot:

  BUG: key c0000000f11906d8 not in .data!
  WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3136 lockdep_init_map+0x28c/0x2a0
  ...
  Call Trace:
    lockdep_init_map+0x288/0x2a0 (unreliable)
    __kernfs_create_file+0x8c/0x170
    sysfs_add_file_mode_ns+0xc8/0x240
    __machine_initcall_powernv_opal_init+0x60c/0x684
    do_one_initcall+0x60/0x1c0
    kernel_init_freeable+0x2f4/0x3d4
    kernel_init+0x24/0x160
    ret_from_kernel_thread+0x5c/0xb0

Fix it by kzalloc'ing the attr, which fixes the uninitialised write() and
mmap(), and calling sysfs_bin_attr_init() on it to initialise the lockdep
fields.

Fixes: 11fe909d ("powerpc/powernv: Add OPAL exports attributes to sysfs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

24 Apr, 2017 · 1 commit

powerpc/pseries: Fix of_node_put() underflow during DLPAR remove · 68baf692

Committed by Tyrel Datwyler on Apr 17, 2017

Historically struct device_node references were tracked using a kref embedded as
a struct field. Commit 75b57ecf ("of: Make device nodes kobjects so they
show up in sysfs") (Mar 2014) refactored device_nodes to be kobjects such that
the device tree could be more simply exposed to userspace using sysfs.

Commit 0829f6d1 ("of: device_node kobject lifecycle fixes") (Mar 2014)
followed up these changes to better control the kobject lifecycle and in
particular the reference counting via of_node_get(), of_node_put(), and
of_node_init().

A result of this second commit was that it introduced an of_node_put() call when
a dynamic node is detached, in of_node_remove(), that removes the initial kobj
reference created by of_node_init().

Traditionally as the original dynamic device node user the pseries code had
assumed responsibility for releasing this final reference in its platform
specific DLPAR detach code.

This patch fixes a refcount underflow introduced by commit 0829f6d1, and
recently exposed by the upstreaming of the refcount API.

Messages like the following are no longer seen in the kernel log with this
patch following DLPAR remove operations of cpus and pci devices.

  rpadlpar_io: slot PHB 72 removed
  refcount_t: underflow; use-after-free.
  ------------[ cut here ]------------
  WARNING: CPU: 5 PID: 3335 at lib/refcount.c:128 refcount_sub_and_test+0xf4/0x110

Fixes: 0829f6d1 ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
[mpe: Make change log commit references more verbose]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
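
The double put described in 68baf692 can be reproduced with a toy reference counter. This is a sketch under assumptions: the `demo_` names are hypothetical, and the `underflow` flag stands in for the `refcount_t: underflow` warning quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device node with an embedded reference count. */
struct demo_node {
	unsigned int refcount;
	bool underflow;		/* set where the kernel would warn */
};

/* Like of_node_init(): the node starts with one initial reference. */
void demo_node_init(struct demo_node *n)
{
	n->refcount = 1;
	n->underflow = false;
}

/* Drop one reference, flagging an attempt to go below zero. */
void demo_node_put(struct demo_node *n)
{
	if (n->refcount == 0) {
		n->underflow = true;
		return;
	}
	n->refcount--;
}

/* Generic detach path: already drops the initial reference. */
void demo_generic_detach(struct demo_node *n)
{
	demo_node_put(n);
}

/* Platform remove path; extra_put models the old pseries behaviour. */
void demo_platform_remove(struct demo_node *n, bool extra_put)
{
	demo_generic_detach(n);
	if (extra_put)
		demo_node_put(n);	/* second put of the same reference */
}
```

With `extra_put` set the counter is decremented twice for a single initial reference, which is the underflow the patch removes.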

23 Apr, 2017 · 1 commit

powerpc/64s: Stop using bit in HSPRG0 to test winkle · 544686ca

Committed by Nicholas Piggin on Apr 19, 2017

The POWER8 idle code has a neat trick of programming the power on engine
to restore a low bit into HSPRG0, so idle wakeup code can test and see
if it has been programmed this way and therefore lost all state. Restore
time can be reduced if winkle has not been reached.

However this messes with our r13 PACA pointer, and requires HSPRG0 to be
written to. It also optimizes the slowest and most uncommon case at the
expense of another SPR write in the common nap state wakeup.

Remove this complexity and assume winkle sleeps always require a state
restore. This speedup could be made entirely contained within the winkle
idle code by counting per-core winkles and setting a thread bitmap when
all have gone to winkle.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

19 Apr, 2017 · 1 commit

powerpc/64s: Remove ICSWX feature from Power9 · 2384d2d7

Committed by Nicholas Piggin on Apr 19, 2017

Power9 does not implement the icswx instruction. This CPU feature is not visible
to userspace and is only used in the CONFIG_PPC_ICSWX code, which is generally
not enabled, and can only be triggered by other code using icswx, which should
not happen on Power9 systems in the first place. So impact should be minimal.

Fixes: c3ab300e ("powerpc: Add POWER9 cputable entry")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

13 Apr, 2017 · 5 commits

powerpc/pseries: Always enable SMP when building pseries · 270e2dc9

Committed by Michael Ellerman on Apr 05, 2017

The pseries platform supports Power4 and later CPUs, all of which are
multithreaded and/or multicore.

In practice no one ever builds a SMP=n kernel for these machines. So as
we did for powernv, have the pseries platform imply SMP=y.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv: Always enable SMP when building powernv · 40e27565

Committed by Michael Ellerman on Apr 05, 2017

The powernv platform supports Power7 and later CPUs, all of which are
multithreaded and multicore.

As such we never build a SMP=n kernel for those machines, other than
possibly for debugging or running in a simulator.

In the debugging case we can get a similar effect by booting with
nr_cpus=1, or there's always the option of building a custom kernel with
SMP hacked out.

For running in simulators the code size reduction from building without
SMP is not particularly important, what matters is the number of
instructions executed. A quick test shows that a SMP=y kernel takes ~6%
more instructions to boot to a shell. Booting with nr_cpus=1 recovers
about half that deficit.

On the flip side, keeping the SMP=n kernel building can be a pain at
times. And although we've mostly kept it building in recent years, no
one is regularly testing that the SMP=n kernel actually boots and works
well on these machines.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Allow platforms to force-enable CONFIG_SMP · ebbe9d7d

Committed by Michael Ellerman on Apr 05, 2017

Of the 64-bit Book3S platforms, only powermac supports booting on an
actual non-SMP system. The other platforms can be built with SMP
disabled, but it doesn't make a lot of sense given the CPUs they support
are all multicore or multithreaded.

So give platforms the option of forcing SMP=y.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv: POWER9 support for msgsnd/doorbell IPI · 6b3edefe

Committed by Nicholas Piggin on Apr 13, 2017

POWER9 requires msgsync for receiver-side synchronization, and a DD1
workaround restricts IPIs to core-local.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop no longer needed asm feature macro changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Change the doorbell IPI calling convention · b866cc21

Committed by Nicholas Piggin on Apr 13, 2017

Change the doorbell callers to know about their msgsnd addressing,
rather than have them set a per-cpu target data tag at boot that gets
sent to the cause_ipi functions. The data is only used for doorbell IPI
functions, no other IPI types, so it makes sense to keep that detail
local to doorbell.

Have the platform code understand doorbell IPIs, rather than the
interrupt controller code understand them. Platform code can look at
capabilities it has available and decide which to use.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

11 Apr, 2017 · 6 commits

powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f

Committed by Gautham R. Shenoy on Mar 22, 2017

POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
from stop 0,1,2 with ESL=1 can end up being misplaced in the core. Thus
the HSPRG0 of a thread waking up from stop can contain the paca pointer
of its sibling.

This patch implements a context recovery framework within threads of a
core, by provisioning space in paca_struct for saving every sibling
thread's paca pointer. Basically, we should be able to arrive at the
right paca pointer from any of the threads' existing paca pointers.

At bootup, during powernv idle-init, we save the paca address of every
CPU in the paca_struct of each of its siblings, in the slot
corresponding to this CPU's index in the core.

On wakeup from a stop, the thread will determine its index in the core
from the TIR register and recover its PACA pointer by indexing into
the correct slot in the provisioned space in the current PACA.

Furthermore, ensure that the NVGPRs are restored from the stack on the
way out by setting the NAPSTATELOST in paca.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Call it a bug]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
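
The recovery scheme in 17ed4c8f boils down to a per-core lookup table stored in every thread's PACA. The sketch below models it in user-space C; the `demo_` names, the field name and the 4-threads-per-core layout are assumptions for illustration, and the `tir` argument stands in for the value read from the TIR register.

```c
#include <stddef.h>

#define DEMO_THREADS_PER_CORE 4	/* assumed core size for the sketch */

/* Hypothetical, cut-down stand-in for paca_struct. */
struct demo_paca {
	int cpu;
	/* One slot per thread of the core, indexed by thread index. */
	struct demo_paca *thread_sibling_pacas[DEMO_THREADS_PER_CORE];
};

/* Boot-time step: record every sibling's PACA address in each PACA. */
void demo_idle_init(struct demo_paca *pacas, int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int first = cpu - (cpu % DEMO_THREADS_PER_CORE);

		for (int i = 0; i < DEMO_THREADS_PER_CORE; i++) {
			int sibling = first + i;

			pacas[cpu].thread_sibling_pacas[i] =
				sibling < nr_cpus ? &pacas[sibling] : NULL;
		}
	}
}

/*
 * Wakeup step: the PACA we hold may belong to a sibling, but every PACA in
 * the core carries the same table, so indexing by our own thread index
 * always yields our own PACA.
 */
struct demo_paca *demo_recover_paca(struct demo_paca *maybe_wrong, int tir)
{
	return maybe_wrong->thread_sibling_pacas[tir];
}
```

The design relies on the misplaced pointer still pointing somewhere inside the same core, which is what the commit text states about the DD1 bug.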

powerpc/powernv/idle: Don't override default/deepest directly in kernel · f3b3f284

Committed by Gautham R. Shenoy on Mar 22, 2017

Currently during idle-init on power9, if we don't find suitable stop
states in the device tree that can be used as the
default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default
stop state psscr to be used by power9_idle and deepest stop state
which is used by CPU-Hotplug.

However, if the platform firmware has not configured or enabled a stop
state, the kernel should not make any assumptions or fall back to a
default choice.

If the kernel uses a stop state that is not configured by the platform
firmware, it may lead to further failures which should be avoided.

In this patch, we modify the init code to ensure that the kernel uses
only the stop states exposed by the firmware through the device
tree. When a suitable default stop state isn't found, we disable
ppc_md.power_save for power9. Similarly, when a suitable
deepest_stop_state is not found in the device tree exported by the
firmware, fall back to the default busy-wait loop in the CPU-Hotplug
code.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv/smp: Add busy-wait loop as fall back for CPU-Hotplug · 90061231

Committed by Gautham R. Shenoy on Mar 22, 2017

Currently, the powernv cpu-offline function assumes that platform idle
states such as stop on POWER9, winkle/sleep/nap on POWER8 are always
available. On POWER8, it picks nap as the default state if other deep
idle states like sleep/winkle are not available and enabled in the
platform.

On POWER9, nap is not available and all idle states are managed by the
STOP instruction. The parameters to the idle state are passed through
the processor stop status control register (PSSCR). Hence executing
STOP takes its parameters from the current PSSCR. We do not want to
make any assumptions in the kernel about which STOP states and PSSCR
features are configured by the platform.

Ideally the platform will configure a good set of stop states that can
be used in the kernel. We would like to start with a clean slate if the
platform chooses not to configure any state, or if an error in platform
firmware leads to no stop states being configured or allowed to be
requested.

This patch adds a fallback method for CPU-Hotplug that is similar to
snooze loop at idle where the threads are left to spin at low priority
and hence reduce the cycles consumed.

This is a safe fallback mechanism for the case where no stop state can
be requested because the platform firmware did not configure any, most
likely due to an error condition.

Requesting a stop state when the platform has not configured them or
enabled them would lead to further error conditions which could be
difficult to debug.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
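
The selection logic described in 90061231 (use the deepest state that firmware exposes, otherwise spin) might be sketched as follows. This is an illustration only: the `demo_` types and the `level`/`enabled` fields are hypothetical and do not reflect how the device tree encodes stop states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one stop state advertised by firmware. */
struct demo_stop_state {
	unsigned int level;	/* deeper states have larger levels */
	bool enabled;
};

enum demo_offline_method {
	DEMO_OFFLINE_STOP,	/* request a firmware-provided stop state */
	DEMO_OFFLINE_BUSY_WAIT,	/* spin at low priority, snooze-style */
};

/* Return the deepest enabled state, or NULL if firmware exposed none. */
const struct demo_stop_state *
demo_deepest_state(const struct demo_stop_state *states, size_t n)
{
	const struct demo_stop_state *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!states[i].enabled)
			continue;
		if (best == NULL || states[i].level > best->level)
			best = &states[i];
	}
	return best;
}

/* Never invent a state: without a usable one, fall back to busy-waiting. */
enum demo_offline_method
demo_pick_offline_method(const struct demo_stop_state *states, size_t n)
{
	return demo_deepest_state(states, n) ? DEMO_OFFLINE_STOP
					     : DEMO_OFFLINE_BUSY_WAIT;
}
```

The busy-wait branch is the only path taken when the list is empty, so the sketch never requests a state that firmware did not advertise.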

powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c · a7cd88da

Committed by Gautham R. Shenoy on Mar 22, 2017

Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
transitions the CPU to the deepest available platform idle state to a
new function named pnv_cpu_offline() in powernv/idle.c. The rationale
behind this code movement is that the data required to determine the
deepest available platform state resides in powernv/idle.c.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there · 7644d581

Committed by Michael Ellerman on Feb 10, 2017

powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.

Currently it sits in asm/debug.h, along with some other things that
have "debug" in the name, but are otherwise unrelated.

Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv: Require MMU_NOTIFIER to fix NPU build · abfe8026

Committed by Alistair Popple on Apr 10, 2017

In the recent commit 1ab66d1f ("powerpc/powernv: Introduce address
translation services for Nvlink2") the NPU code gained a dependency on MMU
notifiers.

All our defconfigs have KVM enabled, which selects MMU_NOTIFIER, but if KVM is
not enabled then the build breaks.

Fix it by always selecting MMU_NOTIFIER when we're building powernv.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

10 Apr, 2017 · 2 commits

powerpc: Consolidate variants of real-mode MMIOs · d381d7ca

Committed by Benjamin Herrenschmidt on Apr 05, 2017

We have all sorts of variants of MMIO accessors for the real mode
instructions. This creates a clean set of accessors based on
Linux normal naming conventions, replacing all occurrences of
the old ones in the tree.

I have purposefully removed the "out/in" variants in favor of
only including __raw variants. Any code using these is already
pretty much hand tuned to operate in a very specific environment.
I've fixed up the 2 users (only one of them actually needed
a barrier in the first place).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/xive: Native exploitation of the XIVE interrupt controller · 243e2511

Committed by Benjamin Herrenschmidt on Apr 05, 2017

The XIVE interrupt controller is the new interrupt controller
found in POWER9. It supports advanced virtualization capabilities
among other things.

Currently we use a set of firmware calls that simulate the old
"XICS" interrupt controller but this is fairly inefficient.

This adds the framework for using XIVE along with a native
backend which uses OPAL for configuration. Later, a backend allowing
the use in a KVM or PowerVM guest will also be provided.

This disables some fast path for interrupts in KVM when XIVE is
enabled as these rely on the firmware emulation code which is no
longer available when the XIVE is used natively by Linux.

A later patch will make KVM also directly exploit the XIVE, thus
recovering the lost performance (and more).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings,
 tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV,
 fix build errors when SMP=n, fold in fixes from Ben:
   Don't call cpu_online() on an invalid CPU number
   Fix irq target selection returning out of bounds cpu#
   Extra sanity checks on cpu numbers
 ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

07 Apr, 2017 · 1 commit

powerpc/smp: Remove migrate_irq() custom implementation · a978e139

Committed by Benjamin Herrenschmidt on Apr 05, 2017

Some powerpc platforms use this to move IRQs away from a CPU being
unplugged. This function has several bugs such as not taking the right
locks or failing to NULL check pointers.

There's a new generic function doing exactly the same thing without all
the bugs, so let's use it instead.

mpe: The obvious place for the select of GENERIC_IRQ_MIGRATION is on
HOTPLUG_CPU, but that doesn't work. On some configs PM_SLEEP_SMP will
select HOTPLUG_CPU even though its dependencies are not met, which means
the select of GENERIC_IRQ_MIGRATION doesn't happen. That leads to the
build breaking. Fix it by moving the select of GENERIC_IRQ_MIGRATION to
SMP.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

06 Apr, 2017 · 1 commit

powerpc/powernv: Add XIVE related definitions to opal-api.h · eeea1a43

Committed by Benjamin Herrenschmidt on Apr 06, 2017

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

04 Apr, 2017 · 2 commits

powerpc/powernv: Add OPAL exports attributes to sysfs · 11fe909d

Committed by Matt Brown on Mar 30, 2017

New versions of OPAL have a device node /ibm,opal/firmware/exports, each
property of which describes a range of memory in OPAL that Linux might
want to export to userspace for debugging.

This patch adds a sysfs file under 'opal/exports' for each property
found there, and makes it read-only by root.
Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Drop counting of props, rename to attr, free on sysfs error, c'log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f

Committed by Alistair Popple on Apr 03, 2017

Nvlink2 supports address translation services (ATS) allowing devices
to request address translations from an mmu known as the nest MMU
which is setup to walk the CPU page tables.

To access this functionality certain firmware calls are required to
setup and manage hardware context tables in the nvlink processing unit
(NPU). The NPU also manages forwarding of TLB invalidates (known as
address translation shootdowns/ATSDs) to attached devices.

This patch exports several methods to allow device drivers to register
a process id (PASID/PID) in the hardware tables and to receive
notification of when a device should stop issuing address translation
requests (ATRs). It also adds a fault handler to allow device drivers
to demand fault pages in.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Fix up comment formatting, use flush_tlb_mm()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

03 Apr, 2017 · 2 commits

powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev · 4c3b89ef

Committed by Alistair Popple on Apr 03, 2017

The pnv_pci_get_{gpu|npu}_dev functions are used to find associations
between nvlink PCIe devices and standard PCIe devices. However they
lacked basic sanity checking, which results in NULL pointer
dereferencing if they are called incorrectly; this can be harder to
spot than an explicit WARN_ON.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/book3s: Print task info if we take a machine check in user mode · 63f44d65

Committed by Michael Ellerman on Apr 03, 2017

For an MCE (Machine Check Exception) that hits while in user mode
MSR(PR=1), print the task info to the console MCE error log. This may
help to identify an application that triggered the MCE.

After this patch the MCE console looks like:

  Severe Machine check interrupt [Recovered]
    NIP: [0000000010039778] PID: 762 Comm: ebizzy
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: 0000000010039778

  Severe Machine check interrupt [Not recovered]
    NIP: [0000000010039778] PID: 763 Comm: ebizzy
    Initiator: CPU
    Error type: UE [Page table walk ifetch]
      Effective address: 0000000010039778
  ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

01 Apr, 2017 · 1 commit

powerpc/pseries: Skip using reserved virtual address range · 82228e36

Committed by Aneesh Kumar K.V on Mar 22, 2017

Now that we use all the available virtual address range, we need to make
sure we don't generate VSID such that it overlaps with the reserved vsid
range. The reserved VSID range includes the virtual address range used by the
adjunct partition and also the VRMA virtual segment. We find the context
value that can result in generating such a VSID and reserve it early in
boot.

We don't look at the adjunct range, because for now we disable the
adjunct usage in a Linux LPAR via CAS interface.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rewrite hash__reserve_context_id(), move the rest into pseries]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

31 Mar, 2017 · 4 commits

powerpc/mm: Add translation mode information in /proc/cpuinfo · 3a4c2601

Committed by Aneesh Kumar K.V on Mar 21, 2017

With this we have on powernv and pseries /proc/cpuinfo reporting

timebase        : 512000000
platform        : PowerNV
model           : 8247-22L
machine         : PowerNV 8247-22L
firmware        : OPAL
MMU		: Hash
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/mm/nohash: MM_SLICE is only used by book3s 64 · b42279f0

Committed by Aneesh Kumar K.V on Mar 21, 2017

BOOKE code is dead code as per the Kconfig details. So make it simpler
by enabling MM_SLICE only for book3s_64. The changes w.r.t nohash is just
removing deadcode. W.r.t ppc64, 4k without hugetlb will now enable MM_SLICE.
But that is good, because we reduce one extra variant which probably is not
getting tested much.
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/4xx: Make sam440ep_setup_rtc() init · adec9a2e

Committed by Yang Shi on Apr 26, 2016

sam440ep_setup_rtc() is just called by machine_device_initcall() so make
it __init.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/powernv: Handle OPAL_WRONG_STATE in opal_get_sensor_data() · 17bb6951

Committed by Vipin K Parashar on Mar 10, 2017

OPAL returns OPAL_WRONG_STATE upon failing to provide sensor data due to
core sleeping/offline. Add a check in opal_get_sensor_data() for sensor
read failure with OPAL_WRONG_STATE return code and return -EIO.
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
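
The return-code mapping described in 17bb6951 can be sketched as a small translation function. The numeric value used for `DEMO_OPAL_WRONG_STATE` and the default branch are assumptions made for this illustration; the real constants live in opal-api.h and the real function handles other codes through its own helper.

```c
#include <errno.h>

/* Assumed values for illustration; see opal-api.h for the real definitions. */
#define DEMO_OPAL_SUCCESS	0
#define DEMO_OPAL_WRONG_STATE	(-14)

/* Translate a firmware return code from a sensor read into an errno. */
int demo_sensor_rc_to_errno(long opal_rc)
{
	switch (opal_rc) {
	case DEMO_OPAL_SUCCESS:
		return 0;
	case DEMO_OPAL_WRONG_STATE:
		/* Core sleeping or offline: the sensor cannot be read now. */
		return -EIO;
	default:
		/* Placeholder for the generic error translation. */
		return -EINVAL;
	}
}
```

Returning -EIO gives callers a distinct, expected error for an offline core instead of an unrecognised firmware code.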

30 Mar, 2017 · 3 commits

powerpc/vfio_spapr_tce: Add reference counting to iommu_table · e5afdf9d

Committed by Alexey Kardashevskiy on Mar 22, 2017

So far iommu_table objects were only used in virtual mode and had
a single owner. We are going to change this by implementing in-kernel
acceleration of DMA mapping requests. The proposed acceleration
will handle requests in real mode and KVM will keep references to tables.

This adds a kref to iommu_table and defines new helpers to update it.
This replaces iommu_free_table() with iommu_tce_table_put() and makes
iommu_free_table() static. iommu_tce_table_get() is not used in this patch
but it will be in the following patch.

Since this touches prototypes, this also removes @node_name parameter as
it has never been really useful on powernv and carrying it for
the pseries platform code to iommu_free_table() seems to be quite
useless as well.

This should cause no behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal · 11edf116

Committed by Alexey Kardashevskiy on Mar 22, 2017

At the moment iommu_table can be disposed by either calling
iommu_table_free() directly or it_ops::free(); the only implementation
of free() is in IODA2 - pnv_ioda2_table_free() - and it calls
iommu_table_free() anyway.

As we are going to have reference counting on tables, we need a unified
way of disposing tables.

This moves it_ops::free() call into iommu_free_table() and makes use
of the latter. The free() callback now handles only platform-specific
data.

As from now on the iommu_free_table() calls it_ops->free(), we need
to have it_ops initialized before calling iommu_free_table() so this
moves this initialization in pnv_pci_ioda2_create_table().

This should cause no behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
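
The unified disposal path described in 11edf116 can be modelled as one common free function that first calls an optional platform callback. This is a sketch under assumptions: the `demo_` names are hypothetical and two flags replace the real release of platform data and common table state.

```c
#include <stddef.h>

struct demo_table;

/* Hypothetical, cut-down stand-in for struct iommu_table_ops. */
struct demo_table_ops {
	void (*free)(struct demo_table *tbl);	/* platform-specific data only */
};

struct demo_table {
	const struct demo_table_ops *it_ops;
	int platform_released;	/* set by the platform callback */
	int common_released;	/* set by the common disposal code */
};

/* Platform callback: handles only what the platform allocated. */
void demo_ioda2_free(struct demo_table *tbl)
{
	tbl->platform_released = 1;
}

const struct demo_table_ops demo_ioda2_ops = { .free = demo_ioda2_free };

/* Single disposal entry point: platform callback first, then common state. */
void demo_free_table(struct demo_table *tbl)
{
	if (!tbl)
		return;

	/* it_ops must be set before this call for the callback to run. */
	if (tbl->it_ops && tbl->it_ops->free)
		tbl->it_ops->free(tbl);

	tbl->common_released = 1;
}
```

The comment in `demo_free_table()` mirrors the ordering requirement in the commit text: the ops pointer has to be initialised before the table can be disposed of through the common path.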

powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange() · a540aa56

Committed by Alexey Kardashevskiy on Mar 22, 2017

In real mode, TCE tables are invalidated using special
cache-inhibited store instructions which are not available in
virtual mode.

This defines and implements exchange_rm() callback. This does not
define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
exchange/exchange_rm are only to be used by KVM for VFIO.

The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.

This replaces list_for_each_entry_rcu with its lockless version as
from now on pnv_pci_ioda2_tce_invalidate() can be called in
the real mode too.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

28 Mar, 2017 · 2 commits

powerpc/powernv: Fix XSCOM address mangling for form 1 indirect · 517c2757

Committed by Michael Neuling on Mar 24, 2017

POWER9 adds form 1 scoms. The form of the indirection is specified in
the top nibble of the scom address.

Currently we do some (ugly) bit mangling so that we can fit a 64 bit
scom address into the debugfs interface. The current code only shifts
the top bit (indirect bit).

This patch changes it to shift the whole top nibble so that the form
of the indirection is also shifted.

This patch is backwards compatible with older scoms.

(This change isn't required in the arch/powerpc/platforms/powernv/opal-prd.c
scom interface as it passes the whole 64bit scom address without any bit
mangling)
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
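
The difference between shifting one bit and shifting the whole nibble, as described in 517c2757, can be shown with two small functions. This is a sketch under assumptions: it supposes the debugfs interface carries the indirection nibble one nibble below the top of the 64-bit address, and both `demo_` functions are hypothetical reconstructions rather than the kernel's code.

```c
#include <stdint.h>

/*
 * Assumed old behaviour: only the indirect bit is moved to the top bit,
 * so any form bits next to it stay where they were.
 */
uint64_t demo_scom_unmangle_old(uint64_t addr)
{
	const uint64_t indirect = 1ULL << 59;

	if (addr & indirect)
		addr = (addr & ~indirect) | (1ULL << 63);
	return addr;
}

/* New behaviour: move the whole nibble, carrying the form along with it. */
uint64_t demo_scom_unmangle(uint64_t addr)
{
	uint64_t nibble = addr & 0x0f00000000000000ULL;

	addr &= 0xf0ffffffffffffffULL;	/* clear the mangled position */
	return addr | (nibble << 4);	/* place it in the top nibble */
}
```

For an address with only the indirect bit set the two functions agree, which matches the commit's statement that the change is backwards compatible with older scoms; they differ once a form bit is also present.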

powerpc/powernv: de-deuplicate OPAL call wrappers · c3a08e93

Committed by Oliver O'Halloran on Mar 23, 2017

Currently the code to perform an OPAL call is duplicated between the
normal path and path taken when tracepoints are enabled. There's no
real need for this and combining them makes opal_tracepoint_entry
considerably easier to understand.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
