提交 · b9fde58db7e5738cacb740b0ec547933fe314fbe · openeuler / Kernel

26 9月, 2017 1 次提交

powerpc/powernv: Rework EEH initialization on powernv · b9fde58d

由 Benjamin Herrenschmidt 提交于 9月 07, 2017

Remove the post_init callback which is only used
by powernv, we can just call it explicitly from
the powernv code.

This partially kills the ability to "disable" eeh at
runtime via debugfs as this was calling that same
callback again, but this is both unused and broken
in several ways. If we want to revive it, we need
to create a dedicated enable/disable callback on the
backend that does the right thing.

Let the bulk of eeh initialize normally at
core_initcall() like it does on pseries by removing
the hack in eeh_init() that delays it.

Instead we make sure our eeh->probe cleanly bails
out of the PEs haven't been created yet and we force
a re-probe where we used to call eeh_init() again.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b9fde58d

21 9月, 2017 2 次提交

powerpc/pseries: Fix parent_dn reference leak in add_dt_node() · b537ca6f

由 Tyrel Datwyler 提交于 9月 20, 2017

A reference to the parent device node is held by add_dt_node() for the
node to be added. If the call to dlpar_configure_connector() fails
add_dt_node() returns ENOENT and that reference is not freed.

Add a call to of_node_put(parent_dn) prior to bailing out after a
failed dlpar_configure_connector() call.

Fixes: 8d5ff320 ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b537ca6f

powerpc/pseries: Fix "OF: ERROR: Bad of_node_put() on /cpus" during DLPAR · 087ff6a5

由 Tyrel Datwyler 提交于 9月 20, 2017

Commit 215ee763 ("powerpc: pseries: remove dlpar_attach_node
dependency on full path") reworked dlpar_attach_node() to no longer
look up the parent node "/cpus", but instead to have the parent node
passed by the caller in the function parameter list.

As a result dlpar_attach_node() is no longer responsible for freeing
the reference to the parent node. However, commit 215ee763 failed
to remove the of_node_put(parent) call in dlpar_attach_node(), or to
take into account that the reference to the parent in the caller
dlpar_cpu_add() needs to be held until after dlpar_attach_node()
returns.

As a result doing repeated cpu add/remove dlpar operations will
eventually result in the following error:

  OF: ERROR: Bad of_node_put() on /cpus
  CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1
  Call Trace:
   dump_stack+0x15c/0x1f8 (unreliable)
   of_node_release+0x1a4/0x1c0
   kobject_put+0x1a8/0x310
   kobject_del+0xbc/0xf0
   __of_detach_node_sysfs+0x144/0x210
   of_detach_node+0xf0/0x180
   dlpar_detach_node+0xc4/0x120
   dlpar_cpu_remove+0x280/0x560
   dlpar_cpu_release+0xbc/0x1b0
   arch_cpu_release+0x6c/0xb0
   cpu_release_store+0xa0/0x100
   dev_attr_store+0x68/0xa0
   sysfs_kf_write+0xa8/0xf0
   kernfs_fop_write+0x2cc/0x400
   __vfs_write+0x5c/0x340
   vfs_write+0x1a8/0x3d0
   SyS_write+0xa8/0x1a0
   system_call+0x58/0x6c

Fix the issue by removing the of_node_put(parent) call from
dlpar_attach_node(), and ensuring that the reference to the parent
node is properly held and released by the caller dlpar_cpu_add().

Fixes: 215ee763 ("powerpc: pseries: remove dlpar_attach_node dependency on full path")
Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
[mpe: Add a comment in the code and frob the change log slightly]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

087ff6a5

20 9月, 2017 1 次提交

powerpc/powernv: Clear LPCR[PECE1] via stop-api only for deep state offline · 5d298baa

由 Gautham R. Shenoy 提交于 8月 31, 2017

Commit 24be85a2 ("powerpc/powernv: Clear PECE1 in LPCR via
stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
an offlined CPU which is in a deep stop state.

In the case where the stop-api support is found to be lacking, the
commit 785a12af ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
states when stop-api fails") disables deep states that lose hypervisor
context. Thus in this case, the offlined CPU will be put to some
shallow idle state.

However, we currently unconditionally clear the PECE1 in LPCR via
stop-api during CPU-Hotplug even when deep states are disabled due to
stop-api failure.

Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
*only* when the offlined CPU will be put to a deep state that loses
hypervisor context.

Fixes: 24be85a2 ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
Reported-by: NPavithra Prakash <pavirampu@linux.vnet.ibm.com>
Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
Tested-by: NPavithra Prakash <pavrampu@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5d298baa

14 9月, 2017 1 次提交

mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4

由 Michal Hocko 提交于 9月 13, 2017

GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Acked-by: NMel Gorman <mgorman@suse.de>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ee931c4

02 9月, 2017 1 次提交

powerpc/xive: guest exploitation of the XIVE interrupt controller · eac1e731

由 Cédric Le Goater 提交于 8月 30, 2017

This is the framework for using XIVE in a PowerVM guest. The support
is very similar to the native one in a much simpler form.

Each source is associated with an Event State Buffer (ESB). This is a
two bit state machine which is used to trigger events. The bits are
named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
The Guest OS registers event (or notifications) queues on which the HW
will post event data for a target to notify.

Instead of OPAL calls, a set of Hypervisors call are used to configure
the interrupt sources and the event/notification queues of the guest:

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines to which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification config associated with the
   queue, only unconditional notification for the moment.  Reset is
   performed with a queue size of 0 and queueing is disabled in that
   case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the partition's interrupt exploitation structures to
   their initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure sure all
   notifications have reached their queue.

As for XICS, the XIVE interface for the guest is described in the
device tree under the "interrupt-controller" node. A couple of new
properties are specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA), also called rings, for the User level and
   for the Guest OS level. Only the Guest OS level is taken into
   account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the interrupt numbers ranges assigned to the guest. These are
   allocated using a simple bitmap.

and also :

 - "/ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use.

Tested with a QEMU XIVE model for pseries and with the Power hypervisor.
Signed-off-by: NCédric Le Goater <clg@kaod.org>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

eac1e731

01 9月, 2017 2 次提交

powerpc/powernv/npu: Move tlb flush before launching ATSD · bab9f954

由 Alistair Popple 提交于 8月 11, 2017

The nest MMU tlb flush needs to happen before the GPU translation
shootdown is launched to avoid the GPU refilling its tlb with stale
nmmu translations prior to the nmmu flush completing.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bab9f954

powerpc/powernv: update to new mmu_notifier semantic · d1d5762e

由 Jérôme Glisse 提交于 8月 31, 2017

Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and now are bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.
Signed-off-by: NJérôme Glisse <jglisse@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d1d5762e

31 8月, 2017 24 次提交

powerpc/pseries: Don't attempt to acquire drc during memory hot add for assigned lmbs · afb5519f

由 John Allen 提交于 8月 23, 2017

Check if an LMB is assigned before attempting to call dlpar_acquire_drc
in order to avoid any unnecessary rtas calls. This substantially
reduces the running time of memory hot add on lpars with large amounts
of memory.

[mpe: We need to explicitly set rc to 0 in the success case, otherwise
 the compiler might think we use rc without initialising it.]

Fixes: c21f515c ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: NJohn Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

afb5519f

powerpc/4xx: Constify cpm_suspend_ops · 7def9a24

由 Arvind Yadav 提交于 8月 30, 2017

struct platform_suspend_ops are not supposed to change at runtime.
Functions suspend_set_ops working with const platform_suspend_ops. So
mark the non-const structs as const.
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7def9a24

powerpc/asm: Convert .llong directives to .8byte · eb039161

由 Tobin C. Harding 提交于 3月 09, 2017

.llong is an undocumented PPC specific directive. The generic
equivalent is .quad, but even better (because it's self describing) is
.8byte.

Convert all .llong directives to .8byte.
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

eb039161

powerpc: Squash lines for simple wrapper functions · 7f2462ac

由 Masahiro Yamada 提交于 9月 06, 2016

Remove unneeded variables and assignments.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7f2462ac

powerpc/powernv/vas: Define copy/paste interfaces · 2392c8c8

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define interfaces (wrappers) to the 'copy' and 'paste'
instructions (which are new in PowerISA 3.0). These are intended to be
used to by NX driver(s) to submit Coprocessor Request Blocks (CRBs) to
the NX hardware engines.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2392c8c8

powerpc/powernv/vas: Define vas_tx_win_open() · 5239af67

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define an interface to open a VAS send window. This interface is
intended to be used the Nest Accelerator (NX) driver(s) to open
a send window and use it to submit compression/encryption requests
to a VAS receive window.

The receive window, identified by the [vasid, cop] parameters, must
already be open in VAS (i.e connected to an NX engine).
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5239af67

powerpc/powernv/vas: Define vas_win_close() interface · 98271d41

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define the vas_win_close() interface which should be used to close a
send or receive windows.

While the hardware configurations required to open send and receive
windows differ, the configuration to close a window is the same for
both. So we use a single interface to close the window.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

98271d41

powerpc/powernv/vas: Define vas_rx_win_open() interface · 62c4eda4

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define the vas_rx_win_open() interface. This interface is intended to
be used by the Nest Accelerator (NX) driver(s) to setup receive
windows for one or more NX engines (which implement compression &
encryption algorithms in the hardware).

Follow-on patches will provide an interface to close the window and to
open a send window that kernel subsystems can use to access the NX
engines.

The interface to open a receive window is expected to be invoked for
each instance of VAS in the system.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

62c4eda4

powerpc/powernv/vas: Define helpers to alloc/free windows · bbfe59f8

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define helpers to allocate/free VAS window objects. These will be used
in follow-on patches when opening/closing windows.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bbfe59f8

powerpc/powernv/vas: Define helpers to init window context · b25b33ac

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define helpers to initialize window context registers of the VAS
hardware. These will be used in follow-on patches when opening/closing
VAS windows.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b25b33ac

powerpc/powernv/vas: Define helpers to access MMIO regions · 180fe15a

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define some helper functions to access the MMIO regions. We use these
in follow-on patches to read/write VAS hardware registers. They are
also used to later issue 'paste' instructions to submit requests to
the NX hardware engines.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

180fe15a

powerpc/powernv/vas: Define vas_init() and vas_exit() · 4dea2d1a

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Implement vas_init() and vas_exit() functions for a new VAS module.
This VAS module is essentially a library for other device drivers
and kernel users of the NX coprocessors like NX-842 and NX-GZIP.
In the future this will be extended to add support for user space
to access the NX coprocessors.

VAS is currently only supported with 64K page size.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4dea2d1a

powerpc/powernv/vas: Define macros, register fields and structures · 96768914

由 Sukadev Bhattiprolu 提交于 8月 28, 2017

Define macros for the VAS hardware registers and bit-fields as well
as couple of data structures needed by the VAS driver.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Fixup include guard to use _ASM_POWERPC_VAS_H]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

96768914

powerpc/pci: Remove OF node back pointer from pci_dn · f1e08232

由 Alexey Kardashevskiy 提交于 8月 29, 2017

The check_req() helper uses pci_get_pdn() to get an OF node pointer.
pci_get_pdn() returns a pci_dn pointer which either:
1) from the OF node returned by pci_device_to_OF_node();
2) from the parent child_list where entries don't have OF node pointers.
Since check_req() does not care about 2), it can call
pci_device_to_OF_node() directly, hence the change.

The find_pe_dn() helper uses embedded pci_dn to get an OF node which is
also stored in edev->pdev so let's take a shortcut and call
pci_device_to_OF_node() directly.

With these 2 changes, we can finally get rid of the OF node back pointer.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f1e08232

powerpc/eeh: Remove unnecessary config_addr from eeh_dev · 405b33a7

由 Alexey Kardashevskiy 提交于 8月 29, 2017

The eeh_dev struct hold a config space address of an associated node
and the very same address is also stored in the pci_dn struct which
is always present during the eeh_dev lifetime.

This uses bus:devfn directly from pci_dn instead of cached and packed
config_addr.

Since config_addr is made from device's bus:dev.fn, there is no point
in keeping it in the debugfs either so remove that too.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

405b33a7

powerpc/eeh: Remove unnecessary pointer to phb from eeh_dev · 69672bd7

由 Alexey Kardashevskiy 提交于 8月 29, 2017

The eeh_dev struct already holds a pointer to pci_dn which it does not
exist without and pci_dn itself holds the very same pointer so just
use it.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

69672bd7

powerpc/eeh: Reduce to one the number of places where edev is allocated · 8bae6a23

由 Alexey Kardashevskiy 提交于 8月 29, 2017

arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
is allocated; other 2 places allocate it on stack and in the heap for
a very short period of time to use eeh_pe_get() as takes edev.

This changes eeh_pe_get() to receive required parameters explicitly.

This removes unnecessary temporary allocation of edev.

This uses the "pe_no" name instead of the "pe_config_addr" name as
it actually is a PE number and not a config space address as it seemed.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8bae6a23

powerpc/512x: Constify clk_div_tables · 9e2b70fb

由 Arvind Yadav 提交于 8月 28, 2017

clk_div_tables are not supposed to change at runtime.
mpc512x_clk_divtable function working with const clk_div_table. So
mark the non-const structs as const.
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9e2b70fb

powerpc/83xx: Use sizeof correct type when ioremapping · c6554045

由 Dan Carpenter 提交于 6月 28, 2017

There is a cut and paste error here so we use sizeof(struct mpc83xx_pmc)
to remap the memory for "clock_regs". That sizeof() is 20 bytes and we
only need to remap 12 bytes. It presumably doesn't affect run time too
much...

I changed them to both use sizeof(*variable_name) because that's the
preferred kernel style these days.

Fixes: d49747bd ("powerpc/mpc83xx: Power Management support")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
[mpe: It will map at least one page anyway, but still a good cleanup]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c6554045

powerpc/powernv: Use kernel crash path for machine checks · 6fcd6baa

由 Nicholas Piggin 提交于 7月 19, 2017

There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
  [  127.426651616,0] OPAL: Reboot requested due to Platform error.
      Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
  opal: Reboot type 1 not supported
  Kernel panic - not syncing: PowerNV Unrecovered Machine Check
  CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #35
  Call Trace:
  [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
    buggy/mising code in OPAL for this BMC
    Rebooting in 10 seconds..
  Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
      Effective address: ffff000000000000
  Oops: Machine check, sig: 7 [#1]
  SMP NR_CPUS=2048
  NUMA
  PowerNV
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
  CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #36
  task: c000000932300000 task.stack: c000000932380000
  NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
  REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a26-dirty)
  MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
    CR: 24000484  XER: 20000000
  CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
  GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
  GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
  GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
  GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
  NIP [ffff000000000000] 0xffff000000000000
  LR [00000000217706a4] 0x217706a4
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6fcd6baa

powerpc/powernv: Flush console before platform error reboot · b746e3e0

由 Nicholas Piggin 提交于 7月 19, 2017

Unrecovered MCE and HMI errors are sent through a special restart OPAL
call to log the platform error. The downside is that they don't go
through normal Linux crash paths, so they don't give much information
to the Linux console.

Change this by providing a special crash function which does some of
the console flushing from the panic() path before calling firmware to
reboot.

The downside of this is a little more code to execute before reaching
the firmware reboot. However in practice, it's critical to get the
Linux console messages output in order to debug a problem. So this is
a desirable tradeoff.

Note on the implementation: It is difficult to plumb a custom reboot
handler into the panic path, because panic does a little bit too much
work. For example, it will try to delay with the timebase, but that
may be corrupted in some cases resulting in a hang without reaching
the platform reboot. Another problem is that panic can invoke the
crash dump code which is not what we want in the case of a hardware
platform error. Long-term the best solution will be to rework the
panic path so it can be suitable for this kind of panic, but for now
we just duplicate a bit of the code.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b746e3e0

powerpc/pseries/le: Work around a firmware quirk · bded0706

由 Nicholas Piggin 提交于 7月 05, 2017

Some PowerVM firmware when delivering a system reset interrupt to a
little endian OS will mess up SRR registers. They are byteswapped, and
SRR1 is incorrect. An example from a crash:

  NIP: 14dd0900000000c0
  MSR: 1000000200000080

It's possible to detect this pattern in SRR1 (that would never happen
in normal operation), and at least fix the NIP. After this patch, the
same interrupt reports NIP properly:

  NIP [c00000000009dd14] plpar_hcall_norets+0x1c/0x28
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bded0706

powerpc: Do not call ppc_md.panic in fadump panic notifier · a3b2cb30

由 Nicholas Piggin 提交于 7月 05, 2017

If fadump is not registered, and no other crash or debug handlers are
registered, the powerpc panic handler stops the guest before the
generic panic code can push out debug information to the console.

Currently, system reset injection causes the guest to silently stop.

Stop calling ppc_md.panic in the panic notifier. crash_fadump already
does rtas_os_term() to terminate the guest if fadump is registered.

Remove ppc_md.panic. Move fadump panic notifier into fadump code.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a3b2cb30

powerpc/powernv: powernv platform is not constrained by RMA · 76b42e28

由 Nicholas Piggin 提交于 8月 13, 2017

Remove incorrect comment about real mode address restrictions on
powernv (bare metal), and unnecessary clamping to ppc64_rma_size.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

76b42e28

28 8月, 2017 1 次提交

powerpc/powernv: Fix build error in opal-imc.c when NUMA=n · 6538ac30

由 LABBE Corentin 提交于 8月 16, 2017

When building a random powerpc kernel I hit this build error:

  arch/powerpc/platforms/powernv/opal-imc.c:130:13: error : assignment
  discards « const » qualifier from pointer target type
  [-Werror=discarded-qualifiers]
     l_cpumask = cpumask_of_node(nid);
             ^

This happens because when CONFIG_NUMA=n cpumask_of_node() returns a
const pointer.

This patch simply adds const to l_cpumask to fix this issue.
Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Flesh out change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6538ac30

24 8月, 2017 1 次提交

powerpc/powernv: Enable removal of memory for in memory tracing · 9d5171a8

由 Rashmica Gupta 提交于 6月 01, 2017

The hardware trace macro feature requires access to a chunk of real
memory. This patch provides a debugfs interface to do this. By
writing an integer containing the size of memory to be unplugged into
/sys/kernel/debug/powerpc/memtrace/enable, the code will attempt to
remove that much memory from the end of each NUMA node.

This patch also adds additional debugsfs files for each node that
allows the tracer to interact with the removed memory, as well as
a trace file that allows userspace to read the generated trace.

Note that this patch does not invoke the hardware trace macro, it
only allows memory to be removed during runtime for the trace macro
to utilise.
Signed-off-by: NRashmica Gupta <rashmica.g@gmail.com>
[mpe: Minor formatting etc fixups]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9d5171a8

23 8月, 2017 3 次提交

powerpc: pseries: remove dlpar_attach_node dependency on full path · 215ee763

由 Rob Herring 提交于 8月 21, 2017

In preparation to stop storing the full node path in full_name, remove the
dependency on full_name from dlpar_attach_node(). Callers of
dlpar_attach_node() already have the parent device_node, so just pass the
parent node into dlpar_attach_node instead of the path. This avoids doing
a lookup of the parent node by the path.
Signed-off-by: NRob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

215ee763

powerpc: Convert to using %pOF instead of full_name · b7c670d6

由 Rob Herring 提交于 8月 21, 2017

Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.
Signed-off-by: NRob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b7c670d6

powerpc/vio: Use device_type to detect family · bcf21e3a

由 Michael Ellerman 提交于 8月 22, 2017

Currently in the vio.c code we use a comparision against the parent
device node's full path to decide if the device is a PFO or VIO family
device.

Both the ibm,platform-facilities and vdevice nodes are defined by PAPR,
and must have a matching device_type. So instead of using the path we
can instead compare the device_type.

I've checked Qemu and kvmtool both do this correctly, and all the
PowerVM systems I have access to do also. So it seems to be safe.

This removes the dependency on full_name, which is being removed
upstream.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bcf21e3a

17 8月, 2017 1 次提交

powerpc: Add const to bin_attribute structures · 8bfa42ab

由 Bhumika Goyal 提交于 8月 02, 2017

Declare bin_attribute structures as const as they are only passed as an
argument to the function sysfs_create_bin_file. This argument is of
type const, so declare the structure as const.
Signed-off-by: NBhumika Goyal <bhumirks@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8bfa42ab

15 8月, 2017 1 次提交

powerpc/chrp: Store the intended structure · 36992606

由 Julia Lawall 提交于 8月 13, 2017

Normally the values in the resource field and the argument to ARRAY_SIZE
in the num_resources are the same. In this case, the value in the reousrce
field is the same as the one in the previous platform_device structure, and
appears to be a copy-paste error. Replace the value in the resource field
with the argument to the local call to ARRAY_SIZE.
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

36992606

14 8月, 2017 1 次提交

powerpc/8xx: Fix two CONFIG_8xx left behind · ab2675d6

由 Christophe Leroy 提交于 8月 14, 2017

Commit 968159c0 ("powerpc/8xx: Getting rid of remaining use of
CONFIG_8xx") removed all but 2 references to 8xx in Kconfigs.

This patch removes the two remaining ones.

Fixes: 968159c0 ("powerpc/8xx: Getting rid of remaining use of CONFIG_8xx")
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ab2675d6

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功