Commits · 7230c5644188cd9e3fb380cc97dde00c464a3ba7 · openeuler / raspberrypi-kernel

09 Mar, 2012 15 commits

powerpc: Rework lazy-interrupt handling · 7230c564

Committed by Benjamin Herrenschmidt on Mar 06, 2012

The current implementation of lazy interrupts handling has some
issues that this tries to address.

We don't do the various workarounds we need to do when re-enabling
interrupts in some cases such as when returning from an interrupt
and thus we may still lose or get delayed decrementer or doorbell
interrupts.

The current scheme also makes it much harder to handle the external
"edge" interrupts provided by some BookE processors when using the
EPR facility (External Proxy) and the Freescale Hypervisor.

Additionally, we tend to keep interrupts hard disabled in a number
of cases, such as decrementer interrupts, external interrupts, or
when a masked decrementer interrupt is pending. This is sub-optimal.

This is an attempt at fixing it all in one go by reworking the way
we do the lazy interrupt disabling from the ground up.

The base idea is to replace the "hard_enabled" field with a
"irq_happened" field in which we store a bit mask of what interrupt
occurred while soft-disabled.

When re-enabling, either via arch_local_irq_restore() or when returning
from an interrupt, we can now decide what to do by testing bits in that
field.

We then implement replaying of the missed interrupts either by
re-using the existing exception frame (in exception exit case) or via
the creation of a new one from an assembly trampoline (in the
arch_local_irq_enable case).

This removes the need to play with the decrementer to try to create
fake interrupts, among others.
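
The idea can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the struct, the bit names and every function below are hypothetical, and the real replay happens through exception frames rather than plain function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit names; the real flag values are not given in the text. */
enum {
    HAPPENED_EE    = 1u << 0, /* external interrupt */
    HAPPENED_DEC   = 1u << 1, /* decrementer */
    HAPPENED_DBELL = 1u << 2, /* doorbell */
};

struct cpu_state {
    bool     soft_enabled;  /* software view of "interrupts enabled" */
    uint8_t  irq_happened;  /* bit mask of interrupts seen while soft-disabled */
    unsigned handled[3];    /* per-source count of handler runs */
};

static void run_handler(struct cpu_state *cpu, unsigned bit_index)
{
    cpu->handled[bit_index]++;
}

/* An interrupt arrives: run it now, or just remember it if soft-disabled. */
static void interrupt_arrives(struct cpu_state *cpu, unsigned bit_index)
{
    assert(bit_index < 3);
    if (!cpu->soft_enabled) {
        cpu->irq_happened |= (uint8_t)(1u << bit_index);
        return;
    }
    run_handler(cpu, bit_index);
}

static void soft_irq_disable(struct cpu_state *cpu)
{
    cpu->soft_enabled = false;
}

/* Re-enable and replay whatever was recorded, one handler run per set bit. */
static void soft_irq_restore(struct cpu_state *cpu)
{
    cpu->soft_enabled = true;
    for (unsigned i = 0; i < 3; i++) {
        uint8_t bit = (uint8_t)(1u << i);
        if (cpu->irq_happened & bit) {
            cpu->irq_happened &= (uint8_t)~bit;
            run_handler(cpu, i);
        }
    }
}
```

Because the field is a bit mask, two occurrences of the same source while soft-disabled collapse into a single replay in this model.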

In addition, this adds a few refinements:

 - We no longer hard disable decrementer interrupts that occur
while soft-disabled. We now simply bump the decrementer back to max
(on BookS) or leave it stopped (on BookE) and continue with hard interrupts
enabled, which means that we'll potentially get better sample quality from
performance monitor interrupts.

 - Timer, decrementer and doorbell interrupts now hard-enable
shortly after removing the source of the interrupt, which means
they no longer run entirely hard disabled. Again, this will improve
perf sample quality.

 - On Book3E 64-bit, we now make the performance monitor interrupt
act as an NMI like Book3S (the necessary C code for that to work
appears to already be present in the FSL perf code, notably calling
nmi_enter instead of irq_enter). (This also fixes a bug where BookE
perfmon interrupts could clobber r14 ... oops)

 - We could make "masked" decrementer interrupts act as NMIs when doing
timer-based perf sampling to improve the sample quality.

Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2:

- Add hard-enable to decrementer, timer and doorbells
- Fix CR clobber in masked irq handling on BookE
- Make embedded perf interrupt act as an NMI
- Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
  to retrigger an interrupt without preventing hard-enable

v3:

 - Fix or vs. ori bug on Book3E
 - Fix enabling of interrupts for some exceptions on Book3E

v4:

 - Fix resend of doorbells on return from interrupt on Book3E

v5:

 - Rebased on top of my latest series, which involves some significant
rework of some aspects of the patch.

v6:
 - 32-bit compile fix
 - more compile fixes with various .config combos
 - factor out the asm code to soft-disable interrupts
 - remove the C wrapper around preempt_schedule_irq

v7:
 - Fix a bug with hard irq state tracking on native power7

7230c564

powerpc: Replace mfmsr instructions with load from PACA kernel_msr field · d9ada91a

Committed by Benjamin Herrenschmidt on Mar 02, 2012

On 64-bit, the mfmsr instruction can be quite slow, slower
than loading a field from the cache-hot PACA, which happens
to already contain the value we want in most cases.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

d9ada91a

powerpc: Fix 64-bit BookE FP unavailable exceptions · 9424fabf

Committed by Benjamin Herrenschmidt on Mar 05, 2012

We were using CR0.EQ after EXCEPTION_COMMON, hoping it still
contained whether we came from userspace or kernel space.

However, under some circumstances, EXCEPTION_COMMON will
call C code and clobber volatile registers such as CR0, so we really
need to re-load the previous MSR from the stackframe and
re-test.

While there, invert the condition to make the fast path more
obvious, remove the BUG_OPCODE (a debugging leftover), and
call .ret_from_except as we should.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

9424fabf

powerpc: Fix register clobbering when accumulating stolen time · 990118c8

Committed by Benjamin Herrenschmidt on Mar 02, 2012

When running under a hypervisor that supports stolen time accounting,
we may call C code from the macro EXCEPTION_PROLOG_COMMON in the
exception entry path, which clobbers CR0.

However, the FPU and vector traps rely on CR0 indicating whether we
are coming from userspace or kernel to decide what to do.

So we need to restore that value after the C call.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

990118c8

powerpc/xmon: Add display of soft & hard irq states · 7ac21cd4

Committed by Benjamin Herrenschmidt on Mar 02, 2012

Also use local_paca instead of get_paca() to avoid getting into
the smp_processor_id() debugging code from the debugger.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

7ac21cd4

powerpc: Add support for page fault retry and fatal signals · 9be72573

Committed by Benjamin Herrenschmidt on Mar 01, 2012

Other architectures such as x86 and ARM have been growing
new support for features like retrying page faults after
dropping the mm semaphore to break contention, or being
able to return from a stuck page fault when a SIGKILL is
pending.

This refactors our implementation of do_page_fault() to
move the error handling out of line in a way similar to
x86 and adds support for those two features.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
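
The two features described in this commit can be modelled in a few lines of userspace C. This is a hedged sketch of the control flow only: all types, names and the single-retry policy shown here are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not kernel definitions. */
enum fault_result  { FAULT_DONE, FAULT_RETRY };
enum fault_outcome { OUTCOME_HANDLED, OUTCOME_KILLED };

struct fault_ctx {
    int  contended_attempts;   /* attempts that would block on the mm semaphore */
    bool fatal_signal_pending; /* e.g. a SIGKILL is queued for the task */
    int  attempts;             /* number of attempts made so far */
};

/* One attempt at resolving the fault. With allow_retry set, contention makes
 * it drop the lock and ask the caller to retry instead of blocking. */
static enum fault_result try_fault(struct fault_ctx *ctx, bool allow_retry)
{
    ctx->attempts++;
    if (ctx->contended_attempts > 0 && allow_retry) {
        ctx->contended_attempts--;
        return FAULT_RETRY;
    }
    return FAULT_DONE;
}

static enum fault_outcome handle_fault(struct fault_ctx *ctx)
{
    bool allow_retry = true;

    assert(ctx->attempts == 0);
    for (;;) {
        if (try_fault(ctx, allow_retry) == FAULT_DONE)
            return OUTCOME_HANDLED;
        /* A retry was requested: give up early if the task is being killed. */
        if (ctx->fatal_signal_pending)
            return OUTCOME_KILLED;
        /* Retry only once; the second attempt is allowed to block. */
        allow_retry = false;
    }
}
```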

9be72573

powerpc: Disable interrupts in 64-bit kernel FP and vector faults · 9f2f79e3

Committed by Benjamin Herrenschmidt on Mar 01, 2012

If we get a floating point, altivec or vsx unavailable interrupt in
the kernel, we trigger a kernel error. There is no point preserving
the interrupt state. In fact, that can even make debugging harder
as the processor state might change (we may even preempt) between
taking the exception and landing in a debugger.

So just make those 3 disable interrupts unconditionally.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2: On BookE only disable when hitting the kernel unavailable
    path, otherwise it will fail to restore softe as
    fast_exception_return doesn't do it.

9f2f79e3

powerpc: Call do_page_fault() with interrupts off · a546498f

Committed by Benjamin Herrenschmidt on Mar 07, 2012

We currently turn interrupts back to their previous state before
calling do_page_fault(). This can be annoying when debugging as
a bad fault will potentially have lost some processor state before
getting into the debugger.

We also end up calling some generic code, such as
notify_page_fault(), with interrupts enabled, which could
be unexpected.

This changes our code to behave more like other architectures,
and makes the assembly entry code call into do_page_fault() with
interrupts disabled. They are conditionally re-enabled from
within do_page_fault() in the same spot x86 does it.

While there, add the might_sleep() test in the case of a successful
trylock of the mmap semaphore, again like x86.

Also fix a bug in the existing assembly where r12 (_MSR) could get
clobbered by C calls (the DTL accounting in the exception common
macro and DISABLE_INTS) in some cases.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2: Add the r12 clobber fix

a546498f

powerpc: Improve behaviour of irq tracing on 64-bit exception entry · 1b701179

Committed by Benjamin Herrenschmidt on Mar 01, 2012

Some exceptions would unconditionally disable interrupts on entry,
which is fine, but calling lockdep every time not only adds more
overhead than strictly needed, but also means we get quite a few
"redundant" disables logged, which makes it hard to spot the really
bad ones.

So instead, split the macro used by the exception code into a
normal one and a separate one used when CONFIG_TRACE_IRQFLAGS is
enabled, and make the latter skip the tracing if interrupts were
already disabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
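
The "skip the tracing if interrupts were already disabled" rule amounts to reporting state transitions only. A minimal sketch, assuming a hypothetical tracker struct in place of the real exception-entry macros:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker; the kernel does this inside assembly macros. */
struct irq_trace {
    bool     enabled;      /* tracked soft interrupt state */
    unsigned disable_logs; /* "disable" events reported to the tracer */
};

/* Report a disable only on an enabled -> disabled transition. */
static void trace_disable_if_needed(struct irq_trace *t)
{
    assert(t != NULL);
    if (!t->enabled)
        return; /* already disabled: nothing new to log */
    t->enabled = false;
    t->disable_logs++;
}
```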

1b701179

powerpc: Improve 64-bit syscall entry/exit · 1421ae0b

Committed by Benjamin Herrenschmidt on Mar 01, 2012

We unconditionally hard enable interrupts. This is unnecessary as
syscalls are expected to always be called with interrupts enabled.

While at it, we add a WARN_ON if that is not the case and
CONFIG_TRACE_IRQFLAGS is enabled (we don't want to add overhead
to the fast path when this is not set though).

Thus let's remove the enabling (and associated irq tracing) from
the syscall entry path. Also on Book3S, replace a few mfmsr
instructions with loads of PACAMSR from the PACA, which should be
faster & schedule better.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

1421ae0b

powerpc: Rework runlatch code · fe1952fc

Committed by Benjamin Herrenschmidt on Mar 01, 2012

This moves the inlines into system.h and changes the runlatch
code to use the thread local flags (non-atomic) rather than
the TIF flags (atomic) to keep track of the latch state.

The code to turn it back on in an asynchronous interrupt is
now simplified and partially inlined.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

fe1952fc

powerpc: Use the same interrupt prolog for perfmon as other interrupts · 7450f6f0

Committed by Benjamin Herrenschmidt on Mar 01, 2012

The perfmon interrupt is the sole user of a special variant of the
interrupt prolog which differs from the one used by external and timer
interrupts in that it saves the non-volatile GPRs and doesn't turn the
runlatch on.

The former is unnecessary and the latter is arguably incorrect, so
let's clean that up by using the same prolog. While at it we rename
that prolog to use the _ASYNC prefix.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

7450f6f0

powerpc: Remove legacy iSeries bits from assembly files · 4f8cf36f

Committed by Benjamin Herrenschmidt on Feb 28, 2012

This removes the various bits of assembly in the kernel entry,
exception handling and SLB management code that were specific
to running under the legacy iSeries hypervisor which is no
longer supported.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

4f8cf36f

powerpc: clean up vio.c · b0787660

Committed by Stephen Rothwell on Mar 07, 2012

This cleans up vio.c after the removal of the legacy iSeries platform.
It also removes some no longer referenced include files.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

b0787660

powerpc: Remove the main legacy iSeries platform code · 8ee3e0d6

Committed by Stephen Rothwell on Mar 07, 2012

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

8ee3e0d6

07 Mar, 2012 7 commits

powerpc/pmac: Use string library in nvram code · 2d4b9712

Committed by Akinobu Mita on Jan 27, 2012

- Use memchr_inv to check if the data contains all 0xFF bytes.
  It is faster than looping for each byte.

- Use memcmp to compare memory areas.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
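
For readers unfamiliar with memchr_inv, the following userspace sketch mimics its contract: return a pointer to the first byte that differs from a given value, or NULL if every byte matches. The helper names here are hypothetical, and this stand-in uses a plain byte loop, so it shows the semantics only, not the speed benefit the commit mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the memchr_inv contract: first byte != c, or NULL. */
static const void *first_byte_not(const void *buf, int c, size_t len)
{
    const unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (p[i] != (unsigned char)c)
            return p + i;
    return NULL;
}

/* "Contains all 0xFF bytes" check, as described in the first bullet. */
static int is_all_ff(const void *buf, size_t len)
{
    return first_byte_not(buf, 0xFF, len) == NULL;
}

/* Memory-area comparison via memcmp, as described in the second bullet. */
static int same_contents(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}
```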

2d4b9712

powerpc: Make SPARSE_IRQ required · ad5b7f13

Committed by Grant Likely on Jan 30, 2012

All IRQs on powerpc are managed via irq_domain anyway, so there isn't
really any advantage to turning SPARSE_IRQ off, and it's the direction
we want to take the kernel design anyway. This patch makes powerpc
always use SPARSE_IRQ.

On pseries_defconfig, SPARSE_IRQ adds only about 0x300 bytes to the
.text sections, and removes about 0x20000 from the data section for the
static irq_desc table.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ad5b7f13

powerpc/prom: Remove limit on maximum size of properties · e9daf2ad

Committed by Nishanth Aravamudan on Feb 27, 2012

On a 16TB system (using AMS/CMO), I get:

WARNING: ignoring large property [/ibm,dynamic-reconfiguration-memory] ibm,dynamic-memory length 0x000000000017ffec

and significantly less memory is thus shown to the partition. As far as
I can tell, the constant used is arbitrary. Ben Herrenschmidt provided
additional background that

> The limit was originally set because of Apple machines carrying ROM
> images in the device-tree, at a time where we were much more memory
> constrained than we are now.

and that it is likely not very useful any longer.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

e9daf2ad

powerpc: Use set_current_blocked() and block_sigmask() · a2007ce8

Committed by Matt Fleming on Feb 14, 2012

As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block
is pending in the shared queue.

Also, use the new helper function introduced in commit 5e6292c0
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this
code wrong, so using this helper function should stop that from
happening again.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a2007ce8

powerpc: Use vsprintf extension %pf with builtin_return_address · a2234b4b

Committed by Joe Perches on Feb 28, 2012

Emit the function name not the address when possible.

builtin_return_address() gives an address. When building
a kernel with CONFIG_KALLSYMS, emit the actual function
name not the address.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a2234b4b

powerpc/icswx: Fix race condition with IPI setting ACOP · de801de1

Committed by Jimi Xenidis on Feb 28, 2012

There is a race where a thread causes a coprocessor type to be valid
in its own ACOP _and_ in the current context, but it does not
propagate to the ACOP register of other threads in time for them to
use it.  The original code tries to solve this by sending an IPI to
all threads on the system, which is heavy handed, but unfortunately
still provides a window where the icswx is issued by other threads and
the ACOP is not up to date.

This patch detects that the ACOP DSI fault was a "false positive" and
syncs the ACOP and causes the icswx to be replayed.

Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

de801de1

powerpc/atomic: Implement atomic*_inc_not_zero · a6cf7ed5

Committed by Anton Blanchard on Feb 29, 2012

Implement atomic_inc_not_zero and atomic64_inc_not_zero. At the
moment we use atomic*_add_unless which requires us to put 0 and
1 constants into registers. We can also avoid a subtract by
saving the original value in a second temporary.

This removes 3 instructions from fget:

- c0000000001b63c0:       39 00 00 00     li      r8,0
- c0000000001b63c4:       39 40 00 01     li      r10,1
...
- c0000000001b63e8:       7c 0a 00 50     subf    r0,r10,r0

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
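
The semantics of an "increment unless zero" operation can be sketched in portable C11. This is an illustration of the behaviour only, written with a compare-exchange loop; it is an assumption-laden stand-in and not the PowerPC implementation from the patch, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Increment *v unless it is zero; returns true if the increment happened. */
static bool inc_not_zero(atomic_int *v)
{
    assert(v != NULL);

    int old = atomic_load(v);
    while (old != 0) {
        /* On failure, old is refreshed with the current value and we loop. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}
```

A typical use is taking a reference on an object only while its reference count is still non-zero.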

a6cf7ed5

27 Feb, 2012 4 commits

arch/powerpc/platforms/powernv/setup.c: included asm/xics.h twice · 0a167e0a

Committed by Danny Kukawka on Feb 16, 2012

arch/powerpc/platforms/powernv/setup.c: included 'asm/xics.h' twice,
remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

0a167e0a

arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice · ed7e3d1c

Committed by Danny Kukawka on Feb 16, 2012

arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ed7e3d1c

powerpc: remove CONFIG_PPC_ISERIES from the architecture Kconfig files · 3d066d77

Committed by Stephen Rothwell on Feb 22, 2012

After this, we can remove the legacy iSeries code more easily.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

3d066d77

powerpc/mpic: Fix allocation of reverse-map for multi-ISU mpics · fe83364f

Committed by Benjamin Herrenschmidt on Feb 22, 2012

When using a multi-ISU MPIC, we can have interrupts up to
isu_size * MPIC_MAX_ISU, not just isu_size, so allocate
the right size reverse map.

Without this, the code will constantly fall back to
a linear search.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

fe83364f

24 Feb, 2012 7 commits

arch/arm/mach-shmobile/board-ag5evm.c: included linux/dma-mapping.h twice · 7372a4cd

Committed by Danny Kukawka on Feb 16, 2012

arch/arm/mach-shmobile/board-ag5evm.c: included 'linux/dma-mapping.h'
twice, remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

7372a4cd

ARM: mach-shmobile: r8a7779 PFC IPSR4 fix · 74eb436e

Committed by Magnus Damm on Jan 30, 2012

Fix the bit field width information for the IPSR4 register
in the r8a7779 pin function controller (PFC).

Without this fix the Marzen board fails to receive data
over the serial console due to misconfigured pin function
for the RX pin.

Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

74eb436e

ARM: mach-shmobile: sh73a0 PSTR 32-bit access fix · 689189fb

Committed by Magnus Damm on Jan 30, 2012

Convert the sh73a0 SMP code to use 32-bit PSTR access.

This fixes wakeup from deep sleep for sh73a0 secondary CPUs.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

689189fb

sh: Fix sh2a build error for CONFIG_CACHE_WRITETHROUGH · 1ae911cb

Committed by Phil Edworthy on Feb 21, 2012

Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

1ae911cb

sh: modify a resource of sh_eth_giga1_resources in board-sh7757lcr · befe0756

Committed by Shimoda, Yoshihiro on Feb 20, 2012

The latest sh_eth driver needs a TSU resource for channel 1
if the controller has TSU registers, so this patch adds that resource.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

befe0756

arch/sh: remove references to cpu_*_map. · 004f4ce9

Committed by Rusty Russell on Feb 15, 2012

This has been obsolescent for a while; time for the final push.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

004f4ce9

sh: Fix typo in pci-sh7780.c · ecfb68c6

Committed by Masanari Iida on Feb 04, 2012

Correct spelling "erorr" to "error" in
arch/sh/drivers/pci/pci-sh7780.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

ecfb68c6

23 Feb, 2012 7 commits

powerpc/perf: Move perf core & PMU code into a subdirectory · f2699491

Committed by Michael Ellerman on Feb 20, 2012

The perf code has grown a lot since it started, and is big enough to
warrant its own subdirectory. For reference it's ~60% bigger than the
oprofile code. It declutters the kernel directory, makes it simpler to
grep for "just perf stuff", and allows us to shorten some filenames.

While we're at it, make it more obvious that we have two implementations
of the core perf logic. One for (roughly) Book3S CPUs, which was the
original implementation, and the other for Freescale embedded CPUs.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

f2699491

fadump: Remove the phyp assisted dump code. · 12d92992

Committed by Mahesh Salgaonkar on Feb 16, 2012

Remove the phyp assisted dump implementation, which is not in use.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

12d92992

fadump: Invalidate the fadump registration during machine shutdown. · 67b43b9d

Committed by Mahesh Salgaonkar on Feb 16, 2012

If dump is active during system reboot, shutdown or halt then invalidate
the fadump registration as it does not get invalidated automatically.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

67b43b9d

fadump: Invalidate registration and release reserved memory for general use. · b500afff

Committed by Mahesh Salgaonkar on Feb 16, 2012

This patch introduces a sysfs interface '/sys/kernel/fadump_release_mem' to
invalidate the last fadump registration, invalidate '/proc/vmcore', release
the reserved memory for general use and re-register for future kernel dump.
Once the dump is copied to the disk, unlike phyp dump, the userspace tool
can release all the memory reserved for dump with a single operation:
echo 1 to '/sys/kernel/fadump_release_mem'.

Release the reserved memory region excluding the size of the memory required
for future kernel dump registration. Therefore, unlike kdump, fadump
doesn't need a second reboot to get the system back to the production
configuration.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

b500afff

fadump: Add PT_NOTE program header for vmcoreinfo · d34c5f26

Committed by Mahesh Salgaonkar on Feb 16, 2012

Introduce a PT_NOTE program header that points to the physical address of
the vmcoreinfo_note buffer declared in kernel/kexec.c. The vmcoreinfo
note buffer is populated during crash_fadump() at the time of system
crash.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

d34c5f26

fadump: Convert firmware-assisted cpu state dump data into elf notes. · ebaeb5ae

Committed by Mahesh Salgaonkar on Feb 16, 2012

When registered for firmware-assisted dump on powerpc, firmware preserves
the registers for the active CPUs during a system crash. This patch reads
the CPU register data stored in firmware-assisted dump format (except for
the crashing CPU), converts it into ELF notes and updates the PT_NOTE program
header accordingly. The exact register state for the crashing CPU is saved to
the fadump crash info structure in the scratch area during crash_fadump() and
read during second kernel boot.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ebaeb5ae

fadump: Initialize elfcore header and add PT_LOAD program headers. · 2df173d9

Committed by Mahesh Salgaonkar on Feb 16, 2012

Build the crash memory range list by traversing through system memory during
the first kernel before we register for firmware-assisted dump. After the
successful dump registration, initialize the elfcore header and populate
PT_LOAD program headers with crash memory ranges. The elfcore header is
saved in the scratch area within the reserved memory. The scratch area starts
at the end of the memory reserved for saving RMR region contents. The
scratch area contains the fadump crash info structure, which holds a magic
number for fadump validation and the physical address where the elfcore header
can be found. This structure will also be used to pass some important crash
info data to the second kernel, which will help the second kernel populate the
ELF core header with correct data before it gets exported through /proc/vmcore.
Since the firmware preserves the entire partition memory at the time of crash,
the contents of the scratch area will be preserved till second kernel boot.

Since the memory dump exported through /proc/vmcore is in ELF format similar
to kdump, it will help us to reuse the kdump infrastructure for dump capture
and filtering. Unlike phyp dump, the userspace tool does not need to refer to
any sysfs interface while reading /proc/vmcore.

NOTE: The current design implementation does not address the possibility of
introducing additional fields (in future) to this structure without affecting
compatibility. It's on the TODO list to come up with a better approach to
address this.

Reserved dump area start => +-------------------------------------+
                            |  CPU state dump data                |
                            +-------------------------------------+
                            |  HPTE region data                   |
                            +-------------------------------------+
                            |  RMR region data                    |
Scratch area start       => +-------------------------------------+
                            |  fadump crash info structure {      |
                            |     magic number                    |
                     +------|---- elfcorehdr_addr                 |
                     |      |  }                                  |
                     +----> +-------------------------------------+
                            |  ELF core header                    |
Reserved dump area end   => +-------------------------------------+

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
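
The crash info structure in the diagram can be sketched as follows. This is a hedged illustration: the struct name, field widths, helper functions and the magic value are all assumptions chosen for the example, and only the two fields themselves (a magic number and elfcorehdr_addr) come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value, chosen only for this sketch. */
#define CRASH_INFO_MAGIC UINT64_C(0x4641444d50494e46)

/* Assumed layout: the text names the fields but not their types. */
struct crash_info {
    uint64_t magic;           /* marks a valid crash info block */
    uint64_t elfcorehdr_addr; /* physical address of the ELF core header */
};

/* First kernel: record where the ELF core header lives. */
static void crash_info_init(struct crash_info *ci, uint64_t elfcorehdr_addr)
{
    assert(ci != NULL);
    ci->magic = CRASH_INFO_MAGIC;
    ci->elfcorehdr_addr = elfcorehdr_addr;
}

/* Second kernel: return the header address, or 0 if validation fails. */
static uint64_t crash_info_elfcorehdr(const struct crash_info *ci)
{
    if (ci == NULL || ci->magic != CRASH_INFO_MAGIC)
        return 0;
    return ci->elfcorehdr_addr;
}
```

Checking the magic number before trusting the address is what lets the second kernel tell a preserved scratch area apart from unrelated memory contents.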

2df173d9