1. 02 December 2014, 1 commit
  2. 05 August 2014, 1 commit
  3. 28 July 2014, 1 commit
  4. 28 May 2014, 1 commit
    • powerpc: Fix regression of per-CPU DSCR setting · 1739ea9e
      By Sam Bobroff
      Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)"
      it is no longer possible to set the DSCR on a per-CPU basis.
      
      The old behaviour was to manipulate the DSCR SPR directly but this is no
      longer sufficient: the value is quickly overwritten by context switching.
      
      This patch stores the per-CPU DSCR value in a kernel variable rather than
      directly in the SPR and it is used whenever a process has not set the DSCR
      itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.
      
      Writes to the old global default (/sys/devices/system/cpu/dscr_default)
      now set all of the per-CPU values and reads return the last written value.
      
      The new per-CPU default is added to the paca_struct and is used everywhere
      outside of sysfs.c instead of the old global default.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1739ea9e
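
      A minimal C sketch of the scheme this entry describes: keep the per-CPU
      default in kernel memory and consult it whenever the task has not set its
      own DSCR. Variable and function names here are illustrative, not the
      actual kernel symbols.

        #include <stdint.h>

        #define NR_CPUS 64

        /* Illustrative per-CPU default, standing in for the field added to
         * paca_struct; 0 means "no explicit per-CPU value has been set". */
        static uint64_t cpu_dscr_default[NR_CPUS];

        /* Value the context-switch path would program into the DSCR SPR when
         * switching to a task on CPU 'cpu': a task that set its own DSCR wins,
         * otherwise the per-CPU default applies. */
        static uint64_t effective_dscr(int cpu, uint64_t task_dscr, int task_set_dscr)
        {
                return task_set_dscr ? task_dscr : cpu_dscr_default[cpu];
        }

        /* Writing the old global default now fans out to every per-CPU value,
         * as the sysfs note above describes. */
        static void set_global_dscr_default(uint64_t val)
        {
                for (int cpu = 0; cpu < NR_CPUS; cpu++)
                        cpu_dscr_default[cpu] = val;
        }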
  5. 20 March 2014, 2 commits
    • powerpc/booke64: Critical and machine check exception support · 609af38f
      By Scott Wood
      Add special state saving for critical and machine check exceptions.
      
      Most of this code could be used to handle debug exceptions taken from
      kernel space, but actually doing so is outside the scope of this patch.
      
      The various critical and machine check exceptions now point to their
      real handlers, rather than hanging the kernel.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      609af38f
    • powerpc/booke64: Use SPRG7 for VDSO · 9d378dfa
      By Scott Wood
      Previously SPRG3 was marked for use by both VDSO and critical
      interrupts (though critical interrupts were not fully implemented).
      
      In commit 8b64a9df ("powerpc/booke64:
      Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
      made an attempt to resolve this conflict by restoring the VDSO value
      early in the critical interrupt, but this has some issues:
      
       - It's incompatible with EXCEPTION_COMMON which restores r13 from the
         by-then-overwritten scratch (this cost me some debugging time).
       - It forces critical exceptions to be a special case handled
         differently from even machine check and debug level exceptions.
       - It didn't occur to me that it was possible to make this work at all
         (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
         I made (most of) this patch. :-)
      
      It might be worth investigating using a load rather than SPRG on return
      from all exceptions (except TLB misses where the scratch never leaves
      the SPRG) -- it could save a few cycles.  Until then, let's stick with
      SPRG for all exceptions.
      
      Since we cannot use SPRG4-7 for scratch without corrupting the state of
      a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
      critical interrupts exist on book3s, SPRG3 is still used for VDSO
      there.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: kvm-ppc@vger.kernel.org
      9d378dfa
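
      For reference, user space reaches these registers with mfspr through
      their user-read SPR numbers. A sketch, assuming 259 is the user-readable
      alias of SPRG3 (Book3S) and that Book3E exposes SPRG7 through its own
      user-read alias; the function name is made up and the numbers should be
      checked against the ISA before relying on them.

        /* Sketch only: read the SPRG that backs the vDSO per-CPU value.
         * SPR 259 (SPRG3 user-read) is assumed for Book3S; Book3E would use
         * the SPRG7 user-read alias instead. */
        static inline unsigned long read_vdso_sprg_book3s(void)
        {
                unsigned long val;

                asm volatile("mfspr %0, 259" : "=r" (val));
                return val;
        }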
  6. 15 January 2014, 1 commit
  7. 10 January 2014, 1 commit
    • powerpc/e6500: TLB miss handler with hardware tablewalk support · 28efc35f
      By Scott Wood
      There are a few things that make the existing hw tablewalk handlers
      unsuitable for e6500:
      
       - Indirect entries go in TLB1 (though the resulting direct entries go in
         TLB0).
      
       - It has threads, but no "tlbsrx." -- so we need a spinlock and
         a normal "tlbsx".  Because we need this lock, hardware tablewalk
         is mandatory on e6500 unless we want to add spinlock+tlbsx to
         the normal bolted TLB miss handler.
      
       - TLB1 has no HES (nor next-victim hint) so we need software round robin
         (TODO: integrate this round robin data with hugetlb/KVM)
      
       - The existing tablewalk handlers map half of a page table at a time,
         because IBM hardware has a fixed 1MiB indirect page size.  e6500
         has variable size indirect entries, with a minimum of 2MiB.
         So we can't do the half-page indirect mapping, and even if we
         could it would be less efficient than mapping the full page.
      
       - Like on e5500, the linear mapping is bolted, so we don't need the
         overhead of supporting nested tlb misses.
      
      Note that hardware tablewalk does not work in rev1 of e6500.
      We do not expect to support e6500 rev1 in mainline Linux.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
      28efc35f
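
      A rough C sketch of the second and third points above: a per-core lock
      around the search (since there is no "tlbsrx."), and a software
      round-robin way counter because TLB1 offers no next-victim hint. Names
      and structure are illustrative only, not the real handler's layout.

        #include <stdint.h>

        /* Illustrative per-core state for the e6500-style miss path described
         * above; the real handler keeps equivalents of these elsewhere. */
        struct core_tlb_state {
                volatile int    lock;          /* stand-in for the spinlock taken
                                                * around the plain "tlbsx" */
                unsigned int    next_victim;   /* software round-robin pointer for
                                                * TLB1, which has no hardware hint */
                unsigned int    tlb1_ways;
        };

        /* Pick the TLB1 entry to replace: plain round robin. */
        static unsigned int pick_tlb1_victim(struct core_tlb_state *s)
        {
                unsigned int way = s->next_victim;

                s->next_victim = (way + 1) % s->tlb1_ways;
                return way;
        }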
  8. 05 December 2013, 1 commit
  9. 17 October 2013, 1 commit
  10. 14 August 2013, 2 commits
    • powerpc: Make rwlocks endian safe · 54bb7f4b
      By Anton Blanchard
      Our ppc64 spinlocks and rwlocks use a trick where a lock token and
      the paca index are placed in the lock with a single store. Since we
      are using two u16s they need adjusting for little endian.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      54bb7f4b
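
      The trick described above can be pictured in plain C: two u16 values
      written with one 32-bit store, whose in-memory order flips with
      endianness. This is only an illustration of why the fields need
      adjusting, not the kernel's actual lock layout.

        #include <stdint.h>
        #include <stdio.h>

        /* Two u16s (lock token and paca index) stored with a single 32-bit
         * write.  Which half lands first in memory depends on endianness,
         * which is why the field offsets need adjusting for little endian. */
        union lock_word {
                uint32_t whole;
                struct {
                        uint16_t a;
                        uint16_t b;
                } halves;
        };

        int main(void)
        {
                union lock_word w;

                w.whole = ((uint32_t)0x1 << 16) | 0x2;  /* one store covers both fields */
                printf("first half 0x%x, second half 0x%x\n", w.halves.a, w.halves.b);
                /* Big endian prints 0x1 then 0x2; little endian prints 0x2 then 0x1. */
                return 0;
        }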
    • powerpc: Avoid link stack corruption for MMU on exceptions · bc2e6c6a
      By Michael Neuling
      When we have MMU on exceptions (POWER8) and a relocatable kernel, we
      need to branch from the initial exception vectors at 0x0 to up high
      where the kernel might be located.  Currently we do this using the link
      register.
      
      Unfortunately this corrupts the link stack and instead we should use the
      count register.  We did this for the syscall entry path in:
        6a404806 powerpc: Avoid link stack corruption in MMU on syscall entry path
      but I stupidly forgot to do the same for other exceptions.
      
      This patch changes the initial exception vectors to use the count
      register instead of the link register when we need to branch up to the
      relocated kernel.
      
      I have a dodgy userspace test which loops calling a function that reads
      the PVR (mfpvr in userspace will be emulated by the kernel via the
      program check exception).  On POWER8 and with CONFIG_RELOCATABLE=y, I
      get a ~10% performance improvement with my userspace test with this
      patch.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      bc2e6c6a
  11. 15 February 2013, 2 commits
  12. 10 January 2013, 1 commit
  13. 17 September 2012, 1 commit
  14. 07 September 2012, 1 commit
  15. 09 March 2012, 1 commit
    • powerpc: Rework lazy-interrupt handling · 7230c564
      By Benjamin Herrenschmidt
      The current implementation of lazy interrupts handling has some
      issues that this tries to address.
      
      We don't do the various workarounds we need to do when re-enabling
      interrupts in some cases such as when returning from an interrupt
      and thus we may still lose or get delayed decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with a
      "irq_happened" field in which we store a bit mask of what interrupt
      occurred while soft-disabled.
      
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that
      field.
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
      
      In addition, this adds a few refinements:
      
       - We no longer hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appears to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
      7230c564
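
      A simplified C sketch of the central idea: record what fired while
      soft-disabled in a bit mask, then decide on replay when interrupts are
      restored. The flag names and the replay hook are illustrative, not the
      kernel's.

        /* Illustrative bit mask playing the role of the new "irq_happened" field. */
        #define HAPPENED_DEC   0x01u  /* decrementer fired while soft-disabled */
        #define HAPPENED_EE    0x02u  /* external interrupt fired */
        #define HAPPENED_DBELL 0x04u  /* doorbell fired */

        static unsigned int irq_happened;
        static int irqs_soft_enabled;

        /* Called from hard-interrupt entry when IRQs are found soft-disabled:
         * just note what happened and return with the source masked. */
        static void note_masked_interrupt(unsigned int bit)
        {
                irq_happened |= bit;
        }

        /* Simplified arch_local_irq_restore(): on re-enable, replay whatever
         * was noted.  The real code replays via the existing exception frame
         * or an assembly trampoline. */
        static void local_irq_restore_sketch(void (*replay)(unsigned int))
        {
                irqs_soft_enabled = 1;
                while (irq_happened) {
                        unsigned int bit = irq_happened & -irq_happened; /* lowest set bit */

                        irq_happened &= ~bit;
                        replay(bit);
                }
        }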
  16. 08 December 2011, 1 commit
    • powerpc: Provide a way for KVM to indicate that NV GPR values are lost · 2fde6d20
      By Paul Mackerras
      This fixes a problem where a CPU thread coming out of nap mode can
      think it has valid values in the nonvolatile GPRs (r14 - r31) as saved
      away in power7_idle, but in fact the values have been trashed because
      the thread was used for KVM in the mean time.  The result is that the
      thread crashes because code that called power7_idle (e.g.,
      pnv_smp_cpu_kill_self()) goes to use values in registers that have
      been trashed.
      
      The bit field in SRR1 that tells whether state was lost only reflects
      the most recent nap, which may not have been the nap instruction in
      power7_idle.  So we need an extra PACA field to indicate that state
      has been lost even if SRR1 indicates that the most recent nap didn't
      lose state.  We clear this field when saving the state in power7_idle,
      we set it to a non-zero value when we use the thread for KVM, and we
      test it in power7_wakeup_noloss.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      2fde6d20
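
      The logic reduces to one extra sticky flag checked at wakeup. A small
      sketch with made-up names (the real flag lives in the PACA and the SRR1
      encoding differs):

        /* Illustrative stand-ins for the PACA flag and the SRR1 "state lost" bit. */
        static int nap_state_lost;            /* set when KVM borrows the thread   */
        #define SRR1_STATE_LOST 0x1UL         /* assumed encoding, for illustration */

        static void before_nap(void)
        {
                nap_state_lost = 0;           /* cleared when NV GPRs are saved     */
        }

        static void kvm_uses_thread(void)
        {
                nap_state_lost = 1;           /* NV GPRs trashed regardless of SRR1 */
        }

        /* power7_wakeup_noloss-style check: the saved GPRs may only be trusted
         * if neither SRR1 nor the sticky flag says state was lost. */
        static int saved_gprs_valid(unsigned long srr1)
        {
                return !(srr1 & SRR1_STATE_LOST) && !nap_state_lost;
        }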
  17. 20 September 2011, 1 commit
  18. 12 July 2011, 2 commits
    • KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      By Paul Mackerras
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      de56a948
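
      On the userspace side, the new exit code surfaces roughly as sketched
      below. This assumes the kvm_run layout that accompanies this work (a
      papr_hcall block carrying the hypercall number, arguments and return
      value); treat the field names as approximate rather than authoritative.

        #include <linux/kvm.h>
        #include <stdio.h>

        /* Sketch of a VMM exit loop handling the PAPR hypercall exit added
         * here.  Error handling and the actual hypercall emulation are
         * omitted. */
        static void handle_exit(struct kvm_run *run)
        {
                switch (run->exit_reason) {
                case KVM_EXIT_PAPR_HCALL:
                        /* Hypercall number as passed by the guest. */
                        fprintf(stderr, "hcall %llu\n",
                                (unsigned long long)run->papr_hcall.nr);
                        /* Emulate the hypercall, then hand back a return code. */
                        run->papr_hcall.ret = 0; /* H_SUCCESS */
                        break;
                default:
                        break;
                }
        }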
    • KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu · 3c42bf8a
      By Paul Mackerras
      There are several fields in struct kvmppc_book3s_shadow_vcpu that
      temporarily store bits of host state while a guest is running,
      rather than anything relating to the particular guest or vcpu.
      This splits them out into a new kvmppc_host_state structure and
      modifies the definitions in asm-offsets.c to suit.
      
      On 32-bit, we have a kvmppc_host_state structure inside the
      kvmppc_book3s_shadow_vcpu since the assembly code needs to be able
      to get to them both with one pointer.  On 64-bit they are separate
      fields in the PACA.  This means that on 64-bit we don't need to
      copy the kvmppc_host_state in and out on vcpu load/unload, and
      in future will mean that the book3s_hv code doesn't need a
      shadow_vcpu struct in the PACA at all.  That does mean that we
      have to be careful not to rely on any values persisting in the
      hstate field of the paca across any point where we could block
      or get preempted.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      3c42bf8a
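
      A trimmed-down illustration of the split (field names chosen for
      illustration; the real structures hold considerably more):

        /* Host-only scratch state, usable on its own in the 64-bit PACA.
         * These fields describe the host, not any particular guest or vcpu. */
        struct kvmppc_host_state_sketch {
                unsigned long host_r1;        /* host stack pointer          */
                unsigned long host_msr;       /* host MSR to restore on exit */
                unsigned long scratch[2];     /* temporary spill space       */
        };

        /* On 32-bit the host state stays embedded in the shadow vcpu so the
         * assembly code can reach both through one pointer; on 64-bit the
         * host state is a separate field in the PACA. */
        struct kvmppc_book3s_shadow_vcpu_sketch {
                struct kvmppc_host_state_sketch hstate;
                /* ... per-guest shadow registers ... */
        };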
  19. 29 June 2011, 1 commit
    • powerpc/book3e-64: use a separate TLB handler when linear map is bolted · f67f4ef5
      By Scott Wood
      On MMUs such as FSL where we can guarantee the entire linear mapping is
      bolted, we don't need to worry about linear TLB misses.  If on top of
      that we do a full table walk, we get rid of all recursive TLB faults, and
      can dispense with some state saving.  This gains a few percent on
      TLB-miss-heavy workloads, and around 50% on a benchmark that had a high
      rate of virtual page table faults under the normal handler.
      
      While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and
      EX_TLB_SRR1 as they're not used.
      
      [BenH: Fixed build with 64K pages (wsp config)]
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      f67f4ef5
  20. 04 May 2011, 1 commit
    • powerpc: Save Come-From Address Register (CFAR) in exception frame · 48404f2e
      By Paul Mackerras
      Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
      Address Register" (CFAR), that records the address of the most recent
      branch or rfid (return from interrupt) instruction for debugging purposes.
      
      This saves the value of the CFAR in the exception entry code and stores
      it in the exception frame.  We also make xmon print the CFAR value in
      its register dump code.
      
      Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
      field, which is only used for system calls, and use it for the CFAR value
      for all exceptions/interrupts other than system calls.  This means we
      don't save the CFAR on system calls, which is not a great problem since
      system calls tend not to happen unexpectedly, and also avoids adding the
      overhead of reading the CFAR to the system call entry path.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      48404f2e
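
      In other words, for anything that is not a system call, the CFAR value
      rides in the slot that would otherwise hold orig_gpr3. A hedged sketch of
      how a debugger-style dump might read it; the stand-in struct and the
      syscall test are illustrative only.

        #include <stdio.h>

        /* Minimal stand-in for the fields of pt_regs that matter here. */
        struct regs_sketch {
                unsigned long trap;        /* exception vector, 0xc00 for syscalls */
                unsigned long orig_gpr3;   /* doubles as the CFAR for non-syscalls */
        };

        /* xmon-style dump: only trust the CFAR slot when the frame is not a
         * system call, since syscalls still use orig_gpr3 for its old purpose. */
        static void print_cfar(const struct regs_sketch *regs)
        {
                if (regs->trap != 0xc00)
                        printf("CFAR = 0x%lx\n", regs->orig_gpr3);
        }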
  21. 27 April 2011, 1 commit
  22. 20 April 2011, 1 commit
  23. 19 October 2010, 1 commit
    • irq_work: Add generic hardirq context callbacks · e360adbe
      By Peter Zijlstra
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like wakeup a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this get to do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e360adbe
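
      Typical use of the resulting mechanism, as seen from NMI-context code.
      This uses today's init_irq_work()/irq_work_queue() helper spellings,
      which may differ slightly from the original patch; the callback body is
      a placeholder.

        #include <linux/irq_work.h>
        #include <linux/printk.h>

        /* Runs later in hard-IRQ context, where waking tasks and taking normal
         * locks is allowed -- things NMI code must not do directly. */
        static void drain_buffers(struct irq_work *work)
        {
                pr_info("draining from IRQ context\n");
        }

        static struct irq_work drain_work;

        /* One-time setup. */
        static void setup_sketch(void)
        {
                init_irq_work(&drain_work, drain_buffers);
        }

        /* Called from NMI context: just queue; the callback is delivered via a
         * self-IPI where available, the powerpc decrementer, or the timer tick. */
        static void nmi_handler_sketch(void)
        {
                irq_work_queue(&drain_work);
        }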
  24. 02 September 2010, 1 commit
    • powerpc: Account time using timebase rather than PURR · cf9efce0
      By Paul Mackerras
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
      
      This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9efce0
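
      The SPURR apportioning in the third paragraph is just a proportional
      split. A sketch of the arithmetic, with invented names and no attempt at
      overflow handling:

        #include <stdint.h>

        /* Split the SPURR ticks accumulated since the last read between scaled
         * user and scaled system time, in proportion to the (timebase-measured)
         * user and system time over the same interval. */
        static void apportion_spurr(uint64_t spurr_delta,
                                    uint64_t user_tb, uint64_t sys_tb,
                                    uint64_t *scaled_user, uint64_t *scaled_sys)
        {
                uint64_t total = user_tb + sys_tb;

                if (!total)
                        return;

                *scaled_user += spurr_delta * user_tb / total;
                *scaled_sys  += spurr_delta * sys_tb  / total;
        }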
  25. 31 July 2010, 1 commit
  26. 21 May 2010, 1 commit
    • powerpc/kexec: Fix race in kexec shutdown · 1fc711f7
      By Michael Neuling
      In kexec_prepare_cpus, the primary CPU IPIs the secondary CPUs to
      kexec_smp_down().  kexec_smp_down() calls kexec_smp_wait() which sets
      the hw_cpu_id() to -1.  The primary does this while leaving IRQs on
      which means the primary can take a timer interrupt which can lead to
      the IPIing one of the secondary CPUs (say, for a scheduler re-balance)
      but since the secondary CPU now has a hw_cpu_id = -1, we IPI CPU
      -1... Kaboom!
      
      We are hitting this case regularly on POWER7 machines.
      
      There is also a second race, where the primary will tear down the MMU
      mappings before knowing the secondaries have entered real mode.
      
      Also, the secondaries are clearing out any pending IPIs before
      guaranteeing that no more will be received.
      
      This changes kexec_prepare_cpus() so that we turn off IRQs in the
      primary CPU much earlier.  It adds a paca flag to say that the
      secondaries have entered the kexec_smp_down() IPI and turned off IRQs,
      rather than overloading hw_cpu_id with -1.  This new paca flag is
      again used to indicate when the secondaries have entered real mode.
      
      It also ensures that all CPUs have their IRQs off before we clear out
      any pending IPI requests (in kexec_cpu_down()) to ensure there are no
      trailing IPIs left unacknowledged.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1fc711f7
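
      The flag progression this describes can be sketched as a tiny state
      machine. The enum values and the wait loop are illustrative, not the
      kernel's constants or synchronisation code.

        /* Per-CPU kexec progress, replacing the old "hw_cpu_id = -1" overload.
         * Names mirror the idea, not necessarily the kernel's spelling. */
        enum kexec_state_sketch {
                KEXEC_IDLE = 0,
                KEXEC_IRQS_OFF,       /* secondary took the IPI and disabled IRQs */
                KEXEC_REAL_MODE,      /* secondary has dropped to real mode       */
        };

        static volatile enum kexec_state_sketch cpu_kexec_state[64];

        /* Primary spins until every secondary reports at least 'target' before
         * it moves on (e.g. do not tear down the MMU until all are in real
         * mode). */
        static void wait_for_cpus(int ncpus, enum kexec_state_sketch target)
        {
                for (int cpu = 0; cpu < ncpus; cpu++)
                        while (cpu_kexec_state[cpu] < target)
                                ; /* busy-wait; the real code also has timeouts */
        }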
  27. 17 May 2010, 3 commits
  28. 09 March 2010, 1 commit
    • powerpc: Dynamically allocate pacas · 1426d5a3
      By Michael Ellerman
      On 64-bit kernels we currently have a 512 byte struct paca_struct for
      each cpu (usually just called "the paca"). Currently they are statically
      allocated, which means a kernel built for a large number of cpus will
      waste a lot of space if it's booted on a machine with few cpus.
      
      We can avoid that by only allocating the number of pacas we need at
      boot. However this is complicated by the fact that we need to access
      the paca before we know how many cpus there are in the system.
      
      The solution is to dynamically allocate enough space for NR_CPUS pacas,
      but then later in boot when we know how many cpus we have, we free any
      unused pacas.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1426d5a3
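
      The shape of the boot-time sizing described above, with placeholder
      allocator calls: early_alloc()/early_free() stand in for the early boot
      allocator, whose exact interfaces have changed over time.

        #include <stddef.h>
        #include <string.h>

        #define NR_CPUS   2048
        #define PACA_SIZE 512      /* per the commit text: ~512 bytes each */

        static char  *paca_area;
        static size_t paca_area_size;

        /* Placeholders for the early boot allocator. */
        extern void *early_alloc(size_t size);
        extern void  early_free(void *ptr, size_t size);

        /* Early boot: the paca is needed before the CPU count is known, so
         * size the block for the worst case. */
        void allocate_pacas_sketch(void)
        {
                paca_area_size = (size_t)NR_CPUS * PACA_SIZE;
                paca_area = early_alloc(paca_area_size);
                memset(paca_area, 0, paca_area_size);
        }

        /* Later, once the real CPU count is known, give back the unused tail. */
        void free_unused_pacas_sketch(int nr_present_cpus)
        {
                size_t used = (size_t)nr_present_cpus * PACA_SIZE;

                early_free(paca_area + used, paca_area_size - used);
                paca_area_size = used;
        }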
  29. 01 March 2010, 1 commit
    • KVM: PPC: Use PACA backed shadow vcpu · 7e57cba0
      By Alexander Graf
      We're being horribly racy right now. All the entry and exit code hijacks
      random fields from the PACA that could easily be used by different code in
      case we get interrupted, for example by a #MC or even page fault.
      
      After discussing this with Ben, we figured it's best to reserve some more
      space in the PACA and just shove off some vcpu state to there.
      
      That way we can drastically improve the readability of the code, make it
      less racy and less complex.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      7e57cba0
  30. 05 November 2009, 1 commit
  31. 21 September 2009, 2 commits
    • perf: Tidy up after the big rename · 57c0c15b
      By Ingo Molnar
       - provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
      
       - provide courtesy copy of old perf_counter.h, for user-space projects
      
       - small indentation fixups
      
       - fix up MAINTAINERS
      
       - fix small x86 printout fallout
      
       - fix up small PowerPC comment fallout (use 'counter' as in register)
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      57c0c15b
    • perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      By Ingo Molnar
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cdd6c482
  32. 20 August 2009, 1 commit
  33. 09 June 2009, 1 commit