提交 · 1a040b26c5c915b317103b87ae7006d40443f197 · OpenHarmony / kernel_linux

29 11月, 2010 1 次提交

powerpc: Remove second definition of STACK_FRAME_OVERHEAD · 46f52210

由 Stephen Rothwell 提交于 11月 18, 2010

Since STACK_FRAME_OVERHEAD is defined in asm/ptrace.h and that
is ASSEMBER safe, we can just include that instead of going via
asm-offsets.h.
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

46f52210

24 10月, 2010 5 次提交

KVM: PPC: Add mtsrin PV code · cbe487fa

由 Alexander Graf 提交于 8月 03, 2010

This is the guest side of the mtsr acceleration. Using this a guest can now
call mtsrin with almost no overhead as long as it ensures that it only uses
it with (MSR_IR|MSR_DR) == 0. Linux does that, so we're good.
Signed-off-by: NAlexander Graf <agraf@suse.de>

cbe487fa

KVM: PPC: Fix CONFIG_KVM_GUEST && !CONFIG_KVM case · 989044ee

由 Alexander Graf 提交于 8月 30, 2010

When CONFIG_KVM_GUEST is selected, but CONFIG_KVM is not, we were missing
some defines in asm-offsets.c and included too many headers at other places.

This patch makes above configuration work.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

989044ee

KVM: PPC: Generic KVM PV guest support · d17051cb

由 Alexander Graf 提交于 7月 29, 2010

We have all the hypervisor pieces in place now, but the guest parts are still
missing.

This patch implements basic awareness of KVM when running Linux as guest. It
doesn't do anything with it yet though.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

d17051cb

KVM: PPC: Convert MSR to shared page · 666e7252

由 Alexander Graf 提交于 7月 29, 2010

One of the most obvious registers to share with the guest directly is the
MSR. The MSR contains the "interrupts enabled" flag which the guest has to
toggle in critical sections.

So in order to bring the overhead of interrupt en- and disabling down, let's
put msr into the shared page. Keep in mind that even though you can fully read
its contents, writing to it doesn't always update all state. There are a few
safe fields that don't require hypervisor interaction. See the documentation
for a list of MSR bits that are safe to be set from inside the guest.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

666e7252

KVM: PPC: Introduce shared page · 96bc451a

由 Alexander Graf 提交于 7月 29, 2010

For transparent variable sharing between the hypervisor and guest, I introduce
a shared page. This shared page will contain all the registers the guest can
read and write safely without exiting guest context.

This patch only implements the stubs required for the basic structure of the
shared page. The actual register moving follows.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

96bc451a

14 10月, 2010 1 次提交

powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips · 55fd766b

由 Kumar Gala 提交于 10月 16, 2009

On Freescale parts typically have TLB array for large mappings that we can
bolt the linear mapping into.  We utilize the code that already exists
on PPC32 on the 64-bit side to setup the linear mapping to be cover by
bolted TLB entries.  We utilize a quarter of the variable size TLB array
for this purpose.

Additionally, we limit the amount of memory to what we can cover via
bolted entries so we don't get secondary faults in the TLB miss
handlers.  We should fix this limitation in the future.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

55fd766b

02 9月, 2010 1 次提交

powerpc: Account time using timebase rather than PURR · cf9efce0

由 Paul Mackerras 提交于 8月 26, 2010

Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
PURR register for measuring the user and system time used by
processes, as well as other related times such as hardirq and
softirq times.  This turns out to be quite confusing for users
because it means that a program will often be measured as taking
less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
than it does when run on a single-threaded processor (ST mode), even
though the program takes longer to finish.  The discrepancy is
accounted for as stolen time, which is also confusing, particularly
when there are no other partitions running.

This changes the accounting to use the timebase instead, meaning that
the reported user and system times are the actual number of real-time
seconds that the program was executing on the processor thread,
regardless of which SMT mode the processor is in.  Thus a program will
generally show greater user and system times when run on a
multi-threaded processor than on a single-threaded processor.

On pSeries systems on POWER5 or later processors, we measure the
stolen time (time when this partition wasn't running) using the
hypervisor dispatch trace log.  We check for new entries in the
log on every entry from user mode and on every transition from
kernel process context to soft or hard IRQ context (i.e. when
account_system_vtime() gets called).  So that we can correctly
distinguish time stolen from user time and time stolen from system
time, without having to check the log on every exit to user mode,
we store separate timestamps for exit to user mode and entry from
user mode.

On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
in account_system_vtime() (as before), and then apportion the SPURR
ticks since the last time we read it between scaled user time and
scaled system time according to the relative proportions of user
time and system time over the same interval.  This avoids having to
read the SPURR on every kernel entry and exit.  On systems that have
PURR but not SPURR (i.e., POWER5), we do the same using the PURR
rather than the SPURR.

This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
for now since it conflicts with the use of the dispatch trace log
by the time accounting code.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

cf9efce0

29 7月, 2010 1 次提交

powerpc: Rework VDSO gettimeofday to prevent time going backwards · 0e469db8

由 Paul Mackerras 提交于 6月 20, 2010

Currently it is possible for userspace to see the result of
gettimeofday() going backwards by 1 microsecond, assuming that
userspace is using the gettimeofday() in the VDSO.  The VDSO
gettimeofday() algorithm computes the time in "xsecs", which are
units of 2^-20 seconds, or approximately 0.954 microseconds,
using the algorithm

	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec

and then converts the time in xsecs to seconds and microseconds.

The kernel updates the tb_orig_stamp and stamp_xsec values every
tick in update_vsyscall().  If the length of the tick is not an
integer number of xsecs, then some precision is lost in converting
the current time to xsecs.  For example, with CONFIG_HZ=1000, the
tick is 1ms long, which is 1048.576 xsecs.  That means that
stamp_xsec will advance by either 1048 or 1049 on each tick.
With the right conditions, it is possible for userspace to get
(timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
slightly late in updating the vdso_datapage, and then for stamp_xsec
to advance by 1048 when the kernel does update it, and for userspace
to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
integer truncation.  The result is that time appears to go backwards
by 1 microsecond.

To fix this we change the VDSO gettimeofday to use a new field in the
VDSO datapage which stores the nanoseconds part of the time as a
fractional number of seconds in a 0.32 binary fraction format.
(Or put another way, as a 32-bit number in units of 0.23283 ns.)
This is convenient because we can use the mulhwu instruction to
convert it to either microseconds or nanoseconds.

Since it turns out that computing the time of day using this new field
is simpler than either using stamp_xsec (as gettimeofday does) or
stamp_xtime.tv_nsec (as clock_gettime does), this converts both
gettimeofday and clock_gettime to use the new field.  The existing
__do_get_tspec function is converted to use the new field and take
a parameter in r7 that indicates the desired resolution, 1,000,000
for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
function is then unused and is deleted.

The new algorithm is

	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction

with 'now' in units of 2^-32 seconds.  That is then converted to
seconds and either microseconds or nanoseconds with

	seconds = now >> 32
	partseconds = ((now & 0xffffffff) * resolution) >> 32

The 32-bit VDSO code also makes a further simplification: it ignores
the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
a timebase frequency of 1GHz or less and an update interval of no
more than 10ms, the upper 32 bits of tb_to_xs will be at least
4503599, so the error from ignoring the low 32 bits will be at most
2.2ns, which is more than an order of magnitude less than the time
taken to do gettimeofday or clock_gettime on our fastest processors,
so there is no possibility of seeing inconsistent values due to this.

This also moves update_gtod() down next to its only caller, and makes
update_vsyscall use the time passed in via the wall_time argument rather
than accessing xtime directly.  At present, wall_time always points to
xtime, but that could change in future.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0e469db8

09 7月, 2010 2 次提交

powerpc: Optimise per cpu accesses on 64bit · ae01f84b

由 Anton Blanchard 提交于 5月 31, 2010

Now we dynamically allocate the paca array, it takes an extra load
whenever we want to access another cpu's paca. One place we do that a lot
is per cpu variables. A simple example:

DEFINE_PER_CPU(unsigned long, vara);
unsigned long test4(int cpu)
{
	return per_cpu(vara, cpu);
}

This takes 4 loads, 5 if you include the actual load of the per cpu variable:

    ld r11,-32760(r30)  # load address of paca pointer
    ld r9,-32768(r30)   # load link address of percpu variable
    sldi r3,r29,9       # get offset into paca (each entry is 512 bytes)
    ld r0,0(r11)        # load paca pointer
    add r3,r0,r3        # paca + offset
    ld r11,64(r3)       # load paca[cpu].data_offset

    ldx r3,r9,r11       # load per cpu variable

If we remove the ppc64 specific per_cpu_offset(), we get the generic one
which indexes into a statically allocated array. This removes one load and
one add:

    ld r11,-32760(r30)  # load address of __per_cpu_offset
    ld r9,-32768(r30)   # load link address of percpu variable
    sldi r3,r29,3       # get offset into __per_cpu_offset (each entry 8 bytes)
    ldx r11,r11,r3      # load __per_cpu_offset[cpu]

    ldx r3,r9,r11       # load per cpu variable

Having all the offsets in one array also helps when iterating over a per cpu
variable across a number of cpus, such as in the scheduler. Before we would
need to load one paca cacheline when calculating each per cpu offset. Now we
have 16 (128 / sizeof(long)) per cpu offsets in each cacheline.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ae01f84b

powerpc: Rework VDSO gettimeofday to prevent time going backwards · 8fd63a9e

由 Paul Mackerras 提交于 6月 20, 2010

Currently it is possible for userspace to see the result of
gettimeofday() going backwards by 1 microsecond, assuming that
userspace is using the gettimeofday() in the VDSO.  The VDSO
gettimeofday() algorithm computes the time in "xsecs", which are
units of 2^-20 seconds, or approximately 0.954 microseconds,
using the algorithm

	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec

and then converts the time in xsecs to seconds and microseconds.

The kernel updates the tb_orig_stamp and stamp_xsec values every
tick in update_vsyscall().  If the length of the tick is not an
integer number of xsecs, then some precision is lost in converting
the current time to xsecs.  For example, with CONFIG_HZ=1000, the
tick is 1ms long, which is 1048.576 xsecs.  That means that
stamp_xsec will advance by either 1048 or 1049 on each tick.
With the right conditions, it is possible for userspace to get
(timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
slightly late in updating the vdso_datapage, and then for stamp_xsec
to advance by 1048 when the kernel does update it, and for userspace
to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
integer truncation.  The result is that time appears to go backwards
by 1 microsecond.

To fix this we change the VDSO gettimeofday to use a new field in the
VDSO datapage which stores the nanoseconds part of the time as a
fractional number of seconds in a 0.32 binary fraction format.
(Or put another way, as a 32-bit number in units of 0.23283 ns.)
This is convenient because we can use the mulhwu instruction to
convert it to either microseconds or nanoseconds.

Since it turns out that computing the time of day using this new field
is simpler than either using stamp_xsec (as gettimeofday does) or
stamp_xtime.tv_nsec (as clock_gettime does), this converts both
gettimeofday and clock_gettime to use the new field.  The existing
__do_get_tspec function is converted to use the new field and take
a parameter in r7 that indicates the desired resolution, 1,000,000
for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
function is then unused and is deleted.

The new algorithm is

	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction

with 'now' in units of 2^-32 seconds.  That is then converted to
seconds and either microseconds or nanoseconds with

	seconds = now >> 32
	partseconds = ((now & 0xffffffff) * resolution) >> 32

The 32-bit VDSO code also makes a further simplification: it ignores
the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
a timebase frequency of 1GHz or less and an update interval of no
more than 10ms, the upper 32 bits of tb_to_xs will be at least
4503599, so the error from ignoring the low 32 bits will be at most
2.2ns, which is more than an order of magnitude less than the time
taken to do gettimeofday or clock_gettime on our fastest processors,
so there is no possibility of seeing inconsistent values due to this.

This also moves update_gtod() down next to its only caller, and makes
update_vsyscall use the time passed in via the wall_time argument rather
than accessing xtime directly.  At present, wall_time always points to
xtime, but that could change in future.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8fd63a9e

21 5月, 2010 1 次提交

powerpc/kexec: Fix race in kexec shutdown · 1fc711f7

由 Michael Neuling 提交于 5月 13, 2010

In kexec_prepare_cpus, the primary CPU IPIs the secondary CPUs to
kexec_smp_down().  kexec_smp_down() calls kexec_smp_wait() which sets
the hw_cpu_id() to -1.  The primary does this while leaving IRQs on
which means the primary can take a timer interrupt which can lead to
the IPIing one of the secondary CPUs (say, for a scheduler re-balance)
but since the secondary CPU now has a hw_cpu_id = -1, we IPI CPU
-1... Kaboom!

We are hitting this case regularly on POWER7 machines.

There is also a second race, where the primary will tear down the MMU
mappings before knowing the secondaries have entered real mode.

Also, the secondaries are clearing out any pending IPIs before
guaranteeing that no more will be received.

This changes kexec_prepare_cpus() so that we turn off IRQs in the
primary CPU much earlier.  It adds a paca flag to say that the
secondaries have entered the kexec_smp_down() IPI and turned off IRQs,
rather than overloading hw_cpu_id with -1.  This new paca flag is
again used to in indicate when the secondaries has entered real mode.

It also ensures that all CPUs have their IRQs off before we clear out
any pending IPI requests (in kexec_cpu_down()) to ensure there are no
trailing IPIs left unacknowledged.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

1fc711f7

17 5月, 2010 5 次提交

powerpc/fsl-booke: Move loadcam_entry back to asm code to fix SMP ftrace · 78f62237

由 Kumar Gala 提交于 5月 13, 2010

When we build with ftrace enabled its possible that loadcam_entry would
have used the stack pointer (even though the code doesn't need it). We
call loadcam_entry in __secondary_start before the stack is setup. To
ensure that loadcam_entry doesn't use the stack pointer the easiest
solution is to just have it in asm code.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

78f62237

PPC: Export SWITCH_FRAME_SIZE · 218d169c

由 Alexander Graf 提交于 4月 16, 2010

We need the SWITCH_FRAME_SIZE define on Book3S_32 now too.
So let's export it unconditionally.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAvi Kivity <avi@redhat.com>

218d169c

KVM: PPC: Add SVCPU to Book3S_32 · 97e49255

由 Alexander Graf 提交于 4月 16, 2010

We need to keep the pointer to the shadow vcpu somewhere accessible from
within really early interrupt code. The best fit I found was the thread
struct, as that resides in an SPRG.

So let's put a pointer to the shadow vcpu in the thread struct and add
an asm-offset so we can find it.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

97e49255

KVM: PPC: Use now shadowed vcpu fields · 0604675f

由 Alexander Graf 提交于 4月 16, 2010

The shadow vcpu now contains some fields we don't use from the vcpu anymore.
Access to them happens using inline functions that happily use the shadow
vcpu fields.

So let's now ifdef them out to booke only and add asm-offsets.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

0604675f

KVM: PPC: Use CONFIG_PPC_BOOK3S define · 00c3a37c

由 Alexander Graf 提交于 4月 16, 2010

Upstream recently added a new name for PPC64: Book3S_64.

So instead of using CONFIG_PPC64 we should use CONFIG_PPC_BOOK3S consotently.
That makes understanding the code easier (I hope).
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

00c3a37c

14 5月, 2010 1 次提交

powerpc/fsl-booke: Move loadcam_entry back to asm code to fix SMP ftrace · 500a0e56

由 Kumar Gala 提交于 5月 13, 2010

500a0e56

12 5月, 2010 1 次提交

powerpc/perf_event: Fix oops due to perf_event_do_pending call · 0fe1ac48

由 Paul Mackerras 提交于 4月 13, 2010

Anton Blanchard found that large POWER systems would occasionally
crash in the exception exit path when profiling with perf_events.
The symptom was that an interrupt would occur late in the exit path
when the MSR[RI] (recoverable interrupt) bit was clear.  Interrupts
should be hard-disabled at this point but they were enabled.  Because
the interrupt was not recoverable the system panicked.

The reason is that the exception exit path was calling
perf_event_do_pending after hard-disabling interrupts, and
perf_event_do_pending will re-enable interrupts.

The simplest and cleanest fix for this is to use the same mechanism
that 32-bit powerpc does, namely to cause a self-IPI by setting the
decrementer to 1.  This means we can remove the tests in the exception
exit path and raw_local_irq_restore.

This also makes sure that the call to perf_event_do_pending from
timer_interrupt() happens within irq_enter/irq_exit.  (Note that
calling perf_event_do_pending from timer_interrupt does not mean that
there is a possible 1/HZ latency; setting the decrementer to 1 ensures
that the timer interrupt will happen immediately, i.e. within one
timebase tick, which is a few nanoseconds or 10s of nanoseconds.)
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Cc: stable@kernel.org
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0fe1ac48

01 3月, 2010 3 次提交

KVM: PPC: Keep SRR1 flags around in shadow_msr · f7adbba1

由 Alexander Graf 提交于 1月 15, 2010

SRR1 stores more information that just the MSR value. It also stores
valuable information about the type of interrupt we received, for
example whether the storage interrupt we just got was because of a
missing htab entry or not.

We use that information to speed up the exit path.

Now if we get preempted before we can interpret the shadow_msr values,
we get into vcpu_put which then calls the MSR handler, which then sets
all the SRR1 information bits in shadow_msr to 0. Great.

So let's preserve the SRR1 specific bits in shadow_msr whenever we set
the MSR. They don't hurt.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

f7adbba1

KVM: PPC: Call SLB patching code in interrupt safe manner · 021ec9c6

由 Alexander Graf 提交于 1月 08, 2010

Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs info the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

021ec9c6

KVM: PPC: Use PACA backed shadow vcpu · 7e57cba0

由 Alexander Graf 提交于 1月 08, 2010

We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.

After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.

That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7e57cba0

21 11月, 2009 1 次提交

powerpc/fsl-booke: Rework TLB CAM code · 8b27f0b6

由 Kumar Gala 提交于 10月 15, 2009

Re-write the code so its more standalone and fixed some issues:
* Bump'd # of CAM entries to 64 to support e500mc
* Make the code handle MAS7 properly
* Use pr_cont instead of creating a string as we go
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

8b27f0b6

05 11月, 2009 2 次提交

Export new PACA constants in asm-offsets · 55c75884

由 Alexander Graf 提交于 10月 30, 2009

In order to access fields in the PACA from assembly code, we need
to generate offsets using asm-offsets.c.

So let's add the new PACA related bits, we just introduced!
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

55c75884

Add Book3s_64 offsets to asm-offsets.c · 62908905

由 Alexander Graf 提交于 10月 30, 2009

We need to access some VCPU fields from assembly code. In order to get
the proper offsets, we have to define them in asm-offsets.c.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

62908905

21 9月, 2009 1 次提交

perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482

由 Ingo Molnar 提交于 9月 21, 2009

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )
Suggested-by: NStephane Eranian <eranian@google.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPaul Mackerras <paulus@samba.org>
Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cdd6c482

20 8月, 2009 2 次提交

powerpc: Add PACA fields specific to 64-bit Book3E processors · dce6670a

由 Benjamin Herrenschmidt 提交于 7月 23, 2009

This adds various fields in the PACA that are for use specifically
by Book3E processors, such as exception save areas, current pgd
pointer, special exceptions kernel stacks etc...
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

dce6670a

powerpc: Add memory management headers for new 64-bit BookE · 57e2a99f

由 Benjamin Herrenschmidt 提交于 7月 28, 2009

This adds the PTE and pgtable format definitions, along with changes
to the kernel memory map and other definitions related to implementing
support for 64-bit Book3E. This also shields some asm-offset bits that
are currently only relevant on 32-bit

We also move the definition of the "linux" page size constants to
the common mmu.h file and add a few sizes that are relevant to
embedded processors.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

57e2a99f

18 8月, 2009 1 次提交

powerpc: Allow perf_counters to access user memory at interrupt time · 9c1e1052

由 Paul Mackerras 提交于 8月 17, 2009

This provides a mechanism to allow the perf_counters code to access
user memory in a PMU interrupt routine.  Such an access can cause
various kinds of interrupt: SLB miss, MMU hash table miss, segment
table miss, or TLB miss, depending on the processor.  This commit
only deals with 64-bit classic/server processors, which use an MMU
hash table.  32-bit processors are already able to access user memory
at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
the possibility of reentering hash_page or the TLB miss handlers,
since they run with interrupts disabled.

On 64-bit processors, an SLB miss interrupt on a user address will
update the slb_cache and slb_cache_ptr fields in the paca.  This is
OK except in the case where a PMU interrupt occurs in switch_slb,
which also accesses those fields.  To prevent this, we hard-disable
interrupts in switch_slb.  Interrupts are already soft-disabled at
this point, and will get hard-enabled when they get soft-enabled
later.

This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
and to make sure that it clears the slb_cache_ptr when called from
other callers than switch_slb, the existing routine is renamed to
__slb_flush_and_rebolt, which is called by switch_slb and the new
version of slb_flush_and_rebolt.

Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
hard_irq_disable() to protect the per-cpu variables used there and
in ste_allocate.

If a MMU hashtable miss interrupt occurs, normally we would call
hash_page to look up the Linux PTE for the address and create a HPTE.
However, hash_page is fairly complex and takes some locks, so to
avoid the possibility of deadlock, we check the preemption count
to see if we are in a (pseudo-)NMI handler, and if so, we don't call
hash_page but instead treat it like a bad access that will get
reported up through the exception table mechanism.  An interrupt
whose handler runs even though the interrupt occurred when
soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
handler, which should use nmi_enter()/nmi_exit() rather than
irq_enter()/irq_exit().
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

9c1e1052

09 6月, 2009 1 次提交

powerpc: Separate PACA fields for server CPUs · 91c60b5b

由 Benjamin Herrenschmidt 提交于 6月 02, 2009

This patch has no effect other than re-ordering PACA fields on
current server CPUs. It however is a pre-requisite for future
support of BookE 64-bit processors. Various parts of the PACA
struct are now moved under some ifdef's, either the new
CONFIG_PPC_BOOK3S or CONFIG_PPC_STD_MMU_64, whatever seems more
appropriate.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.craashing.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

91c60b5b

24 3月, 2009 1 次提交

KVM: ppc: No need to include core-header for KVM in asm-offsets.c currently · 366d4b9b

由 Hollis Blanchard 提交于 1月 03, 2009

Signed-off-by: NLiu Yu <yu.liu@freescale.com>
Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

366d4b9b

11 3月, 2009 1 次提交

powerpc: Remove unused asm-offsets entries for cpu_spec · 9e1e3723

由 Michael Ellerman 提交于 2月 23, 2009

Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

9e1e3723

09 1月, 2009 1 次提交

powerpc: Provide a way to defer perf counter work until interrupts are enabled · 93a6d3ce

由 Paul Mackerras 提交于 1月 09, 2009

Because 64-bit powerpc uses lazy (soft) interrupt disabling, it is
possible for a performance monitor exception to come in when the
kernel thinks interrupts are disabled (i.e. when they are
soft-disabled but hard-enabled).  In such a situation the performance
monitor exception handler might have some processing to do (such as
process wakeups) which can't be done in what is effectively an NMI
handler.

This provides a way to defer that work until interrupts get enabled,
either in raw_local_irq_restore() or by returning from an interrupt
handler to code that had interrupts enabled.  We have a per-processor
flag that indicates that there is work pending to do when interrupts
subsequently get re-enabled.  This flag is checked in the interrupt
return path and in raw_local_irq_restore(), and if it is set,
perf_counter_do_pending() is called to do the pending work.
Signed-off-by: NPaul Mackerras <paulus@samba.org>

93a6d3ce

08 1月, 2009 1 次提交

powerpc/fsl-booke: Don't hard-code size of struct tlbcam · 19f5465e

由 Trent Piepho 提交于 12月 08, 2008

Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam
to 20 when it indexed the TLBCAM table.  Anyone changing the size of struct
tlbcam would not know to expect that.

The kernel already has a system to get the size of C structures into
assembly language files, asm-offsets, so let's use it.

The definition of the struct gets moved to a header, so that asm-offsets.c
can include it.
Signed-off-by: NTrent Piepho <tpiepho@freescale.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

19f5465e

31 12月, 2008 4 次提交

KVM: ppc: Implement in-kernel exit timing statistics · 73e75b41

由 Hollis Blanchard 提交于 12月 02, 2008

Existing KVM statistics are either just counters (kvm_stat) reported for
KVM generally or trace based aproaches like kvm_trace.
For KVM on powerpc we had the need to track the timings of the different exit
types. While this could be achieved parsing data created with a kvm_trace
extension this adds too much overhead (at least on embedded PowerPC) slowing
down the workloads we wanted to measure.

Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
code. These statistic is available per vm&vcpu under the kvm debugfs directory.
As this statistic is low, but still some overhead it can be enabled via a
.config entry and should be off by default.

Since this patch touched all powerpc kvm_stat code anyway this code is now
merged and simplified together with the exit timing statistic code (still
working with exit timing disabled in .config).
Signed-off-by: NChristian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

73e75b41

KVM: ppc: directly insert shadow mappings into the hardware TLB · 7924bd41

由 Hollis Blanchard 提交于 12月 02, 2008

Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
guest would load this array into the hardware TLB. This consumed 1280 bytes of
memory (64 entries of 16 bytes plus a struct page pointer each), and also
required some assembly to loop over the array on every entry.

Instead of saving a copy in memory, we can just store shadow mappings directly
into the hardware TLB, accepting that the host kernel will clobber these as
part of the normal 440 TLB round robin. When we do that we need less than half
the memory, and we have decreased the exit handling time for all guest exits,
at the cost of increased number of TLB misses because the host overwrites some
guest entries.

These savings will be increased on processors with larger TLBs or which
implement intelligent flush instructions like tlbivax (which will avoid the
need to walk arrays in software).

In addition to that and to the code simplification, we have a greater chance of
leaving other host userspace mappings in the TLB, instead of forcing all
subsequent tasks to re-fault all their mappings.
Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7924bd41

KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor · db93f574

由 Hollis Blanchard 提交于 11月 05, 2008

This patch doesn't yet move all 44x-specific data into the new structure, but
is the first step down that path. In the future we may also want to create a
struct kvm_vcpu_booke.

Based on patch from Liu Yu <yu.liu@freescale.com>.
Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

db93f574

KVM: ppc: Rename "struct tlbe" to "struct kvmppc_44x_tlbe" · 0f55dc48

由 Hollis Blanchard 提交于 11月 05, 2008

This will ease ports to other cores.

Also remove unused "struct kvm_tlb" while we're at it.
Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

0f55dc48

29 12月, 2008 1 次提交

powerpc/44x: Support 16K/64K base page sizes on 44x · ca9153a3

由 Ilya Yanok 提交于 12月 11, 2008

This adds support for 16k and 64k page sizes on PowerPC 44x processors.

The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().

One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.
Signed-off-by: NYuri Tikhonov <yur@emcraft.com>
Signed-off-by: NVladimir Panfilov <pvr@emcraft.com>
Signed-off-by: NIlya Yanok <yanok@emcraft.com>
Acked-by: NJosh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

ca9153a3

21 12月, 2008 1 次提交

powerpc/mm: Split mmu_context handling · 5e696617

由 Benjamin Herrenschmidt 提交于 12月 18, 2008

This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else.  This is
preliminary work for adding SMP support for BookE processors.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

5e696617

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年