1. 27 Aug, 2013 (2 commits)
  2. 14 Aug, 2013 (2 commits)
  3. 23 Apr, 2013 (1 commit)
  4. 09 Apr, 2013 (1 commit)
  5. 27 Nov, 2012 (1 commit)
    • cpuidle: Measure idle state durations with monotonic clock · a474a515
      Committed by Julius Werner
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast of the last_residency variable from int to unsigned
      long long. The negative 32-bit integer will zero-extend and result in
      a forward time jump of roughly four billion microseconds (about 1.2
      hours) on the idle state residency counter.
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
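
      For illustration, here is a minimal sketch of the monotonic
      measurement pattern the generic wrapper follows (simplified and
      hedged; not the exact upstream code):

          ktime_t time_start, time_end;
          s64 diff;

          time_start = ktime_get();  /* monotonic, immune to wallclock updates */
          entered_state = target_state->enter(dev, drv, index);
          time_end = ktime_get();

          /* Compute the residency in microseconds and clamp it so the
           * signed int assignment below cannot overflow. */
          diff = ktime_to_us(ktime_sub(time_end, time_start));
          if (diff > INT_MAX)
                  diff = INT_MAX;
          dev->last_residency = (int)diff;  /* stays signed; no u64 cast */
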
      Signed-off-by: Julius Werner <jwerner@chromium.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 18 Oct, 2012 (3 commits)
    • cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries. · 83dac594
      Committed by Deepthi Dharwar
      Earlier, without the cpuidle framework on pseries, the native arch
      idle routine comprised both snooze and nap states. The
      smt_snooze_delay variable was used to delay the idle process's entry
      to a deeper idle state like nap.
      With the coming of cpuidle, this arch-specific idle code was replaced
      by two different idle routines, one supporting snooze and the other
      nap. This enabled the addition of more low-level idle states on
      pseries in the future.
      
      On adopting the generic cpuidle framework for POWER systems, the
      decision of which idle state to choose, given a predicted idle time,
      is taken by the menu governor based on the target_residency and
      exit_latency of the idle states. target_residency is the minimum time
      to be resident in an idle state; exit_latency is the time taken to
      exit it. The deeper the idle state, the higher both its target
      residency and exit latency will be.
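
      For illustration, a hedged sketch of how these two parameters appear
      in a cpuidle state table (the field names follow struct
      cpuidle_state; the numbers and enter callbacks here are invented):

          static struct cpuidle_state pseries_states[] = {
                  {       /* polling state: cheap to leave, instantly worthwhile */
                          .name             = "snooze",
                          .exit_latency     = 0,    /* us */
                          .target_residency = 0,    /* us */
                          .enter            = snooze_loop,
                  },
                  {       /* deeper state: costlier exit, needs a longer stay */
                          .name             = "CEDE",
                          .exit_latency     = 10,   /* us */
                          .target_residency = 100,  /* us */
                          .enter            = dedicated_cede_loop,
                  },
          };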
      
      In the current design, smt_snooze_delay is used as the
      target_residency of the snooze state, which is incorrect: it is not
      the minimum but the maximum duration to be spent in the snooze state.
      This results in the governor making bad decisions, since at present
      the target_residency of nap is lower than the target_residency of
      snooze in spite of nap being the deeper idle state.
      
      This patch fixes the problem by replacing the smt_snooze_delay loop
      in the snooze state with a need_resched() loop (see the sketch after
      this paragraph), since the governor is aware of the entry and exit of
      the various idle transitions, on which its next idle time prediction
      is based.
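
      A minimal sketch of the reworked snooze loop, assuming the usual
      powerpc SMT priority hints (simplified; not the exact patch):

          static int snooze_loop(struct cpuidle_device *dev,
                                 struct cpuidle_driver *drv, int index)
          {
                  local_irq_enable();
                  set_thread_flag(TIF_POLLING_NRFLAG);

                  /* Spin at low SMT priority until work arrives; no
                   * smt_snooze_delay timeout any more, since the governor
                   * now decides when nap is worth entering. */
                  while (!need_resched()) {
                          HMT_low();
                          HMT_very_low();
                  }

                  HMT_medium();
                  clear_thread_flag(TIF_POLLING_NRFLAG);
                  smp_mb();
                  return index;
          }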
      
      The governor is intelligent enough to determine which idle state
      needs to be transitioned to, and maintains a whole set of heuristics
      for this purpose, including I/O load and predictions from previous
      idle states, on which the state entry decision is based.
      
      With this fix, the target_residency of snooze is set to 0 and that of
      nap to smt_snooze_delay. If the predicted idle time is less than
      smt_snooze_delay (the target_residency of nap), the governor will
      pick the snooze state, else nap. This adheres to the previous native
      idle design.
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • cpuidle/powerpc: Fix smt_snooze_delay functionality. · 8ea959a1
      Committed by Deepthi Dharwar
      smt_snooze_delay was designed to delay the idle loop's nap entry in
      the native idle code, before it was ported over for use as part of
      the cpuidle framework.

      A negative value assigned to smt_snooze_delay should result in busy
      looping, in other words disabling entry to the nap state.
      
      	- https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-May/082450.html
      
      This particular functionality can currently be achieved by
      echo 1 > /sys/devices/system/cpu/cpu*/state1/disable
      but it is broken when one assigns a negative value to the
      smt_snooze_delay variable, either via the sysfs entry or the
      ppc64_cpu utility.

      This patch fixes this by disabling the nap state when the
      smt_snooze_delay variable is set to a negative value (a sketch of the
      check follows).
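
      A hedged sketch of the check, assuming the per-cpu smt_snooze_delay
      variable and the per-device state disable flag (the surrounding
      registration code is omitted; index 1 is assumed to be nap):

          /* Disable the nap state on this cpu's device when
           * smt_snooze_delay has been set to a negative value. */
          if (per_cpu(smt_snooze_delay, dev->cpu) < 0)
                  dev->states_usage[1].disable = 1;
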
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • cpuidle/powerpc: Fix target residency initialisation in pseries cpuidle · 817deb05
      Committed by Deepthi Dharwar
      Remove the redundant target residency initialisation in
      pseries_cpuidle_driver_init(). It currently overwrites the residency
      times set up as part of the static state table, resulting in all the
      idle states having the same, incorrect target residency of 100us.
      This may result in the menu governor making wrong state decisions.
      A sketch of the offending pattern follows.
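
      For illustration, a hedged reconstruction of the kind of redundant
      loop being removed (not the exact code):

          /* Clobbers the per-state values already present in the static
           * table with one flat residency for every state. */
          for (i = 0; i < drv->state_count; i++)
                  drv->states[i].target_residency = 100;  /* us */
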
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  7. 11 Jul, 2012 (1 commit)
    • powerpc/cpuidle: Fixes for pseries_idle hotplug notifier · 852d8cb1
      Committed by Deepthi Dharwar
      Currently the call to pseries_notify_cpuidle_add_cpu(), which takes
      action on the cpuidle front when a cpu is added or removed, is made
      from smp_xics_setup_cpu(). This caused the lockdep issues reported at
      https://lkml.org/lkml/2012/5/17/2

      On the addition of each cpu, resources were cleared and re-allocated
      every time, all in a critical section as part of the
      start_secondary() call, where interrupts are disabled. To resolve
      this issue, the pseries_notify_cpuidle_add_cpu() call is replaced by
      a hotplug notifier, which prevents cpuidle resources from being
      released and allocated each time a cpu is onlined in the critical
      code path. It was fixed in https://lkml.org/lkml/2012/5/18/174.
      
      Also, it is essential to call cpuidle_enable/disable_device between
      cpuidle_pause_and_lock() and cpuidle_resume_and_unlock() when they
      are used externally, to avoid race conditions. Add support for
      CPU_ONLINE_FROZEN and CPU_DEAD_FROZEN as part of the hotplug notify
      events for pseries_idle, and unregister the hotplug notifier on exit.
      The above-mentioned issues are fixed as part of this patch; a sketch
      of the notifier follows.
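
      A hedged sketch of such a hotplug notifier (simplified; the function
      name and the pseries_cpuidle_devices per-cpu pointer are
      illustrative):

          static int pseries_cpuidle_add_cpu_notifier(struct notifier_block *n,
                          unsigned long action, void *hcpu)
          {
                  int hotcpu = (unsigned long)hcpu;
                  struct cpuidle_device *dev =
                          per_cpu_ptr(pseries_cpuidle_devices, hotcpu);

                  switch (action) {
                  case CPU_ONLINE:
                  case CPU_ONLINE_FROZEN:
                          /* Bracket enable with pause/resume to avoid
                           * racing against the governor. */
                          cpuidle_pause_and_lock();
                          cpuidle_enable_device(dev);
                          cpuidle_resume_and_unlock();
                          break;
                  case CPU_DEAD:
                  case CPU_DEAD_FROZEN:
                          cpuidle_pause_and_lock();
                          cpuidle_disable_device(dev);
                          cpuidle_resume_and_unlock();
                          break;
                  default:
                          return NOTIFY_DONE;
                  }
                  return NOTIFY_OK;
          }
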
      Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  8. 10 Jul, 2012 (1 commit)
    • powerpc: More fixes for lazy IRQ vs. idle · be2cf20a
      Committed by Benjamin Herrenschmidt
      Looks like we still have issues with pSeries and Cell idle code
      vs. the lazy irq state. In fact, the reset fixes that went upstream
      are exposing the problem more by causing BUG_ON() to trigger (which
      this patch turns into a WARN_ON instead).
      
      We need to be careful when using a variant of low power state that
      has the side effect of turning interrupts back on: all the SW and
      lazy state must be set up to look as if everything were enabled
      before we enter the low power state with MSR:EE off, since we will
      return with MSR:EE on. If not, we have a discrepancy of state which
      can cause things to go very wrong later on.
      
      This patch moves the logic into a helper and uses it from the pseries
      and cell idle code (a sketch of the helper follows). The power4/970
      idle code already got things right (in assembly, even!) so I'm not
      touching it. The power7 "bare metal" idle code is subtly different
      and correct. That leaves PA6T and some hypervisor-based Cell
      platforms, which have questionable code in there, but they are mostly
      dead platforms so I'll fix them when I manage to get final answers
      from the respective maintainers about how the low power state
      actually works on them.
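
      A hedged sketch of what such a helper can look like, reconstructed
      from the description above (the bookkeeping details of the real
      lazy-irq code are simplified):

          bool prep_irq_for_idle(void)
          {
                  /* Hard disable first so nothing can slip in between
                   * the check below and entering the low power state. */
                  hard_irq_disable();

                  /* If an interrupt is already pending, tell the caller
                   * not to enter the low power state at all. */
                  if (lazy_irq_pending())
                          return false;

                  /* Make the SW state look fully enabled, since the low
                   * power state will return with MSR:EE on. */
                  local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
                  local_paca->soft_enabled = 1;
                  return true;
          }
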
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: stable@vger.kernel.org [v3.4]
  9. 03 Jul, 2012 (1 commit)
  10. 29 Jun, 2012 (1 commit)
    • powerpc: check_and_cede_processor() never cedes · 0b17ba72
      Committed by Anton Blanchard
      Commit f948501b ("Make hard_irq_disable() actually hard-disable
      interrupts") caused check_and_cede_processor() to stop working.
      ->irq_happened will never be zero right after a hard_irq_disable(),
      so the compiler removes the call to cede_processor() completely.
      
      The bug was introduced back in the lazy interrupt handling rework
      of 3.4 but was hidden until recently because hard_irq_disable did
      nothing.
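
      A hedged sketch of the fixed helper, with the pending-interrupt test
      factored out so the compiler can no longer prove it always true
      (simplified from the described fix; lazy_irq_pending() is assumed to
      ignore the hard-disable bit itself):

          static inline void check_and_cede_processor(void)
          {
                  /* Hard disable first, then cede only if nothing is
                   * pending besides the hard-disable bit we just set. */
                  hard_irq_disable();
                  if (!lazy_irq_pending())
                          cede_processor();
          }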
      
      This issue will eventually appear in 3.4 stable since the
      hard_irq_disable fix is marked stable, so mark this one for stable
      too.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 29 Mar, 2012 (1 commit)
  12. 09 Mar, 2012 (1 commit)
    • powerpc: Rework lazy-interrupt handling · 7230c564
      Committed by Benjamin Herrenschmidt
      The current implementation of lazy interrupt handling has some issues
      that this tries to address.

      We don't do the various workarounds we need when re-enabling
      interrupts in some cases, such as when returning from an interrupt,
      and thus we may still lose, or get delayed, decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with an
      "irq_happened" field in which we store a bit mask of which interrupts
      occurred while soft-disabled.

      When re-enabling, either via arch_local_irq_restore() or when
      returning from an interrupt, we can now decide what to do by testing
      bits in that field (a sketch of this scheme appears below).
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
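
      As an illustration, a hedged sketch of the scheme (the PACA_IRQ_*
      bit values are illustrative, and the replay helpers are invented
      stand-ins for the real exception-frame replay):

          #define PACA_IRQ_HARD_DIS  0x01  /* hard-disabled since last enable */
          #define PACA_IRQ_DBELL     0x02  /* doorbell happened */
          #define PACA_IRQ_EE        0x04  /* external interrupt happened */
          #define PACA_IRQ_DEC       0x08  /* decrementer happened */

          void arch_local_irq_restore(unsigned long en)
          {
                  local_paca->soft_enabled = en;
                  if (!en)
                          return;

                  /* Replay whatever we masked while soft-disabled. */
                  if (local_paca->irq_happened & PACA_IRQ_DEC)
                          replay_decrementer();
                  if (local_paca->irq_happened & PACA_IRQ_EE)
                          replay_external();
                  if (local_paca->irq_happened & PACA_IRQ_DBELL)
                          replay_doorbell();

                  local_paca->irq_happened = 0;
                  __hard_irq_enable();
          }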
      
      In addition, this adds a few refinements:
      
       - We no longer hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard
      interrupts enabled, which means that we'll potentially get better
      sample quality from performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like on Book3S (the necessary C code for that to work
      appears to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
  13. 08 12月, 2011 2 次提交