Commits · 1a7064380e6639e1cc5ed609ceb1ed46f3f188e3 · openeuler / raspberrypi-kernel

15 Jul, 2013 · 4 commits

cpuidle: Make cpuidle's sysfs directory dynamically allocated · 728ce22b

Authored by Daniel Lezcano on Jun 12, 2013

The cpuidle sysfs code is designed to have a single instance of the
per-CPU cpuidle directory.  It is not possible to remove the sysfs entry
and create it again.  This is not a problem with the current code, but
future changes will add CPU hotplug support to enable/disable the
device, so it will need to remove the sysfs entry like other
subsystems do.  That won't be possible without this change, because
the kobj is a static object which can't be reused for
kobject_init_and_add().

Add cpuidle_device_kobj, which is allocated dynamically when
adding/removing a sysfs entry, consistent with the other cpuidle
sysfs entries.

An added benefit is that the sysfs code is now more self-contained
and the includes needed for sysfs can be moved from cpuidle.h
directly into sysfs.c so as to reduce the total number of headers
dragged along with cpuidle.h.

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Fix white space to follow CodingStyle · f89ae89e

Authored by Daniel Lezcano on Jun 12, 2013

Fix white space in the cpuidle code to follow the rules described in
CodingStyle.

No changes in behavior should result from this.

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Check cpuidle_enable_device() return value · 10b9d3f8

Authored by Daniel Lezcano on Jun 12, 2013

We previously changed the ordering of the cpuidle framework
initialization so that the governors are registered before the
drivers which can register their devices right from the start.

Now, we can safely remove the __cpuidle_register_device() call hack
in cpuidle_enable_device() and check if the driver has been
registered before enabling it.  Then, cpuidle_register_device() can
consistently check the cpuidle_enable_device() return value when
enabling the device.
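
A minimal userspace sketch of the resulting check follows. It assumes hypothetical `idle_driver`/`idle_device` types and stub functions (`enable_device`, `register_device`) in place of the real cpuidle API. Registration now propagates the enable failure instead of relying on a later governor registration to retry. The `-EIO` error code is an assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cpuidle driver and device objects. */
struct idle_driver { const char *name; };
struct idle_device { int cpu; bool registered; bool enabled; };

/* Assumed global: the driver registered for the system, or NULL if none. */
static struct idle_driver *registered_driver;

static int enable_device(struct idle_device *dev)
{
	/* Refuse to enable a device before its driver has been registered. */
	if (registered_driver == NULL)
		return -EIO;  /* error code chosen for illustration */
	dev->enabled = true;
	return 0;
}

static int register_device(struct idle_device *dev)
{
	int ret;

	assert(dev != NULL);
	dev->registered = true;
	/* Check the return value instead of ignoring it. */
	ret = enable_device(dev);
	if (ret) {
		dev->registered = false;  /* roll back the partial registration */
		return ret;
	}
	return 0;
}
```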

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Make it clear that governors cannot be modules · 137b944e

Authored by Daniel Lezcano on Jun 12, 2013

cpuidle governors are defined as modules in the code, but the Kconfig
options do not allow them to be built as modules.  This is not really
a problem, but the cpuidle init ordering is: the cpuidle init
functions (framework and driver) and then the governors.  That leads
to some weirdness in the cpuidle framework.

Namely, cpuidle_register_device() calls cpuidle_enable_device() which
fails at the first attempt, because governors have not been registered
yet.  When a governor is registered, the framework calls
cpuidle_enable_device() again which runs __cpuidle_register_device()
only then.  Of course, for that to work, the cpuidle_enable_device()
return value has to be ignored by cpuidle_register_device().

Instead of having this cyclic call graph and relying on the positive
side effects of the hackish back-and-forth cpuidle_enable_device()
calls, it is better to fix the cpuidle init ordering.

To that end, replace the module init code with postcore_initcall()
so we have:

 * cpuidle framework : core_initcall
 * cpuidle governors : postcore_initcall
 * cpuidle drivers   : device_initcall

and remove the corresponding module exit code as it is dead anyway
(governors can't be built as modules).

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

24 Jun, 2013 · 1 commit

cpuidle: calxeda: select ARM_CPU_SUSPEND · 6eed846f

Authored by Arnd Bergmann on Apr 30, 2013

Like other ARM specific drivers, this one requires ARM_CPU_SUSPEND,
as shown by this linker error:

drivers/built-in.o: In function `calxeda_pwrdown_idle':
drivers/cpuidle/cpuidle-calxeda.c:84: undefined reference to `cpu_suspend'
drivers/cpuidle/cpuidle-calxeda.c:86: undefined reference to `cpu_resume'

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org

11 Jun, 2013 · 3 commits

cpuidle: Fix ARCH_NEEDS_CPU_IDLE_COUPLED dependency warning · b39b0981

Authored by Daniel Lezcano on Jun 11, 2013

Before commit d6f346f2 (cpuidle: improve governor Kconfig options),
the CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED option didn't depend on
CONFIG_CPU_IDLE but now it has been moved under the CPU_IDLE
menuconfig.

That raises the following warnings:

 warning: (ARCH_OMAP4 && ARCH_TEGRA_2x_SOC) selects ARCH_NEEDS_CPU_IDLE_COUPLED
 which has unmet direct dependencies (CPU_IDLE)
 warning: (ARCH_OMAP4 && ARCH_TEGRA_2x_SOC) selects ARCH_NEEDS_CPU_IDLE_COUPLED
 which has unmet direct dependencies (CPU_IDLE)

because the tegra2 and omap4 Kconfig files select this option
without checking if CPU_IDLE is set.

Fix that by moving ARCH_NEEDS_CPU_IDLE_COUPLED outside of CPU_IDLE.

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Comment the driver's framework code · 6d19cb93

Authored by Daniel Lezcano on Jun 07, 2013

Add kerneldoc (and other) comments to the cpuidle driver's framework
code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: simplify multiple driver support · 82467a5a

Authored by Daniel Lezcano on Jun 07, 2013

Commit bf4d1b5d (cpuidle: support multiple drivers) introduced support
for using multiple cpuidle drivers at the same time.  It added a
couple of new APIs to register the driver per CPU, but that led to
some unnecessary code complexity related to the kernel config options
deciding whether or not the multiple driver support is enabled.  The
code has to work as it did before when the multiple driver support is
not enabled and the multiple driver support has to be compatible with
the previously existing API.

Remove the new API, not used by any driver in the tree yet (but
needed for the HMP cpuidle drivers that will be submitted soon), and
add a new cpumask pointer to the cpuidle driver structure that will
point to the mask of CPUs handled by the given driver.  That will
allow the cpuidle_[un]register_driver() API to be used for the
multiple driver support along with the cpuidle_[un]register()
functions added recently.
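
The following is a hedged userspace sketch of the idea. All names are hypothetical (`struct idle_driver`, `register_driver`, `get_cpu_driver`), and a 32-bit mask stands in for `struct cpumask`. Each driver carries the set of CPUs it handles, so a single registration call serves both the single-driver case and the HMP case. Two behaviours are assumptions of this sketch, not of the patch: an empty mask is treated as "all CPUs", and an overlapping registration returns `-EBUSY`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8  /* assumed CPU count for the sketch */

/* Hypothetical driver: bit n of cpumask set means the driver handles CPU n. */
struct idle_driver {
	const char *name;
	uint32_t cpumask;  /* 0 is treated as "all CPUs" in this sketch */
};

/* One driver pointer per CPU. */
static struct idle_driver *cpu_driver[NR_CPUS_SKETCH];

static uint32_t effective_mask(const struct idle_driver *drv)
{
	return drv->cpumask ? drv->cpumask : (1u << NR_CPUS_SKETCH) - 1;
}

/* Same entry point for single- and multiple-driver setups. */
static int register_driver(struct idle_driver *drv)
{
	uint32_t mask = effective_mask(drv);
	int cpu;

	/* Refuse if any CPU in the mask already has a driver. */
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if ((mask & (1u << cpu)) && cpu_driver[cpu])
			return -EBUSY;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (mask & (1u << cpu))
			cpu_driver[cpu] = drv;
	return 0;
}

static struct idle_driver *get_cpu_driver(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return cpu_driver[cpu];
}
```

Consider a "little" driver registered for CPUs 0-3 and a "big" driver for CPUs 4-7. Together they fill the per-CPU table. A third driver with an empty mask then fails, because every CPU is already taken.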

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

05 Jun, 2013 · 1 commit

ARM: zynq: Add cpuidle support · bd2a337a

Authored by Michal Simek on Jun 04, 2013

Add cpuidle support for Xilinx Zynq.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

04 Jun, 2013 · 1 commit

cpuidle: improve governor Kconfig options · d6f346f2

Authored by Daniel Lezcano on May 28, 2013

Each governor is suitable for different kernel configurations: the menu
governor is better suited to a tickless system, while the ladder governor
is better suited to a system with a periodic timer tick.

The Kconfig does not allow [un]selecting a governor, so both are compiled into
the kernel, but the init order makes the menu governor the last one to be
registered, so it becomes the default. The only way to switch back to the ladder
governor is to enable the sysfs governor switch on the kernel command line.

Since nobody seems to have complained about this, the menu governor is the one
used by default on most systems. Having both governors is not really necessary
on a tickless system, but there is no config option to disable one or the other
governor.

Create a submenu for cpuidle and add a label for each governor, so we can see
the option in the menu config and enable/disable it.

The governors will be enabled depending on the CONFIG_NO_HZ option:
 - If CONFIG_NO_HZ is set, then the menu governor is selected and the ladder
   governor is optional, defaulting to 'yes'
 - If CONFIG_NO_HZ is not set, then the ladder governor is selected and the
   menu governor is optional, defaulting to 'yes'

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

30 May, 2013 · 1 commit

ARM: introduce common set_auxcr/get_auxcr functions · bbc8d77d

Authored by Rob Herring on Jan 16, 2013

Move the private set_auxcr/get_auxcr functions from
drivers/cpuidle/cpuidle-calxeda.c so they can be used across platforms.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>

27 Apr, 2013 · 1 commit

cpuidle: add maintainer entry · a8e39c35

Authored by Daniel Lezcano on Apr 26, 2013

Currently cpuidle drivers are spread across different archs.

As a result, there are several different paths for cpuidle patch
submissions: cpuidle core changes go through linux-pm, ARM driver
changes go to the arm-soc or SoC-specific trees, sh changes go
through the sh arch tree, pseries changes go through the PowerPC tree
and finally Intel changes go through Len's tree, while ACPI idle
changes go through linux-pm.

That makes it difficult to consolidate code and to propagate
modifications from the cpuidle core to the different drivers.

Hopefully, a movement has started to put the majority of cpuidle
drivers under drivers/cpuidle like cpuidle-calxeda.c and
cpuidle-kirkwood.c.

Add a maintainer entry for cpuidle to MAINTAINERS to clarify the
situation and to indicate to new cpuidle driver authors that those
drivers should not go into arch-specific directories.

The upstreaming process is unchanged: Rafael takes patches for
merging into his tree, but with an Acked-by: tag from the driver's
maintainer, so indicate in the drivers' headers who maintains them.

The arrangement will be the same as for cpufreq.

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Andrew Lunn <andrew@lunn.ch>  #for kirkwood
Acked-by: Jason Cooper <jason@lakedaemon.net> #for kirkwood
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

24 Apr, 2013 · 1 commit

cpuidle: fix comment format · 1c192d04

Authored by Daniel Lezcano on Apr 23, 2013

Fix comment format for the kernel doc script.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

23 Apr, 2013 · 4 commits

ARM: kirkwood: cpuidle: use init/exit common routine · 30dc72c6

Authored by Daniel Lezcano on Apr 23, 2013

Remove the duplicated code and use the cpuidle common code for initialization.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ARM: calxeda: cpuidle: use init/exit common routine · 0b210d96

Authored by Daniel Lezcano on Apr 23, 2013

Remove the duplicated code and use the cpuidle common code for initialization.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: make a single register function for all · 4c637b21

Authored by Daniel Lezcano on Apr 23, 2013

The usual scheme to initialize a cpuidle driver on an SMP system is:

	cpuidle_register_driver(drv);
	for_each_possible_cpu(cpu) {
		device = &per_cpu(cpuidle_dev, cpu);
		cpuidle_register_device(device);
	}

This code is duplicated in each cpuidle driver.

On UP systems, it is done this way:

	cpuidle_register_driver(drv);
	device = &per_cpu(cpuidle_dev, cpu);
	cpuidle_register_device(device);

On UP, the macro 'for_each_cpu' does one iteration:

#define for_each_cpu(cpu, mask)                 \
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

Hence, the initialization loop is the same for UP as for SMP.

Besides, we saw different bugs (misinitialization, unchecked return codes) in
the different drivers: the code is duplicated, bugs included. After fixing all
of them, it appears that the initialization pattern is the same for everyone.

Please note that some drivers do dev->state_count = drv->state_count. This is
not necessary, because it is done by the cpuidle_enable_device() function in the
cpuidle framework. This holds as long as all your devices have the same states.
Otherwise, the 'low level' API should be used instead, with driver-specific
initialization.

Let's add a wrapper function doing this initialization with a cpumask parameter
for the coupled idle states and use it for all the drivers.

That will save a lot of lines of code and consolidate the code, and future
modifications can be done in a single place. Another benefit is the
consolidation of the cpuidle_device variable, which now lives in the cpuidle
framework and is no longer spread across the different arch-specific drivers.
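
The wrapper described above can be sketched in userspace C as follows. Every name here is a hypothetical stand-in rather than the framework's actual code: `idle_register`, `register_driver`, `register_device` and the fixed `NR_CPUS_SKETCH`. The low-level calls are stubs. What the sketch shows is the shape of the wrapper:

- it registers the driver once;
- it registers one framework-owned device per CPU, passing the optional coupled-CPU mask through;
- it unwinds everything on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4  /* assumed CPU count for the sketch */

/* Hypothetical stand-ins for the cpuidle structures. */
struct idle_driver { const char *name; };
struct idle_device { int cpu; uint32_t coupled_cpus; int registered; };

/* The per-CPU devices live in the framework, not in each driver. */
static struct idle_device devices[NR_CPUS_SKETCH];
static struct idle_driver *current_driver;

/* Stubbed low-level calls (hypothetical). */
static int register_driver(struct idle_driver *drv)
{
	if (current_driver)
		return -EBUSY;
	current_driver = drv;
	return 0;
}

static void unregister_driver(struct idle_driver *drv)
{
	if (current_driver == drv)
		current_driver = NULL;
}

static int register_device(struct idle_device *dev)
{
	dev->registered = 1;
	return 0;
}

static void unregister_device(struct idle_device *dev)
{
	dev->registered = 0;
}

/* One wrapper for the "register driver, then one device per CPU" pattern.
 * coupled_cpus may be NULL when the driver has no coupled states. */
static int idle_register(struct idle_driver *drv, const uint32_t *coupled_cpus)
{
	int ret;

	assert(drv != NULL);
	ret = register_driver(drv);
	if (ret)
		return ret;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		struct idle_device *dev = &devices[cpu];

		dev->cpu = cpu;
		dev->coupled_cpus = coupled_cpus ? *coupled_cpus : 0;
		ret = register_device(dev);
		if (ret) {
			/* Undo the CPUs already registered, then the driver. */
			while (--cpu >= 0)
				unregister_device(&devices[cpu]);
			unregister_driver(drv);
			return ret;
		}
	}
	return 0;
}
```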

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: remove en_core_tk_irqen flag · 554c06ba

Authored by Daniel Lezcano on Apr 23, 2013

The en_core_tk_irqen flag is set in all the cpuidle drivers, which
means it is not necessary to specify this flag.

Remove the flag and the code related to it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>  # for mach-omap2/*
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

01 Apr, 2013 · 4 commits

cpuidle: initialize the broadcast timer framework · a06df062

Authored by Daniel Lezcano on Mar 27, 2013

Commit 89878baa73f0f1c679355006bd8632e5d78f96c2 introduced the
CPUIDLE_FLAG_TIMER_STOP flag, which specifies that an idle state
stops the local timer.

Now use this flag to check at init time whether any state needs
the broadcast timer and, if so, set up the broadcast timer
framework. That avoids duplicating this code in every driver.
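
A minimal sketch of the init-time check follows. It uses hypothetical types, and `FLAG_TIMER_STOP` is a local stand-in for CPUIDLE_FLAG_TIMER_STOP with an arbitrary value. The framework scans the driver's states once and records whether any of them needs the broadcast timer. Where a real implementation would configure broadcast mode, the sketch only stores the decision.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 8
#define FLAG_TIMER_STOP 0x1  /* hypothetical stand-in for CPUIDLE_FLAG_TIMER_STOP */

struct idle_state { const char *name; unsigned int flags; };
struct idle_driver {
	struct idle_state states[MAX_STATES];
	int state_count;
	bool bctimer;  /* set when some state needs the broadcast timer */
};

/* True if any state stops the local timer. */
static bool needs_broadcast(const struct idle_driver *drv)
{
	for (int i = 0; i < drv->state_count; i++)
		if (drv->states[i].flags & FLAG_TIMER_STOP)
			return true;
	return false;
}

/* Framework-side init: decide once, instead of in every driver. */
static void driver_init(struct idle_driver *drv)
{
	assert(drv->state_count >= 0 && drv->state_count <= MAX_STATES);
	drv->bctimer = needs_broadcast(drv);
	/* A real implementation would set up broadcast mode on the
	 * driver's CPUs here; the sketch only records the decision. */
}
```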

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: kirkwood: fix coccicheck warnings · 488540bf

Authored by Silviu-Mihai Popescu on Mar 22, 2013

Convert all uses of devm_request_and_ioremap() to the newly introduced
devm_ioremap_resource() which provides more consistent error handling.

devm_ioremap_resource() provides its own error messages so all explicit
error messages can be removed from the failure code paths.

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle / kirkwood: remove redundant Kconfig option · 9a23fe65

Authored by Daniel Lezcano on Mar 12, 2013

When the CPU_IDLE and the ARCH_KIRKWOOD options are set it is
pointless to define a new option CPU_IDLE_KIRKWOOD because it
is redundant.

The Makefile drivers directory contains a condition to compile
the cpuidle drivers:

obj-$(CONFIG_CPU_IDLE)          += cpuidle/

Hence, if CPU_IDLE is not set we won't enter this directory.

This patch removes the useless Kconfig option and replaces the
condition in the Makefile by CONFIG_ARCH_KIRKWOOD.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle : handle clockevent notify from the cpuidle framework · b60e6a0e

Authored by Daniel Lezcano on Mar 21, 2013

When a cpu enters a deep idle state, the local timers are stopped and
the time framework falls back to the timer device used as a broadcast
timer.

The different cpuidle drivers are calling clockevents_notify ENTER/EXIT
when the idle state stops the local timer.

Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
drivers. If the flag is set, the cpuidle core code takes care of the
notification on behalf of the driver to avoid pointless code duplication.
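
The division of labour can be sketched as follows, under stated assumptions. The types, `core_enter_state()` and `broadcast_notify()` are hypothetical. The notifier is a stand-in for the clockevents_notify ENTER/EXIT calls mentioned above, and it records events so that the ordering can be checked. The core wraps the driver's enter callback and sends the ENTER/EXIT notifications only for states flagged as stopping the local timer.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_TIMER_STOP 0x1  /* hypothetical stand-in for CPUIDLE_FLAG_TIMER_STOP */

enum bc_event { BC_ENTER, BC_EXIT };

struct idle_state {
	unsigned int flags;
	int (*enter)(int index);  /* driver callback; returns the entered index */
};

/* Hypothetical notifier that records events so the ordering can be checked. */
static enum bc_event events[4];
static int nevents;

static void broadcast_notify(enum bc_event ev)
{
	if (nevents < 4)
		events[nevents++] = ev;
}

/* Core-side wrapper: the driver's enter() no longer notifies by itself. */
static int core_enter_state(const struct idle_state *states, int index)
{
	bool stops_timer = (states[index].flags & FLAG_TIMER_STOP) != 0;
	int entered;

	assert(states[index].enter != 0);
	if (stops_timer)
		broadcast_notify(BC_ENTER);  /* local timer is about to stop */
	entered = states[index].enter(index);
	if (stops_timer)
		broadcast_notify(BC_EXIT);   /* local timer is running again */
	return entered;
}

/* Trivial driver callback used for the demonstration. */
static int demo_enter(int index)
{
	return index;
}
```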

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

01 Feb, 2013 · 1 commit

cpuidle: kirkwood: Move out of mach directory · 9cfc94eb

Authored by Andrew Lunn on Jan 09, 2013

Move the Kirkwood cpuidle driver out of arch/arm/mach-kirkwood and
into drivers/cpuidle. Convert the driver into a platform driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>

26 Jan, 2013 · 1 commit

PM / tracing: remove deprecated power trace API · 43720bd6

Authored by Paul Gortmaker on Jan 11, 2013

The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyway.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

15 Jan, 2013 · 1 commit

cpuidle: remove the power_specified field in the driver · 8aef33a7

Authored by Daniel Lezcano on Jan 15, 2013

We realized that the power usage field is never filled and, when it
is filled for tegra, the power_specified flag is not set, causing all
of these values to be reset when the driver is initialized with
set_power_states().

However, the power_specified flag can be simply removed under the
assumption that the states are always backward sorted, which is the
case with the current code.

This change allows the menu governor select function and
cpuidle_play_dead() to be simplified.  Moreover, the
set_power_states() function can be removed as it does not make sense
any more.

Drop the power_specified flag from struct cpuidle_driver and make
the related changes as described above.

As a consequence, this also fixes the bug where, on systems with
dynamic C-states, the power fields are not initialized.
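
Under the ordering assumption stated above, picking the lowest-power state reduces to a backward scan, and no power_specified flag is needed. Below is a hedged sketch with hypothetical types. `play_dead` mirrors the role of cpuidle_play_dead() described in the text, not its actual body, and the `-ENODEV` return is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct idle_state {
	const char *name;
	int power_usage;               /* no separate "power_specified" flag */
	int (*enter_dead)(int index);  /* NULL if unusable for an offline CPU */
};

/*
 * States are assumed ordered from shallowest (index 0) to deepest, so power
 * decreases with the index and the lowest-power usable state is simply the
 * last one that provides enter_dead.
 */
static int play_dead(const struct idle_state *states, int count)
{
	assert(count >= 0);
	for (int i = count - 1; i >= 0; i--)
		if (states[i].enter_dead)
			return states[i].enter_dead(i);
	return -ENODEV;  /* no suitable state */
}

/* Trivial callback for the demonstration: reports which state was used. */
static int demo_enter_dead(int index)
{
	return index;
}
```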

[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
References: https://lkml.org/lkml/2012/10/16/518
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

12 Jan, 2013 · 1 commit

cpuidle: fix number of initialized/destroyed states · 392370e7

Authored by Krzysztof Mazur on Jan 11, 2013

Commit bf4d1b5d (cpuidle: support multiple drivers) changed the number
of initialized state kobjects in cpuidle_add_state_sysfs() from
device->state_count to
drv->state_count, but left device->state_count in
cpuidle_remove_state_sysfs().  The values of these two fields may be
different, in which case a NULL pointer dereference may happen in
cpuidle_remove_state_sysfs(), for example.  Fix this problem by making
cpuidle_add_state_sysfs() use device->state_count too (which restores
the original behavior of it).

[rjw: Changelog]
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

03 Jan, 2013 · 3 commits

cpuidle: fix lock contention in the idle path · ac34d7c8

Authored by Daniel Lezcano on Jan 03, 2013

Commit bf4d1b5d (cpuidle: support multiple drivers) introduced
locking in cpuidle_get_cpu_driver(), which is used in the
idle_call() function.

This leads to a contention problem with a large number of CPUs,
because they all try to run the idle routine at the same time.

The lock can be safely removed because of how the cpuidle API is
used.  Namely, cpuidle_register_driver() is called first, but the
cpuidle idle function is not entered before cpuidle_register_device()
is called, because the cpuidle device is not enabled then. Moreover,
cpuidle_unregister_driver(), which would reset the driver value to
NULL, is not called before cpuidle_unregister_device().

All of the cpuidle drivers use the API in the same way.

In general, a cleanup around the lock is necessary and a proper
refcounting mechanism should be used to ensure the consistency in the
API (for example, cpuidle_unregister_driver() should fail if the
driver's refcount is not 0). However, these modifications will require
some code reorganization and rewrite which will be too intrusive for
a fix.

For this reason, fix the contention problem introduced by commit
bf4d1b5d by simply removing the locking from cpuidle_get_cpu_driver(),
which restores the original behavior of that routine.

[rjw: Changelog.]
Reported-and-tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle / coupled: fix ready counter decrement · 92638e2f

Authored by Sivaram Nair on Dec 18, 2012

The ready_waiting_counts atomic variable is compared against the wrong
online cpu count. The latter is computed incorrectly using logical-OR
instead of bit-OR. This patch fixes that.
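
A small C sketch makes the logical-OR versus bit-OR difference concrete. It assumes that the counter packs the waiting count in the low `WAITING_BITS` bits and the ready count above them. The value 16 for `WAITING_BITS` and the helper names are assumptions, not taken from the patch.

```c
#include <assert.h>

#define WAITING_BITS 16  /* assumed split: low bits waiting, high bits ready */

/* Buggy form: logical OR collapses to 0 or 1. */
static int all_ready_buggy(int online_count)
{
	return online_count || (online_count << WAITING_BITS);
}

/* Fixed form: bitwise OR packs the count into both fields. */
static int all_ready_fixed(int online_count)
{
	assert(online_count >= 0 && online_count < (1 << WAITING_BITS));
	return online_count | (online_count << WAITING_BITS);
}
```

Take 4 online CPUs as an example. The fixed expression yields 4 | (4 << 16) = 262148, whereas the logical OR yields 1, so the counter would be compared against the wrong value.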

Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Colin Cross <ccross@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Fix finding state with min power_usage · 0e5537b3

Authored by Sivaram Nair on Dec 18, 2012

Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead
of -1) to initialize the local copies, so that functions that try to find
the cpuidle state with the minimum power usage work correctly even if they
use non-negative values.
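
Here is a minimal sketch of the minimum search with the corrected initial value. `min_power_state` is a hypothetical helper, not the kernel function. Starting from `INT_MAX` means the first state always wins the first comparison. Starting from -1 instead, a strict `<` test against non-negative power values would never select anything.

```c
#include <assert.h>
#include <limits.h>

/* Index of the entry with the lowest power usage, or -1 if none qualifies. */
static int min_power_state(const int *power_usage, int count)
{
	int best = -1;
	int min_power = INT_MAX;  /* start above any realistic value */

	assert(count >= 0);
	for (int i = 0; i < count; i++) {
		if (power_usage[i] < min_power) {
			min_power = power_usage[i];
			best = i;
		}
	}
	return best;
}
```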

Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

27 Nov, 2012 · 1 commit

cpuidle: Measure idle state durations with monotonic clock · a474a515

Authored by Julius Werner on Nov 27, 2012

Many cpuidle drivers measure their time spent in an idle state by
reading the wallclock time before and after idling and calculating the
difference. This leads to erroneous results when the wallclock time gets
updated by another processor in the meantime, adding that clock
adjustment to the idle state's time counter.

If the clock adjustment was negative, the result is even worse due to an
erroneous cast from int to unsigned long long of the last_residency
variable. The negative 32-bit integer will zero-extend and result in a
forward time jump of roughly four billion microseconds (about 1.2 hours)
on the idle state residency counter.

This patch changes all affected cpuidle drivers to either use the
monotonic clock for their measurements or make use of the generic time
measurement wrapper in cpuidle.c, which was already working correctly.
Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
should always already be disabled before entering the idle function, and
not get reenabled until the generic wrapper has performed its second
measurement). It also removes the erroneous cast, making sure that
negative residency values are applied correctly even though they should
not appear anymore.
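
The arithmetic behind the zero-extension problem can be checked in a few lines of C. The helpers are hypothetical. The sketch only assumes a 32-bit `int` residency that passes through `uint32_t` on its way to a 64-bit counter, which is what zero-extension means here. A residency of -1000 then becomes 2^32 - 1000 = 4294966296. If the unit is microseconds, 2^32 µs is about 71.6 minutes. A direct int-to-unsigned-long-long conversion in C is modulo 2^64 rather than a zero-extension, so the sketch makes the 32-bit step explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extension: the 32-bit pattern is reinterpreted as unsigned first. */
static unsigned long long zero_extended(int residency)
{
	return (unsigned long long)(uint32_t)residency;
}

/* Value-preserving: keep the signed value so negatives stay negative. */
static long long sign_preserved(int residency)
{
	return (long long)residency;
}
```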

Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

23 Nov, 2012 · 1 commit

cpuidle: fix a suspicious RCU usage in menu governor · a093b93e

Authored by Li Zhong on Nov 23, 2012

I saw this suspicious RCU usage on the next tree of 11/15

[   67.123404] ===============================
[   67.123413] [ INFO: suspicious RCU usage. ]
[   67.123423] 3.7.0-rc5-next-20121115-dirty #1 Not tainted
[   67.123434] -------------------------------
[   67.123444] include/trace/events/timer.h:186 suspicious rcu_dereference_check() usage!
[   67.123458]
[   67.123458] other info that might help us debug this:
[   67.123458]
[   67.123474]
[   67.123474] RCU used illegally from idle CPU!
[   67.123474] rcu_scheduler_active = 1, debug_locks = 0
[   67.123493] RCU used illegally from extended quiescent state!
[   67.123507] 1 lock held by swapper/1/0:
[   67.123516]  #0:  (&cpu_base->lock){-.-...}, at: [<c0000000000979b0>] .__hrtimer_start_range_ns+0x28c/0x524
[   67.123555]
[   67.123555] stack backtrace:
[   67.123566] Call Trace:
[   67.123576] [c0000001e2ccb920] [c00000000001275c] .show_stack+0x78/0x184 (unreliable)
[   67.123599] [c0000001e2ccb9d0] [c0000000000c15a0] .lockdep_rcu_suspicious+0x120/0x148
[   67.123619] [c0000001e2ccba70] [c00000000009601c] .enqueue_hrtimer+0x1c0/0x1c8
[   67.123639] [c0000001e2ccbb00] [c000000000097aa0] .__hrtimer_start_range_ns+0x37c/0x524
[   67.123660] [c0000001e2ccbc20] [c0000000005c9698] .menu_select+0x508/0x5bc
[   67.123678] [c0000001e2ccbd20] [c0000000005c740c] .cpuidle_idle_call+0xa8/0x6e4
[   67.123699] [c0000001e2ccbdd0] [c0000000000459a0] .pSeries_idle+0x10/0x34
[   67.123717] [c0000001e2ccbe40] [c000000000014dc8] .cpu_idle+0x130/0x280
[   67.123738] [c0000001e2ccbee0] [c0000000006ffa8c] .start_secondary+0x378/0x384
[   67.123758] [c0000001e2ccbf90] [c00000000000936c] .start_secondary_prolog+0x10/0x14

hrtimer_start was added in 198fd638 and ae515197. The patch below tries
to use RCU_NONIDLE around it to avoid the above report.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

15 Nov, 2012 · 10 commits

cpuidle: support multiple drivers · bf4d1b5d

Authored by Daniel Lezcano on Oct 31, 2012

With the tegra3 and the big.LITTLE [1] new architectures, several CPUs
with different characteristics (latencies and states) can co-exist on the
system.

The cpuidle framework has the limitation of handling only identical cpus.

This patch removes this limitation by introducing the multiple driver support
for cpuidle.

This option is configurable at compile time and should be enabled for the
architectures mentioned above, so there is no impact on other platforms
if the option is disabled. The option defaults to 'n'. Note that the multiple
driver support is also compatible with the existing drivers: even if just one
driver is needed, all the CPUs will be tied to this driver, at the cost of a
small amount of extra memory.

The multiple driver support uses a per-CPU driver pointer instead of a global
variable, and this variable is accessed from CPU context.

To keep compatibility with the existing drivers, the functions
'cpuidle_register_driver' and 'cpuidle_unregister_driver' [un]register
the specified driver for all the CPUs.

The semantics of the output of /sys/devices/system/cpu/cpuidle/current_driver
remain the same, except that the driver name is that of the current CPU's driver.

The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added,
allowing the per-CPU driver name to be read.

[1] http://lwn.net/Articles/481055/

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: prepare the cpuidle core to handle multiple drivers · 13dd52f1

Authored by Daniel Lezcano on Oct 31, 2012

This patch is a preparation for the multiple cpuidle drivers support.

As the next patch will introduce the multiple drivers with the Kconfig
option and we want to keep the code clean and understandable, this patch
defines a set of functions for encapsulating some common parts and splits
what should be done under a lock from the rest.

[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: move driver checking within the lock section · 41682032

Authored by Daniel Lezcano on Oct 31, 2012

The code is racy and the check with cpuidle_curr_driver should be
done under the lock.

I can't find a path in the different drivers where that could happen,
because the arch-specific drivers are written in such a way that it is not
possible to register a driver while another one is being unregistered, except
maybe in a very improbable case where "intel_idle" and "processor_idle" are
competing: one could unregister a driver while the other one is
registering.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: move driver's refcount to cpuidle · 42f67f2a

Authored by Daniel Lezcano on Oct 31, 2012

We want to support different cpuidle drivers co-existing together.
In this case we should move the refcount to the cpuidle_driver
structure to handle several drivers at a time.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: fixup device.h header in cpuidle.h · 8f3e9953

Authored by Daniel Lezcano on Oct 31, 2012

The "struct device" is only used in sysfs.c.

The other .c files including the private header "cpuidle.h"
do not need to pull the entire headers tree from there as they
don't manipulate the "struct device".

This patch fixes this by moving the header inclusion to sysfs.c
and adding a forward declaration for the struct device.

The number of lines generated by the preprocessor:
Without this patch : 17269 loc
With this patch : 16446 loc

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle / sysfs: move structure declaration into the sysfs.c file · 349631e0

Authored by Daniel Lezcano on Oct 31, 2012

The structure cpuidle_state_kobj is not used anywhere except
in the sysfs.c file. The definition of this structure is not
needed in the cpuidle header file. This patch moves it to the
sysfs.c file in order to encapsulate the code a bit more.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Get typical recent sleep interval · c96ca4fb

Authored by Youquan Song on Oct 26, 2012

The function detect_repeating_patterns was not very useful for
workloads with alternating long and short pauses, for example
virtual machines handling network requests for each other (say
a web and database server).

Instead, try to find a recent sleep interval that is somewhere
between the median and the mode sleep time, by discarding outliers
to the up side and recalculating the average and standard deviation
until that is no longer required.

This should do something sane with a sleep interval series like:

	200 180 210 10000 30 1000 170 200

The current code would simply discard such a series, while the
new code will guess a typical sleep interval just shy of 200.
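
The following is a hedged sketch of the outlier-discarding estimate described above. It assumes eight samples and non-negative values. Its acceptance thresholds were chosen for illustration:

- a standard deviation of at most 20, or
- an average above six times the standard deviation, while at least 3/4 of the samples remain.

`typical_interval` and `isqrt_u64` are hypothetical names, and this is not presented as the patch's exact code.

```c
#include <assert.h>
#include <stdint.h>

#define INTERVALS 8

/* Integer square root by simple search (stand-in for the kernel's int_sqrt). */
static uint64_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Repeatedly discard the largest samples (upside outliers) and recompute the
 * average and standard deviation. Returns the average of the kept samples,
 * or -1 if no stable estimate is found. Samples are assumed non-negative.
 */
static int64_t typical_interval(const int64_t iv[INTERVALS])
{
	int64_t thresh = INT64_MAX;  /* samples above this are ignored */

	for (;;) {
		int64_t sum = 0, max = 0, avg, var = 0, stddev;
		int n = 0;

		for (int i = 0; i < INTERVALS; i++) {
			if (iv[i] <= thresh) {
				sum += iv[i];
				n++;
				if (iv[i] > max)
					max = iv[i];
			}
		}
		assert(n > 0);
		avg = sum / n;

		for (int i = 0; i < INTERVALS; i++) {
			if (iv[i] <= thresh) {
				int64_t d = iv[i] - avg;
				var += d * d;
			}
		}
		stddev = (int64_t)isqrt_u64((uint64_t)(var / n));

		/* Accept when the spread is small, absolutely or relative to avg. */
		if (stddev <= 20 || (avg > 6 * stddev && n * 4 >= INTERVALS * 3))
			return avg;
		/* Stop once only the bottom 3/4 of the samples remain. */
		if (n * 4 <= INTERVALS * 3)
			return -1;
		thresh = max - 1;  /* drop the largest remaining value(s) */
	}
}
```

Consider a hypothetical series such as 200 180 210 10000 190 1000 170 200. The sketch discards 10000 and then 1000, and settles on 191. On the series quoted above, however, the low 30 sample keeps the spread too wide for these assumed thresholds, and the sketch gives up (returns -1). Treat the thresholds as placeholders rather than the patch's exact values.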

The original patch came from Rik van Riel <riel@redhat.com>.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Set residency to 0 if target Cstate not enter · d73d68dc

Committed by Youquan Song on Oct 26, 2012

When the cpuidle governor chooses a C-state for an idle CPU but then
notices that there are tasks waiting to run, the idle CPU does not
actually enter the target C-state and goes on to run the tasks instead.

In this situation, the governor uses the residency of the C-state that
was last actually entered, which is obviously not reasonable.

So, this patch fixes that by setting the target C-state residency to 0.
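
A minimal sketch of the fix, assuming a simplified per-CPU record: struct idle_stats and record_idle_exit() are hypothetical names, not the kernel's struct cpuidle_device or its functions. When the CPU bails out before entering the chosen C-state, a residency of 0 is stored instead of leaving the value from the last state that was actually entered.

```c
#include <stdbool.h>

/* Hypothetical per-CPU idle bookkeeping (not struct cpuidle_device). */
struct idle_stats {
	unsigned int last_residency_us;	/* time spent in the last idle period */
};

/*
 * Hypothetical hook called after an idle attempt. "entered" is false when
 * pending work made the CPU skip the C-state it had chosen.
 */
static void record_idle_exit(struct idle_stats *st, bool entered,
			     unsigned int measured_us)
{
	/* Never keep a stale residency from an earlier, real C-state entry. */
	st->last_residency_us = entered ? measured_us : 0;
}
```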

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Quickly notice prediction failure in general case · e11538d1

Committed by Youquan Song on Oct 26, 2012

Predicting the future is difficult, and when the cpuidle governor's
prediction fails, the governor may choose a shallower C-state than it
should. Noticing and detecting such failures quickly is therefore
important for power saving.

This patch extends failure detection to the general case in which the
prediction logic produces a small predicted residency, so a shallow
C-state is chosen even though the expected residency is large. If the
prediction is wrong, the CPU stays in the shallow C-state for a long
time, although it actually had the chance to enter a deep C-state.
So, when the expected residency is long enough but the governor chooses
a shallow C-state, a timer is added in order to detect a prediction
failure.

When the CPU is woken up from the C-state before the timer expires, the
timer is cancelled proactively. When the timer fires, the menu governor
quickly notices the prediction failure and re-evaluates whether a deeper
C-state can be used.
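
The decision of whether to add such a timer might be sketched as follows. Everything here is an assumption for illustration, not the menu governor's logic: failure_timer_us() is a hypothetical helper, the timeout is taken as twice the predicted residency, and the timer is armed only when, after it fires, the time left until the next known event still covers the deepest state's target residency.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the timeout (us) for a prediction-failure
 * timer, or 0 when no timer should be armed.
 *   chosen, deepest - indices of the chosen and the deepest usable C-state
 *   expected_us     - time until the next known timer event
 *   predicted_us    - the governor's (possibly too small) prediction
 *   deep_target_us  - target residency of the deepest C-state
 */
static uint64_t failure_timer_us(int chosen, int deepest, uint64_t expected_us,
				 uint64_t predicted_us, uint64_t deep_target_us)
{
	uint64_t timeout;

	/* Nothing to gain if the chosen state is already the deepest one. */
	if (chosen >= deepest)
		return 0;

	/* Assumed: fire at twice the prediction, so a correct prediction
	 * normally wakes the CPU (and cancels the timer) first. */
	timeout = predicted_us ? 2 * predicted_us : 1;

	/* Arm only if a deep state could still pay off after the timer fires. */
	if (timeout + deep_target_us > expected_us)
		return 0;
	return timeout;
}
```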

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea

Committed by Youquan Song on Oct 26, 2012

Predicting the future is difficult, and when the cpuidle governor's
prediction fails, the governor may choose a shallower C-state than it
should. Noticing and detecting such failures quickly is therefore
important for power saving.

The cpuidle menu governor has a method to detect a repeating pattern:
if the last 8 consecutive C-state residencies are the same or very
close, it predicts that the next C-state residency will be about the
same.

A real-world case is the turbostat utility (tools/power/x86/turbostat)
in kernel 3.3 or earlier. On Sandybridge, turbostat reads 10 registers
one by one, which generates 10 IPIs that wake up idle CPUs. The cpuidle
menu governor therefore predicts a repeat mode in which another IPI will
wake up the idle CPU soon, so it keeps the idle CPU in C1 even though
the CPU is completely idle. However, after reading the 10 registers,
turbostat sleeps for 5 seconds by default, so the idle CPU stays in C1
for a long time until a break event occurs.

On an idle Sandybridge system running "./turbostat -v", the deep C-state
residency fluctuates between 70% and 99%. With the patched kernel, it
stays above 99.98%.

In this patch, a timer is added when the menu governor detects a repeat
mode and chooses a shallow C-state. The timer is set to a timeout value
greater than the predicted time, and the repeat-mode prediction is
considered to have failed if the timer fires. When the repeat mode
happens as expected, the timer does not fire: the CPU is woken up from
the C-state and cancels the timer proactively. When the repeat mode does
not happen, the timer expires, and the menu governor quickly notices
that the repeat-mode prediction has failed and re-evaluates whether
deeper C-states can be used.
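
The timer lifecycle described above can be modelled without real timers. In this hypothetical sketch, struct repeat_watch, watch_arm(), watch_wakeup() and watch_expired() are invented names and time is passed in explicitly: a wakeup before the deadline cancels the timer and keeps the repeat-mode prediction, while reaching the deadline marks the prediction as failed so that the next C-state selection can consider deeper states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the repeat-mode timer; all times are in us. */
struct repeat_watch {
	bool armed;
	uint64_t deadline;	/* when the timer would fire */
	bool prediction_failed;	/* consulted at the next C-state selection */
};

/* Arm with a timeout chosen to be longer than the predicted interval. */
static void watch_arm(struct repeat_watch *w, uint64_t now, uint64_t timeout)
{
	w->armed = true;
	w->deadline = now + timeout;
	w->prediction_failed = false;
}

/* The CPU woke up (e.g. the expected IPI arrived) before the deadline. */
static void watch_wakeup(struct repeat_watch *w, uint64_t now)
{
	if (w->armed && now < w->deadline)
		w->armed = false;	/* repeat mode held: cancel the timer */
}

/* Timer check: still armed at the deadline means the prediction failed. */
static void watch_expired(struct repeat_watch *w, uint64_t now)
{
	if (w->armed && now >= w->deadline) {
		w->armed = false;
		w->prediction_failed = true;	/* re-evaluate deeper C-states */
	}
}
```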

Below is another case that clearly shows the benefit of the patch:

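/*
 * idle_predict: each thread sleeps briefly "loop - 1" times, then sleeps
 * for 1 second, to provoke a wrong repeat-mode prediction.
 * Build, e.g.: gcc -O2 -pthread -o idle_predict idle_predict.c
 */
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <getopt.h>
#include <pthread.h>

#define MAX_THREADS 1024

/* Set from the SIGINT/SIGTERM handler; polled by every worker thread. */
static volatile sig_atomic_t shutdown_req;
static int delay = 20;	/* short sleep in microseconds */
static int loop = 8;	/* every loop-th sleep is a long (1 s) one */

static void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help	-h  Print this help\n"
		"  --thread	-n  Thread number\n"
		"  --loop	-l  Loop times in shallow Cstate\n"
		"  --delay	-t  Sleep time (us) in shallow Cstate\n");
}

/*
 * Sleep briefly loop - 1 times (a repeating pattern the governor learns),
 * then sleep for 1 second, during which a deep C-state would pay off.
 */
static void *simple_loop(void *arg)
{
	int idle_num = 1;

	(void)arg;
	while (!shutdown_req) {
		if (idle_num % loop) {
			usleep(delay);
		} else {
			sleep(1);	/* usleep() need not accept 1000000 */
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	(void)sig;
	shutdown_req = 1;
}

int main(int argc, char *argv[])
{
	static const struct option longopts[] = {
		{ "help",   no_argument,       NULL, 'h' },
		{ "thread", required_argument, NULL, 'n' },
		{ "loop",   required_argument, NULL, 'l' },
		{ "delay",  required_argument, NULL, 't' },
		{ NULL, 0, NULL, 0 }
	};
	sigset_t sigset;
	pthread_t pt[MAX_THREADS];
	int i, c, thread_num = 8;

	while ((c = getopt_long(argc, argv, "n:l:t:h", longopts, NULL)) != -1) {
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}
	}

	/* Reject values that would overflow pt[] or divide by zero. */
	if (thread_num < 1 || thread_num > MAX_THREADS || loop < 1 || delay < 0) {
		usage();
		exit(1);
	}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);

	/* Keep SIGALRM blocked as before; stop cleanly on SIGINT/SIGTERM. */
	sigemptyset(&sigset);
	sigaddset(&sigset, SIGALRM);
	sigprocmask(SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++) {
		if (pthread_create(&pt[i], NULL, simple_loop, NULL) != 0) {
			fprintf(stderr, "pthread_create failed\n");
			exit(1);
		}
	}

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	return 0;
}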

Get powertop V2 from git://github.com/fenrus75/powertop and build it.
Build the above test application, then run it.
The test platform can be Intel Sandybridge or another recent platform.
# ./idle_predict -l 10 &
# ./powertop

We will find that the deep C-state residency fluctuates between 40% and
100%, and much time is spent in the C1 state. This is because the menu
governor wrongly predicts that the repeat mode continues, so it chooses
the shallow C1 state even though the CPU has the chance to sleep for 1
second in a deep C-state.

With the patched kernel, the deep C-state residency stays above 99.6%.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
