提交 · bd2a337a25dd22bcd6b3fb1f99461f6991773e68 · openeuler / Kernel

24 4月, 2013 1 次提交

由 Daniel Lezcano 提交于 4月 23, 2013

Fix comment format for the kernel doc script.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

1c192d04

23 4月, 2013 2 次提交

cpuidle: make a single register function for all · 4c637b21

由 Daniel Lezcano 提交于 4月 23, 2013

The usual scheme to initialize a cpuidle driver on a SMP is:

	cpuidle_register_driver(drv);
	for_each_possible_cpu(cpu) {
		device = &per_cpu(cpuidle_dev, cpu);
		cpuidle_register_device(device);
	}

This code is duplicated in each cpuidle driver.

On UP systems, it is done this way:

	cpuidle_register_driver(drv);
	device = &per_cpu(cpuidle_dev, cpu);
	cpuidle_register_device(device);

On UP, the macro 'for_each_cpu' does one iteration:

#define for_each_cpu(cpu, mask)                 \
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

Hence, the initialization loop is the same for UP than SMP.

Beside, we saw different bugs / mis-initialization / return code unchecked in
the different drivers, the code is duplicated including bugs. After fixing all
these ones, it appears the initialization pattern is the same for everyone.

Please note, some drivers are doing dev->state_count = drv->state_count. This is
not necessary because it is done by the cpuidle_enable_device function in the
cpuidle framework. This is true, until you have the same states for all your
devices. Otherwise, the 'low level' API should be used instead with the specific
initialization for the driver.

Let's add a wrapper function doing this initialization with a cpumask parameter
for the coupled idle states and use it for all the drivers.

That will save a lot of LOC, consolidate the code, and the modifications in the
future could be done in a single place. Another benefit is the consolidation of
the cpuidle_device variable which is now in the cpuidle framework and no longer
spread accross the different arch specific drivers.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4c637b21

cpuidle: remove en_core_tk_irqen flag · 554c06ba

由 Daniel Lezcano 提交于 4月 23, 2013

The en_core_tk_irqen flag is set in all the cpuidle driver which
means it is not necessary to specify this flag.

Remove the flag and the code related to it.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>  # for mach-omap2/*
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

554c06ba

01 4月, 2013 1 次提交

cpuidle : handle clockevent notify from the cpuidle framework · b60e6a0e

由 Daniel Lezcano 提交于 3月 21, 2013

When a cpu enters a deep idle state, the local timers are stopped and
the time framework falls back to the timer device used as a broadcast
timer.

The different cpuidle drivers are calling clockevents_notify ENTER/EXIT
when the idle state stops the local timer.

Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
drivers. If the flag is set, the cpuidle core code takes care of the
notification on behalf of the driver to avoid pointless code duplication.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

b60e6a0e

26 1月, 2013 1 次提交

PM / tracing: remove deprecated power trace API · 43720bd6

由 Paul Gortmaker 提交于 1月 11, 2013

The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

43720bd6

15 1月, 2013 1 次提交

cpuidle: remove the power_specified field in the driver · 8aef33a7

由 Daniel Lezcano 提交于 1月 15, 2013

We realized that the power usage field is never filled and when it
is filled for tegra, the power_specified flag is not set causing all
of these values to be reset when the driver is initialized with
set_power_state().

However, the power_specified flag can be simply removed under the
assumption that the states are always backward sorted, which is the
case with the current code.

This change allows the menu governor select function and the
cpuidle_play_dead() to be simplified.  Moreover, the
set_power_states() function can removed as it does not make sense
any more.

Drop the power_specified flag from struct cpuidle_driver and make
the related changes as described above.

As a consequence, this also fixes the bug where on the dynamic
C-states system, the power fields are not initialized.

[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
References: https://lkml.org/lkml/2012/10/16/518Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8aef33a7

03 1月, 2013 1 次提交

cpuidle: Fix finding state with min power_usage · 0e5537b3

由 Sivaram Nair 提交于 12月 18, 2012

Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead
of -1) to init the local copies so that functions that tries to find
cpuidle states with minimum power usage works correctly even if they use
non-negative values.
Signed-off-by: NSivaram Nair <sivaramn@nvidia.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

0e5537b3

27 11月, 2012 1 次提交

cpuidle: Measure idle state durations with monotonic clock · a474a515

由 Julius Werner 提交于 11月 27, 2012

Many cpuidle drivers measure their time spent in an idle state by
reading the wallclock time before and after idling and calculating the
difference. This leads to erroneous results when the wallclock time gets
updated by another processor in the meantime, adding that clock
adjustment to the idle state's time counter.

If the clock adjustment was negative, the result is even worse due to an
erroneous cast from int to unsigned long long of the last_residency
variable. The negative 32 bit integer will zero-extend and result in a
forward time jump of roughly four billion milliseconds or 1.3 hours on
the idle state residency counter.

This patch changes all affected cpuidle drivers to either use the
monotonic clock for their measurements or make use of the generic time
measurement wrapper in cpuidle.c, which was already working correctly.
Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
should always already be disabled before entering the idle function, and
not get reenabled until the generic wrapper has performed its second
measurement). It also removes the erroneous cast, making sure that
negative residency values are applied correctly even though they should
not appear anymore.
Signed-off-by: NJulius Werner <jwerner@chromium.org>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a474a515

15 11月, 2012 4 次提交

cpuidle: support multiple drivers · bf4d1b5d

由 Daniel Lezcano 提交于 10月 31, 2012

With the tegra3 and the big.LITTLE [1] new architectures, several cpus
with different characteristics (latencies and states) can co-exists on the
system.

The cpuidle framework has the limitation of handling only identical cpus.

This patch removes this limitation by introducing the multiple driver support
for cpuidle.

This option is configurable at compile time and should be enabled for the
architectures mentioned above. So there is no impact for the other platforms
if the option is disabled. The option defaults to 'n'. Note the multiple drivers
support is also compatible with the existing drivers, even if just one driver is
needed, all the cpu will be tied to this driver using an extra small chunk of
processor memory.

The multiple driver support use a per-cpu driver pointer instead of a global
variable and the accessor to this variable are done from a cpu context.

In order to keep the compatibility with the existing drivers, the function
'cpuidle_register_driver' and 'cpuidle_unregister_driver' will register
the specified driver for all the cpus.

The semantic for the output of /sys/devices/system/cpu/cpuidle/current_driver
remains the same except the driver name will be related to the current cpu.

The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added
allowing to read the per cpu driver name.

[1] http://lwn.net/Articles/481055/Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NPeter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

bf4d1b5d

cpuidle: Set residency to 0 if target Cstate not enter · d73d68dc

由 Youquan Song 提交于 10月 26, 2012

When cpuidle governor choose a C-state to enter for idle CPU, but it notice that
there is tasks request to be executed. So the idle CPU will not really enter
the target C-state and go to run task.

In this situation, it will use the residency of previous really entered target
C-states. Obviously, it is not reasonable.

So, this patch fix it by set the target C-state residency to 0.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NYouquan Song <youquan.song@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d73d68dc

cpuidle / sysfs: move kobj initialization in the syfs file · e45a00d6

由 Daniel Lezcano 提交于 10月 26, 2012

Move the kobj initialization and completion in the sysfs.c
and encapsulate the code more.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e45a00d6

cpuidle / sysfs: change function parameter · 1aef40e2

由 Daniel Lezcano 提交于 10月 26, 2012

The function needs the cpuidle_device which is initially passed to the
caller.

The current code gets the struct device from the struct cpuidle_device,
pass it the cpuidle_add_sysfs function. This function calls
per_cpu(cpuidle_devices, cpu) to get the cpuidle_device.

This patch pass the cpuidle_device instead and simplify the code.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

1aef40e2

09 10月, 2012 1 次提交

ACPI idle, CPU hotplug: Fix NULL pointer dereference during hotplug · cf31cd1a

由 Srivatsa S. Bhat 提交于 10月 08, 2012

On a KVM guest, when a CPU is taken offline and brought back online, we hit
the following NULL pointer dereference:

[   45.400843] Unregister pv shared memory for cpu 1
[   45.412331] smpboot: CPU 1 is now offline
[   45.529894] SMP alternatives: lockdep: fixing up alternatives
[   45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock
[   45.571370] KVM setup async PF for cpu 1
[   45.572331] kvm-stealtime: cpu 1, msr 7d0e040
[   45.575031] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   45.576017] IP: [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017] PGD 5dfb067 PUD 5da8067 PMD 0
[   45.576017] Oops: 0000 [#1] SMP
[   45.576017] Modules linked in:
[   45.576017] CPU 0
[   45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs
[   45.576017] RIP: 0010:[<ffffffff81519f98>]  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017] RSP: 0018:ffff880005d93ce8  EFLAGS: 00010286
[   45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006
[   45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000
[   45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222
[   45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001
[   45.576017] FS:  00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
[   45.576017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0
[   45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40)
[   45.576017] Stack:
[   45.576017]  ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689
[   45.576017]  ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01
[   45.576017]  ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd
[   45.576017] Call Trace:
[   45.576017]  [<ffffffff813ac689>] acpi_processor_hotplug+0x55/0x97
[   45.576017]  [<ffffffff813aa498>] acpi_cpu_soft_notify+0x93/0xce
[   45.576017]  [<ffffffff816ae47d>] notifier_call_chain+0x5d/0x110
[   45.576017]  [<ffffffff8109730e>] __raw_notifier_call_chain+0xe/0x10
[   45.576017]  [<ffffffff81069050>] __cpu_notify+0x20/0x40
[   45.576017]  [<ffffffff81069085>] cpu_notify+0x15/0x20
[   45.576017]  [<ffffffff816978f1>] _cpu_up+0xee/0x137
[   45.576017]  [<ffffffff81697983>] cpu_up+0x49/0x59
[   45.576017]  [<ffffffff8168758d>] store_online+0x9d/0xe0
[   45.576017]  [<ffffffff8140a9f8>] dev_attr_store+0x18/0x30
[   45.576017]  [<ffffffff812322c0>] sysfs_write_file+0xe0/0x150
[   45.576017]  [<ffffffff811b389c>] vfs_write+0xac/0x180
[   45.576017]  [<ffffffff811b3be2>] sys_write+0x52/0xa0
[   45.576017]  [<ffffffff816b31e9>] system_call_fastpath+0x16/0x1b
[   45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb <f6> 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00
[   45.576017] RIP  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017]  RSP <ffff880005d93ce8>
[   45.576017] CR2: 0000000000000000
[   45.656079] ---[ end trace 433d6c9ac0b02cef ]---

Analysis:
Commit 3d339dcb (cpuidle / ACPI : move cpuidle_device field out of the
acpi_processor_power structure()) made the allocation of the dev structure
(struct cpuidle) of a CPU dynamic, whereas previously it was statically
allocated. And this dynamic allocation occurs in acpi_processor_power_init()
if pr->flags.power evaluates to non-zero.

On KVM guests, pr->flags.power evaluates to zero, hence dev is never
allocated. This causes the NULL pointer (dev) dereference in
cpuidle_disable_device() during a subsequent CPU online operation. Fix this
by ensuring that dev is non-NULL before dereferencing.
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

cf31cd1a

11 7月, 2012 1 次提交

PM / cpuidle: System resume hang fix with cpuidle · 8651f97b

由 Preeti U Murthy 提交于 7月 09, 2012

On certain bios, resume hangs if cpus are allowed to enter idle states
during suspend [1].

This was fixed in apci idle driver [2].But intel_idle driver does not
have this fix. Thus instead of replicating the fix in both the idle
drivers, or in more platform specific idle drivers if needed, the
more general cpuidle infrastructure could handle this.

A suspend callback in cpuidle_driver could handle this fix. But
a cpuidle_driver provides only basic functionalities like platform idle
state detection capability and mechanisms to support entry and exit
into CPU idle states. All other cpuidle functions are found in the
cpuidle generic infrastructure for good reason that all cpuidle
drivers, irrepective of their platforms will support these functions.

One option therefore would be to register a suspend callback in cpuidle
which handles this fix. This could be called through a PM_SUSPEND_PREPARE
notifier. But this is too generic a notfier for a driver to handle.

Also, ideally the job of cpuidle is not to handle side effects of suspend.
It should expose the interfaces which "handle cpuidle 'during' suspend"
or any other operation, which the subsystems call during that respective
operation.

The fix demands that during suspend, no cpus should be allowed to enter
deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
ensures that. Not just that it also kicks all the cpus which are already
in idle out of their idle states which was being done during cpu hotplug
through a CPU_DYING_FROZEN callbacks.

Now the question arises about when during suspend should
cpuidle_uninstall_idle_handler() be called. Since we are dealing with
drivers it seems best to call this function during dpm_suspend().
Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
before cpu_hotplug_begin() to avoid race conditions with cpu hotpulg
operations. In dpm_suspend_noirq(), it would be wise to place this call
before suspend_device_irqs() to avoid ugly interactions with the same.

Ananlogously, during resume.

References:
[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
[2] http://marc.info/?l=linux-pm&m=133958534231884&w=2Reported-and-tested-by: NDave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

8651f97b

04 7月, 2012 2 次提交

PM / Domains: Add preliminary support for cpuidle, v2 · cbc9ef02

由 Rafael J. Wysocki 提交于 7月 03, 2012

On some systems there are CPU cores located in the same power
domains as I/O devices.  Then, power can only be removed from the
domain if all I/O devices in it are not in use and the CPU core
is idle.  Add preliminary support for that to the generic PM domains
framework.

First, the platform is expected to provide a cpuidle driver with one
extra state designated for use with the generic PM domains code.
This state should be initially disabled and its exit_latency value
should be set to whatever time is needed to bring up the CPU core
itself after restoring power to it, not including the domain's
power on latency.  Its .enter() callback should point to a procedure
that will remove power from the domain containing the CPU core at
the end of the CPU power transition.

The remaining characteristics of the extra cpuidle state, referred to
as the "domain" cpuidle state below, (e.g. power usage, target
residency) should be populated in accordance with the properties of
the hardware.

Next, the platform should execute genpd_attach_cpuidle() on the PM
domain containing the CPU core.  That will cause the generic PM
domains framework to treat that domain in a special way such that:

 * When all devices in the domain have been suspended and it is about
   to be turned off, the states of the devices will be saved, but
   power will not be removed from the domain.  Instead, the "domain"
   cpuidle state will be enabled so that power can be removed from
   the domain when the CPU core is idle and the state has been chosen
   as the target by the cpuidle governor.

 * When the first I/O device in the domain is resumed and
   __pm_genpd_poweron(() is called for the first time after
   power has been removed from the domain, the "domain" cpuidle
   state will be disabled to avoid subsequent surprise power removals
   via cpuidle.

The effective exit_latency value of the "domain" cpuidle state
depends on the time needed to bring up the CPU core itself after
restoring power to it as well as on the power on latency of the
domain containing the CPU core.  Thus the "domain" cpuidle state's
exit_latency has to be recomputed every time the domain's power on
latency is updated, which may happen every time power is restored
to the domain, if the measured power on latency is greater than
the latency stored in the corresponding generic_pm_domain structure.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: NKevin Hilman <khilman@ti.com>

cbc9ef02

cpuidle: move field disable from per-driver to per-cpu · dc7fd275

由 ShuoX Liu 提交于 7月 03, 2012

Andrew J.Schorr raises a question.  When he changes the disable setting on
a single CPU, it affects all the other CPUs.  Basically, currently, the
disable field is per-driver instead of per-cpu.  All the C states of the
same driver are shared by all CPU in the same machine.

The patch changes the `disable' field to per-cpu, so we could set this
separately for each cpu.
Signed-off-by: NShuoX Liu <shuox.liu@intel.com>
Reported-by: NAndrew J.Schorr <aschorr@telemetry-investments.com>
Reviewed-by: NYanmin Zhang <yanmin_zhang@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

dc7fd275

02 6月, 2012 5 次提交

cpuidle: add support for states that affect multiple cpus · 4126c019

由 Colin Cross 提交于 5月 07, 2012

On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
cpus cannot be independently powered down, either due to
sequencing restrictions (on Tegra 2, cpu 0 must be the last to
power down), or due to HW bugs (on OMAP4460, a cpu powering up
will corrupt the gic state unless the other cpu runs a work
around).  Each cpu has a power state that it can enter without
coordinating with the other cpu (usually Wait For Interrupt, or
WFI), and one or more "coupled" power states that affect blocks
shared between the cpus (L2 cache, interrupt controller, and
sometimes the whole SoC).  Entering a coupled power state must
be tightly controlled on both cpus.

The easiest solution to implementing coupled cpu power states is
to hotplug all but one cpu whenever possible, usually using a
cpufreq governor that looks at cpu load to determine when to
enable the secondary cpus.  This causes problems, as hotplug is an
expensive operation, so the number of hotplug transitions must be
minimized, leading to very slow response to loads, often on the
order of seconds.

This file implements an alternative solution, where each cpu will
wait in the WFI state until all cpus are ready to enter a coupled
state, at which point the coupled state function will be called
on all cpus at approximately the same time.

Once all cpus are ready to enter idle, they are woken by an smp
cross call.  At this point, there is a chance that one of the
cpus will find work to do, and choose not to enter idle.  A
final pass is needed to guarantee that all cpus will call the
power state enter function at the same time.  During this pass,
each cpu will increment the ready counter, and continue once the
ready counter matches the number of online coupled cpus.  If any
cpu exits idle, the other cpus will decrement their counter and
retry.

To use coupled cpuidle states, a cpuidle driver must:

   Set struct cpuidle_device.coupled_cpus to the mask of all
   coupled cpus, usually the same as cpu_possible_mask if all cpus
   are part of the same cluster.  The coupled_cpus mask must be
   set in the struct cpuidle_device for each cpu.

   Set struct cpuidle_device.safe_state to a state that is not a
   coupled state.  This is usually WFI.

   Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
   state that affects multiple cpus.

   Provide a struct cpuidle_state.enter function for each state
   that affects multiple cpus.  This function is guaranteed to be
   called on all cpus at approximately the same time.  The driver
   should ensure that the cpus all abort together if any cpu tries
   to abort once the function is called.

update1:

cpuidle: coupled: fix count of online cpus

online_count was never incremented on boot, and was also counting
cpus that were not part of the coupled set.  Fix both issues by
introducting a new function that counts online coupled cpus, and
call it from register as well as the hotplug notifier.

update2:

cpuidle: coupled: fix decrementing ready count

cpuidle_coupled_set_not_ready sometimes refuses to decrement the
ready count in order to prevent a race condition.  This makes it
unsuitable for use when finished with idle.  Add a new function
cpuidle_coupled_set_done that decrements both the ready count and
waiting count, and call it after idle is complete.

Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Tested-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NColin Cross <ccross@android.com>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLen Brown <len.brown@intel.com>

4126c019

cpuidle: fix error handling in __cpuidle_register_device · 3af272ab

由 Colin Cross 提交于 5月 07, 2012

Fix the error handling in __cpuidle_register_device to include
the missing list_del.  Move it to a label, which will simplify
the error handling when coupled states are added.
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Tested-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NColin Cross <ccross@android.com>
Reviewed-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLen Brown <len.brown@intel.com>

3af272ab

cpuidle: refactor out cpuidle_enter_state · 56cfbf74

由 Colin Cross 提交于 5月 07, 2012

Split the code to enter a state and update the stats into a helper
function, cpuidle_enter_state, and export it.  This function will
be called by the coupled state code to handle entering the safe
state and the final coupled state.
Reviewed-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Tested-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NColin Cross <ccross@android.com>
Reviewed-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLen Brown <len.brown@intel.com>

56cfbf74

cpuidle: add checks to avoid NULL pointer dereference · 1b0a0e9a

由 Srivatsa S. Bhat 提交于 5月 04, 2012

The existing check for dev == NULL in __cpuidle_register_device() is
rendered useless because dev is dereferenced before the check itself.
Moreover, correctly speaking, it is the job of the callers of this
function, i.e., cpuidle_register_device() & cpuidle_enable_device() (which
also happen to be exported functions) to ensure that
__cpuidle_register_device() is called with a non-NULL dev.

So add the necessary dev == NULL checks in the two callers and remove the
(useless) check from __cpuidle_register_device().
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

1b0a0e9a

cpuidle: remove unused hrtimer_peek_ahead_timers() call · 0aeb9cac

由 Sergey Senozhatsky 提交于 5月 04, 2012

  commit 9a655837
  Author: Arjan van de Ven <arjan@linux.intel.com>
  Date:   Sun Nov 9 12:45:10 2008 -0800

     regression: disable timer peek-ahead for 2.6.28

     It's showing up as regressions; disabling it very likely just papers
     over an underlying issue, but time is running out for 2.6.28, lets get
     back to this for 2.6.29

 Many years has passed since 2008, so it seems ok to remove whole `#if 0' block.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

0aeb9cac

08 5月, 2012 1 次提交

cpuidle: Use kick_all_cpus_sync() · 4a162513

由 Thomas Gleixner 提交于 5月 07, 2012

kick_all_cpus_sync() is the core implementation of cpu_idle_wait()
which is copied all over the arch code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120507175652.119842173@linutronix.de

4a162513

07 4月, 2012 1 次提交

cpuidle: Fix panic in CPU off-lining with no idle driver · ee01e663

由 Toshi Kani 提交于 3月 31, 2012

Fix a NULL pointer dereference panic in cpuidle_play_dead() during
CPU off-lining when no cpuidle driver is registered.  A cpuidle
driver may be registered at boot-time based on CPU type.  This patch
allows an off-lined CPU to enter HLT-based idle in this condition.
Signed-off-by: NToshi Kani <toshi.kani@hp.com>
Cc: Boris Ostrovsky <boris.ostrovsky@amd.com>
Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

ee01e663

30 3月, 2012 3 次提交

idle, x86: Allow off-lined CPU to enter deeper C states · 1a022e3f

由 Boris Ostrovsky 提交于 3月 13, 2012

Currently when a CPU is off-lined it enters either MWAIT-based idle or,
if MWAIT is not desired or supported, HLT-based idle (which places the
processor in C1 state). This patch allows processors without MWAIT
support to stay in states deeper than C1.
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

1a022e3f

cpuidle: use the driver's state_count as default · fc850f39

由 Daniel Lezcano 提交于 3月 26, 2012

If the state_count is not initialized for the device use
the driver's state count as the default. That will prevent
to add it manually in the cpuidle driver initialization
routine and will save us from duplicate line of code.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

fc850f39

cpuidle: add a sysfs entry to disable specific C state for debug purpose. · 3a53396b

由 ShuoX Liu 提交于 3月 28, 2012

Some C states of new CPU might be not good.  One reason is BIOS might
configure them incorrectly.  To help developers root cause it quickly, the
patch adds a new sysfs entry, so developers could disable specific C state
manually.

In addition, C state might have much impact on performance tuning, as it
takes much time to enter/exit C states, which might delay interrupt
processing.  With the new debug option, developers could check if a deep C
state could impact performance and how much impact it could cause.

Also add this option in Documentation/cpuidle/sysfs.txt.

[akpm@linux-foundation.org: check kstrtol return value]
Signed-off-by: NShuoX Liu <shuox.liu@intel.com>
Reviewed-by: NYanmin Zhang <yanmin_zhang@intel.com>
Reviewed-and-Tested-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

3a53396b

21 3月, 2012 1 次提交

cpuidle: Add common time keeping and irq enabling · e1689795

由 Robert Lee 提交于 3月 20, 2012

Make necessary changes to implement time keeping and irq enabling
in the core cpuidle code.  This will allow the removal of these
functionalities from various platform cpuidle implementations whose
timekeeping and irq enabling follows the form in this common code.
Signed-off-by: NRobert Lee <rob.lee@linaro.org>
Tested-by: NJean Pihet <j-pihet@ti.com>
Tested-by: NAmit Daniel <amit.kachhap@linaro.org>
Tested-by: NRobert Lee <rob.lee@linaro.org>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Reviewed-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: NJean Pihet <j-pihet@ti.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

e1689795

13 2月, 2012 1 次提交

cpuidle/tracing: Denote the tracepoints as being in rcu_idle_exit() section · 76027ea8

由 Steven Rostedt 提交于 2月 07, 2012

As the tracepoints in the cpuidle code are called when rcu_idle_exit() is in
effect, the _rcuidle() version must be used, otherwise the rcu_read_lock()s
that protect the tracepoint will not be honored.

Cc: Len Brown <len.brown@intel.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

76027ea8

22 12月, 2011 1 次提交

cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd

由 Kay Sievers 提交于 12月 21, 2011

This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

8a25a2fd

07 11月, 2011 4 次提交

cpuidle: Single/Global registration of idle states · 46bcfad7

由 Deepthi Dharwar 提交于 10月 28, 2011

This patch makes the cpuidle_states structure global (single copy)
instead of per-cpu. The statistics needed on per-cpu basis
by the governor are kept per-cpu. This simplifies the cpuidle
subsystem as state registration is done by single cpu only.
Having single copy of cpuidle_states saves memory. Rare case
of asymmetric C-states can be handled within the cpuidle driver
and architectures such as POWER do not have asymmetric C-states.

Having single/global registration of all the idle states,
dynamic C-state transitions on x86 are handled by
the boot cpu. Here, the boot cpu  would disable all the devices,
re-populate the states and later enable all the devices,
irrespective of the cpu that would receive the notification first.

Reference:
https://lkml.org/lkml/2011/4/25/83Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: NTrinabh Gupta <g.trinabh@gmail.com>
Tested-by: NJean Pihet <j-pihet@ti.com>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Acked-by: NArjan van de Ven <arjan@linux.intel.com>
Acked-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

46bcfad7

cpuidle: Split cpuidle_state structure and move per-cpu statistics fields · 4202735e

由 Deepthi Dharwar 提交于 10月 28, 2011

This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from rest of the fields inside cpuidle_state,
which would be made global i.e. single copy. The driver_data field
is also per-cpu and moved.
Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: NTrinabh Gupta <g.trinabh@gmail.com>
Tested-by: NJean Pihet <j-pihet@ti.com>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Acked-by: NArjan van de Ven <arjan@linux.intel.com>
Acked-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

4202735e

cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare() · b25edc42

由 Deepthi Dharwar 提交于 10月 28, 2011

The cpuidle_device->prepare() mechanism causes updates to the
cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
to tell the governor not to chose a state on a per-cpu basis at
run-time. State demotion is now handled by the driver and it returns
the actual state entered. Hence, this mechanism is not required.
Also this removes per-cpu flags from cpuidle_state enabling
it to be made global.

Reference:
https://lkml.org/lkml/2011/3/25/52Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm>
Signed-off-by: NTrinabh Gupta <g.trinabh@gmail.com>
Tested-by: NJean Pihet <j-pihet@ti.com>
Acked-by: NArjan van de Ven <arjan@linux.intel.com>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

b25edc42

cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state · e978aa7d

由 Deepthi Dharwar 提交于 10月 28, 2011

Cpuidle governor only suggests the state to enter using the
governor->select() interface, but allows the low level driver to
override the recommended state. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses last_state field to capture the
actual state entered and based on that updates the statistics for the
state entered.

Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, not updating the counters.

Reference:
https://lkml.org/lkml/2011/3/25/52Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: NTrinabh Gupta <g.trinabh@gmail.com>
Tested-by: NJean Pihet <j-pihet@ti.com>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Acked-by: NArjan van de Ven <arjan@linux.intel.com>
Acked-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

e978aa7d

01 11月, 2011 1 次提交
- P
  cpuidle: Add module.h to drivers/cpuidle files as required. · 884b17e1
  由 Paul Gortmaker 提交于 8月 29, 2011
```
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
```
  884b17e1
25 8月, 2011 1 次提交

PM QoS: Move and rename the implementation files · e8db0be1

由 Jean Pihet 提交于 8月 25, 2011

The PM QoS implementation files are better named
kernel/power/qos.c and include/linux/pm_qos.h.

The PM QoS support is compiled under the CONFIG_PM option.
Signed-off-by: NJean Pihet <j-pihet@ti.com>
Acked-by: Nmarkgross <markgross@thegnar.org>
Reviewed-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

e8db0be1

04 8月, 2011 3 次提交

cpuidle: stop depending on pm_idle · a0bfa137

由 Len Brown 提交于 4月 01, 2011

cpuidle users should call cpuidle_call_idle() directly
rather than via (pm_idle)() function pointer.

Architecture may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:

  my_arch_cpu_idle()
	...
	if(cpuidle_call_idle())
		pm_idle();

cc: Kevin Hilman <khilman@deeprootsystems.com>
cc: Paul Mundt <lethal@linux-sh.org>
cc: x86@kernel.org
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

a0bfa137

cpuidle: replace xen access to x86 pm_idle and default_idle · d91ee586

由 Len Brown 提交于 4月 01, 2011

When a Xen Dom0 kernel boots on a hypervisor, it gets access
to the raw-hardware ACPI tables.  While it parses the idle tables
for the hypervisor's beneift, it uses HLT for its own idle.

Rather than have xen scribble on pm_idle and access default_idle,
have it simply disable_cpuidle() so acpi_idle will not load and
architecture default HLT will be used.

cc: xen-devel@lists.xensource.com
Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

d91ee586

cpuidle: create bootparam "cpuidle.off=1" · 62027aea

由 Len Brown 提交于 4月 01, 2011

useful for disabling cpuidle to fall back
to architecture-default idle loop

cpuidle drivers and governors will fail to register.
on x86 they'll say so:

intel_idle: intel_idle yielding to (null)
ACPI: acpi_idle yielding to (null)
Signed-off-by: NLen Brown <len.brown@intel.com>

62027aea

13 1月, 2011 2 次提交

cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle... · f77cfe4e

由 Thomas Renninger 提交于 1月 07, 2011

cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer

Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
events -> this patch fixes it and makes cpu_idle events throwing less complex.

It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
  - arch/arm/mach-at91/cpuidle.c
  - arch/arm/mach-davinci/cpuidle.c
  - arch/arm/mach-kirkwood/cpuidle.c
  - arch/arm/mach-omap2/cpuidle34xx.c
  - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
  - arch/x86/kernel/process.c (did throw events before, but was a mess)
  - drivers/idle/intel_idle.c (did throw events before)

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy is now.

This affects userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.
Signed-off-by: NThomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: NLen Brown <len.brown@intel.com>

f77cfe4e

L
cpuidle: delete NOP CPUIDLE_FLAG_POLL · d247632c
由 Len Brown 提交于 1月 12, 2011
```
it serves no purpose
Signed-off-by: NLen Brown <len.brown@intel.com>
```
d247632c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功