提交 · 99e98d3fb1008ef7416e16a1fd355cb73a253502 · openeuler / Kernel

06 11月, 2019 1 次提交

cpuidle: Consolidate disabled state checks · 99e98d3f

由 Rafael J. Wysocki 提交于 11月 04, 2019

There are two reasons why CPU idle states may be disabled: either
because the driver has disabled them or because they have been
disabled by user space via sysfs.

In the former case, the state's "disabled" flag is set once during
the initialization of the driver and it is never cleared later (it
is read-only effectively).  In the latter case, the "disable" field
of the given state's cpuidle_state_usage struct is set and it may be
changed via sysfs.  Thus checking whether or not an idle state has
been disabled involves reading these two flags every time.

In order to avoid the additional check of the state's "disabled" flag
(which is effectively read-only anyway), use the value of it at the
init time to set a (new) flag in the "disable" field of that state's
cpuidle_state_usage structure and use the sysfs interface to
manipulate another (new) flag in it.  This way the state is disabled
whenever the "disable" field of its cpuidle_state_usage structure is
nonzero, whatever the reason, and it is the only place to look into
to check whether or not the state has been disabled.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

99e98d3f

30 7月, 2019 1 次提交

cpuidle: add poll_limit_ns to cpuidle_device structure · 259231a0

由 Marcelo Tosatti 提交于 7月 03, 2019

Add a poll_limit_ns variable to cpuidle_device structure.

Calculate and configure it in the new cpuidle_poll_time
function, in case its zero.

Individual governors are allowed to override this value.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

259231a0

10 4月, 2019 1 次提交

cpuidle: Export the next timer expiration for CPUs · 6f9b83ac

由 Ulf Hansson 提交于 3月 27, 2019

To be able to predict the sleep duration for a CPU entering idle, it
is essential to know the expiration time of the next timer.  Both the
teo and the menu cpuidle governors already use this information for
CPU idle state selection.

Moving forward, a similar prediction needs to be made for a group of
idle CPUs rather than for a single one and the following changes
implement a new genpd governor for that purpose.

In order to support that feature, add a new function called
tick_nohz_get_next_hrtimer() that will return the next hrtimer
expiration time of a given CPU to be invoked after deciding
whether or not to stop the scheduler tick on that CPU.

Make the cpuidle core call tick_nohz_get_next_hrtimer() right
before invoking the ->enter() callback provided by the cpuidle
driver for the given state and store its return value in the
per-CPU struct cpuidle_device, so as to make it available to code
outside of cpuidle.

Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(),
the governor's ->select() callback has already returned and indicated
whether or not the tick should be stopped, so in fact the value
returned by tick_nohz_get_next_hrtimer() always is the next hrtimer
expiration time for the given CPU, possibly including the tick (if
it hasn't been stopped).
Co-developed-by: NLina Iyer <lina.iyer@linaro.org>
Co-developed-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

6f9b83ac

13 12月, 2018 1 次提交

cpuidle: Add 'above' and 'below' idle state metrics · 04dab58a

由 Rafael J. Wysocki 提交于 12月 10, 2018

Add two new metrics for CPU idle states, "above" and "below", to count
the number of times the given state had been asked for (or entered
from the kernel's perspective), but the observed idle duration turned
out to be too short or too long for it (respectively).

These metrics help to estimate the quality of the CPU idle governor
in use.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

04dab58a

11 12月, 2018 1 次提交

cpuidle: Add cpuidle.governor= command line parameter · 61cb5758

由 Rafael J. Wysocki 提交于 12月 05, 2018

Add cpuidle.governor= command line parameter to allow the default
cpuidle governor to be replaced.

That is useful, for example, if someone running a tickful kernel
wants to use the menu governor on it.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

61cb5758

18 9月, 2018 1 次提交

cpuidle: enter_state: Don't needlessly calculate diff time · 7037b43e

由 Fieah Lim 提交于 9月 11, 2018

Currently, ktime_us_delta() is invoked unconditionally to compute the
idle residency of the CPU, but it only makes sense to do that if a
valid idle state has been entered, so move the ktime_us_delta()
invocation after the entered_state >= 0 check.

While at it, merge two comment blocks in there into one and drop
a space between type casting of diff.

This patch has no functional changes.
Signed-off-by: NFieah Lim <kw@fieahl.im>
[ rjw: Changelog cleanup, comment format fix ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7037b43e

06 4月, 2018 1 次提交

cpuidle: Return nohz hint from cpuidle_select() · 45f1ff59

由 Rafael J. Wysocki 提交于 3月 22, 2018

Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
 (1) the idle exit latency is constrained at 0, or
 (2) the selected state is a polling one, or
 (3) the expected idle period duration is within the tick period
     range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values.  To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

45f1ff59

29 3月, 2018 1 次提交

PM: cpuidle/suspend: Add s2idle usage and time state attributes · 64bdff69

由 Rafael J. Wysocki 提交于 3月 14, 2018

Add a new attribute group called "s2idle" under the sysfs directory
of each cpuidle state that supports the ->enter_s2idle callback
and put two new attributes, "usage" and "time", into that group to
represent the number of times the given state was requested for
suspend-to-idle and the total time spent in suspend-to-idle after
requesting that state, respectively.

That will allow diagnostic information related to suspend-to-idle
to be collected without enabling advanced debug features and
analyzing dmesg output.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

64bdff69

09 11月, 2017 2 次提交

cpuidle: Avoid assignment in if () argument · 3fc74bd8

由 Gaurav Jindal 提交于 9月 02, 2017

Clean up cpuidle_enable_device() to avoid doing an assignment
in an expression evaluated as an argument of if (), which also
makes the code in question more readable.
Signed-off-by: NGaurav Jindal <gauravjindal1104@gmail.com>
[ rjw: Subject & changelog ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

3fc74bd8

cpuidle: Clean up cpuidle_enable_device() error handling a bit · e7b06a09

由 Gaurav Jindal 提交于 9月 01, 2017

Do not fetch per CPU drv if cpuidle_curr_governor is NULL
to avoid useless per CPU processing.
Signed-off-by: NGaurav Jindal <gauravjindal1104@gmail.com>
[ rjw: Subject & changelog ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e7b06a09

28 9月, 2017 1 次提交

cpuidle: fix broadcast control when broadcast can not be entered · f187851b

由 Nicholas Piggin 提交于 9月 01, 2017

When failing to enter broadcast timer mode for an idle state that
requires it, a new state is selected that does not require broadcast,
but the broadcast variable remains set. This causes
tick_broadcast_exit to be called despite not having entered broadcast
mode.

This causes the WARN_ON_ONCE(!irqs_disabled()) to trigger in some
cases. It does not appear to cause problems for code today, but seems
to violate the interface so should be fixed.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f187851b

11 8月, 2017 1 次提交

PM / s2idle: Rename ->enter_freeze to ->enter_s2idle · 28ba086e

由 Rafael J. Wysocki 提交于 8月 10, 2017

Rename the ->enter_freeze cpuidle driver callback to ->enter_s2idle
to make it clear that it is used for entering suspend-to-idle and
rename the related functions, variables and so on accordingly.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

28ba086e

15 5月, 2017 1 次提交

cpuidle: Fix idle time tracking · f9fccdb9

由 Peter Zijlstra 提交于 4月 21, 2017

Ville reported that on his Core2, which has TSC stop in idle, we would
always report very short idle durations. He tracked this down to
commit:

  e93e59ce ("cpuidle: Replace ktime_get() with local_clock()")

which replaces ktime_get() with local_clock().

Add a sched_clock_idle_wakeup_event() call, which will re-sync the
clock with ktime_get_ns() when TSC is unstable and no-op otherwise.
Reported-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e93e59ce ("cpuidle: Replace ktime_get() with local_clock()")
Signed-off-by: NIngo Molnar <mingo@kernel.org>

f9fccdb9

02 5月, 2017 1 次提交

cpuidle: check dev before usage in cpuidle_use_deepest_state() · 41dc750e

由 Li, Fei 提交于 4月 27, 2017

In case of there is no cpuidle devices registered, dev will be null, and
panic will be triggered like below;
In this patch, add checking of dev before usage, like that done in
cpuidle_idle_call.

Panic without fix:
[  184.961328] BUG: unable to handle kernel NULL pointer dereference at
  (null)
[  184.961328] IP: cpuidle_use_deepest_state+0x30/0x60
...
[  184.961328]  play_idle+0x8d/0x210
[  184.961328]  ? __schedule+0x359/0x8e0
[  184.961328]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[  184.961328]  ? kthread_queue_delayed_work+0x41/0x80
[  184.961328]  clamp_idle_injection_func+0x64/0x1e0

Fixes: bb8313b6 (cpuidle: Allow enforcing deepest idle state selection)
Signed-off-by: NLi, Fei <fei.li@intel.com>
Tested-by: NShi, Feng <fengx.shi@intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

41dc750e

02 3月, 2017 1 次提交

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> · e6017571

由 Ingo Molnar 提交于 2月 01, 2017

We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

e6017571

06 12月, 2016 1 次提交

cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state() · 0e7414b7

由 Rafael J. Wysocki 提交于 12月 01, 2016

Since cpuidle_use_deepest_state() is not static, add a proper
kerneldoc comment to it to document its purpose.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

0e7414b7

29 11月, 2016 1 次提交

cpuidle: Allow enforcing deepest idle state selection · bb8313b6

由 Jacob Pan 提交于 11月 28, 2016

When idle injection is used to cap power, we need to override the
governor's choice of idle states.

For this reason, make it possible the deepest idle state selection to
be enforced by setting a flag on a given CPU to achieve the maximum
potential power draw reduction.
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

bb8313b6

04 7月, 2016 1 次提交

cpuidle: Fix last_residency division · dbd1b8ea

由 Shreyas B. Prabhu 提交于 7月 01, 2016

Snooze is a poll idle state in powernv and pseries platforms. Snooze
has a timeout so that if a CPU stays in snooze for more than target
residency of the next available idle state, then it would exit
thereby giving chance to the cpuidle governor to re-evaluate and
promote the CPU to a deeper idle state. Therefore whenever snooze
exits due to this timeout, its last_residency will be target_residency
of the next deeper state.

Commit e93e59ce "cpuidle: Replace ktime_get() with local_clock()"
changed the math around last_residency calculation. Specifically,
while converting last_residency value from nano- to microseconds, it
carries out right shift by 10. Because of that, in snooze timeout
exit scenarios last_residency calculated is roughly 2.3% less than
target_residency of the next available state. This pattern is picked
up by get_typical_interval() in the menu governor and therefore
expected_interval in menu_select() is frequently less than the
target_residency of any state other than snooze.

Due to this we are entering snooze at a higher rate, thereby
affecting the single thread performance.

Fix this by using more precise division via ktime_us_delta().

Fixes: e93e59ce "cpuidle: Replace ktime_get() with local_clock()"
Reported-by: NAnton Blanchard <anton@samba.org>
Bisected-by: NShilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

dbd1b8ea

18 5月, 2016 1 次提交

cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() · e7387da5

由 Daniel Lezcano 提交于 5月 17, 2016

Commit 0b89e9aa (cpuidle: delay enabling interrupts until all
coupled CPUs leave idle) rightfully fixed a regression by letting
the coupled idle state framework to handle local interrupt enabling
when the CPU is exiting an idle state.

The current code checks if the idle state is coupled and, if so, it
will let the coupled code to enable interrupts. This way, it can
decrement the ready-count before handling the interrupt. This
mechanism prevents the other CPUs from waiting for a CPU which is
handling interrupts.

But the check is done against the state index returned by the back
end driver's ->enter functions which could be different from the
initial index passed as parameter to the cpuidle_enter_state()
function.

 entered_state = target_state->enter(dev, drv, index);

 [ ... ]

 if (!cpuidle_state_is_coupled(drv, entered_state))
	local_irq_enable();

 [ ... ]

If the 'index' is referring to a coupled idle state but the
'entered_state' is *not* coupled, then the interrupts are enabled
again. All CPUs blocked on the sync barrier may busy loop longer
if the CPU has interrupts to handle before decrementing the
ready-count. That's consuming more energy than saving.

Fixes: 0b89e9aa (cpuidle: delay enabling interrupts until all coupled CPUs leave idle)
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
[ rjw: Subject & changelog ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e7387da5

26 4月, 2016 1 次提交

cpuidle: Replace ktime_get() with local_clock() · e93e59ce

由 Daniel Lezcano 提交于 4月 21, 2016

The ktime_get() can have a non negligeable overhead, use local_clock()
instead.

In order to test the difference between ktime_get() and local_clock(),
a quick hack has been added to trigger, via debugfs, 10000 times a
call to ktime_get() and local_clock() and measure the elapsed time.

Then the average value, the min and max is computed for each call.

From userspace, the test above was called 100 times every 2 seconds.

So, ktime_get() and local_clock() have been called 1000000 times in
total.

The results are:

ktime_get():
============
 * average: 101 ns (stddev: 27.4)
 * maximum: 38313 ns
 * minimum: 65 ns

local_clock():
==============
 * average: 60 ns (stddev: 9.8)
 * maximum: 13487 ns
 * minimum: 46 ns

The local_clock() is faster and more stable.

Even if it is a drop in the ocean, changing the ktime_get() by the
local_clock() allows to save 80ns at idle time (entry + exit). And
in some circumstances, especially when there are several CPUs racing
for the clock access, we save tens of microseconds.

The idle duration resulting from a diff is converted from nanosec to
microsec. This could be done with integer division (div 1000) - which is
an expensive operation or by 10 bits shifting (div 1024) - which is fast
but unprecise.

The following table gives some results at the limits.

 ------------------------------------------
|   nsec   |   div(1000)   |   div(1024)   |
 ------------------------------------------
|   1e3    |        1 usec |      976 nsec |
 ------------------------------------------
|   1e6    |     1000 usec |      976 usec |
 ------------------------------------------
|   1e9    |  1000000 usec |   976562 usec |
 ------------------------------------------

There is a linear deviation of 2.34%. This loss of precision is acceptable
in the context of the resulting diff which is used for statistics. These
ones are processed to guess estimate an approximation of the duration of the
next idle period which ends up into an idle state selection. The selection
criteria takes into account the next duration based on large intervals,
represented by the idle state's target residency.

The 2^10 division is enough because the approximation regarding the 1e3
division is lost in all the approximations done for the next idle duration
computation.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
[ rjw: Subject ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e93e59ce

09 4月, 2016 1 次提交

cpuidle: Indicate when a device has been unregistered · c998c078

由 Dave Gerlach 提交于 4月 05, 2016

Currently the 'registered' member of the cpuidle_device struct is set
to 1 during cpuidle_register_device. In this same function there are
checks to see if the device is already registered to prevent duplicate
calls to register the device, but this value is never set to 0 even on
unregister of the device. Because of this, any attempt to call
cpuidle_register_device after a call to cpuidle_unregister_device will
fail which shouldn't be the case.

To prevent this, set registered to 0 when the device is unregistered.

Fixes: c878a52d (cpuidle: Check if device is already registered)
Signed-off-by: NDave Gerlach <d-gerlach@ti.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c998c078

22 1月, 2016 1 次提交

cpuidle: fix fallback mechanism for suspend to idle in absence of enter_freeze · 6f16886b

由 Sudeep Holla 提交于 1月 21, 2016

Commit 51164251 "sched / idle: Drop default_idle_call() fallback
from call_cpuidle()" made find_deepest_state() return non-negative
value and check all the states with index > 0. Also as a result,
find_deepest_state() returns 0 even when enter_freeze callbacks are not
implemented and enter_freeze_proper() is called which ends up crashing
the kernel.

This patch updates the check for index > 0 in cpuidle_enter_freeze and
cpuidle_idle_call(when idle_should_freeze is true) to restore the
suspend-to-idle functionality in absence of enter_freeze callback.

Fixes: 51164251 "sched / idle: Drop default_idle_call() fallback from call_cpuidle()"
Signed-off-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

6f16886b

19 1月, 2016 1 次提交

sched / idle: Drop default_idle_call() fallback from call_cpuidle() · 51164251

由 Rafael J. Wysocki 提交于 1月 16, 2016

After commit 9c4b2867 (cpuidle: menu: Fix menu_select() for
CPUIDLE_DRIVER_STATE_START == 0) it is clear that menu_select()
cannot return negative values.  Moreover, ladder_select_state()
will never return a negative value too, so make find_deepest_state()
return non-negative values too and drop the default_idle_call()
fallback from call_cpuidle().

This eliminates one branch from the idle loop and makes the governors
and find_deepest_state() handle the case when all states have been
disabled from sysfs consistently.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NIngo Molnar <mingo@kernel.org>
Tested-by: NSudeep Holla <sudeep.holla@arm.com>

51164251

28 8月, 2015 1 次提交

cpuidle/coupled: Remove redundant 'dev' argument of cpuidle_state_is_coupled() · 4c1ed5a6

由 Xunlei Pang 提交于 8月 04, 2015

For cpuidle_state_is_coupled(), 'dev' is not used, so remove it.
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4c1ed5a6

21 7月, 2015 1 次提交

sched/idle: Move latency tracing stop/start calls deeper inside the idle loop · 63caae84

由 Lucas Stach 提交于 7月 20, 2015

Make sure to stop tracing only once we are past a point where
all latency tracing events have been processed (irqs are not
enabled again). This has the slight advantage of capturing more
latency related events in the idle path, but most importantly it
makes sure that latency tracing doesn't get re-enabled
inadvertently when new events are coming in.

This makes the irqsoff latency tracer useful again, as we stop
capturing CPU sleep time as IRQ latency.
Signed-off-by: NLucas Stach <l.stach@pengutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel@pengutronix.de
Cc: patchwork-lst@pengutronix.de
Link: http://lkml.kernel.org/r/1437410090-3747-1-git-send-email-l.stach@pengutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

63caae84

10 7月, 2015 1 次提交

suspend-to-idle: Prevent RCU from complaining about tick_freeze() · ae0afb4f

由 Rafael J. Wysocki 提交于 7月 09, 2015

Put tick_freeze() under RCU_NONIDLE() to prevent RCU from complaining
about suspicious RCU usage in idle by trace_suspend_resume() called
from there.

While at it, fix a comment related to another usage of RCU_NONIDLE()
in enter_freeze_proper().
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

ae0afb4f

30 5月, 2015 1 次提交

cpuidle: Do not use CPUIDLE_DRIVER_STATE_START in cpuidle.c · 7d51d979

由 Rafael J. Wysocki 提交于 5月 28, 2015

The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
entry in the cpuidle driver's table of states is overwritten with
the default "poll" entry by the core.  The "state" defined by the
"poll" entry doesn't provide ->enter_dead and ->enter_freeze
callbacks and its exit_latency is 0.

For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
will be skipped by the loop).

It also is arguably unuseful to return states with exit_latency
equal to 0 from find_deepest_state(), so the function can be modified
to start the loop from index 0 and the "poll state" will be skipped by
it as a result of the check against latency_req.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>

7d51d979

19 5月, 2015 1 次提交

PM / sleep: Make suspend-to-idle-specific code depend on CONFIG_SUSPEND · 87e9b9f1

由 Rafael J. Wysocki 提交于 5月 16, 2015

Since idle_should_freeze() is defined to always return 'false'
for CONFIG_SUSPEND unset, all of the code depending on it in
cpuidle_idle_call() is not necessary in that case.

Make that code depend on CONFIG_SUSPEND too to avoid building it
when it is not going to be used.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>

87e9b9f1

15 5月, 2015 3 次提交

cpuidle: Select a different state on tick_broadcast_enter() failures · 0d94039f

由 Rafael J. Wysocki 提交于 5月 10, 2015

If tick_broadcast_enter() fails in cpuidle_enter_state(),
try to find another idle state to enter instead of invoking
default_idle_call() immediately and returning -EBUSY which
should increase the chances of saving some energy in those
cases.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: NSudeep Holla <sudeep.holla@arm.com>
Acked-by: NKevin Hilman <khilman@linaro.org>

0d94039f

sched / idle: Call default_idle_call() from cpuidle_enter_state() · 827a5aef

由 Rafael J. Wysocki 提交于 5月 10, 2015

The check of the cpuidle_enter() return value against -EBUSY
made in call_cpuidle() will not be necessary any more if
cpuidle_enter_state() calls default_idle_call() directly when it
is about to return -EBUSY, so make that happen and eliminate the
check.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: NSudeep Holla <sudeep.holla@arm.com>
Acked-by: NKevin Hilman <khilman@linaro.org>

827a5aef

sched / idle: Call idle_set_state() from cpuidle_enter_state() · faad3849

由 Rafael J. Wysocki 提交于 5月 10, 2015

Introduce a wrapper function around idle_set_state() called
sched_idle_set_state() that will pass this_rq() to it as the
first argument and make cpuidle_enter_state() call the new
function before and after entering the target state.

At the same time, remove direct invocations of idle_set_state()
from call_cpuidle().

This will allow the invocation of default_idle_call() to be
moved from call_cpuidle() to cpuidle_enter_state() safely
and call_cpuidle() to be simplified a bit as a result.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: NSudeep Holla <sudeep.holla@arm.com>
Acked-by: NKevin Hilman <khilman@linaro.org>

faad3849

10 5月, 2015 1 次提交

cpuidle: Fix the kerneldoc comment for cpuidle_enter_state() · 7312280b

由 Rafael J. Wysocki 提交于 5月 09, 2015

The kerneldoc comment for cpuidle_enter_state() doesn't match the
function's header any more, so fix it.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7312280b

05 5月, 2015 1 次提交

cpuidle: Check the sign of index in cpuidle_reflect() · a802ea96

由 Rafael J. Wysocki 提交于 5月 04, 2015

Avoid calling the governor's ->reflect method if the state index
passed to cpuidle_reflect() is negative.

This allows the analogous check to be dropped from menu_reflect(),
so do that too, and ensures that arbitrary error codes can be
passed to cpuidle_reflect() as the index with no adverse
consequences.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

a802ea96

29 4月, 2015 1 次提交

cpuidle: Run tick_broadcast_exit() with disabled interrupts · df8d9eea

由 Rafael J. Wysocki 提交于 4月 29, 2015

Commit 335f4919 (sched/idle: Use explicit broadcast oneshot
control function) replaced clockevents_notify() invocations in
cpuidle_idle_call() with direct calls to tick_broadcast_enter()
and tick_broadcast_exit(), but it overlooked the fact that
interrupts were already enabled before calling the latter which
led to functional breakage on systems using idle states with the
CPUIDLE_FLAG_TIMER_STOP flag set.

Fix that by moving the invocations of tick_broadcast_enter()
and tick_broadcast_exit() down into cpuidle_enter_state() where
interrupts are still disabled when tick_broadcast_exit() is
called.  Also ensure that interrupts will be disabled before
running tick_broadcast_exit() even if they have been enabled by
the idle state's ->enter callback.  Trigger a WARN_ON_ONCE() in
that case, as we generally don't want that to happen for states
with CPUIDLE_FLAG_TIMER_STOP set.

Fixes: 335f4919 (sched/idle: Use explicit broadcast oneshot control function)
Reported-and-tested-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

df8d9eea

03 4月, 2015 1 次提交

cpuidle: remove state_count field from struct cpuidle_device · d75e4af1

由 Bartlomiej Zolnierkiewicz 提交于 3月 31, 2015

Thomas Schlichter reports the following issue on his Samsung NC20:

"The C-states C1 and C2 to the OS when connected to AC, and additionally
 provides the C3 C-state when disconnected from AC.  However, the number
 of C-states shown in sysfs is fixed to the number of C-states present
 at boot.
   If I boot with AC connected, I always only see the C-states up to C2
   even if I disconnect AC.

   The reason is commit 130a5f69 (ACPI / cpuidle: remove dev->state_count
   setting).  It removes the update of dev->state_count, but sysfs uses
   exactly this variable to show the C-states.

   The fix is to use drv->state_count in sysfs.  As this is currently the
   last user of dev->state_count, this variable can be completely removed."

Remove dev->state_count as per the above.
Reported-by: NThomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
[ rjw: Changelog ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d75e4af1

06 3月, 2015 1 次提交

cpuidle / sleep: Use broadcast timer for states that stop local timer · ef2b22ac

由 Rafael J. Wysocki 提交于 3月 02, 2015

Commit 38106313 (PM / sleep: Re-implement suspend-to-idle handling)
overlooked the fact that entering some sufficiently deep idle states
by CPUs may cause their local timers to stop and in those cases it
is necessary to switch over to a broadcast timer prior to entering
the idle state. If the cpuidle driver in use does not provide
the new ->enter_freeze callback for any of the idle states, that
problem affects suspend-to-idle too, but it is not taken into account
after the changes made by commit 38106313.

Fix that by changing the definition of cpuidle_enter_freeze() and
re-arranging of the code in cpuidle_idle_call(), so the former does
not call cpuidle_enter() any more and the fallback case is handled
by cpuidle_idle_call() directly.

Fixes: 38106313 (PM / sleep: Re-implement suspend-to-idle handling)
Reported-and-tested-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

ef2b22ac

01 3月, 2015 2 次提交

cpuidle / sleep: Do sanity checks in cpuidle_enter_freeze() too · 31a34090

由 Rafael J. Wysocki 提交于 2月 27, 2015

Modify cpuidle_enter_freeze() to do the sanity checks done by
cpuidle_select() to avoid crashing the suspend-to-idle code
path in case something is missing.

Fixes: 38106313 (PM / sleep: Re-implement suspend-to-idle handling)
Original-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

31a34090

idle / sleep: Avoid excessive disabling and enabling interrupts · 01e04f46

由 Rafael J. Wysocki 提交于 2月 27, 2015

Disabling interrupts at the end of cpuidle_enter_freeze() is not
useful, because its caller, cpuidle_idle_call(), re-enables them
right away after invoking it.

To avoid that unnecessary back and forth dance with interrupts,
make cpuidle_enter_freeze() enable interrupts after calling
enter_freeze_proper() and drop the local_irq_disable() at its
end, so that all of the code paths in it end up with interrupts
enabled.  Then, cpuidle_idle_call() will not need to re-enable
interrupts after calling cpuidle_enter_freeze() any more, because
the latter will return with interrupts enabled, in analogy with
cpuidle_enter().
Reported-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

01e04f46

16 2月, 2015 1 次提交

PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911

由 Rafael J. Wysocki 提交于 2月 13, 2015

The efficiency of suspend-to-idle depends on being able to keep CPUs
in the deepest available idle states for as much time as possible.
Ideally, they should only be brought out of idle by system wakeup
interrupts.

However, timer interrupts occurring periodically prevent that from
happening and it is not practical to chase all of the "misbehaving"
timers in a whack-a-mole fashion. A much more effective approach is
to suspend the local ticks for all CPUs and the entire timekeeping
along the lines of what is done during full suspend, which also
helps to keep suspend-to-idle and full suspend reasonably similar.

The idea is to suspend the local tick on each CPU executing
cpuidle_enter_freeze() and to make the last of them suspend the
entire timekeeping. That should prevent timer interrupts from
triggering until an IO interrupt wakes up one of the CPUs. It
needs to be done with interrupts disabled on all of the CPUs,
though, because otherwise the suspended clocksource might be
accessed by an interrupt handler which might lead to fatal
consequences.

Unfortunately, the existing ->enter callbacks provided by cpuidle
drivers generally cannot be used for implementing that, because some
of them re-enable interrupts temporarily and some idle entry methods
cause interrupts to be re-enabled automatically on exit. Also some
of these callbacks manipulate local clock event devices of the CPUs
which really shouldn't be done after suspending their ticks.

To overcome that difficulty, introduce a new cpuidle state callback,
->enter_freeze, that will be guaranteed (1) to keep interrupts
disabled all the time (and return with interrupts disabled) and (2)
not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to
look for the deepest available idle state with ->enter_freeze present
and to make the CPU execute that callback with suspended tick (and the
last of the online CPUs to execute it with suspended timekeeping).
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

124cf911

14 2月, 2015 1 次提交

PM / sleep: Re-implement suspend-to-idle handling · 38106313

由 Rafael J. Wysocki 提交于 2月 12, 2015

In preparation for adding support for quiescing timers in the final
stage of suspend-to-idle transitions, rework the freeze_enter()
function making the system wait on a wakeup event, the freeze_wake()
function terminating the suspend-to-idle loop and the mechanism by
which deep idle states are entered during suspend-to-idle.

First of all, introduce a simple state machine for suspend-to-idle
and make the code in question use it.

Second, prevent freeze_enter() from losing wakeup events due to race
conditions and ensure that the number of online CPUs won't change
while it is being executed.  In addition to that, make it force
all of the CPUs re-enter the idle loop in case they are in idle
states already (so they can enter deeper idle states if possible).

Next, drop cpuidle_use_deepest_state() and replace use_deepest_state
checks in cpuidle_select() and cpuidle_reflect() with a single
suspend-to-idle state check in cpuidle_idle_call().

Finally, introduce cpuidle_enter_freeze() that will simply find the
deepest idle state available to the given CPU and enter it using
cpuidle_enter().
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>

38106313

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功