提交 · 78e2708691e9289f97750eb71aca31b5a2973d94 · openanolis / cloud-kernel

21 7月, 2014 11 次提交

cpufreq: intel_pstate: Remove core_pct rounding · 78e27086

由 Stratos Karafotis 提交于 7月 18, 2014

The specific rounding adds conditionally only 1/256 to fractional
part of core_pct.

We can safely remove it without any noticeable impact in
calculations.

Use div64_u64 instead of div_u64 to avoid possible overflow of
sample->mperf as divisor
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

78e27086

cpufreq: intel_pstate: Simplify P state adjustment logic. · 4b707c89

由 Stratos Karafotis 提交于 7月 18, 2014

Simplify the code by removing the inline functions pstate_increase and
pstate_decrease and use directly the intel_pstate_set_pstate.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Acked-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4b707c89

cpufreq: intel_pstate: Keep values in aperf/mperf in full precision · ac658131

由 Stratos Karafotis 提交于 7月 18, 2014

Currently we shift right aperf and mperf variables by FRAC_BITS
to prevent overflow when we convert them to fix point numbers
(shift left by FRAC_BITS).

But this is not necessary, because we actually use delta aperf and mperf
which are much less than APERF and MPERF values.

So, use the unmodified APERF and MPERF values in calculation.
This also adds 8 bits in precision, although the gain is insignificant.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

ac658131

cpufreq: intel_pstate: Disable interrupts during MSRs reading · 4ab60c3f

由 Stratos Karafotis 提交于 7月 18, 2014

According to Intel 64 and IA-32 Architectures SDM, Volume 3,
Chapter 14.2, "Software needs to exercise care to avoid delays
between the two RDMSRs (for example interrupts)".

So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF.
This should increase the accuracy of the calculations.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4ab60c3f

cpufreq: intel_pstate: Align multiple lines to open parenthesis · c410833a

由 Stratos Karafotis 提交于 7月 18, 2014

Suppress checkpatch.pl --strict warnings:
CHECK: Alignment should match open parenthesis
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c410833a

cpufreq: intel_pstate: Remove unnecessary intermediate variable sample_time · abf013bf

由 Stratos Karafotis 提交于 7月 18, 2014

Remove the unnecessary intermediate assignment and use directly the
pid_params.sample_rate_ms variable.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

abf013bf

cpufreq: intel_pstate: Cleanup parentheses · 285cb990

由 Stratos Karafotis 提交于 7月 18, 2014

Remove unnecessary parentheses.
Also, add parentheses in one case for better readability.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

285cb990

cpufreq: intel_pstate: Fit code in a single line where possible · 2d8d1f18

由 Stratos Karafotis 提交于 7月 18, 2014

We can fit these lines in a single one, under the 80 characters
limit.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2d8d1f18

cpufreq: intel_pstate: Add missing blank lines after declarations · 845c1cbe

由 Stratos Karafotis 提交于 7月 18, 2014

Also, remove unnecessary blank lines.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

845c1cbe

cpufreq: intel_pstate: Remove unnecessary type casting in div_s64() call · fa30dff9

由 Stratos Karafotis 提交于 7月 18, 2014

div_s64() accepts the divisor parameter as s32. Helper div_fp()
also accepts divisor as int32_t.

So, remove the unnecessary int64_t type casting.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

fa30dff9

cpufreq: intel_pstate: Make intel_pstate_kobject and debugfs_parent locals · 317dd50e

由 Stratos Karafotis 提交于 7月 18, 2014

Since we never remove sysfs entry and debugfs files, we can make
the intel_pstate_kobject and debugfs_parent locals.

Also, annotate with __init intel_pstate_sysfs_expose_params()
and intel_pstate_debug_expose_params() in order to be freed
after bootstrap.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

317dd50e

17 7月, 2014 1 次提交

cpufreq: move policy kobj to policy->cpu at resume · 92c14bd9

由 Viresh Kumar 提交于 7月 17, 2014

This is only relevant to implementations with multiple clusters, where clusters
have separate clock lines but all CPUs within a cluster share it.

Consider a dual cluster platform with 2 cores per cluster. During suspend we
start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
kobj as we want to retain permissions/values/etc.

Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
We will recover the old policy and update policy->cpu from 3 to 2 from
update_policy_cpu().

But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
link for CPU2, but would try that for CPU3 while bringing it online. Which will
report errors as CPU3 already has kobj assigned to it.

This bug got introduced with commit 42f921a6, which overlooked this scenario.

To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
only for the first CPU of a non-boot cluster. And we can't recover from this
situation, if kobject_move() fails.

Fixes: 42f921a6 (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Reported-and-tested-by: NBu Yitian <ybu@qti.qualcomm.com>
Reported-by: NSaravana Kannan <skannan@codeaurora.org>
Reviewed-by: NSrivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

92c14bd9

16 7月, 2014 4 次提交

cpufreq: cpu0: OPPs can be populated at runtime · 1bf8cc3d

由 Viresh Kumar 提交于 7月 11, 2014

OPPs can be populated statically, via DT, or added at run time with
dev_pm_opp_add().

While this driver handles the first case correctly, it would fail to populate
OPPs added at runtime. Because call to of_init_opp_table() would fail as there
are no OPPs in DT and probe will return early.

To fix this, remove error checking and call dev_pm_opp_init_cpufreq_table()
unconditionally.

Update bindings as well.
Suggested-by: NStephen Boyd <sboyd@codeaurora.org>
Tested-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Acked-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

1bf8cc3d

cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD · 2fa1adc0

由 Quentin Armitage 提交于 7月 10, 2014

Commit ff1f0018 ("drivers: Enable
building of Kirkwood drivers for mach-mvebu") added Kirkwood into
mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the KConfig files.

The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with
MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD ||
MACH_KIRKWOOD.

As a consequence of this change, the cpufreq driver is no longer enabled
for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for
ARCH_KIRKWOOD.
Signed-off-by: NQuentin Armitage <quentin@armitage.org.uk>
Acked-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2fa1adc0

cpufreq: imx6q: Select PM_OPP · 7e021687

由 Nicolas Del Piano 提交于 7月 13, 2014

PM_OPP is a library used by several of the existing cpufreq drivers.
ARM IMX6Q cpufreq driver uses this library for its functionality.
Thus, it should be selected in Kconfig.
Reported-by: NEzequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: NNicolas Del Piano <ndel314@gmail.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7e021687

cpufreq: sa1110: set memory type for h3600 · 97d496bf

由 Linus Walleij 提交于 7月 12, 2014

The Compaq iPAQ h3600 also has the K4S281632b-1H memory type.
Verified by prying apart a broken board.
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

97d496bf

09 7月, 2014 1 次提交

cpufreq: Makefile: fix compilation for davinci platform · 5a90af67

由 Prabhakar Lad 提交于 7月 08, 2014

Since commtit 8a7b1227 (cpufreq: davinci: move cpufreq driver to
drivers/cpufreq) this added dependancy only for CONFIG_ARCH_DAVINCI_DA850
where as davinci_cpufreq_init() call is used by all davinci platform.

This patch fixes following build error:

arch/arm/mach-davinci/built-in.o: In function `davinci_init_late':
:(.init.text+0x928): undefined reference to `davinci_cpufreq_init'
make: *** [vmlinux] Error 1

Fixes: 8a7b1227 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq)
Signed-off-by: NLad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5a90af67

07 7月, 2014 3 次提交

intel_pstate: Set CPU number before accessing MSRs · 179e8471

由 Vincent Minet 提交于 7月 05, 2014

Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU
initialization. Otherwise only cpu0 has its P-state set and all other
cores are left with their values unchanged.

In most cases, this is not too serious because the P-states will be set
correctly when the timer function is run.  But when the default governor
is set to performance, the per-CPU current_pstate stays the same forever
and no attempts are made to write the MSRs again.
Signed-off-by: NVincent Minet <vincent@vincent-minet.net>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

179e8471

intel_pstate: don't touch turbo bit if turbo disabled or unavailable. · dd5fbf70

由 Dirk Brandewie 提交于 6月 20, 2014

If turbo is disabled in the BIOS bit 38 should be set in
MSR_IA32_MISC_ENABLE register per section 14.3.2.1 of the SDM Vol 3
document 325384-050US Feb 2014. If this bit is set do *not* attempt
to disable trubo via the MSR_IA32_PERF_CTL register. On some systems
trying to disable turbo via MSR_IA32_PERF_CTL will cause subsequent
writes to MSR_IA32_PERF_CTL not take affect, in fact reading
MSR_IA32_PERF_CTL will not show the IDA/Turbo DISENGAGE bit(32) as
set. A write of bit 32 to zero returns to normal operation.

Also deal with the case where the processor does not support
turbo and the BIOS does not report the fact in MSR_IA32_MISC_ENABLE
but does report the max and turbo P states as the same value.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=64251
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

dd5fbf70

intel_pstate: Fix setting VID · c16ed060

由 Dirk Brandewie 提交于 6月 20, 2014

Commit 21855ff5 (intel_pstate: Set turbo VID for BayTrail) introduced
setting the turbo VID which is required to prevent a machine check on
some Baytrail SKUs under heavy graphics based workloads.  The
docmumentation update that brought the requirement to light also
changed the bit mask used for enumerating P state and VID values from
0x7f to 0x3f.

This change returns the mask value to 0x7f.

Tested with the Intel NUC DN2820FYK,
BIOS version FYBYT10H.86A.0034.2014.0513.1413 with v3.16-rc1 and
v3.14.8 kernel versions.

Fixes: 21855ff5 (intel_pstate: Set turbo VID for BayTrail)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77951Reported-and-tested-by: NRune Reterson <rune@megahurts.dk>
Reported-and-tested-by: NEric Eickmeyer <erich@ericheickmeyer.com>
Cc: 3.13+ <stable@vger.kernel.org>  # 3.13+
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c16ed060

19 6月, 2014 1 次提交

cpufreq: unlock when failing cpufreq_update_policy() · fefa8ff8

由 Aaron Plattner 提交于 6月 18, 2014

Commit bd0fa9bb introduced a failure path to cpufreq_update_policy() if
cpufreq_driver->get(cpu) returns NULL. However, it jumps to the 'no_policy'
label, which exits without unlocking any of the locks the function acquired
earlier. This causes later calls into cpufreq to hang.

Fix this by creating a new 'unlock' label and jumping to that instead.

Fixes: bd0fa9bb ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()")
Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/Signed-off-by: NAaron Plattner <aplattner@nvidia.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

fefa8ff8

18 6月, 2014 1 次提交

intel_pstate: Correct rounding in busy calculation · 51d211e9

由 Doug Smythies 提交于 6月 17, 2014

There was a mistake in the actual rounding portion this previous patch:
f0fe3cd7 (intel_pstate: Correct rounding in busy calculation) such that
the rounding was asymetric and incorrect.

Severity: Not very serious, but can increase target pstate by one extra value.
For real world work flows the issue should self correct (but I have no proof).
It is the equivalent of different PID gains for positive and negative numbers.

Examples:
 -3.000000 used to round to -4, rounds to -3 with this patch.
 -3.503906 used to round to -5, rounds to -4 with this patch.

Fixes: f0fe3cd7 (intel_pstate: Correct rounding in busy calculation)
Signed-off-by: NDoug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

51d211e9

17 6月, 2014 1 次提交

cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependency · 217886d3

由 Arnd Bergmann 提交于 6月 13, 2014

5fbfbcd3 ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and
REGULATOR") was a little too quick in completely removing the dependency
on the THERMAL driver.

The problem is that while there are inline wrappers to turn the thermal
API calls into empty functions, those do not help if the cpu-thermal
driver is a loadable module and cpufreq-cpu0 is builtin.

Since CONFIG_CPU_THERMAL is a bool option that decides whether the cpu
code is built into the thermal module or not, we have to use a dependency
on the thermal driver itself. However, if CPU_THERMAL is disabled, we
don't need the dependency, hence the strange '!CPU_THERMAL || THERMAL'
construct.

Fixes: 5fbfbcd3 ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Tested-by: NViresh Kumar <viresh.kumar@linaro.org>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

217886d3

11 6月, 2014 3 次提交

cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR · 5fbfbcd3

由 Viresh Kumar 提交于 6月 10, 2014

cpufreq-cpu0 uses thermal framework to register a cooling device, but doesn't
depend on it as there are dummy calls provided by thermal layer when
CONFIG_THERMAL=n. And when these calls fail, the driver is still usable.

Similar explanation is valid for regulators as well. We do have dummy calls
available for regulator APIs and the driver can work even when those calls
fail.

So, we don't really need to mention thermal and regulators as a dependency for
cpufreq-cpu0 in Kconfig as platforms without support for thermal/regulator can
also use this driver. Remove this dependency.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5fbfbcd3

cpufreq: tegra: update comment for clarity · 40cc5490

由 Viresh Kumar 提交于 6月 10, 2014

Tegra's driver got updated a bit (00917ddc cpufreq: Tegra: implement intermediate
frequency callbacks) and implements new 'intermediate freq' infrastructure of
core. Above commit updated comments about when to call
clk_prepare_enable(pll_x_clk) and Doug wasn't satisfied with those comments and
said this:

> The "Though when target-freq is intermediate freq, we don't need to
> take this reference." makes me think that this function is actually
> called when target-freq is intermediate freq.  I don't think it is,
> right?

For better clarity just make that comment more explicit about when we call
tegra_target_intermediate().
Reviewed-by: NStephen Warren <swarren@nvidia.com>
Reported-and-reviewed-by: NDoug Anderson <dianders@chromium.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

40cc5490

cpufreq: intel_pstate: Remove duplicate CPU ID check · 830bcac4

由 Stratos Karafotis 提交于 6月 10, 2014

We check the CPU ID during driver init. There is no need
to do it again per logical CPU initialization.

So, remove the duplicate check.
Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Acked-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

830bcac4

10 6月, 2014 1 次提交

cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag · 93575b75

由 Viresh Kumar 提交于 6月 09, 2014

Sometimes boot loaders set CPU frequency to a value outside of frequency table
present with cpufreq core. In such cases CPU might be unstable if it has to run
on that frequency for long duration of time and so its better to set it to a
frequency which is specified in frequency table.

Sachin recently found this problem with cpufreq-cpu0 driver when he was testing
it for Exynos.

Set this flag for cpufreq-cpu0 driver.
Reported-and-tested-by: NSachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

93575b75

09 6月, 2014 1 次提交

cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info' · c8ae481b

由 Viresh Kumar 提交于 6月 09, 2014

'copy_prev_load' was recently added by commit: 18b46abd (cpufreq: governor: Be
friendly towards latency-sensitive bursty workloads).

It actually is a bit redundant as we also have 'prev_load' which can store any
integer value and can be used instead of 'copy_prev_load' by setting it zero.

True load can also turn out to be zero during long idle intervals (and hence the
actual value of 'prev_load' and the overloaded value can clash). However this is
not a problem because, if the true load was really zero in the previous
interval, it makes sense to evaluate the load afresh for the current interval
rather than copying the previous load.

So, drop 'copy_prev_load' and use 'prev_load' instead.

Update comments as well to make it more clear.

There is another change here which was probably missed by Srivatsa during the
last version of updates he made. The unlikely in the 'if' statement was covering
only half of the condition and the whole line should actually come under it.

Also checkpatch is made more silent as it was reporting this (--strict option):

CHECK: Alignment should match open parenthesis
+		if (unlikely(wall_time > (2 * sampling_rate) &&
+						j_cdbs->prev_load)) {
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c8ae481b

08 6月, 2014 1 次提交

cpufreq: governor: Be friendly towards latency-sensitive bursty workloads · 18b46abd

由 Srivatsa S. Bhat 提交于 6月 08, 2014

Cpufreq governors like the ondemand governor calculate the load on the CPU
periodically by employing deferrable timers. A deferrable timer won't fire
if the CPU is completely idle (and there are no other timers to be run), in
order to avoid unnecessary wakeups and thus save CPU power.

However, the load calculation logic is agnostic to all this, and this can
lead to the problem described below.

Time (ms)               CPU 1

100                Task-A running

110                Governor's timer fires, finds load as 100% in the last
                   10ms interval and increases the CPU frequency.

110.5              Task-A running

120		   Governor's timer fires, finds load as 100% in the last
		   10ms interval and increases the CPU frequency.

125		   Task-A went to sleep. With nothing else to do, CPU 1
		   went completely idle.

200		   Task-A woke up and started running again.

200.5		   Governor's deferred timer (which was originally programmed
		   to fire at time 130) fires now. It calculates load for the
		   time period 120 to 200.5, and finds the load is almost zero.
		   Hence it decreases the CPU frequency to the minimum.

210		   Governor's timer fires, finds load as 100% in the last
		   10ms interval and increases the CPU frequency.

So, after the workload woke up and started running, the frequency was suddenly
dropped to absolute minimum, and after that, there was an unnecessary delay of
10ms (sampling period) to increase the CPU frequency back to a reasonable value.
And this pattern repeats for every wake-up-from-cpu-idle for that workload.
This can be quite undesirable for latency- or response-time sensitive bursty
workloads. So we need to fix the governor's logic to detect such wake-up-from-
cpu-idle scenarios and start the workload at a reasonably high CPU frequency.

One extreme solution would be to fake a load of 100% in such scenarios. But
that might lead to undesirable side-effects such as frequency spikes (which
might also need voltage changes) especially if the previous frequency happened
to be very low.

We just want to avoid the stupidity of dropping down the frequency to a minimum
and then enduring a needless (and long) delay before ramping it up back again.
So, let us simply carry forward the previous load - that is, let us just pretend
that the 'load' for the current time-window is the same as the load for the
previous window. That way, the frequency and voltage will continue to be set
to whatever values they were set at previously. This means that bursty workloads
will get a chance to influence the CPU frequency at which they wake up from
cpu-idle, based on their past execution history. Thus, they might be able to
avoid suffering from slow wakeups and long response-times.

However, we should take care not to over-do this. For example, such a "copy
previous load" logic will benefit cases like this: (where # represents busy
and . represents idle)

##########.........#########.........###########...........##########........

but it will be detrimental in cases like the one shown below, because it will
retain the high frequency (copied from the previous interval) even in a mostly
idle system:

##########.........#.................#.....................#...............

(i.e., the workload finished and the remaining tasks are such that their busy
periods are smaller than the sampling interval, which causes the timer to
always get deferred. So, this will make the copy-previous-load logic copy
the initial high load to subsequent idle periods over and over again, thus
keeping the frequency high unnecessarily).

So, we modify this copy-previous-load logic such that it is used only once
upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the
previous load won't get blindly copied over; cpufreq will freshly evaluate the
load in the second idle interval, thus ensuring that the system comes back to
its normal state.

[ The right way to solve this whole problem is to teach the CPU frequency
governors to also track load on a per-task basis, not just a per-CPU basis,
and then use both the data sources intelligently to set the appropriate
frequency on the CPUs. But that involves redesigning the cpufreq subsystem,
so this patch should make the situation bearable until then. ]

Experimental results:
+-------------------+

I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in
between its execution such that its total utilization can be a user-defined
value, say 10% or 20% (higher the utilization specified, lesser the amount of
sleeps injected). This ebizzy was run with a single-thread, tied to CPU 8.

Behavior observed with tracing (sample taken from 40% utilization runs):
------------------------------------------------------------------------

Without patch:
~~~~~~~~~~~~~~
kworker/8:2-12137  416.335742: cpu_frequency: state=2061000 cpu_id=8
kworker/8:2-12137  416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40753  416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137  416.345744: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-12137  416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40753  416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
<snip>  ---------------------------------------------------------------------  <snip>
      <...>-40753  416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
     <idle>-0      416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
      <...>-40753  416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137  416.505739: cpu_frequency: state=2061000 cpu_id=8
kworker/8:2-12137  416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40753  416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137  416.515742: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-12137  416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy

Observation: Ebizzy went idle at 416.402202, and started running again at
416.502130. But cpufreq noticed the long idle period, and dropped the frequency
at 416.505739, only to increase it back again at 416.515742, realizing that the
workload is in-fact CPU bound. Thus ebizzy needlessly ran at the lowest frequency
for almost 13 milliseconds (almost 1 full sample period), and this pattern
repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite
a lot.

With patch:
~~~~~~~~~~~

kworker/8:2-29802  464.832535: cpu_frequency: state=2061000 cpu_id=8
<snip>  ---------------------------------------------------------------------  <snip>
kworker/8:2-29802  464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802  464.972536: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-29802  464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
<snip>  ---------------------------------------------------------------------  <snip>
kworker/8:2-29802  465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802  465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
     <idle>-0      465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
      <...>-40738  465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802  465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2

Observation: Ebizzy went idle at 465.035797, and started running again at
465.240178. Since ebizzy was the only real workload running on this CPU,
cpufreq retained the frequency at 4.1Ghz throughout the run of ebizzy, no
matter how many times ebizzy slept and woke-up in-between. Thus, ebizzy
got the 10ms worth of 4.1 Ghz benefit during every sleep-wakeup (as compared
to the run without the patch) and this boost gave a modest improvement in total
throughput, as shown below.

Sleeping-ebizzy records-per-second:
-----------------------------------

Utilization  Without patch  With patch  Difference (Absolute and % values)
    10%         274767        277046        +  2279 (+0.829%)
    20%         543429        553484        + 10055 (+1.850%)
    40%        1090744       1107959        + 17215 (+1.578%)
    60%        1634908       1662018        + 27110 (+1.658%)

A rudimentary and somewhat approximately latency-sensitive workload such as
sleeping-ebizzy itself showed a consistent, noticeable performance improvement
with this patch. Hence, workloads that are truly latency-sensitive will benefit
quite a bit from this change. Moreover, this is an overall win-win since this
patch does not hurt power-savings at all (because, this patch does not reduce
the idle time or idle residency; and the high frequency of the CPU when it goes
to cpu-idle does not affect/hurt the power-savings of deep idle states).
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

18b46abd

07 6月, 2014 2 次提交

cpufreq: ppc-corenet-cpu-freq: do_div use quotient · 906fe033

由 Ed Swarthout 提交于 6月 05, 2014

Commit 6712d293 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost
error) used the remainder from do_div instead of the quotient.  Fix that
and add one to ensure minimum is met.

Fixes: 6712d293 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost error)
Signed-off-by: NEd Swarthout <Ed.Swarthout@freescale.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

906fe033

Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64" · 57aa5ea0

由 Rafael J. Wysocki 提交于 6月 05, 2014

This reverts commit 4920ab84 (cpufreq: Enable big.LITTLE cpufreq
driver on arm64) that breaks build on arm64.

Fixes: 4920ab84 (cpufreq: Enable big.LITTLE cpufreq driver on arm64)
Reported-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

57aa5ea0

06 6月, 2014 2 次提交

cpufreq: Tegra: implement intermediate frequency callbacks · 00917ddc

由 Viresh Kumar 提交于 6月 02, 2014

Tegra has been switching to intermediate frequency (pll_p_clk) forever.
CPUFreq core has better support for handling notifications for these
frequencies and so we can adapt Tegra's driver to it.

Also do a WARN() if clk_set_parent() fails while moving back to pll_x
as we should have atleast restored to earlier frequency on error.
Tested-by: NStephen Warren <swarren@nvidia.com>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NDoug Anderson <dianders@chromium.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

00917ddc

cpufreq: add support for intermediate (stable) frequencies · 1c03a2d0

由 Viresh Kumar 提交于 6月 02, 2014

Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.

While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.

For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.

To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.

get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().

NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.
Tested-by: NStephen Warren <swarren@nvidia.com>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NDoug Anderson <dianders@chromium.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

1c03a2d0

02 6月, 2014 4 次提交

intel_pstate: Improve initial busy calculation · bf810222

由 Doug Smythies 提交于 5月 30, 2014

This change makes the busy calculation using 64 bit math which prevents
overflow for large values of aperf/mperf.

Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: NDoug Smythies <dsmythies@telus.net>
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

bf810222

intel_pstate: add sample time scaling · c4ee841f

由 Dirk Brandewie 提交于 5月 29, 2014

The PID assumes that samples are of equal time, which for a deferable
timers this is not true when the system goes idle. This causes the
PID to take a long time to converge to the min P state and depending
on the pattern of the idle load can make the P state appear stuck.

The hold-off value of three sample times before using the scaling is
to give a grace period for applications that have high performance
requirements and spend a lot of time idle, The poster child for this
behavior is the ffmpeg benchmark in the Phoronix test suite.

Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c4ee841f

intel_pstate: Correct rounding in busy calculation · f0fe3cd7

由 Dirk Brandewie 提交于 5月 29, 2014

Changing to fixed point math throughout the busy calculation in
commit e66c1768 (Change busy calculation to use fixed point
math.) Introduced some inaccuracies by rounding the busy value at two
points in the calculation.  This change removes roundings and moves
the rounding to the output of the PID where the calculations are
complete and the value returned as an integer.

Fixes: e66c1768 (intel_pstate: Change busy calculation to use fixed point math.)
Reported-by: NDoug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f0fe3cd7

intel_pstate: Remove C0 tracking · adacdf3f

由 Dirk Brandewie 提交于 5月 29, 2014

Commit fcb6a15c (intel_pstate: Take core C0 time into account for core
busy calculation) introduced a regression referenced below. The issue
with "lockup" after suspend that this commit was addressing is now dealt
with in the suspend path.

Fixes: fcb6a15c (intel_pstate: Take core C0 time into account for core busy calculation)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581
Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121Reported-by: NDoug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

adacdf3f

31 5月, 2014 1 次提交

cpufreq: exynos: Fix driver compilation with ARCH_MULTIPLATFORM · 4c8d8193

由 Tomasz Figa 提交于 5月 26, 2014

Currently Exynos cpufreq drivers rely on globally mapped
clock controller registers to configure frequency of CPU
cores. This is obviously wrong and will be removed in near
future, but to enable support for multi-platform builds
without introducing a regression it needs to be worked
around.

This patch hacks the code to look for clock controller node
in device tree and map its registers using of_iomap(),
instead of relying on global mapping, so dependencies on
platform headers are removed and the driver can compile
again with multiplatform support.
Signed-off-by: NTomasz Figa <t.figa@samsung.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NKukjin Kim <kgene.kim@samsung.com>

4c8d8193

29 5月, 2014 1 次提交

cpufreq: handle calls to ->target_index() in separate routine · 8d65775d

由 Viresh Kumar 提交于 5月 21, 2014

Handling calls to ->target_index() has got complex over time and might become
more complex. So, its better to take target_index() bits out in another routine
__target_index() for better code readability. Shouldn't have any functional
impact.
Tested-by: NStephen Warren <swarren@nvidia.com>
Reviewed-by: NDoug Anderson <dianders@chromium.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8d65775d

27 5月, 2014 1 次提交

cpufreq: s5pv210: drop check for CONFIG_PM_VERBOSE · 1ef546f2

由 Paul Bolle 提交于 5月 23, 2014

A pr_err() was added in v3.1. It was guarded by a check for
CONFIG_PM_VERBOSE. The Kconfig symbol PM_VERBOSE was removed in v3.0. So
this pr_err() has never been used. Drop that check and clean up the
message a bit.
Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: NSachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

1ef546f2

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功