Commits · 694d0d0bb2030d2e36df73e2d23d5770511dbc8d · openeuler / raspberrypi-kernel

29 Jul, 2016 1 commit

cpufreq: intel_pstate: Add more out-of-band IDs · 65c1262f

Committed by Srinivas Pandruvada on Jul 23, 2016

Add Skylake-X and Broadwell-X IDs for out-of-band (OOB) control of
P-States.

For these processors, if MSR_MISC_PWR_MGMT BIT(8) == 1, then the
Intel P-State driver should exit as OS can't control P-States.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject/changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

65c1262f
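
The check described in this commit can be sketched in plain C as follows. This is a minimal illustration, not the driver's code: the macro and helper names are hypothetical, and the function takes the raw MSR_MISC_PWR_MGMT value as an argument instead of reading the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 8 of MSR_MISC_PWR_MGMT, as named in the commit message. */
#define OOB_PSTATE_CONTROL_BIT (UINT64_C(1) << 8)

/*
 * Hypothetical helper: returns true when the raw MSR value indicates
 * that P-states are controlled out-of-band, in which case the OS
 * driver should bail out.
 */
static bool pstates_controlled_out_of_band(uint64_t misc_pwr_mgmt)
{
    return (misc_pwr_mgmt & OOB_PSTATE_CONTROL_BIT) != 0;
}
```

A caller would read the MSR on one of the listed CPU models and skip driver registration when the helper returns true.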

21 Jul, 2016 3 commits

cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT · da7de91c

Committed by Srinivas Pandruvada on Jul 19, 2016

The MSR MSR_HWP_INTERRUPT is valid only when CPUID.06H:EAX[8] = 1, so
check for feature before accessing this MSR.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

da7de91c
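
A minimal sketch of the feature test this commit describes, assuming the caller has already executed CPUID leaf 06H and passes in the EAX result. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:EAX bit 8, the feature bit named in the commit message. */
#define CPUID6_EAX_BIT8 (UINT32_C(1) << 8)

/*
 * Hypothetical helper: MSR_HWP_INTERRUPT may only be accessed when
 * this returns true.
 */
static bool hwp_interrupt_msr_valid(uint32_t cpuid6_eax)
{
    return (cpuid6_eax & CPUID6_EAX_BIT8) != 0;
}
```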

intel_pstate: Update cpu_frequency tracepoint every time · bc95a454

Committed by Rafael J. Wysocki on Jul 19, 2016

Currently, intel_pstate only updates the cpu_frequency tracepoint
if the new P-state to set is different from the current one, but
that causes powertop to report 100% idle on a 100% loaded system
sometimes.

Prevent that from happening by updating the cpu_frequency tracepoint
every time intel_pstate_update_pstate() is called.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

bc95a454

cpufreq: intel_pstate: clean remnant struct element · 2630abc2

Committed by Carsten Emde on Jul 19, 2016

When I was working with the Intel P state driver I came across a
remnant struct element that is no longer needed after the function
intel_pstate_calc_freq() was retired.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

2630abc2

11 Jul, 2016 1 commit

intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate() · 5fc8f707

Committed by Jan Kiszka on Jul 08, 2016

If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address some
MSR 0x80000648 or so. Mask out the relevant level bits 0 and 1.

Found while running over the Jailhouse hypervisor which became upset
about this strange MSR index.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

5fc8f707
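
The addressing bug can be illustrated with a small sketch. It assumes that the base register MSR_CONFIG_TDP_NOMINAL is at 0x648 and that the lock flag sits in bit 31 of MSR_CONFIG_TDP_CONTROL, which is what the 0x80000648 address in the message implies. The function names are hypothetical.

```c
#include <stdint.h>

#define MSR_CONFIG_TDP_NOMINAL 0x648u  /* assumed base MSR index */
#define TDP_LEVEL_MASK         0x3u    /* level lives in bits 0 and 1 */

/* Buggy form: the whole control value, lock bit included, is added. */
static uint32_t config_tdp_msr_unmasked(uint64_t tdp_ctrl)
{
    return MSR_CONFIG_TDP_NOMINAL + (uint32_t)tdp_ctrl;
}

/* Fixed form: only the level bits select the MSR to read. */
static uint32_t config_tdp_msr_masked(uint64_t tdp_ctrl)
{
    return MSR_CONFIG_TDP_NOMINAL + (uint32_t)(tdp_ctrl & TDP_LEVEL_MASK);
}
```

With a locked control value of 0x80000000 the unmasked form yields 0x80000648, while the masked form yields 0x648.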

07 Jul, 2016 1 commit

cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT · 100cf6f2

Committed by Srinivas Pandruvada on Jul 06, 2016

Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

100cf6f2

28 Jun, 2016 4 commits

intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly · 4a7cb7a9

Committed by Jisheng Zhang on Jun 27, 2016

pid_params is written once by copy_pid_params() during initialization,
and thereafter is mostly read by hot path intel_pstate_update_util().
pid_params is read even more often since commit a4675fbc ("cpufreq:
intel_pstate: Replace timers with utilization update callbacks").

pstate_funcs is written once by copy_cpu_funcs() during initialization,
and thereafter is mostly read by hot path intel_pstate_update_util().

hwp_active is written to once during initialization and thereafter is
mostly read by hot path intel_pstate_update_util().

The fact that they are mostly read and not written to makes them
candidates for __read_mostly declarations.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

4a7cb7a9

intel_pstate: add __init/__initdata marker to some functions/variables · 29327c84

Committed by Jisheng Zhang on Jun 27, 2016

These functions/variables are not needed after booting, so mark them
as __init or __initdata.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

29327c84

intel_pstate: Fix incorrect placement of __initdata · eed43609

Committed by Jisheng Zhang on Jun 27, 2016

__initdata should be placed between the variable name and equal sign
(if there is one) for the variable to be placed in the intended section.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

eed43609

intel_pstate: Do not clear utilization update hooks on policy changes · 5ab666e0

Committed by Rafael J. Wysocki on Jun 27, 2016

intel_pstate_set_policy() is invoked by the cpufreq core during
driver initialization, on changes of policy attributes (minimum and
maximum frequency, for example) via sysfs and via CPU notifications
from the platform firmware.  On some platforms the latter may occur
relatively often.

Commit bb6ab52f (intel_pstate: Do not set utilization update hook
too early) made intel_pstate_set_policy() clear the CPU's utilization
update hook before updating the policy attributes for it (and set the
hook again after doing that), but that involves invoking
synchronize_sched() and adds overhead to the CPU notifications
mentioned above and to the sched-RCU handling in general.

That extra overhead is arguably not necessary, because updating
policy attributes when the CPU's utilization update hook is active
should not lead to any adverse effects, so drop the clearing of
the hook from intel_pstate_set_policy() and make it check if
the hook has been set already when attempting to set it.

Fixes: bb6ab52f (intel_pstate: Do not set utilization update hook too early)
Reported-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

5ab666e0

15 Jun, 2016 1 commit

cpufreq: intel_pstate: Adjust _PSS[0] frequency if needed · b00345d1

Committed by Srinivas Pandruvada on Jun 14, 2016

The maximum turbo P-State used by the intel_pstate driver may be
limited by ACPI _PSS table entry 0.  After commit 9522a2ff
(cpufreq: intel_pstate: Enforce _PPC limits), the maximum performance
on servers will be capped by the _PSS table entry 0 by default.

Even though that is formally correct, it may lead to performance
regressions in some cases.  Namely, if the _PSS table entry 0 is
not the maximum turbo P-State, performance measured after commit
9522a2ff will not match the performance measured before that
commit on the same system.

For this reason, modify the code to always use the maximum turbo
frequency as the one that corresponds to _PSS table entry 0 if turbo
is enabled in the BIOS.  This way, the performance levels from
before commit 9522a2ff will be restored on the affected systems.

Fixes: 9522a2ff (cpufreq: intel_pstate: Enforce _PPC limits)
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

b00345d1

14 Jun, 2016 1 commit

cpufreq: intel_pstate: Broxton support · 41bad47f

Committed by Srinivas Pandruvada on Jun 09, 2016

Add Broxton CPU model number.

Broxton requires core_params to get performance limits via MSRs, but
it is an Atom platform, which requires a more power-optimized algorithm.

So the P-state selection will use an algorithm similar to the one used on
other Atom platforms.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

41bad47f

08 Jun, 2016 3 commits

x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver · 5b20c944

Committed by Dave Hansen on Jun 02, 2016

Another straightforward replacement of magic numbers.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: jacob.jun.pan@intel.com
Cc: linux-pm@vger.kernel.org
Link: http://lkml.kernel.org/r/20160603001945.0F5D02AA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

5b20c944

cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo · 983e600e

Committed by Srinivas Pandruvada on Jun 07, 2016

When turbo is disabled, the ->set_policy() interface is broken.

For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
  Set 1000000 KHz results in 0700000 KHz
  Set 1500000 KHz results in 1100000 KHz
  Set 2000000 KHz results in 1500000 KHz

This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.

One option is to always use max turbo as the reference for calculating
limits. But this will not be correct. By definition, the intel_pstate
sysfs limits show the percentage of available performance. So when the
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog, rewrite in fewer lines of code ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

983e600e

cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy() · 2c2c1af4

Committed by Srinivas Pandruvada on Jun 07, 2016

The limits->max_perf value is rounded up but immediately overwritten by
another assignment to limits->max_perf.

Move that operation to the correct location.

While here, also add a pr_debug() call in ->set_policy to aid in
debugging.

Fixes: 785ee278 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog ]
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

2c2c1af4

30 May, 2016 1 commit

cpufreq: intel_pstate: Downgrade print level for _PPC · 6cacd115

Committed by Srinivas Pandruvada on May 29, 2016

Downgrade pr_info to pr_debug for the "_PPC limits will be enforced"
message.

In server systems with many cores this message is annoying.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

6cacd115

18 May, 2016 1 commit

intel_pstate: Simplify conditional in intel_pstate_set_policy() · c749c64f

Committed by Rafael J. Wysocki on May 12, 2016

One of the if () statements in intel_pstate_set_policy() causes
another if () to be evaluated if the condition is true and it
doesn't do anything else, so merge the two if () statements into
one.

No functional changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

c749c64f

12 May, 2016 4 commits

intel_pstate: Clean up get_target_pstate_use_performance() · 1aa7a6e2

Committed by Rafael J. Wysocki on May 11, 2016

The comments and the core_busy variable name in
get_target_pstate_use_performance() are totally confusing,
so modify them to reflect what's going on.

The results of the computations should be the same as before.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1aa7a6e2

intel_pstate: Use sample.core_avg_perf in get_avg_pstate() · 8edb0a6e

Committed by Rafael J. Wysocki on May 11, 2016

Notice that get_avg_pstate() can use sample.core_avg_perf instead of
carrying out the same division again, so make it do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

8edb0a6e

intel_pstate: Clarify average performance computation · a1c9787d

Committed by Rafael J. Wysocki on May 11, 2016

The core_pct_busy field of struct sample actually contains the
average performance during the last sampling period (in percent)
and not the utilization of the core as suggested by its name,
which is confusing.

For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.

Also notice that storing this value as percentage requires a costly
integer multiplication to be carried out in a hot path, so instead
store it as an "extended fixed point" value with more fraction bits
and update the code using it accordingly (it is better to change the
name of the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

a1c9787d
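
The "extended fixed point" idea from this commit can be sketched as below. The bit widths (8 base fraction bits plus 6 extra) and the helper names are assumptions made for illustration, not values taken from the commit message.

```c
#include <stdint.h>

#define FRAC_BITS     8                        /* assumed base fraction bits */
#define EXT_BITS      6                        /* assumed extra fraction bits */
#define EXT_FRAC_BITS (EXT_BITS + FRAC_BITS)

/* Fixed-point division keeping EXT_FRAC_BITS fraction bits; y must be nonzero. */
static uint64_t div_ext_fp(uint64_t x, uint64_t y)
{
    return (x << EXT_FRAC_BITS) / y;
}

/*
 * Average performance as an APERF/MPERF ratio in extended fixed point.
 * No multiplication by 100 is needed, unlike the percent representation.
 */
static uint64_t core_avg_perf(uint64_t aperf, uint64_t mperf)
{
    return div_ext_fp(aperf, mperf);
}

/* Scale a maximum P-state by the stored ratio to get the average P-state. */
static uint64_t avg_pstate(uint64_t max_pstate, uint64_t avg_perf)
{
    return (max_pstate * avg_perf) >> EXT_FRAC_BITS;
}
```

For example, an APERF delta of 3000 against an MPERF delta of 4000 gives a ratio of 0.75, stored as 12288 with 14 fraction bits. Scaling a maximum P-state of 20 by it gives 15.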

intel_pstate: Avoid unnecessary synchronize_sched() during initialization · 4578ee7e

Committed by Chen Yu on May 11, 2016

Currently, in intel_pstate_clear_update_util_hook(), after
clearing the utilization update hook, we leverage
synchronize_sched() to deal with synchronization, which
is a little bit time-costly because synchronize_sched()
has to wait for all the CPUs to go through a grace period.

Actually, the synchronize_sched() is not necessary if the utilization
update hook has not been set for the given CPU yet, so make the driver
check if that's the case and avoid the synchronize_sched() call then.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371
Tested-by: Tian Ye <yex.tian@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw : Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

4578ee7e

10 May, 2016 1 commit

intel_pstate: Clean up intel_pstate_get() · f96fd0c8

Committed by Rafael J. Wysocki on May 07, 2016

intel_pstate_get() contains a local variable that's initialized but
never used and it can be written in fewer lines of code, so clean
it up.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

f96fd0c8

05 May, 2016 1 commit

cpufreq: intel_pstate: Ignore _PPC processing under HWP · e59a8f7f

Committed by Srinivas Pandruvada on May 04, 2016

When the HWP (hardware P states) feature is active, the ACPI _PSS and _PPC
are not used. So ignore processing for _PPC limits.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

e59a8f7f

04 May, 2016 1 commit

intel_pstate: Fix intel_pstate_get() · 6d45b719

Committed by Rafael J. Wysocki on May 04, 2016

After commit 8fa520af "intel_pstate: Remove freq calculation from
intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
to compute the average frequency, which is problematic for two reasons.

First, intel_pstate_get() may be invoked before the driver reads the
CPU feedback registers for the first time and if that happens,
get_avg_frequency() will attempt to divide by zero.

Second, the get_avg_frequency() call in intel_pstate_get() is racy
with respect to intel_pstate_sample() and it may end up returning
completely meaningless values for this reason.

Moreover, after commit 7349ec04 "intel_pstate: Move
intel_pstate_calc_busy() into get_target_pstate_use_performance()"
sample.core_pct_busy is never computed on Atom, but it is used in
intel_pstate_adjust_busy_pstate() in that case too.

To address those problems notice that if sample.core_pct_busy
was used in the average frequency computation carried out by
get_avg_frequency(), both the divide by zero problem and the
race with respect to intel_pstate_sample() would be avoided.

Accordingly, move the invocation of intel_pstate_calc_busy() from
get_target_pstate_use_performance() to intel_pstate_update_util(),
which also will take care of the uninitialized sample.core_pct_busy
on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
as per the above.
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
Fixes: 8fa520af "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
Fixes: 7349ec04 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

6d45b719

02 May, 2016 1 commit

cpufreq: intel_pstate: Fix HWP on boot CPU after system resume · ba41e1bc

Committed by Rafael J. Wysocki on May 02, 2016

Commit 41cfd64c "Update frequencies of policy->cpus only from
->set_policy()" changed the way the intel_pstate driver's ->set_policy
callback updates the HWP (hardware-managed P-states) settings.
A side effect of it is that if those settings are modified on the
boot CPU during system suspend and wakeup, they will never be
restored during subsequent system resume.

To address this problem, allow cpufreq drivers that don't provide
->target or ->target_index callbacks to use ->suspend and ->resume
callbacks and add a ->resume callback to intel_pstate to restore
the HWP settings on the CPUs that belong to the given policy.

Fixes: 41cfd64c "Update frequencies of policy->cpus only from ->set_policy()"
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

ba41e1bc

28 Apr, 2016 3 commits

cpufreq: intel_pstate: Enable PPC enforcement for servers · 2b3ec765

Committed by Srinivas Pandruvada on Apr 27, 2016

For platforms which are controlled via a remote node manager, enable _PPC by
default. These platforms are mostly categorized as enterprise servers or
performance servers. They need to go through some certification tests,
which test control via _PPC.
The relative risk of enabling this by default is low, as it is less likely
that these systems have a broken _PSS table.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

2b3ec765

cpufreq: intel_pstate: Adjust policy->max · 3be9200d

Committed by Srinivas Pandruvada on Apr 27, 2016

When policy->max is changed via _PPC or sysfs and is more than the max non
turbo frequency, it does not really change the resulting performance on some
processors. When policy->max results in a P-State ratio higher than the
turbo activation ratio, the processor can choose any P-State up to max
turbo. So the user or _PPC setting has no value, but it can cause
undesirable side effects like:
- Showing a reduced max percentage in the Intel P-State sysfs
- Reduced max performance under certain boundary conditions:
  The requested max scaling frequency, either via _PPC or via cpufreq sysfs,
  is converted into a fixed-point max percent scale. In the majority of
  cases this results in the correct max, but not 100% of the time. If the
  _PPC is requested at a point where the calculation leads to a lower max,
  this can result in a lower P-State than expected and it will impact
  performance.
Example of this condition using a Broadwell laptop with config TDP.

ACPI _PSS table from a Broadwell laptop
2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
1300000 1100000 1000000 900000 800000 600000 500000

The actual results with config TDP disabled, so that we can get what is
requested at or below 2300000KHz:

scaling_max_freq   Max Requested P-State   Resultant scaling max
------------------------------------------------------------------------
2400000            18                      2900000 (max turbo)
2300000            17                      2300000 (max physical non turbo)
2200000            15                      2100000
2100000            15                      2100000
2000000            13                      1900000
1900000            13                      1900000
1800000            12                      1800000
1700000            11                      1700000
1600000            10                      1600000
1500000            f                       1500000
1400000            e                       1400000
1300000            d                       1300000
1200000            c                       1200000
1100000            a                       1000000
1000000            a                       1000000
900000             9                        900000
800000             8                        800000
700000             7                        700000
600000             6                        600000
500000             5                        500000
------------------------------------------------------------------------

Now set the config TDP level 1 ratio as 0x0b (equivalent to 1100000KHz)
in BIOS (not every system will let you adjust this).
The turbo activation ratio will be set to one less than that, which will
be 0x0a (So any request above 1000000KHz should result in turbo region
assuming no thermal limits).
Here _PPC will request max to 1100000KHz (which basically should still
result in turbo as this is more than the turbo activation ratio up to
max allowable turbo frequency), but actual calculation resulted in a max
ceiling P-State which is 0x0a. So under any load condition, this driver
will not request turbo P-States. This will be a huge performance hit.

When config TDP feature is ON, if the _PPC points to a frequency above
turbo activation ratio, the performance can still reach max turbo. In this
case we don't need to treat this as the reduced frequency in set_policy
callback.

In this change when config TDP is active (by checking if the physical max
non turbo ratio is more than the current max non turbo ratio), any request
above current max non turbo is treated as full performance.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Minor cleanups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

3be9200d

cpufreq: intel_pstate: Enforce _PPC limits · 9522a2ff

Committed by Srinivas Pandruvada on Apr 27, 2016

Use ACPI _PPC notification to limit the max P state the driver will request.
The ACPI _PPC change notification is sent by the BIOS to limit the max P
state in several cases:
- Reduce the impact of a platform thermal condition
- When the Config TDP feature is used, a changed _PPC is sent to
  follow the TDP change
- Remote node managers in servers want to control platform power
  via the baseboard management controller (BMC)

This change registers with the ACPI processor performance lib so that
_PPC changes are notified to the cpufreq core, which in turn will
result in a call to the .setpolicy() callback. Also, the way the _PSS
table identifies a turbo frequency is not compatible with the max turbo
frequency in intel_pstate, so the very first entry in _PSS needs
to be adjusted.

This feature can be turned on by using kernel parameters:
intel_pstate=support_acpi_ppc
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Minor cleanups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

9522a2ff

26 Apr, 2016 1 commit

cpufreq: intel_pstate: Fix processing for turbo activation ratio · 1becf035

Committed by Srinivas Pandruvada on Apr 22, 2016

When the config TDP level is not the nominal one (level 0), the MSR values
for reading the level 1 and level 2 ratios contain power in the low 14 bits
and the actual ratio in bits [23:16]. The current processing for level 1 and
level 2 is wrong, as no shift is done to get the actual ratio.

Fixes: 6a35fc2d (cpufreq: intel_pstate: get P1 from TAR when available)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1becf035
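
The bit layout described in this commit suggests an extraction along the following lines. It is a sketch with a hypothetical function name. It takes the raw level 1 or level 2 MSR value and returns the ratio field.

```c
#include <stdint.h>

/*
 * Extract the ratio from a config TDP level 1/2 MSR value.
 * Per the commit message, the low 14 bits hold power and the
 * ratio sits in bits [23:16], so a shift is required first.
 */
static unsigned int config_tdp_level_ratio(uint64_t msr_val)
{
    return (unsigned int)((msr_val >> 16) & 0xff);
}
```

Using the value without the shift would read the power field as if it were the ratio.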

25 Apr, 2016 1 commit

cpufreq: intel_pstate: Use average P-State instead of current P-State · bdcaa23f

Committed by Philippe Longepe on Apr 22, 2016

The result returned by pid_calc() is subtracted from current_pstate
(which is the P-State requested during the last period) in order to
obtain the target P-State for the current iteration.

However, current_pstate may not reflect the real current P-State of
the CPU. In particular, that P-State may be higher because of the
frequency sharing per module.

The theory is:
 - The load is the percentage of time spent in C0 and is related to
   the average P-State during the same period.
 - The last requested P-State can be completely different than the
   average P-State (because of frequency sharing or throttling).
 - The P-State shift computed by the pid_calc is based on the load
   computed at average P-State, so the shift must be relative to
   this average P-State.

Using the average P-State instead of current P-State improves power
without significant performance penalty in cases when a task migrates
from one core to another core sharing frequency and voltage.

Performance and power comparison with this patch on Cherry Trail
platform using Android:

Benchmark                 ΔPerf   ΔPower
FishTank                 10.45%     3.1%
SmartBench-Gaming        -0.1%    -10.4%
SmartBench-Productivity  -0.8%    -10.4%
CandyCrush                 n/a    -17.4%
AngryBirds                 n/a     -5.9%
videoPlayback              n/a    -13.9%
audioPlayback              n/a     -4.9%
IcyRocks-20-50            0.0%    -38.4%
iozone RR                -0.16%    -1.3%
iozone RW                 0.74%    -1.3%
Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

bdcaa23f
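
The difference between the two approaches can be sketched as follows. The helper names are hypothetical. Deriving the average P-state by scaling the maximum P-state with the APERF/MPERF ratio is an assumption about how "average P-State" is obtained, and the zero-MPERF fallback is added only to keep the sketch safe.

```c
#include <stdint.h>

/* Average P-state over the last period (assumed: max_pstate * APERF / MPERF). */
static int avg_pstate(int max_pstate, uint64_t aperf, uint64_t mperf)
{
    if (mperf == 0)
        return max_pstate;  /* assumption: no feedback yet, fall back */
    return (int)((uint64_t)max_pstate * aperf / mperf);
}

/* Old behaviour: the PID result is applied to the last requested P-state. */
static int next_pstate_from_current(int current_pstate, int pid_result)
{
    return current_pstate - pid_result;
}

/* New behaviour: the PID result is applied to the measured average P-state. */
static int next_pstate_from_average(int max_pstate, uint64_t aperf,
                                    uint64_t mperf, int pid_result)
{
    return avg_pstate(max_pstate, aperf, mperf) - pid_result;
}
```

When the measured average differs from the last request, for instance because another core in the module raised the shared frequency, the two functions return different targets for the same PID output.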

10 Apr, 2016 1 commit

intel_pstate: Avoid getting stuck in high P-states when idle · ffb81056

Committed by Rafael J. Wysocki on Apr 10, 2016

Jörg Otte reports that commit a4675fbc (cpufreq: intel_pstate:
Replace timers with utilization update callbacks) caused the CPUs in
his Haswell-based system to stay in the very high frequency region
even if the system is completely idle.

That turns out to be an existing problem in the intel_pstate driver's
P-state selection algorithm for Core processors. Namely, all
decisions made by that algorithm are based on the average frequency
of the CPU between sampling events and on the P-state requested on
the last invocation, so it may get stuck at a very high frequency
even if the utilization of the CPU is very low (in fact, it may get
stuck in an inadequate P-state regardless of the CPU utilization).
The only way to kick it out of that limbo is a sufficiently long idle
period (3 times longer than the prescribed sampling interval), but if
that doesn't happen often enough (e.g. due to a timing change like
after the above commit), the P-state of the CPU may be inadequate
pretty much all the time.

To address the most egregious manifestations of that issue, reset the
core_busy value used to determine the next P-state to request if the
utilization of the CPU, determined with the help of the MPERF
feedback register and the TSC, is below 1%.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ffb81056
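
A minimal sketch of the 1% utilization cutoff described above, assuming utilization is taken as the MPERF delta divided by the TSC delta over the sampling period. The function names are hypothetical, and the comparison is arranged to avoid a division.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when MPERF advanced by less than 1% of the TSC delta. */
static bool utilization_below_one_percent(uint64_t mperf_delta,
                                          uint64_t tsc_delta)
{
    return mperf_delta * 100 < tsc_delta;
}

/* Reset the busy value fed to the P-state selection when nearly idle. */
static int32_t effective_core_busy(int32_t core_busy, uint64_t mperf_delta,
                                   uint64_t tsc_delta)
{
    if (utilization_below_one_percent(mperf_delta, tsc_delta))
        return 0;
    return core_busy;
}
```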

09 Apr, 2016 2 commits

intel_pstate: Use pr_fmt · 4836df17

Committed by Joe Perches on Apr 05, 2016

Prefix the output using the more common kernel style.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

4836df17

intel_pstate: Avoid pointless FRAC_BITS shifts under div_fp() · 22590efb

Committed by Rafael J. Wysocki on Apr 09, 2016

There are multiple places in intel_pstate where int_tofp() is applied
to both arguments of div_fp(), but this is pointless, because int_tofp()
simply shifts its argument to the left by FRAC_BITS which mathematically
is equivalent to multiplication by 2^FRAC_BITS, so if this is done
to both arguments of a division, the extra factors will cancel each
other during that operation anyway.

Drop the pointless int_tofp() applied to div_fp() arguments throughout
the driver.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

22590efb
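
The cancellation argument can be checked with a small sketch. The FRAC_BITS value of 8 and the helper bodies are assumptions for illustration. The shift is written as a multiplication so that the sketch stays well defined for negative inputs.

```c
#include <stdint.h>

#define FRAC_BITS 8  /* assumed number of fraction bits */

/* Convert an integer to fixed point (multiplication by 2^FRAC_BITS). */
static int64_t int_tofp(int64_t x)
{
    return x * (INT64_C(1) << FRAC_BITS);
}

/* Fixed-point division: the quotient keeps FRAC_BITS fraction bits. */
static int64_t div_fp(int64_t x, int64_t y)
{
    return (x * (INT64_C(1) << FRAC_BITS)) / y;
}
```

With these definitions, div_fp(int_tofp(3), int_tofp(4)) and div_fp(3, 4) both evaluate to 192, which is 0.75 with 8 fraction bits.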

05 Apr, 2016 2 commits

cpufreq: intel_pstate: Documentation for structures · 13ad7701

Committed by Srinivas Pandruvada on Apr 03, 2016

No code change. Only added kernel doc style comments for structures.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

13ad7701

cpufreq: intel_pstate: fix inconsistency in setting policy limits · 30a39153

Committed by Srinivas Pandruvada on Apr 03, 2016

When the user sets the performance policy using the cpufreq interface, it is
possible that because of policy->max limits, the actual performance is still
limited. But the current implementation will silently switch the
policy to powersave and start using powersave limits. If the user modifies
any limits using intel_pstate sysfs, this is actually changing powersave
limits.

The current implementation tracks limits under powersave and performance
policy using two different variables. When policy->max is less than
policy->cpuinfo.max_freq, only powersave limit variable is used.

This fix causes the performance limits variable to be used always when
the policy is performance.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

30a39153

02 Apr, 2016 2 commits

cpufreq: sched: Helpers to add and remove update_util hooks · 0bed612b

Committed by Rafael J. Wysocki on Apr 02, 2016

Replace the single helper for adding and removing cpufreq utilization
update hooks, cpufreq_set_update_util_data(), with a pair of helpers,
cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(),
and modify the users of cpufreq_set_update_util_data() accordingly.

With the new helpers, the code using them doesn't need to worry
about the internals of struct update_util_data and in particular
it doesn't need to worry about populating the func field in it
properly upfront.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

0bed612b

intel_pstate: Avoid extra invocation of intel_pstate_sample() · febce40f

Committed by Rafael J. Wysocki on Apr 02, 2016

The initialization of intel_pstate for a given CPU involves populating
the fields of its struct cpudata that represent the previous sample,
but currently that is done in a problematic way.

Namely, intel_pstate_init_cpu() makes an extra call to
intel_pstate_sample() so it reads the current register values that
will be used to populate the "previous sample" record during the
next invocation of intel_pstate_sample().  However, after commit
a4675fbc (cpufreq: intel_pstate: Replace timers with utilization
update callbacks) that doesn't work for last_sample_time, because
the time value is passed to intel_pstate_sample() as an argument now.
Passing 0 to it from intel_pstate_init_cpu() is problematic, because
that causes cpu->last_sample_time == 0 to be visible in
get_target_pstate_use_performance() (and hence the extra
cpu->last_sample_time > 0 check in there) and effectively allows
the first invocation of intel_pstate_sample() from
intel_pstate_update_util() to happen immediately after the
initialization which may lead to a significant "turn on"
effect in the governor algorithm.

To mitigate that issue, rework the initialization to avoid the
extra intel_pstate_sample() call from intel_pstate_init_cpu().
Instead, make intel_pstate_sample() return false if it has been
called with cpu->sample.time equal to zero, which will make
intel_pstate_update_util() skip the sample in that case, and
reset cpu->sample.time from intel_pstate_set_update_util_hook()
to make the algorithm start properly every time the hook is set.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
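
The "skip the first sample" behavior described above can be sketched in a few lines of user-space C. This is a simplified model under stated assumptions, not the driver code: struct sketch_cpudata and the sketch_ functions are hypothetical, sample_time stands in for cpu->sample.time, and the register reads that the real intel_pstate_sample() performs are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of the per-CPU sampling state. */
struct sketch_cpudata {
    uint64_t sample_time;       /* models cpu->sample.time */
    uint64_t last_sample_time;  /* models cpu->last_sample_time */
};

/* Setting the hook resets the sample time so the algorithm restarts. */
static void sketch_set_update_util_hook(struct sketch_cpudata *cpu)
{
    cpu->sample_time = 0;
}

/*
 * Record the new timestamp and report whether a usable previous sample
 * existed. The first call after a reset only primes the state and
 * returns false, so the caller skips that sample.
 */
static bool sketch_sample(struct sketch_cpudata *cpu, uint64_t time)
{
    cpu->last_sample_time = cpu->sample_time;
    cpu->sample_time = time;
    return cpu->last_sample_time != 0;
}
```

With this shape, no extra priming call is needed at initialization time: the first callback invocation after the hook is set fills in the "previous sample" fields, and P-state adjustment starts with the second one.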


31 Mar, 2016 1 commit

intel_pstate: Do not set utilization update hook too early · bb6ab52f

Committed by Rafael J. Wysocki on Mar 31, 2016

The utilization update hook in the intel_pstate driver is set too
early, as it only should be set after the policy has been fully
initialized by the core.  That may cause intel_pstate_update_util()
to use incorrect data and put the CPUs into incorrect P-states as
a result.

To prevent that from happening, make intel_pstate_set_policy() set
the utilization update hook instead of intel_pstate_init_cpu() so
intel_pstate_update_util() only runs when all things have been
initialized as appropriate.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


20 Mar, 2016 1 commit

intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts · fdfdb2b1

Committed by Rafael J. Wysocki on Mar 18, 2016

After commit a4675fbc (cpufreq: intel_pstate: Replace timers with
utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
intel_pstate_adjust_busy_pstate() path as that is executed with
disabled interrupts.  However, atom_set_pstate() called from there
via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
smp_call_function_single().

The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
because intel_pstate_set_pstate() calling it is also invoked during
the initialization and cleanup of the driver and in those cases it is
not guaranteed to be run on the CPU that is being updated.  However,
in the case when intel_pstate_set_pstate() is called by
intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
the register safely.  Moreover, intel_pstate_set_pstate() already
contains code that only is executed if the function is called by
intel_pstate_adjust_busy_pstate() and there is a special argument
passed to it because of that.

To fix the problem at hand, rearrange the code taking the above
observations into account.

First, replace the ->set() callback in struct pstate_funcs with a
->get_val() one that will return the value to be written to the
IA32_PERF_CTL MSR without updating the register.

Second, split intel_pstate_set_pstate() into two functions,
intel_pstate_update_pstate() to be called by
intel_pstate_adjust_busy_pstate() that will contain all of the
intel_pstate_set_pstate() code which only needs to be executed in
that case and will use wrmsrl() to update the MSR (after obtaining
the value to write to it from the ->get_val() callback), and
intel_pstate_set_min_pstate() to be invoked during the
initialization and cleanup that will set the P-state to the
minimum one and will update the MSR using wrmsrl_on_cpu().

Finally, move the code shared between intel_pstate_update_pstate()
and intel_pstate_set_min_pstate() to a new static inline function
intel_pstate_record_pstate() and make them both call it.

Of course, that unifies the handling of the IA32_PERF_CTL MSR writes
between Atom and Core.

Fixes: a4675fbc (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
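
The three-step rearrangement described in this commit can be modeled roughly as follows. This is a user-space sketch under stated assumptions, not the driver source: MSR writes are simulated with an array, sketch_this_cpu stands in for "the CPU the code is running on", all sketch_ names are hypothetical, and the value encoding in sketch_core_get_val() (P-state shifted left by 8 bits) is a simplification that ignores the turbo-related bits.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 2 /* hypothetical, for illustration only */

/* Simulated IA32_PERF_CTL register, one per CPU. */
static uint64_t sketch_perf_ctl[SKETCH_NR_CPUS];
static int sketch_this_cpu; /* CPU the "current code" runs on */

struct sketch_cpudata {
    int cpu;
    int min_pstate;
    int current_pstate;
};

/* ->get_val() computes the register value without writing it. */
struct sketch_pstate_funcs {
    uint64_t (*get_val)(struct sketch_cpudata *cpu, int pstate);
};

static uint64_t sketch_core_get_val(struct sketch_cpudata *cpu, int pstate)
{
    (void)cpu;
    return (uint64_t)pstate << 8; /* simplified encoding (assumption) */
}

static struct sketch_pstate_funcs sketch_funcs = {
    .get_val = sketch_core_get_val,
};

/* Local write: only valid on the CPU being updated. */
static void sketch_wrmsrl(uint64_t val)
{
    sketch_perf_ctl[sketch_this_cpu] = val;
}

/* Cross-CPU write: must not be used with interrupts disabled. */
static void sketch_wrmsrl_on_cpu(int cpu, uint64_t val)
{
    sketch_perf_ctl[cpu] = val;
}

/* Bookkeeping shared by both paths. */
static inline void sketch_record_pstate(struct sketch_cpudata *cpu, int pstate)
{
    cpu->current_pstate = pstate;
}

/* Hot path: runs on the target CPU, so a local write is enough. */
static void sketch_update_pstate(struct sketch_cpudata *cpu, int pstate)
{
    if (pstate == cpu->current_pstate)
        return;
    sketch_record_pstate(cpu, pstate);
    sketch_wrmsrl(sketch_funcs.get_val(cpu, pstate));
}

/* Init/cleanup path: may run on any CPU, so write remotely. */
static void sketch_set_min_pstate(struct sketch_cpudata *cpu)
{
    int pstate = cpu->min_pstate;

    sketch_record_pstate(cpu, pstate);
    sketch_wrmsrl_on_cpu(cpu->cpu, sketch_funcs.get_val(cpu, pstate));
}
```

The design point is that the choice between a local and a cross-CPU write moves out of the per-model callback and into the two callers, each of which knows which context it runs in.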


11 Mar, 2016 1 commit

intel_pstate: Do not skip samples partially · 4fec7ad5

Committed by Rafael J. Wysocki on Mar 10, 2016

If the current value of MPERF or the current value of TSC is the
same as the previous one, respectively, intel_pstate_sample() bails
out early and skips the sample.

However, intel_pstate_adjust_busy_pstate() is still called in that
case which is not correct, so modify intel_pstate_sample() to
return a bool value indicating whether or not the sample has been
taken and use it to decide whether or not to call
intel_pstate_adjust_busy_pstate().

While at it, remove redundant parentheses from the MPERF/TSC
check in intel_pstate_sample().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
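
The control flow this commit introduces can be sketched as below. The code is a simplified user-space model, not the driver: struct sketch_cpudata and the sketch_ functions are hypothetical, the MPERF and TSC values are passed in as arguments instead of being read from registers, and the P-state adjustment is reduced to a call counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of the per-CPU state. */
struct sketch_cpudata {
    uint64_t prev_mperf;
    uint64_t prev_tsc;
    unsigned int adjust_calls; /* counts P-state adjustments */
};

/* Returns true only if a new sample was actually taken. */
static bool sketch_sample(struct sketch_cpudata *cpu,
                          uint64_t mperf, uint64_t tsc)
{
    if (cpu->prev_mperf == mperf || cpu->prev_tsc == tsc)
        return false; /* nothing new to measure, skip the sample */

    cpu->prev_mperf = mperf;
    cpu->prev_tsc = tsc;
    return true;
}

static void sketch_adjust_busy_pstate(struct sketch_cpudata *cpu)
{
    cpu->adjust_calls++;
}

/* The caller adjusts the P-state only when the sample was taken. */
static void sketch_update_util(struct sketch_cpudata *cpu,
                               uint64_t mperf, uint64_t tsc)
{
    if (sketch_sample(cpu, mperf, tsc))
        sketch_adjust_busy_pstate(cpu);
}
```

Before the change, the equivalent of sketch_adjust_busy_pstate() would have run unconditionally, acting on stale sample data whenever the counters had not advanced.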
