提交 · 0e285e90887bcb248178d55960e1276188c657c9 · openeuler / raspberrypi-kernel

18 4月, 2017 1 次提交

cpufreq: schedutil: Use policy-dependent transition delays · 1b72e7fd

由 Rafael J. Wysocki 提交于 4月 11, 2017

Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.

Make intel_pstate set transition_delay_us to 500.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>

1b72e7fd

30 3月, 2017 1 次提交

cpufreq: intel_pstate: Add support for Gemini Lake · 630e5757

由 Box, David E 提交于 3月 29, 2017

Use same parameters as INTEL_FAM6_ATOM_GOLDMONT to enable
Gemini Lake.
Signed-off-by: NBox, David E <david.e.box@intel.com>
Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

630e5757

29 3月, 2017 16 次提交

cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max() · b02aabe8

由 Rafael J. Wysocki 提交于 3月 28, 2017

Some computations in intel_pstate_get_min_max() are not necessary
and one of its two callers doesn't even use the full result.

First off, the fixed-point value of cpu->max_perf represents a
non-negative number between 0 and 1 inclusive and cpu->min_perf
cannot be greater than cpu->max_perf. It is not necessary to check
those conditions every time the numbers in question are used.

Moreover, since intel_pstate_max_within_limits() only needs the
upper boundary, it doesn't make sense to compute the lower one in
there and returning min and max from intel_pstate_get_min_max()
via pointers doesn't look particularly nice.

For the above reasons, drop intel_pstate_get_min_max(), add a helper
to get the base P-state for min/max computations and carry out them
directly in the previous callers of intel_pstate_get_min_max().
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

b02aabe8

cpufreq: intel_pstate: Do not walk policy->cpus · 2bfc4cbb

由 Rafael J. Wysocki 提交于 3月 28, 2017

intel_pstate_hwp_set() is the only function walking policy->cpus
in intel_pstate.  The rest of the code simply assumes one CPU per
policy, including the initialization code.

Therefore it doesn't make sense for intel_pstate_hwp_set() to
walk policy->cpus as it is guaranteed to have only one bit set
for policy->cpu.

For this reason, rearrange intel_pstate_hwp_set() to take the CPU
number as the argument and drop the loop over policy->cpus from it.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2bfc4cbb

cpufreq: intel_pstate: Introduce pid_in_use() · 8ca6ce37

由 Rafael J. Wysocki 提交于 3月 28, 2017

Add a new function pid_in_use() to return the information on whether
or not the PID-based P-state selection algorithm is in use.

That allows a couple of complicated conditions in the code to be
reduced to simple checks against the new function's return value.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8ca6ce37

cpufreq: intel_pstate: Drop struct cpu_defaults · 2f49afc2

由 Rafael J. Wysocki 提交于 3月 28, 2017

The cpu_defaults structure is redundant, because it only contains
one member of type struct pstate_funcs which can be used directly
instead of struct cpu_defaults.

For this reason, drop struct cpu_defaults, use struct pstate_funcs
directly instead of it where applicable and rename all of the
variables of that type accordingly.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2f49afc2

cpufreq: intel_pstate: Move cpu_defaults definitions · de4a76cb

由 Rafael J. Wysocki 提交于 3月 28, 2017

Move the definitions of the cpu_defaults structures after the
definitions of utilization update callback routines to avoid
extra declarations of the latter.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

de4a76cb

cpufreq: intel_pstate: Add update_util callback to pstate_funcs · 67dd9bf4

由 Rafael J. Wysocki 提交于 3月 28, 2017

Avoid using extra function pointers during P-state selection by
dropping the get_target_pstate member from struct pstate_funcs,
adding a new update_util callback to it (to be registered with
the CPU scheduler as the utilization update callback in the active
mode) and reworking the utilization update callback routines to
invoke specific P-state selection functions directly.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

67dd9bf4

cpufreq: intel_pstate: Use different utilization update callbacks · eabd22c6

由 Rafael J. Wysocki 提交于 3月 28, 2017

Notice that some overhead in the utilization update callbacks
registered by intel_pstate in the active mode can be avoided if
those callbacks are tailored to specific configurations of the
driver.  For example, the utilization update callback for the HWP
enabled case only needs to update the average CPU performance
periodically whereas the utilization update callback for the
PID-based algorithm does not need to take IO-wait boosting into
account and so on.

With that in mind, define three utilization update callbacks for
three different use cases: HWP enabled, the CPU load "powersave"
P-state selection algorithm and the PID-based "powersave" P-state
selection algorithm and modify the driver initialization to
choose the callback matching its current configuration.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

eabd22c6

cpufreq: intel_pstate: Modify check in intel_pstate_update_status() · 0042b2c0

由 Rafael J. Wysocki 提交于 3月 28, 2017

One of the checks in intel_pstate_update_status() implicitly relies
on the information that there are only two struct cpufreq_driver
objects available, but it is better to do it directly against the
value it really is about (to make the code easier to follow if
nothing else).
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

0042b2c0

cpufreq: intel_pstate: Drop driver_registered variable · ee8df89a

由 Rafael J. Wysocki 提交于 3月 28, 2017

The driver_registered variable in intel_pstate is used for checking
whether or not the driver has been registered, but intel_pstate_driver
can be used for that too (with the rule that the driver is not
registered as long as it is NULL).

That is a bit more straightforward and the code may be simplified
a bit this way, so modify the driver accordingly.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

ee8df89a

cpufreq: intel_pstate: Skip unnecessary PID resets on init · 694cb173

由 Rafael J. Wysocki 提交于 3月 28, 2017

PID controller parameters only need to be initialized if the
get_target_pstate_use_performance() P-state selection routine
is going to be used.  It is not necessary to initialize them
otherwise, so don't do that.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

694cb173

cpufreq: intel_pstate: Set HWP sampling interval once · 7aec5b50

由 Rafael J. Wysocki 提交于 3月 28, 2017

In the HWP enabled case pid_params.sample_rate_ns only needs to be
updated once, because it is global, so do that when setting hwp_active
instead of doing it during the initialization of every CPU.

Moreover, pid_params.sample_rate_ms is never used if HWP is enabled,
so do not update it at all then.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7aec5b50

cpufreq: intel_pstate: Clean up intel_pstate_busy_pid_reset() · ff35f02e

由 Rafael J. Wysocki 提交于 3月 28, 2017

intel_pstate_busy_pid_reset() is the only caller of pid_reset(),
pid_p_gain_set(), pid_i_gain_set(), and pid_d_gain_set().  Moreover,
it passes constants as two parameters of pid_reset() and all of
the other routines above essentially contain the same code, so
fold all of them into the caller and drop unnecessary computations.

Introduce percent_fp() for converting integer values in percent
to fixed-point fractions and use it in the above code cleanup.

Finally, rename intel_pstate_busy_pid_reset() to
intel_pstate_pid_reset() as it also is used for the
initialization of PID parameters for every CPU and the
meaning of the "busy" part of the name is not particularly
clear.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

ff35f02e

cpufreq: intel_pstate: Fold intel_pstate_reset_all_pid() into the caller · 4ddd0146

由 Rafael J. Wysocki 提交于 3月 28, 2017

There is only one caller of intel_pstate_reset_all_pid(), which is
pid_param_set() used in the debugfs interface only, and having that
code split does not make it particularly convenient to follow.

For this reason, move the body of intel_pstate_reset_all_pid() into
its caller and drop that function.

Also change the loop from for_each_online_cpu() (which is obviously
racy with respect to CPU offline/online) to for_each_possible_cpu(),
so that all PID parameters are reset for all CPUs regardless of their
online/offline status (to prevent, for example, a previously offline
CPU from going online with a stale set of PID parameters).
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4ddd0146

cpufreq: intel_pstate: Initialize pid_params statically · 5c439053

由 Rafael J. Wysocki 提交于 3月 28, 2017

Notice that both the existing struct cpu_defaults instances in which
PID parameters are actually initialized use the same values of those
parameters, so it is not really necessary to copy them over to
pid_params dynamically.

Instead, initialize pid_params statically with those values and
drop the unused pid_policy member from struct cpu_defaults along
with copy_pid_params() used for initializing it.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5c439053

cpufreq: intel_pstate: Drop pointless initialization of PID parameters · 64043678

由 Rafael J. Wysocki 提交于 3月 28, 2017

The P-state selection algorithm used by intel_pstate for Atom
processors is not based on the PID controller and the initialization
of PID parametrs for those processors is pointless and confusing, so
drop it.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

64043678

cpufreq: intel_pstate: Eliminate struct perf_limits · e14cf885

由 Rafael J. Wysocki 提交于 3月 28, 2017

After recent changes the purpose of struct perf_limits is not
particularly clear any more and the code may be made somewhat
easier to follow by eliminating it, so go for that.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e14cf885

24 3月, 2017 4 次提交

cpufreq: intel_pstate: Avoid transient updates of cpuinfo.max_freq · 80b120ca

由 Rafael J. Wysocki 提交于 3月 23, 2017

Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
set policy->cpuinfo.max_freq depending on the turbo status, but the
updates made by them are discarded by the core, because the policy
object passed to them by the core is temporary and cpuinfo.max_freq
from that object is not copied to the final policy object in
cpufreq_set_policy().

However, cpufreq_set_policy() passes the temporary policy object
to the ->setpolicy callback of the driver, so intel_pstate_set_policy()
actually sees the policy->cpuinfo.max_freq value updated by
intel_pstate_verify_policy() and not the final one.  It also
updates policy->max sometimes which basically has no effect after
it returns, because the core discards that update.

To avoid confusion, eliminate policy->cpuinfo.max_freq updates from
intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
entirely and check the maximum frequency explicitly in
intel_pstate_update_perf_limits() instead of relying on the
transiently updated policy->cpuinfo.max_freq value.

Moreover, move the max->policy adjustment carried out in
intel_pstate_set_policy() to a separate function and call that
function from the ->verify driver callbacks to ensure that it will
actually be effective.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

80b120ca

cpufreq: intel_pstate: Active mode P-state limits rework · c5a2ee7d

由 Rafael J. Wysocki 提交于 3月 22, 2017

The coordination of P-state limits used by intel_pstate in the active
mode (ie. by default) is problematic, because it synchronizes all of
the limits (ie. the global ones and the per-policy ones) so as to use
one common pair of P-state limits (min and max) across all CPUs in
the system.  The drawbacks of that are as follows:

 - If P-states are coordinated in hardware, it is not necessary
   to coordinate them in software on top of that, so in that case
   all of the above activity is in vain.

 - If P-states are not coordinated in hardware, then the processor
   is actually capable of setting different P-states for different
   CPUs and coordinating them at the software level simply doesn't
   allow that capability to be utilized.

 - The coordination works in such a way that setting a per-policy
   limit (eg. scaling_max_freq) for one CPU causes the common
   effective limit to change (and it will affect all of the other
   CPUs too), but subsequent reads from the corresponding sysfs
   attributes for the other CPUs will return stale values (which
   is confusing).

 - Reads from the global P-state limit attributes, min_perf_pct and
   max_perf_pct, return the effective common values and not the last
   values set through these attributes.  However, the last values
   set through these attributes become hard limits that cannot be
   exceeded by writes to scaling_min_freq and scaling_max_freq,
   respectively, and they are not exposed, so essentially users
   have to remember what they are.

All of that is painful enough to warrant a change of the management
of P-state limits in the active mode.

To that end, redesign the active mode P-state limits management in
intel_pstate in accordance with the following rules:

 (1) All CPUs are affected by the global limits (that is, none of
     them can be requested to run faster than the global max and
     none of them can be requested to run slower than the global
     min).

 (2) Each individual CPU is affected by its own per-policy limits
     (that is, it cannot be requested to run faster than its own
     per-policy max and it cannot be requested to run slower than
     its own per-policy min).

 (3) The global and per-policy limits can be set independently.

Also, the global maximum and minimum P-state limits will be always
expressed as percentages of the maximum supported turbo P-state.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

c5a2ee7d

cpufreq: intel_pstate: Use load-based P-state selection more widely · 55395345

由 Rafael J. Wysocki 提交于 3月 22, 2017

Extend the set of systems for which intel_pstate will use the
"powersave" P-state selection algorithm based on CPU load in the
active mode by systems with ACPI preferred profile set to "tablet",
"appliance PC", "desktop", or "workstation" (ie. everything with a
specified preferred profile that is not a "server").
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

55395345

cpufreq: intel_pstate: Support HWP processors in all operation modes · eb5139d1

由 Rafael J. Wysocki 提交于 3月 22, 2017

Currently, some processors supporting HWP are only supported by
intel_pstate if HWP is actually going to be used and not supported
otherwise which is confusing.

Specifically, they are not supported if "intel_pstate=no_hwp" is
passed to the kernel in the command line or if the driver is started
in the passive mode ("intel_pstate=passive").

There is no real reason for that, because everything about those
processor is known anyway and the driver can work with them in all
modes, so make that happen, but use the load-based P-state selection
algorithm for the active mode "powersave" policy with them.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

eb5139d1

22 3月, 2017 1 次提交

cpufreq: intel_pstate: Fix policy data management in passive mode · 64897b20

由 Rafael J. Wysocki 提交于 3月 21, 2017

The policy->cpuinfo.max_freq and policy->max updates in
intel_cpufreq_turbo_update() are excessive as they are done for no
good reason and may lead to problems in principle, so they should be
dropped.  However, after dropping them intel_cpufreq_turbo_update()
becomes almost entirely pointless, because the check made by it is
made again down the road in intel_pstate_prepare_request().  The
only thing in it that still needs to be done is the call to
update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether
and make its callers invoke update_turbo_state() directly instead of
it.

In addition to that, fix intel_cpufreq_verify_policy() so that it
checks global.no_turbo in addition to global.turbo_disabled when
updating policy->cpuinfo.max_freq to make it consistent with
intel_pstate_verify_policy().

Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

64897b20

18 3月, 2017 1 次提交

cpufreq: intel_pstate: One set of global limits in active mode · 7de32556

由 Rafael J. Wysocki 提交于 3月 18, 2017

In the active mode intel_pstate currently uses two sets of global
limits, each associated with one of the possible scaling_governor
settings in that mode: "powersave" or "performance".

The driver switches over from one of those sets to the other
depending on the scaling_governor setting for the last CPU whose
per-policy cpufreq interface in sysfs was last used to change
parameters exposed in there.  That obviously leads to no end of
issues when the scaling_governor settings differ between CPUs.

The most recent issue was introduced by commit a240c4aa (cpufreq:
intel_pstate: Do not reinit performance limits in ->setpolicy)
that eliminated the reinitialization of "performance" limits in
intel_pstate_set_policy() preventing the max limit from being set
to anything below 100, among other things.

Namely, an undesirable side effect of commit a240c4aa is that
now, after setting scaling_governor to "performance" in the active
mode, the per-policy limits for the CPU in question go to the highest
level and stay there even when it is switched back to "powersave"
later.

As it turns out, some distributions set scaling_governor to
"performance" temporarily for all CPUs to speed-up system
initialization, so that change causes them to misbehave later.

To fix that, get rid of the performance/powersave global limits
split and use just one set of global limits for everything.

From the user's persepctive, after this modification, when
scaling_governor is switched from "performance" to "powersave"
or the other way around on one CPU, the limits settings (ie. the
global max/min_perf_pct and per-policy scaling_max/min_freq for
any CPUs) will not change.  Still, switching from "performance"
to "powersave" or the other way around changes the way in which
P-states are selected and in particular "performance" causes the
driver to always request the highest P-state it is allowed to ask
for for the given CPU.

Fixes: a240c4aa (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7de32556

15 3月, 2017 1 次提交

cpufreq: intel_pstate: Avoid percentages in limits-related computations · e4c204ce

由 Rafael J. Wysocki 提交于 3月 14, 2017

Currently, intel_pstate_update_perf_limits() first converts the
policy minimum and maximum limits into percentages of the maximum
turbo frequency (rounding up to an integer) and then converts these
percentages to fractions (by using fixed-point arithmetic to divide
them by 100).

That introduces a rounding error unnecessarily, because the fractions
can be obtained by carrying out fixed-point divisions directly on the
input numbers.

Rework the computations in intel_pstate_hwp_set() to use fractions
instead of percentages (and drop redundant local variables from
there) and modify intel_pstate_update_perf_limits() to compute the
fractions directly and percentages out of them.

While at it, introduce percent_ext_fp() for converting percentages
to fractions (with extended number of fraction bits) and use it in
the computations.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e4c204ce

14 3月, 2017 2 次提交

cpufreq: intel_pstate: Correct frequency setting in the HWP mode · 3f8ed54a

由 Srinivas Pandruvada 提交于 3月 13, 2017

In the functions intel_pstate_hwp_set(), min/max range from HWP capability
MSR along with max_perf_pct and min_perf_pct, is used to set the HWP
request MSR. In some cases this doesn't result in the correct HWP max/min
in HWP request.

For example: In the following case:

HWP capabilities from MSR 0x771
0x70a1220

Here cpufreq min/max frequencies from above MSR dump are 700MHz and 3.2GHz
respectively.

This will result in
hwp_min = 0x07
hwp_max = 0x20

To limit max frequency to 2GHz:

perf_limits->max_perf_pct = 63 (2GHz as a percent of 3.2GHz rounded up)

With the current calculation:
adj_range = max_perf_pct * range / 100;
adj_range = 63 * (32 - 7) / 100
adj_range = 15

max = hw_min + adj_range;
max = 7 + 15 = 22

This will result in HWP request of 0x160f, which will result in a
frequency cap of 2.2GHz not 2GHz.

The problem with the above calculation is that hwp_min of 7 is treated
as 0% in the range. But max_perf_pct is calculated with respect to minimum
as 0 and max as 3.2GHz or hwp_max, so adding hwp_min to it will result in
more than the desired.

Since the min_perf_pct and max_perf_pct is already a percent of max
frequency or hwp_max, this min/max HWP request value can be calculated
directly applying these percentage to hwp_max.
Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

3f8ed54a

cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set() · 6e7408ac

由 Rafael J. Wysocki 提交于 3月 12, 2017

Fix the debugfs interface for PID tuning to actually update
pid_params.sample_rate_ns on PID parameters updates, as changing
pid_params.sample_rate_ms via debugfs has no effect now.

Fixes: a4675fbc (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>

6e7408ac

13 3月, 2017 1 次提交

cpufreq: intel_pstate: Drop redundant wrapper function · 5f98ced1

由 Rafael J. Wysocki 提交于 3月 09, 2017

intel_pstate_hwp_set_policy() is a wrapper around
intel_pstate_hwp_set(), but the only value it adds is to check
hwp_active before calling the latter and one of its two callers
has already checked hwp_active before that happens, so in that
code path the additional check is redundant and using the wrapper
is rather pointless.

For this reason, drop intel_pstate_hwp_set_policy() and make its
callers invoke intel_pstate_hwp_set() directly (after checking
hwp_active).
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NViresh Kumar <viresh.kumar@linaro.org>

5f98ced1

06 3月, 2017 3 次提交

cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy · a240c4aa

由 Rafael J. Wysocki 提交于 3月 02, 2017

If the current P-state selection algorithm is set to "performance"
in intel_pstate_set_policy(), the limits may be initialized from
scratch, but only if no_turbo is not set and the maximum frequency
allowed for the given CPU (i.e. the policy object representing it)
is at least equal to the max frequency supported by the CPU.  In all
of the other cases, the limits will not be updated.

For example, the following can happen:

 # cat intel_pstate/status
 active
 # echo performance > cpufreq/policy0/scaling_governor
 # cat intel_pstate/min_perf_pct
 100
 # echo 94 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 100
 # cat cpufreq/policy0/scaling_max_freq
 3100000
 echo 3000000 > cpufreq/policy0/scaling_max_freq
 # cat intel_pstate/min_perf_pct
 94
 # echo 95 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 95

That is confusing for two reasons.  First, the initial attempt to
change min_perf_pct to 94 seems to have no effect, even though
setting the global limits should always work.  Second, after
changing scaling_max_freq for policy0 the global min_perf_pct
attribute shows 94, even though it should have not been affected
by that operation in principle.

Moreover, the final attempt to change min_perf_pct to 95 worked
as expected, because scaling_max_freq for the only policy with
scaling_governor equal to "performance" was different from the
maximum at that time.

To make all that confusion go away, modify intel_pstate_set_policy()
so that it doesn't reinitialize the limits at all.

At the same time, change intel_pstate_set_performance_limits() to
set min_sysfs_pct to 100 in the "performance" limits set so that
switching the P-state selection algorithm to "performance" causes
intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
min_sysfs_pct in the "performance" limits is set to later).

That requires per-CPU limits to be initialized explicitly rather
than by copying the global limits to avoid setting min_sysfs_pct
in the per-CPU limits to 100.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a240c4aa

cpufreq: intel_pstate: Fix intel_pstate_verify_policy() · d74b1992

由 Rafael J. Wysocki 提交于 3月 01, 2017

The code added to intel_pstate_verify_policy() by commit 1443ebba
(cpufreq: intel_pstate: Fix sysfs limits enforcement for performance
policy) should use perf_limits instead of limits, because otherwise
setting global limits via sysfs may affect policies inconsistently.

For example, in the sequence of shell commands below, the
scaling_min_freq attribute for policy1 and policy2 should be
affected in the same way, because scaling_governor is set in
the same way for both of them:

 # cat cpufreq/policy1/scaling_governor
 powersave
 # cat cpufreq/policy2/scaling_governor
 powersave
 # echo performance > cpufreq/policy0/scaling_governor
 # echo 94 > intel_pstate/min_perf_pct
 # cat cpufreq/policy0/scaling_min_freq
 2914000
 # cat cpufreq/policy1/scaling_min_freq
 2914000
 # cat cpufreq/policy2/scaling_min_freq
 800000

The are affected differently, because intel_pstate_verify_policy()
is invoked with limits set to &performance_limits (left behind by
policy0) for policy1 and with limits set to &powersave_limits (left
behind by policy1) for policy2.  Since perf_limits is set to the
set of limits matching the policy being updated, using it instead
of limits fixes the inconsistency.

Fixes: 1443ebba (cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d74b1992

cpufreq: intel_pstate: Fix global settings in active mode · cd59b4be

由 Rafael J. Wysocki 提交于 3月 01, 2017

Commit 111b8b3f (cpufreq: intel_pstate: Always keep all
limits settings in sync) changed intel_pstate to invoke
cpufreq_update_policy() for every registered CPU on global sysfs
attributes updates, but that led to undesirable effects in the
active mode if the "performance" P-state selection algorithm is
configufred for one CPU and the "powersave" one is chosen for
all of the other CPUs.

Namely, in that case, the following is possible:

 # cd /sys/devices/system/cpu/
 # cat intel_pstate/max_perf_pct
 100
 # cat intel_pstate/min_perf_pct
 26
 # echo performance > cpufreq/policy0/scaling_governor
 # cat intel_pstate/max_perf_pct
 100
 # cat intel_pstate/min_perf_pct
 100
 # echo 94 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 26

The reason why this happens is because intel_pstate attempts to
maintain two sets of global limits in the active mode, one for
the "performance" P-state selection algorithm and one for the
"powersave"  P-state selection algorithm, but the P-state selection
algorithms are set per policy, so the global limits cannot reflect
all of them at the same time if they are different for different
policies.

In the particular situation above, the attempt to change
min_perf_pct to 94 caused cpufreq_update_policy() to be run
for a CPU with the "powersave"  P-state selection algorithm
and intel_pstate_set_policy() called by it silently switched the
global limits to the "powersave" set which finally was reflected
by the sysfs interface.

To prevent that from happening, modify intel_pstate_update_policies()
to always switch back to the set of limits that was used right before
it has been invoked.

Fixes: 111b8b3f (cpufreq: intel_pstate: Always keep all limits settings in sync)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

cd59b4be

04 3月, 2017 3 次提交

cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily · 64078299

由 Rafael J. Wysocki 提交于 3月 03, 2017

In the passive mode the cpu_frequency trace event is already
triggered by the cpufreq core or by scaling governors, so
intel_pstate should not trigger it once again for the same
P-state updates.

In addition to that, the frequency returned by
intel_cpufreq_fast_switch() and passed via freqs.new from
intel_cpufreq_target() to cpufreq_freq_transition_end() should
reflect the P-state actually set, so make that happen.

Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

64078299

cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy() · 7f17326f

由 Rafael J. Wysocki 提交于 3月 02, 2017

The intel_pstate_update_perf_limits() called from
intel_cpufreq_verify_policy() may cause global P-state limits
to change which is generally confusing and unnecessary.

In the passive mode the global limits are only applied to the
frequency selected by the scaling governor (they are not taken
into account by governors when making decisions anyway), so making
them follow the per-policy limits serves no purpose and may go
against user expectations (as it generally causes the global
attributes in sysfs to change even though they have not been
written to in some cases).

Fix that by dropping the intel_pstate_update_perf_limits()
invocation from intel_cpufreq_verify_policy() (which also
reduces the code size by a few lines).

This change does not affect the per-CPU limits case, because those
limits allow any P-state to be set by default in the passive mode
and it removes the only piece of code updating them in that mode,
so the per-policy settings will be the only ones taken into account
in that case as expected.

Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

7f17326f

cpufreq: intel_pstate: Do not use performance_limits in passive mode · 2bc756e7

由 Rafael J. Wysocki 提交于 3月 02, 2017

Using performance_limits in the passive mode doesn't make
sense, because in that mode the global limits are applied to the
frequency selected by the scaling governor.

The maximum and minimum P-state limits in performance_limits are both
set to 100 percent which will put all CPUs into the turbo range
regardless of what governor is used and what frequencies are
selected by it (that is particularly undesirable on CPUs with the
generic powersave governor attached).

For this reason, make intel_pstate_register_driver() always point
limits to powersave_limits in the passive mode.

Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2bc756e7

02 3月, 2017 1 次提交

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/cpufreq.h> · 55687da1

由 Ingo Molnar 提交于 2月 08, 2017

We are going to split <linux/sched/cpufreq.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/cpufreq.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

55687da1

01 3月, 2017 1 次提交

intel_pstate: use MSR_ATOM_RATIOS definitions from msr-index.h · 92134bdb

由 Len Brown 提交于 2月 25, 2017

Originally, these MSRs were locally defined in this driver.
Now the definitions are in msr-index.h -- use them.
Signed-off-by: NLen Brown <len.brown@intel.com>

92134bdb

28 2月, 2017 1 次提交

cpufreq: intel_pstate: Fix limits issue with operation mode switching · c3a49c89

由 Rafael J. Wysocki 提交于 2月 28, 2017

There is a problem with intel_pstate operation mode switching
introduced by commit fb1fe104 (cpufreq: intel_pstate: Operation
mode control from sysfs), because the global sysfs limits are
preserved across operation modes while per-policy limits are
reinitialized from scratch on a mode switch and both sets of limits
may get out of sync this way.

Fix that by always reinitializing the global limits upon the
registration of the driver.

Fixes: fb1fe104 (cpufreq: intel_pstate: Operation mode control from sysfs)
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

c3a49c89

04 2月, 2017 3 次提交

cpufreq: intel_pstate: Disable energy efficiency optimization · 6e978b22

由 Srinivas Pandruvada 提交于 2月 03, 2017

Some Kabylake desktop processors may not reach max turbo when running in
HWP mode, even if running under sustained 100% utilization.

This occurs when the HWP.EPP (Energy Performance Preference) is set to
"balance_power" (0x80) -- the default on most systems.

It occurs because the platform BIOS may erroneously enable an
energy-efficiency setting -- MSR_IA32_POWER_CTL BIT-EE, which is not
recommended to be enabled on this SKU.

On the failing systems, this BIOS issue was not discovered when the
desktop motherboard was tested with Windows, because the BIOS also
neglects to provide the ACPI/CPPC table, that Windows requires to enable
HWP, and so Windows runs in legacy P-state mode, where this setting has
no effect.

Linux' intel_pstate driver does not require ACPI/CPPC to enable HWP, and
so it runs in HWP mode, exposing this incorrect BIOS configuration.

There are several ways to address this problem.

First, Linux can also run in legacy P-state mode on this system.
As intel_pstate is how Linux enables HWP, booting with
"intel_pstate=disable"
will run in acpi-cpufreq/ondemand legacy p-state mode.

Or second, the "performance" governor can be used with intel_pstate,
which will modify HWP.EPP to 0.

Or third, starting in 4.10, the
/sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
attribute in can be updated from "balance_power" to "performance".

Or fourth, apply this patch, which fixes the erroneous setting of
MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default
configuration to function as designed.
Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: NLen Brown <len.brown@intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

6e978b22

cpufreq: intel_pstate: Calculate guaranteed performance for HWP · 8fc7554a

由 Srinivas Pandruvada 提交于 1月 19, 2017

When HWP is active, turbo activation ratio is not used to calculate max
non turbo ratio. But on these systems the max non turbo ratio is decided
by config TDP settings.

This change removes usage of MSR_TURBO_ACTIVATION_RATIO for HWP systems,
instead directly use TDP ratios, when more than one TDPs are available.
Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

8fc7554a

cpufreq: intel_pstate: Make HWP limits compatible with legacy · 4e5d3f71

由 Srinivas Pandruvada 提交于 1月 18, 2017

Under HWP the performance limits are calculated using max_perf_pct
and min_perf_pct using possible performance, not available performance.
The available performance can be reduced by no_turbo setting. To make
compatible with legacy mode, use max/min performance percentage with
respect to available performance.
Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4e5d3f71