Commits · d77d4888cb8458b098accd4d7555c0f7f6399c4e · openeuler / raspberrypi-kernel

10 Aug, 2017 2 commits

cpufreq: intel_pstate: Shorten a couple of long names · d77d4888

Authored by Rafael J. Wysocki on Aug 10, 2017

The names of the INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL symbol and
the get_target_pstate_use_cpu_load() function don't need to be so
long any more, so make them shorter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate() · a891283e

Authored by Rafael J. Wysocki on Aug 10, 2017

Since there is only one P-state selection routine in intel_pstate
now, make intel_pstate_adjust_pstate() call it directly and drop
the target_pstate argument from that function.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

04 Aug, 2017 1 commit

cpufreq: intel_pstate: Improve IO performance with per-core P-states · 7bde2d50

Authored by Srinivas Pandruvada on Aug 03, 2017

In the current implementation, the response latency between seeing
SCHED_CPUFREQ_IOWAIT set and the actual P-state adjustment can be up
to 10ms.  It can be reduced by bumping up the P-state to the max at
the time SCHED_CPUFREQ_IOWAIT is passed to intel_pstate_update_util().
With this change, the IO performance improves significantly.

For a simple "grep -r . linux" (Here linux is the kernel source
folder) with caches dropped every time on a Broadwell Xeon workstation
with per-core P-states, the user and system time is shorter by as much
as 30% - 40%.

The same performance difference was not observed on clients that don't
support per-core P-states.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
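
The latency argument in 7bde2d50 can be illustrated with a small model. This is only a sketch under stated assumptions: the `Cpu` class, the `update_util()` signature and the P-state numbers are hypothetical, and the real driver code is C and considerably more involved. The 10 ms interval is the worst-case latency quoted in the commit message.

```python
# Simplified model; names and structure are illustrative assumptions,
# not the driver's actual code.
SAMPLING_INTERVAL_NS = 10_000_000  # 10 ms, the worst-case latency quoted above


class Cpu:
    def __init__(self, min_pstate, max_pstate):
        self.max_pstate = max_pstate
        self.pstate = min_pstate
        self.last_sample_ns = 0


def update_util(cpu, now_ns, iowait, load_based_target):
    """Return True if the P-state request was updated on this call."""
    if iowait:
        # Boost immediately instead of waiting for the next sampling point.
        cpu.pstate = cpu.max_pstate
        return True
    if now_ns - cpu.last_sample_ns < SAMPLING_INTERVAL_NS:
        return False  # too early: keep the current P-state
    cpu.last_sample_ns = now_ns
    cpu.pstate = load_based_target
    return True


cpu = Cpu(min_pstate=8, max_pstate=36)
update_util(cpu, now_ns=1_000_000, iowait=False, load_based_target=20)
print(cpu.pstate)  # unchanged: the sampling interval has not elapsed
update_util(cpu, now_ns=2_000_000, iowait=True, load_based_target=20)
print(cpu.pstate)  # bumped to the maximum right away
```

In this model an update without the IO-wait flag has to wait for the sampling interval, while a flagged update takes effect on the same call.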

01 Aug, 2017 1 commit

cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL · f5c13f44

Authored by Rafael J. Wysocki on Jul 31, 2017

After commit 62611cb9 (intel_pstate: delete scheduler hook in HWP
mode) the INTEL_PSTATE_HWP_SAMPLING_INTERVAL is not used anywhere in
the code, so drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

28 Jul, 2017 1 commit

cpufreq: intel_pstate: Drop ->get from intel_pstate structure · 22baebd4

Authored by Rafael J. Wysocki on Jul 26, 2017

The ->get callback in the intel_pstate structure was mostly there
for the scaling_cur_freq sysfs attribute to work, but after commit
f8475cef (x86: use common aperfmperf_khz_on_cpu() to calculate
KHz using APERF/MPERF) that attribute uses arch_freq_get_on_cpu()
provided by the x86 arch code on all processors supported by
intel_pstate, so it doesn't need the ->get callback from the
driver any more.

Moreover, the very presence of the ->get callback in the intel_pstate
structure causes the cpuinfo_cur_freq attribute to be present when
intel_pstate operates in the active mode, which is bogus, because
the role of that attribute is to return the current CPU frequency
as seen by the hardware.  For intel_pstate, though, this is just an
average frequency and not really current, but computed for the
previous sampling interval (the actual current frequency may be
way different at the point this value is obtained by reading from
cpuinfo_cur_freq), and after commit 82b4e03e (intel_pstate: skip
scheduler hook when in "performance" mode) the value in
cpuinfo_cur_freq may be stale or just 0, depending on the driver's
operation mode.  In fact, however, on the hardware supported by
intel_pstate there is no way to read the current CPU frequency
from it, so the cpuinfo_cur_freq attribute should not be present
at all when this driver is in use.

For this reason, drop intel_pstate_get() and clear the ->get
callback pointer pointing to it, so that the cpuinfo_cur_freq is
not present for intel_pstate in the active mode any more.

Fixes: 82b4e03e (intel_pstate: skip scheduler hook when in "performance" mode)
Reported-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

27 Jul, 2017 2 commits

cpufreq: intel_pstate: Drop ->update_util from pstate_funcs · c4f3f70c

Authored by Rafael J. Wysocki on Jul 25, 2017

All systems use the same P-state selection "powersave" algorithm
in the active mode if HWP is not used, so there's no need to provide
a pointer for it in struct pstate_funcs any more.

Drop ->update_util from struct pstate_funcs and make
intel_pstate_set_update_util_hook() use intel_pstate_update_util()
directly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Do not use PID-based P-state selection · 9d0ef7af

Authored by Rafael J. Wysocki on Jul 25, 2017

All systems with a defined ACPI preferred profile that are not
"servers" have been using the load-based P-state selection algorithm
in intel_pstate since 4.12-rc1 (mobile systems and laptops have been
using it since 4.10-rc1) and no problems with it have been reported
to date.  In particular, no regressions with respect to the PID-based
P-state selection have been reported.  Also testing indicates that
the P-state selection algorithm based on CPU load is generally on par
with the PID-based algorithm performance-wise, and for some workloads
it turns out to be better than the other one, while being more
straightforward and easier to understand at the same time.

Moreover, the PID-based P-state selection algorithm in intel_pstate
is known to be unstable in some situations and generally problematic;
the issues with it are hard to address and it has become a
significant maintenance burden.

For these reasons, make intel_pstate use the "powersave" P-state
selection algorithm based on CPU load in the active mode on all
systems and drop the PID-based P-state selection code along with
all things related to it from the driver.  Also update the
documentation accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

14 Jul, 2017 1 commit

cpufreq: intel_pstate: Correct the busy calculation for KNL · 6e34e1f2

Authored by Srinivas Pandruvada on Jul 13, 2017

The busy percent calculated for the Knights Landing (KNL) platform
is 1024 times smaller than the correct busy value.  This causes
performance to get stuck at the lowest ratio.

The scaling algorithm used for KNL is performance-based, but it still
looks at the CPU load to set the scaled busy factor to 0 when the
load is less than 1 percent.  In this case, since the computed load
is 1024x smaller than it should be, the scaled busy factor will
always be 0, irrespective of how busy the CPU is.

This needs a fix similar to the turbostat one in commit b2b34dfe
(tools/power turbostat: KNL workaround for %Busy and Avg_MHz).

For this reason, add one more callback to the processor-specific
callbacks to specify an MPERF multiplier, represented by the number of
bit positions to shift the value of that register to the left to
compensate for its rate difference with respect to the TSC.  This
shift value is used during CPU busy calculations.

Fixes: ffb81056 (intel_pstate: Avoid getting stuck in high P-states when idle)
Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
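
The effect of the MPERF shift in 6e34e1f2 can be checked with a few lines of arithmetic. This is a sketch, not the driver's code: `busy_percent()` and the sample counter deltas are hypothetical, and the shift of 10 is an assumption derived from the "1024 times" figure in the commit message (2**10 == 1024).

```python
KNL_MPERF_SHIFT = 10  # assumed: 2**10 == 1024, the factor named above


def busy_percent(mperf_delta, tsc_delta, mperf_shift=0):
    """Busy percentage from MPERF and TSC deltas, with an optional left shift."""
    return 100 * (mperf_delta << mperf_shift) / tsc_delta


# Hypothetical sample: MPERF ticks 1024x slower than the TSC and the CPU
# was really about 50% busy during the interval.
tsc_delta = 1_024_000
mperf_delta = 500

print(busy_percent(mperf_delta, tsc_delta))                   # far below 1 percent
print(busy_percent(mperf_delta, tsc_delta, KNL_MPERF_SHIFT))  # about 50 percent
```

Without the shift the computed load stays under the 1 percent threshold, which is the condition that zeroes the scaled busy factor described in the commit message.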

12 Jul, 2017 1 commit

cpufreq: intel_pstate: Fix ratio setting for min_perf_pct · d4436c0d

Authored by Srinivas Pandruvada on Jul 10, 2017

When the minimum performance limit percentage is set to the power-up
default, it is possible that the minimum performance ratio is off by one.

In the set_policy() callback the minimum ratio is calculated by
applying global.min_perf_pct to turbo_ratio and rounding up, but the
power-up default global.min_perf_pct is already rounded up to the
next percent in min_perf_pct_min().  That results in two round up
operations, so for the default min_perf_pct one of them is not
required.

It is better to remove rounding up in min_perf_pct_min() as this
matches the displayed min_perf_pct prior to commit c5a2ee7d
(cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12.

For example, on a platform with a max turbo ratio of 37 and a minimum
ratio of 10, min_perf_pct came out as 28 with the above commit.
Before that commit it was 27, and it will be 27 again after this
change.

Fixes: 1a4fe38a (cpufreq: intel_pstate: Remove max/min fractions to limit performance)
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
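
The double round-up described in d4436c0d can be reproduced with the numbers from the commit message (max turbo ratio 37, minimum ratio 10). The helper names below are hypothetical and the code only models the arithmetic, not the driver itself.

```python
def div_round_up(n, d):
    """Integer division rounding up, for non-negative n and positive d."""
    return (n + d - 1) // d


def min_ratio(turbo_ratio, min_perf_pct):
    """Apply a percentage to the turbo ratio and round up, as set_policy() is described to do."""
    return div_round_up(turbo_ratio * min_perf_pct, 100)


turbo_ratio, hw_min_ratio = 37, 10

pct_rounded_up = div_round_up(hw_min_ratio * 100, turbo_ratio)  # default with rounding up
pct_truncated = hw_min_ratio * 100 // turbo_ratio               # default without rounding up

print(pct_rounded_up, min_ratio(turbo_ratio, pct_rounded_up))  # 28 11
print(pct_truncated, min_ratio(turbo_ratio, pct_truncated))    # 27 10
```

Rounding up twice yields a minimum ratio of 11 instead of the hardware minimum of 10; dropping the first round-up restores 10.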

05 Jul, 2017 1 commit

cpufreq: intel_pstate: constify attribute_group structures · 106c9c77

Authored by Arvind Yadav on Jul 03, 2017

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
  15197	   2552	     40	  17789	   457d	drivers/cpufreq/intel_pstate.o

File size after adding 'const':
   text	   data	    bss	    dec	    hex	filename
  15261	   2488	     40	  17789	   457d	drivers/cpufreq/intel_pstate.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

30 Jun, 2017 1 commit

cpufreq: intel_pstate: Clean up after performance governor changes · fab24dcc

Authored by Rafael J. Wysocki on Jun 29, 2017

After commit 82b4e03e (intel_pstate: skip scheduler hook when in
"performance" mode) get_target_pstate_use_performance() and
get_target_pstate_use_cpu_load() are never called if scaling_governor
is "performance", so drop the CPUFREQ_POLICY_PERFORMANCE checks from
them as they will never trigger anyway.

Moreover, the documentation needs to be updated to reflect the change
made by the above commit, so do that too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

27 Jun, 2017 2 commits

intel_pstate: skip scheduler hook when in "performance" mode · 82b4e03e

Authored by Len Brown on Jun 23, 2017

When the governor is set to "performance", intel_pstate does not
need the scheduler hook for doing any calculations.  Under these
conditions, its only purpose is to continue to maintain
cpufreq/scaling_cur_freq.

The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

So in "performance" governor mode, the scheduler hook can be skipped.
This applies to both software and hardware P-state control modes.
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: delete scheduler hook in HWP mode · 62611cb9

Authored by Len Brown on Jun 23, 2017

The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

In HWP mode, maintaining that value was the sole purpose of
the scheduler hook, intel_pstate_update_util_hwp(),
so it can now be removed.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

24 Jun, 2017 1 commit

cpufreq: intel_pstate: Remove max/min fractions to limit performance · 1a4fe38a

Authored by Srinivas Pandruvada on Jun 12, 2017

In the current model the max/min perf limits are a fraction of current
user space limits to the allowed max_freq or 100% for global limits.
This results in wrong ratio limits calculation because of rounding
issues for some user space limits.

Initially we tried to solve this issue by having more shift bits to
increase precision, but there are still isolated cases where the
result is wrong.

This can be avoided by using ratios throughout.  Since cpuinfo.max_freq
is obtained by multiplying the max ratio by the scaling factor, we can
easily keep the max/min limits in terms of ratios and not fractions.

For example:
if the max ratio = 36
cpuinfo.max_freq = 36 * 100000 = 3600000

Suppose user space sets a limit of 1200000, then we can calculate
max ratio limit as
= 36 * 1200000 / 3600000
= 12
This will be correct for any user limits.

The other advantage is that we don't need to do any calculation in the
fast path, as the ratio limit is already calculated via the set_policy()
callback.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
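
The rounding problem that 1a4fe38a describes can be seen with the commit's own numbers (max ratio 36, user limit 1200000 kHz). The sketch below is a deliberately simplified fixed-point model, not the old driver code: the function names are hypothetical and the 8 fractional bits are an assumption chosen for illustration.

```python
FRAC_BITS = 8  # assumed precision for this illustrative fixed-point model


def ratio_limit_via_fraction(max_ratio, limit_khz, cpuinfo_max_khz):
    """Store the limit as a truncated fixed-point fraction, then scale the ratio."""
    fraction = (limit_khz << FRAC_BITS) // cpuinfo_max_khz
    return (max_ratio * fraction) >> FRAC_BITS


def ratio_limit_direct(max_ratio, limit_khz, cpuinfo_max_khz):
    """Compute the ratio limit directly, as in the example above."""
    return max_ratio * limit_khz // cpuinfo_max_khz


max_ratio = 36
cpuinfo_max_khz = max_ratio * 100000  # 3600000

print(ratio_limit_via_fraction(max_ratio, 1200000, cpuinfo_max_khz))  # 11
print(ratio_limit_direct(max_ratio, 1200000, cpuinfo_max_khz))        # 12
```

In this model the intermediate fraction loses precision and the limit lands one ratio too low, while the direct computation gives the expected 12.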

05 Jun, 2017 1 commit

cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min() · 57caf4ec

Authored by Rafael J. Wysocki on Jun 05, 2017

Commit c5a2ee7d (cpufreq: intel_pstate: Active mode P-state
limits rework) incorrectly assumed that pstate.turbo_pstate would
always be nonzero for CPU0 in min_perf_pct_min() if
cpufreq_register_driver() had succeeded which may not be the case
in virtualized environments.

If that assumption doesn't hold, it leads to an early crash on boot
in intel_pstate_register_driver(), so add a sanity check to
min_perf_pct_min() to prevent the crash from happening.

Fixes: c5a2ee7d (cpufreq: intel_pstate: Active mode P-state limits rework)
Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
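
The kind of sanity check that 57caf4ec adds might look like the sketch below. It models only the guard: the function signature is hypothetical, the rounding used by the driver is left out, and returning 0 for a zero turbo P-state is an assumption made for this example.

```python
def min_perf_pct_min(min_pstate, turbo_pstate):
    """Minimum performance percentage, guarded against a zero turbo P-state."""
    if turbo_pstate == 0:
        # May happen in virtualized environments; avoid dividing by zero.
        return 0
    return min_pstate * 100 // turbo_pstate


print(min_perf_pct_min(10, 37))  # 27
print(min_perf_pct_min(10, 0))   # 0
```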

12 May, 2017 1 commit

intel_pstate: use updated msr-index.h HWP.EPP values · 3cedbc5a

Authored by Len Brown on May 01, 2017

intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
These attributes use strings to describe 4 operating states, and
inside the driver, these strings are mapped to numerical register
values.

The authoritative mapping between the strings and numerical HWP.EPP values
is now globally defined in msr-index.h, replacing the outdated
mapping that was open-coded in intel_pstate.c.

new old string
--- --- ------
  0   0 performance
128  64 balance_performance
192 128 balance_power
255 192 power

Note that the HW and BIOS default value on most systems is 128,
which intel_pstate will now call "balance_performance"
while it used to call it "balance_power".
Signed-off-by: Len Brown <len.brown@intel.com>
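
The renaming effect noted in 3cedbc5a follows directly from the two columns of the table. The lookup below is a sketch with hypothetical names (`EPP_NEW`, `EPP_OLD`, `name_for`); only the string/value pairs are taken from the commit message.

```python
# String-to-value mappings copied from the table above.
EPP_NEW = {"performance": 0, "balance_performance": 128,
           "balance_power": 192, "power": 255}
EPP_OLD = {"performance": 0, "balance_performance": 64,
           "balance_power": 128, "power": 192}


def name_for(value, table):
    """Return the string that maps to an exact EPP value, or None if there is none."""
    for name, epp in table.items():
        if epp == value:
            return name
    return None


default_epp = 128  # the common HW/BIOS default mentioned above
print(name_for(default_epp, EPP_OLD))  # balance_power
print(name_for(default_epp, EPP_NEW))  # balance_performance
```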

18 Apr, 2017 1 commit

cpufreq: schedutil: Use policy-dependent transition delays · 1b72e7fd

Authored by Rafael J. Wysocki on Apr 11, 2017

Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.

Make intel_pstate set transition_delay_us to 500.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

30 Mar, 2017 1 commit

cpufreq: intel_pstate: Add support for Gemini Lake · 630e5757

Authored by Box, David E on Mar 29, 2017

Use same parameters as INTEL_FAM6_ATOM_GOLDMONT to enable
Gemini Lake.
Signed-off-by: Box, David E <david.e.box@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

29 Mar, 2017 16 commits

cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max() · b02aabe8

Authored by Rafael J. Wysocki on Mar 28, 2017

Some computations in intel_pstate_get_min_max() are not necessary
and one of its two callers doesn't even use the full result.

First off, the fixed-point value of cpu->max_perf represents a
non-negative number between 0 and 1 inclusive and cpu->min_perf
cannot be greater than cpu->max_perf. It is not necessary to check
those conditions every time the numbers in question are used.

Moreover, since intel_pstate_max_within_limits() only needs the
upper boundary, it doesn't make sense to compute the lower one in
there and returning min and max from intel_pstate_get_min_max()
via pointers doesn't look particularly nice.

For the above reasons, drop intel_pstate_get_min_max(), add a helper
to get the base P-state for min/max computations and carry them out
directly in the previous callers of intel_pstate_get_min_max().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Do not walk policy->cpus · 2bfc4cbb

Authored by Rafael J. Wysocki on Mar 28, 2017

intel_pstate_hwp_set() is the only function walking policy->cpus
in intel_pstate.  The rest of the code simply assumes one CPU per
policy, including the initialization code.

Therefore it doesn't make sense for intel_pstate_hwp_set() to
walk policy->cpus as it is guaranteed to have only one bit set
for policy->cpu.

For this reason, rearrange intel_pstate_hwp_set() to take the CPU
number as the argument and drop the loop over policy->cpus from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Introduce pid_in_use() · 8ca6ce37

Authored by Rafael J. Wysocki on Mar 28, 2017

Add a new function pid_in_use() to return the information on whether
or not the PID-based P-state selection algorithm is in use.

That allows a couple of complicated conditions in the code to be
reduced to simple checks against the new function's return value.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Drop struct cpu_defaults · 2f49afc2

Authored by Rafael J. Wysocki on Mar 28, 2017

The cpu_defaults structure is redundant, because it only contains
one member of type struct pstate_funcs which can be used directly
instead of struct cpu_defaults.

For this reason, drop struct cpu_defaults, use struct pstate_funcs
directly instead of it where applicable and rename all of the
variables of that type accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Move cpu_defaults definitions · de4a76cb

Authored by Rafael J. Wysocki on Mar 28, 2017

Move the definitions of the cpu_defaults structures after the
definitions of utilization update callback routines to avoid
extra declarations of the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Add update_util callback to pstate_funcs · 67dd9bf4

Authored by Rafael J. Wysocki on Mar 28, 2017

Avoid using extra function pointers during P-state selection by
dropping the get_target_pstate member from struct pstate_funcs,
adding a new update_util callback to it (to be registered with
the CPU scheduler as the utilization update callback in the active
mode) and reworking the utilization update callback routines to
invoke specific P-state selection functions directly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Use different utilization update callbacks · eabd22c6

Authored by Rafael J. Wysocki on Mar 28, 2017

Notice that some overhead in the utilization update callbacks
registered by intel_pstate in the active mode can be avoided if
those callbacks are tailored to specific configurations of the
driver.  For example, the utilization update callback for the HWP
enabled case only needs to update the average CPU performance
periodically whereas the utilization update callback for the
PID-based algorithm does not need to take IO-wait boosting into
account and so on.

With that in mind, define three utilization update callbacks for
three different use cases: HWP enabled, the CPU load "powersave"
P-state selection algorithm and the PID-based "powersave" P-state
selection algorithm and modify the driver initialization to
choose the callback matching its current configuration.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Modify check in intel_pstate_update_status() · 0042b2c0

Authored by Rafael J. Wysocki on Mar 28, 2017

One of the checks in intel_pstate_update_status() implicitly relies
on the information that there are only two struct cpufreq_driver
objects available, but it is better to do it directly against the
value it really is about (to make the code easier to follow if
nothing else).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Drop driver_registered variable · ee8df89a

Authored by Rafael J. Wysocki on Mar 28, 2017

The driver_registered variable in intel_pstate is used for checking
whether or not the driver has been registered, but intel_pstate_driver
can be used for that too (with the rule that the driver is not
registered as long as it is NULL).

That is a bit more straightforward and the code may be simplified
a bit this way, so modify the driver accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Skip unnecessary PID resets on init · 694cb173

Authored by Rafael J. Wysocki on Mar 28, 2017

PID controller parameters only need to be initialized if the
get_target_pstate_use_performance() P-state selection routine
is going to be used.  It is not necessary to initialize them
otherwise, so don't do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Set HWP sampling interval once · 7aec5b50

Authored by Rafael J. Wysocki on Mar 28, 2017

In the HWP enabled case pid_params.sample_rate_ns only needs to be
updated once, because it is global, so do that when setting hwp_active
instead of doing it during the initialization of every CPU.

Moreover, pid_params.sample_rate_ms is never used if HWP is enabled,
so do not update it at all then.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Clean up intel_pstate_busy_pid_reset() · ff35f02e

Authored by Rafael J. Wysocki on Mar 28, 2017

intel_pstate_busy_pid_reset() is the only caller of pid_reset(),
pid_p_gain_set(), pid_i_gain_set(), and pid_d_gain_set().  Moreover,
it passes constants as two parameters of pid_reset() and all of
the other routines above essentially contain the same code, so
fold all of them into the caller and drop unnecessary computations.

Introduce percent_fp() for converting integer values in percent
to fixed-point fractions and use it in the above code cleanup.

Finally, rename intel_pstate_busy_pid_reset() to
intel_pstate_pid_reset() as it also is used for the
initialization of PID parameters for every CPU and the
meaning of the "busy" part of the name is not particularly
clear.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
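
The percent-to-fixed-point conversion that ff35f02e introduces as percent_fp() can be modelled as follows. This is a sketch in Python rather than the driver's C: the helper `div_fp()` and the 8 fractional bits (`FRAC_BITS`) are assumptions about the driver's fixed-point format, not something stated in the commit message.

```python
FRAC_BITS = 8  # assumed number of fractional bits in the fixed-point format


def div_fp(x, y):
    """Fixed-point quotient x / y, scaled by 2**FRAC_BITS."""
    return (x << FRAC_BITS) // y


def percent_fp(percent):
    """Convert an integer percentage to a fixed-point fraction."""
    return div_fp(percent, 100)


print(percent_fp(100))  # 256, i.e. 1.0 in this format
print(percent_fp(50))   # 128, i.e. 0.5
print(percent_fp(97))   # 248, i.e. about 0.97 after truncation
```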

cpufreq: intel_pstate: Fold intel_pstate_reset_all_pid() into the caller · 4ddd0146

Authored by Rafael J. Wysocki on Mar 28, 2017

There is only one caller of intel_pstate_reset_all_pid(), which is
pid_param_set() used in the debugfs interface only, and having that
code split does not make it particularly convenient to follow.

For this reason, move the body of intel_pstate_reset_all_pid() into
its caller and drop that function.

Also change the loop from for_each_online_cpu() (which is obviously
racy with respect to CPU offline/online) to for_each_possible_cpu(),
so that all PID parameters are reset for all CPUs regardless of their
online/offline status (to prevent, for example, a previously offline
CPU from going online with a stale set of PID parameters).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Initialize pid_params statically · 5c439053

Authored by Rafael J. Wysocki on Mar 28, 2017

Notice that both the existing struct cpu_defaults instances in which
PID parameters are actually initialized use the same values of those
parameters, so it is not really necessary to copy them over to
pid_params dynamically.

Instead, initialize pid_params statically with those values and
drop the unused pid_policy member from struct cpu_defaults along
with copy_pid_params() used for initializing it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Drop pointless initialization of PID parameters · 64043678

Authored by Rafael J. Wysocki on Mar 28, 2017

The P-state selection algorithm used by intel_pstate for Atom
processors is not based on the PID controller and the initialization
of PID parameters for those processors is pointless and confusing, so
drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Eliminate struct perf_limits · e14cf885

Authored by Rafael J. Wysocki on Mar 28, 2017

After recent changes the purpose of struct perf_limits is not
particularly clear any more and the code may be made somewhat
easier to follow by eliminating it, so go for that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

24 Mar, 2017 4 commits

cpufreq: intel_pstate: Avoid transient updates of cpuinfo.max_freq · 80b120ca

Authored by Rafael J. Wysocki on Mar 23, 2017

Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
set policy->cpuinfo.max_freq depending on the turbo status, but the
updates made by them are discarded by the core, because the policy
object passed to them by the core is temporary and cpuinfo.max_freq
from that object is not copied to the final policy object in
cpufreq_set_policy().

However, cpufreq_set_policy() passes the temporary policy object
to the ->setpolicy callback of the driver, so intel_pstate_set_policy()
actually sees the policy->cpuinfo.max_freq value updated by
intel_pstate_verify_policy() and not the final one.  It also
updates policy->max sometimes which basically has no effect after
it returns, because the core discards that update.

To avoid confusion, eliminate policy->cpuinfo.max_freq updates from
intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
entirely and check the maximum frequency explicitly in
intel_pstate_update_perf_limits() instead of relying on the
transiently updated policy->cpuinfo.max_freq value.

Moreover, move the policy->max adjustment carried out in
intel_pstate_set_policy() to a separate function and call that
function from the ->verify driver callbacks to ensure that it will
actually be effective.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Active mode P-state limits rework · c5a2ee7d

Authored by Rafael J. Wysocki on Mar 22, 2017

The coordination of P-state limits used by intel_pstate in the active
mode (ie. by default) is problematic, because it synchronizes all of
the limits (ie. the global ones and the per-policy ones) so as to use
one common pair of P-state limits (min and max) across all CPUs in
the system.  The drawbacks of that are as follows:

 - If P-states are coordinated in hardware, it is not necessary
   to coordinate them in software on top of that, so in that case
   all of the above activity is in vain.

 - If P-states are not coordinated in hardware, then the processor
   is actually capable of setting different P-states for different
   CPUs and coordinating them at the software level simply doesn't
   allow that capability to be utilized.

 - The coordination works in such a way that setting a per-policy
   limit (eg. scaling_max_freq) for one CPU causes the common
   effective limit to change (and it will affect all of the other
   CPUs too), but subsequent reads from the corresponding sysfs
   attributes for the other CPUs will return stale values (which
   is confusing).

 - Reads from the global P-state limit attributes, min_perf_pct and
   max_perf_pct, return the effective common values and not the last
   values set through these attributes.  However, the last values
   set through these attributes become hard limits that cannot be
   exceeded by writes to scaling_min_freq and scaling_max_freq,
   respectively, and they are not exposed, so essentially users
   have to remember what they are.

All of that is painful enough to warrant a change of the management
of P-state limits in the active mode.

To that end, redesign the active mode P-state limits management in
intel_pstate in accordance with the following rules:

 (1) All CPUs are affected by the global limits (that is, none of
     them can be requested to run faster than the global max and
     none of them can be requested to run slower than the global
     min).

 (2) Each individual CPU is affected by its own per-policy limits
     (that is, it cannot be requested to run faster than its own
     per-policy max and it cannot be requested to run slower than
     its own per-policy min).

 (3) The global and per-policy limits can be set independently.

Also, the global maximum and minimum P-state limits will be always
expressed as percentages of the maximum supported turbo P-state.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
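
Rules (1) to (3) of c5a2ee7d can be sketched as a per-CPU intersection of the global and per-policy limits. Everything in this example is hypothetical (function name, P-state numbers, truncating percent conversion); it only illustrates that each CPU's effective range is derived independently.

```python
def effective_limits(turbo_pstate, global_min_pct, global_max_pct,
                     policy_min, policy_max):
    """Combine global limits (percent of turbo) with one CPU's per-policy limits."""
    global_min = turbo_pstate * global_min_pct // 100
    global_max = turbo_pstate * global_max_pct // 100
    max_limit = min(global_max, policy_max)                  # rules (1) and (2), upper bound
    min_limit = min(max(global_min, policy_min), max_limit)  # lower bound, kept <= max
    return min_limit, max_limit


# Global limits of 25%..80% of a turbo P-state of 40 give 10..32.
cpu0 = effective_limits(40, 25, 80, policy_min=8, policy_max=20)
cpu1 = effective_limits(40, 25, 80, policy_min=8, policy_max=40)
print(cpu0)  # (10, 20): lowering CPU0's per-policy max only affects CPU0
print(cpu1)  # (10, 32): CPU1 is still bounded by the global max alone
```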

cpufreq: intel_pstate: Use load-based P-state selection more widely · 55395345

Authored by Rafael J. Wysocki on Mar 22, 2017

Extend the set of systems for which intel_pstate will use the
"powersave" P-state selection algorithm based on CPU load in the
active mode by systems with ACPI preferred profile set to "tablet",
"appliance PC", "desktop", or "workstation" (ie. everything with a
specified preferred profile that is not a "server").
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Support HWP processors in all operation modes · eb5139d1

Authored by Rafael J. Wysocki on Mar 22, 2017

Currently, some processors supporting HWP are only supported by
intel_pstate if HWP is actually going to be used and not supported
otherwise which is confusing.

Specifically, they are not supported if "intel_pstate=no_hwp" is
passed to the kernel in the command line or if the driver is started
in the passive mode ("intel_pstate=passive").

There is no real reason for that, because everything about those
processors is known anyway and the driver can work with them in all
modes, so make that happen, but use the load-based P-state selection
algorithm for the active mode "powersave" policy with them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

22 Mar, 2017 1 commit

cpufreq: intel_pstate: Fix policy data management in passive mode · 64897b20

Authored by Rafael J. Wysocki on Mar 21, 2017

The policy->cpuinfo.max_freq and policy->max updates in
intel_cpufreq_turbo_update() are excessive as they are done for no
good reason and may lead to problems in principle, so they should be
dropped.  However, after dropping them intel_cpufreq_turbo_update()
becomes almost entirely pointless, because the check made by it is
made again down the road in intel_pstate_prepare_request().  The
only thing in it that still needs to be done is the call to
update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether
and make its callers invoke update_turbo_state() directly instead of
it.

In addition to that, fix intel_cpufreq_verify_policy() so that it
checks global.no_turbo in addition to global.turbo_disabled when
updating policy->cpuinfo.max_freq to make it consistent with
intel_pstate_verify_policy().

Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

18 Mar, 2017 1 commit

cpufreq: intel_pstate: One set of global limits in active mode · 7de32556

Authored by Rafael J. Wysocki on Mar 18, 2017

In the active mode intel_pstate currently uses two sets of global
limits, each associated with one of the possible scaling_governor
settings in that mode: "powersave" or "performance".

The driver switches over from one of those sets to the other
depending on the scaling_governor setting of the CPU whose
per-policy cpufreq interface in sysfs was last used to change
parameters exposed in there.  That obviously leads to no end of
issues when the scaling_governor settings differ between CPUs.

The most recent issue was introduced by commit a240c4aa (cpufreq:
intel_pstate: Do not reinit performance limits in ->setpolicy)
that eliminated the reinitialization of "performance" limits in
intel_pstate_set_policy() preventing the max limit from being set
to anything below 100, among other things.

Namely, an undesirable side effect of commit a240c4aa is that
now, after setting scaling_governor to "performance" in the active
mode, the per-policy limits for the CPU in question go to the highest
level and stay there even when it is switched back to "powersave"
later.

As it turns out, some distributions set scaling_governor to
"performance" temporarily for all CPUs to speed-up system
initialization, so that change causes them to misbehave later.

To fix that, get rid of the performance/powersave global limits
split and use just one set of global limits for everything.

From the user's perspective, after this modification, when
scaling_governor is switched from "performance" to "powersave"
or the other way around on one CPU, the limits settings (ie. the
global max/min_perf_pct and per-policy scaling_max/min_freq for
any CPUs) will not change.  Still, switching from "performance"
to "powersave" or the other way around changes the way in which
P-states are selected and in particular "performance" causes the
driver to always request the highest P-state it is allowed to
request for the given CPU.

Fixes: a240c4aa (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>