Commits · acb1872577b346bd15ab3a3f8dff780d6cca4b70 · openanolis / cloud-kernel

18 Jul, 2018 1 commit

cpufreq: intel_pstate: Register when ACPI PCCH is present · 95d6c085

Authored by Rafael J. Wysocki on Jul 18, 2018

Currently, intel_pstate doesn't register if _PSS is not present on
HP Proliant systems, because it expects the firmware to take over
CPU performance scaling in that case.  However, if ACPI PCCH is
present, the firmware expects the kernel to use it for CPU
performance scaling and the pcc-cpufreq driver is loaded for that.

Unfortunately, the firmware interface used by that driver is not
scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
on systems with more than just a few CPUs.  In fact, it is better to
avoid using it at all.

For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
is not present and register if it is there.  Also prevent the
pcc-cpufreq driver from trying to initialize itself if intel_pstate
has been registered already.
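
The decision described above can be sketched in user-space C as follows. This is only an illustration of the stated logic: `pick_scaling_owner()`, `pcc_cpufreq_may_init()` and the enum are hypothetical names, not kernel symbols, and the two booleans stand in for whatever ACPI lookups the driver actually performs.

```c
#include <stdbool.h>

/* Hypothetical result of the probe; not a kernel type. */
enum scaling_owner {
	OWNER_INTEL_PSTATE,	/* intel_pstate registers */
	OWNER_FIRMWARE,		/* firmware scales CPU performance by itself */
};

/*
 * Assumed inputs: whether ACPI exposes _PSS and whether it exposes PCCH
 * on a platform where firmware power management may be in charge.
 */
static enum scaling_owner pick_scaling_owner(bool has_pss, bool has_pcch)
{
	if (has_pss)
		return OWNER_INTEL_PSTATE;
	/* No _PSS: before this change the driver always stepped aside here. */
	if (has_pcch)
		return OWNER_INTEL_PSTATE;	/* keeps pcc-cpufreq from being used */
	return OWNER_FIRMWARE;
}

/* pcc-cpufreq is assumed to give up once intel_pstate has registered. */
static bool pcc_cpufreq_may_init(bool intel_pstate_registered)
{
	return !intel_pstate_registered;
}
```

With `has_pss == false` and `has_pcch == true` the sketch picks intel_pstate, which is the case this commit changes.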

Fixes: fbbcdc07 (intel_pstate: skip the driver if ACPI has power mgmt option)
Reported-by: Andreas Herrmann <aherrmann@suse.com>
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Andreas Herrmann <aherrmann@suse.com>
Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

19 Jun, 2018 1 commit

cpufreq: intel_pstate: Fix scaling max/min limits with Turbo 3.0 · ff7c9917

Authored by Srinivas Pandruvada on Jun 18, 2018

When scaling max/min settings are changed, internally they are converted
to a ratio using the max turbo 1 core turbo frequency. This works fine
when 1 core max is same irrespective of the core. But under Turbo 3.0,
this will not be the case. For example:
Core 0: max turbo pstate: 43 (4.3GHz)
Core 1: max turbo pstate: 45 (4.5GHz)
In this case the 1 core turbo ratio will be the maximum of all, so it will be
45 (4.5GHz). Suppose scaling max is set to 4GHz (ratio 40) for all cores;
then on Core 0 it will be
 = max_state * policy->max / max_freq;
 = 43 * (4000000/4500000) = 38 (3.8GHz)
 = 38
which is 200MHz less than desired.
On Core 1, it will be correctly set to ratio 40 (4GHz). The same holds true
for the scaling min frequency limit. So this requires use of the correct turbo
max frequency for Core 0, which in this case is 4.3GHz. So we need to
adjust per CPU cpu->pstate.turbo_freq using the maximum HWP ratio of that
core.
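
The arithmetic above can be checked with a small user-space sketch. `limit_to_ratio()` is a hypothetical helper that applies the quoted formula with truncating integer division; it is not the driver's function.

```c
#include <assert.h>

/*
 * Convert a frequency limit (kHz) to a P-state ratio using the quoted
 * formula: max_state * policy->max / max_freq. Integer division
 * truncates, which matches the 38 computed in the commit message.
 */
static int limit_to_ratio(int max_state, unsigned int limit_khz,
			  unsigned int turbo_freq_khz)
{
	assert(max_state >= 0 && turbo_freq_khz > 0);
	return (int)((unsigned long long)max_state * limit_khz / turbo_freq_khz);
}
```

For the core whose maximum P-state is 43, `limit_to_ratio(43, 4000000, 4500000)` gives 38, while using that core's own turbo frequency, `limit_to_ratio(43, 4000000, 4300000)`, gives the intended 40.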

This change uses the HWP capability of a core to adjust max turbo
frequency. But since Broadwell HWP doesn't use ratios in the HWP
capabilities, we have to use legacy max 1 core turbo ratio. This is not
a problem as the HWP capabilities don't differ among cores in Broadwell.
We need to check for non Broadwell CPU model for applying this change,
though.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

13 Jun, 2018 1 commit

treewide: Use array_size() in vzalloc() · fad953ce

Authored by Kees Cook on Jun 12, 2018

The vzalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vzalloc(a * b)

with:

        vzalloc(array_size(a, b))

as well as handling cases of:

        vzalloc(a * b * c)

with:

        vzalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vzalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
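
As an illustration of why the wrapping matters, here is a user-space stand-in for the helpers. It assumes the behaviour of saturating to SIZE_MAX on overflow so that the allocation fails instead of being undersized; the `_sketch` names are hypothetical and `__builtin_mul_overflow` requires GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* a * b, or SIZE_MAX if the product does not fit in size_t. */
static size_t array_size_sketch(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* a * b * c with the same saturating behaviour. */
static size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	if (__builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

A plain `a * b` would silently wrap around on overflow and could request far fewer bytes than the caller goes on to use.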

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vzalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vzalloc(C1 * C2 * C3, ...)
|
  vzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vzalloc(C1 * C2, ...)
|
  vzalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>

08 Jun, 2018 1 commit

cpufreq: intel_pstate: enable boost for Skylake Xeon · 41ab43c9

Authored by Srinivas Pandruvada on Jun 05, 2018

Enable HWP boost on Skylake servers and workstations.

Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

06 Jun, 2018 3 commits

cpufreq: intel_pstate: New sysfs entry to control HWP boost · aaaece3d

Authored by Srinivas Pandruvada on Jun 05, 2018

A new attribute is added to intel_pstate sysfs to enable/disable
HWP dynamic performance boost.

Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: HWP boost performance on IO wakeup · 52ccc431

Authored by Srinivas Pandruvada on Jun 05, 2018

This change uses SCHED_CPUFREQ_IOWAIT flag to boost HWP performance.
Since SCHED_CPUFREQ_IOWAIT flag is set frequently, we don't start
boosting steps unless we see two consecutive flags in two ticks. This
avoids boosting due to IO because of regular system activities.
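
One way to read the "two consecutive flags in two ticks" rule is sketched below. The tick length, the struct and the function name are assumptions made for illustration; the driver's real bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NS_SKETCH 4000000ULL	/* assumed 4 ms tick (HZ=250) */

struct io_boost_state {
	uint64_t last_io_flag_ns;	/* time of the previous IOWAIT flag, 0 = none yet */
};

/*
 * Return true when boosting should start: the IOWAIT flag is set now and
 * was also seen less than two tick periods ago. An isolated flag only
 * records its timestamp.
 */
static bool io_flag_should_boost(struct io_boost_state *st, uint64_t now_ns,
				 bool iowait_flag)
{
	bool boost = false;

	assert(now_ns >= st->last_io_flag_ns);
	if (!iowait_flag)
		return false;
	if (st->last_io_flag_ns &&
	    now_ns - st->last_io_flag_ns < 2 * TICK_NS_SKETCH)
		boost = true;
	st->last_io_flag_ns = now_ns;
	return boost;
}
```

A single flag from occasional background IO therefore never triggers a boost on its own; only a second flag arriving shortly after the first does.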

To avoid synchronization issues, the actual processing of the flag is
done on the local CPU callback.

Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Add HWP boost utility and sched util hooks · e0efd5be

Authored by Srinivas Pandruvada on Jun 05, 2018

Add two utility functions: one to boost HWP up gradually and one to boost
down to the default cached HWP request values.

Boost up:
Boost up updates the HWP request minimum value in steps. This minimum value
can reach up to the HWP request maximum value, depending on how frequently
the boost up function is called. At most, boost up takes three steps
to reach the maximum, depending on the current HWP request levels and HWP
capabilities. For example, depending on the current settings:
If P0 (Turbo max) = P1 (Guaranteed max) = min
        No boost at all.
If P0 (Turbo max) > P1 (Guaranteed max) = min
        Should result in one level boost only for P0.
If P0 (Turbo max) = P1 (Guaranteed max) > min
        Should result in two level boost:
                (min + P1)/2 and P1.
If P0 (Turbo max) > P1 (Guaranteed max) > min
        Should result in three level boost:
                (min + P1)/2, P1 and P0.
We don't set any level between P0 and P1 as there is no guarantee that
they will be honored.
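
The stepping rule in the four cases above can be sketched like this. The struct and `hwp_boost_up_step()` are hypothetical names, and the function only models the level selection, not the MSR writes.

```c
#include <assert.h>

struct hwp_levels {
	int min;	/* HWP request minimum before any boost */
	int p1;		/* guaranteed performance */
	int p0;		/* turbo maximum */
};

/*
 * Given the currently requested minimum, return the next boosted minimum.
 * Levels are (min + P1)/2, then P1, then P0; nothing between P1 and P0 is
 * ever chosen. Returns cur unchanged when no further boost is possible.
 */
static int hwp_boost_up_step(const struct hwp_levels *lv, int cur)
{
	int mid = (lv->min + lv->p1) / 2;

	assert(lv->min <= lv->p1 && lv->p1 <= lv->p0);
	if (cur < mid)
		return mid;
	if (cur < lv->p1)
		return lv->p1;
	if (cur < lv->p0)
		return lv->p0;
	return cur;
}
```

With min = 10, P1 = 20 and P0 = 30, repeated calls starting from 10 yield 15, 20 and 30, which is the three level case; with all three values equal the function returns its input, which is the no boost case.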

Boost down:
After the system is idle for hold time of 3ms, the HWP request is reset
to the default value from HWP init or user modified one via sysfs.
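
A minimal sketch of the hold time check, assuming nanosecond timestamps and a cached default minimum. Both function names are hypothetical; only the 3 ms figure comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HOLD_TIME_NS (3ULL * 1000 * 1000)	/* 3 ms hold time from the text */

/* True once more than the hold time has passed since the last update. */
static bool boost_hold_expired(uint64_t now_ns, uint64_t last_update_ns)
{
	assert(now_ns >= last_update_ns);
	return now_ns - last_update_ns > HOLD_TIME_NS;
}

/*
 * Pick the HWP minimum to request: the cached default (from HWP init or
 * set by the user via sysfs) once the hold time has expired, otherwise
 * the boosted value stays in place.
 */
static int hwp_min_after_idle_check(int boosted_min, int cached_default_min,
				    uint64_t now_ns, uint64_t last_update_ns)
{
	if (boost_hold_expired(now_ns, last_update_ns))
		return cached_default_min;
	return boosted_min;
}
```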

Caching of HWP Request and Capabilities:
Cache the HWP request value last written to MSR_HWP_REQUEST and the value
read from MSR_HWP_CAPABILITIES. This avoids reading MSRs in the boost
utility functions.

The limits calculated by these boost utility functions are based on the
latest HWP request value, which can be modified by the setpolicy() callback.
So if user space modifies the minimum perf value, that is accounted for
every time boost up is called. There can be contention with the
user-modified minimum perf; in that case the user value takes precedence.
For example, suppose that just before the HWP_REQUEST MSR is updated
from the setpolicy() callback, the boost up function is called via the
scheduler tick callback. Here the cached MSR value is already the latest and
the limits are updated based on the latest user limits, but on return the
MSR write called from the setpolicy() callback will update the HWP_REQUEST
value. This will be used until the next time the boost up function is called.

In addition add a variable to control HWP dynamic boosting. When HWP
dynamic boost is active then set the HWP specific update util hook. The
contents in the utility hooks will be filled in the subsequent patches.

Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

15 May, 2018 1 commit

cpufreq: intel_pstate: allow trace in passive mode · 50e9ffab

Authored by Doug Smythies on May 14, 2018

Allow use of the trace_pstate_sample trace function
when the intel_pstate driver is in passive mode.
Since the core_busy and scaled_busy fields are not
used, and it might be desirable to know which path
through the driver was used, either intel_cpufreq_target
or intel_cpufreq_fast_switch, re-task the core_busy
field as a flag indicator.

The user can then use the intel_pstate_tracer.py utility
to summarize and plot the trace.

Note: The core_busy field still goes by that name
in include/trace/events/power.h and within the
intel_pstate_tracer.py script and csv file headers,
but it is graphed as "performance", and called
core_avg_perf now in the intel_pstate driver.

Sometimes, in passive mode, the driver is not called for
many tens or even hundreds of seconds. The user
needs to understand, and not be confused by, this limitation.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

10 Apr, 2018 1 commit

cpufreq: intel_pstate: Do not include debugfs.h · b258dfea

Authored by Rafael J. Wysocki on Mar 30, 2018

The intel_pstate driver doesn't use debugfs any more, so drop
linux/debugfs.h from the list of included headers in it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

08 Feb, 2018 1 commit

cpufreq: intel_pstate: Enable HWP during system resume on CPU0 · 70f6bf2a

Authored by Chen Yu on Jan 29, 2018

When maxcpus=1 is in the kernel command line, the BP is responsible
for re-enabling the HWP - because currently only the APs invoke
intel_pstate_hwp_enable() during their online process - which might
put the system into unstable state after resume.

Fix this by enabling the HWP explicitly on BP during resume.

Reported-by: Doug Smythies <dsmythies@telus.net>
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Yu Chen <yu.c.chen@intel.com>
[ rjw: Subject/changelog, minor modifications ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

12 Jan, 2018 2 commits

cpufreq: intel_pstate: Add Skylake servers support · d8de7a44

Authored by Srinivas Pandruvada on Jan 10, 2018

Currently intel_pstate can function only in HWP mode on Skylake servers.
When the HWP feature is not enabled on the processor, the acpi-cpufreq
driver is used.

Based on the power and performance tests using intel_pstate scaling
algorithm the results are comparable. But intel_pstate brings in
additional features:
 - Display of turbo frequency range, which many users like to see
 - Place limits in the turbo frequency range when platform allows

Since these tests were done only with the non-PID algorithm introduced in
kernel version 4.14, this patch is not a backport candidate. Users
have to weigh the benefits carefully before backporting it.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Replace bxt_funcs with core_funcs · dbd49b85

Authored by Srinivas Pandruvada on Jan 10, 2018

Since core_funcs and bxt_funcs have same set of callbacks, replace
bxt_funcs with core_funcs.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

29 Aug, 2017 1 commit

intel_pstate: convert to use acpi_match_platform_list() · 5e932321

Authored by Toshi Kani on Aug 23, 2017

Convert to use acpi_match_platform_list() for the platform check.
There is no change in functionality.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

18 Aug, 2017 1 commit

cpufreq: remove setting of policy->cpu in policy->cpus during init · b20a3f3d

Authored by Sudeep Holla on Aug 10, 2017

policy->cpu is copied into policy->cpus in cpufreq_online() before
calling into cpufreq_driver->init(). So there's no need to set the
same in the individual driver init() functions again.

This patch removes the redundant setting of policy->cpu in policy->cpus
in intel_pstate and cppc drivers.

Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

11 Aug, 2017 1 commit

cpufreq: intel_pstate: report correct CPU frequencies during trace · c587c79f

Authored by Doug Smythies on Aug 08, 2017

The intel_pstate CPU frequency scaling driver has always
calculated CPU frequency incorrectly.  Recent changes have
eliminated most of the issues, however the frequency reported
in the trace buffer, if used, is incorrect.

It remains desirable that cpu->pstate.scaling still be a nice
round number for things such as when setting max and min frequencies.
So the proposal is to just fix the reported frequency in the trace data.

Fixes what remains of [1].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96521 # [1]
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

10 Aug, 2017 2 commits

cpufreq: intel_pstate: Shorten a couple of long names · d77d4888

Authored by Rafael J. Wysocki on Aug 10, 2017

The names of the INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL symbol and
the get_target_pstate_use_cpu_load() function don't need to be so
long any more, so make them shorter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate() · a891283e

Authored by Rafael J. Wysocki on Aug 10, 2017

Since there is only one P-state selection routine in intel_pstate
now, make intel_pstate_adjust_pstate() call it directly and drop
the target_pstate argument from that function.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

04 Aug, 2017 1 commit

cpufreq: intel_pstate: Improve IO performance with per-core P-states · 7bde2d50

Authored by Srinivas Pandruvada on Aug 03, 2017

In the current implementation, the response latency between seeing
SCHED_CPUFREQ_IOWAIT set and the actual P-state adjustment can be up
to 10ms.  It can be reduced by bumping up the P-state to the max at
the time SCHED_CPUFREQ_IOWAIT is passed to intel_pstate_update_util().
With this change, the IO performance improves significantly.

For a simple "grep -r . linux" (Here linux is the kernel source
folder) with caches dropped every time on a Broadwell Xeon workstation
with per-core P-states, the user and system time is shorter by as much
as 30% - 40%.

The same performance difference was not observed on clients that don't
support per-core P-states.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

01 Aug, 2017 2 commits

sched: cpufreq: Allow remote cpufreq callbacks · 674e7541

Authored by Viresh Kumar on Jul 28, 2017

With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into cpufreq governors are only made from the scheduler if the target
CPU of the event is the same as the current CPU. This means there are
certain situations where a target CPU may not run the cpufreq governor
for some time.

One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that the new tasks should receive maximum
demand initially, this should result in CPU0 increasing frequency
immediately. Because of the above-mentioned limitation, though, this
does not occur.

This patch updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well.

The schedutil, ondemand and conservative governors are updated to
process cpufreq utilization update hooks called for remote CPUs where
the remote CPU is managed by the cpufreq policy of the local CPU.

The intel_pstate driver is updated to always reject remote callbacks.

This was tested with a couple of use cases (Android: hackbench, recentfling,
galleryfling, vellamo; Ubuntu: hackbench) on an ARM hikey board (64 bit
octa-core, single policy). Only galleryfling showed minor improvements,
while the others didn't show much deviation.

The reason is that this patch only targets a corner case, where the
following are required to be true to improve performance, and that
doesn't happen too often with these tests:

- Task is migrated to another CPU.
- The task has high demand, and should take the target CPU to higher
  OPPs.
- And the target CPU doesn't call into the cpufreq governor until the
  next tick.

Based on initial work from Steve Muckle.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL · f5c13f44

Authored by Rafael J. Wysocki on Jul 31, 2017

After commit 62611cb9 (intel_pstate: delete scheduler hook in HWP
mode) the INTEL_PSTATE_HWP_SAMPLING_INTERVAL is not used anywhere in
the code, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

28 Jul, 2017 1 commit

cpufreq: intel_pstate: Drop ->get from intel_pstate structure · 22baebd4

Authored by Rafael J. Wysocki on Jul 26, 2017

The ->get callback in the intel_pstate structure was mostly there
for the scaling_cur_freq sysfs attribute to work, but after commit
f8475cef (x86: use common aperfmperf_khz_on_cpu() to calculate
KHz using APERF/MPERF) that attribute uses arch_freq_get_on_cpu()
provided by the x86 arch code on all processors supported by
intel_pstate, so it doesn't need the ->get callback from the
driver any more.

Moreover, the very presence of the ->get callback in the intel_pstate
structure causes the cpuinfo_cur_freq attribute to be present when
intel_pstate operates in the active mode, which is bogus, because
the role of that attribute is to return the current CPU frequency
as seen by the hardware.  For intel_pstate, though, this is just an
average frequency and not really current, but computed for the
previous sampling interval (the actual current frequency may be
way different at the point this value is obtained by reading from
cpuinfo_cur_freq), and after commit 82b4e03e (intel_pstate: skip
scheduler hook when in "performance" mode) the value in
cpuinfo_cur_freq may be stale or just 0, depending on the driver's
operation mode.  In fact, however, on the hardware supported by
intel_pstate there is no way to read the current CPU frequency
from it, so the cpuinfo_cur_freq attribute should not be present
at all when this driver is in use.

For this reason, drop intel_pstate_get() and clear the ->get
callback pointer pointing to it, so that the cpuinfo_cur_freq is
not present for intel_pstate in the active mode any more.

Fixes: 82b4e03e (intel_pstate: skip scheduler hook when in "performance" mode)
Reported-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

27 Jul, 2017 2 commits

cpufreq: intel_pstate: Drop ->update_util from pstate_funcs · c4f3f70c

Authored by Rafael J. Wysocki on Jul 25, 2017

All systems use the same P-state selection "powersave" algorithm
in the active mode if HWP is not used, so there's no need to provide
a pointer for it in struct pstate_funcs any more.

Drop ->update_util from struct pstate_funcs and make
intel_pstate_set_update_util_hook() use intel_pstate_update_util()
directly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq: intel_pstate: Do not use PID-based P-state selection · 9d0ef7af

Authored by Rafael J. Wysocki on Jul 25, 2017

All systems with a defined ACPI preferred profile that are not
"servers" have been using the load-based P-state selection algorithm
in intel_pstate since 4.12-rc1 (mobile systems and laptops have been
using it since 4.10-rc1) and no problems with it have been reported
to date.  In particular, no regressions with respect to the PID-based
P-state selection have been reported.  Also testing indicates that
the P-state selection algorithm based on CPU load is generally on par
with the PID-based algorithm performance-wise, and for some workloads
it turns out to be better than the other one, while being more
straightforward and easier to understand at the same time.

Moreover, the PID-based P-state selection algorithm in intel_pstate
is known to be unstable in some situations and generally problematic;
the issues with it are hard to address and it has become a
significant maintenance burden.

For these reasons, make intel_pstate use the "powersave" P-state
selection algorithm based on CPU load in the active mode on all
systems and drop the PID-based P-state selection code along with
all things related to it from the driver.  Also update the
documentation accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

26 Jul, 2017 1 commit

cpufreq: Don't set transition_latency for setpolicy drivers · b8b78825

Authored by Viresh Kumar on Jul 19, 2017

The transition_latency field isn't used for drivers with ->setpolicy()
callback present and there is no point setting it from the drivers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

14 Jul, 2017 1 commit

cpufreq: intel_pstate: Correct the busy calculation for KNL · 6e34e1f2

Authored by Srinivas Pandruvada on Jul 13, 2017

The busy percent calculated for the Knights Landing (KNL) platform
is 1024 times smaller than the correct busy value.  This causes
performance to get stuck at the lowest ratio.

The scaling algorithm used for KNL is performance-based, but it still
looks at the CPU load to set the scaled busy factor to 0 when the
load is less than 1 percent.  In this case, since the computed load
is 1024x smaller than it should be, the scaled busy factor will
always be 0, irrespective of how busy the CPU is.

This needs a fix similar to the turbostat one in commit b2b34dfe
(tools/power turbostat: KNL workaround for %Busy and Avg_MHz).

For this reason, add one more callback to processor-specific
callbacks to specify an MPERF multiplier represented by a number of
bit positions to shift the value of that register to the left to
compensate for its rate difference with respect to the TSC.  This
shift value is used during CPU busy calculations.
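
The effect of the shift on the busy calculation can be illustrated as follows. `busy_percent()` is a hypothetical helper working on counter deltas; a shift of 10 is inferred from the "1024 times" figure above and would be 0 on other models.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Busy percentage over one sample: MPERF ticks relative to TSC ticks.
 * mperf_shift is the per-model left shift that compensates for MPERF
 * counting at a lower rate than the TSC.
 */
static unsigned int busy_percent(uint64_t mperf_delta, uint64_t tsc_delta,
				 unsigned int mperf_shift)
{
	assert(tsc_delta > 0 && mperf_shift < 32);
	return (unsigned int)(100 * (mperf_delta << mperf_shift) / tsc_delta);
}
```

For a half-busy sample with `tsc_delta = 1024000` and `mperf_delta = 500`, the unshifted result is 0 percent, which falls under the 1 percent threshold described above, while a shift of 10 gives 50 percent.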

Fixes: ffb81056 (intel_pstate: Avoid getting stuck in high P-states when idle)
Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

12 Jul, 2017 1 commit

cpufreq: intel_pstate: Fix ratio setting for min_perf_pct · d4436c0d

Authored by Srinivas Pandruvada on Jul 10, 2017

When the minimum performance limit percentage is set to the power-up
default, it is possible that minimum performance ratio is off by one.

In the set_policy() callback the minimum ratio is calculated by
applying global.min_perf_pct to turbo_ratio and rounding up, but the
power-up default global.min_perf_pct is already rounded up to the
next percent in min_perf_pct_min().  That results in two round up
operations, so for the default min_perf_pct one of them is not
required.

It is better to remove rounding up in min_perf_pct_min() as this
matches the displayed min_perf_pct prior to commit c5a2ee7d
(cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12.

For example on a platform with max turbo ratio of 37 and minimum
ratio of 10, the min_perf_pct resulted in 28 with the above commit.
Before this commit it was 27 and it will be the same after this
change.
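
The double rounding can be reproduced with the numbers from the example. The helper names and the `DIV_ROUND_UP_SKETCH` macro are hypothetical stand-ins written for this illustration.

```c
#include <assert.h>

#define DIV_ROUND_UP_SKETCH(n, d) (((n) + (d) - 1) / (d))

/* Default min_perf_pct without rounding up (the behaviour after this change). */
static int min_perf_pct_truncated(int min_ratio, int turbo_ratio)
{
	assert(turbo_ratio > 0);
	return min_ratio * 100 / turbo_ratio;
}

/* Default min_perf_pct rounded up to the next percent (the old behaviour). */
static int min_perf_pct_rounded_up(int min_ratio, int turbo_ratio)
{
	assert(turbo_ratio > 0);
	return DIV_ROUND_UP_SKETCH(min_ratio * 100, turbo_ratio);
}

/* Minimum ratio derived from a percentage, rounding up as set_policy() is said to do. */
static int min_ratio_from_pct(int turbo_ratio, int pct)
{
	return DIV_ROUND_UP_SKETCH(turbo_ratio * pct, 100);
}
```

With a max turbo ratio of 37 and a minimum ratio of 10, the truncated percentage is 27 and maps back to ratio 10, whereas the rounded-up percentage is 28 and maps back to ratio 11, which is the off-by-one described above.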

Fixes: 1a4fe38a (cpufreq: intel_pstate: Remove max/min fractions to limit performance)
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

05 Jul, 2017 1 commit

cpufreq: intel_pstate: constify attribute_group structures · 106c9c77

Authored by Arvind Yadav on Jul 03, 2017

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
  15197	   2552	     40	  17789	   457d	drivers/cpufreq/intel_pstate.o

File size after adding 'const':
   text	   data	    bss	    dec	    hex	filename
  15261	   2488	     40	  17789	   457d	drivers/cpufreq/intel_pstate.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

30 Jun, 2017 1 commit

cpufreq: intel_pstate: Clean up after performance governor changes · fab24dcc

Authored by Rafael J. Wysocki on Jun 29, 2017

After commit 82b4e03e (intel_pstate: skip scheduler hook when in
"performance" mode) get_target_pstate_use_performance() and
get_target_pstate_use_cpu_load() are never called if scaling_governor
is "performance", so drop the CPUFREQ_POLICY_PERFORMANCE checks from
them as they will never trigger anyway.

Moreover, the documentation needs to be updated to reflect the change
made by the above commit, so do that too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

27 Jun, 2017 2 commits

intel_pstate: skip scheduler hook when in "performance" mode · 82b4e03e

Authored by Len Brown on Jun 23, 2017

When the governor is set to "performance", intel_pstate does not
need the scheduler hook for doing any calculations.  Under these
conditions, its only purpose is to continue to maintain
cpufreq/scaling_cur_freq.

The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

So in "performance" governor mode, the scheduler hook can be skipped.
This applies to both software and hardware P-state control modes.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

intel_pstate: delete scheduler hook in HWP mode · 62611cb9

Authored by Len Brown on Jun 23, 2017

The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

In HWP mode, maintaining that value was the sole purpose of
the scheduler hook, intel_pstate_update_util_hwp(),
so it can now be removed.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

62611cb9

24 Jun, 2017 1 commit

cpufreq: intel_pstate: Remove max/min fractions to limit performance · 1a4fe38a

Committed by Srinivas Pandruvada on Jun 12, 2017

In the current model the max/min perf limits are a fraction of current
user space limits to the allowed max_freq or 100% for global limits.
This results in wrong ratio limits calculation because of rounding
issues for some user space limits.

Initially we tried to solve this issue by using more shift bits to
increase precision, but there are still isolated cases where we have
errors.

This can be avoided by using ratios altogether. Since we get
cpuinfo.max_freq by multiplying the max ratio by the scaling factor, we
can easily keep the max/min limits in terms of ratios and not fractions.

For example:
if the max ratio = 36
cpuinfo.max_freq = 36 * 100000 = 3600000

Suppose user space sets a limit of 1200000, then we can calculate
max ratio limit as
= 36 * 1200000 / 3600000
= 12
This will be correct for any user limits.
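
The arithmetic in the example above can be sketched as a small helper. This is a minimal illustration only, not the driver's code: the function name freq_limit_to_ratio() and its parameters are hypothetical, and the zero check is an added assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the driver's actual code: convert a user-space
 * frequency limit (kHz) into a P-state ratio limit, given the maximum ratio
 * and the cpuinfo.max_freq derived from it.
 */
static uint32_t freq_limit_to_ratio(uint32_t max_ratio, uint32_t limit_khz,
                                    uint32_t max_freq_khz)
{
	if (max_freq_khz == 0)
		return 0; /* assumption: no sensible ratio without a max frequency */

	/* Widen to 64 bits so max_ratio * limit_khz cannot overflow. */
	return (uint32_t)(((uint64_t)max_ratio * limit_khz) / max_freq_khz);
}
```

With the numbers from the example, freq_limit_to_ratio(36, 1200000, 3600000) evaluates to 12. Integer division truncates, so how limits that do not fall on a ratio boundary are rounded is a choice this sketch makes by itself.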

Another advantage is that we don't need to do any calculation in the
fast path, as the ratio limit is already calculated via the set_policy()
callback.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1a4fe38a

05 Jun, 2017 1 commit

cpufreq: intel_pstate: Avoid division by 0 in min_perf_pct_min() · 57caf4ec

Committed by Rafael J. Wysocki on Jun 05, 2017

Commit c5a2ee7d (cpufreq: intel_pstate: Active mode P-state
limits rework) incorrectly assumed that pstate.turbo_pstate would
always be nonzero for CPU0 in min_perf_pct_min() if
cpufreq_register_driver() had succeeded, which may not be the case
in virtualized environments.

If that assumption doesn't hold, it leads to an early crash on boot
in intel_pstate_register_driver(), so add a sanity check to
min_perf_pct_min() to prevent the crash from happening.
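
A sanity check of the kind described above might look like the following sketch. It is not the patch itself: struct pstate_sketch and min_perf_pct_min_sketch() are hypothetical stand-ins for the driver's data and function, and both the fallback value of 0 and the round-up division are assumptions.

```c
/* Hypothetical stand-in for the driver's per-CPU P-state data. */
struct pstate_sketch {
	int min_pstate;
	int turbo_pstate;
};

/*
 * Minimum performance as a percentage of the turbo P-state, rounded up.
 * Returns 0 (an assumed fallback) when turbo_pstate is not positive,
 * instead of dividing by zero.
 */
static int min_perf_pct_min_sketch(const struct pstate_sketch *p)
{
	if (p->turbo_pstate <= 0)
		return 0;

	return (p->min_pstate * 100 + p->turbo_pstate - 1) / p->turbo_pstate;
}
```

The point of the guard is only that the division is skipped when the turbo P-state is unknown, as can happen in a virtualized guest.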

Fixes: c5a2ee7d (cpufreq: intel_pstate: Active mode P-state limits rework)
Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

57caf4ec

12 May, 2017 1 commit

intel_pstate: use updated msr-index.h HWP.EPP values · 3cedbc5a

Committed by Len Brown on May 01, 2017

intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
These attributes use strings to describe 4 operating states, and
inside the driver, these strings are mapped to numerical register
values.

The authoritative mapping between the strings and numerical HWP.EPP
values is now globally defined in msr-index.h, replacing the outdated
mapping that was open-coded into intel_pstate.c.

new old string
--- --- ------
  0   0 performance
128  64 balance_performance
192 128 balance_power
255 192 power
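
The table above can be expressed as a lookup, sketched here in plain C. The struct, table, and function names are hypothetical and do not reproduce the macros in msr-index.h or the driver's sysfs code; only the strings and numbers come from the table.

```c
#include <stddef.h>
#include <string.h>

/* One row of the table above: sysfs string, new value, old value. */
struct epp_entry {
	const char *name;
	int new_val;
	int old_val;
};

static const struct epp_entry epp_table[] = {
	{ "performance",           0,   0 },
	{ "balance_performance", 128,  64 },
	{ "balance_power",       192, 128 },
	{ "power",               255, 192 },
};

#define EPP_TABLE_LEN (sizeof(epp_table) / sizeof(epp_table[0]))

/* Map a string to its new HWP.EPP value; -1 if the string is unknown. */
static int epp_string_to_value(const char *s)
{
	for (size_t i = 0; i < EPP_TABLE_LEN; i++)
		if (strcmp(epp_table[i].name, s) == 0)
			return epp_table[i].new_val;
	return -1;
}

/* Map a new HWP.EPP value back to its string; NULL if there is no exact match. */
static const char *epp_value_to_string(int v)
{
	for (size_t i = 0; i < EPP_TABLE_LEN; i++)
		if (epp_table[i].new_val == v)
			return epp_table[i].name;
	return NULL;
}
```

Under the new mapping, epp_value_to_string(128) yields "balance_performance", whereas the old column assigned 128 to "balance_power".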

Note that the HW and BIOS default value on most systems is 128,
which intel_pstate will now call "balance_performance"
while it used to call it "balance_power".
Signed-off-by: Len Brown <len.brown@intel.com>

3cedbc5a

18 Apr, 2017 1 commit

cpufreq: schedutil: Use policy-dependent transition delays · 1b72e7fd

Committed by Rafael J. Wysocki on Apr 11, 2017

Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.

Make intel_pstate set transition_delay_us to 500.
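
The selection of the default described above can be sketched as follows. This is a simplified illustration, not the governor's code: initial_rate_limit_us() and fallback_us are hypothetical names, and treating a transition_delay_us of 0 as "not set by the scaling driver" is an assumption.

```c
/*
 * Hypothetical sketch: choose the governor's initial rate_limit_us.
 * A transition_delay_us of 0 is assumed to mean "not set by the driver",
 * in which case some other default (fallback_us) is used.
 */
static unsigned int initial_rate_limit_us(unsigned int transition_delay_us,
					  unsigned int fallback_us)
{
	return transition_delay_us ? transition_delay_us : fallback_us;
}
```

With intel_pstate setting transition_delay_us to 500, this sketch would start schedutil at a 500 us rate limit regardless of the fallback.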
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

1b72e7fd

30 Mar, 2017 1 commit

cpufreq: intel_pstate: Add support for Gemini Lake · 630e5757

Committed by Box, David E on Mar 29, 2017

Use the same parameters as INTEL_FAM6_ATOM_GOLDMONT to enable
Gemini Lake.
Signed-off-by: Box, David E <david.e.box@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

630e5757

29 Mar, 2017 5 commits

cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max() · b02aabe8

Committed by Rafael J. Wysocki on Mar 28, 2017

Some computations in intel_pstate_get_min_max() are not necessary
and one of its two callers doesn't even use the full result.

First off, the fixed-point value of cpu->max_perf represents a
non-negative number between 0 and 1 inclusive and cpu->min_perf
cannot be greater than cpu->max_perf. It is not necessary to check
those conditions every time the numbers in question are used.

Moreover, since intel_pstate_max_within_limits() only needs the
upper boundary, it doesn't make sense to compute the lower one in
there and returning min and max from intel_pstate_get_min_max()
via pointers doesn't look particularly nice.

For the above reasons, drop intel_pstate_get_min_max(), add a helper
to get the base P-state for min/max computations and carry them out
directly in the previous callers of intel_pstate_get_min_max().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

b02aabe8

cpufreq: intel_pstate: Do not walk policy->cpus · 2bfc4cbb

Committed by Rafael J. Wysocki on Mar 28, 2017

intel_pstate_hwp_set() is the only function walking policy->cpus
in intel_pstate.  The rest of the code simply assumes one CPU per
policy, including the initialization code.

Therefore it doesn't make sense for intel_pstate_hwp_set() to
walk policy->cpus as it is guaranteed to have only one bit set
for policy->cpu.

For this reason, rearrange intel_pstate_hwp_set() to take the CPU
number as the argument and drop the loop over policy->cpus from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

2bfc4cbb

cpufreq: intel_pstate: Introduce pid_in_use() · 8ca6ce37

Committed by Rafael J. Wysocki on Mar 28, 2017

Add a new function pid_in_use() to return the information on whether
or not the PID-based P-state selection algorithm is in use.

That allows a couple of complicated conditions in the code to be
reduced to simple checks against the new function's return value.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

8ca6ce37

cpufreq: intel_pstate: Drop struct cpu_defaults · 2f49afc2

Committed by Rafael J. Wysocki on Mar 28, 2017

The cpu_defaults structure is redundant, because it only contains
one member of type struct pstate_funcs which can be used directly
instead of struct cpu_defaults.

For this reason, drop struct cpu_defaults, use struct pstate_funcs
directly instead of it where applicable and rename all of the
variables of that type accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

2f49afc2

cpufreq: intel_pstate: Move cpu_defaults definitions · de4a76cb

Committed by Rafael J. Wysocki on Mar 28, 2017

Move the definitions of the cpu_defaults structures after the
definitions of utilization update callback routines to avoid
extra declarations of the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

de4a76cb
