1. 25 Oct 2016, 1 commit
    • cpufreq: intel_pstate: Always set max P-state in performance mode · 2f1d407a
      Committed by Rafael J. Wysocki
      The only times at which intel_pstate checks the policy set for
      a given CPU are the initialization of that CPU and updates of its
      policy settings from cpufreq when intel_pstate_set_policy() is
      invoked.
      
      That is insufficient, however, because intel_pstate uses the same
      P-state selection function for all CPUs regardless of the policy
      setting for each of them and the P-state limits are shared between
      them.  Thus if the policy is set to "performance" for a particular
      CPU, it may not behave as expected if the cpufreq settings are
      changed subsequently for another CPU.
      
      That can be easily demonstrated by writing "performance" to
      scaling_governor for all CPUs and then switching it to "powersave"
      for one of them, in which case all of the CPUs will behave as though
      their scaling_governor were "powersave" (even though the policy
      still appears to be "performance" for the remaining CPUs).
      
      Fix this problem by modifying intel_pstate_adjust_busy_pstate() to
      always set the P-state to the maximum allowed by the current limits
      for all CPUs whose policy is set to "performance".
      
      Note that, even with this fix applied, it is still recommended to change
      the policy setting in the same way for all CPUs, to avoid confusion.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
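      A minimal sketch of the approach (illustrative only, not the actual patch;
      the per-CPU policy field and helper names are assumptions): record the
      cpufreq policy per CPU and, in the P-state selection path, bypass the
      load-based algorithm for CPUs whose own policy is "performance".

          /* Sketch only; field and helper names are assumptions. */
          static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
          {
                  int target_pstate;

                  if (cpu->policy == CPUFREQ_POLICY_PERFORMANCE)
                          /* Always ask for the max P-state allowed by the current limits. */
                          target_pstate = intel_pstate_max_within_limits(cpu); /* hypothetical helper */
                  else
                          target_pstate = pstate_funcs.get_target_pstate(cpu);

                  intel_pstate_set_pstate(cpu, target_pstate); /* setter name assumed */
          }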
  2. 22 Oct 2016, 3 commits
  3. 13 Oct 2016, 2 commits
  4. 10 Oct 2016, 2 commits
    • cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance() · f00593a4
      Committed by Rafael J. Wysocki
      Make the comment explaining the meaning of the perf_scaled variable
      in get_target_pstate_use_performance() more straightforward.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Fix unsafe HWP MSR access · f9f4872d
      Committed by Srinivas Pandruvada
      It is a requirement that MSR_PM_ENABLE be set to 0x01 before
      MSR_HWP_CAPABILITIES is read on a given CPU. If cpufreq init() is
      scheduled on a CPU which is not the same as policy->cpu, or migrates to a
      different CPU before the MSR read of MSR_HWP_CAPABILITIES, it
      is possible that MSR_PM_ENABLE was not set to 0x01 on that CPU.
      This will cause a GP fault. So, like other places in this path,
      rdmsrl_on_cpu() should be used instead of rdmsrl().
      
      Moreover, the scope of MSR_HWP_CAPABILITIES is per thread, so it
      should be read from the same CPU for which MSR_HWP_REQUEST is
      being set.
      
      dmesg dump or warning:
      
      [   22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014492] unchecked MSR access error: RDMSR from 0x771
      [   22.014493] Modules linked in:
      [   22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1
      ...
      ...
      [   22.014516] Call Trace:
      [   22.014542]  [<ffffffff813d7dd1>] dump_stack+0x63/0x82
      [   22.014558]  [<ffffffff8107bc8b>] __warn+0xcb/0xf0
      [   22.014561]  [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60
      [   22.014563]  [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014564]  [<ffffffff810677d9>] fixup_exception+0x39/0x50
      [   22.014604]  [<ffffffff8102e400>] do_general_protection+0x80/0x150
      [   22.014610]  [<ffffffff817f9ec8>] general_protection+0x28/0x30
      [   22.014635]  [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0
      [   22.014642]  [<ffffffff810600c7>] ? native_read_msr+0x7/0x40
      [   22.014657]  [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130
      [   22.014660]  [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340
      [   22.014662]  [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0
      [   22.014664]  [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0
      [   22.014666]  [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120
      [   22.014669]  [<ffffffff816833a6>] cpufreq_online+0x406/0x820
      [   22.014671]  [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90
      [   22.014717]  [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100
      [   22.014719]  [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210
      [   22.014749]  [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5
      [   22.014751]  [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12
      
      Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
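      The gist of the fix, as a sketch (not the exact diff): read
      MSR_HWP_CAPABILITIES on the CPU whose MSR_HWP_REQUEST is being programmed,
      using rdmsrl_on_cpu() rather than a plain rdmsrl():

          u64 cap;

          /* Unsafe: may execute on a CPU where MSR_PM_ENABLE is not yet 0x01. */
          rdmsrl(MSR_HWP_CAPABILITIES, cap);

          /* Sketch of the safe variant: pin the read to the target CPU, like the
           * other MSR accesses in this path. */
          rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, &cap);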
  5. 17 Sep 2016, 1 commit
  6. 14 Sep 2016, 1 commit
  7. 13 Sep 2016, 1 commit
  8. 17 Aug 2016, 1 commit
    • cpufreq / sched: Pass flags to cpufreq_update_util() · 58919e83
      Committed by Rafael J. Wysocki
      It is useful to know the reason why cpufreq_update_util() has just
      been called and that can be passed as flags to cpufreq_update_util()
      and to the ->func() callback in struct update_util_data.  However,
      doing that in addition to passing the util and max arguments they
      already take would be clumsy, so avoid it.
      
      Instead, use the observation that the schedutil governor is part
      of the scheduler proper, so it can access scheduler data directly.
      This allows the util and max arguments of cpufreq_update_util()
      and the ->func() callback in struct update_util_data to be replaced
      with a flags one, but schedutil has to be modified to follow.
      
      Thus make the schedutil governor obtain the CFS utilization
      information from the scheduler and use the "RT" and "DL" flags
      instead of the special utilization value of ULONG_MAX to track
      updates from the RT and DL sched classes.  Make it non-modular
      too to avoid having to export scheduler variables to modules at
      large.
      
      Next, update all of the other users of cpufreq_update_util()
      and the ->func() callback in struct update_util_data accordingly.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
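      A sketch of the reshaped hook described above (the flag names and the
      schedutil helper are illustrative assumptions, not necessarily the merged
      names):

          /* Sketch only. */
          #define SCHED_CPUFREQ_RT   (1U << 0)
          #define SCHED_CPUFREQ_DL   (1U << 1)

          struct update_util_data {
                  /* Old: void (*func)(struct update_util_data *data, u64 time,
                   *                   unsigned long util, unsigned long max);
                   * New: util/max are gone; callers pass the reason as flags. */
                  void (*func)(struct update_util_data *data, u64 time,
                               unsigned int flags);
          };

          /* The schedutil governor now pulls CFS utilization from scheduler data
           * itself instead of receiving it as arguments, e.g.: */
          static void sugov_get_util(unsigned long *util, unsigned long *max)
          {
                  struct rq *rq = this_rq();

                  *max = arch_scale_cpu_capacity(NULL, smp_processor_id());
                  *util = min(rq->cfs.avg.util_avg, *max);
          }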
  9. 29 Jul 2016, 1 commit
  10. 21 Jul 2016, 3 commits
  11. 11 Jul 2016, 1 commit
  12. 07 Jul 2016, 1 commit
  13. 28 Jun 2016, 4 commits
  14. 15 Jun 2016, 1 commit
  15. 14 Jun 2016, 1 commit
  16. 08 Jun 2016, 3 commits
  17. 30 May 2016, 1 commit
  18. 18 May 2016, 1 commit
  19. 12 May 2016, 4 commits
  20. 10 May 2016, 1 commit
  21. 05 May 2016, 1 commit
  22. 04 May 2016, 1 commit
    • intel_pstate: Fix intel_pstate_get() · 6d45b719
      Committed by Rafael J. Wysocki
      After commit 8fa520af "intel_pstate: Remove freq calculation from
      intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
      to compute the average frequency, which is problematic for two reasons.
      
      First, intel_pstate_get() may be invoked before the driver reads the
      CPU feedback registers for the first time and if that happens,
      get_avg_frequency() will attempt to divide by zero.
      
      Second, the get_avg_frequency() call in intel_pstate_get() is racy
      with respect to intel_pstate_sample() and it may end up returning
      completely meaningless values for this reason.
      
      Moreover, after commit 7349ec04 "intel_pstate: Move
      intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      sample.core_pct_busy is never computed on Atom, but it is used in
      intel_pstate_adjust_busy_pstate() in that case too.
      
      To address those problems, notice that if sample.core_pct_busy
      were used in the average frequency computation carried out by
      get_avg_frequency(), both the divide-by-zero problem and the
      race with respect to intel_pstate_sample() would be avoided.
      
      Accordingly, move the invocation of intel_pstate_calc_busy() from
      get_target_pstate_use_performance() to intel_pstate_update_util(),
      which also will take care of the uninitialized sample.core_pct_busy
      on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
      as per the above.
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
      Fixes: 8fa520af "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
      Fixes: 7349ec04 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
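      A rough sketch of the reordering (not the actual diff; it only illustrates
      the flow described above, with the callback signature used at the time):

          /* Sketch only. */
          static void intel_pstate_update_util(struct update_util_data *data, u64 time,
                                               unsigned long util, unsigned long max)
          {
                  struct cpudata *cpu = container_of(data, struct cpudata, update_util);

                  if (intel_pstate_sample(cpu, time)) {
                          /* Done here for every CPU, including Atom, so that
                           * sample.core_pct_busy is always valid... */
                          intel_pstate_calc_busy(cpu);
                          intel_pstate_adjust_busy_pstate(cpu);
                  }
          }

          /* ...and get_avg_frequency() now derives the average frequency from
           * sample.core_pct_busy, which is updated together with the rest of the
           * sample, avoiding both the divide by zero and the race. */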
  23. 02 May 2016, 1 commit
  24. 28 Apr 2016, 3 commits
    • cpufreq: intel_pstate: Enable PPC enforcement for servers · 2b3ec765
      Committed by Srinivas Pandruvada
      For platforms which are controlled via a remote node manager, enable _PPC
      by default. These platforms are mostly categorized as enterprise or
      performance servers. They need to go through certification tests, which
      test control via _PPC.
      The relative risk of enabling this by default is low, as it is less likely
      that these systems have a broken _PSS table.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
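      The commit text does not spell out how such platforms are detected; one
      plausible sketch, assuming the ACPI FADT preferred PM profile is used to
      identify enterprise/performance servers (the helper name is hypothetical):

          #include <linux/acpi.h>  /* acpi_gbl_FADT, PM_ENTERPRISE_SERVER, ... */

          /* Sketch: treat _PPC enforcement as on by default only on platforms
           * whose FADT preferred PM profile marks them as servers. */
          static bool intel_pstate_ppc_default_on(void)
          {
                  switch (acpi_gbl_FADT.preferred_profile) {
                  case PM_ENTERPRISE_SERVER:
                  case PM_PERFORMANCE_SERVER:
                          return true;   /* certified against _PPC control */
                  default:
                          return false;
                  }
          }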
    • cpufreq: intel_pstate: Adjust policy->max · 3be9200d
      Committed by Srinivas Pandruvada
      When policy->max is changed via _PPC or sysfs and is more than the max
      non-turbo frequency, it does not really change the resulting performance
      on some processors. When policy->max results in a P-State ratio more than
      the turbo activation ratio, the processor can choose any P-State up to max
      turbo. So the user or _PPC setting has no effect, but it can cause
      undesirable side effects like:
      - Showing a reduced max percentage in the Intel P-State sysfs interface
      - Reduced max performance under certain boundary conditions:
      The requested max scaling frequency, whether via _PPC or via cpufreq
      sysfs, is converted into a fixed-point max percent scale. In the
      majority of cases this results in the correct max, but not 100% of the
      time. If _PPC is requested at a point where the calculation leads to a
      lower max, this can result in a lower P-State than expected and will
      impact performance.
      Example of this condition, using a Broadwell laptop with config TDP:
      
      ACPI _PSS table from a Broadwell laptop
      2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
      1300000 1100000 1000000 900000 800000 600000 500000
      
      The actual results, obtained by disabling config TDP so that we get what
      is requested at or below 2300000 KHz:
      
      scaling_max_freq   Max Requested P-State   Resultant scaling max
      ------------------------------------------------------------------
      2400000            18                      2900000 (max turbo)
      2300000            17                      2300000 (max physical non turbo)
      2200000            15                      2100000
      2100000            15                      2100000
      2000000            13                      1900000
      1900000            13                      1900000
      1800000            12                      1800000
      1700000            11                      1700000
      1600000            10                      1600000
      1500000            f                       1500000
      1400000            e                       1400000
      1300000            d                       1300000
      1200000            c                       1200000
      1100000            a                       1000000
      1000000            a                       1000000
      900000             9                        900000
      800000             8                        800000
      700000             7                        700000
      600000             6                        600000
      500000             5                        500000
      ------------------------------------------------------------------
      
      Now set the config TDP level 1 ratio to 0x0b (equivalent to 1100000 KHz)
      in the BIOS (not every system will let you adjust this).
      The turbo activation ratio will be set to one less than that, i.e.
      0x0a (so any request above 1000000 KHz should result in the turbo region,
      assuming no thermal limits).
      Here _PPC will request a max of 1100000 KHz (which should still
      result in turbo, as this is more than the turbo activation ratio, up to
      the max allowable turbo frequency), but the actual calculation results in
      a max ceiling P-State of 0x0a. So under any load condition, this driver
      will not request turbo P-States. This is a huge performance hit.
      
      When the config TDP feature is on, if _PPC points to a frequency above
      the turbo activation ratio, performance can still reach max turbo. In this
      case we don't need to treat it as a reduced frequency in the set_policy
      callback.
      
      With this change, when config TDP is active (detected by checking whether
      the physical max non-turbo ratio is more than the current max non-turbo
      ratio), any request above the current max non-turbo is treated as a
      request for full performance.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Minor cleanups ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
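      A sketch of the check described in the last paragraph (cpudata field names
      are assumptions): if config TDP is active, i.e. the physical max non-turbo
      ratio is higher than the current max non-turbo ratio, treat a policy->max
      above the current max non-turbo frequency as a request for full performance.

          /* Sketch only; field names are illustrative. */
          static void intel_pstate_adjust_policy_max(struct cpudata *cpu,
                                                     struct cpufreq_policy *policy)
          {
                  /* Config TDP is in effect when the physical max non-turbo ratio
                   * exceeds the currently configured max non-turbo ratio. */
                  if (cpu->pstate.max_pstate_physical > cpu->pstate.max_pstate &&
                      policy->max < policy->cpuinfo.max_freq &&
                      policy->max > cpu->pstate.max_pstate * cpu->pstate.scaling) {
                          /* Above the turbo activation point: request everything. */
                          policy->max = policy->cpuinfo.max_freq;
                  }
          }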
    • cpufreq: intel_pstate: Enforce _PPC limits · 9522a2ff
      Committed by Srinivas Pandruvada
      Use the ACPI _PPC notification to limit the max P-state the driver will
      request. An ACPI _PPC change notification is sent by the BIOS to limit the
      max P-state in several cases:
      - To reduce the impact of a platform thermal condition
      - When the config TDP feature is used, a changed _PPC is sent to
      follow the TDP change
      - Remote node managers in servers want to control platform power
      via a baseboard management controller (BMC)
      
      This change registers with the ACPI processor performance lib so that
      _PPC changes are notified to the cpufreq core, which in turn results
      in a call to the .setpolicy() callback. Also, the way the _PSS
      table identifies a turbo frequency is not compatible with the max turbo
      frequency in intel_pstate, so the very first entry in _PSS needs
      to be adjusted.
      
      This feature can be turned on by using the kernel parameter:
      intel_pstate=support_acpi_ppc
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Minor cleanups ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
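      A condensed sketch of the registration flow (assumptions: the
      acpi_processor_register_performance() helper from the ACPI processor
      performance library, plus per-CPU acpi_perf_data/valid_pss_table fields;
      the real patch is considerably larger):

          #include <linux/acpi.h>
          #include <acpi/processor.h>

          /* Sketch: hook into the ACPI processor performance library so that
           * _PPC change notifications reach the cpufreq core, and adjust the
           * first _PSS entry, whose notion of "turbo" does not match
           * intel_pstate's max turbo frequency. */
          static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
          {
                  struct cpudata *cpu = all_cpu_data[policy->cpu];

                  if (acpi_processor_register_performance(&cpu->acpi_perf_data,
                                                          policy->cpu))
                          return;

                  /* Replace the first _PSS entry (nominally "turbo") with the
                   * driver's own max turbo frequency, in MHz (illustrative). */
                  cpu->acpi_perf_data.states[0].core_frequency =
                                          policy->cpuinfo.max_freq / 1000;
                  cpu->valid_pss_table = true;
          }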