- 21 7月, 2014 11 次提交
-
-
由 Stratos Karafotis 提交于
The specific rounding adds conditionally only 1/256 to fractional part of core_pct. We can safely remove it without any noticeable impact in calculations. Use div64_u64 instead of div_u64 to avoid possible overflow of sample->mperf as divisor Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
Simplify the code by removing the inline functions pstate_increase and pstate_decrease and use directly the intel_pstate_set_pstate. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Acked-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
Currently we shift right aperf and mperf variables by FRAC_BITS to prevent overflow when we convert them to fix point numbers (shift left by FRAC_BITS). But this is not necessary, because we actually use delta aperf and mperf which are much less than APERF and MPERF values. So, use the unmodified APERF and MPERF values in calculation. This also adds 8 bits in precision, although the gain is insignificant. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
According to Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF. This should increase the accuracy of the calculations. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
Suppress checkpatch.pl --strict warnings: CHECK: Alignment should match open parenthesis Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
Remove the unnecessary intermediate assignment and use directly the pid_params.sample_rate_ms variable. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
Remove unnecessary parentheses. Also, add parentheses in one case for better readability. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
We can fit these lines in a single one, under the 80 characters limit. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
Also, remove unnecessary blank lines. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
div_s64() accepts the divisor parameter as s32. Helper div_fp() also accepts divisor as int32_t. So, remove the unnecessary int64_t type casting. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
Since we never remove sysfs entry and debugfs files, we can make the intel_pstate_kobject and debugfs_parent locals. Also, annotate with __init intel_pstate_sysfs_expose_params() and intel_pstate_debug_expose_params() in order to be freed after bootstrap. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 17 7月, 2014 1 次提交
-
-
由 Viresh Kumar 提交于
This is only relevant to implementations with multiple clusters, where clusters have separate clock lines but all CPUs within a cluster share it. Consider a dual cluster platform with 2 cores per cluster. During suspend we start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its kobj as we want to retain permissions/values/etc. Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev(). We will recover the old policy and update policy->cpu from 3 to 2 from update_policy_cpu(). But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a link for CPU2, but would try that for CPU3 while bringing it online. Which will report errors as CPU3 already has kobj assigned to it. This bug got introduced with commit 42f921a6, which overlooked this scenario. To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here only for the first CPU of a non-boot cluster. And we can't recover from this situation, if kobject_move() fails. Fixes: 42f921a6 (cpufreq: remove sysfs files for CPUs which failed to come back after resume) Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Reported-and-tested-by: NBu Yitian <ybu@qti.qualcomm.com> Reported-by: NSaravana Kannan <skannan@codeaurora.org> Reviewed-by: NSrivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 16 7月, 2014 4 次提交
-
-
由 Viresh Kumar 提交于
OPPs can be populated statically, via DT, or added at run time with dev_pm_opp_add(). While this driver handles the first case correctly, it would fail to populate OPPs added at runtime. Because call to of_init_opp_table() would fail as there are no OPPs in DT and probe will return early. To fix this, remove error checking and call dev_pm_opp_init_cpufreq_table() unconditionally. Update bindings as well. Suggested-by: NStephen Boyd <sboyd@codeaurora.org> Tested-by: NStephen Boyd <sboyd@codeaurora.org> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Acked-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Quentin Armitage 提交于
Commit ff1f0018 ("drivers: Enable building of Kirkwood drivers for mach-mvebu") added Kirkwood into mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the KConfig files. The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD || MACH_KIRKWOOD. As a consequence of this change, the cpufreq driver is no longer enabled for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for ARCH_KIRKWOOD. Signed-off-by: NQuentin Armitage <quentin@armitage.org.uk> Acked-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Nicolas Del Piano 提交于
PM_OPP is a library used by several of the existing cpufreq drivers. ARM IMX6Q cpufreq driver uses this library for its functionality. Thus, it should be selected in Kconfig. Reported-by: NEzequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: NNicolas Del Piano <ndel314@gmail.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Linus Walleij 提交于
The Compaq iPAQ h3600 also has the K4S281632b-1H memory type. Verified by prying apart a broken board. Signed-off-by: NLinus Walleij <linus.walleij@linaro.org> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 09 7月, 2014 1 次提交
-
-
由 Prabhakar Lad 提交于
Since commtit 8a7b1227 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq) this added dependancy only for CONFIG_ARCH_DAVINCI_DA850 where as davinci_cpufreq_init() call is used by all davinci platform. This patch fixes following build error: arch/arm/mach-davinci/built-in.o: In function `davinci_init_late': :(.init.text+0x928): undefined reference to `davinci_cpufreq_init' make: *** [vmlinux] Error 1 Fixes: 8a7b1227 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq) Signed-off-by: NLad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 07 7月, 2014 3 次提交
-
-
由 Vincent Minet 提交于
Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU initialization. Otherwise only cpu0 has its P-state set and all other cores are left with their values unchanged. In most cases, this is not too serious because the P-states will be set correctly when the timer function is run. But when the default governor is set to performance, the per-CPU current_pstate stays the same forever and no attempts are made to write the MSRs again. Signed-off-by: NVincent Minet <vincent@vincent-minet.net> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Dirk Brandewie 提交于
If turbo is disabled in the BIOS bit 38 should be set in MSR_IA32_MISC_ENABLE register per section 14.3.2.1 of the SDM Vol 3 document 325384-050US Feb 2014. If this bit is set do *not* attempt to disable trubo via the MSR_IA32_PERF_CTL register. On some systems trying to disable turbo via MSR_IA32_PERF_CTL will cause subsequent writes to MSR_IA32_PERF_CTL not take affect, in fact reading MSR_IA32_PERF_CTL will not show the IDA/Turbo DISENGAGE bit(32) as set. A write of bit 32 to zero returns to normal operation. Also deal with the case where the processor does not support turbo and the BIOS does not report the fact in MSR_IA32_MISC_ENABLE but does report the max and turbo P states as the same value. Link: https://bugzilla.kernel.org/show_bug.cgi?id=64251 Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Dirk Brandewie 提交于
Commit 21855ff5 (intel_pstate: Set turbo VID for BayTrail) introduced setting the turbo VID which is required to prevent a machine check on some Baytrail SKUs under heavy graphics based workloads. The docmumentation update that brought the requirement to light also changed the bit mask used for enumerating P state and VID values from 0x7f to 0x3f. This change returns the mask value to 0x7f. Tested with the Intel NUC DN2820FYK, BIOS version FYBYT10H.86A.0034.2014.0513.1413 with v3.16-rc1 and v3.14.8 kernel versions. Fixes: 21855ff5 (intel_pstate: Set turbo VID for BayTrail) Link: https://bugzilla.kernel.org/show_bug.cgi?id=77951Reported-and-tested-by: NRune Reterson <rune@megahurts.dk> Reported-and-tested-by: NEric Eickmeyer <erich@ericheickmeyer.com> Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 19 6月, 2014 1 次提交
-
-
由 Aaron Plattner 提交于
Commit bd0fa9bb introduced a failure path to cpufreq_update_policy() if cpufreq_driver->get(cpu) returns NULL. However, it jumps to the 'no_policy' label, which exits without unlocking any of the locks the function acquired earlier. This causes later calls into cpufreq to hang. Fix this by creating a new 'unlock' label and jumping to that instead. Fixes: bd0fa9bb ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()") Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/Signed-off-by: NAaron Plattner <aplattner@nvidia.com> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 18 6月, 2014 1 次提交
-
-
由 Doug Smythies 提交于
There was a mistake in the actual rounding portion this previous patch: f0fe3cd7 (intel_pstate: Correct rounding in busy calculation) such that the rounding was asymetric and incorrect. Severity: Not very serious, but can increase target pstate by one extra value. For real world work flows the issue should self correct (but I have no proof). It is the equivalent of different PID gains for positive and negative numbers. Examples: -3.000000 used to round to -4, rounds to -3 with this patch. -3.503906 used to round to -5, rounds to -4 with this patch. Fixes: f0fe3cd7 (intel_pstate: Correct rounding in busy calculation) Signed-off-by: NDoug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 17 6月, 2014 1 次提交
-
-
由 Arnd Bergmann 提交于
5fbfbcd3 ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR") was a little too quick in completely removing the dependency on the THERMAL driver. The problem is that while there are inline wrappers to turn the thermal API calls into empty functions, those do not help if the cpu-thermal driver is a loadable module and cpufreq-cpu0 is builtin. Since CONFIG_CPU_THERMAL is a bool option that decides whether the cpu code is built into the thermal module or not, we have to use a dependency on the thermal driver itself. However, if CPU_THERMAL is disabled, we don't need the dependency, hence the strange '!CPU_THERMAL || THERMAL' construct. Fixes: 5fbfbcd3 ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Tested-by: NViresh Kumar <viresh.kumar@linaro.org> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 11 6月, 2014 3 次提交
-
-
由 Viresh Kumar 提交于
cpufreq-cpu0 uses thermal framework to register a cooling device, but doesn't depend on it as there are dummy calls provided by thermal layer when CONFIG_THERMAL=n. And when these calls fail, the driver is still usable. Similar explanation is valid for regulators as well. We do have dummy calls available for regulator APIs and the driver can work even when those calls fail. So, we don't really need to mention thermal and regulators as a dependency for cpufreq-cpu0 in Kconfig as platforms without support for thermal/regulator can also use this driver. Remove this dependency. Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Viresh Kumar 提交于
Tegra's driver got updated a bit (00917ddc cpufreq: Tegra: implement intermediate frequency callbacks) and implements new 'intermediate freq' infrastructure of core. Above commit updated comments about when to call clk_prepare_enable(pll_x_clk) and Doug wasn't satisfied with those comments and said this: > The "Though when target-freq is intermediate freq, we don't need to > take this reference." makes me think that this function is actually > called when target-freq is intermediate freq. I don't think it is, > right? For better clarity just make that comment more explicit about when we call tegra_target_intermediate(). Reviewed-by: NStephen Warren <swarren@nvidia.com> Reported-and-reviewed-by: NDoug Anderson <dianders@chromium.org> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Stratos Karafotis 提交于
We check the CPU ID during driver init. There is no need to do it again per logical CPU initialization. So, remove the duplicate check. Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Acked-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 10 6月, 2014 1 次提交
-
-
由 Viresh Kumar 提交于
Sometimes boot loaders set CPU frequency to a value outside of frequency table present with cpufreq core. In such cases CPU might be unstable if it has to run on that frequency for long duration of time and so its better to set it to a frequency which is specified in frequency table. Sachin recently found this problem with cpufreq-cpu0 driver when he was testing it for Exynos. Set this flag for cpufreq-cpu0 driver. Reported-and-tested-by: NSachin Kamat <sachin.kamat@linaro.org> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 09 6月, 2014 1 次提交
-
-
由 Viresh Kumar 提交于
'copy_prev_load' was recently added by commit: 18b46abd (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads). It actually is a bit redundant as we also have 'prev_load' which can store any integer value and can be used instead of 'copy_prev_load' by setting it zero. True load can also turn out to be zero during long idle intervals (and hence the actual value of 'prev_load' and the overloaded value can clash). However this is not a problem because, if the true load was really zero in the previous interval, it makes sense to evaluate the load afresh for the current interval rather than copying the previous load. So, drop 'copy_prev_load' and use 'prev_load' instead. Update comments as well to make it more clear. There is another change here which was probably missed by Srivatsa during the last version of updates he made. The unlikely in the 'if' statement was covering only half of the condition and the whole line should actually come under it. Also checkpatch is made more silent as it was reporting this (--strict option): CHECK: Alignment should match open parenthesis + if (unlikely(wall_time > (2 * sampling_rate) && + j_cdbs->prev_load)) { Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 08 6月, 2014 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
Cpufreq governors like the ondemand governor calculate the load on the CPU periodically by employing deferrable timers. A deferrable timer won't fire if the CPU is completely idle (and there are no other timers to be run), in order to avoid unnecessary wakeups and thus save CPU power. However, the load calculation logic is agnostic to all this, and this can lead to the problem described below. Time (ms) CPU 1 100 Task-A running 110 Governor's timer fires, finds load as 100% in the last 10ms interval and increases the CPU frequency. 110.5 Task-A running 120 Governor's timer fires, finds load as 100% in the last 10ms interval and increases the CPU frequency. 125 Task-A went to sleep. With nothing else to do, CPU 1 went completely idle. 200 Task-A woke up and started running again. 200.5 Governor's deferred timer (which was originally programmed to fire at time 130) fires now. It calculates load for the time period 120 to 200.5, and finds the load is almost zero. Hence it decreases the CPU frequency to the minimum. 210 Governor's timer fires, finds load as 100% in the last 10ms interval and increases the CPU frequency. So, after the workload woke up and started running, the frequency was suddenly dropped to absolute minimum, and after that, there was an unnecessary delay of 10ms (sampling period) to increase the CPU frequency back to a reasonable value. And this pattern repeats for every wake-up-from-cpu-idle for that workload. This can be quite undesirable for latency- or response-time sensitive bursty workloads. So we need to fix the governor's logic to detect such wake-up-from- cpu-idle scenarios and start the workload at a reasonably high CPU frequency. One extreme solution would be to fake a load of 100% in such scenarios. But that might lead to undesirable side-effects such as frequency spikes (which might also need voltage changes) especially if the previous frequency happened to be very low. We just want to avoid the stupidity of dropping down the frequency to a minimum and then enduring a needless (and long) delay before ramping it up back again. So, let us simply carry forward the previous load - that is, let us just pretend that the 'load' for the current time-window is the same as the load for the previous window. That way, the frequency and voltage will continue to be set to whatever values they were set at previously. This means that bursty workloads will get a chance to influence the CPU frequency at which they wake up from cpu-idle, based on their past execution history. Thus, they might be able to avoid suffering from slow wakeups and long response-times. However, we should take care not to over-do this. For example, such a "copy previous load" logic will benefit cases like this: (where # represents busy and . represents idle) ##########.........#########.........###########...........##########........ but it will be detrimental in cases like the one shown below, because it will retain the high frequency (copied from the previous interval) even in a mostly idle system: ##########.........#.................#.....................#............... (i.e., the workload finished and the remaining tasks are such that their busy periods are smaller than the sampling interval, which causes the timer to always get deferred. So, this will make the copy-previous-load logic copy the initial high load to subsequent idle periods over and over again, thus keeping the frequency high unnecessarily). So, we modify this copy-previous-load logic such that it is used only once upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the previous load won't get blindly copied over; cpufreq will freshly evaluate the load in the second idle interval, thus ensuring that the system comes back to its normal state. [ The right way to solve this whole problem is to teach the CPU frequency governors to also track load on a per-task basis, not just a per-CPU basis, and then use both the data sources intelligently to set the appropriate frequency on the CPUs. But that involves redesigning the cpufreq subsystem, so this patch should make the situation bearable until then. ] Experimental results: +-------------------+ I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in between its execution such that its total utilization can be a user-defined value, say 10% or 20% (higher the utilization specified, lesser the amount of sleeps injected). This ebizzy was run with a single-thread, tied to CPU 8. Behavior observed with tracing (sample taken from 40% utilization runs): ------------------------------------------------------------------------ Without patch: ~~~~~~~~~~~~~~ kworker/8:2-12137 416.335742: cpu_frequency: state=2061000 cpu_id=8 kworker/8:2-12137 416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40753 416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-12137 416.345744: cpu_frequency: state=4123000 cpu_id=8 kworker/8:2-12137 416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40753 416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 <snip> --------------------------------------------------------------------- <snip> <...>-40753 416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8 <idle>-0 416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy <...>-40753 416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-12137 416.505739: cpu_frequency: state=2061000 cpu_id=8 kworker/8:2-12137 416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40753 416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-12137 416.515742: cpu_frequency: state=4123000 cpu_id=8 kworker/8:2-12137 416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy Observation: Ebizzy went idle at 416.402202, and started running again at 416.502130. But cpufreq noticed the long idle period, and dropped the frequency at 416.505739, only to increase it back again at 416.515742, realizing that the workload is in-fact CPU bound. Thus ebizzy needlessly ran at the lowest frequency for almost 13 milliseconds (almost 1 full sample period), and this pattern repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite a lot. With patch: ~~~~~~~~~~~ kworker/8:2-29802 464.832535: cpu_frequency: state=2061000 cpu_id=8 <snip> --------------------------------------------------------------------- <snip> kworker/8:2-29802 464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-29802 464.972536: cpu_frequency: state=4123000 cpu_id=8 kworker/8:2-29802 464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 <snip> --------------------------------------------------------------------- <snip> kworker/8:2-29802 465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-29802 465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8 <idle>-0 465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy <...>-40738 465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 kworker/8:2-29802 465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy <...>-40738 465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2 Observation: Ebizzy went idle at 465.035797, and started running again at 465.240178. Since ebizzy was the only real workload running on this CPU, cpufreq retained the frequency at 4.1Ghz throughout the run of ebizzy, no matter how many times ebizzy slept and woke-up in-between. Thus, ebizzy got the 10ms worth of 4.1 Ghz benefit during every sleep-wakeup (as compared to the run without the patch) and this boost gave a modest improvement in total throughput, as shown below. Sleeping-ebizzy records-per-second: ----------------------------------- Utilization Without patch With patch Difference (Absolute and % values) 10% 274767 277046 + 2279 (+0.829%) 20% 543429 553484 + 10055 (+1.850%) 40% 1090744 1107959 + 17215 (+1.578%) 60% 1634908 1662018 + 27110 (+1.658%) A rudimentary and somewhat approximately latency-sensitive workload such as sleeping-ebizzy itself showed a consistent, noticeable performance improvement with this patch. Hence, workloads that are truly latency-sensitive will benefit quite a bit from this change. Moreover, this is an overall win-win since this patch does not hurt power-savings at all (because, this patch does not reduce the idle time or idle residency; and the high frequency of the CPU when it goes to cpu-idle does not affect/hurt the power-savings of deep idle states). Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 07 6月, 2014 2 次提交
-
-
由 Ed Swarthout 提交于
Commit 6712d293 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost error) used the remainder from do_div instead of the quotient. Fix that and add one to ensure minimum is met. Fixes: 6712d293 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost error) Signed-off-by: NEd Swarthout <Ed.Swarthout@freescale.com> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Rafael J. Wysocki 提交于
This reverts commit 4920ab84 (cpufreq: Enable big.LITTLE cpufreq driver on arm64) that breaks build on arm64. Fixes: 4920ab84 (cpufreq: Enable big.LITTLE cpufreq driver on arm64) Reported-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 06 6月, 2014 2 次提交
-
-
由 Viresh Kumar 提交于
Tegra has been switching to intermediate frequency (pll_p_clk) forever. CPUFreq core has better support for handling notifications for these frequencies and so we can adapt Tegra's driver to it. Also do a WARN() if clk_set_parent() fails while moving back to pll_x as we should have atleast restored to earlier frequency on error. Tested-by: NStephen Warren <swarren@nvidia.com> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Reviewed-by: NDoug Anderson <dianders@chromium.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Viresh Kumar 提交于
Douglas Anderson, recently pointed out an interesting problem due to which udelay() was expiring earlier than it should. While transitioning between frequencies few platforms may temporarily switch to a stable frequency, waiting for the main PLL to stabilize. For example: When we transition between very low frequencies on exynos, like between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz. No CPUFREQ notification is sent for that. That means there's a period of time when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz and 300MHz. And so udelay behaves badly. To get this fixed in a generic way, introduce another set of callbacks get_intermediate() and target_intermediate(), only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. get_intermediate() should return a stable intermediate frequency platform wants to switch to, and target_intermediate() should set CPU to that frequency, before jumping to the frequency corresponding to 'index'. Core will take care of sending notifications and driver doesn't have to handle them in target_intermediate() or target_index(). NOTE: ->target_index() should restore to policy->restore_freq in case of failures as core would send notifications for that. Tested-by: NStephen Warren <swarren@nvidia.com> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Reviewed-by: NDoug Anderson <dianders@chromium.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 02 6月, 2014 4 次提交
-
-
由 Doug Smythies 提交于
This change makes the busy calculation using 64 bit math which prevents overflow for large values of aperf/mperf. Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: NDoug Smythies <dsmythies@telus.net> Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Dirk Brandewie 提交于
The PID assumes that samples are of equal time, which for a deferable timers this is not true when the system goes idle. This causes the PID to take a long time to converge to the min P state and depending on the pattern of the idle load can make the P state appear stuck. The hold-off value of three sample times before using the scaling is to give a grace period for applications that have high performance requirements and spend a lot of time idle, The poster child for this behavior is the ffmpeg benchmark in the Phoronix test suite. Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Dirk Brandewie 提交于
Changing to fixed point math throughout the busy calculation in commit e66c1768 (Change busy calculation to use fixed point math.) Introduced some inaccuracies by rounding the busy value at two points in the calculation. This change removes roundings and moves the rounding to the output of the PID where the calculations are complete and the value returned as an integer. Fixes: e66c1768 (intel_pstate: Change busy calculation to use fixed point math.) Reported-by: NDoug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Dirk Brandewie 提交于
Commit fcb6a15c (intel_pstate: Take core C0 time into account for core busy calculation) introduced a regression referenced below. The issue with "lockup" after suspend that this commit was addressing is now dealt with in the suspend path. Fixes: fcb6a15c (intel_pstate: Take core C0 time into account for core busy calculation) Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581 Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121Reported-by: NDoug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: NDirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 31 5月, 2014 1 次提交
-
-
由 Tomasz Figa 提交于
Currently Exynos cpufreq drivers rely on globally mapped clock controller registers to configure frequency of CPU cores. This is obviously wrong and will be removed in near future, but to enable support for multi-platform builds without introducing a regression it needs to be worked around. This patch hacks the code to look for clock controller node in device tree and map its registers using of_iomap(), instead of relying on global mapping, so dependencies on platform headers are removed and the driver can compile again with multiplatform support. Signed-off-by: NTomasz Figa <t.figa@samsung.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NKukjin Kim <kgene.kim@samsung.com>
-
- 29 5月, 2014 1 次提交
-
-
由 Viresh Kumar 提交于
Handling calls to ->target_index() has got complex over time and might become more complex. So, its better to take target_index() bits out in another routine __target_index() for better code readability. Shouldn't have any functional impact. Tested-by: NStephen Warren <swarren@nvidia.com> Reviewed-by: NDoug Anderson <dianders@chromium.org> Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 27 5月, 2014 1 次提交
-
-
由 Paul Bolle 提交于
A pr_err() was added in v3.1. It was guarded by a check for CONFIG_PM_VERBOSE. The Kconfig symbol PM_VERBOSE was removed in v3.0. So this pr_err() has never been used. Drop that check and clean up the message a bit. Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Reviewed-by: NSachin Kamat <sachin.kamat@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-