1. 21 7月, 2014 11 次提交
  2. 17 7月, 2014 1 次提交
    • V
      cpufreq: move policy kobj to policy->cpu at resume · 92c14bd9
      Viresh Kumar 提交于
      This is only relevant to implementations with multiple clusters, where clusters
      have separate clock lines but all CPUs within a cluster share it.
      
      Consider a dual cluster platform with 2 cores per cluster. During suspend we
      start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
      would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
      kobj as we want to retain permissions/values/etc.
      
      Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
      We will recover the old policy and update policy->cpu from 3 to 2 from
      update_policy_cpu().
      
      But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
      link for CPU2, but would try that for CPU3 while bringing it online. Which will
      report errors as CPU3 already has kobj assigned to it.
      
      This bug got introduced with commit 42f921a6, which overlooked this scenario.
      
      To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
      cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
      only for the first CPU of a non-boot cluster. And we can't recover from this
      situation, if kobject_move() fails.
      
      Fixes: 42f921a6 (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
      Cc:  3.13+ <stable@vger.kernel.org> # 3.13+
      Reported-and-tested-by: NBu Yitian <ybu@qti.qualcomm.com>
      Reported-by: NSaravana Kannan <skannan@codeaurora.org>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa@mit.edu>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      92c14bd9
  3. 16 7月, 2014 4 次提交
  4. 09 7月, 2014 1 次提交
  5. 07 7月, 2014 3 次提交
  6. 19 6月, 2014 1 次提交
  7. 18 6月, 2014 1 次提交
    • D
      intel_pstate: Correct rounding in busy calculation · 51d211e9
      Doug Smythies 提交于
      There was a mistake in the actual rounding portion this previous patch:
      f0fe3cd7 (intel_pstate: Correct rounding in busy calculation) such that
      the rounding was asymetric and incorrect.
      
      Severity: Not very serious, but can increase target pstate by one extra value.
      For real world work flows the issue should self correct (but I have no proof).
      It is the equivalent of different PID gains for positive and negative numbers.
      
      Examples:
       -3.000000 used to round to -4, rounds to -3 with this patch.
       -3.503906 used to round to -5, rounds to -4 with this patch.
      
      Fixes: f0fe3cd7 (intel_pstate: Correct rounding in busy calculation)
      Signed-off-by: NDoug Smythies <dsmythies@telus.net>
      Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      51d211e9
  8. 17 6月, 2014 1 次提交
  9. 11 6月, 2014 3 次提交
  10. 10 6月, 2014 1 次提交
  11. 09 6月, 2014 1 次提交
    • V
      cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info' · c8ae481b
      Viresh Kumar 提交于
      'copy_prev_load' was recently added by commit: 18b46abd (cpufreq: governor: Be
      friendly towards latency-sensitive bursty workloads).
      
      It actually is a bit redundant as we also have 'prev_load' which can store any
      integer value and can be used instead of 'copy_prev_load' by setting it zero.
      
      True load can also turn out to be zero during long idle intervals (and hence the
      actual value of 'prev_load' and the overloaded value can clash). However this is
      not a problem because, if the true load was really zero in the previous
      interval, it makes sense to evaluate the load afresh for the current interval
      rather than copying the previous load.
      
      So, drop 'copy_prev_load' and use 'prev_load' instead.
      
      Update comments as well to make it more clear.
      
      There is another change here which was probably missed by Srivatsa during the
      last version of updates he made. The unlikely in the 'if' statement was covering
      only half of the condition and the whole line should actually come under it.
      
      Also checkpatch is made more silent as it was reporting this (--strict option):
      
      CHECK: Alignment should match open parenthesis
      +		if (unlikely(wall_time > (2 * sampling_rate) &&
      +						j_cdbs->prev_load)) {
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c8ae481b
  12. 08 6月, 2014 1 次提交
    • S
      cpufreq: governor: Be friendly towards latency-sensitive bursty workloads · 18b46abd
      Srivatsa S. Bhat 提交于
      Cpufreq governors like the ondemand governor calculate the load on the CPU
      periodically by employing deferrable timers. A deferrable timer won't fire
      if the CPU is completely idle (and there are no other timers to be run), in
      order to avoid unnecessary wakeups and thus save CPU power.
      
      However, the load calculation logic is agnostic to all this, and this can
      lead to the problem described below.
      
      Time (ms)               CPU 1
      
      100                Task-A running
      
      110                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.
      
      110.5              Task-A running
      
      120		   Governor's timer fires, finds load as 100% in the last
      		   10ms interval and increases the CPU frequency.
      
      125		   Task-A went to sleep. With nothing else to do, CPU 1
      		   went completely idle.
      
      200		   Task-A woke up and started running again.
      
      200.5		   Governor's deferred timer (which was originally programmed
      		   to fire at time 130) fires now. It calculates load for the
      		   time period 120 to 200.5, and finds the load is almost zero.
      		   Hence it decreases the CPU frequency to the minimum.
      
      210		   Governor's timer fires, finds load as 100% in the last
      		   10ms interval and increases the CPU frequency.
      
      So, after the workload woke up and started running, the frequency was suddenly
      dropped to absolute minimum, and after that, there was an unnecessary delay of
      10ms (sampling period) to increase the CPU frequency back to a reasonable value.
      And this pattern repeats for every wake-up-from-cpu-idle for that workload.
      This can be quite undesirable for latency- or response-time sensitive bursty
      workloads. So we need to fix the governor's logic to detect such wake-up-from-
      cpu-idle scenarios and start the workload at a reasonably high CPU frequency.
      
      One extreme solution would be to fake a load of 100% in such scenarios. But
      that might lead to undesirable side-effects such as frequency spikes (which
      might also need voltage changes) especially if the previous frequency happened
      to be very low.
      
      We just want to avoid the stupidity of dropping down the frequency to a minimum
      and then enduring a needless (and long) delay before ramping it up back again.
      So, let us simply carry forward the previous load - that is, let us just pretend
      that the 'load' for the current time-window is the same as the load for the
      previous window. That way, the frequency and voltage will continue to be set
      to whatever values they were set at previously. This means that bursty workloads
      will get a chance to influence the CPU frequency at which they wake up from
      cpu-idle, based on their past execution history. Thus, they might be able to
      avoid suffering from slow wakeups and long response-times.
      
      However, we should take care not to over-do this. For example, such a "copy
      previous load" logic will benefit cases like this: (where # represents busy
      and . represents idle)
      
      ##########.........#########.........###########...........##########........
      
      but it will be detrimental in cases like the one shown below, because it will
      retain the high frequency (copied from the previous interval) even in a mostly
      idle system:
      
      ##########.........#.................#.....................#...............
      
      (i.e., the workload finished and the remaining tasks are such that their busy
      periods are smaller than the sampling interval, which causes the timer to
      always get deferred. So, this will make the copy-previous-load logic copy
      the initial high load to subsequent idle periods over and over again, thus
      keeping the frequency high unnecessarily).
      
      So, we modify this copy-previous-load logic such that it is used only once
      upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the
      previous load won't get blindly copied over; cpufreq will freshly evaluate the
      load in the second idle interval, thus ensuring that the system comes back to
      its normal state.
      
      [ The right way to solve this whole problem is to teach the CPU frequency
      governors to also track load on a per-task basis, not just a per-CPU basis,
      and then use both the data sources intelligently to set the appropriate
      frequency on the CPUs. But that involves redesigning the cpufreq subsystem,
      so this patch should make the situation bearable until then. ]
      
      Experimental results:
      +-------------------+
      
      I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in
      between its execution such that its total utilization can be a user-defined
      value, say 10% or 20% (higher the utilization specified, lesser the amount of
      sleeps injected). This ebizzy was run with a single-thread, tied to CPU 8.
      
      Behavior observed with tracing (sample taken from 40% utilization runs):
      ------------------------------------------------------------------------
      
      Without patch:
      ~~~~~~~~~~~~~~
      kworker/8:2-12137  416.335742: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.345744: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
            <...>-40753  416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40753  416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.505739: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.515742: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      
      Observation: Ebizzy went idle at 416.402202, and started running again at
      416.502130. But cpufreq noticed the long idle period, and dropped the frequency
      at 416.505739, only to increase it back again at 416.515742, realizing that the
      workload is in-fact CPU bound. Thus ebizzy needlessly ran at the lowest frequency
      for almost 13 milliseconds (almost 1 full sample period), and this pattern
      repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite
      a lot.
      
      With patch:
      ~~~~~~~~~~~
      
      kworker/8:2-29802  464.832535: cpu_frequency: state=2061000 cpu_id=8
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  464.972536: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-29802  464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40738  465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      
      Observation: Ebizzy went idle at 465.035797, and started running again at
      465.240178. Since ebizzy was the only real workload running on this CPU,
      cpufreq retained the frequency at 4.1Ghz throughout the run of ebizzy, no
      matter how many times ebizzy slept and woke-up in-between. Thus, ebizzy
      got the 10ms worth of 4.1 Ghz benefit during every sleep-wakeup (as compared
      to the run without the patch) and this boost gave a modest improvement in total
      throughput, as shown below.
      
      Sleeping-ebizzy records-per-second:
      -----------------------------------
      
      Utilization  Without patch  With patch  Difference (Absolute and % values)
          10%         274767        277046        +  2279 (+0.829%)
          20%         543429        553484        + 10055 (+1.850%)
          40%        1090744       1107959        + 17215 (+1.578%)
          60%        1634908       1662018        + 27110 (+1.658%)
      
      A rudimentary and somewhat approximately latency-sensitive workload such as
      sleeping-ebizzy itself showed a consistent, noticeable performance improvement
      with this patch. Hence, workloads that are truly latency-sensitive will benefit
      quite a bit from this change. Moreover, this is an overall win-win since this
      patch does not hurt power-savings at all (because, this patch does not reduce
      the idle time or idle residency; and the high frequency of the CPU when it goes
      to cpu-idle does not affect/hurt the power-savings of deep idle states).
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      18b46abd
  13. 07 6月, 2014 2 次提交
  14. 06 6月, 2014 2 次提交
    • V
      cpufreq: Tegra: implement intermediate frequency callbacks · 00917ddc
      Viresh Kumar 提交于
      Tegra has been switching to intermediate frequency (pll_p_clk) forever.
      CPUFreq core has better support for handling notifications for these
      frequencies and so we can adapt Tegra's driver to it.
      
      Also do a WARN() if clk_set_parent() fails while moving back to pll_x
      as we should have atleast restored to earlier frequency on error.
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NDoug Anderson <dianders@chromium.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      00917ddc
    • V
      cpufreq: add support for intermediate (stable) frequencies · 1c03a2d0
      Viresh Kumar 提交于
      Douglas Anderson, recently pointed out an interesting problem due to which
      udelay() was expiring earlier than it should.
      
      While transitioning between frequencies few platforms may temporarily switch to
      a stable frequency, waiting for the main PLL to stabilize.
      
      For example: When we transition between very low frequencies on exynos, like
      between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
      No CPUFREQ notification is sent for that. That means there's a period of time
      when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
      and 300MHz. And so udelay behaves badly.
      
      To get this fixed in a generic way, introduce another set of callbacks
      get_intermediate() and target_intermediate(), only for drivers with
      target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
      
      get_intermediate() should return a stable intermediate frequency platform wants
      to switch to, and target_intermediate() should set CPU to that frequency,
      before jumping to the frequency corresponding to 'index'. Core will take care of
      sending notifications and driver doesn't have to handle them in
      target_intermediate() or target_index().
      
      NOTE: ->target_index() should restore to policy->restore_freq in case of
      failures as core would send notifications for that.
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NDoug Anderson <dianders@chromium.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1c03a2d0
  15. 02 6月, 2014 4 次提交
  16. 31 5月, 2014 1 次提交
  17. 29 5月, 2014 1 次提交
  18. 27 5月, 2014 1 次提交