  1. 09 Nov, 2017 (1 commit)
    • cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq · 07458f6a
      Viresh Kumar authored
      'cached_raw_freq' is used to get the next frequency quickly but should
      always be in sync with sg_policy->next_freq. There is a case where it is
      not and in such cases it should be reset to avoid switching to incorrect
      frequencies.
      
      Consider this case for example:
      
       - policy->cur is 1.2 GHz (Max)
       - New request comes for 780 MHz and we store that in cached_raw_freq.
       - Based on 780 MHz, we calculate the effective frequency as 800 MHz.
       - We then see the CPU wasn't idle recently and choose to keep the next
         freq as 1.2 GHz.
       - Now cached_raw_freq is 780 MHz while sg_policy->next_freq is
         1.2 GHz.
       - Now if the utilization doesn't change in the next request, the next
         target frequency will still be 780 MHz and it will match
         cached_raw_freq. But we will choose 1.2 GHz instead of 800 MHz here.
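
      The shape of the fix, as a minimal sketch rather than the exact kernel
      code (the "busy" test is the idle-calls check added by b7eaf1aa): when a
      busy CPU keeps its old frequency, cached_raw_freq is reset so that a
      later identical raw request cannot short-circuit to the stale next_freq.

          /* Sketch of sugov_update_single(), heavily simplified */
          next_f = get_next_freq(sg_policy, util, max);
          if (busy && next_f < sg_policy->next_freq) {
                  next_f = sg_policy->next_freq;
                  /* cached_raw_freq is no longer in sync with next_freq */
                  sg_policy->cached_raw_freq = 0;
          }
          sugov_update_commit(sg_policy, time, next_f);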
      
      Fixes: b7eaf1aa (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely)
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.12+ <stable@vger.kernel.org> # 4.12+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 05 Nov, 2017 (1 commit)
  3. 18 Aug, 2017 (2 commits)
    • cpufreq: schedutil: Always process remote callback with slow switching · c49cbc19
      Viresh Kumar authored
      The frequency update from the utilization update handlers can be divided
      into two parts:
      
      (A) Finding the next frequency
      (B) Updating the frequency
      
      While any CPU can do (A), (B) can be restricted to a group of CPUs only,
      depending on the current platform.
      
      For platforms where fast cpufreq switching is possible, both (A) and (B)
      are always done from the same CPU and that CPU should be capable of
      changing the frequency of the target CPU.
      
      But for platforms where fast cpufreq switching isn't possible, after
      doing (A) we wake up a kthread which will eventually do (B). This
      kthread is already bound to the right set of CPUs, i.e. only those which
      can change the frequency of CPUs of a cpufreq policy. And so any CPU
      can actually do (A) in this case, as the frequency is updated from the
      right set of CPUs only.
      
      Check cpufreq_can_do_remote_dvfs() only for the fast switching case.
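
      In code form the idea is roughly the following (a sketch; the exact
      placement inside the utilization update handlers may differ):

          /* Only the fast-switch path needs the remote-DVFS capability check;
           * the slow path hands the work to a kthread that is already bound
           * to the right CPUs. */
          if (sg_policy->policy->fast_switch_enabled &&
              !cpufreq_can_do_remote_dvfs(sg_policy->policy))
                  return;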
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily · e2cabe48
      Viresh Kumar authored
      Utilization update callbacks are now processed remotely, even on the
      CPUs that don't share cpufreq policy with the target CPU (if
      dvfs_possible_from_any_cpu flag is set).
      
      But in non-fast switch paths, the frequency is changed only from one of
      policy->related_cpus. This happens because the kthread which does the
      actual update is bound to a subset of CPUs (i.e. related_cpus).
      
      Allow frequency to be remotely updated as well (i.e. call
      __cpufreq_driver_target()) if dvfs_possible_from_any_cpu flag is set.
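
      The change boils down to making the kthread binding conditional, roughly
      as in this sketch of the kthread creation path:

          /* Bind the sugov kthread to policy->related_cpus only when the
           * driver cannot perform DVFS from any CPU. */
          if (!policy->dvfs_possible_from_any_cpu)
                  kthread_bind_mask(thread, policy->related_cpus);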
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  4. 10 Aug, 2017 (1 commit)
    • cpufreq: Return 0 from ->fast_switch() on errors · 209887e6
      Viresh Kumar authored
      CPUFREQ_ENTRY_INVALID is a special symbol which is used to specify that
      an entry in the cpufreq table is invalid. But using it outside of the
      scope of the cpufreq table looks a bit incorrect.
      
      We can represent an invalid frequency by returning 0 instead when
      needed. Note that this is already done for the return value of the
      ->get() callback.
      
      Let's do the same for ->fast_switch() and stop using
      CPUFREQ_ENTRY_INVALID outside of the scope of the cpufreq table.
      
      Also update the comment over cpufreq_driver_fast_switch() to clearly
      mention what this returns.
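
      With that convention, a caller such as schedutil can treat 0 as "no
      switch happened", roughly as in this sketch:

          freq = cpufreq_driver_fast_switch(policy, target_freq);
          if (!freq)
                  return;         /* error: keep the previous state */
          policy->cur = freq;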
      
      None of the drivers currently return CPUFREQ_ENTRY_INVALID from the
      ->fast_switch() callback, so none of them need to be updated.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  5. 01 Aug, 2017 (1 commit)
    • sched: cpufreq: Allow remote cpufreq callbacks · 674e7541
      Viresh Kumar authored
      With Android UI and benchmarks the latency of cpufreq response to
      certain scheduling events can become very critical. Currently, callbacks
      into cpufreq governors are only made from the scheduler if the target
      CPU of the event is the same as the current CPU. This means there are
      certain situations where a target CPU may not run the cpufreq governor
      for some time.
      
      One testcase to show this behavior is where a task starts running on
      CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
      system is configured such that the new tasks should receive maximum
      demand initially, this should result in CPU0 increasing frequency
      immediately. Because of the above-mentioned limitation, though, this
      does not occur.
      
      This patch updates the scheduler core to call the cpufreq callbacks for
      remote CPUs as well.
      
      The schedutil, ondemand and conservative governors are updated to
      process cpufreq utilization update hooks called for remote CPUs where
      the remote CPU is managed by the cpufreq policy of the local CPU.
      
      The intel_pstate driver is updated to always reject remote callbacks.
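
      The governor-side acceptance test reduces to a small helper, roughly as
      in this sketch (the dvfs_possible_from_any_cpu part comes from a
      companion patch in the same series):

          /* A remote callback may be processed if the driver allows DVFS from
           * any CPU, or if the local CPU shares the policy with the target. */
          static inline bool cpufreq_can_do_remote_dvfs(struct cpufreq_policy *policy)
          {
                  return policy->dvfs_possible_from_any_cpu ||
                         cpumask_test_cpu(smp_processor_id(), policy->cpus);
          }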
      
      This was tested with a couple of use cases (Android: hackbench,
      recentfling, galleryfling, vellamo; Ubuntu: hackbench) on an ARM Hikey
      board (64-bit octa-core, single policy). Only galleryfling showed minor
      improvements, while the others didn't show much deviation.
      
      The reason is that this patch only targets a corner case; all of the
      following must hold for performance to improve, and that doesn't
      happen often with these tests:
      
      - Task is migrated to another CPU.
      - The task has high demand, and should take the target CPU to higher
        OPPs.
      - And the target CPU doesn't call into the cpufreq governor until the
        next tick.
      
      Based on initial work from Steve Muckle.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Saravana Kannan <skannan@codeaurora.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 27 Jul, 2017 (2 commits)
  7. 26 Jul, 2017 (1 commit)
  8. 22 Jul, 2017 (1 commit)
  9. 12 Jul, 2017 (1 commit)
    • cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race · ab2f7cf1
      Vikram Mulukutla authored
      With a shared policy in place, when one of the CPUs in the policy is
      hotplugged out and then brought back online, sugov_stop() and
      sugov_start() are called in order.
      
      sugov_stop() removes utilization hooks for each CPU in the policy and
      does nothing else in the for_each_cpu() loop. sugov_start() on the
      other hand iterates through the CPUs in the policy and re-initializes
      the per-cpu structure _and_ adds the utilization hook.  This implies
      that the scheduler is allowed to invoke a CPU's utilization update
      hook while the rest of the per-cpu structures have yet to be
      re-initialized.
      
      Apart from some strange values in tracepoints this doesn't cause a
      problem, but if we do end up accessing a pointer from the per-cpu
      sugov_cpu structure somewhere in the sugov_update_shared() path,
      we will likely see crashes since the memset for another CPU in the
      policy is free to race with sugov_update_shared from the CPU that is
      ready to go.  So let's fix this now to first init all per-cpu
      structures, and then add the per-cpu utilization update hooks all at
      once.
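
      The resulting sugov_start() structure is roughly the following sketch,
      with the per-CPU field initialization abridged:

          /* Pass 1: initialize every per-CPU structure in the policy. */
          for_each_cpu(cpu, policy->cpus) {
                  struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);

                  memset(sg_cpu, 0, sizeof(*sg_cpu));
                  sg_cpu->sg_policy = sg_policy;
                  /* ... other per-CPU fields ... */
          }

          /* Pass 2: only now expose the CPUs to the scheduler callbacks. */
          for_each_cpu(cpu, policy->cpus) {
                  struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);

                  cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
                                               policy_is_shared(policy) ?
                                               sugov_update_shared :
                                               sugov_update_single);
          }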
      Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  10. 12 Jun, 2017 (1 commit)
  11. 06 May, 2017 (1 commit)
    • cpufreq: schedutil: use now as reference when aggregating shared policy requests · d86ab9cf
      Juri Lelli authored
      Currently, sugov_next_freq_shared() uses last_freq_update_time as a
      reference to decide when to start considering CPU contributions as
      stale.
      
      However, since last_freq_update_time is set by the last CPU that issued
      a frequency transition, this might cause problems in certain cases. In
      practice, the detection of stale utilization values fails whenever the
      CPU with such values was the last to update the policy. For example (and
      please note again that the SCHED_CPUFREQ_RT flag is not the problem
      here; the problem is only detecting after how much time that flag has to
      be considered stale), suppose a policy with 2 CPUs:
      
                     CPU0                |               CPU1
                                         |
                                         |     RT task scheduled
                                         |     SCHED_CPUFREQ_RT is set
                                         |     CPU1->last_update = now
                                         |     freq transition to max
                                         |     last_freq_update_time = now
                                         |
      
                              more than TICK_NSEC nsecs
      
                                         |
           a small CFS wakes up          |
           CPU0->last_update = now1      |
           delta_ns(CPU0) < TICK_NSEC*   |
           CPU0's util is considered     |
           delta_ns(CPU1) =              |
            last_freq_update_time -      |
            CPU1->last_update = 0        |
            < TICK_NSEC                  |
           CPU1 is still considered      |
           CPU1->SCHED_CPUFREQ_RT is set |
           we stay at max (until CPU1    |
           exits from idle)              |
      
      * delta_ns is actually negative as now1 > last_freq_update_time
      
      While last_freq_update_time is a sensible reference for rate limiting,
      it doesn't seem to be useful for working around stale CPU states.
      
      Fix the problem by always considering now (time) as the reference for
      deciding when CPUs have stale contributions.
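
      In code, the staleness test becomes relative to the current time,
      roughly as in this sketch of the per-CPU loop in
      sugov_next_freq_shared():

          /* Use 'time' (now) rather than last_freq_update_time as the
           * reference, so the last CPU to update the policy is not exempt. */
          s64 delta_ns = time - j_sg_cpu->last_update;

          if (delta_ns > TICK_NSEC) {
                  /* stale contribution: ignore this CPU's util and flags */
                  continue;
          }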
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  12. 18 Apr, 2017 (1 commit)
  13. 13 Apr, 2017 (1 commit)
  14. 24 Mar, 2017 (1 commit)
    • cpufreq: schedutil: Trace frequency only if it has changed · 38d4ea22
      Rafael J. Wysocki authored
      sugov_update_commit() calls trace_cpu_frequency() to record the
      current CPU frequency if it has not changed in the fast switch case
      to prevent utilities from getting confused (they may report that the
      CPU is idle if the frequency has not been recorded for too long, for
      example).
      
      However, that may cause the tracepoint to be triggered quite often
      for no real reason (if the frequency doesn't change, we will not
      modify the last update time stamp and governor computations may
      run again shortly when that happens), so don't do that (arguably, it
      is done to work around a utilities bug anyway).
      
      That allows code duplication in sugov_update_commit() to be reduced
      somewhat too.
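
      After the change, the commit path bails out before touching any state
      when the frequency is unchanged, roughly:

          /* Sketch of sugov_update_commit(): nothing to do, nothing to trace */
          if (sg_policy->next_freq == next_freq)
                  return;

          sg_policy->next_freq = next_freq;
          sg_policy->last_freq_update_time = time;
          /* ... fast switch or queue the slow-path work, then trace ... */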
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  15. 23 Mar, 2017 (1 commit)
    • cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely · b7eaf1aa
      Rafael J. Wysocki authored
      The way the schedutil governor uses the PELT metric causes it to
      underestimate the CPU utilization in some cases.
      
      That can be easily demonstrated by running kernel compilation on
      a Sandy Bridge Intel processor, running turbostat in parallel with
      it and looking at the values written to the MSR_IA32_PERF_CTL
      register.  Namely, the expected result would be that when all CPUs
      were 100% busy, all of them would be requested to run in the maximum
      P-state, but observation shows that this clearly isn't the case.
      The CPUs run in the maximum P-state for a while and then are
      requested to run slower and go back to the maximum P-state after
      a while again.  That causes the actual frequency of the processor to
      visibly oscillate below the sustainable maximum in a jittery fashion
      which clearly is not desirable.
      
      That has been attributed to CPU utilization metric updates on task
      migration that cause the total utilization value for the CPU to be
      reduced by the utilization of the migrated task.  If that happens,
      the schedutil governor may see a CPU utilization reduction and will
      attempt to reduce the CPU frequency accordingly right away.  That
      may be premature, though, for example if the system is generally
      busy and there are other runnable tasks waiting to be run on that
      CPU already.
      
      This is unlikely to be an issue on systems where cpufreq policies are
      shared between multiple CPUs, because in those cases the policy
      utilization is computed as the maximum of the CPU utilization values
      over the whole policy and if that turns out to be low, reducing the
      frequency for the policy most likely is a good idea anyway.  On
      systems with one CPU per policy, however, it may affect performance
      adversely and even lead to increased energy consumption in some cases.
      
      On those systems it may be addressed by taking another utilization
      metric into consideration, like whether or not the CPU whose
      frequency is about to be reduced has been idle recently, because if
      that's not the case, the CPU is likely to be busy in the near future
      and its frequency should not be reduced.
      
      To that end, use the counter of idle calls in the timekeeping code.
      Namely, make the schedutil governor look at that counter for the
      current CPU every time before its frequency is about to be reduced.
      If the counter has not changed since the previous iteration of the
      governor computations for that CPU, the CPU has been busy for all
      that time and its frequency should not be decreased, so if the new
      frequency would be lower than the one set previously, the governor
      will skip the frequency update.
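
      A minimal sketch of the check (the helper and field names follow this
      patch but are simplified here):

          static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
          {
                  unsigned long idle_calls = tick_nohz_get_idle_calls();
                  bool not_idle = idle_calls == sg_cpu->saved_idle_calls;

                  sg_cpu->saved_idle_calls = idle_calls;
                  return not_idle;        /* no idle entry since the last run */
          }

          /* In sugov_update_single(): never lower a busy CPU's frequency. */
          if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq)
                  next_f = sg_policy->next_freq;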
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Joel Fernandes <joelaf@google.com>
  16. 21 Mar, 2017 (1 commit)
  17. 13 Mar, 2017 (2 commits)
    • cpufreq: schedutil: Refactor sugov_next_freq_shared() · cba1dfb5
      Viresh Kumar authored
      The loop in sugov_next_freq_shared() contains an if block to skip the
      loop for the current CPU. This turns out to be an unnecessary
      conditional in the scheduler's hot-path for every CPU in the policy.
      
      It would be better to drop the conditional and make the loop treat all
      the CPUs in the same way. That would eliminate the need of calling
      sugov_iowait_boost() at the top of the routine.
      
      To keep the code optimized to return early if the current CPU has RT/DL
      flags set, move the flags check to sugov_update_shared() instead in
      order to avoid the function call entirely.
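
      The resulting structure is roughly the following sketch (argument lists
      simplified, and assuming the SCHED_CPUFREQ_RT_DL flag mask of that era):
      the RT/DL shortcut lives in sugov_update_shared(), while the loop in
      sugov_next_freq_shared() treats every CPU in policy->cpus, including the
      current one, with the same body.

          /* In sugov_update_shared(): */
          if (flags & SCHED_CPUFREQ_RT_DL)
                  next_f = sg_policy->policy->cpuinfo.max_freq;
          else
                  next_f = sugov_next_freq_shared(sg_cpu);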
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: schedutil: Redefine the rate_limit_us tunable · 994a8f25
      Viresh Kumar authored
      The rate_limit_us tunable is intended to reduce the possible overhead
      from running the schedutil governor.  However, that overhead can be
      divided into two separate parts: the governor computations and the
      invocation of the scaling driver to set the CPU frequency.  The latter
      is where the real overhead comes from.  The former is much less
      expensive in terms of execution time and running it every time the
      governor callback is invoked by the scheduler, after rate_limit_us
      interval has passed since the last frequency update, would not be a
      problem.
      
      For this reason, redefine the rate_limit_us tunable so that it means the
      minimum time that has to pass between two consecutive invocations of the
      scaling driver by the schedutil governor (to set the CPU frequency).
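
      Under the new definition the gate applies only to invocations of the
      scaling driver, roughly as in this sketch of sugov_should_update_freq()
      (the work_in_progress and forced-update checks are omitted):

          static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
          {
                  s64 delta_ns = time - sg_policy->last_freq_update_time;

                  /* rate_limit_us, converted to ns when the tunable is written */
                  return delta_ns >= sg_policy->freq_update_delay_ns;
          }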
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  18. 06 Mar, 2017 (2 commits)
  19. 02 Mar, 2017 (1 commit)
  20. 25 Nov, 2016 (1 commit)
  21. 17 Nov, 2016 (4 commits)
  22. 14 Sep, 2016 (1 commit)
  23. 01 Sep, 2016 (1 commit)
  24. 17 Aug, 2016 (1 commit)
    • cpufreq / sched: Pass flags to cpufreq_update_util() · 58919e83
      Rafael J. Wysocki authored
      It is useful to know the reason why cpufreq_update_util() has just
      been called and that can be passed as flags to cpufreq_update_util()
      and to the ->func() callback in struct update_util_data.  However,
      doing that in addition to passing the util and max arguments they
      already take would be clumsy, so avoid it.
      
      Instead, use the observation that the schedutil governor is part
      of the scheduler proper, so it can access scheduler data directly.
      This allows the util and max arguments of cpufreq_update_util()
      and the ->func() callback in struct update_util_data to be replaced
      with a flags one, but schedutil has to be modified to follow.
      
      Thus make the schedutil governor obtain the CFS utilization
      information from the scheduler and use the "RT" and "DL" flags
      instead of the special utilization value of ULONG_MAX to track
      updates from the RT and DL sched classes.  Make it non-modular
      too to avoid having to export scheduler variables to modules at
      large.
      
      Next, update all of the other users of cpufreq_update_util()
      and the ->func() callback in struct update_util_data accordingly.
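
      In terms of the interface, the callback now receives a flags word; a
      sketch (the exact bit values are illustrative):

          struct update_util_data {
                  void (*func)(struct update_util_data *data, u64 time,
                               unsigned int flags);
          };

          #define SCHED_CPUFREQ_RT        (1U << 0)
          #define SCHED_CPUFREQ_DL        (1U << 1)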
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  25. 22 Jul, 2016 (1 commit)
    • cpufreq: schedutil: map raw required frequency to driver frequency · 5cbea469
      Steve Muckle authored
      The slow-path frequency transition is relatively expensive as it
      requires waking up a thread to do the work. Should support for remote
      CPU cpufreq updates be added, that will also be expensive since it
      requires an IPI. These activities should be avoided when they are not
      necessary.
      
      To that end, calculate the actual driver-supported frequency required by
      the new utilization value in schedutil by using the recently added
      cpufreq_driver_resolve_freq API. If it is the same as the previously
      requested driver frequency then there is no need to continue with the
      update assuming the cpu frequency limits have not changed. This will
      have additional benefits should the semantics of the rate limit be
      changed to apply solely to frequency transitions rather than to
      frequency calculations in schedutil.
      
      The last raw required frequency is cached. This allows the driver
      frequency lookup to be skipped in the event that the new raw required
      frequency matches the last one, assuming a frequency update has not been
      forced due to limits changing (indicated by a next_freq value of
      UINT_MAX, see sugov_should_update_freq).
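
      The tail of the frequency computation then looks roughly like this
      sketch (next_freq == UINT_MAX being the "limits changed, force an
      update" marker mentioned above):

          if (freq == sg_policy->cached_raw_freq && sg_policy->next_freq != UINT_MAX)
                  return sg_policy->next_freq;    /* same raw request: skip the lookup */

          sg_policy->cached_raw_freq = freq;
          return cpufreq_driver_resolve_freq(policy, freq);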
      Signed-off-by: Steve Muckle <smuckle@linaro.org>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  26. 03 Jun, 2016 (2 commits)
  27. 19 May, 2016 (1 commit)
  28. 09 Apr, 2016 (1 commit)
    • cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit() · 6c9d9c81
      Rafael J. Wysocki authored
      Due to differences in the cpufreq core's handling of runtime CPU
      offline and nonboot CPUs disabling during system suspend-to-RAM,
      fast frequency switching gets disabled after a suspend-to-RAM and
      resume cycle on all of the nonboot CPUs.
      
      To prevent that from happening, move the invocation of
      cpufreq_disable_fast_switch() from cpufreq_exit_governor() to
      sugov_exit(), as the schedutil governor is the only user of fast
      frequency switching today anyway.
      
      That simply prevents cpufreq_disable_fast_switch() from being called
      without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT
      event (which happens during system suspend now).
      
      Fixes: b7898fda (cpufreq: Support for fast frequency switching)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  29. 02 Apr, 2016 (1 commit)
    • cpufreq: schedutil: New governor based on scheduler utilization data · 9bdcb44e
      Rafael J. Wysocki authored
      Add a new cpufreq scaling governor, called "schedutil", that uses
      scheduler-provided CPU utilization information as input for making
      its decisions.
      
      Doing that is possible after commit 34e2c555 (cpufreq: Add
      mechanism for registering utilization update callbacks) that
      introduced cpufreq_update_util() called by the scheduler on
      utilization changes (from CFS) and RT/DL task status updates.
      In particular, CPU frequency scaling decisions may be based on
      the utilization data passed to cpufreq_update_util() by CFS.
      
      The new governor is relatively simple.
      
      The frequency selection formula used by it depends on whether or not
      the utilization is frequency-invariant.  In the frequency-invariant
      case the new CPU frequency is given by
      
      	next_freq = 1.25 * max_freq * util / max
      
      where util and max are the last two arguments of cpufreq_update_util().
      In turn, if util is not frequency-invariant, the maximum frequency in
      the above formula is replaced with the current frequency of the CPU:
      
      	next_freq = 1.25 * curr_freq * util / max
      
      The coefficient 1.25 corresponds to the frequency tipping point at
      (util / max) = 0.8.
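
      Expressed as code, the selection is roughly the following sketch of
      get_next_freq(), where integer arithmetic (freq + freq/4) stands in for
      the 1.25 factor:

          static unsigned int get_next_freq(struct cpufreq_policy *policy,
                                            unsigned long util, unsigned long max)
          {
                  unsigned int freq = arch_scale_freq_invariant() ?
                                      policy->cpuinfo.max_freq : policy->cur;

                  return (freq + (freq >> 2)) * util / max;
          }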
      
      All of the computations are carried out in the utilization update
      handlers provided by the new governor.  One of those handlers is
      used for cpufreq policies shared between multiple CPUs and the other
      one is for policies with one CPU only (and therefore it doesn't need
      to use any extra synchronization means).
      
      The governor supports fast frequency switching if that is supported
      by the cpufreq driver in use and possible for the given policy.
      In the fast switching case, all operations of the governor take
      place in its utilization update handlers.  If fast switching cannot
      be used, the frequency switch operations are carried out with the
      help of a work item which only calls __cpufreq_driver_target()
      (under a mutex) to trigger a frequency update (to a value already
      computed beforehand in one of the utilization update handlers).
      
      Currently, the governor treats all of the RT and DL tasks as
      "unknown utilization" and sets the frequency to the allowed
      maximum when updated from the RT or DL sched classes.  That
      heavy-handed approach should be replaced with something more
      subtle and specifically targeted at RT and DL tasks.
      
      The governor shares some tunables management code with the
      "ondemand" and "conservative" governors and uses some common
      definitions from cpufreq_governor.h, but apart from that it
      is stand-alone.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>