1. 10 Apr 2018, 1 commit
  2. 08 Feb 2018, 1 commit
  3. 12 Jan 2018, 2 commits
  4. 29 Aug 2017, 1 commit
  5. 18 Aug 2017, 1 commit
  6. 11 Aug 2017, 1 commit
  7. 10 Aug 2017, 2 commits
  8. 04 Aug 2017, 1 commit
    • cpufreq: intel_pstate: Improve IO performance with per-core P-states · 7bde2d50
      Authored by Srinivas Pandruvada
      In the current implementation, the response latency between seeing
      SCHED_CPUFREQ_IOWAIT set and the actual P-state adjustment can be up
      to 10ms.  It can be reduced by bumping up the P-state to the max at
      the time SCHED_CPUFREQ_IOWAIT is passed to intel_pstate_update_util().
      With this change, the IO performance improves significantly.
      
      For a simple "grep -r . linux" (Here linux is the kernel source
      folder) with caches dropped every time on a Broadwell Xeon workstation
      with per-core P-states, the user and system time is shorter by as much
      as 30% - 40%.
      
      The same performance difference was not observed on clients that don't
      support per-core P-state.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      7bde2d50
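      The boost described above can be sketched in a few lines (a minimal sketch with an illustrative flag value and a toy `struct cpudata`, not the driver's actual code):

```c
#include <assert.h>

/* Illustrative flag value; the real SCHED_CPUFREQ_IOWAIT is defined in
 * the scheduler headers. */
#define SCHED_CPUFREQ_IOWAIT 0x1

struct cpudata {
	int current_pstate;
	int max_pstate;
};

/* Sketch of the boost: on an IO-wait wakeup, jump straight to the
 * maximum P-state instead of waiting (up to ~10 ms) for the next
 * sample period to raise it. */
static void update_util_sketch(struct cpudata *cpu, unsigned int flags)
{
	if (flags & SCHED_CPUFREQ_IOWAIT)
		cpu->current_pstate = cpu->max_pstate;
	/* ...normal sampling-based P-state selection continues here... */
}
```

      With per-core P-states, raising only the waiting core's P-state keeps the power cost of the boost localized.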
  9. 01 Aug 2017, 2 commits
    • sched: cpufreq: Allow remote cpufreq callbacks · 674e7541
      Authored by Viresh Kumar
      With Android UI and benchmarks the latency of cpufreq response to
      certain scheduling events can become very critical. Currently, callbacks
      into cpufreq governors are only made from the scheduler if the target
      CPU of the event is the same as the current CPU. This means there are
      certain situations where a target CPU may not run the cpufreq governor
      for some time.
      
      One test case that shows this behavior is where a task starts
      running on CPU0, and then a new task is spawned on CPU0 by a task
      on CPU1.  If the system is configured such that new tasks should
      receive maximum demand initially, this should result in CPU0
      increasing its frequency immediately.  Because of the limitation
      mentioned above, however, this does not occur.
      
      This patch updates the scheduler core to call the cpufreq callbacks for
      remote CPUs as well.
      
      The schedutil, ondemand and conservative governors are updated to
      process cpufreq utilization update hooks called for remote CPUs where
      the remote CPU is managed by the cpufreq policy of the local CPU.
      
      The intel_pstate driver is updated to always reject remote callbacks.
      
      This was tested with a couple of use cases (Android: hackbench,
      recentfling, galleryfling, vellamo; Ubuntu: hackbench) on an ARM
      hikey board (64-bit octa-core, single policy).  Only galleryfling
      showed minor improvements, while the others didn't show much
      deviation.
      
      The reason is that this patch only targets a corner case, in which
      all of the following must hold for performance to improve, and
      that doesn't happen often with these tests:
      
      - Task is migrated to another CPU.
      - The task has high demand, and should take the target CPU to higher
        OPPs.
      - And the target CPU doesn't call into the cpufreq governor until the
        next tick.
      
      Based on initial work from Steve Muckle.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Saravana Kannan <skannan@codeaurora.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      674e7541
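      A rough sketch of the acceptance check the governors gained (the names and the flat CPU array are illustrative; the real code tests policy->cpus with cpumask helpers):

```c
#include <assert.h>

/* Accept a utilization update for target_cpu only when it is the local
 * CPU or is covered by the local CPU's cpufreq policy.  A driver like
 * intel_pstate, with one CPU per policy, effectively rejects every
 * remote update. */
static int should_process_update(int target_cpu, int local_cpu,
				 const int *policy_cpus, int n)
{
	int i;

	if (target_cpu == local_cpu)
		return 1;
	for (i = 0; i < n; i++)		/* is the remote CPU in the local policy? */
		if (policy_cpus[i] == target_cpu)
			return 1;
	return 0;
}
```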
    • cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL · f5c13f44
      Authored by Rafael J. Wysocki
      After commit 62611cb9 (intel_pstate: delete scheduler hook in HWP
      mode) the INTEL_PSTATE_HWP_SAMPLING_INTERVAL is not used anywhere in
      the code, so drop it.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      f5c13f44
  10. 28 Jul 2017, 1 commit
    • cpufreq: intel_pstate: Drop ->get from intel_pstate structure · 22baebd4
      Authored by Rafael J. Wysocki
      The ->get callback in the intel_pstate structure was mostly there
      for the scaling_cur_freq sysfs attribute to work, but after commit
      f8475cef (x86: use common aperfmperf_khz_on_cpu() to calculate
      KHz using APERF/MPERF) that attribute uses arch_freq_get_on_cpu()
      provided by the x86 arch code on all processors supported by
      intel_pstate, so it doesn't need the ->get callback from the
      driver any more.
      
      Moreover, the very presence of the ->get callback in the intel_pstate
      structure causes the cpuinfo_cur_freq attribute to be present when
      intel_pstate operates in the active mode, which is bogus, because
      the role of that attribute is to return the current CPU frequency
      as seen by the hardware.  For intel_pstate, though, this is just an
      average frequency and not really current, but computed for the
      previous sampling interval (the actual current frequency may be
      way different at the point this value is obtained by reading from
      cpuinfo_cur_freq), and after commit 82b4e03e (intel_pstate: skip
      scheduler hook when in "performance" mode) the value in
      cpuinfo_cur_freq may be stale or just 0, depending on the driver's
      operation mode.  In fact, however, on the hardware supported by
      intel_pstate there is no way to read the current CPU frequency
      from it, so the cpuinfo_cur_freq attribute should not be present
      at all when this driver is in use.
      
      For this reason, drop intel_pstate_get() and clear the ->get
      callback pointer pointing to it, so that the cpuinfo_cur_freq is
      not present for intel_pstate in the active mode any more.
      
      Fixes: 82b4e03e (intel_pstate: skip scheduler hook when in "performance" mode)
      Reported-by: Huaisheng Ye <yehs1@lenovo.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      22baebd4
  11. 27 Jul 2017, 2 commits
    • cpufreq: intel_pstate: Drop ->update_util from pstate_funcs · c4f3f70c
      Authored by Rafael J. Wysocki
      All systems use the same P-state selection "powersave" algorithm
      in the active mode if HWP is not used, so there's no need to provide
      a pointer for it in struct pstate_funcs any more.
      
      Drop ->update_util from struct pstate_funcs and make
      intel_pstate_set_update_util_hook() use intel_pstate_update_util()
      directly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c4f3f70c
    • cpufreq: intel_pstate: Do not use PID-based P-state selection · 9d0ef7af
      Authored by Rafael J. Wysocki
      All systems with a defined ACPI preferred profile that are not
      "servers" have been using the load-based P-state selection algorithm
      in intel_pstate since 4.12-rc1 (mobile systems and laptops have been
      using it since 4.10-rc1) and no problems with it have been reported
      to date.  In particular, no regressions with respect to the PID-based
      P-state selection have been reported.  Also testing indicates that
      the P-state selection algorithm based on CPU load is generally on par
      with the PID-based algorithm performance-wise, and for some workloads
      it turns out to be better than the other one, while being more
      straightforward and easier to understand at the same time.
      
      Moreover, the PID-based P-state selection algorithm in intel_pstate
      is known to be unstable in some situations and generally
      problematic; the issues with it are hard to address and it has
      become a significant maintenance burden.
      
      For these reasons, make intel_pstate use the "powersave" P-state
      selection algorithm based on CPU load in the active mode on all
      systems and drop the PID-based P-state selection code along with
      all things related to it from the driver.  Also update the
      documentation accordingly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      9d0ef7af
  12. 26 Jul 2017, 1 commit
  13. 14 Jul 2017, 1 commit
    • cpufreq: intel_pstate: Correct the busy calculation for KNL · 6e34e1f2
      Authored by Srinivas Pandruvada
      The busy percent calculated for the Knights Landing (KNL) platform
      is 1024 times smaller than the correct busy value.  This causes
      performance to get stuck at the lowest ratio.
      
      The scaling algorithm used for KNL is performance-based, but it still
      looks at the CPU load to set the scaled busy factor to 0 when the
      load is less than 1 percent.  In this case, since the computed load
      is 1024x smaller than it should be, the scaled busy factor will
      always be 0, irrespective of how busy the CPU actually is.
      
      This needs a fix similar to the turbostat one in commit b2b34dfe
      (tools/power turbostat: KNL workaround for %Busy and Avg_MHz).
      
      For this reason, add one more processor-specific callback to
      specify an MPERF multiplier, represented as a number of bit
      positions by which to shift the value of that register to the
      left to compensate for its rate difference with respect to the
      TSC.  This shift value is used during CPU busy calculations.
      
      Fixes: ffb81056 (intel_pstate: Avoid getting stuck in high P-states when idle)
      Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      [ rjw: Changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      6e34e1f2
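      The compensation amounts to a left shift of the MPERF delta before the busy computation; a sketch (the function name is made up, and the shift of 10, i.e. the 1024x factor, follows from the commit text):

```c
#include <assert.h>

/* Scale the MPERF delta by the processor-specific multiplier: 2^10 on
 * KNL, to compensate for MPERF running 1024x slower relative to the
 * TSC, and 2^0 (no change) elsewhere. */
static unsigned long long scaled_mperf_delta(unsigned long long delta,
					     unsigned int shift)
{
	return delta << shift;
}
```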
  14. 12 Jul 2017, 1 commit
    • cpufreq: intel_pstate: Fix ratio setting for min_perf_pct · d4436c0d
      Authored by Srinivas Pandruvada
      When the minimum performance limit percentage is set to the power-up
      default, it is possible that the minimum performance ratio is off by one.
      
      In the set_policy() callback the minimum ratio is calculated by
      applying global.min_perf_pct to turbo_ratio and rounding up, but the
      power-up default global.min_perf_pct is already rounded up to the
      next percent in min_perf_pct_min().  That results in two round up
      operations, so for the default min_perf_pct one of them is not
      required.
      
      It is better to remove rounding up in min_perf_pct_min() as this
      matches the displayed min_perf_pct prior to commit c5a2ee7d
      (cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12.
      
      For example on a platform with max turbo ratio of 37 and minimum
      ratio of 10, the min_perf_pct resulted in 28 with the above commit.
      Before this commit it was 27 and it will be the same after this
      change.
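      The off-by-one can be reproduced with a couple of lines (a sketch; the helper names are made up, but the arithmetic matches the example above):

```c
#include <assert.h>

/* Minimum ratio expressed as a percentage of the turbo ratio, rounded
 * up (what min_perf_pct_min() did) vs. truncated (the behavior
 * restored by this change). */
static int min_pct_round_up(int min_ratio, int turbo_ratio)
{
	return (min_ratio * 100 + turbo_ratio - 1) / turbo_ratio;
}

static int min_pct_truncate(int min_ratio, int turbo_ratio)
{
	return min_ratio * 100 / turbo_ratio;
}
```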
      
      Fixes: 1a4fe38a (cpufreq: intel_pstate: Remove max/min fractions to limit performance)
      Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      d4436c0d
  15. 05 Jul 2017, 1 commit
  16. 30 Jun 2017, 1 commit
  17. 27 Jun 2017, 2 commits
  18. 24 Jun 2017, 1 commit
    • cpufreq: intel_pstate: Remove max/min fractions to limit performance · 1a4fe38a
      Authored by Srinivas Pandruvada
      In the current model the max/min perf limits are stored as a
      fraction of the current user space limits relative to the allowed
      max_freq, or to 100% for the global limits.  This results in wrong
      ratio limit calculations because of rounding issues for some user
      space limits.

      Initially we tried to solve this issue by using more shift bits to
      increase precision, but there are still isolated cases with errors.

      This can be avoided by using ratios altogether.  Since
      cpuinfo.max_freq is obtained by multiplying the scaling factor by
      the max ratio, we can easily keep the max/min limits in terms of
      ratios rather than fractions.
      
      For example:
      if the max ratio = 36
      cpuinfo.max_freq = 36 * 100000 = 3600000
      
      Suppose user space sets a limit of 1200000, then we can calculate
      max ratio limit as
      = 36 * 1200000 / 3600000
      = 12
      This will be correct for any user limits.
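      That computation is a one-liner; a sketch using the numbers from the example (the function name is illustrative):

```c
#include <assert.h>

/* Translate a user space frequency limit back into a P-state ratio:
 * ratio_limit = max_ratio * user_freq / cpuinfo_max_freq. */
static int freq_to_ratio(int max_ratio, long user_khz, long max_khz)
{
	/* widen before multiplying to avoid overflow */
	return (int)((long long)max_ratio * user_khz / max_khz);
}
```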
      
      The other advantage is that we don't need to do any calculation in
      the fast path, as the ratio limit is already calculated via the
      set_policy() callback.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      1a4fe38a
  19. 05 Jun 2017, 1 commit
  20. 12 May 2017, 1 commit
    • intel_pstate: use updated msr-index.h HWP.EPP values · 3cedbc5a
      Authored by Len Brown
      intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
      These attributes use strings to describe 4 operating states, and
      inside the driver, these strings are mapped to numerical register
      values.
      
      The authoritative mapping between the strings and the numerical
      HWP.EPP values is now globally defined in msr-index.h, replacing
      the outdated mapping that was open-coded into intel_pstate.c:
      
      new old string
      --- --- ------
        0   0 performance
      128  64 balance_performance
      192 128 balance_power
      255 192 power
      
      Note that the HW and BIOS default value on most systems is 128,
      which intel_pstate will now call "balance_performance", while it
      used to call it "balance_power".
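      The new mapping, restated as a table initializer (values are taken from the table above; the actual macro names in msr-index.h are not reproduced here):

```c
#include <assert.h>

struct epp_entry {
	const char *name;
	int epp;	/* HWP.EPP register value */
};

/* String-to-HWP.EPP mapping after the update. */
static const struct epp_entry epp_map[] = {
	{ "performance",		0 },
	{ "balance_performance",	128 },
	{ "balance_power",		192 },
	{ "power",			255 },
};
```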
      Signed-off-by: Len Brown <len.brown@intel.com>
      3cedbc5a
  21. 18 Apr 2017, 1 commit
  22. 30 Mar 2017, 1 commit
  23. 29 Mar 2017, 13 commits
    • cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max() · b02aabe8
      Authored by Rafael J. Wysocki
      Some computations in intel_pstate_get_min_max() are not necessary
      and one of its two callers doesn't even use the full result.
      
      First off, the fixed-point value of cpu->max_perf represents a
      non-negative number between 0 and 1 inclusive and cpu->min_perf
      cannot be greater than cpu->max_perf.  It is not necessary to check
      those conditions every time the numbers in question are used.
      
      Moreover, since intel_pstate_max_within_limits() only needs the
      upper boundary, it doesn't make sense to compute the lower one in
      there and returning min and max from intel_pstate_get_min_max()
      via pointers doesn't look particularly nice.
      
      For the above reasons, drop intel_pstate_get_min_max(), add a
      helper to get the base P-state for min/max computations, and carry
      those computations out directly in the previous callers of
      intel_pstate_get_min_max().
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      b02aabe8
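      The fixed-point arithmetic involved is simple; a sketch (the fraction width is illustrative, and the driver uses its own fixed-point helpers):

```c
#include <assert.h>

#define FRAC_BITS 8	/* illustrative Q8 fraction */

/* Upper P-state bound: scale the base P-state by max_perf, a Q8
 * fixed-point fraction in [0, 1]. */
static int max_pstate_within_limits(int base_pstate, int max_perf_fp)
{
	return (base_pstate * max_perf_fp) >> FRAC_BITS;
}
```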
    • cpufreq: intel_pstate: Do not walk policy->cpus · 2bfc4cbb
      Authored by Rafael J. Wysocki
      intel_pstate_hwp_set() is the only function walking policy->cpus
      in intel_pstate.  The rest of the code simply assumes one CPU per
      policy, including the initialization code.
      
      Therefore it doesn't make sense for intel_pstate_hwp_set() to
      walk policy->cpus as it is guaranteed to have only one bit set
      for policy->cpu.
      
      For this reason, rearrange intel_pstate_hwp_set() to take the CPU
      number as the argument and drop the loop over policy->cpus from it.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      2bfc4cbb
    • cpufreq: intel_pstate: Introduce pid_in_use() · 8ca6ce37
      Authored by Rafael J. Wysocki
      Add a new function pid_in_use() to return the information on whether
      or not the PID-based P-state selection algorithm is in use.
      
      That allows a couple of complicated conditions in the code to be
      reduced to simple checks against the new function's return value.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      8ca6ce37
    • cpufreq: intel_pstate: Drop struct cpu_defaults · 2f49afc2
      Authored by Rafael J. Wysocki
      The cpu_defaults structure is redundant, because it only contains
      one member of type struct pstate_funcs which can be used directly
      instead of struct cpu_defaults.
      
      For this reason, drop struct cpu_defaults, use struct pstate_funcs
      directly instead of it where applicable and rename all of the
      variables of that type accordingly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      2f49afc2
    • cpufreq: intel_pstate: Move cpu_defaults definitions · de4a76cb
      Authored by Rafael J. Wysocki
      Move the definitions of the cpu_defaults structures after the
      definitions of utilization update callback routines to avoid
      extra declarations of the latter.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      de4a76cb
    • cpufreq: intel_pstate: Add update_util callback to pstate_funcs · 67dd9bf4
      Authored by Rafael J. Wysocki
      Avoid using extra function pointers during P-state selection by
      dropping the get_target_pstate member from struct pstate_funcs,
      adding a new update_util callback to it (to be registered with
      the CPU scheduler as the utilization update callback in the active
      mode) and reworking the utilization update callback routines to
      invoke specific P-state selection functions directly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      67dd9bf4
    • cpufreq: intel_pstate: Use different utilization update callbacks · eabd22c6
      Authored by Rafael J. Wysocki
      Notice that some overhead in the utilization update callbacks
      registered by intel_pstate in the active mode can be avoided if
      those callbacks are tailored to specific configurations of the
      driver.  For example, the utilization update callback for the HWP
      enabled case only needs to update the average CPU performance
      periodically whereas the utilization update callback for the
      PID-based algorithm does not need to take IO-wait boosting into
      account and so on.
      
      With that in mind, define three utilization update callbacks for
      three different use cases: HWP enabled, the CPU load "powersave"
      P-state selection algorithm and the PID-based "powersave" P-state
      selection algorithm and modify the driver initialization to
      choose the callback matching its current configuration.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      eabd22c6
    • cpufreq: intel_pstate: Modify check in intel_pstate_update_status() · 0042b2c0
      Authored by Rafael J. Wysocki
      One of the checks in intel_pstate_update_status() implicitly relies
      on the information that there are only two struct cpufreq_driver
      objects available, but it is better to do it directly against the
      value it really is about (to make the code easier to follow if
      nothing else).
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      0042b2c0
    • cpufreq: intel_pstate: Drop driver_registered variable · ee8df89a
      Authored by Rafael J. Wysocki
      The driver_registered variable in intel_pstate is used for checking
      whether or not the driver has been registered, but intel_pstate_driver
      can be used for that too (with the rule that the driver is not
      registered as long as it is NULL).
      
      That is a bit more straightforward and the code may be simplified
      a bit this way, so modify the driver accordingly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ee8df89a
    • cpufreq: intel_pstate: Skip unnecessary PID resets on init · 694cb173
      Authored by Rafael J. Wysocki
      PID controller parameters only need to be initialized if the
      get_target_pstate_use_performance() P-state selection routine
      is going to be used.  It is not necessary to initialize them
      otherwise, so don't do that.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      694cb173
    • cpufreq: intel_pstate: Set HWP sampling interval once · 7aec5b50
      Authored by Rafael J. Wysocki
      In the HWP enabled case pid_params.sample_rate_ns only needs to be
      updated once, because it is global, so do that when setting hwp_active
      instead of doing it during the initialization of every CPU.
      
      Moreover, pid_params.sample_rate_ms is never used if HWP is enabled,
      so do not update it at all then.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      7aec5b50
    • cpufreq: intel_pstate: Clean up intel_pstate_busy_pid_reset() · ff35f02e
      Authored by Rafael J. Wysocki
      intel_pstate_busy_pid_reset() is the only caller of pid_reset(),
      pid_p_gain_set(), pid_i_gain_set(), and pid_d_gain_set().  Moreover,
      it passes constants as two parameters of pid_reset() and all of
      the other routines above essentially contain the same code, so
      fold all of them into the caller and drop unnecessary computations.
      
      Introduce percent_fp() for converting integer values in percent
      to fixed-point fractions and use it in the above code cleanup.
      
      Finally, rename intel_pstate_busy_pid_reset() to
      intel_pstate_pid_reset() as it also is used for the
      initialization of PID parameters for every CPU and the
      meaning of the "busy" part of the name is not particularly
      clear.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ff35f02e
    • cpufreq: intel_pstate: Fold intel_pstate_reset_all_pid() into the caller · 4ddd0146
      Authored by Rafael J. Wysocki
      There is only one caller of intel_pstate_reset_all_pid(), which is
      pid_param_set() used in the debugfs interface only, and having that
      code split does not make it particularly convenient to follow.
      
      For this reason, move the body of intel_pstate_reset_all_pid() into
      its caller and drop that function.
      
      Also change the loop from for_each_online_cpu() (which is obviously
      racy with respect to CPU offline/online) to for_each_possible_cpu(),
      so that all PID parameters are reset for all CPUs regardless of their
      online/offline status (to prevent, for example, a previously offline
      CPU from going online with a stale set of PID parameters).
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      4ddd0146