1. 02 Sep 2020: 2 commits
  2. 11 Aug 2020: 1 commit
    • cpufreq: intel_pstate: Implement passive mode with HWP enabled · f6ebbcf0
      Committed by Rafael J. Wysocki
      Allow intel_pstate to work in the passive mode with HWP enabled and
      make it set the HWP minimum performance limit (HWP floor) to the
      P-state value given by the target frequency supplied by the cpufreq
      governor, so as to prevent the HWP algorithm and the CPU scheduler
      from working against each other, at least when the schedutil governor
      is in use, and update the intel_pstate documentation accordingly.
      
      Among other things, this allows utilization clamps to be taken
      into account, at least to a certain extent, when intel_pstate is
      in use and makes it more likely that sufficient capacity for
      deadline tasks will be provided.
      
      After this change, the resulting behavior of an HWP system with
      intel_pstate in the passive mode should be close to the behavior
      of the analogous non-HWP system with intel_pstate in the passive
      mode, except that the HWP algorithm is generally allowed to make the
      CPU run at a frequency above the floor P-state set by intel_pstate in
      the entire available range of P-states, while without HWP a CPU can
      run in a P-state above the requested one if the latter falls into the
      range of turbo P-states (referred to as the turbo range) or if the
      P-states of all CPUs in one package are coordinated with each other
      at the hardware level.
      
      [Note that in principle the HWP floor may not be taken into account
       by the processor if it falls into the turbo range, in which case the
       processor has a license to choose any P-state, either below or above
       the HWP floor, just like a non-HWP processor in the case when the
       target P-state falls into the turbo range.]
      
      With this change applied, intel_pstate in the passive mode assumes
      complete control over the HWP request MSR and concurrent changes of
      that MSR (e.g. via the direct MSR access interface) are overridden by
      it.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
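A minimal sketch of the floor computation described above. It is written as standalone C, not driver code, and rests on three assumptions:

- Each P-state step is 100000 kHz (100 MHz).
- The governor's target is converted by rounding up, so the floor is never below the requested frequency.
- The helper names target_to_pstate() and hwp_set_floor() are hypothetical.

The IA32_HWP_REQUEST layout used here (minimum performance in bits 7:0) follows the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_HWP_REQUEST (Intel SDM): bits 7:0 minimum perf, 15:8 maximum perf,
 * 23:16 desired perf, 31:24 energy performance preference. */
#define HWP_MIN_SHIFT  0
#define HWP_FIELD_MASK 0xffULL

/* Hypothetical helper: convert a governor target frequency (kHz) into a
 * P-state, rounding up, then clamp it to [min_pstate, max_pstate]. */
static unsigned int target_to_pstate(unsigned int freq_khz,
                                     unsigned int scaling_khz,
                                     unsigned int min_pstate,
                                     unsigned int max_pstate)
{
    unsigned int pstate = (freq_khz + scaling_khz - 1) / scaling_khz;

    if (pstate < min_pstate)
        pstate = min_pstate;
    if (pstate > max_pstate)
        pstate = max_pstate;
    return pstate;
}

/* Hypothetical helper: replace the HWP floor (minimum perf field) in a
 * cached HWP request value, leaving all other fields untouched. */
static uint64_t hwp_set_floor(uint64_t hwp_req, unsigned int pstate)
{
    hwp_req &= ~(HWP_FIELD_MASK << HWP_MIN_SHIFT);
    hwp_req |= ((uint64_t)pstate & HWP_FIELD_MASK) << HWP_MIN_SHIFT;
    return hwp_req;
}
```

For example, take a cached request of 0x80002808 (EPP 0x80, maximum 40, minimum 8). A 2.35 GHz target then yields a floor of 24 and a new request value of 0x80002818. The HWP algorithm remains free to run above that floor.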
  3. 04 Aug 2020: 1 commit
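    • cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0 · 4daca379
      Committed by Srinivas Pandruvada
      MSR_TURBO_RATIO_LIMIT can be 0. This is not an error. Users can update
      this MSR via BIOS settings on some systems or with msr tools. Also,
      some systems boot with this value set to 0.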
      
      This results in a wrong value being displayed in
      cpufreq/cpuinfo_max_freq: it is equal to cpufreq/base_frequency, even
      though turbo is enabled.
      
      The platform still functions normally in HWP mode, though, because the
      maximum 1-core frequency is obtained from MSR_HWP_CAPABILITIES. This
      MSR is already used to calculate cpu->pstate.turbo_freq, which is used
      to set policy->cpuinfo.max_freq. However, cpu->pstate.turbo_pstate is
      used in some other places, for example to set policy->max.
      
      To fix this, also update cpu->pstate.turbo_pstate when updating
      cpu->pstate.turbo_freq.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
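The fix can be sketched in standalone C. The sketch makes these assumptions:

- The struct below is a hypothetical, cut-down stand-in for the driver's per-CPU P-state data.
- Each P-state step is 100000 kHz.
- The highest performance level comes from bits 7:0 of IA32_HWP_CAPABILITIES, as described in the Intel SDM.

The point is only that both fields are derived from the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's per-CPU P-state data. */
struct pstate_info {
    unsigned int turbo_pstate;   /* highest P-state (ratio) */
    unsigned int turbo_freq;     /* highest frequency, kHz */
    unsigned int scaling;        /* kHz per P-state step (assumed 100000) */
};

/* IA32_HWP_CAPABILITIES bits 7:0 hold the highest performance level. */
static unsigned int hwp_highest_perf(uint64_t hwp_cap)
{
    return (unsigned int)(hwp_cap & 0xff);
}

/* Sketch of the fix: update turbo_pstate together with turbo_freq, so code
 * that uses either field (e.g. to set policy->max) sees the same limit,
 * even when MSR_TURBO_RATIO_LIMIT reads as 0. */
static void update_turbo_from_hwp(struct pstate_info *p, uint64_t hwp_cap)
{
    unsigned int highest = hwp_highest_perf(hwp_cap);

    p->turbo_pstate = highest;
    p->turbo_freq = highest * p->scaling;
}
```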
  4. 31 Jul 2020: 2 commits
  5. 16 Jul 2020: 2 commits
  6. 15 Jul 2020: 1 commit
  7. 13 Jul 2020: 2 commits
  8. 02 Jul 2020: 2 commits
    • cpufreq: intel_pstate: Allow raw energy performance preference value · f473bf39
      Committed by Srinivas Pandruvada
      Currently, using the attribute "energy_performance_preference", user
      space can write one of four predefined preference strings. These
      preference strings get mapped to a hard-coded Energy-Performance
      Preference (EPP) or Energy-Performance Bias (EPB) knob.
      
      These four values are supposed to cover a broad spectrum of use cases,
      but they are not uniformly distributed in the range. There are a number
      of cases where this is not enough. For example:
      
      Suppose a user wants more performance when connected to AC. Instead of
      the default "balance_performance", the "performance" setting can be
      used. This changes the EPP value from 0x80 to 0x00. But setting EPP to
      0 results in electrical and thermal issues on some platforms, which
      leads to aggressive throttling and a drop in performance. Some value
      between 0x80 and 0x00 gives better performance, but that value can't
      be fixed, as the power curve is not linear. In some cases, just
      changing EPP from 0x80 to 0x75 is enough to get a significant
      performance gain.
      
      Similarly, on battery the default "balance_performance" mode can be
      aggressive in power consumption. But picking the next choice,
      "balance_power", results in too much loss of performance, which leads
      to a bad user experience in use cases like "Google Hangout". It was
      observed that some value between these two EPP values is optimal.
      
      This change allows fine-grained EPP tuning for platforms like
      Chromebooks, or for users who want to fine-tune power and performance.
      Different EPP values can then be set based on the product and use case.
      This change is similar to the change done for:
      /sys/devices/system/cpu/cpu*/power/energy_perf_bias
      where the user can write either a predefined string or a raw value.
      
      The change itself is trivial. When the user preference doesn't match
      any predefined string preference, and the value is an unsigned integer
      within range, use that value for EPP. When the EPP feature is not
      present, writing a raw value is not supported.
      Suggested-by: Len Brown <lenb@kernel.org>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
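The parsing rule in the last paragraph can be sketched in user-space C. The sketch makes these assumptions:

- parse_epp() is a hypothetical name.
- The predefined strings map to 0x00, 0x80, 0xC0 and 0xFF, and the "default" string is omitted.
- The caller has already stripped the trailing newline that sysfs writes usually carry.
- Raw values are read as plain decimal.

The driver's exact parsing rules may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Predefined preference strings and the EPP values assumed for them in this
 * sketch ("default" is omitted for brevity). */
static const struct {
    const char *name;
    int epp;
} epp_prefs[] = {
    { "performance",         0x00 },
    { "balance_performance", 0x80 },
    { "balance_power",       0xC0 },
    { "power",               0xFF },
};

/* Hypothetical parser for a write to energy_performance_preference.
 * Returns the EPP value to program (0..255), or -1 if the write is rejected. */
static int parse_epp(const char *buf, bool epp_supported)
{
    /* 1. Predefined strings are always accepted. */
    for (size_t i = 0; i < sizeof(epp_prefs) / sizeof(epp_prefs[0]); i++) {
        if (strcmp(buf, epp_prefs[i].name) == 0)
            return epp_prefs[i].epp;
    }

    /* 2. Raw values are only supported when the EPP feature is present. */
    if (!epp_supported)
        return -1;

    /* 3. The whole string must be an unsigned decimal integer (no sign or
     *    leading whitespace) within the 8-bit EPP range. */
    if (buf[0] < '0' || buf[0] > '9')
        return -1;
    char *end;
    errno = 0;
    unsigned long val = strtoul(buf, &end, 10);
    if (errno != 0 || *end != '\0' || val > 255)
        return -1;
    return (int)val;
}
```

For instance, writing "117" (0x75, the intermediate value mentioned above) is accepted on an EPP-capable system and rejected otherwise.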
    • cpufreq: intel_pstate: Allow enable/disable energy efficiency · ed7bde7a
      Committed by Srinivas Pandruvada
      By default, the intel_pstate driver disables energy efficiency by
      setting MSR_IA32_POWER_CTL bit 19 for the Kaby Lake desktop CPU model
      in HWP mode. This CPU model is also shared by Coffee Lake desktop CPUs.
      This allows these systems to reach the maximum possible frequency, but
      it adds a power penalty, which some customers don't want. They want
      some way to enable/disable it dynamically.
      
      So, add an additional attribute "energy_efficiency" under
      /sys/devices/system/cpu/intel_pstate/ for these CPU models. This allows
      reading and writing bit 19 ("Disable Energy Efficiency Optimization")
      in MSR IA32_POWER_CTL.
      
      This attribute is present in both HWP and non-HWP modes, as the bit has
      an effect in both. Refer to the Intel Software Developer's Manual for
      details.
      
      The scope of this bit is package-wide, and these are single-package
      systems, so reading/writing the MSR on the current CPU is enough.
      
      The energy efficiency (EE) bit setting needs to be preserved across
      suspend/resume and CPU offline/online operations. To do this:
      - Restore the EE setting from the cpufreq resume() callback if it has
        been changed from the system default.
      - Stop disabling EE by default from the cpufreq init() callback for
        matching CPU models. Since the scope is package-wide and these are
        single-package systems, move the calls that disable EE from the
        init() callback to intel_pstate_init(), which is called only once.
      Suggested-by: Len Brown <lenb@kernel.org>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
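The bit handling behind the "energy_efficiency" attribute can be sketched as pure functions on the MSR value. The sketch makes these assumptions:

- Bit 19 is taken from the commit text.
- The attribute reads 1 when energy efficiency is enabled, i.e. when bit 19 is clear.
- The helper names are illustrative and not taken from the driver.

The actual MSR reads and writes are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 19 of MSR_IA32_POWER_CTL: "Disable Energy Efficiency Optimization". */
#define POWER_CTL_EE_DISABLE_BIT 19
#define POWER_CTL_EE_DISABLE     (1ULL << POWER_CTL_EE_DISABLE_BIT)

/* Assumed attribute semantics: EE is enabled when the disable bit is clear. */
static bool ee_enabled(uint64_t power_ctl)
{
    return (power_ctl & POWER_CTL_EE_DISABLE) == 0;
}

/* New MSR value for a write to the attribute: only bit 19 changes,
 * all other bits are preserved (read-modify-write). */
static uint64_t power_ctl_set_ee(uint64_t power_ctl, bool enable)
{
    if (enable)
        return power_ctl & ~POWER_CTL_EE_DISABLE;
    return power_ctl | POWER_CTL_EE_DISABLE;
}
```

Because the scope is package-wide on these single-package systems, one read-modify-write on the current CPU would be sufficient, as the commit notes.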
  9. 23 Jun 2020: 1 commit
  10. 27 Apr 2020: 1 commit
  11. 17 Apr 2020: 1 commit
    • cpufreq: intel_pstate: Use passive mode by default without HWP · 33aa46f2
      Committed by Rafael J. Wysocki
      After recent changes allowing scale-invariant utilization to be
      used on x86, the schedutil governor on top of intel_pstate in the
      passive mode should be on par with (or better than) the active mode
      "powersave" algorithm of intel_pstate on systems in which
      hardware-managed P-states (HWP) are not used, so it should not be
      necessary to use the internal scaling algorithm in those cases.
      
      Accordingly, modify intel_pstate to start in the passive mode by
      default if the processor at hand does not support HWP or if the driver
      is requested to avoid using HWP through the kernel command line.
      
      Among other things, that will allow utilization clamps and the
      support for RT/DL tasks in the schedutil governor to be utilized on
      systems in which intel_pstate is used.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
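The default-mode rule above amounts to a small decision function. The enum and the function are hypothetical. Giving precedence to an explicitly requested mode is an assumption of this sketch, not something this commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver modes for this sketch. */
enum pstate_mode { MODE_ACTIVE, MODE_PASSIVE };

/* Choose the initial mode: passive when HWP is unavailable or the user asked
 * the driver not to use it, active otherwise. explicit_mode (may be NULL)
 * models an explicit mode request, assumed here to take precedence. */
static enum pstate_mode default_mode(bool hwp_supported, bool no_hwp_requested,
                                     const enum pstate_mode *explicit_mode)
{
    if (explicit_mode != NULL)
        return *explicit_mode;
    if (!hwp_supported || no_hwp_requested)
        return MODE_PASSIVE;
    return MODE_ACTIVE;
}
```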
  12. 27 Mar 2020: 1 commit
    • cpufreq: intel_pstate: Simplify intel_pstate_cpu_init() · 5ac54113
      Committed by Rafael J. Wysocki
      The initial policy value set by intel_pstate_cpu_init() depends on
      whether or not CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is set, but
      that is not necessary, because the core will set the policy to
      "performance" in cpufreq_init_policy() if the default governor is
      "performance" anyway.
      
      Accordingly, change intel_pstate_cpu_init() to always set policy
      to CPUFREQ_POLICY_POWERSAVE initially to provide a valid fallback
      value to cpufreq_init_policy() in case the default cpufreq governor
      is neither "powersave" nor "performance".
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
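The split of responsibilities described above can be modelled with two small functions:

- The driver always proposes POWERSAVE.
- The core replaces that value when the default governor maps directly to a policy value, and otherwise keeps it as the fallback.

All names here are hypothetical simplifications of intel_pstate_cpu_init() and cpufreq_init_policy(), not their real code.

```c
#include <assert.h>

/* Hypothetical, simplified policy values and default-governor choices. */
enum policy_val { POLICY_POWERSAVE, POLICY_PERFORMANCE };
enum default_gov { GOV_POWERSAVE, GOV_PERFORMANCE, GOV_OTHER };

/* Driver init step: always start from POWERSAVE, independent of the kernel
 * configuration. */
static enum policy_val driver_init_policy(void)
{
    return POLICY_POWERSAVE;
}

/* Core step: use the policy value matching the default governor if there is
 * one, otherwise keep the driver's initial value as a valid fallback. */
static enum policy_val core_init_policy(enum policy_val initial,
                                        enum default_gov gov)
{
    if (gov == GOV_PERFORMANCE)
        return POLICY_PERFORMANCE;
    if (gov == GOV_POWERSAVE)
        return POLICY_POWERSAVE;
    return initial;
}
```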
  13. 25 Mar 2020: 2 commits
  14. 14 Mar 2020: 1 commit
  15. 29 Jan 2020: 1 commit
  16. 27 Jan 2020: 1 commit
    • cpufreq: Avoid creating excessively large stack frames · 1e4f63ae
      Committed by Rafael J. Wysocki
      In the process of modifying a cpufreq policy, the cpufreq core makes
      a copy of it, including all of the internals, which is stored on the
      CPU stack.  Because struct cpufreq_policy is relatively large, this
      may cause the size of the stack frame to exceed the 2 KB limit, so
      GCC complains when -Wframe-larger-than= is used.
      
      However, it is not actually necessary to copy the entire policy
      structure in order to modify it.
      
      First, because cpufreq_set_policy() obtains the min and max policy
      limits from frequency QoS now, it is not necessary to pass the limits
      to it from the callers.  The only things that need to be passed to it
      from there are the new governor pointer or (if there is a built-in
      governor in the driver) the "policy" value representing the governor
      choice.  They both can be passed as individual arguments, though, so
      make cpufreq_set_policy() take them this way and rework its callers
      accordingly.  This avoids making copies of cpufreq policies in the
      callers of cpufreq_set_policy().
      
      Second, cpufreq_set_policy() still needs to pass the new policy
      data to the ->verify() callback of the cpufreq driver whose task
      is to sanitize the min and max policy limits.  It still does not
      need to make a full copy of struct cpufreq_policy for this purpose,
      but it needs to pass a few items from it to the driver in case they
      are needed (different drivers have different needs in that respect
      and all of them have to be covered).  For this reason, introduce
      struct cpufreq_policy_data to hold copies of the members of
      struct cpufreq_policy used by the existing ->verify() driver
      callbacks and pass a pointer to a temporary structure of that
      type to ->verify() (instead of passing a pointer to full struct
      cpufreq_policy to it).
      
      While at it, notice that intel_pstate and longrun don't really need
      to verify the "policy" value in struct cpufreq_policy, so drop those
      checks from them to avoid copying "policy" into struct
      cpufreq_policy_data (which allows it to be slightly smaller).
      
      Also while at it fix up white space in a couple of places and make
      cpufreq_set_policy() static (as it can be so).
      
      Fixes: 3000ce3c ("cpufreq: Use per-policy frequency QoS")
      Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
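The second point can be illustrated with a cut-down model. A small struct carries copies of only the members a ->verify() callback needs, and the callback clamps the requested limits. The sketch makes these assumptions:

- The struct and field names are hypothetical, and the struct is much smaller than the real struct cpufreq_policy_data.
- The clamping order (max first, then min against the new max) is a common approach, not taken from this commit.

```c
#include <assert.h>

/* Hypothetical hardware limits, in kHz. */
struct cpuinfo_limits {
    unsigned int min_freq;
    unsigned int max_freq;
};

/* Hypothetical small struct holding copies of just the policy members a
 * ->verify() callback needs, instead of a full copy of the policy. */
struct policy_data_sketch {
    struct cpuinfo_limits cpuinfo;
    unsigned int cpu;
    unsigned int min;   /* requested lower limit, kHz */
    unsigned int max;   /* requested upper limit, kHz */
};

static unsigned int clamp_u(unsigned int v, unsigned int lo, unsigned int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* A ->verify()-style callback: sanitize min/max against hardware limits. */
static void verify_sketch(struct policy_data_sketch *data)
{
    data->max = clamp_u(data->max, data->cpuinfo.min_freq, data->cpuinfo.max_freq);
    data->min = clamp_u(data->min, data->cpuinfo.min_freq, data->max);
}
```

The caller would fill a temporary struct of this kind on its own stack, pass a pointer to it to ->verify(), and copy back only min and max. That keeps the stack frame small.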
  17. 13 Jan 2020: 1 commit
  18. 08 Nov 2019: 1 commit
  19. 06 Nov 2019: 1 commit
  20. 21 Oct 2019: 1 commit
  21. 28 Aug 2019: 4 commits
  22. 21 Aug 2019: 1 commit
    • cpufreq: intel_pstate: Implement QoS supported freq constraints · da5c504c
      Committed by Viresh Kumar
      The intel_pstate driver exposes min_perf_pct and max_perf_pct sysfs
      files, which can be used to force a limit on the min/max P-state of
      the driver. Though these files eventually control the min/max
      frequencies that the CPUs will run at, they don't change the
      policy->min/max values.
      
      When the values of these files are changed (in the passive mode of the
      driver), the ->limits() callback of the cpufreq governors, like
      schedutil, is called. On such a call, the governors are supposed to
      forcefully update the frequency to bring it within the limits. Since
      the limits, i.e. policy->min/max, aren't updated by the driver, the
      governors fail to get the target frequency within the limits and
      sometimes abort the update, believing that the frequency is already
      set to the target value.
      
      This patch implements the QoS supported frequency constraints to update
      policy->min/max values whenever the min_perf_pct or max_perf_pct files
      are updated. This is only done for the passive mode for now, as the
      driver already works fine in the active mode.
      
      Fixes: ecd28842 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
      Reported-by: Doug Smythies <dsmythies@telus.net>
      Tested-by: Doug Smythies <dsmythies@telus.net>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
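The conversion at the heart of this change can be sketched as a percentage-to-frequency helper. Its result would feed a frequency QoS request, so that policy->min/max track the sysfs knobs. The sketch makes these assumptions:

- perf_pct_to_freq() is hypothetical.
- The maximum (turbo) frequency is the 100% reference.
- The result is rounded up.

```c
#include <assert.h>

/* Hypothetical helper: translate a min_perf_pct/max_perf_pct value into a
 * frequency limit in kHz, relative to max_freq_khz, rounding up.
 * Percentages above 100 are clamped to 100. */
static unsigned int perf_pct_to_freq(unsigned int max_freq_khz, unsigned int pct)
{
    if (pct > 100)
        pct = 100;
    /* 64-bit intermediate avoids overflow of max_freq_khz * pct. */
    return (unsigned int)(((unsigned long long)max_freq_khz * pct + 99) / 100);
}
```

Once the resulting limits are reflected in policy->min/max, a governor's ->limits() callback sees bounds consistent with what the driver enforces. It then no longer skips an update on the mistaken belief that the target frequency is already set.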
  23. 09 Jul 2019: 1 commit
  24. 05 Jun 2019: 1 commit
  25. 08 Apr 2019: 2 commits
  26. 02 Apr 2019: 2 commits
  27. 26 Mar 2019: 1 commit
  28. 12 Mar 2019: 1 commit
    • cpufreq: intel_pstate: Fix up iowait_boost computation · 8e3b4039
      Committed by Rafael J. Wysocki
      After commit b8bd1581 ("cpufreq: intel_pstate: Rework iowait
      After commit b8bd1581 ("cpufreq: intel_pstate: Rework iowait
      boosting to be less aggressive") the handling of the case when
      the SCHED_CPUFREQ_IOWAIT flag is set again after a few iterations of
      intel_pstate_update_util() is a bit inconsistent, because the
      new value of cpu->iowait_boost may be lower than ONE_EIGHTH_FP
      if it was set before, but has not dropped down to zero just yet.
      
      Fix that up by ensuring that the new value of cpu->iowait_boost
      will always be at least ONE_EIGHTH_FP then.
      
      Fixes: b8bd1581 ("cpufreq: intel_pstate: Rework iowait boosting to be less aggressive")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
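One update step of the boost can be sketched as follows. It combines the doubling and halving scheme adopted by commit b8bd1581 (assumed here to mirror the schedutil approach) with the fix above. The sketch makes these assumptions:

- Values use an 8-bit fixed-point format, so 1.0 == 256 and ONE_EIGHTH_FP == 32.
- The function name is hypothetical.

The driver's real update path also handles conditions not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed-point format: 8 fractional bits. */
#define FRAC_BITS     8
#define FP_ONE        (1 << FRAC_BITS)   /* 1.0 == 256 */
#define ONE_EIGHTH_FP (FP_ONE >> 3)      /* 1/8 == 32 */

/* Hypothetical single update of the iowait boost value. */
static int32_t iowait_boost_step(int32_t boost, bool iowait)
{
    if (iowait) {
        /* Fresh or partially decayed boost: restart at 1/8. The fix ensures
         * the new value is never below ONE_EIGHTH_FP. */
        if (boost < ONE_EIGHTH_FP)
            return ONE_EIGHTH_FP;
        /* Consecutive iowait updates: double, capped at 1.0. */
        boost <<= 1;
        return boost > FP_ONE ? FP_ONE : boost;
    }
    /* No iowait: decay by halving until the boost reaches zero. */
    return boost >> 1;
}
```

Starting from zero, three consecutive iowait updates give 32, 64 and 128 (1/8, 1/4 and 1/2), instead of jumping straight to the maximum P-state.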
  29. 18 Feb 2019: 1 commit
    • cpufreq: intel_pstate: Rework iowait boosting to be less aggressive · b8bd1581
      Committed by Rafael J. Wysocki
      The current iowait boosting mechanism in intel_pstate_update_util()
      The current iowait boosting mechanism in intel_pstate_update_util()
      is quite aggressive, as it goes to the maximum P-state right away,
      and may cause excessive amounts of energy to be used, which is not
      desirable and arguably isn't necessary either.
      
      Follow commit a5a0809b ("cpufreq: schedutil: Make iowait boost
      more energy efficient") that reworked the analogous iowait boost
      mechanism in the schedutil governor and make the iowait boosting
      in intel_pstate_update_util() work along the same lines.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>