1. 08 Jan, 2021 2 commits
    • cpufreq: intel_pstate: remove obsolete functions · c4151604
      Lukas Bulwahn committed
      percent_fp() was used in intel_pstate_pid_reset(), which was removed in
      commit 9d0ef7af ("cpufreq: intel_pstate: Do not use PID-based P-state
      selection") and hence, percent_fp() is unused since then.
      
      percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which
      was refactored in commit 1a4fe38a ("cpufreq: intel_pstate: Remove
      max/min fractions to limit performance"), and hence, percent_ext_fp() is
      unused since then.
      
      make CC=clang W=1 points out these unused functions:
      
      drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function]
      static inline int32_t percent_fp(int percent)
                            ^
      
      drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function]
      static inline int32_t percent_ext_fp(int percent)
                            ^
      
      Remove those obsolete functions.
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf() · 17ffd358
      Rafael J. Wysocki committed
      If turbo P-states cannot be used, either due to the configuration of
      the processor, or because intel_pstate is not allowed to use them,
      the maximum available P-state with HWP enabled corresponds to the
      HWP_CAP.GUARANTEED value which is not static.  It can be adjusted by
      an out-of-band agent or during an Intel Speed Select performance
      level change, so long as it remains less than or equal to
      HWP_CAP.MAX.
      
      However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf()
      always uses pstate.max_pstate (set during the initialization of the
      driver only) as the maximum available P-state, so it may miss a change
      of the HWP_CAP.GUARANTEED value.
      
      Prevent that from happening by modifying intel_cpufreq_adjust_perf()
      to always read the "guaranteed" and "maximum turbo" performance
      levels from the cached HWP_CAP value.
      
      Fixes: a365ab6b ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
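The choice between the "guaranteed" and "maximum turbo" levels described in this commit can be modelled in a few lines. This is a minimal user-space sketch, not the driver code: it assumes the HWP_CAP field layout documented for IA32_HWP_CAPABILITIES (highest performance in bits 7:0, guaranteed in bits 15:8), and the names `decode_hwp_cap` and `max_available_perf` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HwpCap:
    """Decoded fields of a raw HWP_CAP value (assumed layout, see lead-in)."""
    highest: int         # "maximum turbo" performance level
    guaranteed: int      # HWP_CAP.GUARANTEED, may change at run time
    most_efficient: int
    lowest: int


def decode_hwp_cap(raw: int) -> HwpCap:
    # Each field is one byte wide.
    return HwpCap(
        highest=raw & 0xFF,
        guaranteed=(raw >> 8) & 0xFF,
        most_efficient=(raw >> 16) & 0xFF,
        lowest=(raw >> 24) & 0xFF,
    )


def max_available_perf(raw_hwp_cap: int, turbo_usable: bool) -> int:
    """Pick the cap from the (cached) HWP_CAP value rather than from a
    value recorded once at driver initialization."""
    cap = decode_hwp_cap(raw_hwp_cap)
    return cap.highest if turbo_usable else cap.guaranteed
```

Because the cap is derived from the HWP_CAP value on every call, a later change of the guaranteed level is picked up as soon as the cached value is refreshed.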
  2. 31 Dec, 2020 1 commit
  3. 21 Dec, 2020 1 commit
    • cpufreq: intel_pstate: Use most recent guaranteed performance values · e40ad84c
      Rafael J. Wysocki committed
      When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is
      changed later, user space may want to take advantage of this increased
      guaranteed performance.
      
      HWP_CAP.GUARANTEED is not a static value.  It can be adjusted by an
      out-of-band agent or during an Intel Speed Select performance level
      change.  The HWP_CAP.MAX is still the maximum achievable performance
      with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still
      change as long as it remains less than or equal to HWP_CAP.MAX.
      
      When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency
      attribute shows the most recent guaranteed frequency value. This
      attribute can be used by user space software to update the scaling
      min/max limits of the CPU.
      
      Currently, the ->setpolicy() callback already uses the latest
      HWP_CAP values when setting HWP_REQ, but the ->verify() callback will
      restrict the user settings to the old guaranteed performance value,
      which prevents user space from making use of the extra CPU capacity
      theoretically available to it after increasing HWP_CAP.GUARANTEED.
      
      To address this, read HWP_CAP in intel_pstate_verify_cpu_policy()
      to obtain the maximum P-state that can be used and use that to
      confine the policy max limit instead of using the cached and
      possibly stale pstate.max_freq value for this purpose.
      
      For consistency, update intel_pstate_update_perf_limits() to use the
      maximum available P-state returned by intel_pstate_get_hwp_max() to
      compute the maximum frequency instead of using the return value of
      intel_pstate_get_max_freq() which, again, may be stale.
      
      This issue is a side-effect of fixing the scaling frequency limits in
      commit eacc9c5a ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max()
      for turbo disabled") which corrected the setting of the reduced scaling
      frequency values, but caused stale HWP_CAP.GUARANTEED to be used in
      the case at hand.
      
      Fixes: eacc9c5a ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled")
      Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
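The commit message notes that user space can use the base_frequency attribute to update the scaling limits. A minimal sketch of that idea follows; it assumes both attributes live in the same per-policy cpufreq directory and hold values in kHz, and the helper name `cap_max_to_base_frequency` is hypothetical.

```python
from pathlib import Path


def cap_max_to_base_frequency(policy_dir: Path) -> int:
    """Copy the current base_frequency into scaling_max_freq.

    policy_dir is a cpufreq policy directory, for example
    /sys/devices/system/cpu/cpu0/cpufreq (assumed location).
    Returns the frequency that was written, in kHz.
    """
    base_khz = int((policy_dir / "base_frequency").read_text().strip())
    (policy_dir / "scaling_max_freq").write_text(f"{base_khz}\n")
    return base_khz
```

On a real system the write needs root privileges, and the kernel may still clamp the value through the driver's ->verify() callback, which is the code path this commit corrects.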
  4. 16 Dec, 2020 1 commit
  5. 12 Dec, 2020 1 commit
  6. 11 Nov, 2020 1 commit
    • cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account · fcb3a1ab
      Rafael J. Wysocki committed
      Make intel_pstate take the new CPUFREQ_GOV_STRICT_TARGET governor
      flag into account when it operates in the passive mode with HWP
      enabled, so as to fix the "powersave" governor behavior in that
      case (currently, HWP is allowed to scale the performance all the
      way up to the policy max limit when the "powersave" governor is
      used, but it should be constrained to the policy min limit then).
      
      Fixes: f6ebbcf0 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 9a2a9ebc cpufreq: Introduce governor flags
      Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 218f6687 cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
      Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: ea9364bb cpufreq: Add strict_target to struct cpufreq_policy
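The effect of the strict-target flag on the HWP limits can be illustrated with a simplified model. This is a sketch under the assumption that the HWP floor follows the governor's target and that only the HWP ceiling depends on the flag; `hwp_limits` is a hypothetical name, not a driver function.

```python
from typing import Tuple


def hwp_limits(target: int, policy_min: int, policy_max: int,
               strict_target: bool) -> Tuple[int, int]:
    """Return (hwp_min, hwp_max) as P-state values.

    With strict_target set (as for the "powersave" governor), HWP is
    pinned to the target; otherwise it may scale up to the policy max.
    """
    floor = max(policy_min, min(target, policy_max))
    ceiling = floor if strict_target else policy_max
    return floor, ceiling
```

For "powersave" the target equals the policy min limit, so `hwp_limits(8, 8, 40, True)` confines HWP to P-state 8, while the same call without the flag leaves the whole range up to 40 available.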
  7. 28 Oct, 2020 1 commit
    • cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode · e0be38ed
      Rafael J. Wysocki committed
      If the cpufreq policy max limit is changed when intel_pstate operates
      in the passive mode with HWP enabled and the "powersave" governor is
      used on top of it, the HWP max limit is not updated as appropriate.
      
      Namely, in the "powersave" governor case, the target P-state
      is always equal to the policy min limit, so if the latter does
      not change, intel_cpufreq_adjust_hwp() is not invoked to update
      the HWP Request MSR due to the "target_pstate != old_pstate" check
      in intel_cpufreq_update_pstate(), so the HWP max limit is not
      updated as a result.
      
      Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the
      driver and the target frequency does not change along with the
      policy max limit, the "target_freq == policy->cur" check in
      __cpufreq_driver_target() prevents the driver's ->target() callback
      from being invoked at all, so the HWP max limit is not updated.
      
      To prevent that from occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag
      in the intel_cpufreq driver structure if HWP is enabled and modify
      intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate"
      check only in the non-HWP case and let intel_cpufreq_adjust_hwp()
      always run in the HWP case (it will update HWP Request only if the
      cached value of the register is different from the new one including
      the limits, so if neither the target P-state value nor the max limit
      changes, the register write will still be avoided).
      
      Fixes: f6ebbcf0 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
      Reported-by: Zhang Rui <rui.zhang@intel.com>
      Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352 cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ...
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Tested-by: Zhang Rui <rui.zhang@intel.com>
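The parenthetical remark about the cached register value can be made concrete with a small model. It is a sketch only: it assumes the HWP Request layout with the minimum in bits 7:0 and the maximum in bits 15:8, ignores the other fields, and uses a counter in place of the real MSR write. `HwpRequestCache` is a hypothetical name.

```python
class HwpRequestCache:
    """Model of "write HWP Request only if the new value differs"."""

    def __init__(self) -> None:
        self.cached = None  # last value written, None before the first write
        self.writes = 0     # stands in for actual MSR writes

    @staticmethod
    def encode(min_perf: int, max_perf: int) -> int:
        # Assumed layout: min in bits 7:0, max in bits 15:8.
        return (min_perf & 0xFF) | ((max_perf & 0xFF) << 8)

    def update(self, min_perf: int, max_perf: int) -> bool:
        """Return True if a register write was needed."""
        value = self.encode(min_perf, max_perf)
        if value == self.cached:
            return False  # neither target nor limit changed: skip the write
        self.cached = value
        self.writes += 1
        return True
```

Comparing the whole encoded value, limits included, is what lets a change of the max limit reach the hardware even when the target P-state stays at the policy min, as in the "powersave" case.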
  8. 16 Oct, 2020 1 commit
    • cpufreq: intel_pstate: Delete intel_pstate sysfs if failed to register the driver · cdc1719c
      Chen Yu committed
      There is a corner case in which, if the intel_pstate driver fails to
      be registered (possibly due to invalid MSR access) and acpi_cpufreq
      takes over, the intel_pstate sysfs interface is still populated,
      which may confuse user space (turbostat for example):
      
      grep . /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
      acpi-cpufreq
      
      grep . /sys/devices/system/cpu/intel_pstate/*
      /sys/devices/system/cpu/intel_pstate/max_perf_pct:0
      /sys/devices/system/cpu/intel_pstate/min_perf_pct:0
      grep: /sys/devices/system/cpu/intel_pstate/no_turbo: Resource temporarily unavailable
      grep: /sys/devices/system/cpu/intel_pstate/num_pstates: Resource temporarily unavailable
      /sys/devices/system/cpu/intel_pstate/status:off
      grep: /sys/devices/system/cpu/intel_pstate/turbo_pct: Resource temporarily unavailable
      
      The mere presence of the intel_pstate sysfs interface does not mean
      that intel_pstate is in use (for example, echo "off" to "status"),
      but it should not be created in the failing case.
      
      Fix this issue by deleting the intel_pstate sysfs interface if the
      driver registration fails.
      Reported-by: Wendy Wang <wendy.wang@intel.com>
      Suggested-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Refactor code to avoid jumps, change function name, changelog edits ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
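Since the presence of the intel_pstate sysfs directory does not by itself show that the driver is in use, a user-space tool can look at scaling_driver instead, as the grep output above does. The following is a minimal sketch; the helper name is hypothetical, and the driver names "intel_pstate" (active mode) and "intel_cpufreq" (passive mode) are assumptions not stated in the commit message.

```python
from pathlib import Path

# Names reported in scaling_driver when intel_pstate handles the CPU
# (assumed: active mode and passive mode respectively).
INTEL_PSTATE_DRIVERS = {"intel_pstate", "intel_cpufreq"}


def intel_pstate_in_use(cpu_root: Path = Path("/sys/devices/system/cpu")) -> bool:
    """Decide from cpu0's scaling_driver whether intel_pstate is active."""
    driver_file = cpu_root / "cpu0" / "cpufreq" / "scaling_driver"
    try:
        driver = driver_file.read_text().strip()
    except OSError:
        return False  # no cpufreq driver bound, or attribute unreadable
    return driver in INTEL_PSTATE_DRIVERS
```

The `cpu_root` parameter exists only so the check can be exercised against a fake directory tree.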
  9. 30 Sep, 2020 1 commit
  10. 02 Sep, 2020 6 commits
  11. 11 Aug, 2020 1 commit
    • cpufreq: intel_pstate: Implement passive mode with HWP enabled · f6ebbcf0
      Rafael J. Wysocki committed
      Allow intel_pstate to work in the passive mode with HWP enabled and
      make it set the HWP minimum performance limit (HWP floor) to the
      P-state value given by the target frequency supplied by the cpufreq
      governor, so as to prevent the HWP algorithm and the CPU scheduler
      from working against each other, at least when the schedutil governor
      is in use, and update the intel_pstate documentation accordingly.
      
      Among other things, this allows utilization clamps to be taken
      into account, at least to a certain extent, when intel_pstate is
      in use and makes it more likely that sufficient capacity for
      deadline tasks will be provided.
      
      After this change, the resulting behavior of an HWP system with
      intel_pstate in the passive mode should be close to the behavior
      of the analogous non-HWP system with intel_pstate in the passive
      mode, except that the HWP algorithm is generally allowed to make the
      CPU run at a frequency above the floor P-state set by intel_pstate in
      the entire available range of P-states, while without HWP a CPU can
      run in a P-state above the requested one if the latter falls into the
      range of turbo P-states (referred to as the turbo range) or if the
      P-states of all CPUs in one package are coordinated with each other
      at the hardware level.
      
      [Note that in principle the HWP floor may not be taken into account
       by the processor if it falls into the turbo range, in which case the
       processor has a license to choose any P-state, either below or above
       the HWP floor, just like a non-HWP processor in the case when the
       target P-state falls into the turbo range.]
      
      With this change applied, intel_pstate in the passive mode assumes
      complete control over the HWP request MSR and concurrent changes of
      that MSR (e.g. via the direct MSR access interface) are overridden by
      it.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
  12. 04 Aug, 2020 1 commit
    • cpufreq: intel_pstate: Fix cpuinfo_max_freq when MSR_TURBO_RATIO_LIMIT is 0 · 4daca379
      Srinivas Pandruvada committed
      MSR_TURBO_RATIO_LIMIT can be 0, which is not an error. Users can change
      this MSR via BIOS settings on some systems or update it with msr tools,
      and some systems boot with the value already set to 0.

      When that happens, cpufreq/cpuinfo_max_freq is displayed incorrectly: it
      is equal to cpufreq/base_frequency even though turbo is enabled.

      The platform still functions normally in HWP mode, because the maximum
      1-core frequency is obtained from MSR_HWP_CAPABILITIES. This MSR is
      already used to calculate cpu->pstate.turbo_freq, which is used to set
      policy->cpuinfo.max_freq, but cpu->pstate.turbo_pstate is used in some
      other places, for example to set policy->max.

      To fix this, also update cpu->pstate.turbo_pstate when updating
      cpu->pstate.turbo_freq.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  13. 31 Jul, 2020 2 commits
  14. 16 Jul, 2020 2 commits
  15. 15 Jul, 2020 1 commit
  16. 13 Jul, 2020 2 commits
  17. 02 Jul, 2020 2 commits
    • cpufreq: intel_pstate: Allow raw energy performance preference value · f473bf39
      Srinivas Pandruvada committed
      Currently, using the "energy_performance_preference" attribute, user
      space can write one of four predefined preference strings. These
      strings are mapped to a hard-coded Energy-Performance Preference (EPP)
      or Energy-Performance Bias (EPB) value.
      
      These four values are supposed to cover a broad spectrum of use cases,
      but they are not uniformly distributed in the range. There are a number
      of cases where this is not enough. For example:
      
      Suppose a user wants more performance when connected to AC. Instead of
      the default "balance_performance", the "performance" setting can be
      used, which changes the EPP value from 0x80 to 0x00. But setting EPP to
      0 results in electrical and thermal issues on some platforms, leading to
      aggressive throttling and a drop in performance. Some value between
      0x80 and 0x00 gives better performance, but that value can't be fixed
      because the power curve is not linear. In some cases just changing EPP
      from 0x80 to 0x75 is enough to get a significant performance gain.
      
      Similarly, on battery the default "balance_performance" mode can be
      aggressive in power consumption, but picking the next choice,
      "balance_power", results in too much loss of performance, which makes
      for a bad user experience in use cases like "Google Hangout". It was
      observed that some value between these two EPP values is optimal.
      
      This change allows fine-grained EPP tuning for platforms like Chromebooks
      or for users who want to fine-tune power and performance. Different EPP
      values can then be set based on the product and use cases.
      This change is similar to the change done for:
      /sys/devices/system/cpu/cpu*/power/energy_perf_bias
      where the user can write either a predefined string or a raw value.
      
      The change itself is trivial. When the user preference doesn't match
      any predefined preference string and the value is an unsigned integer
      in range, use that value for EPP. When the EPP feature is not present,
      writing a raw value is not supported.
      Suggested-by: Len Brown <lenb@kernel.org>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
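The "string first, raw value otherwise" rule from the last paragraph can be sketched as a small parser. This is an illustration, not the driver code: only the 0x80 and 0x00 mappings come from the commit message, while the values for "balance_power" and "power", the 0-255 range and the decimal-only input format are assumptions, and `parse_epp` is a hypothetical name.

```python
# String-to-EPP mapping. 0x00 and 0x80 are given in the commit message;
# 0xC0 and 0xFF are assumed values for the remaining two strings.
EPP_STRINGS = {
    "performance": 0x00,
    "balance_performance": 0x80,
    "balance_power": 0xC0,
    "power": 0xFF,
}


def parse_epp(text: str) -> int:
    """Map a preference string or a raw decimal value to an EPP value."""
    token = text.strip()
    if token in EPP_STRINGS:
        return EPP_STRINGS[token]
    # Not a predefined string: accept a plain unsigned decimal integer.
    if not (token.isascii() and token.isdigit()):
        raise ValueError(f"unknown preference: {token!r}")
    value = int(token, 10)
    if value > 255:  # assumed range of the EPP field
        raise ValueError(f"EPP value out of range: {value}")
    return value
```

With this rule, `parse_epp("117")` yields 0x75, the intermediate value used as an example in the message above.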
    • cpufreq: intel_pstate: Allow enable/disable energy efficiency · ed7bde7a
      Srinivas Pandruvada committed
      By default, the intel_pstate driver disables energy efficiency by setting
      MSR_IA32_POWER_CTL bit 19 for the Kaby Lake desktop CPU model in HWP mode.
      This CPU model is also shared by Coffee Lake desktop CPUs. This allows
      these systems to reach the maximum possible frequency, but it adds a power
      penalty, which some customers don't want. They want a way to enable or
      disable energy efficiency dynamically.
      
      So, add an additional attribute "energy_efficiency" under
      /sys/devices/system/cpu/intel_pstate/ for these CPU models. It allows
      reading and writing bit 19 ("Disable Energy Efficiency Optimization") in
      the MSR IA32_POWER_CTL.
      
      This attribute is present in both HWP and non-HWP mode as this has an
      effect in both modes. Refer to Intel Software Developer's manual for
      details.
      
      The scope of this bit is package-wide, and these systems are
      single-package systems, so reading or writing the MSR on the current
      CPU is enough.
      
      The energy efficiency (EE) bit setting needs to be preserved during
      suspend/resume and CPU offline/online operation. To do this:
      - Restore the EE setting from the cpufreq resume() callback if it
        differs from the system default.
      - By default, don't disable EE from the cpufreq init() callback for
        matching CPU models. Since the scope is package-wide and these are
        single-package systems, move the disable EE calls from the init()
        callback to the intel_pstate_init() function, which is called only
        once.
      Suggested-by: Len Brown <lenb@kernel.org>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
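Because bit 19 is a "disable" bit, the attribute and the register have opposite polarity, which is easy to get wrong. The sketch below shows the read-modify-write on a plain integer standing in for the MSR value; the function names are hypothetical, and treating the attribute as the logical inverse of the bit is an assumption drawn from the bit's name.

```python
EE_DISABLE_BIT = 19  # "Disable Energy Efficiency Optimization"


def energy_efficiency_enabled(power_ctl: int) -> bool:
    """Energy efficiency is on when the disable bit is clear."""
    return ((power_ctl >> EE_DISABLE_BIT) & 1) == 0


def set_energy_efficiency(power_ctl: int, enable: bool) -> int:
    """Return the new register value, leaving all other bits untouched."""
    mask = 1 << EE_DISABLE_BIT
    return (power_ctl & ~mask) if enable else (power_ctl | mask)
```

Only bit 19 changes, so whatever else the firmware configured in the register survives the update.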
  18. 23 Jun, 2020 1 commit
  19. 27 Apr, 2020 1 commit
  20. 17 Apr, 2020 1 commit
    • cpufreq: intel_pstate: Use passive mode by default without HWP · 33aa46f2
      Rafael J. Wysocki committed
      After recent changes allowing scale-invariant utilization to be
      used on x86, the schedutil governor on top of intel_pstate in the
      passive mode should be on par with (or better than) the active mode
      "powersave" algorithm of intel_pstate on systems in which
      hardware-managed P-states (HWP) are not used, so it should not be
      necessary to use the internal scaling algorithm in those cases.
      
      Accordingly, modify intel_pstate to start in the passive mode by
      default if the processor at hand does not support HWP or if the driver
      is requested to avoid using HWP through the kernel command line.
      
      Among other things, that will allow utilization clamps and the
      support for RT/DL tasks in the schedutil governor to be utilized on
      systems in which intel_pstate is used.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  21. 27 Mar, 2020 1 commit
    • cpufreq: intel_pstate: Simplify intel_pstate_cpu_init() · 5ac54113
      Rafael J. Wysocki committed
      The initial policy value set by intel_pstate_cpu_init() depends on
      whether or not CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is set, but
      that is not necessary, because the core will set the policy to
      "performance" in cpufreq_init_policy() if the default governor is
      "performance" anyway.
      
      Accordingly, change intel_pstate_cpu_init() to always set policy
      to CPUFREQ_POLICY_POWERSAVE initially to provide a valid fallback
      value to cpufreq_init_policy() in case the default cpufreq governor
      is neither "powersave" nor "performance".
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  22. 25 Mar, 2020 2 commits
  23. 14 Mar, 2020 1 commit
  24. 29 Jan, 2020 1 commit
  25. 27 Jan, 2020 1 commit
    • cpufreq: Avoid creating excessively large stack frames · 1e4f63ae
      Rafael J. Wysocki committed
      In the process of modifying a cpufreq policy, the cpufreq core makes
      a copy of it including all of the internals which is stored on the
      CPU stack.  Because struct cpufreq_policy is relatively large, this
      may cause the size of the stack frame to exceed the 2 KB limit and
      so GCC complains when -Wframe-larger-than= is used.
      
      In fact, it is not necessary to copy the entire policy structure
      in order to modify it, however.
      
      First, because cpufreq_set_policy() obtains the min and max policy
      limits from frequency QoS now, it is not necessary to pass the limits
      to it from the callers.  The only things that need to be passed to it
      from there are the new governor pointer or (if there is a built-in
      governor in the driver) the "policy" value representing the governor
      choice.  They both can be passed as individual arguments, though, so
      make cpufreq_set_policy() take them this way and rework its callers
      accordingly.  This avoids making copies of cpufreq policies in the
      callers of cpufreq_set_policy().
      
      Second, cpufreq_set_policy() still needs to pass the new policy
      data to the ->verify() callback of the cpufreq driver whose task
      is to sanitize the min and max policy limits.  It still does not
      need to make a full copy of struct cpufreq_policy for this purpose,
      but it needs to pass a few items from it to the driver in case they
      are needed (different drivers have different needs in that respect
      and all of them have to be covered).  For this reason, introduce
      struct cpufreq_policy_data to hold copies of the members of
      struct cpufreq_policy used by the existing ->verify() driver
      callbacks and pass a pointer to a temporary structure of that
      type to ->verify() (instead of passing a pointer to full struct
      cpufreq_policy to it).
      
      While at it, notice that intel_pstate and longrun don't really need
      to verify the "policy" value in struct cpufreq_policy, so drop those
      checks from them to avoid copying "policy" into struct
      cpufreq_policy_data (which allows it to be slightly smaller).
      
      Also while at it fix up white space in a couple of places and make
      cpufreq_set_policy() static (as it can be so).
      
      Fixes: 3000ce3c ("cpufreq: Use per-policy frequency QoS")
      Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
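The idea of handing ->verify() a small structure instead of a full policy copy can be illustrated outside the kernel. This is a sketch only: the field set of `PolicyData` is an assumption loosely modelled on the description above, and `verify_within_cpu_limits` is a hypothetical stand-in for a driver's ->verify() callback.

```python
from dataclasses import dataclass


@dataclass
class PolicyData:
    """Only the items a verify step needs, not the whole policy."""
    cpuinfo_min_freq: int  # hardware limits, kHz
    cpuinfo_max_freq: int
    min: int               # requested policy limits, kHz
    max: int


def verify_within_cpu_limits(data: PolicyData) -> None:
    """Sanitize min/max in place so they fit the hardware limits."""
    lo, hi = data.cpuinfo_min_freq, data.cpuinfo_max_freq
    data.min = max(lo, min(data.min, hi))
    data.max = max(lo, min(data.max, hi))
    if data.min > data.max:
        data.min = data.max
```

The caller builds a temporary `PolicyData`, lets the verify step adjust it, and copies the sanitized limits back, so the large policy object is never duplicated.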
  26. 13 Jan, 2020 1 commit
  27. 08 Nov, 2019 1 commit
  28. 06 Nov, 2019 1 commit
  29. 21 Oct, 2019 1 commit