1. 27 Dec, 2016: 1 commit
  2. 08 Dec, 2016: 2 commits
    • cpufreq: intel_pstate: Support for energy performance hints with HWP · 984edbdc
      Srinivas Pandruvada committed
      It is possible to provide hints to the HWP algorithms in the processor
      to be more performance-centric or more energy-centric. These hints are
      provided using the HWP energy performance preference (EPP) or energy
      performance bias (EPB) settings.
      
      The scope of these settings is per logical processor, which means that
      each of the logical processors in the package can be programmed with a
      different value.
      
      This change provides a cpufreq sysfs interface for setting the hint. For
      each policy, two additional attributes are available to read and set
      the hint. These attributes are only present when the intel_pstate
      driver is using HWP mode.
      
      These attributes are:
       - energy_performance_available_preferences
       - energy_performance_preference
      
      To get the list of supported hints:
      $ cat energy_performance_available_preferences
      default performance balance_performance balance_power power
      
      The current preference can be read or changed via the cpufreq sysfs
      attribute "energy_performance_preference". Reading this attribute
      displays the current effective setting, regardless of the method used
      to change it. The user can write any of the valid preference strings to
      this attribute, and can always restore the power-on default by writing
      "default".
      
      Implementation
      Since these hints can be provided by a direct MSR write or by tools
      such as x86_energy_perf_policy, the driver doesn't maintain any state
      internally. A user operation results in a direct read/write of MSR
      0x774 (HWP_REQUEST_MSR). The driver also uses read-modify-write when
      updating the other fields in this MSR.
      
      Summary of changes:
       - The struct cpudata field epp_saved is renamed to epp_powersave, as it
         stores the original powersave EPP value to restore once the policy
         is switched from performance back to powersave.
       - A new struct cpudata field epp_saved is used to store the raw MSR
         EPP/EPB value when a CPU goes offline or on suspend, and to restore
         it on online/resume. This ensures that the EPP value is restored
         correctly irrespective of the means used to set it.
       - The EPP/EPB value ranges are fixed for each preference that can be
         set via cpufreq sysfs, so a user request is mapped to/from these
         ranges.
       - New attributes are only added when HWP is present.
       - Since an EPP value of 0 is valid, the fields are initialized to
         -EINVAL when not valid. The field epp_default is read only once
         after power-up to avoid reading it on subsequent CPU online
         operations.
       - New suspend callback to store EPP on suspend.
       - Don't invalidate the old epp_saved field on resume and online, as
         the last EPP value saved on suspend can now be restored, and this
         field can still hold an old EPP value sampled during the switch
         from powersave to performance.
       - While here, optimize setting cpu_data->epp_powersave = epp in
         intel_pstate_hwp_set(), as this was done in both the true and false
         paths.
       - The epp/epb set functions return an error to the caller on failure,
         so that it is passed on to user space for display.
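      Below is a minimal sketch of the preference mapping. The preference
      names come from energy_performance_available_preferences above. The
      EPP value chosen for each preference and the range boundaries are
      assumptions for illustration. The EPB path is omitted, and the helper
      names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Preference names, index 0 being "default". */
static const char *const epp_prefs[] = {
    "default", "performance", "balance_performance", "balance_power", "power",
};

/* EPP value programmed for preferences 1..4 (assumed values). */
static const uint8_t epp_values[] = { 0x00, 0x80, 0xc0, 0xff };

/* Map a raw EPP value, possibly written by another tool, to a preference
 * index 1..4: each preference owns a fixed range of raw values. */
static int epp_to_pref_index(uint8_t epp)
{
    if (epp == 0x00)
        return 1;              /* performance */
    if (epp <= 0x80)
        return 2;              /* balance_performance */
    if (epp <= 0xc0)
        return 3;              /* balance_power */
    return 4;                  /* power */
}

/* Map a user-supplied preference string to the EPP value to program.
 * "default" restores epp_default, the value read once after power-up.
 * Returns -1 for an unknown string. */
static int pref_to_epp(const char *pref, uint8_t epp_default)
{
    for (size_t i = 0; i < sizeof(epp_prefs) / sizeof(epp_prefs[0]); i++) {
        if (strcmp(pref, epp_prefs[i]) == 0)
            return i == 0 ? epp_default : epp_values[i - 1];
    }
    return -1;
}
```

      Because reads map whole ranges back to a name, a value written by
      another tool still shows up as one of the listed preferences.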
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Add locking around HWP requests · b59fe540
      Srinivas Pandruvada committed
      To avoid race conditions between multiple threads, extend the scope
      of intel_pstate_limits_lock to also cover HWP requests.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. 01 Dec, 2016: 1 commit
  4. 28 Nov, 2016: 2 commits
  5. 22 Nov, 2016: 2 commits
  6. 21 Nov, 2016: 1 commit
    • cpufreq: intel_pstate: Generic governors support · 001c76f0
      Rafael J. Wysocki committed
      There may be reasons to use generic cpufreq governors (e.g. schedutil)
      on Intel platforms instead of the intel_pstate driver's internal
      governor.  However, that currently can only be done by disabling
      intel_pstate altogether and using the acpi-cpufreq driver instead
      of it, which is subject to limitations.
      
      First of all, acpi-cpufreq only works on systems where the _PSS
      object is present in the ACPI tables for all logical CPUs.  Second,
      on those systems acpi-cpufreq will only use frequencies listed by
      _PSS which may be suboptimal.  In particular, by convention, the
      whole turbo range is represented in _PSS as a single P-state and
      the frequency assigned to it is greater by 1 MHz than the greatest
      non-turbo frequency listed by _PSS.  That may cause governors to use
      turbo frequencies less often, which may lead to suboptimal
      performance.
      
      For this reason, make it possible to use the intel_pstate driver
      with generic cpufreq governors as a "normal" cpufreq driver.  That
      mode is enforced by adding intel_pstate=passive to the kernel
      command line and cannot be disabled at run time.  In that mode,
      intel_pstate provides a cpufreq driver interface including
      the ->target() and ->fast_switch() callbacks and is listed in
      scaling_driver as "intel_cpufreq".
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Doug Smythies <dsmythies@telus.net>
  7. 18 Nov, 2016: 1 commit
    • cpufreq: intel_pstate: Request P-states control from SMM if needed · d0ea59e1
      Rafael J. Wysocki committed
      Currently, intel_pstate is unable to control P-states on my
      IvyBridge-based Acer Aspire S5, because they are controlled by SMM
      on that machine by default and it is necessary to request OS control
      of P-states from it via the SMI Command register exposed in the ACPI
      FADT.  intel_pstate doesn't do that now, but acpi-cpufreq and other
      cpufreq drivers for x86 platforms do.
      
      Address this problem by making intel_pstate use the ACPI-defined
      mechanism as well.  However, intel_pstate is not modular and it
      doesn't need the module refcount tricks played by
      acpi_processor_notify_smm(), so export the core of this function
      to it as acpi_processor_pstate_control() and make it call that.
      [The changes in processor_perflib.c related to this should not
      make any functional difference for the acpi_processor_notify_smm()
      users].
      
      To be safe, only call acpi_processor_notify_smm() from intel_pstate
      if ACPI _PPC support is enabled in it.
      Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
  8. 15 Nov, 2016: 1 commit
  9. 01 Nov, 2016: 3 commits
    • cpufreq: intel_pstate: protect limits variable · a410c03d
      Srinivas Pandruvada committed
      The limits variable is modified both from the intel_pstate sysfs
      interface and from cpufreq sysfs. Protect it with a mutex to keep the
      data consistent when it is modified from multiple threads.
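      The idea can be sketched in user-space C, with a pthread mutex
      standing in for the kernel mutex. struct perf_limits, set_limits() and
      get_limits() are hypothetical simplifications, not the driver's code.

```c
#include <pthread.h>

/* Hypothetical, simplified stand-in for the driver's limits variable. */
struct perf_limits {
    int min_perf_pct;
    int max_perf_pct;
};

static struct perf_limits limits = { 0, 100 };
static pthread_mutex_t limits_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called from either sysfs path; both fields change under one lock. */
static void set_limits(int min_pct, int max_pct)
{
    pthread_mutex_lock(&limits_lock);
    limits.min_perf_pct = min_pct;
    limits.max_perf_pct = max_pct;
    pthread_mutex_unlock(&limits_lock);
}

/* Readers take the same lock, so they never see a half-updated pair. */
static struct perf_limits get_limits(void)
{
    pthread_mutex_lock(&limits_lock);
    struct perf_limits snapshot = limits;
    pthread_mutex_unlock(&limits_lock);
    return snapshot;
}
```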
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Reduce impact due to rounding error · 5879f877
      Srinivas Pandruvada committed
      When policy->max and policy->min are the same, in some cases they
      don't result in the same frequency cap. max_policy_pct is rounded up,
      but min_perf_pct is not, so even when the two are the same, they
      result in different percentages for the maximum and the minimum.
      Since the minimum is a conservative value for power, a lower value
      without rounding is better in most cases, unless the user wants
      policy->max = policy->min.
      This change uses the same policy percentage when policy->max and
      policy->min are the same.
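      Here is a small worked example of the rounding problem. It uses
      hypothetical helpers that express a policy frequency as a percentage
      of cpuinfo.max_freq, assumed here to be 2900000 kHz. A frequency of
      1500000 kHz is 51.7%, which rounds up to 52 for the maximum but
      truncates to 51 for the minimum.

```c
/* Hypothetical helpers: express a policy frequency (kHz) as a percentage
 * of cpuinfo_max_khz, rounding up or truncating. 64-bit intermediates
 * avoid overflow. */
static int pct_round_up(unsigned int khz, unsigned int cpuinfo_max_khz)
{
    return (int)(((unsigned long long)khz * 100 + cpuinfo_max_khz - 1)
                 / cpuinfo_max_khz);
}

static int pct_round_down(unsigned int khz, unsigned int cpuinfo_max_khz)
{
    return (int)((unsigned long long)khz * 100 / cpuinfo_max_khz);
}

/* Fixed behaviour: when policy->min == policy->max, reuse the maximum
 * percentage for the minimum so both give the same frequency cap;
 * otherwise keep the conservative truncated minimum. */
static void policy_pcts(unsigned int min_khz, unsigned int max_khz,
                        unsigned int cpuinfo_max_khz,
                        int *min_pct, int *max_pct)
{
    *max_pct = pct_round_up(max_khz, cpuinfo_max_khz);
    *min_pct = (min_khz == max_khz) ? *max_pct
                                    : pct_round_down(min_khz, cpuinfo_max_khz);
}
```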
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Per CPU P-State limits · eae48f04
      Srinivas Pandruvada committed
      Intel P-State offers two interfaces to set performance limits:
      - Intel P-State sysfs
      	/sys/devices/system/cpu/intel_pstate/max_perf_pct
      	/sys/devices/system/cpu/intel_pstate/min_perf_pct
      - cpufreq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
      
      In the current implementation, both of the above methods change the
      limits for every CPU in the system. Moreover, the limits placed using
      the cpufreq policy interface are also presented in the Intel P-State
      sysfs, via modified max_perf_pct and min_perf_pct values during sysfs
      reads. This makes it possible to check the percentage of
      reduced/increased performance irrespective of the method used to set
      the limit.
      
      On some new generations of processors, it is possible to place limits
      on individual CPU cores. The cpufreq interface allows setting limits
      on each CPU, but the current processing applies the last limits
      placed to all CPUs, so the per-core limit feature of these CPUs
      can't be used.
      
      This change brings in the capability to set P-State limits for each
      CPU, with some limitations. In this case, what should reading
      max_perf_pct and min_perf_pct return? It could be the most restrictive
      limits placed on any CPU, or the maximum possible performance on any
      CPU on which no limits are placed. In either case, someone will have
      an issue.
      
      So the consensus is that both sysfs controls can't be present when the
      user wants to use per-core limits.
      - By default, the per-core-control feature is not enabled, so no one
      will notice any difference.
      - The way to enable it is the kernel command line option
      intel_pstate=per_cpu_perf_limits
      - When the per-core controls are enabled, there is no read or write
      access to
      	/sys/devices/system/cpu/intel_pstate/max_perf_pct
      	/sys/devices/system/cpu/intel_pstate/min_perf_pct
      - The user can change the limits using
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
      - The user can still observe the turbo percentage and the number of P-States from
      	/sys/devices/system/cpu/intel_pstate/turbo_pct
      	/sys/devices/system/cpu/intel_pstate/num_pstates
      - The user can read and write the system-wide turbo status via
      	/sys/devices/system/cpu/intel_pstate/no_turbo
      
      While at it, BUG_ON is changed to WARN_ON, as these are not fatal
      errors for the system.
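      The choice between shared and per-CPU limits can be sketched as
      follows. The flag mirrors the intel_pstate=per_cpu_perf_limits command
      line option. The structure, array size and helper names are
      hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified limits; four CPUs assumed for the sketch. */
struct perf_limits {
    int min_perf_pct;
    int max_perf_pct;
};

#define SKETCH_NR_CPUS 4

static bool per_cpu_perf_limits;  /* models intel_pstate=per_cpu_perf_limits */
static struct perf_limits global_limits = { 0, 100 };
static struct perf_limits cpu_limits[SKETCH_NR_CPUS] = {
    { 0, 100 }, { 0, 100 }, { 0, 100 }, { 0, 100 },
};

/* Default mode: every CPU shares one set of limits. Per-CPU mode: each
 * CPU's cpufreq policy drives its own limits. */
static struct perf_limits *limits_for(int cpu)
{
    return per_cpu_perf_limits ? &cpu_limits[cpu] : &global_limits;
}

/* Stand-in for a scaling_max_freq write on one CPU, already converted
 * to a percentage. */
static void set_max_pct(int cpu, int pct)
{
    limits_for(cpu)->max_perf_pct = pct;
}
```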
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  10. 25 Oct, 2016: 1 commit
    • cpufreq: intel_pstate: Always set max P-state in performance mode · 2f1d407a
      Rafael J. Wysocki committed
      The only times at which intel_pstate checks the policy set for
      a given CPU are the initialization of that CPU and updates of its
      policy settings from cpufreq when intel_pstate_set_policy() is
      invoked.
      
      That is insufficient, however, because intel_pstate uses the same
      P-state selection function for all CPUs regardless of the policy
      setting for each of them and the P-state limits are shared between
      them.  Thus if the policy is set to "performance" for a particular
      CPU, it may not behave as expected if the cpufreq settings are
      changed subsequently for another CPU.
      
      That can be easily demonstrated by writing "performance" to
      scaling_governor for all CPUs and then switching it to "powersave"
      for one of them in which case all of the CPUs will behave as though
      their scaling_governor were all "powersave" (even though the policy
      still appears to be "performance" for the remaining CPUs).
      
      Fix this problem by modifying intel_pstate_adjust_busy_pstate() to
      always set the P-state to the maximum allowed by the current limits
      for all CPUs whose policy is set to "performance".
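      The fix can be pictured with the simplified sketch below. The
      per-CPU state and helper names are hypothetical. A CPU whose policy is
      "performance" gets the maximum P-state allowed by the current limits.
      Other CPUs keep the load-based choice, clamped to those limits.

```c
/* Hypothetical, simplified per-CPU state. */
enum cpu_policy { POLICY_POWERSAVE, POLICY_PERFORMANCE };

struct cpu_state {
    enum cpu_policy policy;
    int min_pstate;   /* lowest P-state allowed by the current limits */
    int max_pstate;   /* highest P-state allowed by the current limits */
};

static int clamp_pstate(const struct cpu_state *c, int pstate)
{
    if (pstate < c->min_pstate)
        return c->min_pstate;
    if (pstate > c->max_pstate)
        return c->max_pstate;
    return pstate;
}

/* A "performance" CPU always gets the maximum allowed P-state, no matter
 * what the shared load-based estimate says; other CPUs use the estimate,
 * clamped to the limits. */
static int next_pstate(const struct cpu_state *c, int load_based_target)
{
    if (c->policy == POLICY_PERFORMANCE)
        return c->max_pstate;
    return clamp_pstate(c, load_based_target);
}
```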
      
      Note that, even with this fix applied, it is still recommended to
      always change the policy setting in the same way for all CPUs to avoid
      confusion.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  11. 22 Oct, 2016: 3 commits
  12. 13 Oct, 2016: 2 commits
  13. 10 Oct, 2016: 2 commits
    • cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance() · f00593a4
      Rafael J. Wysocki committed
      Make the comment explaining the meaning of the perf_scaled variable
      in get_target_pstate_use_performance() more straightforward.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Fix unsafe HWP MSR access · f9f4872d
      Srinivas Pandruvada committed
      MSR_PM_ENABLE must be set to 0x01 before MSR_HWP_CAPABILITIES is read
      on a given CPU. If cpufreq init() is scheduled on a CPU other than
      policy->cpu, or migrates to a different CPU before the MSR read of
      MSR_HWP_CAPABILITIES, it is possible that MSR_PM_ENABLE was not set
      to 0x01 on that CPU. This causes a GP fault. So, as in other places
      in this path, rdmsrl_on_cpu should be used instead of rdmsrl.
      
      Moreover, MSR_HWP_CAPABILITIES has per-thread scope, so it should be
      read from the same CPU for which MSR_HWP_REQUEST is being set.
      
      dmesg dump of the warning:
      
      [   22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014492] unchecked MSR access error: RDMSR from 0x771
      [   22.014493] Modules linked in:
      [   22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1
      ...
      ...
      [   22.014516] Call Trace:
      [   22.014542]  [<ffffffff813d7dd1>] dump_stack+0x63/0x82
      [   22.014558]  [<ffffffff8107bc8b>] __warn+0xcb/0xf0
      [   22.014561]  [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60
      [   22.014563]  [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014564]  [<ffffffff810677d9>] fixup_exception+0x39/0x50
      [   22.014604]  [<ffffffff8102e400>] do_general_protection+0x80/0x150
      [   22.014610]  [<ffffffff817f9ec8>] general_protection+0x28/0x30
      [   22.014635]  [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0
      [   22.014642]  [<ffffffff810600c7>] ? native_read_msr+0x7/0x40
      [   22.014657]  [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130
      [   22.014660]  [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340
      [   22.014662]  [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0
      [   22.014664]  [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0
      [   22.014666]  [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120
      [   22.014669]  [<ffffffff816833a6>] cpufreq_online+0x406/0x820
      [   22.014671]  [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90
      [   22.014717]  [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100
      [   22.014719]  [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210
      [   22.014749]  [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5
      [   22.014751]  [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12
      
      Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  14. 17 Sep, 2016: 1 commit
  15. 14 Sep, 2016: 1 commit
  16. 13 Sep, 2016: 1 commit
  17. 17 Aug, 2016: 1 commit
    • cpufreq / sched: Pass flags to cpufreq_update_util() · 58919e83
      Rafael J. Wysocki committed
      It is useful to know the reason why cpufreq_update_util() has just
      been called and that can be passed as flags to cpufreq_update_util()
      and to the ->func() callback in struct update_util_data.  However,
      doing that in addition to passing the util and max arguments they
      already take would be clumsy, so avoid it.
      
      Instead, use the observation that the schedutil governor is part
      of the scheduler proper, so it can access scheduler data directly.
      This allows the util and max arguments of cpufreq_update_util()
      and the ->func() callback in struct update_util_data to be replaced
      with a flags one, but schedutil has to be modified to follow.
      
      Thus make the schedutil governor obtain the CFS utilization
      information from the scheduler and use the "RT" and "DL" flags
      instead of the special utilization value of ULONG_MAX to track
      updates from the RT and DL sched classes.  Make it non-modular
      too to avoid having to export scheduler variables to modules at
      large.
      
      Next, update all of the other users of cpufreq_update_util()
      and the ->func() callback in struct update_util_data accordingly.
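      The following is a simplified user-space sketch of the new calling
      convention. The scheduler passes only a time stamp and flags, and the
      governor reads utilization from data it can access directly. The two
      flags stand in for the "RT" and "DL" flags mentioned above. Their
      names and values, the proportional frequency formula and all other
      names here (including the hook function) are assumptions for
      illustration. This is not the kernel's API.

```c
#include <stdint.h>

/* Local stand-ins for the "RT" and "DL" flags; values are assumed. */
#define SKETCH_FLAG_RT (1U << 0)
#define SKETCH_FLAG_DL (1U << 1)

struct update_util_data {
    void (*func)(struct update_util_data *data, uint64_t time,
                 unsigned int flags);
};

/* Hypothetical scheduler-side CFS data that the governor reads directly. */
static unsigned long cfs_util = 256, cfs_max = 1024;
static unsigned int max_freq_khz = 2000000;
static unsigned int next_freq_khz;

static void governor_update(struct update_util_data *data, uint64_t time,
                            unsigned int flags)
{
    (void)data;
    (void)time;
    if (flags & (SKETCH_FLAG_RT | SKETCH_FLAG_DL)) {
        /* RT/DL update: request the maximum (formerly signalled by a
         * util value of ULONG_MAX). */
        next_freq_khz = max_freq_khz;
        return;
    }
    /* CFS update: utilization comes from scheduler data, not arguments.
     * Plain proportional mapping, assumed for illustration. */
    next_freq_khz = (unsigned int)((uint64_t)max_freq_khz * cfs_util / cfs_max);
}

static struct update_util_data gov_hook = { governor_update };

/* Hypothetical scheduler-side hook: only a time stamp and flags. */
static void sched_update_util_hook(uint64_t time, unsigned int flags)
{
    gov_hook.func(&gov_hook, time, flags);
}
```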
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  18. 29 Jul, 2016: 1 commit
  19. 21 Jul, 2016: 3 commits
  20. 11 Jul, 2016: 1 commit
  21. 07 Jul, 2016: 1 commit
  22. 28 Jun, 2016: 4 commits
  23. 15 Jun, 2016: 1 commit
  24. 14 Jun, 2016: 1 commit
  25. 08 Jun, 2016: 2 commits
    • x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver · 5b20c944
      Dave Hansen committed
      Another straightforward replacement of magic numbers.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: jacob.jun.pan@intel.com
      Cc: linux-pm@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160603001945.0F5D02AA@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo · 983e600e
      Srinivas Pandruvada committed
      When turbo is disabled, the ->set_policy() interface is broken.
      
      For example, when turbo is disabled and cpuinfo.max = 2900000 (full
      max turbo frequency), setting the limits results in a frequency lower
      than the requested one:
      Set 1000000 kHz results in 700000 kHz
      Set 1500000 kHz results in 1100000 kHz
      Set 2000000 kHz results in 1500000 kHz
      
      This is because the limits->max_perf fraction is calculated using
      the max turbo frequency as the reference, but when the max P-State is
      capped in intel_pstate_get_min_max(), the reference is not the max
      turbo P-State. This results in reducing max P-State.
      
      One option is to always use max turbo as the reference for calculating
      limits, but this would not be correct. By definition, the intel_pstate
      sysfs limits show a percentage of the available performance. So when
      the BIOS has disabled turbo, the available performance is max non-turbo,
      and max_perf_pct should still show 100%.
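      Here is an illustrative calculation. It uses assumed numbers, not the
      figures reported above. Take a hypothetical part with a 2.9 GHz max
      turbo frequency, a 2.1 GHz max non-turbo frequency and P-states in
      100 MHz steps. Express a 2000000 kHz request as a fraction of max
      turbo and apply it to the non-turbo max P-state: the result is P-state
      14 (1.4 GHz). Using the non-turbo frequency as the reference yields
      the expected P-state 20.

```c
#include <stdint.h>

/* Assumed platform: P-states in 100 MHz steps, 2.9 GHz max turbo,
 * 2.1 GHz max non-turbo (P-state 21). Illustrative values only. */
#define MAX_TURBO_KHZ    2900000u
#define MAX_NONTURBO_KHZ 2100000u
#define KHZ_PER_PSTATE   100000u

/* Express the request as a fraction of ref_khz and apply it to the
 * highest P-state currently available (truncating). */
static unsigned int pstate_cap(unsigned int req_khz, unsigned int ref_khz,
                               unsigned int max_pstate)
{
    return (unsigned int)((uint64_t)req_khz * max_pstate / ref_khz);
}

/* Turbo disabled: the available maximum is the non-turbo P-state, so the
 * fraction uses the non-turbo frequency as its reference too. */
static unsigned int pstate_cap_no_turbo(unsigned int req_khz)
{
    unsigned int max_pstate = MAX_NONTURBO_KHZ / KHZ_PER_PSTATE; /* 21 */
    return pstate_cap(req_khz, MAX_NONTURBO_KHZ, max_pstate);
}
```

      With the non-turbo reference, a request equal to the non-turbo maximum
      maps to all 21 P-states. This is consistent with max_perf_pct showing
      100%.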
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw : Subject & changelog, rewrite in fewer lines of code ]
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>