1. 06 Mar, 2017 3 commits
    • R
      cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy · a240c4aa
      Rafael J. Wysocki committed
      If the current P-state selection algorithm is set to "performance"
      in intel_pstate_set_policy(), the limits may be initialized from
      scratch, but only if no_turbo is not set and the maximum frequency
      allowed for the given CPU (i.e. the policy object representing it)
      is at least equal to the max frequency supported by the CPU.  In all
      of the other cases, the limits will not be updated.
      
      For example, the following can happen:
      
       # cat intel_pstate/status
       active
       # echo performance > cpufreq/policy0/scaling_governor
       # cat intel_pstate/min_perf_pct
       100
       # echo 94 > intel_pstate/min_perf_pct
       # cat intel_pstate/min_perf_pct
       100
       # cat cpufreq/policy0/scaling_max_freq
       3100000
       # echo 3000000 > cpufreq/policy0/scaling_max_freq
       # cat intel_pstate/min_perf_pct
       94
       # echo 95 > intel_pstate/min_perf_pct
       # cat intel_pstate/min_perf_pct
       95
      
      That is confusing for two reasons.  First, the initial attempt to
      change min_perf_pct to 94 seems to have no effect, even though
      setting the global limits should always work.  Second, after
      changing scaling_max_freq for policy0 the global min_perf_pct
      attribute shows 94, even though in principle it should not have been
      affected by that operation.
      
      Moreover, the final attempt to change min_perf_pct to 95 worked
      as expected, because scaling_max_freq for the only policy with
      scaling_governor equal to "performance" was different from the
      maximum at that time.
      
      To make all that confusion go away, modify intel_pstate_set_policy()
      so that it doesn't reinitialize the limits at all.
      
      At the same time, change intel_pstate_set_performance_limits() to
      set min_sysfs_pct to 100 in the "performance" limits set so that
      switching the P-state selection algorithm to "performance" causes
      intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
      min_sysfs_pct in the "performance" limits is set to later).
      
      That requires per-CPU limits to be initialized explicitly rather
      than by copying the global limits to avoid setting min_sysfs_pct
      in the per-CPU limits to 100.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • R
      cpufreq: intel_pstate: Fix intel_pstate_verify_policy() · d74b1992
      Rafael J. Wysocki committed
      The code added to intel_pstate_verify_policy() by commit 1443ebba
      (cpufreq: intel_pstate: Fix sysfs limits enforcement for performance
      policy) should use perf_limits instead of limits, because otherwise
      setting global limits via sysfs may affect policies inconsistently.
      
      For example, in the sequence of shell commands below, the
      scaling_min_freq attribute for policy1 and policy2 should be
      affected in the same way, because scaling_governor is set in
      the same way for both of them:
      
       # cat cpufreq/policy1/scaling_governor
       powersave
       # cat cpufreq/policy2/scaling_governor
       powersave
       # echo performance > cpufreq/policy0/scaling_governor
       # echo 94 > intel_pstate/min_perf_pct
       # cat cpufreq/policy0/scaling_min_freq
       2914000
       # cat cpufreq/policy1/scaling_min_freq
       2914000
       # cat cpufreq/policy2/scaling_min_freq
       800000
      
      They are affected differently, because intel_pstate_verify_policy()
      is invoked with limits set to &performance_limits (left behind by
      policy0) for policy1 and with limits set to &powersave_limits (left
      behind by policy1) for policy2.  Since perf_limits is set to the
      set of limits matching the policy being updated, using it instead
      of limits fixes the inconsistency.
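
      The 2914000 value in the transcript above is consistent with applying
      min_perf_pct to the maximum frequency. A minimal sketch of that
      arithmetic, assuming a 3100000 kHz maximum (borrowed from the
      scaling_max_freq value shown in the a240c4aa example, not stated in this
      message) and a hypothetical helper name:

```python
def pct_to_khz(max_freq_khz: int, pct: int) -> int:
    """Scale a frequency in kHz by an integer percentage using integer math."""
    return max_freq_khz * pct // 100


# Assumption: maximum frequency of the CPUs in the example, in kHz.
MAX_FREQ_KHZ = 3_100_000

# min_perf_pct = 94 gives the scaling_min_freq reported for policy0 and policy1.
print(pct_to_khz(MAX_FREQ_KHZ, 94))  # 2914000
```

      Under that assumption, the 800000 reported for policy2 is simply a value
      that the global 94 percent minimum was never applied to.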
      
      Fixes: 1443ebba (cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • R
      cpufreq: intel_pstate: Fix global settings in active mode · cd59b4be
      Rafael J. Wysocki committed
      Commit 111b8b3f (cpufreq: intel_pstate: Always keep all
      limits settings in sync) changed intel_pstate to invoke
      cpufreq_update_policy() for every registered CPU on global sysfs
      attributes updates, but that led to undesirable effects in the
      active mode if the "performance" P-state selection algorithm is
      configured for one CPU and the "powersave" one is chosen for
      all of the other CPUs.
      
      Namely, in that case, the following is possible:
      
       # cd /sys/devices/system/cpu/
       # cat intel_pstate/max_perf_pct
       100
       # cat intel_pstate/min_perf_pct
       26
       # echo performance > cpufreq/policy0/scaling_governor
       # cat intel_pstate/max_perf_pct
       100
       # cat intel_pstate/min_perf_pct
       100
       # echo 94 > intel_pstate/min_perf_pct
       # cat intel_pstate/min_perf_pct
       26
      
      This happens because intel_pstate attempts to
      maintain two sets of global limits in the active mode, one for
      the "performance" P-state selection algorithm and one for the
      "powersave" P-state selection algorithm, but the P-state selection
      algorithms are set per policy, so the global limits cannot reflect
      all of them at the same time if they are different for different
      policies.
      
      In the particular situation above, the attempt to change
      min_perf_pct to 94 caused cpufreq_update_policy() to be run
      for a CPU with the "powersave" P-state selection algorithm
      and intel_pstate_set_policy() called by it silently switched the
      global limits to the "powersave" set which finally was reflected
      by the sysfs interface.
      
      To prevent that from happening, modify intel_pstate_update_policies()
      to always switch back to the set of limits that was used right before
      it has been invoked.
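
      The save-and-restore idea can be illustrated with a toy model. This is
      only a sketch: the dictionaries, the state holder and the function bodies
      are hypothetical stand-ins named after the kernel symbols mentioned in
      this message, not the driver code.

```python
# Two global limit sets, as described above (values are illustrative).
performance_limits = {"name": "performance", "min_perf_pct": 100}
powersave_limits = {"name": "powersave", "min_perf_pct": 26}

# Which set the global sysfs attributes currently reflect.
state = {"limits": performance_limits}


def set_policy(governor):
    """Model of the side effect: a policy update repoints the global set."""
    if governor == "performance":
        state["limits"] = performance_limits
    else:
        state["limits"] = powersave_limits


def update_policies(governors):
    """Update every policy, then switch back to the set in use on entry."""
    saved = state["limits"]
    for governor in governors:
        set_policy(governor)
    state["limits"] = saved


# One "performance" policy followed by two "powersave" ones.
update_policies(["performance", "powersave", "powersave"])
print(state["limits"]["name"])  # performance
```

      Without the restore step, the last policy visited would decide which set
      the global attributes show, which is the behavior in the transcript.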
      
      Fixes: 111b8b3f (cpufreq: intel_pstate: Always keep all limits settings in sync)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 04 Mar, 2017 3 commits
    • R
      cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily · 64078299
      Rafael J. Wysocki committed
      In the passive mode the cpu_frequency trace event is already
      triggered by the cpufreq core or by scaling governors, so
      intel_pstate should not trigger it once again for the same
      P-state updates.
      
      In addition to that, the frequency returned by
      intel_cpufreq_fast_switch() and passed via freqs.new from
      intel_cpufreq_target() to cpufreq_freq_transition_end() should
      reflect the P-state actually set, so make that happen.
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • R
      cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy() · 7f17326f
      Rafael J. Wysocki committed
      The intel_pstate_update_perf_limits() called from
      intel_cpufreq_verify_policy() may cause global P-state limits
      to change, which is generally confusing and unnecessary.
      
      In the passive mode the global limits are only applied to the
      frequency selected by the scaling governor (they are not taken
      into account by governors when making decisions anyway), so making
      them follow the per-policy limits serves no purpose and may go
      against user expectations (as it generally causes the global
      attributes in sysfs to change even though they have not been
      written to in some cases).
      
      Fix that by dropping the intel_pstate_update_perf_limits()
      invocation from intel_cpufreq_verify_policy() (which also
      reduces the code size by a few lines).
      
      This change does not affect the per-CPU limits case, because those
      limits allow any P-state to be set by default in the passive mode
      and it removes the only piece of code updating them in that mode,
      so the per-policy settings will be the only ones taken into account
      in that case as expected.
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • R
      cpufreq: intel_pstate: Do not use performance_limits in passive mode · 2bc756e7
      Rafael J. Wysocki committed
      Using performance_limits in the passive mode doesn't make
      sense, because in that mode the global limits are applied to the
      frequency selected by the scaling governor.
      
      The maximum and minimum P-state limits in performance_limits are both
      set to 100 percent, which will put all CPUs into the turbo range
      regardless of what governor is used and what frequencies are
      selected by it (that is particularly undesirable on CPUs with the
      generic powersave governor attached).
      
      For this reason, make intel_pstate_register_driver() always point
      limits to powersave_limits in the passive mode.
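
      To see why a limits set with both bounds at 100 percent overrides the
      governor, consider a sketch of the clamping step. The function name and
      the numbers (a 3100000 kHz maximum and a 26 percent powersave minimum,
      both borrowed from transcripts elsewhere in this log) are assumptions made
      for illustration only.

```python
def apply_global_limits(target_khz, max_khz, min_pct, max_pct):
    """Clamp a governor-selected frequency to the global percentage limits."""
    lowest = max_khz * min_pct // 100
    highest = max_khz * max_pct // 100
    return max(lowest, min(target_khz, highest))


MAX_KHZ = 3_100_000  # assumption

# performance_limits: min and max both 100 percent, so the request is ignored.
print(apply_global_limits(1_200_000, MAX_KHZ, 100, 100))  # 3100000

# powersave_limits (assumed 26..100 percent): the request passes through.
print(apply_global_limits(1_200_000, MAX_KHZ, 26, 100))  # 1200000
```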
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. 28 Feb, 2017 1 commit
  4. 04 Feb, 2017 6 commits
    • S
      cpufreq: intel_pstate: Disable energy efficiency optimization · 6e978b22
      Srinivas Pandruvada committed
      Some Kabylake desktop processors may not reach max turbo when running in
      HWP mode, even if running under sustained 100% utilization.
      
      This occurs when the HWP.EPP (Energy Performance Preference) is set to
      "balance_power" (0x80) -- the default on most systems.
      
      It occurs because the platform BIOS may erroneously enable an
      energy-efficiency setting -- MSR_IA32_POWER_CTL BIT-EE, which is not
      recommended to be enabled on this SKU.
      
      On the failing systems, this BIOS issue was not discovered when the
      desktop motherboard was tested with Windows, because the BIOS also
      neglects to provide the ACPI/CPPC table that Windows requires to enable
      HWP, and so Windows runs in legacy P-state mode, where this setting has
      no effect.
      
      Linux' intel_pstate driver does not require ACPI/CPPC to enable HWP, and
      so it runs in HWP mode, exposing this incorrect BIOS configuration.
      
      There are several ways to address this problem.
      
      First, Linux can also run in legacy P-state mode on this system.
      As intel_pstate is how Linux enables HWP, booting with
      "intel_pstate=disable"
      will run in acpi-cpufreq/ondemand legacy p-state mode.
      
      Or second, the "performance" governor can be used with intel_pstate,
      which will modify HWP.EPP to 0.
      
      Or third, starting in 4.10, the
      /sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
      attribute can be updated from "balance_power" to "performance".
      
      Or fourth, apply this patch, which fixes the erroneous setting of
      MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default
      configuration to function as designed.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Reviewed-by: Len Brown <len.brown@intel.com>
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • S
      cpufreq: intel_pstate: Calculate guaranteed performance for HWP · 8fc7554a
      Srinivas Pandruvada committed
      When HWP is active, turbo activation ratio is not used to calculate max
      non turbo ratio. But on these systems the max non turbo ratio is decided
      by config TDP settings.
      
      This change removes the use of MSR_TURBO_ACTIVATION_RATIO on HWP systems
      and instead uses the TDP ratios directly when more than one TDP is available.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • S
      cpufreq: intel_pstate: Make HWP limits compatible with legacy · 4e5d3f71
      Srinivas Pandruvada committed
      Under HWP, the performance limits are calculated from max_perf_pct
      and min_perf_pct relative to possible performance, not available
      performance. The available performance can be reduced by the no_turbo
      setting. To be compatible with legacy mode, apply the max/min performance
      percentages to the available performance.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • S
      cpufreq: intel_pstate: Lower frequency than expected under no_turbo · 7d9a8a9f
      Srinivas Pandruvada committed
      When turbo is not disabled by the BIOS, but the user disables it via the
      intel_pstate sysfs interface and then changes max/min via cpufreq sysfs,
      the resulting frequency is lower than what the user requested.

      The reason is that the perf limits calculated in the set_policy()
      callback are relative to the max CPU frequency (the turbo frequency),
      but when they are enforced in intel_pstate_get_min_max() they are
      relative to the max available performance, as described in the
      intel_pstate documentation (in this case the max non-turbo P-state).

      This needs a change similar to the one made in intel_cpufreq_verify_policy() for
      passive mode. Set policy->cpuinfo.max_freq based on the turbo status.
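
      A worked example of the mismatch, as a sketch only. The frequencies are
      hypothetical (3100000 kHz turbo maximum, 2400000 kHz non-turbo maximum),
      the helper names are made up, and rounding the maximum percentage up is
      an assumption based on the rounding note in commit 5879f877.

```python
def div_round_up(a, b):
    """Integer division rounding towards positive infinity."""
    return -(-a // b)


def enforced_max_khz(requested_khz, reference_khz, available_khz):
    """Turn a requested max into a percentage of `reference_khz`, then
    enforce that percentage against `available_khz`."""
    max_pct = div_round_up(requested_khz * 100, reference_khz)
    return available_khz * max_pct // 100


TURBO_MAX_KHZ = 3_100_000      # hypothetical
NON_TURBO_MAX_KHZ = 2_400_000  # hypothetical
REQUESTED_KHZ = 2_000_000      # user writes this to scaling_max_freq

# Before: percentage taken against turbo, enforced against non-turbo.
print(enforced_max_khz(REQUESTED_KHZ, TURBO_MAX_KHZ, NON_TURBO_MAX_KHZ))      # 1560000

# After: both steps use the non-turbo maximum when no_turbo is set.
print(enforced_max_khz(REQUESTED_KHZ, NON_TURBO_MAX_KHZ, NON_TURBO_MAX_KHZ))  # 2016000
```

      With a single reference frequency, the enforced cap lands at or just
      above the request instead of well below it.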
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • R
      cpufreq: intel_pstate: Operation mode control from sysfs · fb1fe104
      Rafael J. Wysocki committed
      Make it possible to change the operation mode of intel_pstate with
      the help of a new sysfs attribute called "status".
      
      There are three possible configurations that can be selected using
      this attribute:
      
       "off"     - The driver is not in use at this time.
       "active"  - The driver works as a P-state governor (default).
       "passive" - The driver works as a regular cpufreq one and collaborates
                   with the generic cpufreq governors (it sets P-states as
                   requested by those governors).  [This is the same mode
                   the driver can be started in by passing intel_pstate=passive
                   in the kernel command line.]
      
      The current setting is returned by reads from this attribute.  Writing
      one of the above strings to it changes the operation mode as indicated
      by that string, if possible.
      
      If the HW-managed P-states (HWP) feature is enabled, it is not possible
      to change the driver's operation mode and attempts to write to this
      attribute will fail.
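
      A small user-space sketch of reading and writing the attribute. The
      helper names are hypothetical, and the path assumes the attribute sits in
      the intel_pstate directory under /sys/devices/system/cpu/, as in the
      transcripts elsewhere in this log. Writing requires root, and the write
      raises OSError if the kernel rejects the change (for example, with HWP
      enabled).

```python
from pathlib import Path

# Assumed location of the new attribute.
STATUS_PATH = Path("/sys/devices/system/cpu/intel_pstate/status")
VALID_MODES = ("off", "active", "passive")


def read_mode(path=STATUS_PATH):
    """Return the current operation mode string."""
    return Path(path).read_text().strip()


def write_mode(mode, path=STATUS_PATH):
    """Request a new operation mode; reject unknown strings up front."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {VALID_MODES}")
    Path(path).write_text(mode)
```

      For example, write_mode("passive") followed by read_mode() should return
      "passive" on a system where the switch is permitted.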
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • R
      cpufreq: intel_pstate: Expose global sysfs attributes upfront · 0c30b65b
      Rafael J. Wysocki committed
      Expose the intel_pstate's global sysfs attributes before registering
      the driver to prepare for the addition of an attribute that also will
      have to work if the driver is not registered.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  5. 20 Jan, 2017 1 commit
  6. 01 Jan, 2017 3 commits
  7. 27 Dec, 2016 1 commit
  8. 08 Dec, 2016 2 commits
    • S
      cpufreq: intel_pstate: Support for energy performance hints with HWP · 984edbdc
      Srinivas Pandruvada committed
      It is possible to provide hints to the HWP algorithms in the processor,
      ranging from more performance-centric to more energy-centric. These hints are
      provided by using HWP energy performance preference (EPP) or energy
      performance bias (EPB) settings.
      
      The scope of these settings is per logical processor, which means that
      each of the logical processors in the package can be programmed with a
      different value.
      
      This change provides a cpufreq sysfs interface for supplying hints. For each
      policy, two additional attributes will be available to check and provide
      hints. These attributes will only be present when the intel_pstate driver
      is using HWP mode.
      
      These attributes are:
       - energy_performance_available_preferences
       - energy_performance_preference
      
      To get the list of supported hints:
      $ cat energy_performance_available_preferences
      default performance balance_performance balance_power power
      
      The current preference can be read or changed via cpufreq sysfs
      attribute "energy_performance_preference". Reading from this attribute
      will display the current effective setting, whichever method was used to
      change it. The user can write any of the valid preference strings to this
      attribute, and can always restore the power-on default by writing "default".
      
      Implementation
      Since these hints can be provided by direct MSR write or using some tools
      like x86_energy_perf_policy, the driver internally doesn't maintain any
      state. The user operation will result in direct read/write of MSR: 0x774
      (HWP_REQUEST_MSR). The driver also uses read-modify-write to update other
      fields in this MSR.
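
      The read-modify-write pattern can be sketched on a plain integer. The
      field position (EPP in bits 31:24 of the HWP request MSR) follows the
      Intel SDM layout rather than anything stated in this message, and the
      helper names are hypothetical. Real code would read and write the MSR
      itself; here the 64-bit value is just passed in and returned.

```python
EPP_SHIFT = 24                 # EPP occupies bits 31:24 (per the Intel SDM)
EPP_MASK = 0xFF << EPP_SHIFT


def get_epp(hwp_request):
    """Extract the EPP field from a raw HWP request value."""
    return (hwp_request >> EPP_SHIFT) & 0xFF


def set_epp(hwp_request, epp):
    """Return a copy of the raw value with only the EPP field replaced."""
    if not 0 <= epp <= 0xFF:
        raise ValueError("EPP must fit in 8 bits")
    return (hwp_request & ~EPP_MASK) | (epp << EPP_SHIFT)


old = 0x80002A08               # illustrative raw value, EPP = 0x80
new = set_epp(old, 0xC0)
print(hex(new))                # 0xc0002a08
print(hex(get_epp(new)))       # 0xc0
```

      Only the EPP byte changes; the lower fields of the illustrative value are
      preserved, which is the point of not writing the register blindly.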
      
      Summary of changes:
       - struct cpudata field epp_saved is renamed to epp_powersave, as this
         stores the value to restore once policy is switched from performance
         to powersave to restore original powersave EPP value.
       - A new struct cpudata field epp_saved is used to store the raw MSR
         EPP/EPB value when a CPU goes offline or on suspend and restore on
         online/resume. This ensures that EPP value is restored to correct
         value irrespective of the means used to set.
       - EPP/EPB value ranges are fixed for each preference, which can be
         set for the cpufreq sysfs, so user request is mapped to/from this
         range.
       - New attributes are only added when HWP is present.
       - Since EPP value of 0 is valid the fields are initialized to
         -EINVAL when not valid. The field epp_default is read only once
         after powerup to avoid reading on subsequent CPU online operation
       - New suspend callback to store epp on suspend operation
       - Don't invalidate old epp_saved field on resume and online as now
         we can restore last epp value on suspend and this field can still
         have old EPP value sampled during switch to performance from
         powersave.
       - While here optimized setting of cpu_data->epp_powersave = epp in
         intel_pstate_hwp_set() as this was done in both true and false
         paths.
       - epp/epb set function returns error to caller on failure to pass
         on to user space for display.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • S
      cpufreq: intel_pstate: Add locking around HWP requests · b59fe540
      Srinivas Pandruvada committed
      To avoid race conditions from multiple threads, increase the scope
      of intel_pstate_limits_lock to include HWP requests also.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  9. 01 Dec, 2016 1 commit
  10. 28 Nov, 2016 2 commits
  11. 25 Nov, 2016 1 commit
  12. 22 Nov, 2016 2 commits
  13. 21 Nov, 2016 1 commit
    • R
      cpufreq: intel_pstate: Generic governors support · 001c76f0
      Rafael J. Wysocki committed
      There may be reasons to use generic cpufreq governors (e.g. schedutil)
      on Intel platforms instead of the intel_pstate driver's internal
      governor.  However, that currently can only be done by disabling
      intel_pstate altogether and using the acpi-cpufreq driver instead
      of it, which is subject to limitations.
      
      First of all, acpi-cpufreq only works on systems where the _PSS
      object is present in the ACPI tables for all logical CPUs.  Second,
      on those systems acpi-cpufreq will only use frequencies listed by
      _PSS which may be suboptimal.  In particular, by convention, the
      whole turbo range is represented in _PSS as a single P-state and
      the frequency assigned to it is greater by 1 MHz than the greatest
      non-turbo frequency listed by _PSS.  That may confuse governors to
      use turbo frequencies less frequently which may lead to suboptimal
      performance.
      
      For this reason, make it possible to use the intel_pstate driver
      with generic cpufreq governors as a "normal" cpufreq driver.  That
      mode is enforced by adding intel_pstate=passive to the kernel
      command line and cannot be disabled at run time.  In that mode,
      intel_pstate provides a cpufreq driver interface including
      the ->target() and ->fast_switch() callbacks and is listed in
      scaling_driver as "intel_cpufreq".
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Doug Smythies <dsmythies@telus.net>
  14. 18 Nov, 2016 1 commit
    • R
      cpufreq: intel_pstate: Request P-states control from SMM if needed · d0ea59e1
      Rafael J. Wysocki committed
      Currently, intel_pstate is unable to control P-states on my
      IvyBridge-based Acer Aspire S5, because they are controlled by SMM
      on that machine by default and it is necessary to request OS control
      of P-states from it via the SMI Command register exposed in the ACPI
      FADT.  intel_pstate doesn't do that now, but acpi-cpufreq and other
      cpufreq drivers for x86 platforms do.
      
      Address this problem by making intel_pstate use the ACPI-defined
      mechanism as well.  However, intel_pstate is not modular and it
      doesn't need the module refcount tricks played by
      acpi_processor_notify_smm(), so export the core of this function
      to it as acpi_processor_pstate_control() and make it call that.
      [The changes in processor_perflib.c related to this should not
      make any functional difference for the acpi_processor_notify_smm()
      users].
      
      To be safe, only call acpi_processor_notify_smm() from intel_pstate
      if ACPI _PPC support is enabled in it.
      Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
  15. 15 Nov, 2016 1 commit
  16. 01 Nov, 2016 3 commits
    • S
      cpufreq: intel_pstate: protect limits variable · a410c03d
      Srinivas Pandruvada committed
      The limits variable gets modified from intel_pstate sysfs and also gets
      modified from cpufreq sysfs. So protect it with a mutex to preserve data
      integrity when it is modified from multiple threads.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • S
      cpufreq: intel_pstate: Reduce impact due to rounding error · 5879f877
      Srinivas Pandruvada committed
      When policy->max and policy->min are the same, in some cases they don't
      result in the same frequency cap. The max_policy_pct is rounded up, but
      min_perf_pct is not. So even when the two are equal, they yield different
      maximum and minimum percentages.
      Since the minimum is a conservative value for power, a lower value without
      rounding is better in most cases, unless the user wants
      policy->max = policy->min.
      This change uses the same policy percentage when policy->max and
      policy->min are same.
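
      A sketch of the rounding behavior described above. The function names and
      the 3100000 kHz maximum are assumptions; the point is only that rounding
      one percentage up and the other down splits equal inputs, and that
      reusing the maximum percentage avoids it.

```python
def div_round_up(a, b):
    """Integer division rounding towards positive infinity."""
    return -(-a // b)


def policy_pcts(policy_min, policy_max, cpuinfo_max, same_when_equal=True):
    """Return (min_pct, max_pct) for a policy, with max rounded up."""
    max_pct = div_round_up(policy_max * 100, cpuinfo_max)
    if same_when_equal and policy_min == policy_max:
        min_pct = max_pct                      # behavior after this change
    else:
        min_pct = policy_min * 100 // cpuinfo_max  # no rounding up
    return min_pct, max_pct


CPUINFO_MAX = 3_100_000  # assumption

print(policy_pcts(2_000_000, 2_000_000, CPUINFO_MAX, same_when_equal=False))  # (64, 65)
print(policy_pcts(2_000_000, 2_000_000, CPUINFO_MAX))                         # (65, 65)
print(policy_pcts(800_000, 2_000_000, CPUINFO_MAX))                           # (25, 65)
```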
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • S
      cpufreq: intel_pstate: Per CPU P-State limits · eae48f04
      Srinivas Pandruvada committed
      Intel P-State offers two interfaces to set performance limits:
      - Intel P-State sysfs
      	/sys/devices/system/cpu/intel_pstate/max_perf_pct
      	/sys/devices/system/cpu/intel_pstate/min_perf_pct
      - cpufreq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
      
      In the current implementation, both of the above methods change the
      limits for every CPU in the system. Moreover, the limits placed using the
      cpufreq policy interface are also presented in the Intel P-State sysfs via
      modified max_perf_pct and min_perf_pct values during sysfs reads. This
      makes it possible to check the percentage of reduced/increased performance
      irrespective of the method used to set the limit.
      
      There are some new generations of processors, where it is possible to
      have limits placed on individual CPU cores. Using cpufreq interface it
      is possible to set limits on each CPU. But the current processing will
      use last limits placed on all CPUs. So the per core limit feature of
      CPUs can't be used.
      
      This change brings in capability to set P-States limits for each CPU,
      with some limitations. In this case what should be the read of
      max_perf_pct and min_perf_pct? It can be most restrictive limits placed
      on any CPU or max possible performance on any given CPU on which no
      limits are placed. In either case someone will have issue.
      
      So the consensus is, we can't have both sysfs controls present when user
      wants to use limit per core limits.
      - By default per-core-control feature is not enabled. So no one will
      notice any difference.
      - The way to enable is by kernel command line
      intel_pstate=per_cpu_perf_limits
      - When the per-core controls are enabled, the following attributes are not
      available for either reading or writing:
      	/sys/devices/system/cpu/intel_pstate/max_perf_pct
      	/sys/devices/system/cpu/intel_pstate/min_perf_pct
      - User can change limits using
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
      	/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
      - User can still observe turbo percent and number of P-States from
      	/sys/devices/system/cpu/intel_pstate/turbo_pct
      	/sys/devices/system/cpu/intel_pstate/num_pstates
      - User can read write system wide turbo status
      	/sys/devices/system/cpu/no_turbo
      
      While at it, BUG_ON is changed to WARN_ON, as these are not fatal
      errors for the system.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  17. 25 Oct, 2016 1 commit
    • R
      cpufreq: intel_pstate: Always set max P-state in performance mode · 2f1d407a
      Rafael J. Wysocki committed
      The only times at which intel_pstate checks the policy set for
      a given CPU are the initialization of that CPU and updates of its
      policy settings from cpufreq when intel_pstate_set_policy() is
      invoked.
      
      That is insufficient, however, because intel_pstate uses the same
      P-state selection function for all CPUs regardless of the policy
      setting for each of them and the P-state limits are shared between
      them.  Thus if the policy is set to "performance" for a particular
      CPU, it may not behave as expected if the cpufreq settings are
      changed subsequently for another CPU.
      
      That can be easily demonstrated by writing "performance" to
      scaling_governor for all CPUs and then switching it to "powersave"
      for one of them in which case all of the CPUs will behave as though
      their scaling_governor were all "powersave" (even though the policy
      still appears to be "performance" for the remaining CPUs).
      
      Fix this problem by modifying intel_pstate_adjust_busy_pstate() to
      always set the P-state to the maximum allowed by the current limits
      for all CPUs whose policy is set to "performance".
      
      Note that it still is recommended to always change the policy setting
      in the same way for all CPUs even with this fix applied to avoid
      confusion.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  18. 22 Oct, 2016 3 commits
  19. 13 Oct, 2016 2 commits
  20. 10 Oct, 2016 2 commits
    • R
      cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance() · f00593a4
      Rafael J. Wysocki committed
      Make the comment explaining the meaning of the perf_scaled variable
      in get_target_pstate_use_performance() more straightforward.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • S
      cpufreq: intel_pstate: Fix unsafe HWP MSR access · f9f4872d
      Srinivas Pandruvada committed
      It is a requirement that MSR_PM_ENABLE be set to 0x01 before
      reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init() is
      scheduled on a CPU other than policy->cpu, or migrates to a
      different CPU before the MSR read of MSR_HWP_CAPABILITIES, it
      is possible that MSR_PM_ENABLE was not set to 0x01 on that CPU.
      This will cause a GP fault. So, as in other places in this path,
      rdmsrl_on_cpu should be used instead of rdmsrl.
      
      Moreover, the scope of MSR_HWP_CAPABILITIES is per thread, so it
      should be read from the same CPU for which MSR_HWP_REQUEST is
      being set.
      
      dmesg dump or warning:
      
      [   22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014492] unchecked MSR access error: RDMSR from 0x771
      [   22.014493] Modules linked in:
      [   22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1
      ...
      ...
      [   22.014516] Call Trace:
      [   22.014542]  [<ffffffff813d7dd1>] dump_stack+0x63/0x82
      [   22.014558]  [<ffffffff8107bc8b>] __warn+0xcb/0xf0
      [   22.014561]  [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60
      [   22.014563]  [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014564]  [<ffffffff810677d9>] fixup_exception+0x39/0x50
      [   22.014604]  [<ffffffff8102e400>] do_general_protection+0x80/0x150
      [   22.014610]  [<ffffffff817f9ec8>] general_protection+0x28/0x30
      [   22.014635]  [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0
      [   22.014642]  [<ffffffff810600c7>] ? native_read_msr+0x7/0x40
      [   22.014657]  [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130
      [   22.014660]  [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340
      [   22.014662]  [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0
      [   22.014664]  [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0
      [   22.014666]  [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120
      [   22.014669]  [<ffffffff816833a6>] cpufreq_online+0x406/0x820
      [   22.014671]  [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90
      [   22.014717]  [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100
      [   22.014719]  [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210
      [   22.014749]  [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5
      [   22.014751]  [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12
      
      Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>