1. 18 Apr 2017, 1 commit
  2. 30 Mar 2017, 1 commit
  3. 29 Mar 2017, 16 commits
  4. 24 Mar 2017, 4 commits
    • cpufreq: intel_pstate: Avoid transient updates of cpuinfo.max_freq · 80b120ca
      Committed by Rafael J. Wysocki
      Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
      set policy->cpuinfo.max_freq depending on the turbo status, but the
      updates made by them are discarded by the core, because the policy
      object passed to them by the core is temporary and cpuinfo.max_freq
      from that object is not copied to the final policy object in
      cpufreq_set_policy().
      
      However, cpufreq_set_policy() passes the temporary policy object
      to the ->setpolicy callback of the driver, so intel_pstate_set_policy()
      actually sees the policy->cpuinfo.max_freq value updated by
      intel_pstate_verify_policy() and not the final one.  It also
      sometimes updates policy->max, which has no effect after it
      returns, because the core discards that update.
      
      To avoid confusion, eliminate policy->cpuinfo.max_freq updates from
      intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
      entirely and check the maximum frequency explicitly in
      intel_pstate_update_perf_limits() instead of relying on the
      transiently updated policy->cpuinfo.max_freq value.
      
      Moreover, move the policy->max adjustment carried out in
      intel_pstate_set_policy() to a separate function and call that
      function from the ->verify driver callbacks to ensure that it will
      actually be effective.
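      The explicit maximum-frequency check can be pictured with a minimal
      sketch: the effective maximum is computed on the spot from the turbo
      state and the policy maximum is clamped against it, rather than read
      back from a temporary policy object. struct turbo_state,
      effective_max_freq() and clamp_policy_max() are hypothetical names;
      the sketch assumes turbo is unavailable when either turbo_disabled or
      no_turbo is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the global turbo state (not the driver's struct). */
struct turbo_state {
	bool turbo_disabled; /* turbo disabled by the platform */
	bool no_turbo;       /* turbo disabled by the user via sysfs */
};

/* Highest frequency (kHz) that may actually be requested right now. */
static unsigned int effective_max_freq(const struct turbo_state *ts,
				       unsigned int max_turbo_freq,
				       unsigned int max_nonturbo_freq)
{
	assert(max_nonturbo_freq <= max_turbo_freq);
	if (ts->turbo_disabled || ts->no_turbo)
		return max_nonturbo_freq;
	return max_turbo_freq;
}

/*
 * Clamp a policy maximum against the effective maximum computed here,
 * instead of trusting a value a ->verify callback may have written into
 * a temporary policy object.
 */
static unsigned int clamp_policy_max(const struct turbo_state *ts,
				     unsigned int policy_max,
				     unsigned int max_turbo_freq,
				     unsigned int max_nonturbo_freq)
{
	unsigned int max = effective_max_freq(ts, max_turbo_freq,
					      max_nonturbo_freq);

	return policy_max < max ? policy_max : max;
}
```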
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Active mode P-state limits rework · c5a2ee7d
      Committed by Rafael J. Wysocki
      The coordination of P-state limits used by intel_pstate in the active
      mode (ie. by default) is problematic, because it synchronizes all of
      the limits (ie. the global ones and the per-policy ones) so as to use
      one common pair of P-state limits (min and max) across all CPUs in
      the system.  The drawbacks of that are as follows:
      
       - If P-states are coordinated in hardware, it is not necessary
         to coordinate them in software on top of that, so in that case
         all of the above activity is in vain.
      
       - If P-states are not coordinated in hardware, then the processor
         is actually capable of setting different P-states for different
         CPUs and coordinating them at the software level simply doesn't
         allow that capability to be utilized.
      
       - The coordination works in such a way that setting a per-policy
         limit (eg. scaling_max_freq) for one CPU causes the common
         effective limit to change (and it will affect all of the other
         CPUs too), but subsequent reads from the corresponding sysfs
         attributes for the other CPUs will return stale values (which
         is confusing).
      
       - Reads from the global P-state limit attributes, min_perf_pct and
         max_perf_pct, return the effective common values and not the last
         values set through these attributes.  However, the last values
         set through these attributes become hard limits that cannot be
         exceeded by writes to scaling_min_freq and scaling_max_freq,
         respectively, and they are not exposed, so essentially users
         have to remember what they are.
      
      All of that is painful enough to warrant a change of the management
      of P-state limits in the active mode.
      
      To that end, redesign the active mode P-state limits management in
      intel_pstate in accordance with the following rules:
      
       (1) All CPUs are affected by the global limits (that is, none of
           them can be requested to run faster than the global max and
           none of them can be requested to run slower than the global
           min).
      
       (2) Each individual CPU is affected by its own per-policy limits
           (that is, it cannot be requested to run faster than its own
           per-policy max and it cannot be requested to run slower than
           its own per-policy min).
      
       (3) The global and per-policy limits can be set independently.
      
      Also, the global maximum and minimum P-state limits will always be
      expressed as percentages of the maximum supported turbo P-state.
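      Rules (1)-(3) can be illustrated with a small sketch that stores the
      global percentages and the per-policy P-state limits separately and
      only intersects them when one CPU's P-state range is needed.
      struct pstate_range and effective_range() are hypothetical; rounding
      the percentages down and letting the max win when the two ranges do
      not overlap are assumptions, not necessarily what the driver does.

```c
#include <assert.h>

/* Hypothetical per-CPU P-state range (illustration only). */
struct pstate_range {
	int min;
	int max;
};

/*
 * Rule (1): global limits, in percent of the max turbo P-state.
 * Rule (2): per-policy limits, already converted to P-states.
 * Rule (3): both are kept independently and only combined here.
 */
static struct pstate_range effective_range(int turbo_pstate,
					   int global_min_pct, int global_max_pct,
					   int policy_min_pstate,
					   int policy_max_pstate)
{
	struct pstate_range r;
	int gmin = turbo_pstate * global_min_pct / 100;
	int gmax = turbo_pstate * global_max_pct / 100;

	assert(global_min_pct <= global_max_pct);
	assert(policy_min_pstate <= policy_max_pstate);

	r.max = gmax < policy_max_pstate ? gmax : policy_max_pstate;
	r.min = gmin > policy_min_pstate ? gmin : policy_min_pstate;
	/* Ranges do not overlap: let the max win (assumption). */
	if (r.min > r.max)
		r.min = r.max;
	return r;
}
```

      For example, with a max turbo P-state of 32, global limits of 25-50%
      and a per-policy range of 10-20, the CPU ends up with P-states 10-16.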
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Use load-based P-state selection more widely · 55395345
      Committed by Rafael J. Wysocki
      Extend the set of systems for which intel_pstate will use the
      "powersave" P-state selection algorithm based on CPU load in the
      active mode to include systems with the ACPI preferred profile set
      to "tablet", "appliance PC", "desktop", or "workstation" (ie.
      everything with a specified preferred profile that is not a "server").
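      The selection rule amounts to "any specified profile that is not a
      server". A sketch under these assumptions: the enum values mirror the
      ACPI FADT Preferred_PM_Profile encoding but are redeclared here with
      hypothetical names, use_load_based() is hypothetical, and "mobile" is
      assumed to have been covered already before this change.

```c
#include <assert.h>
#include <stdbool.h>

/* ACPI FADT preferred PM profile values, redeclared for self-containment. */
enum acpi_pm_profile {
	PROFILE_UNSPECIFIED = 0,
	PROFILE_DESKTOP = 1,
	PROFILE_MOBILE = 2,
	PROFILE_WORKSTATION = 3,
	PROFILE_ENTERPRISE_SERVER = 4,
	PROFILE_SOHO_SERVER = 5,
	PROFILE_APPLIANCE_PC = 6,
	PROFILE_PERFORMANCE_SERVER = 7,
	PROFILE_TABLET = 8,
};

/* True if the load-based "powersave" algorithm should be used. */
static bool use_load_based(enum acpi_pm_profile p)
{
	switch (p) {
	case PROFILE_DESKTOP:
	case PROFILE_MOBILE:
	case PROFILE_WORKSTATION:
	case PROFILE_APPLIANCE_PC:
	case PROFILE_TABLET:
		return true;
	default: /* unspecified or any kind of server */
		return false;
	}
}
```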
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Support HWP processors in all operation modes · eb5139d1
      Committed by Rafael J. Wysocki
      Currently, some processors supporting HWP are only supported by
      intel_pstate if HWP is actually going to be used and not supported
      otherwise, which is confusing.
      
      Specifically, they are not supported if "intel_pstate=no_hwp" is
      passed to the kernel in the command line or if the driver is started
      in the passive mode ("intel_pstate=passive").
      
      There is no real reason for that, because everything about those
      processors is known anyway and the driver can work with them in all
      modes, so make that happen, but use the load-based P-state selection
      algorithm for the active mode "powersave" policy with them.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  5. 22 Mar 2017, 1 commit
    • cpufreq: intel_pstate: Fix policy data management in passive mode · 64897b20
      Committed by Rafael J. Wysocki
      The policy->cpuinfo.max_freq and policy->max updates in
      intel_cpufreq_turbo_update() are excessive as they are done for no
      good reason and may lead to problems in principle, so they should be
      dropped.  However, after dropping them intel_cpufreq_turbo_update()
      becomes almost entirely pointless, because the check made by it is
      made again down the road in intel_pstate_prepare_request().  The
      only thing in it that still needs to be done is the call to
      update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether
      and make its callers invoke update_turbo_state() directly instead of
      it.
      
      In addition to that, fix intel_cpufreq_verify_policy() so that it
      checks global.no_turbo in addition to global.turbo_disabled when
      updating policy->cpuinfo.max_freq to make it consistent with
      intel_pstate_verify_policy().
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 18 Mar 2017, 1 commit
    • cpufreq: intel_pstate: One set of global limits in active mode · 7de32556
      Committed by Rafael J. Wysocki
      In the active mode intel_pstate currently uses two sets of global
      limits, each associated with one of the possible scaling_governor
      settings in that mode: "powersave" or "performance".
      
      The driver switches over from one of those sets to the other
      depending on the scaling_governor setting of the CPU whose
      per-policy cpufreq interface in sysfs was most recently used to change
      parameters exposed in there.  That obviously leads to no end of
      issues when the scaling_governor settings differ between CPUs.
      
      The most recent issue was introduced by commit a240c4aa (cpufreq:
      intel_pstate: Do not reinit performance limits in ->setpolicy)
      that eliminated the reinitialization of "performance" limits in
      intel_pstate_set_policy() preventing the max limit from being set
      to anything below 100, among other things.
      
      Namely, an undesirable side effect of commit a240c4aa is that
      now, after setting scaling_governor to "performance" in the active
      mode, the per-policy limits for the CPU in question go to the highest
      level and stay there even when it is switched back to "powersave"
      later.
      
      As it turns out, some distributions set scaling_governor to
      "performance" temporarily for all CPUs to speed up system
      initialization, so that change causes them to misbehave later.
      
      To fix that, get rid of the performance/powersave global limits
      split and use just one set of global limits for everything.
      
      From the user's perspective, after this modification, when
      scaling_governor is switched from "performance" to "powersave"
      or the other way around on one CPU, the limits settings (ie. the
      global max/min_perf_pct and per-policy scaling_max/min_freq for
      any CPUs) will not change.  Still, switching from "performance"
      to "powersave" or the other way around changes the way in which
      P-states are selected and in particular "performance" causes the
      driver to always request the highest P-state it is allowed to ask
      for on the given CPU.
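      The resulting behavior (one set of limits, with the governor only
      changing how a P-state is picked inside it) can be sketched as
      follows. enum governor, struct limits and request_pstate() are
      hypothetical, and "load_based_pstate" stands in for whatever the
      "powersave" algorithm computes.

```c
#include <assert.h>

enum governor { GOV_POWERSAVE, GOV_PERFORMANCE };

/* Hypothetical: the single set of limits shared by both governors. */
struct limits {
	int min_pstate;
	int max_pstate;
};

/*
 * Pick the P-state to request. The limits are the same for both
 * governors; only the selection policy differs.
 */
static int request_pstate(enum governor gov, const struct limits *lim,
			  int load_based_pstate)
{
	assert(lim->min_pstate <= lim->max_pstate);
	if (gov == GOV_PERFORMANCE)
		return lim->max_pstate; /* highest P-state allowed */
	if (load_based_pstate < lim->min_pstate)
		return lim->min_pstate;
	if (load_based_pstate > lim->max_pstate)
		return lim->max_pstate;
	return load_based_pstate;
}
```

      Switching governors therefore never touches the limits themselves.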
      
      Fixes: a240c4aa (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  7. 15 Mar 2017, 1 commit
    • cpufreq: intel_pstate: Avoid percentages in limits-related computations · e4c204ce
      Committed by Rafael J. Wysocki
      Currently, intel_pstate_update_perf_limits() first converts the
      policy minimum and maximum limits into percentages of the maximum
      turbo frequency (rounding up to an integer) and then converts these
      percentages to fractions (by using fixed-point arithmetic to divide
      them by 100).
      
      That introduces a rounding error unnecessarily, because the fractions
      can be obtained by carrying out fixed-point divisions directly on the
      input numbers.
      
      Rework the computations in intel_pstate_hwp_set() to use fractions
      instead of percentages (and drop redundant local variables from
      there) and modify intel_pstate_update_perf_limits() to compute the
      fractions directly and percentages out of them.
      
      While at it, introduce percent_ext_fp() for converting percentages
      to fractions (with extended number of fraction bits) and use it in
      the computations.
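      The rounding error can be reproduced with a short sketch comparing
      both routes. The 14 fractional bits and the helper names
      (div_ext_fp(), percent_ext_fp(), frac_via_percent(), frac_direct())
      are assumptions modeled on the description above, not the driver's
      exact code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point layout: 14 fractional bits. */
#define EXT_FRAC_BITS 14

/* x / y as a fixed-point fraction with EXT_FRAC_BITS fractional bits. */
static uint64_t div_ext_fp(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return (x << EXT_FRAC_BITS) / y;
}

/* Percentage -> fraction, in the spirit of percent_ext_fp(). */
static uint64_t percent_ext_fp(uint64_t percent)
{
	return div_ext_fp(percent, 100);
}

/* Old route: frequency -> percentage (rounded up) -> fraction. */
static uint64_t frac_via_percent(uint64_t freq, uint64_t turbo_freq)
{
	uint64_t pct = (freq * 100 + turbo_freq - 1) / turbo_freq;

	return percent_ext_fp(pct);
}

/* New route: divide the input numbers directly. */
static uint64_t frac_direct(uint64_t freq, uint64_t turbo_freq)
{
	return div_ext_fp(freq, turbo_freq);
}
```

      With a 2GHz limit and a 3.2GHz max turbo frequency, the direct
      division yields exactly 0.625 (10240/16384), while the percentage
      route rounds up to 63% and yields about 0.63 (10321/16384).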
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 14 Mar 2017, 2 commits
    • cpufreq: intel_pstate: Correct frequency setting in the HWP mode · 3f8ed54a
      Committed by Srinivas Pandruvada
      In intel_pstate_hwp_set(), the min/max range from the HWP capabilities
      MSR, along with max_perf_pct and min_perf_pct, is used to set the HWP
      request MSR. In some cases this doesn't result in the correct HWP
      max/min in the HWP request.
      
      For example, consider the following case:
      
      HWP capabilities from MSR 0x771
      0x70a1220
      
      Here the cpufreq min/max frequencies derived from the above MSR dump
      are 700MHz and 3.2GHz, respectively.
      
      This will result in
      hwp_min = 0x07
      hwp_max = 0x20
      
      To limit max frequency to 2GHz:
      
      perf_limits->max_perf_pct = 63 (2GHz as a percent of 3.2GHz rounded up)
      
      With the current calculation:
      adj_range = max_perf_pct * range / 100;
      adj_range = 63 * (32 - 7) / 100
      adj_range = 15
      
      max = hwp_min + adj_range;
      max = 7 + 15 = 22
      
      This will result in HWP request of 0x160f, which will result in a
      frequency cap of 2.2GHz not 2GHz.
      
      The problem with the above calculation is that hwp_min of 7 is treated
      as 0% of the range, whereas max_perf_pct is calculated with 0 as the
      minimum and 3.2GHz (hwp_max) as the maximum, so adding hwp_min to it
      yields more than desired.
      
      Since min_perf_pct and max_perf_pct are already percentages of the max
      frequency (hwp_max), the min/max HWP request values can be calculated
      by applying these percentages directly to hwp_max.
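      The worked example can be reproduced with a short sketch that decodes
      the capabilities value 0x70a1220 and compares the old and the fixed
      calculation. The field layout follows the IA32_HWP_CAPABILITIES
      definition (highest performance in bits 7:0, lowest in bits 31:24);
      the helper names are hypothetical. As in the example, one ratio unit
      corresponds to 100MHz.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_HWP_CAPABILITIES fields: bits 7:0 highest, 31:24 lowest. */
static unsigned int hwp_highest(uint64_t cap) { return cap & 0xff; }
static unsigned int hwp_lowest(uint64_t cap) { return (cap >> 24) & 0xff; }

/* Old calculation: percentage applied to the [hwp_min, hwp_max] range. */
static unsigned int old_hwp_max(uint64_t cap, unsigned int max_perf_pct)
{
	unsigned int hwp_min = hwp_lowest(cap);
	unsigned int hwp_max = hwp_highest(cap);
	unsigned int adj_range = max_perf_pct * (hwp_max - hwp_min) / 100;

	return hwp_min + adj_range;
}

/* Fixed calculation: percentage applied directly to hwp_max. */
static unsigned int new_hwp_max(uint64_t cap, unsigned int max_perf_pct)
{
	assert(max_perf_pct <= 100);
	return max_perf_pct * hwp_highest(cap) / 100;
}
```

      For max_perf_pct = 63, the old formula gives 22 (2.2GHz) and the fixed
      one gives 20 (2.0GHz), matching the numbers above.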
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set() · 6e7408ac
      Committed by Rafael J. Wysocki
      Fix the debugfs interface for PID tuning to actually update
      pid_params.sample_rate_ns on PID parameters updates, as changing
      pid_params.sample_rate_ms via debugfs has no effect now.
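      The fix boils down to keeping a derived value in sync with the one
      users write. A minimal sketch, assuming the sampling path only reads
      the nanosecond copy; struct pid_params_sketch and set_sample_rate_ms()
      are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical mirror of the two related PID parameters. */
struct pid_params_sketch {
	int sample_rate_ms;      /* written via debugfs */
	uint64_t sample_rate_ns; /* read by the utilization update path */
};

/* Setter for sample_rate_ms that also refreshes the derived value. */
static void set_sample_rate_ms(struct pid_params_sketch *p, int ms)
{
	assert(ms > 0);
	p->sample_rate_ms = ms;
	p->sample_rate_ns = (uint64_t)ms * NSEC_PER_MSEC;
}
```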
      
      Fixes: a4675fbc (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  9. 13 Mar 2017, 1 commit
  10. 06 Mar 2017, 3 commits
    • cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy · a240c4aa
      Committed by Rafael J. Wysocki
      If the current P-state selection algorithm is set to "performance"
      in intel_pstate_set_policy(), the limits may be initialized from
      scratch, but only if no_turbo is not set and the maximum frequency
      allowed for the given CPU (i.e. the policy object representing it)
      is at least equal to the max frequency supported by the CPU.  In all
      of the other cases, the limits will not be updated.
      
      For example, the following can happen:
      
       # cat intel_pstate/status
       active
       # echo performance > cpufreq/policy0/scaling_governor
       # cat intel_pstate/min_perf_pct
       100
       # echo 94 > intel_pstate/min_perf_pct
       # cat intel_pstate/min_perf_pct
       100
       # cat cpufreq/policy0/scaling_max_freq
       3100000
       # echo 3000000 > cpufreq/policy0/scaling_max_freq
       # cat intel_pstate/min_perf_pct
       94
       # echo 95 > intel_pstate/min_perf_pct
       # cat intel_pstate/min_perf_pct
       95
      
      That is confusing for two reasons.  First, the initial attempt to
      change min_perf_pct to 94 seems to have no effect, even though
      setting the global limits should always work.  Second, after
      changing scaling_max_freq for policy0 the global min_perf_pct
      attribute shows 94, even though it should not have been affected
      by that operation in principle.
      
      Moreover, the final attempt to change min_perf_pct to 95 worked
      as expected, because scaling_max_freq for the only policy with
      scaling_governor equal to "performance" was different from the
      maximum at that time.
      
      To make all that confusion go away, modify intel_pstate_set_policy()
      so that it doesn't reinitialize the limits at all.
      
      At the same time, change intel_pstate_set_performance_limits() to
      set min_sysfs_pct to 100 in the "performance" limits set so that
      switching the P-state selection algorithm to "performance" causes
      intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
      min_sysfs_pct in the "performance" limits is set to later).
      
      That requires per-CPU limits to be initialized explicitly rather
      than by copying the global limits to avoid setting min_sysfs_pct
      in the per-CPU limits to 100.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Fix intel_pstate_verify_policy() · d74b1992
      Committed by Rafael J. Wysocki
      The code added to intel_pstate_verify_policy() by commit 1443ebba
      (cpufreq: intel_pstate: Fix sysfs limits enforcement for performance
      policy) should use perf_limits instead of limits, because otherwise
      setting global limits via sysfs may affect policies inconsistently.
      
      For example, in the sequence of shell commands below, the
      scaling_min_freq attribute for policy1 and policy2 should be
      affected in the same way, because scaling_governor is set in
      the same way for both of them:
      
       # cat cpufreq/policy1/scaling_governor
       powersave
       # cat cpufreq/policy2/scaling_governor
       powersave
       # echo performance > cpufreq/policy0/scaling_governor
       # echo 94 > intel_pstate/min_perf_pct
       # cat cpufreq/policy0/scaling_min_freq
       2914000
       # cat cpufreq/policy1/scaling_min_freq
       2914000
       # cat cpufreq/policy2/scaling_min_freq
       800000
      
      They are affected differently, because intel_pstate_verify_policy()
      is invoked with limits set to &performance_limits (left behind by
      policy0) for policy1 and with limits set to &powersave_limits (left
      behind by policy1) for policy2.  Since perf_limits is set to the
      set of limits matching the policy being updated, using it instead
      of limits fixes the inconsistency.
      
      Fixes: 1443ebba (cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Fix global settings in active mode · cd59b4be
      Committed by Rafael J. Wysocki
      Commit 111b8b3f (cpufreq: intel_pstate: Always keep all
      limits settings in sync) changed intel_pstate to invoke
      cpufreq_update_policy() for every registered CPU on global sysfs
      attributes updates, but that led to undesirable effects in the
      active mode if the "performance" P-state selection algorithm is
      configured for one CPU and the "powersave" one is chosen for
      all of the other CPUs.
      
      Namely, in that case, the following is possible:
      
       # cd /sys/devices/system/cpu/
       # cat intel_pstate/max_perf_pct
       100
       # cat intel_pstate/min_perf_pct
       26
       # echo performance > cpufreq/policy0/scaling_governor
       # cat intel_pstate/max_perf_pct
       100
       # cat intel_pstate/min_perf_pct
       100
       # echo 94 > intel_pstate/min_perf_pct
       # cat intel_pstate/min_perf_pct
       26
      
      This happens because intel_pstate attempts to maintain two sets of
      global limits in the active mode, one for the "performance" P-state
      selection algorithm and one for the "powersave" P-state selection
      algorithm, but the P-state selection algorithms are set per policy,
      so the global limits cannot reflect all of them at the same time if
      they are different for different policies.
      
      In the particular situation above, the attempt to change
      min_perf_pct to 94 caused cpufreq_update_policy() to be run
      for a CPU with the "powersave" P-state selection algorithm
      and intel_pstate_set_policy() called by it silently switched the
      global limits to the "powersave" set which finally was reflected
      by the sysfs interface.
      
      To prevent that from happening, modify intel_pstate_update_policies()
      to always switch back to the set of limits that was used right before
      it has been invoked.
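      The save-and-restore pattern can be sketched like this: remember which
      set of limits is active, run the per-CPU updates (any of which may
      switch the set as a side effect), then put the remembered set back.
      struct limits_set, switch_to_powersave() and update_policies() are
      hypothetical stand-ins; the numbers are taken from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one set of global limits. */
struct limits_set {
	int min_perf_pct;
	int max_perf_pct;
};

static struct limits_set performance_limits = { 100, 100 };
static struct limits_set powersave_limits = { 26, 100 };

/* Pointer to the set of global limits currently in use. */
static struct limits_set *limits = &powersave_limits;

/* Example per-CPU update that, like ->setpolicy for a "powersave" CPU,
 * silently switches the active set as a side effect. */
static void switch_to_powersave(int cpu)
{
	(void)cpu;
	limits = &powersave_limits;
}

/* Run an update for every CPU, then restore the set that was in use. */
static void update_policies(int ncpus, void (*update)(int cpu))
{
	struct limits_set *saved = limits;
	int cpu;

	assert(update != NULL);
	for (cpu = 0; cpu < ncpus; cpu++)
		update(cpu);

	limits = saved;
}
```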
      
      Fixes: 111b8b3f (cpufreq: intel_pstate: Always keep all limits settings in sync)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  11. 04 Mar 2017, 3 commits
    • cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily · 64078299
      Committed by Rafael J. Wysocki
      In the passive mode the cpu_frequency trace event is already
      triggered by the cpufreq core or by scaling governors, so
      intel_pstate should not trigger it once again for the same
      P-state updates.
      
      In addition to that, the frequency returned by
      intel_cpufreq_fast_switch() and passed via freqs.new from
      intel_cpufreq_target() to cpufreq_freq_transition_end() should
      reflect the P-state actually set, so make that happen.
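      The idea of reporting the frequency of the P-state actually set,
      rather than the requested target, can be sketched as below. The
      assumptions: struct cpu_pstates and fast_switch_sketch() are
      hypothetical, P-states map linearly to frequency through a scaling
      factor, and rounding the target up is an illustrative choice.

```c
#include <assert.h>

/* Hypothetical per-CPU P-state data (illustration only). */
struct cpu_pstates {
	int min_pstate;
	int max_pstate;
	unsigned int scaling; /* kHz per P-state unit */
};

/* Convert the target to a P-state, clamp it, and report what was set. */
static unsigned int fast_switch_sketch(const struct cpu_pstates *c,
				       unsigned int target_freq)
{
	int pstate;

	assert(c->scaling > 0);
	/* Round up so the P-state is not below the target (assumption). */
	pstate = (int)((target_freq + c->scaling - 1) / c->scaling);
	if (pstate < c->min_pstate)
		pstate = c->min_pstate;
	if (pstate > c->max_pstate)
		pstate = c->max_pstate;

	/* Frequency of the clamped P-state, not target_freq. */
	return (unsigned int)pstate * c->scaling;
}
```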
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy() · 7f17326f
      Committed by Rafael J. Wysocki
      The intel_pstate_update_perf_limits() call in
      intel_cpufreq_verify_policy() may cause the global P-state limits
      to change, which is generally confusing and unnecessary.
      
      In the passive mode the global limits are only applied to the
      frequency selected by the scaling governor (they are not taken
      into account by governors when making decisions anyway), so making
      them follow the per-policy limits serves no purpose and may go
      against user expectations (as it generally causes the global
      attributes in sysfs to change even though they have not been
      written to in some cases).
      
      Fix that by dropping the intel_pstate_update_perf_limits()
      invocation from intel_cpufreq_verify_policy() (which also
      reduces the code size by a few lines).
      
      This change does not affect the per-CPU limits case, because those
      limits allow any P-state to be set by default in the passive mode
      and it removes the only piece of code updating them in that mode,
      so the per-policy settings will be the only ones taken into account
      in that case as expected.
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Do not use performance_limits in passive mode · 2bc756e7
      Committed by Rafael J. Wysocki
      Using performance_limits in the passive mode doesn't make
      sense, because in that mode the global limits are applied to the
      frequency selected by the scaling governor.
      
      The maximum and minimum P-state limits in performance_limits are both
      set to 100 percent, which will put all CPUs into the turbo range
      regardless of what governor is used and what frequencies are
      selected by it (that is particularly undesirable on CPUs with the
      generic powersave governor attached).
      
      For this reason, make intel_pstate_register_driver() always point
      limits to powersave_limits in the passive mode.
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  12. 02 Mar 2017, 1 commit
  13. 01 Mar 2017, 1 commit
  14. 28 Feb 2017, 1 commit
  15. 04 Feb 2017, 3 commits
    • cpufreq: intel_pstate: Disable energy efficiency optimization · 6e978b22
      Committed by Srinivas Pandruvada
      Some Kabylake desktop processors may not reach max turbo when running in
      HWP mode, even if running under sustained 100% utilization.
      
      This occurs when the HWP.EPP (Energy Performance Preference) is set to
      "balance_power" (0x80) -- the default on most systems.
      
      It occurs because the platform BIOS may erroneously enable an
      energy-efficiency setting -- MSR_IA32_POWER_CTL BIT_EE, which is not
      recommended to be enabled on this SKU.
      
      On the failing systems, this BIOS issue was not discovered when the
      desktop motherboard was tested with Windows, because the BIOS also
      neglects to provide the ACPI/CPPC table, which Windows requires to enable
      HWP, and so Windows runs in legacy P-state mode, where this setting has
      no effect.
      
      Linux's intel_pstate driver does not require ACPI/CPPC to enable HWP, and
      so it runs in HWP mode, exposing this incorrect BIOS configuration.
      
      There are several ways to address this problem.
      
      First, Linux can also run in legacy P-state mode on this system.
      As intel_pstate is how Linux enables HWP, booting with
      "intel_pstate=disable" will run the system in acpi-cpufreq/ondemand
      legacy P-state mode.
      
      Or second, the "performance" governor can be used with intel_pstate,
      which will modify HWP.EPP to 0.
      
      Or third, starting in 4.10, the
      /sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
      attribute can be updated from "balance_power" to "performance".
      
      Or fourth, apply this patch, which fixes the erroneous setting of
      MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default
      configuration to function as designed.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Reviewed-by: Len Brown <len.brown@intel.com>
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Calculate guaranteed performance for HWP · 8fc7554a
      Committed by Srinivas Pandruvada
      When HWP is active, the turbo activation ratio is not used to
      calculate the max non-turbo ratio; on these systems, the max non-turbo
      ratio is determined by the config TDP settings.

      This change stops using MSR_TURBO_ACTIVATION_RATIO on HWP systems and
      uses the TDP ratios directly instead when more than one TDP level is
      available.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Make HWP limits compatible with legacy · 4e5d3f71
      Committed by Srinivas Pandruvada
      Under HWP, the performance limits are calculated from max_perf_pct
      and min_perf_pct with respect to the possible performance rather than
      the available performance, which can be reduced by the no_turbo
      setting. To make HWP compatible with the legacy mode, apply the max/min
      performance percentages to the available performance instead.
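      A minimal sketch of "percentage of available performance": with
      no_turbo set, the top of the range becomes the max non-turbo ratio
      instead of the highest (turbo) ratio. hwp_max_from_pct() and its
      parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Scale a max_perf_pct-style percentage against available performance. */
static unsigned int hwp_max_from_pct(unsigned int pct, bool no_turbo,
				     unsigned int highest_ratio,
				     unsigned int max_nonturbo_ratio)
{
	unsigned int avail = no_turbo ? max_nonturbo_ratio : highest_ratio;

	assert(pct <= 100);
	assert(max_nonturbo_ratio <= highest_ratio);
	return avail * pct / 100;
}
```

      With a highest ratio of 32 and a max non-turbo ratio of 24, 50% maps
      to 16 with turbo available and to 12 with no_turbo set.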
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>