1. 20 Apr 2017, 4 commits
  2. 18 Apr 2017, 1 commit
  3. 13 Apr 2017, 1 commit
    • cpufreq: Bring CPUs up even if cpufreq_online() failed · c4a3fa26
      Committed by Chen Yu
      There is a report that after commit 27622b06 ("cpufreq: Convert
      to hotplug state machine"), the normal CPU offline/online cycle
      fails on some platforms.
      
      According to the ftrace result, this problem was triggered on
      platforms using acpi-cpufreq as the default cpufreq driver:
      because some ACPI frequency method (e.g. _PCT) was missing,
      cpufreq_online() failed and returned a negative value, so the CPU
      hotplug state machine rolled back the CPU online process.  Actually,
      from the user's perspective, the failure of cpufreq_online() should
      not prevent that CPU from being brought up, although cpufreq might
      not work on that CPU.
      
      BTW, during system startup cpufreq_online() is not invoked via CPU
      online but by the cpufreq device creation process, so the APs can be
      brought up even though cpufreq_online() fails in that stage.
      
      This patch ignores the return value of cpufreq_online/offline() and
      lets the cpufreq framework deal with the failure.  cpufreq_online()
      itself will do a proper rollback in that case and if _PCT is missing,
      the ACPI cpufreq driver will print a warning if the corresponding
      debug options have been enabled.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581
      Fixes: 27622b06 ("cpufreq: Convert to hotplug state machine")
      Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
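The effect of ignoring the return value can be illustrated with a toy model of a hotplug state machine. This is only a sketch in Python, not kernel code: `bring_cpu_online`, `cpufreq_online_stub`, and the step names are hypothetical, and the only behavior taken from the commit message is that a negative return from a startup step rolls the CPU online process back.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], int]]


def bring_cpu_online(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run startup steps in order; a negative return undoes everything."""
    completed: List[str] = []
    for name, startup in steps:
        if startup() < 0:
            # Toy rollback: the CPU stays offline and no step remains applied.
            return False, []
        completed.append(name)
    return True, completed


def cpufreq_online_stub() -> int:
    """Hypothetical stand-in for a cpufreq_online() that fails."""
    return -19  # a negative errno-style value (ENODEV is 19 on Linux)


def cpufreq_step_propagating_error() -> int:
    # Behavior described as broken: the failure reaches the state machine.
    return cpufreq_online_stub()


def cpufreq_step_ignoring_error() -> int:
    # Behavior described by the fix: the failure is left to the cpufreq
    # framework and the hotplug step itself reports success.
    cpufreq_online_stub()
    return 0


other_step: Step = ("scheduler", lambda: 0)
print(bring_cpu_online([other_step, ("cpufreq", cpufreq_step_propagating_error)]))
print(bring_cpu_online([other_step, ("cpufreq", cpufreq_step_ignoring_error)]))
```

In the first call the CPU does not come up because the cpufreq step fails; in the second it does, although cpufreq may not be functional on that CPU.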
  4. 30 Mar 2017, 1 commit
  5. 29 Mar 2017, 16 commits
  6. 28 Mar 2017, 1 commit
    • cpufreq: Fix creation of symbolic links to policy directories · 2f0ba790
      Committed by Rafael J. Wysocki
      The cpufreq core only tries to create symbolic links from CPU
      directories in sysfs to policy directories in cpufreq_add_dev(),
      either when a given CPU is registered or when the cpufreq driver
      is registered, whichever happens first.  That is not sufficient,
      however, because cpufreq_add_dev() may be called for an offline CPU
      whose policy object has not been created yet and, quite obviously,
      the symbolic link cannot be added in that case.
      
      Fix that by making cpufreq_online() attempt to add symbolic links to
      policy objects for the CPUs in the related_cpus mask of every new
      policy object created by it.
      
      The cpufreq_driver_lock locking around the for_each_cpu() loop
      in cpufreq_online() is dropped, because it is not necessary and the
      code is somewhat simpler without it.  Moreover, failures to create
      a symbolic link will not be regarded as hard errors any more and
      the CPUs without those links will not be taken offline automatically,
      but that should not be problematic in practice.

      Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
  7. 24 Mar 2017, 4 commits
    • cpufreq: intel_pstate: Avoid transient updates of cpuinfo.max_freq · 80b120ca
      Committed by Rafael J. Wysocki
      Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
      set policy->cpuinfo.max_freq depending on the turbo status, but the
      updates made by them are discarded by the core, because the policy
      object passed to them by the core is temporary and cpuinfo.max_freq
      from that object is not copied to the final policy object in
      cpufreq_set_policy().
      
      However, cpufreq_set_policy() passes the temporary policy object
      to the ->setpolicy callback of the driver, so intel_pstate_set_policy()
      actually sees the policy->cpuinfo.max_freq value updated by
      intel_pstate_verify_policy() and not the final one.  It also
      sometimes updates policy->max, which has no effect after it
      returns, because the core discards that update.
      
      To avoid confusion, eliminate policy->cpuinfo.max_freq updates from
      intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
      entirely and check the maximum frequency explicitly in
      intel_pstate_update_perf_limits() instead of relying on the
      transiently updated policy->cpuinfo.max_freq value.
      
      Moreover, move the policy->max adjustment carried out in
      intel_pstate_set_policy() to a separate function and call that
      function from the ->verify driver callbacks to ensure that it will
      actually be effective.

      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Active mode P-state limits rework · c5a2ee7d
      Committed by Rafael J. Wysocki
      The coordination of P-state limits used by intel_pstate in the active
      mode (i.e. by default) is problematic, because it synchronizes all of
      the limits (i.e. the global ones and the per-policy ones) so as to use
      one common pair of P-state limits (min and max) across all CPUs in
      the system.  The drawbacks of that are as follows:
      
       - If P-states are coordinated in hardware, it is not necessary
         to coordinate them in software on top of that, so in that case
         all of the above activity is in vain.
      
       - If P-states are not coordinated in hardware, then the processor
         is actually capable of setting different P-states for different
         CPUs and coordinating them at the software level simply doesn't
         allow that capability to be utilized.
      
       - The coordination works in such a way that setting a per-policy
         limit (e.g. scaling_max_freq) for one CPU causes the common
         effective limit to change (and it will affect all of the other
         CPUs too), but subsequent reads from the corresponding sysfs
         attributes for the other CPUs will return stale values (which
         is confusing).
      
       - Reads from the global P-state limit attributes, min_perf_pct and
         max_perf_pct, return the effective common values and not the last
         values set through these attributes.  However, the last values
         set through these attributes become hard limits that cannot be
         exceeded by writes to scaling_min_freq and scaling_max_freq,
         respectively, and they are not exposed, so essentially users
         have to remember what they are.
      
      All of that is painful enough to warrant a change of the management
      of P-state limits in the active mode.
      
      To that end, redesign the active mode P-state limits management in
      intel_pstate in accordance with the following rules:
      
       (1) All CPUs are affected by the global limits (that is, none of
           them can be requested to run faster than the global max and
           none of them can be requested to run slower than the global
           min).
      
       (2) Each individual CPU is affected by its own per-policy limits
           (that is, it cannot be requested to run faster than its own
           per-policy max and it cannot be requested to run slower than
           its own per-policy min).
      
       (3) The global and per-policy limits can be set independently.
      
      Also, the global maximum and minimum P-state limits will always be
      expressed as percentages of the maximum supported turbo P-state.

      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
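Rules (1) to (3) above amount to intersecting two independent ranges for each CPU. The following Python sketch illustrates that reading; `effective_limits` is a hypothetical helper, the values are plain P-state numbers chosen for illustration, and the handling of an empty intersection (pulling the minimum down to the maximum) is an assumption, not something the commit message specifies.

```python
from typing import Tuple


def effective_limits(global_min: int, global_max: int,
                     policy_min: int, policy_max: int) -> Tuple[int, int]:
    """Combine global and per-policy P-state limits for one CPU."""
    # Rules (1) and (2): neither the global nor the per-policy min/max
    # may be exceeded, so take the tighter bound on each side.
    low = max(global_min, policy_min)
    high = min(global_max, policy_max)
    # Assumption for this sketch: if the ranges do not overlap,
    # the maximum wins and the minimum is lowered to match it.
    low = min(low, high)
    return low, high


# Rule (3): the two sets of limits are stored independently, so changing
# one CPU's policy leaves the inputs for every other CPU untouched.
global_limits = (8, 30)
policies = {0: (12, 24), 1: (8, 32)}
for cpu, (pmin, pmax) in policies.items():
    print(cpu, effective_limits(*global_limits, pmin, pmax))
```

With a global range of 8 to 30, CPU 0 ends up limited to 12 to 24 by its own policy, while CPU 1 is limited to 8 to 30 by the global maximum.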
    • cpufreq: intel_pstate: Use load-based P-state selection more widely · 55395345
      Committed by Rafael J. Wysocki
      Extend the set of systems for which intel_pstate will use the
      "powersave" P-state selection algorithm based on CPU load in the
      active mode by systems with ACPI preferred profile set to "tablet",
      "appliance PC", "desktop", or "workstation" (i.e. everything with a
      specified preferred profile that is not a "server").

      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Support HWP processors in all operation modes · eb5139d1
      Committed by Rafael J. Wysocki
      Currently, some processors supporting HWP are only supported by
      intel_pstate if HWP is actually going to be used and not supported
      otherwise, which is confusing.
      
      Specifically, they are not supported if "intel_pstate=no_hwp" is
      passed to the kernel in the command line or if the driver is started
      in the passive mode ("intel_pstate=passive").
      
      There is no real reason for that, because everything about those
      processors is known anyway and the driver can work with them in all
      modes, so make that happen, but use the load-based P-state selection
      algorithm for the active mode "powersave" policy with them.

      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 22 Mar 2017, 2 commits
    • cpufreq: Restore policy min/max limits on CPU online · ff010472
      Committed by Viresh Kumar
      On CPU online the cpufreq core restores the previous governor (or
      the previous "policy" setting for ->setpolicy drivers), but it does
      not restore the min/max limits at the same time, which is confusing,
      inconsistent and a real pain for users who set the limits and then
      suspend/resume the system (using full suspend), in which case the
      limits are reset on all CPUs except for the boot one.
      
      Fix this by making cpufreq_online() restore the limits when an inactive
      policy is brought online.
      
      The commit log and patch are inspired by Rafael's earlier work.

      Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
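The behavior change described in this commit can be modeled roughly as follows. This is a Python sketch under stated assumptions, not the kernel implementation: the `Policy` fields, `driver_init`, and `bring_online` are hypothetical names, and the assumption that driver initialization resets the limits to the full hardware range is made only to reproduce the symptom the message describes.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    hw_min: int        # lowest frequency the hardware supports (kHz)
    hw_max: int        # highest frequency the hardware supports (kHz)
    scaling_min: int   # limit currently in effect
    scaling_max: int   # limit currently in effect
    user_min: int      # last limit requested by the user
    user_max: int      # last limit requested by the user


def driver_init(policy: Policy) -> None:
    # Assumed for this sketch: initialization resets the limits
    # to the full hardware range.
    policy.scaling_min = policy.hw_min
    policy.scaling_max = policy.hw_max


def bring_online(policy: Policy, new_policy: bool,
                 restore_limits: bool = True) -> None:
    driver_init(policy)
    # The fix as described: when an existing (inactive) policy comes
    # back online, put the user's limits back as well.
    if not new_policy and restore_limits:
        policy.scaling_min = policy.user_min
        policy.scaling_max = policy.user_max


p = Policy(800000, 3200000, 1000000, 2000000, 1000000, 2000000)
bring_online(p, new_policy=False)
print(p.scaling_min, p.scaling_max)
```

Passing `restore_limits=False` reproduces the old symptom, where the limits fall back to the hardware range after the CPU comes back online.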
    • cpufreq: intel_pstate: Fix policy data management in passive mode · 64897b20
      Committed by Rafael J. Wysocki
      The policy->cpuinfo.max_freq and policy->max updates in
      intel_cpufreq_turbo_update() are excessive as they are done for no
      good reason and may lead to problems in principle, so they should be
      dropped.  However, after dropping them intel_cpufreq_turbo_update()
      becomes almost entirely pointless, because the check made by it is
      made again down the road in intel_pstate_prepare_request().  The
      only thing in it that still needs to be done is the call to
      update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether
      and make its callers invoke update_turbo_state() directly instead of
      it.
      
      In addition to that, fix intel_cpufreq_verify_policy() so that it
      checks global.no_turbo in addition to global.turbo_disabled when
      updating policy->cpuinfo.max_freq to make it consistent with
      intel_pstate_verify_policy().
      
      Fixes: 001c76f0 (cpufreq: intel_pstate: Generic governors support)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  9. 18 Mar 2017, 1 commit
    • cpufreq: intel_pstate: One set of global limits in active mode · 7de32556
      Committed by Rafael J. Wysocki
      In the active mode intel_pstate currently uses two sets of global
      limits, each associated with one of the possible scaling_governor
      settings in that mode: "powersave" or "performance".
      
      The driver switches over from one of those sets to the other
      depending on the scaling_governor setting for the CPU whose
      per-policy cpufreq interface in sysfs was last used to change
      parameters exposed in there.  That obviously leads to no end of
      issues when the scaling_governor settings differ between CPUs.
      
      The most recent issue was introduced by commit a240c4aa (cpufreq:
      intel_pstate: Do not reinit performance limits in ->setpolicy)
      that eliminated the reinitialization of "performance" limits in
      intel_pstate_set_policy() preventing the max limit from being set
      to anything below 100, among other things.
      
      Namely, an undesirable side effect of commit a240c4aa is that
      now, after setting scaling_governor to "performance" in the active
      mode, the per-policy limits for the CPU in question go to the highest
      level and stay there even when it is switched back to "powersave"
      later.
      
      As it turns out, some distributions set scaling_governor to
      "performance" temporarily for all CPUs to speed up system
      initialization, so that change causes them to misbehave later.
      
      To fix that, get rid of the performance/powersave global limits
      split and use just one set of global limits for everything.
      
      From the user's perspective, after this modification, when
      scaling_governor is switched from "performance" to "powersave"
      or the other way around on one CPU, the limits settings (i.e. the
      global max/min_perf_pct and per-policy scaling_max/min_freq for
      any CPUs) will not change.  Still, switching from "performance"
      to "powersave" or the other way around changes the way in which
      P-states are selected and in particular "performance" causes the
      driver to always request the highest P-state it is allowed to
      request for the given CPU.
      
      Fixes: a240c4aa (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  10. 16 Mar 2017, 2 commits
  11. 15 Mar 2017, 1 commit
    • cpufreq: intel_pstate: Avoid percentages in limits-related computations · e4c204ce
      Committed by Rafael J. Wysocki
      Currently, intel_pstate_update_perf_limits() first converts the
      policy minimum and maximum limits into percentages of the maximum
      turbo frequency (rounding up to an integer) and then converts these
      percentages to fractions (by using fixed-point arithmetic to divide
      them by 100).
      
      That introduces a rounding error unnecessarily, because the fractions
      can be obtained by carrying out fixed-point divisions directly on the
      input numbers.
      
      Rework the computations in intel_pstate_hwp_set() to use fractions
      instead of percentages (and drop redundant local variables from
      there) and modify intel_pstate_update_perf_limits() to compute the
      fractions directly and percentages out of them.
      
      While at it, introduce percent_ext_fp() for converting percentages
      to fractions (with extended number of fraction bits) and use it in
      the computations.

      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
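The rounding error mentioned in this commit can be reproduced with a small fixed-point experiment. The Python sketch below is illustrative only: `fraction_via_percent` and `fraction_direct` are hypothetical helpers, the choice of 14 fraction bits is an assumption made for the example, and the frequencies (2.0 GHz out of 3.2 GHz) are sample inputs rather than values from the commit.

```python
EXT_FRAC_BITS = 14  # assumed number of fraction bits for this sketch


def fraction_via_percent(policy_khz: int, turbo_khz: int) -> int:
    """Old approach: round up to a whole percent, then divide by 100."""
    percent = (policy_khz * 100 + turbo_khz - 1) // turbo_khz  # ceiling
    return (percent << EXT_FRAC_BITS) // 100


def fraction_direct(policy_khz: int, turbo_khz: int) -> int:
    """New approach: one fixed-point division on the input numbers."""
    return (policy_khz << EXT_FRAC_BITS) // turbo_khz


def as_float(fixed_point: int) -> float:
    """Convert a fixed-point fraction to a float for display."""
    return fixed_point / (1 << EXT_FRAC_BITS)


policy_khz, turbo_khz = 2_000_000, 3_200_000
print(as_float(fraction_via_percent(policy_khz, turbo_khz)))
print(as_float(fraction_direct(policy_khz, turbo_khz)))
```

For these inputs the direct division yields exactly 0.625, whereas going through a rounded-up percentage (63%) yields roughly 0.630, which is the kind of avoidable error the commit removes.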
  12. 14 Mar 2017, 2 commits
    • cpufreq: intel_pstate: Correct frequency setting in the HWP mode · 3f8ed54a
      Committed by Srinivas Pandruvada
      In intel_pstate_hwp_set(), the min/max range from the HWP capabilities
      MSR, along with max_perf_pct and min_perf_pct, is used to set the HWP
      request MSR. In some cases this doesn't result in the correct HWP
      max/min in the HWP request.
      
      For example, consider the following case:
      
      HWP capabilities from MSR 0x771
      0x70a1220
      
      Here the cpufreq min/max frequencies from the above MSR dump are 700MHz
      and 3.2GHz, respectively.
      
      This will result in
      hwp_min = 0x07
      hwp_max = 0x20
      
      To limit max frequency to 2GHz:
      
      perf_limits->max_perf_pct = 63 (2GHz as a percent of 3.2GHz rounded up)
      
      With the current calculation:
      adj_range = max_perf_pct * range / 100;
      adj_range = 63 * (32 - 7) / 100
      adj_range = 15
      
      max = hw_min + adj_range;
      max = 7 + 15 = 22
      
      This will result in an HWP request of 0x160f, which will result in a
      frequency cap of 2.2GHz, not 2GHz.
      
      The problem with the above calculation is that hwp_min of 7 is treated
      as 0% in the range. But max_perf_pct is calculated with respect to a
      minimum of 0 and a maximum of 3.2GHz or hwp_max, so adding hwp_min to
      it yields more than desired.
      
      Since min_perf_pct and max_perf_pct are already percentages of the max
      frequency or hwp_max, the min/max HWP request values can be calculated
      by directly applying these percentages to hwp_max.

      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
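The arithmetic in this commit message can be replayed in a few lines. The Python sketch below uses the numbers from the example above; the function names are hypothetical, the byte positions used to decode the capabilities value are inferred from the hwp_min/hwp_max values quoted in the message, and truncating integer division is assumed for the corrected calculation (the actual patch may round differently).

```python
def decode_hwp_caps(caps: int) -> tuple:
    """Extract (hwp_min, hwp_max) from the capabilities value.

    Assumed layout for this sketch: lowest performance in bits 31:24,
    highest performance in bits 7:0.
    """
    return (caps >> 24) & 0xFF, caps & 0xFF


def hwp_max_old(hw_min: int, hw_max: int, max_perf_pct: int) -> int:
    """Old calculation: apply the percentage to the min..max range."""
    adj_range = max_perf_pct * (hw_max - hw_min) // 100
    return hw_min + adj_range


def hwp_max_new(hw_max: int, max_perf_pct: int) -> int:
    """Corrected idea: apply the percentage directly to hwp_max."""
    return hw_max * max_perf_pct // 100


hw_min, hw_max = decode_hwp_caps(0x70A1220)
print(hex(hw_min), hex(hw_max))
print(hwp_max_old(hw_min, hw_max, 63))
print(hwp_max_new(hw_max, 63))
```

With max_perf_pct = 63 the old formula gives 22 (a 2.2GHz cap in this example), while applying the percentage to hwp_max gives 20, which corresponds to the intended 2GHz.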
    • cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set() · 6e7408ac
      Committed by Rafael J. Wysocki
      Fix the debugfs interface for PID tuning to actually update
      pid_params.sample_rate_ns on PID parameters updates, as changing
      pid_params.sample_rate_ms via debugfs has no effect now.
      
      Fixes: a4675fbc (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  13. 13 Mar 2017, 4 commits