1. 13 10月, 2016 2 次提交
  2. 10 10月, 2016 2 次提交
    • R
      cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance() · f00593a4
      Rafael J. Wysocki 提交于
      Make the comment explaining the meaning of the perf_scaled variable
      in get_target_pstate_use_performance() more straightforward.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f00593a4
    • S
      cpufreq: intel_pstate: Fix unsafe HWP MSR access · f9f4872d
      Srinivas Pandruvada 提交于
      This is a requirement that MSR MSR_PM_ENABLE must be set to 0x01 before
      reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init() is
      scheduled on a CPU which is not same as policy->cpu or migrates to a
      different CPU before calling msr read for MSR_HWP_CAPABILITIES, it
      is possible that MSR_PM_ENABLE was not to set to 0x01 on that CPU.
      This will cause GP fault. So like other places in this path
      rdmsrl_on_cpu should be used instead of rdmsrl.
      
      Moreover the scope of MSR_HWP_CAPABILITIES is on per thread basis, so it
      should be read from the same CPU, for which MSR MSR_HWP_REQUEST is
      getting set.
      
      dmesg dump or warning:
      
      [   22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014492] unchecked MSR access error: RDMSR from 0x771
      [   22.014493] Modules linked in:
      [   22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1
      ...
      ...
      [   22.014516] Call Trace:
      [   22.014542]  [<ffffffff813d7dd1>] dump_stack+0x63/0x82
      [   22.014558]  [<ffffffff8107bc8b>] __warn+0xcb/0xf0
      [   22.014561]  [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60
      [   22.014563]  [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70
      [   22.014564]  [<ffffffff810677d9>] fixup_exception+0x39/0x50
      [   22.014604]  [<ffffffff8102e400>] do_general_protection+0x80/0x150
      [   22.014610]  [<ffffffff817f9ec8>] general_protection+0x28/0x30
      [   22.014635]  [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0
      [   22.014642]  [<ffffffff810600c7>] ? native_read_msr+0x7/0x40
      [   22.014657]  [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130
      [   22.014660]  [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340
      [   22.014662]  [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0
      [   22.014664]  [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0
      [   22.014666]  [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120
      [   22.014669]  [<ffffffff816833a6>] cpufreq_online+0x406/0x820
      [   22.014671]  [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90
      [   22.014717]  [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100
      [   22.014719]  [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210
      [   22.014749]  [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5
      [   22.014751]  [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12
      
      Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f9f4872d
  3. 17 9月, 2016 1 次提交
  4. 14 9月, 2016 1 次提交
  5. 13 9月, 2016 1 次提交
  6. 17 8月, 2016 1 次提交
    • R
      cpufreq / sched: Pass flags to cpufreq_update_util() · 58919e83
      Rafael J. Wysocki 提交于
      It is useful to know the reason why cpufreq_update_util() has just
      been called and that can be passed as flags to cpufreq_update_util()
      and to the ->func() callback in struct update_util_data.  However,
      doing that in addition to passing the util and max arguments they
      already take would be clumsy, so avoid it.
      
      Instead, use the observation that the schedutil governor is part
      of the scheduler proper, so it can access scheduler data directly.
      This allows the util and max arguments of cpufreq_update_util()
      and the ->func() callback in struct update_util_data to be replaced
      with a flags one, but schedutil has to be modified to follow.
      
      Thus make the schedutil governor obtain the CFS utilization
      information from the scheduler and use the "RT" and "DL" flags
      instead of the special utilization value of ULONG_MAX to track
      updates from the RT and DL sched classes.  Make it non-modular
      too to avoid having to export scheduler variables to modules at
      large.
      
      Next, update all of the other users of cpufreq_update_util()
      and the ->func() callback in struct update_util_data accordingly.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      58919e83
  7. 29 7月, 2016 1 次提交
  8. 21 7月, 2016 3 次提交
  9. 11 7月, 2016 1 次提交
  10. 07 7月, 2016 1 次提交
  11. 28 6月, 2016 4 次提交
  12. 15 6月, 2016 1 次提交
  13. 14 6月, 2016 1 次提交
  14. 08 6月, 2016 3 次提交
  15. 30 5月, 2016 1 次提交
  16. 18 5月, 2016 1 次提交
  17. 12 5月, 2016 4 次提交
  18. 10 5月, 2016 1 次提交
  19. 05 5月, 2016 1 次提交
  20. 04 5月, 2016 1 次提交
    • R
      intel_pstate: Fix intel_pstate_get() · 6d45b719
      Rafael J. Wysocki 提交于
      After commit 8fa520af "intel_pstate: Remove freq calculation from
      intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
      to compute the average frequency, which is problematic for two reasons.
      
      First, intel_pstate_get() may be invoked before the driver reads the
      CPU feedback registers for the first time and if that happens,
      get_avg_frequency() will attempt to divide by zero.
      
      Second, the get_avg_frequency() call in intel_pstate_get() is racy
      with respect to intel_pstate_sample() and it may end up returning
      completely meaningless values for this reason.
      
      Moreover, after commit 7349ec04 "intel_pstate: Move
      intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      sample.core_pct_busy is never computed on Atom, but it is used in
      intel_pstate_adjust_busy_pstate() in that case too.
      
      To address those problems notice that if sample.core_pct_busy
      was used in the average frequency computation carried out by
      get_avg_frequency(), both the divide by zero problem and the
      race with respect to intel_pstate_sample() would be avoided.
      
      Accordingly, move the invocation of intel_pstate_calc_busy() from
      get_target_pstate_use_performance() to intel_pstate_update_util(),
      which also will take care of the uninitialized sample.core_pct_busy
      on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
      as per the above.
      Reported-by: Nkernel test robot <ying.huang@linux.intel.com>
      Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
      Fixes: 8fa520af "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
      Fixes: 7349ec04 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6d45b719
  21. 02 5月, 2016 1 次提交
  22. 28 4月, 2016 3 次提交
    • S
      cpufreq: intel_pstate: Enable PPC enforcement for servers · 2b3ec765
      Srinivas Pandruvada 提交于
      For platforms which are controlled via remove node manager, enable _PPC by
      default. These platforms are mostly categorized as enterprise server or
      performance servers. These platforms needs to go through some
      certifications tests, which tests control via _PPC.
      The relative risk of enabling by default is  low as this is is less likely
      that these systems have broken _PSS table.
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2b3ec765
    • S
      cpufreq: intel_pstate: Adjust policy->max · 3be9200d
      Srinivas Pandruvada 提交于
      When policy->max is changed via _PPC or sysfs and is more than the max non
      turbo frequency, it does not really change resulting performance in some
      processors. When policy->max results in a P-State ratio more than the
      turbo activation ratio, then processor can choose any P-State up to max
      turbo. So the user or _PPC setting has no value, but this can cause
      undesirable side effects like:
      - Showing reduced max percentage in Intel P-State sysfs
      - It can cause reduced max performance under certain boundary conditions:
      The requested max scaling frequency either via _PPC or via cpufreq-sysfs,
      will be converted into a fixed floating point max percent scale. In
      majority of the cases this will result in correct max. But not 100% of the
      time. If the _PPC is requested at a point where the calculation lead to a
      lower max, this can result in a lower P-State then expected and it will
      impact performance.
      Example of this condition using a Broadwell laptop with config TDP.
      
      ACPI _PSS table from a Broadwell laptop
      2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
      1300000 1100000 1000000 900000 800000 600000 500000
      
      The actual results by disabling config TDP so that we can get what is
      requested on or below 2300000Khz.
      
      scaling_max_freq        Max Requested P-State   Resultant scaling
      max
      ---------------------------------------- ----------------------
      2400000                 18                      2900000 (max
      turbo)
      2300000                 17                      2300000 (max
      physical non turbo)
      2200000                 15                      2100000
      2100000                 15                      2100000
      2000000                 13                      1900000
      1900000                 13                      1900000
      1800000                 12                      1800000
      1700000                 11                      1700000
      1600000                 10                      1600000
      1500000                 f                       1500000
      1400000                 e                       1400000
      1300000                 d                       1300000
      1200000                 c                       1200000
      1100000                 a                       1000000
      1000000                 a                       1000000
      900000                  9                        900000
      800000                  8                        800000
      700000                  7                        700000
      600000                  6                        600000
      500000                  5                        500000
      ------------------------------------------------------------------
      
      Now set the config TDP level 1 ratio as 0x0b (equivalent to 1100000KHz)
      in BIOS (not every system will let you adjust this).
      The turbo activation ratio will be set to one less than that, which will
      be 0x0a (So any request above 1000000KHz should result in turbo region
      assuming no thermal limits).
      Here _PPC will request max to 1100000KHz (which basically should still
      result in turbo as this is more than the turbo activation ratio up to
      max allowable turbo frequency), but actual calculation resulted in a max
      ceiling P-State which is 0x0a. So under any load condition, this driver
      will not request turbo P-States. This will be a huge performance hit.
      
      When config TDP feature is ON, if the _PPC points to a frequency above
      turbo activation ratio, the performance can still reach max turbo. In this
      case we don't need to treat this as the reduced frequency in set_policy
      callback.
      
      In this change when config TDP is active (by checking if the physical max
      non turbo ratio is more than the current max non turbo ratio), any request
      above current max non turbo is treated as full performance.
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw : Minor cleanups ]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      3be9200d
    • S
      cpufreq: intel_pstate: Enforce _PPC limits · 9522a2ff
      Srinivas Pandruvada 提交于
      Use ACPI _PPC notification to limit max P state driver will request.
      ACPI _PPC change notification is sent by BIOS to limit max P state
      in several cases:
      - Reduce impact of platform thermal condition
      - When Config TDP feature is used, a changed _PPC is sent to
      follow TDP change
      - Remote node managers in server want to control platform power
      via baseboard management controller (BMC)
      
      This change registers with ACPI processor performance lib so that
      _PPC changes are notified to cpufreq core, which in turns will
      result in call to .setpolicy() callback. Also the way _PSS
      table identifies a turbo frequency is not compatible to max turbo
      frequency in intel_pstate, so the very first entry in _PSS needs
      to be adjusted.
      
      This feature can be turned on by using kernel parameters:
      intel_pstate=support_acpi_ppc
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Minor cleanups ]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9522a2ff
  23. 26 4月, 2016 1 次提交
  24. 25 4月, 2016 1 次提交
    • P
      cpufreq: intel_pstate: Use average P-State instead of current P-State · bdcaa23f
      Philippe Longepe 提交于
      The result returned by pid_calc() is subtracted from current_pstate
      (which is the P-State requested during the last period) in order to
      obtain the target P-State for the current iteration.
      
      However, current_pstate may not reflect the real current P-State of
      the CPU. In particular, that P-State may be higher because of the
      frequency sharing per module.
      
      The theory is:
       - The load is the percentage of time spent in C0 and is related to
         the average P-State during the same period.
       - The last requested P-State can be completely different than the
         average P-State (because of frequency sharing or throttling).
       - The P-State shift computed by the pid_calc is based on the load
         computed at average P-State, so the shift must be relative to
         this average P-State.
      
      Using the average P-State instead of current P-State improves power
      without significant performance penalty in cases when a task migrates
      from one core to other core sharing frequency and voltage.
      
      Performance and power comparison with this patch on Cherry Trail
      platform using Android:
      
      Benchmark               ?Perf    ?Power
      FishTank                10.45%    3.1%
      SmartBench-Gaming       -0.1%   -10.4%
      SmartBench-Productivity -0.8%   -10.4%
      CandyCrush                n/a   -17.4%
      AngryBirds                n/a    -5.9%
      videoPlayback             n/a   -13.9%
      audioPlayback             n/a    -4.9%
      IcyRocks-20-50           0.0%   -38.4%
      iozone RR               -0.16%  -1.3%
      iozone RW                0.74%  -1.3%
      Signed-off-by: NPhilippe Longepe <philippe.longepe@linux.intel.com>
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      bdcaa23f
  25. 10 4月, 2016 1 次提交
    • R
      intel_pstate: Avoid getting stuck in high P-states when idle · ffb81056
      Rafael J. Wysocki 提交于
      Jörg Otte reports that commit a4675fbc (cpufreq: intel_pstate:
      Replace timers with utilization update callbacks) caused the CPUs in
      his Haswell-based system to stay in the very high frequency region
      even if the system is completely idle.
      
      That turns out to be an existing problem in the intel_pstate driver's
      P-state selection algorithm for Core processors.  Namely, all
      decisions made by that algorithm are based on the average frequency
      of the CPU between sampling events and on the P-state requested on
      the last invocation, so it may get stuck at a very hight frequency
      even if the utilization of the CPU is very low (in fact, it may get
      stuck in a inadequate P-state regardless of the CPU utilization).
      The only way to kick it out of that limbo is a sufficiently long idle
      period (3 times longer than the prescribed sampling interval), but if
      that doesn't happen often enough (eg. due to a timing change like
      after the above commit), the P-state of the CPU may be inadequate
      pretty much all the time.
      
      To address the most egregious manifestations of that issue, reset the
      core_busy value used to determine the next P-state to request if the
      utilization of the CPU, determined with the help of the MPERF
      feedback register and the TSC, is below 1%.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771Reported-and-tested-by: NJörg Otte <jrg.otte@gmail.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      ffb81056
  26. 09 4月, 2016 1 次提交