1. 30 May, 2018 · 2 commits
  2. 13 May, 2018 · 1 commit
    • cpufreq: optimize cpufreq_notify_transition() · 20b5324d
      Committed by Viresh Kumar
      cpufreq_notify_transition() calls __cpufreq_notify_transition() for each
      CPU of a policy. There is a lot of code in __cpufreq_notify_transition()
      though which isn't required to be executed for each CPU, like checking
      about disabled cpufreq or irqs, adjusting jiffies, updating cpufreq
      stats and some debug print messages.
      
      This commit merges __cpufreq_notify_transition() into
      cpufreq_notify_transition() and modifies cpufreq_notify_transition() to
      execute the minimum amount of code for each CPU.
      
      Also fix the kerneldoc for cpufreq_notify_transition() while at it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. 20 March, 2018 · 1 commit
  4. 28 February, 2018 · 2 commits
    • cpufreq: Validate frequency table in the core · d417e069
      Committed by Viresh Kumar
      By design, cpufreq drivers are responsible for calling
      cpufreq_frequency_table_cpuinfo() from their ->init()
      callbacks to validate the frequency table.
      
      However, if a cpufreq driver is buggy and fails to do so properly, it
      may lead to unexpected behavior of the driver or the cpufreq core at a
      later point in time.  It would be better if the core could
      validate the frequency table during driver initialization.
      
      To that end, introduce cpufreq_table_validate_and_sort() and make
      the cpufreq core call it right after invoking the ->init() callback
      of the driver and destroy the cpufreq policy if the table is invalid.
      
      For the time being the validation of the table happens twice, once
      from the driver and then from the core.  The individual drivers will
      be updated separately to drop table validation if they don't need it
      for other reasons.
      
      The frequency table is marked "sorted" or "unsorted" by the new helper
      now instead of in cpufreq_table_validate_and_show(), as it should only
      be done after validating the table (which the drivers won't do going
      forward).
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Subject/changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
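The validation and the "sorted"/"unsorted" marking described above can be pictured with a small userspace sketch. It is not the kernel's cpufreq_table_validate_and_sort(); the names `validate_and_classify`, `struct table_info`, `ENTRY_INVALID` and `TABLE_END` are hypothetical stand-ins, and the rule that a table needs at least one usable entry is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the table markers used by cpufreq drivers. */
#define ENTRY_INVALID (~0u)      /* entry to be skipped */
#define TABLE_END     (~0u - 1)  /* terminates the table */

enum table_order { TABLE_UNSORTED, TABLE_ASCENDING, TABLE_DESCENDING };

struct table_info {
    unsigned int min_khz;
    unsigned int max_khz;
    enum table_order order;
};

/*
 * Walk the table once: reject it if it has no usable entry, otherwise
 * record the frequency limits and whether the valid entries are strictly
 * ascending, strictly descending, or unsorted.
 * Returns 0 on success and -1 if the table is invalid.
 */
int validate_and_classify(const unsigned int *table, struct table_info *info)
{
    bool have_prev = false, asc = true, desc = true;
    unsigned int prev = 0;
    size_t i;

    if (!table || !info)
        return -1;

    info->min_khz = ~0u;
    info->max_khz = 0;

    for (i = 0; table[i] != TABLE_END; i++) {
        unsigned int f = table[i];

        if (f == ENTRY_INVALID)
            continue;
        if (f < info->min_khz)
            info->min_khz = f;
        if (f > info->max_khz)
            info->max_khz = f;
        if (have_prev) {
            if (f <= prev)
                asc = false;
            if (f >= prev)
                desc = false;
        }
        prev = f;
        have_prev = true;
    }

    if (!have_prev)
        return -1;  /* no usable frequency: treat the table as invalid */

    info->order = asc ? TABLE_ASCENDING
                      : desc ? TABLE_DESCENDING : TABLE_UNSORTED;
    return 0;
}
```

In this sketch the ordering is only decided after the table has passed validation, which mirrors the reasoning given in the last paragraph of the commit message.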
    • cpufreq: Reorder cpufreq_online() error code path · b24b6478
      Committed by Viresh Kumar
      Ideally the de-allocation of resources should happen in the exact
      opposite order in which they were allocated. It helps maintain the code
      in the long term, even if nothing really breaks with incorrect ordering.
      
      That wasn't followed in cpufreq_online() and it has some
      inconsistencies.  For example, the symlinks were created from within
      the locked region while they are removed only after putting the locks.
      Also ->exit() should have been called only after the symlinks are
      removed and the lock is dropped, as that was the case when ->init()
      was first called.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
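The principle of releasing resources in the exact reverse order of acquisition is commonly written in C with a ladder of goto labels. The sketch below only illustrates that pattern; the step names (`step_init`, `step_links`, `step_start`) and the event log are hypothetical and do not correspond to the actual cpufreq_online() code.

```c
#include <stddef.h>

/* Events are logged so the order of setup and teardown can be inspected. */
enum event { EV_INIT, EV_LINKS, EV_START, EV_UNDO_LINKS, EV_UNDO_INIT };

#define MAX_EVENTS 8
static enum event event_log[MAX_EVENTS];
static size_t n_events;

static void record(enum event e)
{
    if (n_events < MAX_EVENTS)
        event_log[n_events++] = e;
}

/* Hypothetical setup steps; only the last one can be made to fail. */
static int step_init(void)  { record(EV_INIT);  return 0; }
static int step_links(void) { record(EV_LINKS); return 0; }
static int step_start(int fail)
{
    if (fail)
        return -1;
    record(EV_START);
    return 0;
}

/* Matching teardown steps. */
static void undo_links(void) { record(EV_UNDO_LINKS); }
static void undo_init(void)  { record(EV_UNDO_INIT); }

/*
 * Each label undoes exactly one earlier step, and the labels appear in
 * the reverse order of the setup calls, so a failure at any point unwinds
 * only what was already done, newest first.
 */
int bring_online(int fail_last_step)
{
    int ret;

    n_events = 0;

    ret = step_init();
    if (ret)
        goto out;

    ret = step_links();
    if (ret)
        goto out_undo_init;

    ret = step_start(fail_last_step);
    if (ret)
        goto out_undo_links;

    return 0;

out_undo_links:
    undo_links();
out_undo_init:
    undo_init();
out:
    return ret;
}
```

Keeping the labels in mirror order makes it easy to see, when a new setup step is added, where its teardown belongs.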
  5. 05 February, 2018 · 1 commit
    • cpufreq: Skip cpufreq resume if it's not suspended · 703cbaa6
      Committed by Bo Yan
      cpufreq_resume can be called even without a preceding cpufreq_suspend.
      This can happen in the following scenario:
      
          suspend_devices_and_enter
             --> dpm_suspend_start
                --> dpm_prepare
                    --> device_prepare : this function errors out
                --> dpm_suspend: this is skipped due to dpm_prepare failure
                                 this means cpufreq_suspend is skipped over
             --> goto Recover_platform, due to previous error
             --> goto Resume_devices
             --> dpm_resume_end
                 --> dpm_resume
                     --> cpufreq_resume
      
      If schedutil is used as the frequency governor, cpufreq_resume will
      eventually call sugov_start, which does the following:
      
          memset(sg_cpu, 0, sizeof(*sg_cpu));
          ....
      
      This effectively erases the function pointer for frequency updates,
      causing a crash later on. The function pointer would have been set
      correctly if the subsequent cpufreq_add_update_util_hook ran
      successfully, but that function returns early because cpufreq_suspend
      was not called:
      
          if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
              return;
      
      The fix is to check cpufreq_suspended first. If it is false,
      cpufreq_suspend was not called in the first place, so cpufreq is not
      resumed.
      Signed-off-by: Bo Yan <byan@nvidia.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Dropped printing a message ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
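The shape of the fix can be sketched in a few lines of userspace C. This is a simplified model, not the kernel code: `suspended` stands in for cpufreq_suspended, and `governor_starts` is a hypothetical counter that replaces the real work of restarting governors.

```c
#include <stdbool.h>

static bool suspended;       /* stand-in for cpufreq_suspended */
static int governor_starts;  /* hypothetical: counts governor restarts */

/* Suspend path: stop governors (elided here) and remember that we did. */
void sketch_suspend(void)
{
    if (suspended)
        return;
    suspended = true;
}

/*
 * Resume path: bail out unless a suspend actually happened, so that a
 * resume reached through an error path never restarts a governor that
 * is still running.
 */
void sketch_resume(void)
{
    if (!suspended)
        return;

    suspended = false;
    governor_starts++;
}
```

With this guard, a resume that follows a failed dpm_prepare becomes a no-op, which is the behavior the commit message asks for.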
  6. 04 December, 2017 · 4 commits
  7. 03 October, 2017 · 1 commit
  8. 22 August, 2017 · 1 commit
    • cpufreq: Cap the default transition delay value to 10 ms · e948bc8f
      Committed by Viresh Kumar
      If transition_delay_us isn't defined by the cpufreq driver, the default
      value of transition delay (time after which the cpufreq governor will
      try updating the frequency again) is currently calculated by multiplying
      transition_latency (nsec) with LATENCY_MULTIPLIER (1000) and then
      converting this time to usec. That gives the exact same value as
      transition_latency, just that the time unit is usec instead of nsec.
      
      With acpi-cpufreq for example, transition_latency is set to around 10
      usec and we get a transition delay of 10 ms, which seems to be a
      reasonable amount of time after which to reevaluate the frequency.
      
      But for platforms where frequency switching isn't that fast (like ARM),
      the transition_latency varies from 500 usec to 3 ms, and the transition
      delay becomes 500 ms to 3 seconds. Of course, that is a pretty bad
      default value to start with.
      
      We could try to come up with a better formula (instead of multiplying by
      LATENCY_MULTIPLIER) to solve this problem, but would that be worth it?
      
      This patch tries a simple approach and caps the maximum value of default
      transition delay to 10 ms. Of course, userspace can still come in and
      change this value anytime or individual drivers can rather provide
      transition_delay_us instead.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
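The arithmetic in this commit message can be checked with a short sketch. The function name, the `_SKETCH` macros and the two-argument signature are hypothetical; the fallback used when the latency rounds down to zero microseconds is an assumption that the commit message does not cover.

```c
#define NSEC_PER_USEC_SKETCH      1000u
#define LATENCY_MULTIPLIER_SKETCH 1000u
#define DEFAULT_DELAY_CAP_US      10000u  /* 10 ms, the cap from the commit */

/*
 * Returns the delay (in usec) after which a governor would try to update
 * the frequency again.
 *  - A driver-provided transition_delay_us always wins.
 *  - Otherwise: latency (converted to usec) * multiplier, capped at 10 ms.
 */
unsigned int default_transition_delay_us(unsigned int driver_delay_us,
                                         unsigned int transition_latency_ns)
{
    unsigned int latency_us, delay_us;

    if (driver_delay_us)
        return driver_delay_us;

    latency_us = transition_latency_ns / NSEC_PER_USEC_SKETCH;
    if (!latency_us)
        return LATENCY_MULTIPLIER_SKETCH;  /* assumption: sub-usec latency */

    delay_us = latency_us * LATENCY_MULTIPLIER_SKETCH;
    return delay_us < DEFAULT_DELAY_CAP_US ? delay_us : DEFAULT_DELAY_CAP_US;
}
```

For a 10 usec latency (10000 ns) the result is 10000 usec, unchanged by the cap. For the 500 usec and 3 ms latencies mentioned for ARM platforms, the uncapped values would be 500000 usec and 3000000 usec, and both are reduced to 10000 usec.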
  9. 10 August, 2017 · 1 commit
    • cpufreq: Return 0 from ->fast_switch() on errors · 209887e6
      Committed by Viresh Kumar
      CPUFREQ_ENTRY_INVALID is a special symbol which is used to specify that
      an entry in the cpufreq table is invalid. But using it outside of the
      scope of the cpufreq table looks a bit incorrect.
      
      We can represent an invalid frequency by writing it as 0 instead if we
      need. Note that it is already done that way for the return value of the
      ->get() callback.
      
      Let's do the same for ->fast_switch() and not use CPUFREQ_ENTRY_INVALID
      outside of the scope of cpufreq table.
      
      Also update the comment over cpufreq_driver_fast_switch() to clearly
      mention what this returns.
      
      None of the drivers return CPUFREQ_ENTRY_INVALID as of now from
      ->fast_switch() callback and so we don't need to update any of those.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  10. 26 July, 2017 · 3 commits
  11. 22 July, 2017 · 1 commit
  12. 27 June, 2017 · 1 commit
    • x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF · f8475cef
      Committed by Len Brown
      The goal of this change is to give users a uniform and meaningful
      result when they read /sys/...cpufreq/scaling_cur_freq
      on modern x86 hardware, as compared to what they get today.
      
      Modern x86 processors include the hardware needed
      to accurately calculate frequency over an interval --
      APERF, MPERF, and the TSC.
      
      Here we provide an x86 routine to make this calculation
      on supported hardware, and use it in preference to any
      driver-specific cpufreq_driver.get() routine.
      
      MHz is computed like so:
      
      MHz = base_MHz * delta_APERF / delta_MPERF
      
      MHz is the average frequency of the busy processor
      over a measurement interval.  The interval is
      defined to be the time between successive invocations
      of aperfmperf_khz_on_cpu(), which are expected to
      happen on-demand when users read sysfs attribute
      cpufreq/scaling_cur_freq.
      
      As with previous methods of calculating MHz,
      idle time is excluded.
      
      base_MHz above is from TSC calibration global "cpu_khz".
      
      This x86 native method to calculate MHz returns a meaningful result
      no matter if P-states are controlled by hardware or firmware
      and/or if the Linux cpufreq sub-system is or is not installed.
      
      When this routine is invoked more frequently, the measurement
      interval becomes shorter.  However, the code limits re-computation
      to 10ms intervals so that average frequency remains meaningful.
      
      Discerning users are encouraged to take advantage of
      the turbostat(8) utility, which can gracefully handle
      concurrent measurement intervals of arbitrary length.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
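The formula MHz = base_MHz * delta_APERF / delta_MPERF can be written out as a small sketch working in kHz. It is not the kernel's aperfmperf_khz_on_cpu(): the struct and function names are hypothetical, reading the actual counters is left out, and returning 0 when MPERF has not advanced is an assumption made for illustration.

```c
#include <stdint.h>

/* One snapshot of the two counters, taken on the CPU being measured. */
struct aperfmperf_sample {
    uint64_t aperf;  /* advances at the actual frequency while not idle */
    uint64_t mperf;  /* advances at the base frequency while not idle */
};

/*
 * Average frequency (kHz) of the busy CPU between two snapshots:
 *     khz = base_khz * delta_APERF / delta_MPERF
 * Both counters stop while the CPU is idle, so idle time is excluded.
 * Assumes base_khz * delta_APERF fits in 64 bits.
 */
uint64_t average_khz(uint64_t base_khz,
                     struct aperfmperf_sample prev,
                     struct aperfmperf_sample now)
{
    uint64_t delta_aperf = now.aperf - prev.aperf;
    uint64_t delta_mperf = now.mperf - prev.mperf;

    if (delta_mperf == 0)
        return 0;  /* assumption: no busy time observed, nothing to report */

    return base_khz * delta_aperf / delta_mperf;
}
```

As an example, with a base of 2000000 kHz, a delta_APERF of 150 and a delta_MPERF of 100, the average comes out at 3000000 kHz, meaning the CPU ran 1.5 times faster than its base frequency while busy.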
  13. 30 May, 2017 · 1 commit
  14. 26 May, 2017 · 1 commit
  15. 13 April, 2017 · 1 commit
    • cpufreq: Bring CPUs up even if cpufreq_online() failed · c4a3fa26
      Committed by Chen Yu
      There is a report that after commit 27622b06 ("cpufreq: Convert
      to hotplug state machine"), the normal CPU offline/online cycle
      fails on some platforms.
      
      According to the ftrace result, this problem was triggered on
      platforms using acpi-cpufreq as the default cpufreq driver,
      and due to the lack of some ACPI freq method (e.g. _PCT),
      cpufreq_online() failed and returned a negative value, so the CPU
      hotplug state machine rolled back the CPU online process.  Actually,
      from the user's perspective, the failure of cpufreq_online() should
      not prevent that CPU from being brought up, although cpufreq might
      not work on that CPU.
      
      BTW, during system startup cpufreq_online() is not invoked via CPU
      online but by the cpufreq device creation process, so the APs can be
      brought up even though cpufreq_online() fails in that stage.
      
      This patch ignores the return value of cpufreq_online/offline() and
      lets the cpufreq framework deal with the failure.  cpufreq_online()
      itself will do a proper rollback in that case and if _PCT is missing,
      the ACPI cpufreq driver will print a warning if the corresponding
      debug options have been enabled.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581
      Fixes: 27622b06 ("cpufreq: Convert to hotplug state machine")
      Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  16. 28 March, 2017 · 1 commit
    • cpufreq: Fix creation of symbolic links to policy directories · 2f0ba790
      Committed by Rafael J. Wysocki
      The cpufreq core only tries to create symbolic links from CPU
      directories in sysfs to policy directories in cpufreq_add_dev(),
      either when a given CPU is registered or when the cpufreq driver
      is registered, whichever happens first.  That is not sufficient,
      however, because cpufreq_add_dev() may be called for an offline CPU
      whose policy object has not been created yet and, quite obviously,
      the symbolic link cannot be added in that case.
      
      Fix that by making cpufreq_online() attempt to add symbolic links to
      policy objects for the CPUs in the related_cpus mask of every new
      policy object created by it.
      
      The cpufreq_driver_lock locking around the for_each_cpu() loop
      in cpufreq_online() is dropped, because it is not necessary and the
      code is somewhat simpler without it.  Moreover, failures to create
      a symbolic link will not be regarded as hard errors any more and
      the CPUs without those links will not be taken offline automatically,
      but that should not be problematic in practice.
      Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
  17. 22 March, 2017 · 1 commit
  18. 16 March, 2017 · 1 commit
  19. 06 March, 2017 · 1 commit
    • cpufreq: Add the "cpufreq.off=1" cmdline option · d82f2692
      Committed by Len Brown
      Add the "cpufreq.off=1" cmdline option.
      
      At boot-time, this allows a user to request CONFIG_CPU_FREQ=n
      behavior from a kernel built with CONFIG_CPU_FREQ=y.
      
      This is analogous to the existing "cpuidle.off=1" option
      and CONFIG_CPU_IDLE=y.
      
      This capability is valuable when we need to debug end-user
      issues in the BIOS or in Linux.  It is also convenient
      for enabling comparisons, which may otherwise require a new kernel,
      or help from BIOS SETUP, which may be buggy or unavailable.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  20. 16 February, 2017 · 1 commit
  21. 04 February, 2017 · 2 commits
  22. 01 February, 2017 · 1 commit
    • sched/cputime: Convert kcpustat to nsecs · 7fb1327e
      Committed by Frederic Weisbecker
      Kernel CPU stats are stored in cputime_t which is an architecture
      defined type, and hence a bit opaque and requiring accessors and mutators
      for any operation.
      
      Converting them to nsecs simplifies the code and is one step toward
      the removal of cputime_t in the core code.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  23. 21 November, 2016 · 2 commits
  24. 20 September, 2016 · 2 commits
  25. 13 September, 2016 · 1 commit
    • cpufreq: create link to policy only for registered CPUs · 26619804
      Committed by Viresh Kumar
      If a cpufreq driver is registered very early in the boot stage (e.g.
      registered from postcore_initcall()), then cpufreq core may generate
      kernel warnings for it.
      
      In this case, the CPUs are brought online, then the cpufreq driver is
      registered, and then the CPU topology devices are registered. However,
      by the time cpufreq_add_dev() gets called, the cpu device isn't stored
      in the per-cpu variable (cpu_sys_devices), which is read by
      get_cpu_device().
      
      So the cpufreq core fails to get the device for the CPU for which
      cpufreq_add_dev() was called in the first place, and we hit a
      WARN_ON(!cpu_dev).
      
      Even if we reuse the 'dev' parameter passed to cpufreq_add_dev() to
      avoid that warning, there might be other CPUs online that share the
      policy with the cpu for which cpufreq_add_dev() is called. Eventually
      get_cpu_device() will return NULL for them as well, and we will hit the
      same WARN_ON() again.
      
      In order to fix these issues, change cpufreq core to create links to the
      policy for a cpu only when cpufreq_add_dev() is called for that CPU.
      
      Reuse the 'real_cpus' mask to track that as well.
      
      Note that cpufreq_remove_dev() already handles removal of the links for
      individual CPUs and cpufreq_add_dev() has aligned with that now.
      Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  26. 01 September, 2016 · 1 commit
  27. 22 July, 2016 · 2 commits
  28. 21 July, 2016 · 1 commit
    • cpufreq: add cpufreq_driver_resolve_freq() · e3c06236
      Committed by Steve Muckle
      Cpufreq governors may need to know what a particular target frequency
      maps to in the driver without necessarily wanting to set the frequency.
      Support this operation via a new cpufreq API,
      cpufreq_driver_resolve_freq(). This API returns the lowest driver
      frequency equal to or greater than the target frequency
      (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
      limitations. The mapping is also cached in the policy so that a
      subsequent fast_switch operation can avoid repeating the same lookup.
      
      The API will call a new cpufreq driver callback, resolve_freq(), if it
      has been registered by the driver. Otherwise the frequency is resolved
      via cpufreq_frequency_table_target(). Rather than require ->target()
      style drivers to provide a resolve_freq() callback it is left to the
      caller to ensure that the driver implements this callback if necessary
      to use cpufreq_driver_resolve_freq().
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Steve Muckle <smuckle@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
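The lookup this commit describes (lowest driver frequency at or above the target, within the policy limits, with the result cached) can be sketched as follows. This is not the kernel's cpufreq_driver_resolve_freq(); `struct policy_sketch` and its fields are hypothetical, and falling back to the highest permitted frequency when nothing is at or above the target is an assumption.

```c
#include <stddef.h>

/* Hypothetical, simplified policy: a frequency list plus min/max limits. */
struct policy_sketch {
    const unsigned int *freqs_khz;
    size_t n_freqs;
    unsigned int min_khz;
    unsigned int max_khz;
    unsigned int cached_target_khz;  /* last target that was resolved */
    size_t cached_idx;               /* table index it resolved to */
};

/*
 * Resolve target_khz to the lowest permitted table frequency that is
 * equal to or greater than it. Returns 0 if no table entry lies within
 * the policy limits.
 */
unsigned int resolve_freq(struct policy_sketch *p, unsigned int target_khz)
{
    size_t i, best = p->n_freqs, highest = p->n_freqs;

    /* Clamp the request to the policy limits first. */
    if (target_khz < p->min_khz)
        target_khz = p->min_khz;
    if (target_khz > p->max_khz)
        target_khz = p->max_khz;

    for (i = 0; i < p->n_freqs; i++) {
        unsigned int f = p->freqs_khz[i];

        if (f < p->min_khz || f > p->max_khz)
            continue;
        if (highest == p->n_freqs || f > p->freqs_khz[highest])
            highest = i;
        if (f >= target_khz &&
            (best == p->n_freqs || f < p->freqs_khz[best]))
            best = i;
    }

    if (best == p->n_freqs)
        best = highest;  /* assumption: fall back to the highest permitted */
    if (best == p->n_freqs)
        return 0;

    /* Cache the mapping so a later switch can skip the lookup. */
    p->cached_target_khz = target_khz;
    p->cached_idx = best;
    return p->freqs_khz[best];
}
```

A caller that later performs the switch could compare its target against `cached_target_khz` and reuse `cached_idx` directly, which is the purpose the commit message gives for caching the mapping.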
  29. 04 July, 2016 · 1 commit