1. 10 8月, 2017 1 次提交
    • V
      cpufreq: Return 0 from ->fast_switch() on errors · 209887e6
      Viresh Kumar 提交于
      CPUFREQ_ENTRY_INVALID is a special symbol which is used to specify that
      an entry in the cpufreq table is invalid. But using it outside of the
      scope of the cpufreq table looks a bit incorrect.
      
      We can represent an invalid frequency by writing it as 0 instead if we
      need. Note that it is already done that way for the return value of the
      ->get() callback.
      
      Lets do the same for ->fast_switch() and not use CPUFREQ_ENTRY_INVALID
      outside of the scope of cpufreq table.
      
      Also update the comment over cpufreq_driver_fast_switch() to clearly
      mention what this returns.
      
      None of the drivers return CPUFREQ_ENTRY_INVALID as of now from
      ->fast_switch() callback and so we don't need to update any of those.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      209887e6
  2. 27 6月, 2017 1 次提交
    • L
      x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF · f8475cef
      Len Brown 提交于
      The goal of this change is to give users a uniform and meaningful
      result when they read /sys/...cpufreq/scaling_cur_freq
      on modern x86 hardware, as compared to what they get today.
      
      Modern x86 processors include the hardware needed
      to accurately calculate frequency over an interval --
      APERF, MPERF, and the TSC.
      
      Here we provide an x86 routine to make this calculation
      on supported hardware, and use it in preference to any
      driver driver-specific cpufreq_driver.get() routine.
      
      MHz is computed like so:
      
      MHz = base_MHz * delta_APERF / delta_MPERF
      
      MHz is the average frequency of the busy processor
      over a measurement interval.  The interval is
      defined to be the time between successive invocations
      of aperfmperf_khz_on_cpu(), which are expected to to
      happen on-demand when users read sysfs attribute
      cpufreq/scaling_cur_freq.
      
      As with previous methods of calculating MHz,
      idle time is excluded.
      
      base_MHz above is from TSC calibration global "cpu_khz".
      
      This x86 native method to calculate MHz returns a meaningful result
      no matter if P-states are controlled by hardware or firmware
      and/or if the Linux cpufreq sub-system is or is-not installed.
      
      When this routine is invoked more frequently, the measurement
      interval becomes shorter.  However, the code limits re-computation
      to 10ms intervals so that average frequency remains meaningful.
      
      Discerning users are encouraged to take advantage of
      the turbostat(8) utility, which can gracefully handle
      concurrent measurement intervals of arbitrary length.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f8475cef
  3. 30 5月, 2017 1 次提交
  4. 26 5月, 2017 1 次提交
  5. 13 4月, 2017 1 次提交
    • C
      cpufreq: Bring CPUs up even if cpufreq_online() failed · c4a3fa26
      Chen Yu 提交于
      There is a report that after commit 27622b06 ("cpufreq: Convert
      to hotplug state machine"), the normal CPU offline/online cycle
      fails on some platforms.
      
      According to the ftrace result, this problem was triggered on
      platforms using acpi-cpufreq as the default cpufreq driver,
      and due to the lack of some ACPI freq method (eg. _PCT),
      cpufreq_online() failed and returned a negative value, so the CPU
      hotplug state machine rolled back the CPU online process.  Actually,
      from the user's perspective, the failure of cpufreq_online() should
      not prevent that CPU from being brought up, although cpufreq might
      not work on that CPU.
      
      BTW, during system startup cpufreq_online() is not invoked via CPU
      online but by the cpufreq device creation process, so the APs can be
      brought up even though cpufreq_online() fails in that stage.
      
      This patch ignores the return value of cpufreq_online/offline() and
      lets the cpufreq framework deal with the failure.  cpufreq_online()
      itself will do a proper rollback in that case and if _PCT is missing,
      the ACPI cpufreq driver will print a warning if the corresponding
      debug options have been enabled.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581
      Fixes: 27622b06 ("cpufreq: Convert to hotplug state machine")
      Reported-and-tested-by: NTomasz Maciej Nowak <tmn505@gmail.com>
      Signed-off-by: NChen Yu <yu.c.chen@intel.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c4a3fa26
  6. 28 3月, 2017 1 次提交
    • R
      cpufreq: Fix creation of symbolic links to policy directories · 2f0ba790
      Rafael J. Wysocki 提交于
      The cpufreq core only tries to create symbolic links from CPU
      directories in sysfs to policy directories in cpufreq_add_dev(),
      either when a given CPU is registered or when the cpufreq driver
      is registered, whichever happens first.  That is not sufficient,
      however, because cpufreq_add_dev() may be called for an offline CPU
      whose policy object has not been created yet and, quite obviously,
      the symbolic cannot be added in that case.
      
      Fix that by making cpufreq_online() attempt to add symbolic links to
      policy objects for the CPUs in the related_cpus mask of every new
      policy object created by it.
      
      The cpufreq_driver_lock locking around the for_each_cpu() loop
      in cpufreq_online() is dropped, because it is not necessary and the
      code is somewhat simpler without it.  Moreover, failures to create
      a symbolic link will not be regarded as hard errors any more and
      the CPUs without those links will not be taken offline automatically,
      but that should not be problematic in practice.
      Reported-and-tested-by: NPrashanth Prakash <pprakash@codeaurora.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
      2f0ba790
  7. 22 3月, 2017 1 次提交
  8. 16 3月, 2017 1 次提交
  9. 06 3月, 2017 1 次提交
    • L
      cpufreq: Add the "cpufreq.off=1" cmdline option · d82f2692
      Len Brown 提交于
      Add the "cpufreq.off=1" cmdline option.
      
      At boot-time, this allows a user to request CONFIG_CPU_FREQ=n
      behavior from a kernel built with CONFIG_CPU_FREQ=y.
      
      This is analogous to the existing "cpuidle.off=1" option
      and CONFIG_CPU_IDLE=y
      
      This capability is valuable when we need to debug end-user
      issues in the BIOS or in Linux.  It is also convenient
      for enabling comparisons, which may otherwise require a new kernel,
      or help from BIOS SETUP, which may be buggy or unavailable.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      d82f2692
  10. 16 2月, 2017 1 次提交
  11. 04 2月, 2017 2 次提交
  12. 01 2月, 2017 1 次提交
    • F
      sched/cputime: Convert kcpustat to nsecs · 7fb1327e
      Frederic Weisbecker 提交于
      Kernel CPU stats are stored in cputime_t which is an architecture
      defined type, and hence a bit opaque and requiring accessors and mutators
      for any operation.
      
      Converting them to nsecs simplifies the code and is one step toward
      the removal of cputime_t in the core code.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7fb1327e
  13. 21 11月, 2016 2 次提交
  14. 20 9月, 2016 2 次提交
  15. 13 9月, 2016 1 次提交
    • V
      cpufreq: create link to policy only for registered CPUs · 26619804
      Viresh Kumar 提交于
      If a cpufreq driver is registered very early in the boot stage (e.g.
      registered from postcore_initcall()), then cpufreq core may generate
      kernel warnings for it.
      
      In this case, the CPUs are brought online, then the cpufreq driver is
      registered, and then the CPU topology devices are registered. However,
      by the time cpufreq_add_dev() gets called, the cpu device isn't stored
      in the per-cpu variable (cpu_sys_devices,) which is read by
      get_cpu_device().
      
      So the cpufreq core fails to get device for the CPU, for which
      cpufreq_add_dev() was called in the first place and we will hit a
      WARN_ON(!cpu_dev).
      
      Even if we reuse the 'dev' parameter passed to cpufreq_add_dev() to
      avoid that warning, there might be other CPUs online that share the
      policy with the cpu for which cpufreq_add_dev() is called. Eventually
      get_cpu_device() will return NULL for them as well, and we will hit the
      same WARN_ON() again.
      
      In order to fix these issues, change cpufreq core to create links to the
      policy for a cpu only when cpufreq_add_dev() is called for that CPU.
      
      Reuse the 'real_cpus' mask to track that as well.
      
      Note that cpufreq_remove_dev() already handles removal of the links for
      individual CPUs and cpufreq_add_dev() has aligned with that now.
      Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Tested-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      26619804
  16. 01 9月, 2016 1 次提交
  17. 22 7月, 2016 2 次提交
  18. 21 7月, 2016 1 次提交
    • S
      cpufreq: add cpufreq_driver_resolve_freq() · e3c06236
      Steve Muckle 提交于
      Cpufreq governors may need to know what a particular target frequency
      maps to in the driver without necessarily wanting to set the frequency.
      Support this operation via a new cpufreq API,
      cpufreq_driver_resolve_freq(). This API returns the lowest driver
      frequency equal or greater than the target frequency
      (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
      limitations. The mapping is also cached in the policy so that a
      subsequent fast_switch operation can avoid repeating the same lookup.
      
      The API will call a new cpufreq driver callback, resolve_freq(), if it
      has been registered by the driver. Otherwise the frequency is resolved
      via cpufreq_frequency_table_target(). Rather than require ->target()
      style drivers to provide a resolve_freq() callback it is left to the
      caller to ensure that the driver implements this callback if necessary
      to use cpufreq_driver_resolve_freq().
      Suggested-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NSteve Muckle <smuckle@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      e3c06236
  19. 04 7月, 2016 1 次提交
  20. 28 6月, 2016 1 次提交
  21. 09 6月, 2016 4 次提交
  22. 03 6月, 2016 5 次提交
    • R
      cpufreq: stats: Make the stats code non-modular · 1aefc75b
      Rafael J. Wysocki 提交于
      The modularity of cpufreq_stats is quite problematic.
      
      First off, the usage of policy notifiers for the initialization
      and cleanup in the cpufreq_stats module is inherently racy with
      respect to CPU offline/online and the initialization and cleanup
      of the cpufreq driver.
      
      Second, fast frequency switching (used by the schedutil governor)
      cannot be enabled if any transition notifiers are registered, so
      if the cpufreq_stats module (that registers a transition notifier
      for updating transition statistics) is loaded, the schedutil governor
      cannot use fast frequency switching.
      
      On the other hand, allowing cpufreq_stats to be built as a module
      doesn't really add much value.  Arguably, there's not much reason
      for that code to be modular at all.
      
      For the above reasons, make the cpufreq stats code non-modular,
      modify the core to invoke functions provided by that code directly
      and drop the notifiers from it.
      
      Make the stats sysfs attributes appear empty if fast frequency
      switching is enabled as the statistics will not be updated in that
      case anyway (and returning -EBUSY from those attributes breaks
      powertop).
      
      While at it, clean up Kconfig help for the CPU_FREQ_STAT and
      CPU_FREQ_STAT_DETAILS options.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      1aefc75b
    • V
      cpufreq: Use clamp_val() in __cpufreq_driver_target() · 910c6e88
      Viresh Kumar 提交于
      Use clamp_val() instead of open coding it.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      910c6e88
    • V
      cpufreq: Send START policy notification after sending CREATE · 388612ba
      Viresh Kumar 提交于
      The sequence got a bit wrong as we are sending CPUFREQ_START
      notifications even before we have sent CPUFREQ_CREATE_POLICY.
      
      Fix it.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      388612ba
    • R
      cpufreq: Drop the 'initialized' field from struct cpufreq_governor · 9a15fb2c
      Rafael J. Wysocki 提交于
      The 'initialized' field in struct cpufreq_governor is only used by
      the conservative governor (as a usage counter) and the way that
      happens is far from straightforward and arguably incorrect.
      
      Namely, the value of 'initialized' is checked by
      cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
      the results of those checks are passed (as the second argument) to
      the ->init() and ->exit() callbacks in struct dbs_governor.  Those
      callbacks are only implemented by the ondemand and conservative
      governors and ondemand doesn't use their second argument at all.
      In turn, the conservative governor uses it to decide whether or not
      to either register or unregister a transition notifier.
      
      That whole mechanism is not only unnecessarily convoluted, but also
      racy, because the 'initialized' field of struct cpufreq_governor is
      updated in cpufreq_init_governor() and cpufreq_exit_governor() under
      policy->rwsem which doesn't help if one of these functions is run
      twice in parallel for different policies (which isn't impossible in
      principle), for example.
      
      Instead of it, add a proper usage counter to the conservative
      governor and update it from cs_init() and cs_exit() which is
      guaranteed to be non-racy, as those functions are only called
      under gov_dbs_data_mutex which is global.
      
      With that in place, drop the 'initialized' field from struct
      cpufreq_governor as it is not used any more.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      9a15fb2c
    • R
      cpufreq: governor: Get rid of governor events · e788892b
      Rafael J. Wysocki 提交于
      The design of the cpufreq governor API is not very straightforward,
      as struct cpufreq_governor provides only one callback to be invoked
      from different code paths for different purposes.  The purpose it is
      invoked for is determined by its second "event" argument, causing it
      to act as a "callback multiplexer" of sorts.
      
      Unfortunately, that leads to extra complexity in governors, some of
      which implement the ->governor() callback as a switch statement
      that simply checks the event argument and invokes a separate function
      to handle that specific event.
      
      That extra complexity can be eliminated by replacing the all-purpose
      ->governor() callback with a family of callbacks to carry out specific
      governor operations: initialization and exit, start and stop and policy
      limits updates.  That also turns out to reduce the code size too, so
      do it.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      e788892b
  23. 02 6月, 2016 1 次提交
  24. 30 5月, 2016 3 次提交
  25. 18 5月, 2016 3 次提交