1. 29 Oct, 2019 1 commit
  2. 31 May, 2019 1 commit
  3. 10 Mar, 2019 1 commit
    • cpufreq: Use struct kobj_attribute instead of struct global_attr · 464b4279
      Viresh Kumar committed
      commit 625c85a62cb7d3c79f6e16de3cfa972033658250 upstream.
      
      The cpufreq_global_kobject is created using kobject_create_and_add()
      helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
      routines are set to kobj_attr_show() and kobj_attr_store().
      
      These routines pass struct kobj_attribute as an argument to the
      show/store callbacks. But all the cpufreq files created using the
      cpufreq_global_kobject expect the argument to be of type struct
      attribute. Things work fine currently as no one accesses the "attr"
      argument. We may not see issues even if the argument is used, as struct
      kobj_attribute has struct attribute as its first element and so they
      will both get same address.
      
      But this is logically incorrect and we should rather use struct
      kobj_attribute instead of struct global_attr in the cpufreq core and
      drivers and the show/store callbacks should take struct kobj_attribute
      as argument instead.
      
      This bug was caught by CFI Clang builds of the Android kernel, which
      catch mismatches in function prototypes for such callbacks.
      Reported-by: Donghee Han <dh.han@samsung.com>
      Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
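
The layout argument in the commit above (the two types "get same address" because struct attribute is the first member) can be illustrated with a minimal userspace sketch. The struct definitions here are simplified stand-ins rather than the kernel's real ones, and `to_kobj_attr()` and `same_address()` are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the layout matters here. */
struct attribute {
    const char *name;
};

struct kobj_attribute {
    struct attribute attr;  /* first member, so it shares the outer address */
    int (*show)(struct kobj_attribute *kattr, char *buf);
};

/* Hypothetical userspace equivalent of container_of() for this one case. */
static struct kobj_attribute *to_kobj_attr(struct attribute *attr)
{
    assert(attr != NULL);
    return (struct kobj_attribute *)
        ((char *)attr - offsetof(struct kobj_attribute, attr));
}

/* Returns 1 when the embedded attribute and its container share an address. */
static int same_address(struct kobj_attribute *kattr)
{
    assert(kattr != NULL);
    return (void *)kattr == (void *)&kattr->attr;
}
```

The addresses coincide, which is why the mismatch went unnoticed at run time. The callback prototypes still differ in type, and that difference is the kind of thing a prototype-checking build can flag.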
  4. 20 Feb, 2019 1 commit
    • cpufreq: check if policy is inactive early in __cpufreq_get() · c6f27cdd
      Sudeep Holla committed
      [ Upstream commit 2f66196208c98b3d1b4294edffb2c5a8197be899 ]
      
      cpuinfo_cur_freq reports the current CPU frequency as detected by
      hardware, while scaling_cur_freq reports the last known CPU frequency.
      Some platforms may not allow checking the CPU frequency of an offline
      CPU, or the associated resources may have been released via
      cpufreq_exit when the CPU was offlined, in which case the policy would
      have been invalidated already. If we attempt to get the current
      frequency from the hardware, it may result in a hang or crash.
      
      For example on Juno, I see:
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
      [0000000000000188] pgd=0000000000000000
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 5 PID: 4202 Comm: cat Not tainted 4.20.0-08251-ga0f2c0318a15-dirty #87
      Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform
      pstate: 40000005 (nZcv daif -PAN -UAO)
      pc : scmi_cpufreq_get_rate+0x34/0xb0
      lr : scmi_cpufreq_get_rate+0x34/0xb0
      Call trace:
       scmi_cpufreq_get_rate+0x34/0xb0
       __cpufreq_get+0x34/0xc0
       show_cpuinfo_cur_freq+0x24/0x78
       show+0x40/0x60
       sysfs_kf_seq_show+0xc0/0x148
       kernfs_seq_show+0x44/0x50
       seq_read+0xd4/0x480
       kernfs_fop_read+0x15c/0x208
       __vfs_read+0x60/0x188
       vfs_read+0x94/0x150
       ksys_read+0x6c/0xd8
       __arm64_sys_read+0x24/0x30
       el0_svc_common+0x78/0x100
       el0_svc_handler+0x38/0x78
       el0_svc+0x8/0xc
      ---[ end trace 3d1024e58f77f6b2 ]---
      
      So fix the issue by checking if the policy is invalid early in
      __cpufreq_get before attempting to get the current frequency.
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
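
The ordering described in the commit above (check the policy before touching hardware) can be sketched as follows. This is a simplified model, not the kernel code: `struct policy`, its `active` flag, `get_freq()` and `fake_hw_read()` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: "active" models whether the policy is still valid. */
struct policy {
    int active;
    unsigned int (*get)(void);  /* driver callback that touches hardware */
};

/* Stand-in for a driver's hardware read, returning a frequency in kHz. */
static unsigned int fake_hw_read(void)
{
    return 1200000;
}

/* Returns the frequency in kHz, or 0 when it cannot be read safely. */
static unsigned int get_freq(const struct policy *p)
{
    assert(p != NULL);
    /* Check first: driver resources may be gone for an inactive policy. */
    if (!p->active || p->get == NULL)
        return 0;
    return p->get();
}
```

Returning 0 for the inactive case is an assumption of this sketch; the point is only that the driver callback is never reached once the policy is inactive.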
  5. 26 Jul, 2018 2 commits
    • cpufreq: Fix a circular lock dependency problem · 9b3d9bb3
      Waiman Long committed
      With lockdep turned on, the following circular lock dependency problem
      was reported:
      
      [   57.470040] ======================================================
      [   57.502900] WARNING: possible circular locking dependency detected
      [   57.535208] 4.18.0-0.rc3.1.el8+7.x86_64+debug #1 Tainted: G
      [   57.577761] ------------------------------------------------------
      [   57.609714] tuned/1505 is trying to acquire lock:
      [   57.633808] 00000000559deec5 (cpu_hotplug_lock.rw_sem){++++}, at: store+0x27/0x120
      [   57.672880]
      [   57.672880] but task is already holding lock:
      [   57.702184] 000000002136ca64 (kn->count#118){++++}, at: kernfs_fop_write+0x1d0/0x410
      [   57.742176]
      [   57.742176] which lock already depends on the new lock.
      [   57.742176]
      [   57.785220]
      [   57.785220] the existing dependency chain (in reverse order) is:
          :
      [   58.932512] other info that might help us debug this:
      [   58.932512]
      [   58.973344] Chain exists of:
      [   58.973344]   cpu_hotplug_lock.rw_sem --> subsys mutex#5 --> kn->count#118
      [   58.973344]
      [   59.030795]  Possible unsafe locking scenario:
      [   59.030795]
      [   59.061248]        CPU0                    CPU1
      [   59.085377]        ----                    ----
      [   59.108160]   lock(kn->count#118);
      [   59.124935]                                lock(subsys mutex#5);
      [   59.156330]                                lock(kn->count#118);
      [   59.186088]   lock(cpu_hotplug_lock.rw_sem);
      [   59.208541]
      [   59.208541]  *** DEADLOCK ***
      
      In the cpufreq_register_driver() function, the lock sequence is:
      
        cpus_read_lock --> kn->count
      
      For the cpufreq sysfs store method, the lock sequence is:
      
        kn->count --> cpus_read_lock
      
      These sequences are actually safe as they are taking a shared lock on
      cpu_hotplug_lock. However, the current lockdep code doesn't check for
      shared locking when detecting circular lock dependencies.  Fixing that
      could be a substantial effort.
      
      Instead, we can work around this problem by using cpus_read_trylock()
      in the store method which is much simpler. The chance of not getting
      the read lock is very small. If that happens, the userspace application
      that writes the sysfs file will get an error.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: trace frequency limits change · 601b2185
      Ruchi Kandoi committed
      systrace, which is used for tracing on Android systems, has carried a
      patch for many years in the Android tree that traces when the cpufreq
      limits change.  With the help of this information, systrace can know
      when the policy limits change and can visually display the data. Let's
      add upstream support for the same.
      Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 30 May, 2018 2 commits
  7. 13 May, 2018 1 commit
    • cpufreq: optimize cpufreq_notify_transition() · 20b5324d
      Viresh Kumar committed
      cpufreq_notify_transition() calls __cpufreq_notify_transition() for each
      CPU of a policy. There is a lot of code in __cpufreq_notify_transition()
      though which isn't required to be executed for each CPU, like checking
      about disabled cpufreq or irqs, adjusting jiffies, updating cpufreq
      stats and some debug print messages.
      
      This commit merges __cpufreq_notify_transition() into
      cpufreq_notify_transition() and modifies cpufreq_notify_transition() to
      execute minimum amount of code for each CPU.
      
      Also fix the kerneldoc for cpufreq_notify_transition() while at it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 20 Mar, 2018 1 commit
  9. 28 Feb, 2018 2 commits
    • cpufreq: Validate frequency table in the core · d417e069
      Viresh Kumar committed
      By design, cpufreq drivers are responsible for calling
      cpufreq_frequency_table_cpuinfo() from their ->init()
      callbacks to validate the frequency table.
      
      However, if a cpufreq driver is buggy and fails to do so properly, it
      can lead to unexpected behavior of the driver or the cpufreq core at a
      later point in time.  It would be better if the core could
      validate the frequency table during driver initialization.
      
      To that end, introduce cpufreq_table_validate_and_sort() and make
      the cpufreq core call it right after invoking the ->init() callback
      of the driver and destroy the cpufreq policy if the table is invalid.
      
      For the time being the validation of the table happens twice, once
      from the driver and then from the core.  The individual drivers will
      be updated separately to drop table validation if they don't need it
      for other reasons.
      
      The frequency table is marked "sorted" or "unsorted" by the new helper
      now instead of in cpufreq_table_validate_and_show(), as it should only
      be done after validating the table (which the drivers won't do going
      forward).
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Subject/changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
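
The idea in the commit above (validate first, and only then mark the table sorted or unsorted) can be sketched in a few lines. This is not the kernel helper: `ENTRY_INVALID`, `TABLE_END`, `enum table_order` and `validate_table()` are hypothetical names and values chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_INVALID 0u     /* hypothetical marker for an unusable entry */
#define TABLE_END     (~0u)  /* hypothetical table terminator */

enum table_order {
    ORDER_INVALID,     /* no usable entry: the caller should reject the table */
    ORDER_UNSORTED,
    ORDER_ASCENDING,
    ORDER_DESCENDING
};

/* Validate a kHz table and classify its ordering in a single pass. */
static enum table_order validate_table(const unsigned int *khz)
{
    size_t valid = 0;
    int asc = 1, desc = 1;
    unsigned int prev = 0;

    assert(khz != NULL);
    for (size_t i = 0; khz[i] != TABLE_END; i++) {
        if (khz[i] == ENTRY_INVALID)
            continue;  /* skipped entries do not affect the ordering */
        if (valid > 0) {
            if (khz[i] <= prev)
                asc = 0;
            if (khz[i] >= prev)
                desc = 0;
        }
        prev = khz[i];
        valid++;
    }

    if (valid == 0)
        return ORDER_INVALID;
    if (asc)
        return ORDER_ASCENDING;
    if (desc)
        return ORDER_DESCENDING;
    return ORDER_UNSORTED;
}
```

A caller in the position of the core would run this right after the driver's init step and tear the policy down on `ORDER_INVALID`.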
    • cpufreq: Reorder cpufreq_online() error code path · b24b6478
      Viresh Kumar committed
      Ideally, the de-allocation of resources should happen in the exact
      opposite order in which they were allocated. It helps maintain the code
      in the long term, even if nothing really breaks with incorrect ordering.
      
      That wasn't followed in cpufreq_online() and it has some
      inconsistencies.  For example, the symlinks were created from within
      the locked region while they are removed only after putting the locks.
      Also ->exit() should have been called only after the symlinks are
      removed and the lock is dropped, as that was the case when ->init()
      was first called.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
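
The principle stated in the commit above, releasing resources in the exact reverse of the order they were acquired, is commonly written in C with goto labels. The sketch below records each step as a letter so the unwinding order can be inspected. The steps and the `bring_online()`/`record()` names are hypothetical and do not correspond to the actual cpufreq_online() code.

```c
#include <assert.h>
#include <string.h>

/* Log of executed steps: uppercase = setup, lowercase = matching teardown. */
static char order_log[16];
static size_t order_len;

static void record(char c)
{
    assert(order_len + 1 < sizeof(order_log));
    order_log[order_len++] = c;
    order_log[order_len] = '\0';
}

/* fail_at: 0 = no failure, 2 or 3 = the second or third setup step fails. */
static int bring_online(int fail_at)
{
    memset(order_log, 0, sizeof(order_log));
    order_len = 0;

    record('A');            /* first resource acquired */
    if (fail_at == 2)
        goto undo_a;
    record('B');            /* second resource acquired */
    if (fail_at == 3)
        goto undo_b;
    record('C');            /* third step succeeds, nothing to undo */
    return 0;

undo_b:
    record('b');            /* release in reverse: B first ... */
undo_a:
    record('a');            /* ... then A */
    return -1;
}
```

Each label undoes exactly one setup step and falls through to the next, so the teardown order mirrors the setup order by construction.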
  10. 05 Feb, 2018 1 commit
    • cpufreq: Skip cpufreq resume if it's not suspended · 703cbaa6
      Bo Yan committed
      cpufreq_resume can be called even without a preceding cpufreq_suspend.
      This can happen in the following scenario:
      
          suspend_devices_and_enter
             --> dpm_suspend_start
                --> dpm_prepare
                    --> device_prepare : this function errors out
                --> dpm_suspend: this is skipped due to dpm_prepare failure
                                 this means cpufreq_suspend is skipped over
             --> goto Recover_platform, due to previous error
             --> goto Resume_devices
             --> dpm_resume_end
                 --> dpm_resume
                     --> cpufreq_resume
      
      If schedutil is used as the frequency governor, cpufreq_resume will
      eventually call sugov_start, which does the following:
      
          memset(sg_cpu, 0, sizeof(*sg_cpu));
          ....
      
      This effectively erases function pointer for frequency update, causing
      crash later on. The function pointer would have been set correctly if
      subsequent cpufreq_add_update_util_hook runs successfully, but that
      function returns earlier because cpufreq_suspend was not called:
      
          if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
                  return;
      
      The fix is to check cpufreq_suspended first, if it's false, that means
      cpufreq_suspend was not called in the first place, so do not resume
      cpufreq.
      Signed-off-by: Bo Yan <byan@nvidia.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Dropped printing a message ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
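
The guard described in the commit above reduces to a flag that only a completed suspend sets. A minimal sketch, assuming hypothetical names (`suspended`, `governor_starts`, `subsystem_suspend()`, `subsystem_resume()`) rather than the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

static bool suspended;       /* set only by a completed suspend */
static int governor_starts;  /* counts how often the governor is restarted */

static void subsystem_suspend(void)
{
    assert(!suspended);      /* a double suspend would be a caller bug */
    suspended = true;
}

/* Returns true if a resume was performed, false if it was skipped. */
static bool subsystem_resume(void)
{
    if (!suspended)
        return false;        /* suspend never ran, so there is nothing to undo */
    suspended = false;
    governor_starts++;       /* stands in for restarting the governor */
    return true;
}
```

With the early return in place, a resume that follows an aborted suspend leaves the governor state untouched.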
  11. 04 Dec, 2017 4 commits
  12. 03 Oct, 2017 1 commit
  13. 22 Aug, 2017 1 commit
    • cpufreq: Cap the default transition delay value to 10 ms · e948bc8f
      Viresh Kumar committed
      If transition_delay_us isn't defined by the cpufreq driver, the default
      value of transition delay (time after which the cpufreq governor will
      try updating the frequency again) is currently calculated by multiplying
      transition_latency (nsec) with LATENCY_MULTIPLIER (1000) and then
      converting this time to usec. That gives the exact same value as
      transition_latency, just that the time unit is usec instead of nsec.
      
      With acpi-cpufreq for example, transition_latency is set to around 10
      usec and we get a transition delay of 10 ms, which seems to be a
      reasonable amount of time after which to reevaluate the frequency.
      
      But for platforms where frequency switching isn't that fast (like ARM),
      the transition_latency varies from 500 usec to 3 ms, and the transition
      delay becomes 500 ms to 3 seconds. Of course, that is a pretty bad
      default value to start with.
      
      We can try to come up with a better formula (instead of multiplying with
      LATENCY_MULTIPLIER) to solve this problem, but will that be worth it?
      
      This patch tries a simple approach and caps the maximum value of default
      transition delay to 10 ms. Of course, userspace can still come in and
      change this value anytime or individual drivers can rather provide
      transition_delay_us instead.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
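
The arithmetic in the commit above can be checked with a small sketch: a latency in nanoseconds is multiplied by 1000, converted to microseconds, and capped at 10 ms. `struct policy_timing`, `MAX_DEFAULT_DELAY_US` and `default_transition_delay_us()` are hypothetical names for this illustration, and the handling of a zero latency here is not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NSEC_PER_USEC         1000u
#define LATENCY_MULTIPLIER    1000u
#define MAX_DEFAULT_DELAY_US  10000u  /* the 10 ms cap, in microseconds */

/* Hypothetical subset of the policy fields involved in the calculation. */
struct policy_timing {
    unsigned int transition_delay_us;    /* 0 if the driver did not set it */
    unsigned int transition_latency_ns;
};

static unsigned int default_transition_delay_us(const struct policy_timing *p)
{
    unsigned long long delay_us;

    assert(p != NULL);
    if (p->transition_delay_us)
        return p->transition_delay_us;   /* driver-provided value wins */

    /* latency (ns) * 1000, converted to us: same number, larger unit. */
    delay_us = (unsigned long long)p->transition_latency_ns
               * LATENCY_MULTIPLIER / NSEC_PER_USEC;

    if (delay_us > MAX_DEFAULT_DELAY_US)
        return MAX_DEFAULT_DELAY_US;
    return (unsigned int)delay_us;
}
```

For a 10 usec latency this yields 10000 us (10 ms), and for a 500 usec latency the uncapped 500 ms is reduced to the same 10 ms.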
  14. 10 Aug, 2017 1 commit
    • cpufreq: Return 0 from ->fast_switch() on errors · 209887e6
      Viresh Kumar committed
      CPUFREQ_ENTRY_INVALID is a special symbol which is used to specify that
      an entry in the cpufreq table is invalid. But using it outside of the
      scope of the cpufreq table looks a bit incorrect.
      
      We can represent an invalid frequency by writing it as 0 instead if we
      need. Note that it is already done that way for the return value of the
      ->get() callback.
      
      Let's do the same for ->fast_switch() and not use CPUFREQ_ENTRY_INVALID
      outside of the scope of the cpufreq table.
      
      Also update the comment over cpufreq_driver_fast_switch() to clearly
      mention what this returns.
      
      None of the drivers return CPUFREQ_ENTRY_INVALID as of now from
      ->fast_switch() callback and so we don't need to update any of those.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  15. 26 Jul, 2017 3 commits
  16. 22 Jul, 2017 1 commit
  17. 27 Jun, 2017 1 commit
    • x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF · f8475cef
      Len Brown committed
      The goal of this change is to give users a uniform and meaningful
      result when they read /sys/...cpufreq/scaling_cur_freq
      on modern x86 hardware, as compared to what they get today.
      
      Modern x86 processors include the hardware needed
      to accurately calculate frequency over an interval --
      APERF, MPERF, and the TSC.
      
      Here we provide an x86 routine to make this calculation
      on supported hardware, and use it in preference to any
      driver-specific cpufreq_driver.get() routine.
      
      MHz is computed like so:
      
      MHz = base_MHz * delta_APERF / delta_MPERF
      
      MHz is the average frequency of the busy processor
      over a measurement interval.  The interval is
      defined to be the time between successive invocations
      of aperfmperf_khz_on_cpu(), which are expected to
      happen on-demand when users read the sysfs attribute
      cpufreq/scaling_cur_freq.
      
      As with previous methods of calculating MHz,
      idle time is excluded.
      
      base_MHz above is from TSC calibration global "cpu_khz".
      
      This x86 native method to calculate MHz returns a meaningful result
      regardless of whether P-states are controlled by hardware or firmware
      and whether or not the Linux cpufreq sub-system is installed.
      
      When this routine is invoked more frequently, the measurement
      interval becomes shorter.  However, the code limits re-computation
      to 10ms intervals so that average frequency remains meaningful.
      
      Discerning users are encouraged to take advantage of
      the turbostat(8) utility, which can gracefully handle
      concurrent measurement intervals of arbitrary length.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  18. 30 May, 2017 1 commit
  19. 26 May, 2017 1 commit
  20. 13 Apr, 2017 1 commit
    • cpufreq: Bring CPUs up even if cpufreq_online() failed · c4a3fa26
      Chen Yu committed
      There is a report that after commit 27622b06 ("cpufreq: Convert
      to hotplug state machine"), the normal CPU offline/online cycle
      fails on some platforms.
      
      According to the ftrace result, this problem was triggered on
      platforms using acpi-cpufreq as the default cpufreq driver,
      and due to the lack of some ACPI freq method (eg. _PCT),
      cpufreq_online() failed and returned a negative value, so the CPU
      hotplug state machine rolled back the CPU online process.  Actually,
      from the user's perspective, the failure of cpufreq_online() should
      not prevent that CPU from being brought up, although cpufreq might
      not work on that CPU.
      
      BTW, during system startup cpufreq_online() is not invoked via CPU
      online but by the cpufreq device creation process, so the APs can be
      brought up even though cpufreq_online() fails in that stage.
      
      This patch ignores the return value of cpufreq_online/offline() and
      lets the cpufreq framework deal with the failure.  cpufreq_online()
      itself will do a proper rollback in that case and if _PCT is missing,
      the ACPI cpufreq driver will print a warning if the corresponding
      debug options have been enabled.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581
      Fixes: 27622b06 ("cpufreq: Convert to hotplug state machine")
      Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
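
The behavior change in the commit above, where a frequency-scaling failure no longer vetoes bringing a CPU up, can be modelled as a hotplug callback that deliberately ignores the inner return value. Everything in this sketch is hypothetical: the `NR_CPUS` value, the `has_freq_method` table, and the `freq_online()`/`hotplug_online()` functions are illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical platform data: CPU 2 lacks the firmware method it needs. */
static bool has_freq_method[NR_CPUS] = { true, true, false, true };
static bool scaling_enabled[NR_CPUS];

/* Sets up frequency scaling for one CPU; may fail on incomplete firmware. */
static int freq_online(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    if (!has_freq_method[cpu])
        return -ENODEV;
    scaling_enabled[cpu] = true;
    return 0;
}

/* Hotplug callback: a scaling failure must not stop the CPU coming up. */
static int hotplug_online(unsigned int cpu)
{
    (void)freq_online(cpu);  /* failure is handled inside the scaling layer */
    return 0;
}
```

CPU 2 comes online without frequency scaling, which matches the user-visible outcome the commit argues for.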
  21. 28 Mar, 2017 1 commit
    • cpufreq: Fix creation of symbolic links to policy directories · 2f0ba790
      Rafael J. Wysocki committed
      The cpufreq core only tries to create symbolic links from CPU
      directories in sysfs to policy directories in cpufreq_add_dev(),
      either when a given CPU is registered or when the cpufreq driver
      is registered, whichever happens first.  That is not sufficient,
      however, because cpufreq_add_dev() may be called for an offline CPU
      whose policy object has not been created yet and, quite obviously,
      the symbolic link cannot be added in that case.
      
      Fix that by making cpufreq_online() attempt to add symbolic links to
      policy objects for the CPUs in the related_cpus mask of every new
      policy object created by it.
      
      The cpufreq_driver_lock locking around the for_each_cpu() loop
      in cpufreq_online() is dropped, because it is not necessary and the
      code is somewhat simpler without it.  Moreover, failures to create
      a symbolic link will not be regarded as hard errors any more and
      the CPUs without those links will not be taken offline automatically,
      but that should not be problematic in practice.
      Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
  22. 22 Mar, 2017 1 commit
  23. 16 Mar, 2017 1 commit
  24. 06 Mar, 2017 1 commit
    • cpufreq: Add the "cpufreq.off=1" cmdline option · d82f2692
      Len Brown committed
      Add the "cpufreq.off=1" cmdline option.
      
      At boot-time, this allows a user to request CONFIG_CPU_FREQ=n
      behavior from a kernel built with CONFIG_CPU_FREQ=y.
      
      This is analogous to the existing "cpuidle.off=1" option
      and CONFIG_CPU_IDLE=y.
      
      This capability is valuable when we need to debug end-user
      issues in the BIOS or in Linux.  It is also convenient
      for enabling comparisons, which may otherwise require a new kernel,
      or help from BIOS SETUP, which may be buggy or unavailable.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  25. 16 Feb, 2017 1 commit
  26. 04 Feb, 2017 2 commits
  27. 01 Feb, 2017 1 commit
    • sched/cputime: Convert kcpustat to nsecs · 7fb1327e
      Frederic Weisbecker committed
      Kernel CPU stats are stored in cputime_t which is an architecture
      defined type, and hence a bit opaque and requiring accessors and mutators
      for any operation.
      
      Converting them to nsecs simplifies the code and is one step toward
      the removal of cputime_t in the core code.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  28. 21 Nov, 2016 2 commits
  29. 20 Sep, 2016 2 commits