1. 26 11月, 2014 1 次提交
    • T
      cpufreq: Ref the policy object sooner · 6d4e81ed
      Tomeu Vizoso 提交于
      Do it before it's assigned to cpufreq_cpu_data, otherwise when a driver
      tries to get the cpu frequency during initialization the policy kobj is
      referenced and we get this warning:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 64 at include/linux/kref.h:47 kobject_get+0x64/0x70()
      Modules linked in:
      CPU: 1 PID: 64 Comm: irq/77-tegra-ac Not tainted 3.18.0-rc4-next-20141114ccu-00050-g3eff942 #326
      [<c0016fac>] (unwind_backtrace) from [<c001272c>] (show_stack+0x10/0x14)
      [<c001272c>] (show_stack) from [<c06085d8>] (dump_stack+0x98/0xd8)
      [<c06085d8>] (dump_stack) from [<c002892c>] (warn_slowpath_common+0x84/0xb4)
      [<c002892c>] (warn_slowpath_common) from [<c00289f8>] (warn_slowpath_null+0x1c/0x24)
      [<c00289f8>] (warn_slowpath_null) from [<c0220290>] (kobject_get+0x64/0x70)
      [<c0220290>] (kobject_get) from [<c03e944c>] (cpufreq_cpu_get+0x88/0xc8)
      [<c03e944c>] (cpufreq_cpu_get) from [<c03e9500>] (cpufreq_get+0xc/0x64)
      [<c03e9500>] (cpufreq_get) from [<c0285288>] (actmon_thread_isr+0x134/0x198)
      [<c0285288>] (actmon_thread_isr) from [<c0069008>] (irq_thread_fn+0x1c/0x40)
      [<c0069008>] (irq_thread_fn) from [<c0069324>] (irq_thread+0x134/0x174)
      [<c0069324>] (irq_thread) from [<c0040290>] (kthread+0xdc/0xf4)
      [<c0040290>] (kthread) from [<c000f4b8>] (ret_from_fork+0x14/0x3c)
      ---[ end trace b7bd64a81b340c59 ]---
      Signed-off-by: NTomeu Vizoso <tomeu.vizoso@collabora.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6d4e81ed
  2. 12 11月, 2014 1 次提交
    • V
      cpufreq: respect the min/max settings from user space · 619c144c
      Vince Hsu 提交于
      When the user space tries to set scaling_(max|min)_freq through
      sysfs, the cpufreq_set_policy() asks other driver's opinions
      for the max/min frequencies. Some device drivers, like Tegra
      CPU EDP which is not upstreamed yet though, may constrain the
      CPU maximum frequency dynamically because of board design.
      So if the user space access happens and some driver is capping
      the cpu frequency at the same time, the user_policy->(max|min)
      is overridden by the capped value, and that's not expected by
      the user space. And if the user space is not invoked again,
      the CPU will always be capped by the user_policy->(max|min)
      even no drivers limit the CPU frequency any more.
      
      This patch preserves the user specified min/max settings, so that
      every time the cpufreq policy is updated, the new max/min can
      be re-evaluated correctly based on the user's expection and
      the present device drivers' status.
      Signed-off-by: NVince Hsu <vinceh@nvidia.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      619c144c
  3. 08 11月, 2014 1 次提交
    • G
      cpufreq: Avoid crash in resume on SMP without OPP · 09712f55
      Geert Uytterhoeven 提交于
      When resuming from s2ram on an SMP system without cpufreq operating
      points (e.g. there's no "operating-points" property for the CPU node in
      DT, or the platform doesn't use DT yet), the kernel crashes when
      bringing CPU 1 online:
      
          Enabling non-boot CPUs ...
          CPU1: Booted secondary processor
          Unable to handle kernel NULL pointer dereference at virtual address 0000003c
          pgd = ee5e6b00
          [0000003c] *pgd=6e579003, *pmd=6e588003, *pte=00000000
          Internal error: Oops: a07 [#1] SMP ARM
          Modules linked in:
          CPU: 0 PID: 1246 Comm: s2ram Tainted: G        W      3.18.0-rc3-koelsch-01614-g0377af242bb175c8-dirty #589
          task: eeec5240 ti: ee704000 task.ti: ee704000
          PC is at __cpufreq_add_dev.isra.24+0x24c/0x77c
          LR is at __cpufreq_add_dev.isra.24+0x244/0x77c
          pc : [<c0298efc>]    lr : [<c0298ef4>]    psr: 60000153
          sp : ee705d48  ip : ee705d48  fp : ee705d84
          r10: c04e0450  r9 : 00000000  r8 : 00000001
          r7 : c05426a8  r6 : 00000001  r5 : 00000001  r4 : 00000000
          r3 : 00000000  r2 : 00000000  r1 : 20000153  r0 : c0542734
      
      Verify that policy is not NULL before dereferencing it to fix this.
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Fixes: 8414809c (cpufreq: Preserve policy structure across suspend/resume)
      Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      09712f55
  4. 24 10月, 2014 1 次提交
  5. 21 10月, 2014 1 次提交
  6. 01 10月, 2014 1 次提交
  7. 29 9月, 2014 2 次提交
    • R
      cpufreq: Replace strnicmp with strncasecmp · 7c4f4539
      Rasmus Villemoes 提交于
      The kernel used to contain two functions for length-delimited,
      case-insensitive string comparison, strnicmp with correct semantics
      and a slightly buggy strncasecmp. The latter is the POSIX name, so
      strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper
      for the new strncasecmp to avoid breaking existing users.
      
      To allow the compat wrapper strnicmp to be removed at some point in
      the future, and to avoid the extra indirection cost, do
      s/strnicmp/strncasecmp/g.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      7c4f4539
    • P
      cpufreq: Allow stop CPU callback to be used by all cpufreq drivers · 789ca243
      Preeti U Murthy 提交于
      Commit 367dc4aa ("cpufreq: Add stop CPU callback to
      cpufreq_driver interface") introduced the stop CPU callback for
      intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this
      callback is invoked so that drivers can take some action on the
      pstate of the cpu before it is taken offline. This callback was
      assumed to be useful only for those drivers which have implemented
      the set_policy CPU callback because they have no other way to take
      action about the cpufreq of a CPU which is being hotplugged out
      except in the exit callback which is called very late in the offline
      process.
      
      The drivers which implement the target/target_index callbacks were
      expected to take care of requirements like the ones that commit
      367dc4aa addresses in the GOV_STOP notification event. But there
      are disadvantages to restricting the usage of stop CPU callback
      to cpufreq drivers that implement the set_policy callbacks and who
      want to take explicit action on the setting the cpufreq during a
      hotplug operation.
      
      1.GOV_STOP gets called for every CPU offline and drivers would usually
      want to take action when the last cpu in the policy->cpus mask
      is taken offline. As long as there is more than one cpu in the
      policy->cpus mask, cpufreq core itself makes sure that the freq
      for the other cpus in this mask is set according to the maximum load.
      This is sensible and drivers which implement the target_index callback
      would mostly not want to modify that. However the cpufreq core leaves a
      loose end when the cpu in the policy->cpus mask is the last one to go offline;
      it does nothing explicit to the frequency of the core. Drivers may need
      a way to take some action here and stop CPU callback mechanism is the
      best way to do it today.
      
      2. We cannot implement driver specific actions in the GOV_STOP mechanism.
      So we will need another driver callback which is invoked from here which is
      unnecessary.
      
      Therefore this patch extends the usage of stop CPU callback to be used
      by all cpufreq drivers as long as they have this callback implemented
      and irrespective of whether they are set_policy/target_index drivers.
      The assumption is if the drivers find the GOV_STOP path to be a suitable
      way of implementing what they want to do with the freq of the cpu
      going offine,they will not implement the stop CPU callback at all.
      Signed-off-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      789ca243
  8. 22 9月, 2014 2 次提交
    • P
      cpufreq: release policy->rwsem on error · 7106e02b
      Prarit Bhargava 提交于
      While debugging a cpufreq-related hardware failure on a system I saw the
      following lockdep warning:
      
       =========================
       [ BUG: held lock freed! ] 3.17.0-rc4+ #1 Tainted: G            E
       -------------------------
       insmod/2247 is freeing memory ffff88006e1b1400-ffff88006e1b17ff, with a lock still held there!
        (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
       3 locks held by insmod/2247:
        #0:  (subsys mutex#5){+.+.+.}, at: [<ffffffff81485579>] subsys_interface_register+0x69/0x120
        #1:  (cpufreq_rwsem){.+.+.+}, at: [<ffffffff8156cf73>] __cpufreq_add_dev.isra.21+0x73/0xb80
        #2:  (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
      
       stack backtrace:
       CPU: 0 PID: 2247 Comm: insmod Tainted: G            E  3.17.0-rc4+ #1
       Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 08/24/2013
        0000000000000000 000000008f3063c4 ffff88006f87bb30 ffffffff8171b358
        ffff88006bcf3750 ffff88006f87bb68 ffffffff810e09e1 ffff88006e1b1400
        ffffea0001b86c00 ffffffff8156d327 ffff880073003500 0000000000000246
       Call Trace:
        [<ffffffff8171b358>] dump_stack+0x4d/0x66
        [<ffffffff810e09e1>] debug_check_no_locks_freed+0x171/0x180
        [<ffffffff8156d327>] ? __cpufreq_add_dev.isra.21+0x427/0xb80
        [<ffffffff8121412b>] kfree+0xab/0x2b0
        [<ffffffff8156d327>] __cpufreq_add_dev.isra.21+0x427/0xb80
        [<ffffffff81724cf7>] ? _raw_spin_unlock+0x27/0x40
        [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
        [<ffffffff8156da8e>] cpufreq_add_dev+0xe/0x10
        [<ffffffff814855d1>] subsys_interface_register+0xc1/0x120
        [<ffffffff8156bcf2>] cpufreq_register_driver+0x112/0x340
        [<ffffffff8121415a>] ? kfree+0xda/0x2b0
        [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
        [<ffffffffa003562e>] pcc_cpufreq_init+0x4af/0xe81 [pcc_cpufreq]
        [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
        [<ffffffff81002144>] do_one_initcall+0xd4/0x210
        [<ffffffff811f7472>] ? __vunmap+0xd2/0x120
        [<ffffffff81127155>] load_module+0x1315/0x1b70
        [<ffffffff811222a0>] ? store_uevent+0x70/0x70
        [<ffffffff811229d9>] ? copy_module_from_fd.isra.44+0x129/0x180
        [<ffffffff81127b86>] SyS_finit_module+0xa6/0xd0
        [<ffffffff81725b69>] system_call_fastpath+0x16/0x1b
       cpufreq: __cpufreq_add_dev: ->get() failed
      insmod: ERROR: could not insert module pcc-cpufreq.ko: No such device
      
      The warning occurs in the __cpufreq_add_dev() code which does
      
              down_write(&policy->rwsem);
      	...
              if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
                      policy->cur = cpufreq_driver->get(policy->cpu);
                      if (!policy->cur) {
                              pr_err("%s: ->get() failed\n", __func__);
                              goto err_get_freq;
                      }
      
      If cpufreq_driver->get(policy->cpu) returns an error we execute the
      code at err_get_freq, which does not up the policy->rwsem.  This causes
      the lockdep warning.
      
      Trivial patch to up the policy->rwsem in the error path.
      
      After the patch has been applied, and an error occurs in the
      cpufreq_driver->get(policy->cpu) call we will now see
      
      cpufreq: __cpufreq_add_dev: ->get() failed
      cpufreq: __cpufreq_add_dev: ->get() failed
      modprobe: ERROR: could not insert 'pcc_cpufreq': No such device
      
      Fixes: 4e97b631 (cpufreq: Initialize governor for a new policy under policy->rwsem)
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      7106e02b
    • L
      cpufreq: fix cpufreq suspend/resume for intel_pstate · 8e30444e
      Lan Tianyu 提交于
      Cpufreq core introduces cpufreq_suspended flag to let cpufreq sysfs nodes
      across S2RAM/S2DISK. But the flag is only set in the cpufreq_suspend()
      for cpufreq drivers which have target or target_index callback. This
      skips intel_pstate driver. This patch is to set the flag before checking
      target or target_index callback.
      
      Fixes: 2f0aea93 (cpufreq: suspend governors on system suspend/hibernate)
      Signed-off-by: NLan Tianyu <tianyu.lan@intel.com>
      Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
      [rjw: Subject]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      8e30444e
  9. 21 7月, 2014 3 次提交
  10. 17 7月, 2014 1 次提交
    • V
      cpufreq: move policy kobj to policy->cpu at resume · 92c14bd9
      Viresh Kumar 提交于
      This is only relevant to implementations with multiple clusters, where clusters
      have separate clock lines but all CPUs within a cluster share it.
      
      Consider a dual cluster platform with 2 cores per cluster. During suspend we
      start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
      would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
      kobj as we want to retain permissions/values/etc.
      
      Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
      We will recover the old policy and update policy->cpu from 3 to 2 from
      update_policy_cpu().
      
      But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
      link for CPU2, but would try that for CPU3 while bringing it online. Which will
      report errors as CPU3 already has kobj assigned to it.
      
      This bug got introduced with commit 42f921a6, which overlooked this scenario.
      
      To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
      cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
      only for the first CPU of a non-boot cluster. And we can't recover from this
      situation, if kobject_move() fails.
      
      Fixes: 42f921a6 (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
      Cc:  3.13+ <stable@vger.kernel.org> # 3.13+
      Reported-and-tested-by: NBu Yitian <ybu@qti.qualcomm.com>
      Reported-by: NSaravana Kannan <skannan@codeaurora.org>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa@mit.edu>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      92c14bd9
  11. 19 6月, 2014 1 次提交
  12. 06 6月, 2014 1 次提交
    • V
      cpufreq: add support for intermediate (stable) frequencies · 1c03a2d0
      Viresh Kumar 提交于
      Douglas Anderson, recently pointed out an interesting problem due to which
      udelay() was expiring earlier than it should.
      
      While transitioning between frequencies few platforms may temporarily switch to
      a stable frequency, waiting for the main PLL to stabilize.
      
      For example: When we transition between very low frequencies on exynos, like
      between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
      No CPUFREQ notification is sent for that. That means there's a period of time
      when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
      and 300MHz. And so udelay behaves badly.
      
      To get this fixed in a generic way, introduce another set of callbacks
      get_intermediate() and target_intermediate(), only for drivers with
      target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
      
      get_intermediate() should return a stable intermediate frequency platform wants
      to switch to, and target_intermediate() should set CPU to that frequency,
      before jumping to the frequency corresponding to 'index'. Core will take care of
      sending notifications and driver doesn't have to handle them in
      target_intermediate() or target_index().
      
      NOTE: ->target_index() should restore to policy->restore_freq in case of
      failures as core would send notifications for that.
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NDoug Anderson <dianders@chromium.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1c03a2d0
  13. 29 5月, 2014 1 次提交
  14. 08 5月, 2014 1 次提交
  15. 07 5月, 2014 1 次提交
    • S
      cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end · ca654dc3
      Srivatsa S. Bhat 提交于
      Some cpufreq drivers were redundantly invoking the _begin() and _end()
      APIs around frequency transitions, and this double invocation (one from
      the cpufreq core and the other from the cpufreq driver) used to result
      in a self-deadlock, leading to system hangs during boot. (The _begin()
      API makes contending callers wait until the previous invocation is
      complete. Hence, the cpufreq driver would end up waiting on itself!).
      
      Now all such drivers have been fixed, but debugging this issue was not
      very straight-forward (even lockdep didn't catch this). So let us add a
      debug infrastructure to the cpufreq core to catch such issues more easily
      in the future.
      
      We add a new field called 'transition_task' to the policy structure, to keep
      track of the task which is performing the frequency transition. Using this
      field, we make note of this task during _begin() and print a warning if we
      find a case where the same task is calling _begin() again, before completing
      the previous frequency transition using the corresponding _end().
      
      We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure
      for 2 reasons:
      
      1. At the moment, we have no way to avoid a particular scenario where this
         debug infrastructure can emit false-positive warnings for such drivers.
         The scenario is depicted below:
      
               Task A						Task B
      
          /* 1st freq transition */
          Invoke _begin() {
                  ...
                  ...
          }
      
          Change the frequency
      
          /* 2nd freq transition */
          Invoke _begin() {
      	    ...	//waiting for B to
                  ... //finish _end() for
      	    ... //the 1st transition
      	    ...	      |				Got interrupt for successful
      	    ...	      |				change of frequency (1st one).
      	    ...       |
      	    ...	      |				/* 1st freq transition */
      	    ...	      |				Invoke _end() {
      	    ...	      |					...
      	    ...	      V				}
      	    ...
      	    ...
          }
      
         This scenario is actually deadlock-free because, once Task A changes the
         frequency, it is Task B's responsibility to invoke the corresponding
         _end() for the 1st frequency transition. Hence it is perfectly legal for
         Task A to go ahead and attempt another frequency transition in the meantime.
         (Of course it won't be able to proceed until Task B finishes the 1st _end(),
         but this doesn't cause a deadlock or a hang).
      
         The debug infrastructure cannot handle this scenario and will treat it as
         a deadlock and print a warning. To avoid this, we exclude such drivers
         from the purview of this code.
      
      2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers
         at all! The cpufreq core does not automatically invoke the _begin() and
         _end() APIs during frequency transitions in such drivers. Thus, the driver
         alone is responsible for invoking _begin()/_end() and hence there shouldn't
         be any conflicts which lead to double invocations. So, we can skip these
         drivers, since the probability that such drivers will hit this problem is
         extremely low, as outlined above.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      ca654dc3
  16. 30 4月, 2014 1 次提交
  17. 26 3月, 2014 4 次提交
    • V
      cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition static · 236a9800
      Viresh Kumar 提交于
      cpufreq_notify_transition() and cpufreq_notify_post_transition() shouldn't be
      called directly by cpufreq drivers anymore and so these should be marked static.
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      236a9800
    • V
      cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin|end} · 8fec051e
      Viresh Kumar 提交于
      CPUFreq core has new infrastructure that would guarantee serialized calls to
      target() or target_index() callbacks. These are called
      cpufreq_freq_transition_begin() and cpufreq_freq_transition_end().
      
      This patch converts existing drivers to use these new set of routines.
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      8fec051e
    • S
      cpufreq: Make sure frequency transitions are serialized · 12478cf0
      Srivatsa S. Bhat 提交于
      Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE
      notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers
      should strictly alternate, thereby preventing two different sets of PRECHANGE or
      POSTCHANGE notifiers from interleaving arbitrarily.
      
      The following examples illustrate why this is important:
      
      Scenario 1:
      -----------
      A thread reading the value of cpuinfo_cur_freq, will call
      __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()
      
      The ondemand governor can decide to change the frequency of the CPU at the same
      time and hence it can end up sending the notifications via ->target().
      
      If the notifiers are not serialized, the following sequence can occur:
      - PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
      - PRECHANGE Notification for freq B (from target())
      - Freq changed by target() to B
      - POSTCHANGE Notification for freq B
      - POSTCHANGE Notification for freq A
      
      We can see from the above that the last POSTCHANGE Notification happens for freq
      A but the hardware is set to run at freq B.
      
      Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback()
      in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the
      loops_per_jiffy calculations will get messed up.
      
      Scenario 2:
      -----------
      The governor calls __cpufreq_driver_target() to change the frequency. At the
      same time, if we change scaling_{min|max}_freq from sysfs, it will end up
      calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call
      __cpufreq_driver_target(). And hence we end up issuing concurrent calls to
      ->target().
      
      Typically, platforms have the following logic in their ->target() routines:
      (Eg: cpufreq-cpu0, omap, exynos, etc)
      
      A. If new freq is more than old: Increase voltage
      B. Change freq
      C. If new freq is less than old: decrease voltage
      
      Now, if the two concurrent calls to ->target() are X and Y, where X is trying to
      increase the freq and Y is trying to decrease it, we get the following race
      condition:
      
      X.A: voltage gets increased for larger freq
      Y.A: nothing happens
      Y.B: freq gets decreased
      Y.C: voltage gets decreased
      X.B: freq gets increased
      X.C: nothing happens
      
      Thus we can end up setting a freq which is not supported by the voltage we have
      set. That will probably make the clock to the CPU unstable and the system might
      not work properly anymore.
      
      This patch introduces a set of synchronization primitives to serialize frequency
      transitions, which are to be used as shown below:
      
      cpufreq_freq_transition_begin();
      
      //Perform the frequency change
      
      cpufreq_freq_transition_end();
      
      The _begin() call sends the PRECHANGE notification whereas the _end() call sends
      the POSTCHANGE notification. Also, all the necessary synchronization is handled
      within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION
      flag can also use these APIs for performing frequency transitions (ie., you can
      call _begin() from one task, and call the corresponding _end() from a different
      task).
      
      The actual synchronization underneath is not that complicated:
      
      The key challenge is to allow drivers to begin the transition from one thread
      and end it in a completely different thread (this is to enable drivers that do
      asynchronous POSTCHANGE notification from bottom-halves, to also use the same
      interface).
      
      To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a
      wait-queue are added per-policy. The flag and the wait-queue are used in
      conjunction to create an "uninterrupted flow" from _begin() to _end(). The
      spinlock is used to ensure that only one such "flow" is in flight at any given
      time. Put together, this provides us all the necessary synchronization.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      12478cf0
    • V
      cpufreq: resume drivers before enabling governors · 0c5aa405
      Viresh Kumar 提交于
      During suspend, we first stop governors and then suspend cpufreq drivers and
      resume must be exactly opposite of that. i.e. resume drivers first and then
      start governors.
      
      But the current code in resume enables governors first and then resume drivers.
      Fix it be changing code sequence there.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0c5aa405
  18. 20 3月, 2014 3 次提交
  19. 19 3月, 2014 2 次提交
  20. 13 3月, 2014 1 次提交
    • R
      cpufreq: Skip current frequency initialization for ->setpolicy drivers · 2ed99e39
      Rafael J. Wysocki 提交于
      After commit da60ce9f (cpufreq: call cpufreq_driver->get() after
      calling ->init()) __cpufreq_add_dev() sometimes fails for CPUs handled
      by intel_pstate, because that driver may return 0 from its ->get()
      callback if it has not run long enough to collect enough samples on the
      given CPU.  That didn't happen before commit da60ce9f which added
      policy->cur initialization to __cpufreq_add_dev() to help reduce code
      duplication in other cpufreq drivers.
      
      However, the code added by commit da60ce9f need not be executed
      for cpufreq drivers having the ->setpolicy callback defined, because
      the subsequent invocation of cpufreq_set_policy() will use that
      callback to initialize the policy anyway and it doesn't need
      policy->cur to be initialized upfront.  The analogous code in
      cpufreq_update_policy() is also unnecessary for cpufreq drivers
      having ->setpolicy set and may be skipped for them as well.
      
      Since intel_pstate provides ->setpolicy, skipping the upfront
      policy->cur initialization for cpufreq drivers with that callback
      set will cover intel_pstate and the problem it's been having after
      commit da60ce9f will be addressed.
      
      Fixes: da60ce9f (cpufreq: call cpufreq_driver->get() after calling ->init())
      References: https://bugzilla.kernel.org/show_bug.cgi?id=71931Reported-and-tested-by: NPatrik Lundquist <patrik.lundquist@gmail.com>
      Acked-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
      Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2ed99e39
  21. 12 3月, 2014 3 次提交
  22. 06 3月, 2014 6 次提交
    • V
      cpufreq: Implement cpufreq_generic_suspend() · e28867ea
      Viresh Kumar 提交于
      Multiple platforms need to set CPUs to a particular frequency before
      suspending the system, so provide a common infrastructure for them.
      
      Those platforms only need to point their ->suspend callback pointers
      to the generic routine.
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      [rjw: Changelog]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      e28867ea
    • V
      cpufreq: suspend governors on system suspend/hibernate · 2f0aea93
      Viresh Kumar 提交于
      This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}()
      for handling suspend/resume of cpufreq governors.
      
      Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found an issue where the
      tunables configuration for clusters/sockets with non-boot CPUs was
      lost after system suspend/resume, as we were notifying governors with
      CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU for that policy
      which caused the tunables memory to be freed.
      
      This is fixed by preventing any governor operations from being
      carried out between the device suspend and device resume stages of
      system suspend and resume, respectively.
      
      We could have added these callbacks at dpm_{suspend|resume}_noirq()
      level, but there is an additional problem that the majority of I/O
      devices is already suspended at that point and if cpufreq drivers
      want to change the frequency before suspending, then that not be
      possible on some platforms (which depend on peripherals like i2c,
      regulators, etc).
      Reported-and-tested-by: NLan Tianyu <tianyu.lan@intel.com>
      Reported-by: NJinhyuk Choi <jinchoi@broadcom.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      [rjw: Changelog]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2f0aea93
    • V
      cpufreq: move call to __find_governor() to cpufreq_init_policy() · 6e2c89d1
      viresh kumar 提交于
      We call __find_governor() during the addition of the first CPU of
      each policy from __cpufreq_add_dev() to find the last governor used
      for this CPU before it was hot-removed.
      
      After that we call cpufreq_parse_governor() in cpufreq_init_policy(),
      either with this governor, or with the default governor. Right after
      that policy->governor is set to NULL.
      
      While that code is not functionally problematic, the structure of it
      is suboptimal, because some of the code required in cpufreq_init_policy()
      is being executed by its caller, __cpufreq_add_dev(). So, it would make
      more sense to get all of it together in a single place to make code more
      readable.
      
      Accordingly, move the code needed for policy initialization to
      cpufreq_init_policy() and initialize policy->governor to NULL at the
      beginning.
      
      In order to clean up the code a bit more, some of the #ifdefs for
      CONFIG_HOTPLUG_CPU are dropped too.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      [rjw: Changelog]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6e2c89d1
    • V
      cpufreq: Initialize governor for a new policy under policy->rwsem · 4e97b631
      Viresh Kumar 提交于
      policy->rwsem is used to lock access to all parts of code modifying
      struct cpufreq_policy, but it's not used on a new policy created by
      __cpufreq_add_dev().
      
      Because of that, if cpufreq_update_policy() is called in a tight loop
      on one CPU in parallel with offline/online of another CPU, then the
      following crash can be triggered:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000020
      pgd = c0003000
      [00000020] *pgd=80000000004003, *pmd=00000000
      Internal error: Oops: 206 [#1] PREEMPT SMP ARM
      
      PC is at __cpufreq_governor+0x10/0x1ac
      LR is at cpufreq_update_policy+0x114/0x150
      
      ---[ end trace f23a8defea6cd706 ]---
      Kernel panic - not syncing: Fatal exception
      CPU0: stopping
      CPU: 0 PID: 7136 Comm: mpdecision Tainted: G      D W    3.10.0-gd727407-00074-g979ede8 #396
      
      [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
      [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
      [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
      [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
      [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
      [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
      [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
      [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
      [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
      [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
      [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
      [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
      [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
      [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
      [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)
      
      Fix that by taking locks at appropriate places in __cpufreq_add_dev()
      as well.
      Reported-by: NSaravana Kannan <skannan@codeaurora.org>
      Suggested-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      [rjw: Changelog]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4e97b631
    • V
      cpufreq: Initialize policy before making it available for others to use · 5a7e56a5
      Viresh Kumar 提交于
      Policy must be fully initialized before it is being made available
      for use by others. Otherwise cpufreq_cpu_get() would be able to grab
      a half initialized policy structure that might not have affected_cpus
      (for example) populated. Then, anybody accessing those fields will get
      a wrong value and that will lead to unpredictable results.
      
      In order to fix this, do all the necessary initialization before we
      make the policy structure available via cpufreq_cpu_get(). That will
      guarantee that any code accessing fields of the policy will get
      correct data from them.
      Reported-by: NSaravana Kannan <skannan@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      [rjw: Changelog]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5a7e56a5
    • A
      cpufreq: use cpufreq_cpu_get() to avoid cpufreq_get() race conditions · 999976e0
      Aaron Plattner 提交于
      If a module calls cpufreq_get while cpufreq is initializing, it's
      possible for it to be called after cpufreq_driver is set but before
      cpufreq_cpu_data is written during subsys_interface_register.  This
      happens because cpufreq_get doesn't take the cpufreq_driver_lock
      around its use of cpufreq_cpu_data.
      
      Fix this by using cpufreq_cpu_get(cpu) to look up the policy rather
      than reading it out of cpufreq_cpu_data directly.  cpufreq_cpu_get()
      takes the appropriate locks to prevent this race from happening.
      
      Since it's possible for policy to be NULL if the caller passes in an
      invalid CPU number or calls the function before cpufreq is initialized,
      delete the BUG_ON(!policy) and simply return 0.  Don't try to return
      -ENOENT because that's negative and the function returns an unsigned
      integer.
      
      References: https://bbs.archlinux.org/viewtopic.php?id=177934Signed-off-by: NAaron Plattner <aplattner@nvidia.com>
      Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      999976e0
  23. 01 3月, 2014 1 次提交