1. 06 January 2014, 4 commits
    • cpufreq: Make sure CPU is running on a freq from freq-table · d3916691
      Viresh Kumar authored
      Sometimes boot loaders set the CPU frequency to a value outside of the
      frequency table known to the cpufreq core. In such cases the CPU might be
      unstable if it has to run at that frequency for a long time, so it is better
      to switch it to a frequency listed in the freq-table. Running outside the
      table also makes cpufreq stats inconsistent, as cpufreq-stats fails to
      register because the CPU's current frequency is not found in the freq-table.
      
      Because we don't want this change to badly affect the boot process, we go
      for the next frequency that is >= policy->cur ('cur' must be set by now;
      otherwise we would end up setting the frequency to the lowest entry in the
      table, since 'cur' is initialized to zero).
      
      If the current frequency doesn't match any frequency in the freq-table, we
      warn the user so that this can be fixed in their bootloaders or
      freq-tables.
      Reported-by: Carlos Hernandez <ceh@ti.com>
      Reported-and-tested-by: Nishanth Menon <nm@ti.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      d3916691
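
      A minimal sketch of the check described above, as it could sit in the
      policy-initialization path. The helper name, the warning text and the exact
      placement are illustrative assumptions, not the verbatim patch:

      #include <linux/cpufreq.h>

      /* Sketch: make sure policy->cur is a frequency from the freq-table. */
      static int ensure_cur_freq_in_table(struct cpufreq_policy *policy)
      {
      	if (cpufreq_frequency_table_get_index(policy, policy->cur) != -EINVAL)
      		return 0;	/* policy->cur already matches a table entry */

      	pr_warn("cpufreq: CPU%u running at %u kHz, not in the freq-table\n",
      		policy->cpu, policy->cur);

      	/* CPUFREQ_RELATION_L picks the lowest table frequency at or above
      	 * the requested one, so this does not slow the CPU down at boot. */
      	return __cpufreq_driver_target(policy, policy->cur, CPUFREQ_RELATION_L);
      }
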
    • cpufreq: send new set of notification for transition failures · ab1b1c4e
      Viresh Kumar authored
      In the current code, if we fail during a frequency transition, we
      simply send the POSTCHANGE notification with the old frequency. This
      isn't enough.
      
      One of the core users of these notifications is the code responsible
      for keeping loops_per_jiffy aligned with frequency changes, and it is
      mostly written as:
      
      	if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
      	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
      		update-loops-per-jiffy...
      	}
      
      So, suppose we are switching to a higher frequency and the transition
      fails; then the following will happen:
      - CPUFREQ_PRECHANGE notification with freq-new > freq-old
      - CPUFREQ_POSTCHANGE notification with freq-new == freq-old
      
      The first one will update loops_per_jiffy and the second one will do
      nothing. Even if we send the second notification with the values of
      freq-new and freq-old exchanged, some users of these notifications
      might become unstable.
      
      This can be fixed by simply calling cpufreq_notify_post_transition()
      with the error code; that routine will take care of sending the
      notifications in the correct order (see the usage sketch after this
      entry).
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      [rjw: Folded 3 patches into one, rebased unicore2 changes]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ab1b1c4e
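
      A hedged usage sketch of the new scheme from a driver's ->target() path;
      hw_set_frequency() is a made-up stand-in for the driver's hardware hook,
      not a real API:

      #include <linux/cpufreq.h>

      /* Hypothetical hardware hook, for illustration only. */
      extern int hw_set_frequency(unsigned int cpu, unsigned int khz);

      static int example_target(struct cpufreq_policy *policy, unsigned int new_freq)
      {
      	struct cpufreq_freqs freqs;
      	int ret;

      	freqs.old = policy->cur;
      	freqs.new = new_freq;
      	cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);

      	ret = hw_set_frequency(policy->cpu, new_freq);

      	/* On failure this walks listeners (e.g. the loops_per_jiffy code)
      	 * back to the old frequency instead of sending one misleading
      	 * POSTCHANGE with old == new. */
      	cpufreq_notify_post_transition(policy, &freqs, ret);

      	return ret;
      }
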
    • cpufreq: Introduce cpufreq_notify_post_transition() · f7ba3b41
      Viresh Kumar authored
      This introduces a new routine, cpufreq_notify_post_transition(), which
      can be used to send the POSTCHANGE notification for the new frequency,
      with or without an additional {PRE|POST}CHANGE pair for the last
      frequency. This is useful in multiple places, especially for sending
      transition-failure notifications.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      f7ba3b41
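
      Based on the description above, the helper can be pictured roughly like
      this (core-internal sketch, not necessarily the exact upstream code):

      void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
      		struct cpufreq_freqs *freqs, int transition_failed)
      {
      	/* Always complete the transition announced by PRECHANGE. */
      	cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
      	if (!transition_failed)
      		return;

      	/* The transition failed: announce a reverse transition back to
      	 * the old frequency so listeners see a consistent PRE/POST pair. */
      	swap(freqs->old, freqs->new);
      	cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
      	cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
      }
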
    • cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled · 6f1e4efd
      Jane Li authored
      When a CPU is hot-removed we'll cancel all the delayed work items via
      gov_cancel_work(). Sometimes the delayed work function determines that
      it should adjust the delay for all other CPUs that the policy is
      managing. If this scenario occurs, the canceling CPU will cancel its own
      work but queue up the other CPUs' work items to run.
      
      Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
      queueing) has tried to fix this, but reading governor_enabled is not
      protected by cpufreq_governor_lock. Even though od_dbs_timer() checks
      governor_enabled before gov_queue_work(), this scenario may occur. For
      example:
      
       CPU0                                        CPU1
       ----                                        ----
       cpu_down()
        ...                                        <work runs>
        __cpufreq_remove_dev()                     od_dbs_timer()
         __cpufreq_governor()                       policy->governor_enabled
          policy->governor_enabled = false;
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            gov_cancel_work(dbs_data, policy);
             cpu0 work is canceled
              timer is canceled
              cpu1 work is canceled
              <waits for cpu1>
                                                    gov_queue_work(*, *, true);
                                                     cpu0 work queued
                                                     cpu1 work queued
                                                     cpu2 work queued
                                                     ...
              cpu1 work is canceled
              cpu2 work is canceled
              ...
      
      At the end of the GOV_STOP case cpu0 still has a work item queued to
      run, although the code expects all of the work items to have been
      canceled. __cpufreq_remove_dev() will then proceed to
      re-initialize the work items of all the other CPUs except the CPU
      that is going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
      will trample over the queued work and debugobjects will spit out
      a warning:
      
      WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
      ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
      Modules linked in:
      CPU: 1 PID: 1205 Comm: sh Tainted: G        W    3.10.0 #200
      [<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
      [<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
      [<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
      [<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
      [<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
      [<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
      [<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
      [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
      [<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
      [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
      [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
      [<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
      [<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
      [<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
      [<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
      [<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
      [<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
      [<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
      [<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
      [<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)
      
      Fix this by taking cpufreq_governor_lock in gov_queue_work() before the
      governor_enabled check and releasing it only after __gov_queue_work().
      In this way, governor_enabled is guaranteed not to change while
      gov_queue_work() is queuing work (see the sketch after this entry).
      Signed-off-by: Jane Li <jiel@marvell.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      6f1e4efd
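
      A sketch of the resulting locking in gov_queue_work(); names follow the
      cpufreq governor internals of that era and the exact shape is illustrative:

      void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
      		unsigned int delay, bool all_cpus)
      {
      	int i;

      	mutex_lock(&cpufreq_governor_lock);
      	if (!policy->governor_enabled)
      		goto out_unlock;

      	if (!all_cpus) {
      		__gov_queue_work(smp_processor_id(), dbs_data, delay);
      	} else {
      		for_each_cpu(i, policy->cpus)
      			__gov_queue_work(i, dbs_data, delay);
      	}

      out_unlock:
      	/* governor_enabled cannot have changed while work was queued. */
      	mutex_unlock(&cpufreq_governor_lock);
      }
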
  2. 29 December 2013, 2 commits
  3. 22 December 2013, 2 commits
    • cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers · a27a9ab7
      Jason Baron authored
      When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the
      intel_pstate driver, the desired default policy is not properly set. For
      example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the
      'powersave' policy being set.
      
      Fix this by configuring the correct default policy if either 'powersave' or
      'performance' is requested. Otherwise, fall back to what the driver originally
      set via its 'init' routine (a sketch follows this entry).
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a27a9ab7
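
      A hedged sketch of the idea for ->setpolicy drivers such as intel_pstate:
      map the Kconfig default governor onto a policy value, and otherwise keep
      the driver's own choice. The helper name and placement are made up; the
      real change lives in the cpufreq core's policy initialization.

      #include <linux/cpufreq.h>

      static unsigned int default_policy_for_setpolicy_driver(unsigned int driver_default)
      {
      #if defined(CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE)
      	return CPUFREQ_POLICY_PERFORMANCE;
      #elif defined(CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE)
      	return CPUFREQ_POLICY_POWERSAVE;
      #else
      	/* Neither 'performance' nor 'powersave' was requested: keep the
      	 * value the driver set in its ->init() routine. */
      	return driver_default;
      #endif
      }
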
    • cpufreq: remove sysfs files for CPUs which failed to come back after resume · 42f921a6
      Viresh Kumar authored
      There are cases where cpufreq_add_dev() may fail for some CPUs
      during system resume. With the current code we will still have
      sysfs cpufreq files for those CPUs even though their struct
      cpufreq_policy has already been freed. Hence any operation on those
      sysfs files would result in kernel warnings.
      
      Example of problems resulting from resume errors (from Bjørn Mork):
      
      WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
      missing sysfs attribute operations for kobject: (null)
      Modules linked in: [stripped as irrelevant]
      CPU: 0 PID: 6055 Comm: grep Tainted: G      D      3.13.0-rc2 #153
      Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
       0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
       ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
       ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
      Call Trace:
       [<ffffffff81380b0e>] dump_stack+0x55/0x76
       [<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
       [<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
       [<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
       [<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
       [<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
       [<ffffffff811823c7>] sysfs_open_file+0x77/0x212
       [<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
       [<ffffffff81122562>] do_dentry_open+0x17c/0x257
       [<ffffffff8112267e>] finish_open+0x41/0x4f
       [<ffffffff81130225>] do_last+0x80c/0x9ba
       [<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
       [<ffffffff81130606>] path_openat+0x233/0x4a1
       [<ffffffff81130b7e>] do_filp_open+0x35/0x85
       [<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
       [<ffffffff811232ea>] do_sys_open+0x6b/0xfa
       [<ffffffff811233a7>] SyS_openat+0xf/0x11
       [<ffffffff8138c812>] system_call_fastpath+0x16/0x1b
      
      To fix this, remove those sysfs files or put the associated kobject
      in case of such errors. Also, to keep it simple, remove the cpufreq
      sysfs links from all the CPUs (except for the policy->cpu) during
      suspend, as that operation doesn't result in a loss of sysfs file
      permissions and we can recreate those links during resume just fine
      (see the sketch after this entry).
      
      Fixes: 5302c3fb ("cpufreq: Perform light-weight init/teardown during suspend/resume")
      Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
      [rjw: Changelog]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      42f921a6
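
      A hedged sketch of the suspend-time part of the fix; the helper name is
      made up, and the kobject-put half of the error handling is omitted here:

      #include <linux/cpu.h>
      #include <linux/cpufreq.h>
      #include <linux/device.h>

      /* Drop the "cpufreq" symlinks of all CPUs in the policy except
       * policy->cpu itself, which owns the real kobject; the links are
       * simply recreated during resume. */
      static void drop_policy_symlinks(struct cpufreq_policy *policy)
      {
      	unsigned int cpu;

      	for_each_cpu(cpu, policy->cpus) {
      		struct device *dev = get_cpu_device(cpu);

      		if (!dev || cpu == policy->cpu)
      			continue;
      		sysfs_remove_link(&dev->kobj, "cpufreq");
      	}
      }
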
  4. 08 December 2013, 2 commits
  5. 03 December 2013, 1 commit
    • cpufreq: fix garbage kobjects on errors during suspend/resume · 2167e239
      Bjørn Mork authored
      This is effectively a revert of commit 5302c3fb ("cpufreq: Perform
      light-weight init/teardown during suspend/resume"), which enabled
      suspend/resume optimizations leaving the sysfs files in place.
      
      Errors during suspend/resume are not handled properly, leaving
      dead sysfs attributes behind in case of failures.  There are a number of
      functions with special code for the "frozen" case, and all of these
      also need special error handling.
      
      The problem is easy to demonstrate by making cpufreq_driver->init()
      or cpufreq_driver->get() fail during resume.
      
      The code is too complex for a simple fix, with split code paths
      in multiple blocks within a number of functions.  It is therefore
      best to revert the patch enabling this code until the error handling
      is in place.
      
      Examples of problems resulting from resume errors:
      
      WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
      missing sysfs attribute operations for kobject: (null)
      Modules linked in: [stripped as irrelevant]
      CPU: 0 PID: 6055 Comm: grep Tainted: G      D      3.13.0-rc2 #153
      Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
       0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
       ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
       ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
      Call Trace:
       [<ffffffff81380b0e>] dump_stack+0x55/0x76
       [<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
       [<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
       [<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
       [<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
       [<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
       [<ffffffff811823c7>] sysfs_open_file+0x77/0x212
       [<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
       [<ffffffff81122562>] do_dentry_open+0x17c/0x257
       [<ffffffff8112267e>] finish_open+0x41/0x4f
       [<ffffffff81130225>] do_last+0x80c/0x9ba
       [<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
       [<ffffffff81130606>] path_openat+0x233/0x4a1
       [<ffffffff81130b7e>] do_filp_open+0x35/0x85
       [<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
       [<ffffffff811232ea>] do_sys_open+0x6b/0xfa
       [<ffffffff811233a7>] SyS_openat+0xf/0x11
       [<ffffffff8138c812>] system_call_fastpath+0x16/0x1b
      
      The failure to restore cpufreq devices on cancelled hibernation is
      not a new bug. It is caused by the ACPI _PPC call failing unless the
      hibernate is completed. This makes the acpi_cpufreq driver fail its
      init.
      
      Previously, the cpufreq device could be restored by offlining the
      cpu temporarily.  And as a complete hibernation cycle would do this,
      it would be automatically restored most of the time.  But after
      commit 5302c3fb the leftover sysfs attributes will block any
      device add action.  Therefore offlining and onlining CPU 1 will no
      longer restore the cpufreq object, and a complete suspend/resume
      cycle will replace it with garbage.
      
      Fixes: 5302c3fb ("cpufreq: Perform light-weight init/teardown during suspend/resume")
      Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      2167e239
  6. 28 November 2013, 1 commit
  7. 31 October 2013, 1 commit
  8. 26 October 2013, 2 commits
  9. 17 October 2013, 1 commit
  10. 16 October 2013, 9 commits
  11. 10 October 2013, 1 commit
  12. 25 September 2013, 1 commit
  13. 20 September 2013, 1 commit
  14. 18 September 2013, 2 commits
  15. 12 September 2013, 4 commits
    • cpufreq: Acquire the lock in cpufreq_policy_restore() for reading · 44871c9c
      Lan Tianyu authored
      In cpufreq_policy_restore(), the policy saved before system suspend is
      read from the per-CPU variable cpufreq_cpu_data_fallback.  It's a read
      operation rather than a write one, so take the lock for reading in there.
      Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      44871c9c
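
      The change amounts to taking cpufreq_driver_lock for reading instead of
      writing around the per-CPU lookup, roughly (core-internal sketch):

      static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu)
      {
      	struct cpufreq_policy *policy;
      	unsigned long flags;

      	/* Only a read of cpufreq_cpu_data_fallback, so a read lock suffices. */
      	read_lock_irqsave(&cpufreq_driver_lock, flags);
      	policy = per_cpu(cpufreq_cpu_data_fallback, cpu);
      	read_unlock_irqrestore(&cpufreq_driver_lock, flags);

      	return policy;
      }
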
    • cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu · cb38ed5c
      Srivatsa S. Bhat authored
      If update_policy_cpu() is invoked with the existing policy->cpu itself
      as the new-cpu parameter, then a lot of things can go terribly wrong.
      
      In its present form, update_policy_cpu() always assumes that the new-cpu
      is different from policy->cpu and invokes other functions to perform their
      respective updates. And those functions implement the actual update like
      this:
      
      per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
      per_cpu(..., last_cpu) = NULL;
      
      Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
      reference vanish into thin air (a memory leak). From there, it leads to more
      problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
      and hence tries to create a new sysfs group; but sysfs had already created
      the group earlier, so it complains that it cannot create a duplicate filename.
      In short, the repercussions of a rather innocuous invocation of
      update_policy_cpu() can turn out to be pretty nasty.
      
      Ideally update_policy_cpu() should handle this situation (new == last)
      gracefully, and not lead to such severe problems. So fix it by adding an
      appropriate check.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      cb38ed5c
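
      A minimal sketch of the added guard:

      static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
      {
      	/* Nothing to hand over when the "new" CPU already is policy->cpu;
      	 * running the per-cpu updates below would only NULL the reference. */
      	if (cpu == policy->cpu)
      		return;

      	/* ... existing per-cpu hand-over and policy->cpu update ... */
      }
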
    • cpufreq: Restructure if/else block to avoid unintended behavior · 61173f25
      Srivatsa S. Bhat authored
      In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
      the sysfs link or nominate a new policy CPU is governed by an if/else block
      with a rather complex set of conditionals. Worse, they harbor a subtlety
      which leads to certain unintended behavior.
      
      The code looks like this:
      
              if (cpu != policy->cpu && !frozen) {
                      sysfs_remove_link(&dev->kobj, "cpufreq");
              } else if (cpus > 1) {
      		new_cpu = cpufreq_nominate_new_policy_cpu(...);
      		...
      		update_policy_cpu(..., new_cpu);
      	}
      
      The original intention was:
      If the CPU going offline is not policy->cpu, just remove the link.
      On the other hand, if the CPU going offline is the policy->cpu itself,
      hand over the policy->cpu job to some other surviving CPU in that policy.
      
      But because the 'if' condition also includes the 'frozen' check, now there
      are *two* possibilities by which we can enter the 'else' block:
      
      1. cpu == policy->cpu (intended)
      2. cpu != policy->cpu && frozen (unintended)
      
      Due to the second (unintended) scenario, we end up spuriously nominating
      a CPU as the policy->cpu, even when the existing policy->cpu is alive and
      well. This can cause problems further down the line, especially when we end
      up nominating the same policy->cpu as the new one (i.e., old == new),
      because it totally confuses update_policy_cpu().
      
      To avoid this mess, restructure the if/else block to only do what was
      originally intended, and thus prevent any unwelcome surprises.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      61173f25
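
      The restructured block then reads as follows (keeping the same elisions as
      the snippet above); only cpu == policy->cpu can reach the nomination path,
      and 'frozen' merely decides whether the symlink removal is skipped:

      	if (cpu != policy->cpu) {
      		if (!frozen)
      			sysfs_remove_link(&dev->kobj, "cpufreq");
      	} else if (cpus > 1) {
      		new_cpu = cpufreq_nominate_new_policy_cpu(...);
      		...
      		update_policy_cpu(..., new_cpu);
      	}
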
    • cpufreq: Fix crash in cpufreq-stats during suspend/resume · 0d66b91e
      Srivatsa S. Bhat authored
      Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
      dereference during the second attempt to suspend a system. He also
      pin-pointed the problem to commit 5302c3fb "cpufreq: Perform light-weight
      init/teardown during suspend/resume".
      
      That commit actually ensured that the cpufreq-stats table and the
      cpufreq-stats sysfs entries are *not* torn down (i.e., not freed) during
      suspend/resume, which makes it all the more surprising. However, it turns
      out that the root cause is not that we access already freed memory, but
      that the reference to the allocated memory gets moved around and we lose
      track of it during resume, leading to the reported crash in a subsequent
      suspend attempt.
      
      In the suspend path, during CPU offline, the value of policy->cpu is updated
      by choosing one of the surviving CPUs in that policy, as long as there is
      at least one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
      invoked to update the reference to the stats structure by assigning it to
      the new CPU. However, in the resume path, during CPU online, we end up
      assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
      know about this. Thus the reference to the stats structure remains
      (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
      during CPU offline, we end up accessing an incorrect location to get the
      stats structure, which eventually leads to the NULL pointer dereference.
      
      Fix this by letting cpufreq-stats know about the update of the policy->cpu
      during CPU online in the resume path. (Also, move the update_policy_cpu()
      function higher up in the file, so that __cpufreq_add_dev() can invoke
      it).
      Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      0d66b91e
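
      A hedged sketch of the add-path change: when a saved policy is restored on
      a different CPU in the resume path, route the assignment through
      update_policy_cpu() so that cpufreq-stats learns about the new policy->cpu
      (the exact placement inside __cpufreq_add_dev() is illustrative):

      	if (frozen && cpu != policy->cpu)
      		/* Restoring a saved policy: this is an update of an existing
      		 * policy, so let cpufreq-stats move its reference along. */
      		update_policy_cpu(policy, cpu);
      	else
      		policy->cpu = cpu;
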
  16. 10 September 2013, 6 commits
    • Revert "cpufreq: make sure frequency transitions are serialized" · 798282a8
      Rafael J. Wysocki authored
      Commit 7c30ed53 (cpufreq: make sure frequency transitions are
      serialized) attempted to serialize frequency transitions by
      adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
      notifications.  However, it assumed that the notifications will
      always originate from the driver's .target() callback, but they
      also can be triggered by cpufreq_out_of_sync() and that leads to
      warnings like this on some systems:
      
       WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
       __cpufreq_notify_transition+0x238/0x260()
       In middle of another frequency transition
      
      accompanied by a call trace similar to this one:
      
       [<ffffffff81720daa>] dump_stack+0x46/0x58
       [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
       [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
       [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
       [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
       [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
       [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
       [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
       [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
       [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
       [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
       [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
       [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
       [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
       [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
       [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
       [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
       [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
       [<ffffffff81081060>] process_one_work+0x170/0x4a0
       [<ffffffff81082121>] worker_thread+0x121/0x390
       [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
       [<ffffffff81088fe0>] kthread+0xc0/0xd0
       [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
       [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
       [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
      
      For this reason, revert commit 7c30ed53 along with the fix 266c13d7
      (cpufreq: Fix serialization of frequency transitions) on top of it;
      we will revisit the serialization problem later.
      Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      798282a8
    • cpufreq: Use signed type for 'ret' variable, to store negative error values · 5136fa56
      Srivatsa S. Bhat authored
      There are places where the variable 'ret' is declared as unsigned int
      and then used to store negative return values such as -EINVAL. Fix them
      by declaring the variable as a signed quantity.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      5136fa56
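
      An illustration of the bug class (not an exact call site from the patch;
      example_store() and its callee are made up):

      static int example_store(struct cpufreq_policy *policy, unsigned int val)
      {
      	unsigned int ret;	/* BUG: should be a signed 'int' */

      	ret = cpufreq_set_some_limit(policy, val);	/* hypothetical, may return -EINVAL */
      	if (ret < 0)		/* never true: ret is unsigned */
      		return ret;

      	return 0;
      }
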
    • cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes · 56d07db2
      Srivatsa S. Bhat authored
      Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
      and partial solution to the race condition between writing to a cpufreq sysfs
      file and taking a CPU offline. Now that we have a proper and complete solution
      to that problem, remove the temporary fix.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      56d07db2
    • cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug · 4f750c93
      Srivatsa S. Bhat authored
      The functions that are used to write to cpufreq sysfs files (such as
      store_scaling_max_freq()) are not hotplug safe. They can race with CPU
      hotplug tasks and lead to problems such as trying to acquire an already
      destroyed timer-mutex etc.
      
      Eg:
      
          __cpufreq_remove_dev()
           __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
             policy->governor->governor(policy, CPUFREQ_GOV_STOP);
              cpufreq_governor_dbs()
               case CPUFREQ_GOV_STOP:
                mutex_destroy(&cpu_cdbs->timer_mutex)
                cpu_cdbs->cur_policy = NULL;
            <PREEMPT>
          store()
           __cpufreq_set_policy()
            __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
              policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
               case CPUFREQ_GOV_LIMITS:
                mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
                 if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
      synchronize with CPU hotplug. However, there is an additional point to note
      here: some parts of the CPU teardown in the cpufreq subsystem are done in
      the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
      get/put_online_cpus() functions alone is insufficient; we should also ensure
      that we don't race with those latter steps in the hotplug sequence. We can
      easily achieve this by checking if the CPU is online before proceeding with
      the store, since the CPU would have been marked offline by the time the
      CPU_POST_DEAD notifiers are executed.
      Reported-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      4f750c93
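
      A sketch of the resulting store() wrapper in the cpufreq core (the policy
      rwsem handling is abbreviated; to_policy()/to_attr() are the core's own
      helpers, and the exact shape is illustrative):

      static ssize_t store(struct kobject *kobj, struct attribute *attr,
      		     const char *buf, size_t count)
      {
      	struct cpufreq_policy *policy = to_policy(kobj);
      	struct freq_attr *fattr = to_attr(attr);
      	ssize_t ret = -EINVAL;

      	get_online_cpus();

      	/* Parts of the teardown run in CPU_POST_DEAD with cpu_hotplug.lock
      	 * released, so also bail out once the CPU is marked offline. */
      	if (!cpu_online(policy->cpu))
      		goto unlock;

      	if (fattr->store)
      		ret = fattr->store(policy, buf, count);

      unlock:
      	put_online_cpus();

      	return ret;
      }
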
    • cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock · 1aee40ac
      Srivatsa S. Bhat authored
      __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
      offline. But because we destroy the kobject towards the end of the CPU offline
      phase, there are certain race windows where a task can try to write to a
      cpufreq sysfs file (e.g., using store_scaling_max_freq()) while we are taking
      that CPU offline, and this can bump up the kobject refcount, which in turn might
      hinder the CPU offline task from running to completion. (It can also cause
      other more serious problems such as trying to acquire a destroyed timer-mutex
      etc., depending on the exact stage of the cleanup at which the task managed to
      take a new refcount).
      
      To fix the race window, we will need to synchronize those store_*() call-sites
      with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
      in turn can cause a total deadlock because it can end up waiting for the
      CPU offline task to complete, with incremented refcount!
      
      Write to sysfs                            CPU offline task
      --------------                            ----------------
      kobj_refcnt++
      
                                                Acquire cpu_hotplug.lock
      
      get_online_cpus();
      
      					  Wait for kobj_refcnt to drop to zero
      
                           **DEADLOCK**
      
      A simple way to avoid this problem is to perform the kobject cleanup in the
      CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
      perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
      in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
      released. Doing this helps us avoid deadlocks due to holding kobject refcounts
      and waiting on each other on the cpu_hotplug.lock.
      
      (Note: We can't move all of the cpufreq CPU offline steps to the
      CPU_POST_DEAD stage, because certain things such as stopping the governors
      have to be done before the outgoing CPU is marked offline. So retain those
      parts in the CPU_DOWN_PREPARE stage itself).
      Reported-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aee40ac
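
      A sketch of how the cpufreq hotplug callback can dispatch the two halves
      once the split from the following entry is in place (the argument lists of
      the helpers follow the core of that era and are illustrative):

      static int cpufreq_cpu_callback(struct notifier_block *nfb,
      				unsigned long action, void *hcpu)
      {
      	unsigned int cpu = (unsigned long)hcpu;
      	struct device *dev = get_cpu_device(cpu);
      	bool frozen = action & CPU_TASKS_FROZEN;

      	if (!dev)
      		return NOTIFY_OK;

      	switch (action & ~CPU_TASKS_FROZEN) {
      	case CPU_ONLINE:
      	case CPU_DOWN_FAILED:
      		__cpufreq_add_dev(dev, NULL, frozen);
      		break;
      	case CPU_DOWN_PREPARE:
      		/* Stop the governor etc. before the CPU is marked offline. */
      		__cpufreq_remove_dev_prepare(dev, NULL, frozen);
      		break;
      	case CPU_POST_DEAD:
      		/* Runs with cpu_hotplug.lock released: a safe place to wait
      		 * for the kobject refcount and do the final cleanup. */
      		__cpufreq_remove_dev_finish(dev, NULL, frozen);
      		break;
      	}
      	return NOTIFY_OK;
      }
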
    • cpufreq: Split __cpufreq_remove_dev() into two parts · cedb70af
      Srivatsa S. Bhat authored
      During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
      to perform work such as stopping the cpufreq governor, clearing the
      CPU from the policy structure, etc., and finally cleaning up the
      kobject.
      
      There are certain subtle issues related to the kobject cleanup, and
      it would be much easier to deal with them if we separate that part
      from the rest of the cleanup-work in the CPU offline phase. So split
      the __cpufreq_remove_dev() function into 2 parts: one that handles
      the kobject cleanup, and the other that handles the rest of the work.
      Reported-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      cedb70af
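
      The split can then be pictured as a thin wrapper over the two halves
      (roughly):

      static int __cpufreq_remove_dev(struct device *dev,
      				struct subsys_interface *sif, bool frozen)
      {
      	int ret;

      	/* Stop the governor and detach the CPU from the policy ... */
      	ret = __cpufreq_remove_dev_prepare(dev, sif, frozen);

      	/* ... and deal with the kobject cleanup separately. */
      	if (!ret)
      		ret = __cpufreq_remove_dev_finish(dev, sif, frozen);

      	return ret;
      }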