1. 12 9月, 2013 3 次提交
    • S
      cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu · cb38ed5c
      Srivatsa S. Bhat 提交于
      If update_policy_cpu() is invoked with the existing policy->cpu itself
      as the new-cpu parameter, then a lot of things can go terribly wrong.
      
      In its present form, update_policy_cpu() always assumes that the new-cpu
      is different from policy->cpu and invokes other functions to perform their
      respective updates. And those functions implement the actual update like
      this:
      
      per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
      per_cpu(..., last_cpu) = NULL;
      
      Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
      references vanish into thin air! (memory leak). From there, it leads to more
      problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
      and hence tries to create a new sysfs-group; but sysfs already had created
      the group earlier, so it complains that it cannot create a duplicate filename.
      In short, the repercussions of a rather innocuous invocation of
      update_policy_cpu() can turn out to be pretty nasty.
      
      Ideally update_policy_cpu() should handle this situation (new == last)
      gracefully, and not lead to such severe problems. So fix it by adding an
      appropriate check.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cb38ed5c
    • S
      cpufreq: Restructure if/else block to avoid unintended behavior · 61173f25
      Srivatsa S. Bhat 提交于
      In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
      the sysfs link or nominate a new policy cpu, is governed by an if/else block
      with a rather complex set of conditionals. Worse, they harbor a subtlety
      which leads to certain unintended behavior.
      
      The code looks like this:
      
              if (cpu != policy->cpu && !frozen) {
                      sysfs_remove_link(&dev->kobj, "cpufreq");
              } else if (cpus > 1) {
      		new_cpu = cpufreq_nominate_new_policy_cpu(...);
      		...
      		update_policy_cpu(..., new_cpu);
      	}
      
      The original intention was:
      If the CPU going offline is not policy->cpu, just remove the link.
      On the other hand, if the CPU going offline is the policy->cpu itself,
      handover the policy->cpu job to some other surviving CPU in that policy.
      
      But because the 'if' condition also includes the 'frozen' check, now there
      are *two* possibilities by which we can enter the 'else' block:
      
      1. cpu == policy->cpu (intended)
      2. cpu != policy->cpu && frozen (unintended)
      
      Due to the second (unintended) scenario, we end up spuriously nominating
      a CPU as the policy->cpu, even when the existing policy->cpu is alive and
      well. This can cause problems further down the line, especially when we end
      up nominating the same policy->cpu as the new one (ie., old == new),
      because it totally confuses update_policy_cpu().
      
      To avoid this mess, restructure the if/else block to only do what was
      originally intended, and thus prevent any unwelcome surprises.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      61173f25
    • S
      cpufreq: Fix crash in cpufreq-stats during suspend/resume · 0d66b91e
      Srivatsa S. Bhat 提交于
      Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
      dereference during the second attempt to suspend a system. He also
      pin-pointed the problem to commit 5302c3fb "cpufreq: Perform light-weight
      init/teardown during suspend/resume".
      
      That commit actually ensured that the cpufreq-stats table and the
      cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
      suspend/resume, which makes it all the more surprising. However, it turns
      out that the root-cause is not that we access an already freed memory, but
      that the reference to the allocated memory gets moved around and we lose
      track of that during resume, leading to the reported crash in a subsequent
      suspend attempt.
      
      In the suspend path, during CPU offline, the value of policy->cpu is updated
      by choosing one of the surviving CPUs in that policy, as long as there is
      atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
      invoked to update the reference to the stats structure by assigning it to
      the new CPU. However, in the resume path, during CPU online, we end up
      assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
      know about this. Thus the reference to the stats structure remains
      (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
      during CPU offline, we end up accessing an incorrect location to get the
      stats structure, which eventually leads to the NULL pointer dereference.
      
      Fix this by letting cpufreq-stats know about the update of the policy->cpu
      during CPU online in the resume path. (Also, move the update_policy_cpu()
      function higher up in the file, so that __cpufreq_add_dev() can invoke
      it).
      Reported-and-tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0d66b91e
  2. 10 9月, 2013 8 次提交
    • R
      Revert "cpufreq: make sure frequency transitions are serialized" · 798282a8
      Rafael J. Wysocki 提交于
      Commit 7c30ed53 (cpufreq: make sure frequency transitions are
      serialized) attempted to serialize frequency transitions by
      adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
      notifications.  However, it assumed that the notifications will
      always originate from the driver's .target() callback, but they
      also can be triggered by cpufreq_out_of_sync() and that leads to
      warnings like this on some systems:
      
       WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
       __cpufreq_notify_transition+0x238/0x260()
       In middle of another frequency transition
      
      accompanied by a call trace similar to this one:
      
       [<ffffffff81720daa>] dump_stack+0x46/0x58
       [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
       [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
       [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
       [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
       [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
       [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
       [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
       [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
       [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
       [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
       [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
       [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
       [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
       [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
       [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
       [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
       [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
       [<ffffffff81081060>] process_one_work+0x170/0x4a0
       [<ffffffff81082121>] worker_thread+0x121/0x390
       [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
       [<ffffffff81088fe0>] kthread+0xc0/0xd0
       [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
       [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
       [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
      
      For this reason, revert commit 7c30ed53 along with the fix 266c13d7
      (cpufreq: Fix serialization of frequency transitions) on top of it
      and we will revisit the serialization problem later.
      Reported-by: NAlessandro Bono <alessandro.bono@gmail.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      798282a8
    • S
      cpufreq: Use signed type for 'ret' variable, to store negative error values · 5136fa56
      Srivatsa S. Bhat 提交于
      There are places where the variable 'ret' is declared as unsigned int
      and then used to store negative return values such as -EINVAL. Fix them
      by declaring the variable as a signed quantity.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5136fa56
    • S
      cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes · 56d07db2
      Srivatsa S. Bhat 提交于
      Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
      and partial solution to the race condition between writing to a cpufreq sysfs
      file and taking a CPU offline. Now that we have a proper and complete solution
      to that problem, remove the temporary fix.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      56d07db2
    • S
      cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug · 4f750c93
      Srivatsa S. Bhat 提交于
      The functions that are used to write to cpufreq sysfs files (such as
      store_scaling_max_freq()) are not hotplug safe. They can race with CPU
      hotplug tasks and lead to problems such as trying to acquire an already
      destroyed timer-mutex etc.
      
      Eg:
      
          __cpufreq_remove_dev()
           __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
             policy->governor->governor(policy, CPUFREQ_GOV_STOP);
              cpufreq_governor_dbs()
               case CPUFREQ_GOV_STOP:
                mutex_destroy(&cpu_cdbs->timer_mutex)
                cpu_cdbs->cur_policy = NULL;
            <PREEMPT>
          store()
           __cpufreq_set_policy()
            __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
              policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
               case CPUFREQ_GOV_LIMITS:
                mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
                 if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
      synchronize with CPU hotplug. However, there is an additional point to note
      here: some parts of the CPU teardown in the cpufreq subsystem are done in
      the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
      get/put_online_cpus() functions alone is insufficient; we should also ensure
      that we don't race with those latter steps in the hotplug sequence. We can
      easily achieve this by checking if the CPU is online before proceeding with
      the store, since the CPU would have been marked offline by the time the
      CPU_POST_DEAD notifiers are executed.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4f750c93
    • S
      cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock · 1aee40ac
      Srivatsa S. Bhat 提交于
      __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
      offline. But because we destroy the kobject towards the end of the CPU offline
      phase, there are certain race windows where a task can try to write to a
      cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
      that CPU offline, and this can bump up the kobject refcount, which in turn might
      hinder the CPU offline task from running to completion. (It can also cause
      other more serious problems such as trying to acquire a destroyed timer-mutex
      etc., depending on the exact stage of the cleanup at which the task managed to
      take a new refcount).
      
      To fix the race window, we will need to synchronize those store_*() call-sites
      with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
      in turn can cause a total deadlock because it can end up waiting for the
      CPU offline task to complete, with incremented refcount!
      
      Write to sysfs                            CPU offline task
      --------------                            ----------------
      kobj_refcnt++
      
                                                Acquire cpu_hotplug.lock
      
      get_online_cpus();
      
      					  Wait for kobj_refcnt to drop to zero
      
                           **DEADLOCK**
      
      A simple way to avoid this problem is to perform the kobject cleanup in the
      CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
      perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
      in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
      released. Doing this helps us avoid deadlocks due to holding kobject refcounts
      and waiting on each other on the cpu_hotplug.lock.
      
      (Note: We can't move all of the cpufreq CPU offline steps to the
      CPU_POST_DEAD stage, because certain things such as stopping the governors
      have to be done before the outgoing CPU is marked offline. So retain those
      parts in the CPU_DOWN_PREPARE stage itself).
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aee40ac
    • S
      cpufreq: Split __cpufreq_remove_dev() into two parts · cedb70af
      Srivatsa S. Bhat 提交于
      During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
      to perform work such as stopping the cpufreq governor, clearing the
      CPU from the policy structure etc, and finally cleaning up the
      kobject.
      
      There are certain subtle issues related to the kobject cleanup, and
      it would be much easier to deal with them if we separate that part
      from the rest of the cleanup-work in the CPU offline phase. So split
      the __cpufreq_remove_dev() function into 2 parts: one that handles
      the kobject cleanup, and the other that handles the rest of the work.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cedb70af
    • V
      cpufreq: serialize calls to __cpufreq_governor() · 19c76303
      Viresh Kumar 提交于
      We can't take a big lock around __cpufreq_governor() as this causes
      recursive locking for some cases. But calls to this routine must be
      serialized for every policy. Otherwise we can see some unpredictable
      events.
      
      For example, consider following scenario:
      
      __cpufreq_remove_dev()
       __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
         policy->governor->governor(policy, CPUFREQ_GOV_STOP);
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            mutex_destroy(&cpu_cdbs->timer_mutex)
            cpu_cdbs->cur_policy = NULL;
        <PREEMPT>
      store()
       __cpufreq_set_policy()
        __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
          policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
           case CPUFREQ_GOV_LIMITS:
            mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
             if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      And so store() will eventually result in a crash if cur_policy is
      NULL at this point.
      
      Introduce an additional variable which would guarantee serialization
      here.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      19c76303
    • V
      cpufreq: don't allow governor limits to be changed when it is disabled · f73d3933
      Viresh Kumar 提交于
      __cpufreq_governor() returns with -EBUSY when governor is already
      stopped and we try to stop it again, but when it is stopped we must
      not allow calls to CPUFREQ_GOV_LIMITS event as well.
      
      This patch adds this check in __cpufreq_governor().
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f73d3933
  3. 21 8月, 2013 1 次提交
    • L
      cpufreq: fix bad unlock balance on !CONFIG_SMP · 5025d628
      Li Zhong 提交于
      This patch tries to fix lockdep complaint attached below.
      
      It seems that we should always read acquire the cpufreq_rwsem,
      whether CONFIG_SMP is enabled or not.  And CONFIG_HOTPLUG_CPU
      depends on CONFIG_SMP, so it seems we don't need CONFIG_SMP for the
      code enabled by CONFIG_HOTPLUG_CPU.
      
      [    0.504191] =====================================
      [    0.504627] [ BUG: bad unlock balance detected! ]
      [    0.504627] 3.11.0-rc6-next-20130819 #1 Not tainted
      [    0.504627] -------------------------------------
      [    0.504627] swapper/1 is trying to release lock (cpufreq_rwsem) at:
      [    0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
      [    0.504627] but there are no more locks to release!
      [    0.504627]
      [    0.504627] other info that might help us debug this:
      [    0.504627] 1 lock held by swapper/1:
      [    0.504627]  #0:  (subsys mutex#4){+.+.+.}, at: [<ffffffff8134a7bf>] subsys_interface_register+0x4f/0xe0
      [    0.504627]
      [    0.504627] stack backtrace:
      [    0.504627] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130819 #1
      [    0.504627] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [    0.504627]  ffffffff813d927a ffff88007f847c98 ffffffff814c062b ffff88007f847cc8
      [    0.504627]  ffffffff81098bce ffff88007f847cf8 ffffffff81aadc30 ffffffff813d927a
      [    0.504627]  00000000ffffffff ffff88007f847d68 ffffffff8109d0be 0000000000000006
      [    0.504627] Call Trace:
      [    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff814c062b>] dump_stack+0x19/0x1b
      [    0.504627]  [<ffffffff81098bce>] print_unlock_imbalance_bug+0xfe/0x110
      [    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff8109d0be>] lock_release_non_nested+0x1ee/0x310
      [    0.504627]  [<ffffffff81099d0e>] ? mark_held_locks+0xae/0x120
      [    0.504627]  [<ffffffff811510cb>] ? kfree+0xcb/0x1d0
      [    0.504627]  [<ffffffff813d77ea>] ? cpufreq_policy_free+0x4a/0x60
      [    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff8109d2a4>] lock_release+0xc4/0x250
      [    0.504627]  [<ffffffff8106c9f3>] up_read+0x23/0x40
      [    0.504627]  [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff8134a809>] subsys_interface_register+0x99/0xe0
      [    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
      [    0.504627]  [<ffffffff813d7f0d>] cpufreq_register_driver+0x9d/0x1d0
      [    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
      [    0.504627]  [<ffffffff81b1a039>] acpi_cpufreq_init+0xfe/0x1f8
      [    0.504627]  [<ffffffff810002ba>] do_one_initcall+0xda/0x180
      [    0.504627]  [<ffffffff81ae301e>] kernel_init_freeable+0x12c/0x1bb
      [    0.504627]  [<ffffffff81ae2841>] ? do_early_param+0x8c/0x8c
      [    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140
      [    0.504627]  [<ffffffff814b4dde>] kernel_init+0xe/0xf0
      [    0.504627]  [<ffffffff814d029a>] ret_from_fork+0x7a/0xb0
      [    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Acked-and-tested-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5025d628
  4. 20 8月, 2013 5 次提交
  5. 18 8月, 2013 1 次提交
    • R
      Revert "cpufreq: Use cpufreq_policy_list for iterating over policies" · 878f6e07
      Rafael J. Wysocki 提交于
      Revert commit eb608521 (cpufreq: Use cpufreq_policy_list for iterating
      over policies), because it breaks system suspend/resume on multiple
      machines.
      
      It either causes resume to block indefinitely or causes the BUG_ON()
      in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq
      attributes.
      
      Conflicts:
      	drivers/cpufreq/cpufreq.c
      878f6e07
  6. 10 8月, 2013 5 次提交
  7. 08 8月, 2013 15 次提交
  8. 30 7月, 2013 1 次提交
    • R
      cpufreq: Fix cpufreq driver module refcount balance after suspend/resume · 2a998599
      Rafael J. Wysocki 提交于
      Since cpufreq_cpu_put() called by __cpufreq_remove_dev() drops the
      driver module refcount, __cpufreq_remove_dev() causes that refcount
      to become negative for the cpufreq driver after a suspend/resume
      cycle.
      
      This is not the only bad thing that happens there, however, because
      kobject_put() should only be called for the policy kobject at this
      point if the CPU is not the last one for that policy.
      
      Namely, if the given CPU is the last one for that policy, the
      policy kobject's refcount should be 1 at this point, as set by
      cpufreq_add_dev_interface(), and only needs to be dropped once for
      the kobject to go away.  This actually happens under the cpu == 1
      check, so it need not be done before by cpufreq_cpu_put().
      
      On the other hand, if the given CPU is not the last one for that
      policy, this means that cpufreq_add_policy_cpu() has been called
      at least once for that policy and cpufreq_cpu_get() has been
      called for it too.  To balance that cpufreq_cpu_get(), we need to
      call cpufreq_cpu_put() in that case.
      
      Thus, to fix the described problem and keep the reference
      counters balanced in both cases, move the cpufreq_cpu_get() call
      in __cpufreq_remove_dev() to the code path executed only for
      CPUs that share the policy with other CPUs.
      Reported-and-tested-by: NToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: 3.10+ <stable@vger.kernel.org>
      2a998599
  9. 26 7月, 2013 1 次提交