1. 10 9月, 2013 5 次提交
    • S
      cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug · 4f750c93
      Srivatsa S. Bhat 提交于
      The functions that are used to write to cpufreq sysfs files (such as
      store_scaling_max_freq()) are not hotplug safe. They can race with CPU
      hotplug tasks and lead to problems such as trying to acquire an already
      destroyed timer-mutex etc.
      
      Eg:
      
          __cpufreq_remove_dev()
           __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
             policy->governor->governor(policy, CPUFREQ_GOV_STOP);
              cpufreq_governor_dbs()
               case CPUFREQ_GOV_STOP:
                mutex_destroy(&cpu_cdbs->timer_mutex)
                cpu_cdbs->cur_policy = NULL;
            <PREEMPT>
          store()
           __cpufreq_set_policy()
            __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
              policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
               case CPUFREQ_GOV_LIMITS:
                mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
                 if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
      synchronize with CPU hotplug. However, there is an additional point to note
      here: some parts of the CPU teardown in the cpufreq subsystem are done in
      the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
      get/put_online_cpus() functions alone is insufficient; we should also ensure
      that we don't race with those latter steps in the hotplug sequence. We can
      easily achieve this by checking if the CPU is online before proceeding with
      the store, since the CPU would have been marked offline by the time the
      CPU_POST_DEAD notifiers are executed.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4f750c93
    • S
      cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock · 1aee40ac
      Srivatsa S. Bhat 提交于
      __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
      offline. But because we destroy the kobject towards the end of the CPU offline
      phase, there are certain race windows where a task can try to write to a
      cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
      that CPU offline, and this can bump up the kobject refcount, which in turn might
      hinder the CPU offline task from running to completion. (It can also cause
      other more serious problems such as trying to acquire a destroyed timer-mutex
      etc., depending on the exact stage of the cleanup at which the task managed to
      take a new refcount).
      
      To fix the race window, we will need to synchronize those store_*() call-sites
      with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
      in turn can cause a total deadlock because it can end up waiting for the
      CPU offline task to complete, with incremented refcount!
      
      Write to sysfs                            CPU offline task
      --------------                            ----------------
      kobj_refcnt++
      
                                                Acquire cpu_hotplug.lock
      
      get_online_cpus();
      
      					  Wait for kobj_refcnt to drop to zero
      
                           **DEADLOCK**
      
      A simple way to avoid this problem is to perform the kobject cleanup in the
      CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
      perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
      in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
      released. Doing this helps us avoid deadlocks due to holding kobject refcounts
      and waiting on each other on the cpu_hotplug.lock.
      
      (Note: We can't move all of the cpufreq CPU offline steps to the
      CPU_POST_DEAD stage, because certain things such as stopping the governors
      have to be done before the outgoing CPU is marked offline. So retain those
      parts in the CPU_DOWN_PREPARE stage itself).
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aee40ac
    • S
      cpufreq: Split __cpufreq_remove_dev() into two parts · cedb70af
      Srivatsa S. Bhat 提交于
      During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
      to perform work such as stopping the cpufreq governor, clearing the
      CPU from the policy structure etc, and finally cleaning up the
      kobject.
      
      There are certain subtle issues related to the kobject cleanup, and
      it would be much easier to deal with them if we separate that part
      from the rest of the cleanup-work in the CPU offline phase. So split
      the __cpufreq_remove_dev() function into 2 parts: one that handles
      the kobject cleanup, and the other that handles the rest of the work.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cedb70af
    • V
      cpufreq: serialize calls to __cpufreq_governor() · 19c76303
      Viresh Kumar 提交于
      We can't take a big lock around __cpufreq_governor() as this causes
      recursive locking for some cases. But calls to this routine must be
      serialized for every policy. Otherwise we can see some unpredictable
      events.
      
      For example, consider following scenario:
      
      __cpufreq_remove_dev()
       __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
         policy->governor->governor(policy, CPUFREQ_GOV_STOP);
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            mutex_destroy(&cpu_cdbs->timer_mutex)
            cpu_cdbs->cur_policy = NULL;
        <PREEMPT>
      store()
       __cpufreq_set_policy()
        __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
          policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
           case CPUFREQ_GOV_LIMITS:
            mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
             if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      And so store() will eventually result in a crash if cur_policy is
      NULL at this point.
      
      Introduce an additional variable which would guarantee serialization
      here.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      19c76303
    • V
      cpufreq: don't allow governor limits to be changed when it is disabled · f73d3933
      Viresh Kumar 提交于
      __cpufreq_governor() returns with -EBUSY when governor is already
      stopped and we try to stop it again, but when it is stopped we must
      not allow calls to CPUFREQ_GOV_LIMITS event as well.
      
      This patch adds this check in __cpufreq_governor().
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f73d3933
  2. 21 8月, 2013 1 次提交
    • L
      cpufreq: fix bad unlock balance on !CONFIG_SMP · 5025d628
      Li Zhong 提交于
      This patch tries to fix lockdep complaint attached below.
      
      It seems that we should always read acquire the cpufreq_rwsem,
      whether CONFIG_SMP is enabled or not.  And CONFIG_HOTPLUG_CPU
      depends on CONFIG_SMP, so it seems we don't need CONFIG_SMP for the
      code enabled by CONFIG_HOTPLUG_CPU.
      
      [    0.504191] =====================================
      [    0.504627] [ BUG: bad unlock balance detected! ]
      [    0.504627] 3.11.0-rc6-next-20130819 #1 Not tainted
      [    0.504627] -------------------------------------
      [    0.504627] swapper/1 is trying to release lock (cpufreq_rwsem) at:
      [    0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
      [    0.504627] but there are no more locks to release!
      [    0.504627]
      [    0.504627] other info that might help us debug this:
      [    0.504627] 1 lock held by swapper/1:
      [    0.504627]  #0:  (subsys mutex#4){+.+.+.}, at: [<ffffffff8134a7bf>] subsys_interface_register+0x4f/0xe0
      [    0.504627]
      [    0.504627] stack backtrace:
      [    0.504627] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130819 #1
      [    0.504627] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [    0.504627]  ffffffff813d927a ffff88007f847c98 ffffffff814c062b ffff88007f847cc8
      [    0.504627]  ffffffff81098bce ffff88007f847cf8 ffffffff81aadc30 ffffffff813d927a
      [    0.504627]  00000000ffffffff ffff88007f847d68 ffffffff8109d0be 0000000000000006
      [    0.504627] Call Trace:
      [    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff814c062b>] dump_stack+0x19/0x1b
      [    0.504627]  [<ffffffff81098bce>] print_unlock_imbalance_bug+0xfe/0x110
      [    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff8109d0be>] lock_release_non_nested+0x1ee/0x310
      [    0.504627]  [<ffffffff81099d0e>] ? mark_held_locks+0xae/0x120
      [    0.504627]  [<ffffffff811510cb>] ? kfree+0xcb/0x1d0
      [    0.504627]  [<ffffffff813d77ea>] ? cpufreq_policy_free+0x4a/0x60
      [    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff8109d2a4>] lock_release+0xc4/0x250
      [    0.504627]  [<ffffffff8106c9f3>] up_read+0x23/0x40
      [    0.504627]  [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
      [    0.504627]  [<ffffffff8134a809>] subsys_interface_register+0x99/0xe0
      [    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
      [    0.504627]  [<ffffffff813d7f0d>] cpufreq_register_driver+0x9d/0x1d0
      [    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
      [    0.504627]  [<ffffffff81b1a039>] acpi_cpufreq_init+0xfe/0x1f8
      [    0.504627]  [<ffffffff810002ba>] do_one_initcall+0xda/0x180
      [    0.504627]  [<ffffffff81ae301e>] kernel_init_freeable+0x12c/0x1bb
      [    0.504627]  [<ffffffff81ae2841>] ? do_early_param+0x8c/0x8c
      [    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140
      [    0.504627]  [<ffffffff814b4dde>] kernel_init+0xe/0xf0
      [    0.504627]  [<ffffffff814d029a>] ret_from_fork+0x7a/0xb0
      [    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Acked-and-tested-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5025d628
  3. 20 8月, 2013 5 次提交
  4. 18 8月, 2013 1 次提交
    • R
      Revert "cpufreq: Use cpufreq_policy_list for iterating over policies" · 878f6e07
      Rafael J. Wysocki 提交于
      Revert commit eb608521 (cpufreq: Use cpufreq_policy_list for iterating
      over policies), because it breaks system suspend/resume on multiple
      machines.
      
      It either causes resume to block indefinitely or causes the BUG_ON()
      in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq
      attributes.
      
      Conflicts:
      	drivers/cpufreq/cpufreq.c
      878f6e07
  5. 10 8月, 2013 5 次提交
  6. 08 8月, 2013 15 次提交
  7. 30 7月, 2013 1 次提交
    • R
      cpufreq: Fix cpufreq driver module refcount balance after suspend/resume · 2a998599
      Rafael J. Wysocki 提交于
      Since cpufreq_cpu_put() called by __cpufreq_remove_dev() drops the
      driver module refcount, __cpufreq_remove_dev() causes that refcount
      to become negative for the cpufreq driver after a suspend/resume
      cycle.
      
      This is not the only bad thing that happens there, however, because
      kobject_put() should only be called for the policy kobject at this
      point if the CPU is not the last one for that policy.
      
      Namely, if the given CPU is the last one for that policy, the
      policy kobject's refcount should be 1 at this point, as set by
      cpufreq_add_dev_interface(), and only needs to be dropped once for
      the kobject to go away.  This actually happens under the cpu == 1
      check, so it need not be done before by cpufreq_cpu_put().
      
      On the other hand, if the given CPU is not the last one for that
      policy, this means that cpufreq_add_policy_cpu() has been called
      at least once for that policy and cpufreq_cpu_get() has been
      called for it too.  To balance that cpufreq_cpu_get(), we need to
      call cpufreq_cpu_put() in that case.
      
      Thus, to fix the described problem and keep the reference
      counters balanced in both cases, move the cpufreq_cpu_get() call
      in __cpufreq_remove_dev() to the code path executed only for
      CPUs that share the policy with other CPUs.
      Reported-and-tested-by: NToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: 3.10+ <stable@vger.kernel.org>
      2a998599
  8. 26 7月, 2013 1 次提交
  9. 15 7月, 2013 2 次提交
    • P
      cpufreq: delete __cpuinit usage from all cpufreq files · 2760984f
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the drivers/cpufreq uses of the __cpuinit macros
      from all C files.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      [v2: leave 2nd lines of args misaligned as requested by Viresh]
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: cpufreq@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Acked-by: NDirk Brandewie <dirk.j.brandewie@intel.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      2760984f
    • S
      cpufreq: Revert commit a66b2e to fix suspend/resume regression · aae760ed
      Srivatsa S. Bhat 提交于
      commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume)
      has unfortunately caused several things in the cpufreq subsystem to
      break subtly after a suspend/resume cycle.
      
      The intention of that patch was to retain the file permissions of the
      cpufreq related sysfs files across suspend/resume.  To achieve that,
      the commit completely removed the calls to cpufreq_add_dev() and
      __cpufreq_remove_dev() during suspend/resume transitions.  But the
      problem is that those functions do 2 kinds of things:
        1. Low-level initialization/tear-down that are critical to the
           correct functioning of cpufreq-core.
        2. Kobject and sysfs related initialization/teardown.
      
      Ideally we should have reorganized the code to cleanly separate these
      two responsibilities, and skipped only the sysfs related parts during
      suspend/resume.  Since we skipped the entire callbacks instead (which
      also included some CPU and cpufreq-specific critical components),
      cpufreq subsystem started behaving erratically after suspend/resume.
      
      So revert the commit to fix the regression.  We'll revisit and address
      the original goal of that commit separately, since it involves quite a
      bit of careful code reorganization and appears to be non-trivial.
      
      (While reverting the commit, note that another commit f51e1eb6
       (cpufreq: Fix cpufreq regression after suspend/resume) already
       reverted part of the original set of changes.  So revert only the
       remaining ones).
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Tested-by: NPaul Bolle <pebolle@tiscali.nl>
      Cc: 3.10+ <stable@vger.kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      aae760ed
  10. 04 7月, 2013 1 次提交
    • V
      cpufreq: Fix serialization of frequency transitions · 266c13d7
      Viresh Kumar 提交于
      Commit 7c30ed ("cpufreq: make sure frequency transitions are serialized")
      interacts poorly with systems that have a single core freqency for all
      cores.  On such systems we have a single policy for all cores with
      several CPUs.  When we do a frequency transition the governor calls the
      pre and post change notifiers which causes cpufreq_notify_transition()
      per CPU.  Since the policy is the same for all of them all CPUs after
      the first and the warnings added are generated by checking a per-policy
      flag the warnings will be triggered for all cores after the first.
      
      Fix this by allowing notifier to be called for n times. Where n is the number of
      cpus in policy->cpus.
      Reported-and-tested-by: NMark Brown <broonie@linaro.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      266c13d7
  11. 28 6月, 2013 2 次提交
  12. 21 6月, 2013 1 次提交