1. 01 10月, 2013 16 次提交
  2. 25 9月, 2013 3 次提交
  3. 20 9月, 2013 1 次提交
  4. 19 9月, 2013 2 次提交
  5. 18 9月, 2013 2 次提交
  6. 12 9月, 2013 4 次提交
    • L
      cpufreq: Acquire the lock in cpufreq_policy_restore() for reading · 44871c9c
      Lan Tianyu 提交于
      In cpufreq_policy_restore() before system suspend policy is read from
      percpu's cpufreq_cpu_data_fallback.  It's a read operation rather
      than a write one, so take the lock for reading in there.
      Signed-off-by: NLan Tianyu <tianyu.lan@intel.com>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      44871c9c
    • S
      cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu · cb38ed5c
      Srivatsa S. Bhat 提交于
      If update_policy_cpu() is invoked with the existing policy->cpu itself
      as the new-cpu parameter, then a lot of things can go terribly wrong.
      
      In its present form, update_policy_cpu() always assumes that the new-cpu
      is different from policy->cpu and invokes other functions to perform their
      respective updates. And those functions implement the actual update like
      this:
      
      per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
      per_cpu(..., last_cpu) = NULL;
      
      Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
      references vanish into thin air! (memory leak). From there, it leads to more
      problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
      and hence tries to create a new sysfs-group; but sysfs already had created
      the group earlier, so it complains that it cannot create a duplicate filename.
      In short, the repercussions of a rather innocuous invocation of
      update_policy_cpu() can turn out to be pretty nasty.
      
      Ideally update_policy_cpu() should handle this situation (new == last)
      gracefully, and not lead to such severe problems. So fix it by adding an
      appropriate check.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cb38ed5c
    • S
      cpufreq: Restructure if/else block to avoid unintended behavior · 61173f25
      Srivatsa S. Bhat 提交于
      In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
      the sysfs link or nominate a new policy cpu, is governed by an if/else block
      with a rather complex set of conditionals. Worse, they harbor a subtlety
      which leads to certain unintended behavior.
      
      The code looks like this:
      
              if (cpu != policy->cpu && !frozen) {
                      sysfs_remove_link(&dev->kobj, "cpufreq");
              } else if (cpus > 1) {
      		new_cpu = cpufreq_nominate_new_policy_cpu(...);
      		...
      		update_policy_cpu(..., new_cpu);
      	}
      
      The original intention was:
      If the CPU going offline is not policy->cpu, just remove the link.
      On the other hand, if the CPU going offline is the policy->cpu itself,
      handover the policy->cpu job to some other surviving CPU in that policy.
      
      But because the 'if' condition also includes the 'frozen' check, now there
      are *two* possibilities by which we can enter the 'else' block:
      
      1. cpu == policy->cpu (intended)
      2. cpu != policy->cpu && frozen (unintended)
      
      Due to the second (unintended) scenario, we end up spuriously nominating
      a CPU as the policy->cpu, even when the existing policy->cpu is alive and
      well. This can cause problems further down the line, especially when we end
      up nominating the same policy->cpu as the new one (ie., old == new),
      because it totally confuses update_policy_cpu().
      
      To avoid this mess, restructure the if/else block to only do what was
      originally intended, and thus prevent any unwelcome surprises.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      61173f25
    • S
      cpufreq: Fix crash in cpufreq-stats during suspend/resume · 0d66b91e
      Srivatsa S. Bhat 提交于
      Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
      dereference during the second attempt to suspend a system. He also
      pin-pointed the problem to commit 5302c3fb "cpufreq: Perform light-weight
      init/teardown during suspend/resume".
      
      That commit actually ensured that the cpufreq-stats table and the
      cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
      suspend/resume, which makes it all the more surprising. However, it turns
      out that the root-cause is not that we access an already freed memory, but
      that the reference to the allocated memory gets moved around and we lose
      track of that during resume, leading to the reported crash in a subsequent
      suspend attempt.
      
      In the suspend path, during CPU offline, the value of policy->cpu is updated
      by choosing one of the surviving CPUs in that policy, as long as there is
      atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
      invoked to update the reference to the stats structure by assigning it to
      the new CPU. However, in the resume path, during CPU online, we end up
      assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
      know about this. Thus the reference to the stats structure remains
      (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
      during CPU offline, we end up accessing an incorrect location to get the
      stats structure, which eventually leads to the NULL pointer dereference.
      
      Fix this by letting cpufreq-stats know about the update of the policy->cpu
      during CPU online in the resume path. (Also, move the update_policy_cpu()
      function higher up in the file, so that __cpufreq_add_dev() can invoke
      it).
      Reported-and-tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0d66b91e
  7. 11 9月, 2013 1 次提交
  8. 10 9月, 2013 9 次提交
    • R
      Revert "cpufreq: make sure frequency transitions are serialized" · 798282a8
      Rafael J. Wysocki 提交于
      Commit 7c30ed53 (cpufreq: make sure frequency transitions are
      serialized) attempted to serialize frequency transitions by
      adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
      notifications.  However, it assumed that the notifications will
      always originate from the driver's .target() callback, but they
      also can be triggered by cpufreq_out_of_sync() and that leads to
      warnings like this on some systems:
      
       WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
       __cpufreq_notify_transition+0x238/0x260()
       In middle of another frequency transition
      
      accompanied by a call trace similar to this one:
      
       [<ffffffff81720daa>] dump_stack+0x46/0x58
       [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
       [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
       [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
       [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
       [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
       [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
       [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
       [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
       [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
       [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
       [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
       [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
       [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
       [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
       [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
       [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
       [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
       [<ffffffff81081060>] process_one_work+0x170/0x4a0
       [<ffffffff81082121>] worker_thread+0x121/0x390
       [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
       [<ffffffff81088fe0>] kthread+0xc0/0xd0
       [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
       [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
       [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
      
      For this reason, revert commit 7c30ed53 along with the fix 266c13d7
      (cpufreq: Fix serialization of frequency transitions) on top of it
      and we will revisit the serialization problem later.
      Reported-by: NAlessandro Bono <alessandro.bono@gmail.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      798282a8
    • S
      cpufreq: Use signed type for 'ret' variable, to store negative error values · 5136fa56
      Srivatsa S. Bhat 提交于
      There are places where the variable 'ret' is declared as unsigned int
      and then used to store negative return values such as -EINVAL. Fix them
      by declaring the variable as a signed quantity.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5136fa56
    • S
      cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes · 56d07db2
      Srivatsa S. Bhat 提交于
      Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
      and partial solution to the race condition between writing to a cpufreq sysfs
      file and taking a CPU offline. Now that we have a proper and complete solution
      to that problem, remove the temporary fix.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      56d07db2
    • S
      cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug · 4f750c93
      Srivatsa S. Bhat 提交于
      The functions that are used to write to cpufreq sysfs files (such as
      store_scaling_max_freq()) are not hotplug safe. They can race with CPU
      hotplug tasks and lead to problems such as trying to acquire an already
      destroyed timer-mutex etc.
      
      Eg:
      
          __cpufreq_remove_dev()
           __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
             policy->governor->governor(policy, CPUFREQ_GOV_STOP);
              cpufreq_governor_dbs()
               case CPUFREQ_GOV_STOP:
                mutex_destroy(&cpu_cdbs->timer_mutex)
                cpu_cdbs->cur_policy = NULL;
            <PREEMPT>
          store()
           __cpufreq_set_policy()
            __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
              policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
               case CPUFREQ_GOV_LIMITS:
                mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
                 if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
      synchronize with CPU hotplug. However, there is an additional point to note
      here: some parts of the CPU teardown in the cpufreq subsystem are done in
      the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
      get/put_online_cpus() functions alone is insufficient; we should also ensure
      that we don't race with those latter steps in the hotplug sequence. We can
      easily achieve this by checking if the CPU is online before proceeding with
      the store, since the CPU would have been marked offline by the time the
      CPU_POST_DEAD notifiers are executed.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4f750c93
    • S
      cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock · 1aee40ac
      Srivatsa S. Bhat 提交于
      __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
      offline. But because we destroy the kobject towards the end of the CPU offline
      phase, there are certain race windows where a task can try to write to a
      cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
      that CPU offline, and this can bump up the kobject refcount, which in turn might
      hinder the CPU offline task from running to completion. (It can also cause
      other more serious problems such as trying to acquire a destroyed timer-mutex
      etc., depending on the exact stage of the cleanup at which the task managed to
      take a new refcount).
      
      To fix the race window, we will need to synchronize those store_*() call-sites
      with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
      in turn can cause a total deadlock because it can end up waiting for the
      CPU offline task to complete, with incremented refcount!
      
      Write to sysfs                            CPU offline task
      --------------                            ----------------
      kobj_refcnt++
      
                                                Acquire cpu_hotplug.lock
      
      get_online_cpus();
      
      					  Wait for kobj_refcnt to drop to zero
      
                           **DEADLOCK**
      
      A simple way to avoid this problem is to perform the kobject cleanup in the
      CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
      perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
      in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
      released. Doing this helps us avoid deadlocks due to holding kobject refcounts
      and waiting on each other on the cpu_hotplug.lock.
      
      (Note: We can't move all of the cpufreq CPU offline steps to the
      CPU_POST_DEAD stage, because certain things such as stopping the governors
      have to be done before the outgoing CPU is marked offline. So retain those
      parts in the CPU_DOWN_PREPARE stage itself).
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aee40ac
    • S
      cpufreq: Split __cpufreq_remove_dev() into two parts · cedb70af
      Srivatsa S. Bhat 提交于
      During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
      to perform work such as stopping the cpufreq governor, clearing the
      CPU from the policy structure etc, and finally cleaning up the
      kobject.
      
      There are certain subtle issues related to the kobject cleanup, and
      it would be much easier to deal with them if we separate that part
      from the rest of the cleanup-work in the CPU offline phase. So split
      the __cpufreq_remove_dev() function into 2 parts: one that handles
      the kobject cleanup, and the other that handles the rest of the work.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cedb70af
    • A
      cpufreq: Fix wrong time unit conversion · a857c0b9
      Andreas Schwab 提交于
      The time spent by a CPU under a given frequency is stored in jiffies unit
      in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of
      the frequency.
      
      This is what is displayed in the following file on the right column:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 19835820
           2300000 3172
           [...]
      
      Now cpufreq converts this jiffies unit delta to clock_t before returning it
      to the user as in the above file. And that conversion is achieved using the API
      cputime64_to_clock_t().
      
      Although it accidentally works on traditional tick based cputime accounting, where
      cputime_t maps directly to jiffies, it doesn't work with other types of cputime
      accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs
      or any granularity preffered by the architecture.
      
      For example we get a buggy zero delta on full dyntick configurations:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 0
           2300000 0
           [...]
      
      Fix this with using the proper jiffies_64_t to clock_t conversion.
      Reported-and-tested-by: NCarsten Emde <C.Emde@osadl.org>
      Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a857c0b9
    • V
      cpufreq: serialize calls to __cpufreq_governor() · 19c76303
      Viresh Kumar 提交于
      We can't take a big lock around __cpufreq_governor() as this causes
      recursive locking for some cases. But calls to this routine must be
      serialized for every policy. Otherwise we can see some unpredictable
      events.
      
      For example, consider following scenario:
      
      __cpufreq_remove_dev()
       __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
         policy->governor->governor(policy, CPUFREQ_GOV_STOP);
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            mutex_destroy(&cpu_cdbs->timer_mutex)
            cpu_cdbs->cur_policy = NULL;
        <PREEMPT>
      store()
       __cpufreq_set_policy()
        __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
          policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
           case CPUFREQ_GOV_LIMITS:
            mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
             if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      And so store() will eventually result in a crash if cur_policy is
      NULL at this point.
      
      Introduce an additional variable which would guarantee serialization
      here.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      19c76303
    • V
      cpufreq: don't allow governor limits to be changed when it is disabled · f73d3933
      Viresh Kumar 提交于
      __cpufreq_governor() returns with -EBUSY when governor is already
      stopped and we try to stop it again, but when it is stopped we must
      not allow calls to CPUFREQ_GOV_LIMITS event as well.
      
      This patch adds this check in __cpufreq_governor().
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f73d3933
  9. 30 8月, 2013 1 次提交
    • S
      cpufreq: Don't use smp_processor_id() in preemptible context · 69320783
      Stephen Boyd 提交于
      Workqueues are preemptible even if works are queued on them with
      queue_work_on(). Let's use raw_smp_processor_id() here to silence
      the warning.
      
      BUG: using smp_processor_id() in preemptible [00000000] code: kworker/3:2/674
      caller is gov_queue_work+0x28/0xb0
      CPU: 0 PID: 674 Comm: kworker/3:2 Tainted: G        W    3.10.0 #30
      Workqueue: events od_dbs_timer
      [<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
      [<c0109dec>] (show_stack+0x10/0x14) from [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0)
      [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0) from [<c0635864>] (gov_queue_work+0x28/0xb0)
      [<c0635864>] (gov_queue_work+0x28/0xb0) from [<c0635618>] (od_dbs_timer+0x108/0x134)
      [<c0635618>] (od_dbs_timer+0x108/0x134) from [<c01aa8f8>] (process_one_work+0x25c/0x444)
      [<c01aa8f8>] (process_one_work+0x25c/0x444) from [<c01aaf88>] (worker_thread+0x200/0x344)
      [<c01aaf88>] (worker_thread+0x200/0x344) from [<c01b03bc>] (kthread+0xa0/0xb0)
      [<c01b03bc>] (kthread+0xa0/0xb0) from [<c01061b8>] (ret_from_fork+0x14/0x3c)
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      69320783
  10. 29 8月, 2013 1 次提交