1. 10 9月, 2013 6 次提交
    • S
      cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug · 4f750c93
      Srivatsa S. Bhat 提交于
      The functions that are used to write to cpufreq sysfs files (such as
      store_scaling_max_freq()) are not hotplug safe. They can race with CPU
      hotplug tasks and lead to problems such as trying to acquire an already
      destroyed timer-mutex etc.
      
      Eg:
      
          __cpufreq_remove_dev()
           __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
             policy->governor->governor(policy, CPUFREQ_GOV_STOP);
              cpufreq_governor_dbs()
               case CPUFREQ_GOV_STOP:
                mutex_destroy(&cpu_cdbs->timer_mutex)
                cpu_cdbs->cur_policy = NULL;
            <PREEMPT>
          store()
           __cpufreq_set_policy()
            __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
              policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
               case CPUFREQ_GOV_LIMITS:
                mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
                 if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
      synchronize with CPU hotplug. However, there is an additional point to note
      here: some parts of the CPU teardown in the cpufreq subsystem are done in
      the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
      get/put_online_cpus() functions alone is insufficient; we should also ensure
      that we don't race with those latter steps in the hotplug sequence. We can
      easily achieve this by checking if the CPU is online before proceeding with
      the store, since the CPU would have been marked offline by the time the
      CPU_POST_DEAD notifiers are executed.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4f750c93
    • S
      cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock · 1aee40ac
      Srivatsa S. Bhat 提交于
      __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
      offline. But because we destroy the kobject towards the end of the CPU offline
      phase, there are certain race windows where a task can try to write to a
      cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
      that CPU offline, and this can bump up the kobject refcount, which in turn might
      hinder the CPU offline task from running to completion. (It can also cause
      other more serious problems such as trying to acquire a destroyed timer-mutex
      etc., depending on the exact stage of the cleanup at which the task managed to
      take a new refcount).
      
      To fix the race window, we will need to synchronize those store_*() call-sites
      with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
      in turn can cause a total deadlock because it can end up waiting for the
      CPU offline task to complete, with incremented refcount!
      
      Write to sysfs                            CPU offline task
      --------------                            ----------------
      kobj_refcnt++
      
                                                Acquire cpu_hotplug.lock
      
      get_online_cpus();
      
      					  Wait for kobj_refcnt to drop to zero
      
                           **DEADLOCK**
      
      A simple way to avoid this problem is to perform the kobject cleanup in the
      CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
      perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
      in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
      released. Doing this helps us avoid deadlocks due to holding kobject refcounts
      and waiting on each other on the cpu_hotplug.lock.
      
      (Note: We can't move all of the cpufreq CPU offline steps to the
      CPU_POST_DEAD stage, because certain things such as stopping the governors
      have to be done before the outgoing CPU is marked offline. So retain those
      parts in the CPU_DOWN_PREPARE stage itself).
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aee40ac
    • S
      cpufreq: Split __cpufreq_remove_dev() into two parts · cedb70af
      Srivatsa S. Bhat 提交于
      During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
      to perform work such as stopping the cpufreq governor, clearing the
      CPU from the policy structure etc, and finally cleaning up the
      kobject.
      
      There are certain subtle issues related to the kobject cleanup, and
      it would be much easier to deal with them if we separate that part
      from the rest of the cleanup-work in the CPU offline phase. So split
      the __cpufreq_remove_dev() function into 2 parts: one that handles
      the kobject cleanup, and the other that handles the rest of the work.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      cedb70af
    • A
      cpufreq: Fix wrong time unit conversion · a857c0b9
      Andreas Schwab 提交于
      The time spent by a CPU under a given frequency is stored in jiffies unit
      in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of
      the frequency.
      
      This is what is displayed in the following file on the right column:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 19835820
           2300000 3172
           [...]
      
      Now cpufreq converts this jiffies unit delta to clock_t before returning it
      to the user as in the above file. And that conversion is achieved using the API
      cputime64_to_clock_t().
      
      Although it accidentally works on traditional tick based cputime accounting, where
      cputime_t maps directly to jiffies, it doesn't work with other types of cputime
      accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs
      or any granularity preffered by the architecture.
      
      For example we get a buggy zero delta on full dyntick configurations:
      
           cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
           2301000 0
           2300000 0
           [...]
      
      Fix this with using the proper jiffies_64_t to clock_t conversion.
      Reported-and-tested-by: NCarsten Emde <C.Emde@osadl.org>
      Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a857c0b9
    • V
      cpufreq: serialize calls to __cpufreq_governor() · 19c76303
      Viresh Kumar 提交于
      We can't take a big lock around __cpufreq_governor() as this causes
      recursive locking for some cases. But calls to this routine must be
      serialized for every policy. Otherwise we can see some unpredictable
      events.
      
      For example, consider following scenario:
      
      __cpufreq_remove_dev()
       __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
         policy->governor->governor(policy, CPUFREQ_GOV_STOP);
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            mutex_destroy(&cpu_cdbs->timer_mutex)
            cpu_cdbs->cur_policy = NULL;
        <PREEMPT>
      store()
       __cpufreq_set_policy()
        __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
          policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
           case CPUFREQ_GOV_LIMITS:
            mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
             if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
      
      And so store() will eventually result in a crash if cur_policy is
      NULL at this point.
      
      Introduce an additional variable which would guarantee serialization
      here.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      19c76303
    • V
      cpufreq: don't allow governor limits to be changed when it is disabled · f73d3933
      Viresh Kumar 提交于
      __cpufreq_governor() returns with -EBUSY when governor is already
      stopped and we try to stop it again, but when it is stopped we must
      not allow calls to CPUFREQ_GOV_LIMITS event as well.
      
      This patch adds this check in __cpufreq_governor().
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f73d3933
  2. 30 8月, 2013 1 次提交
    • S
      cpufreq: Don't use smp_processor_id() in preemptible context · 69320783
      Stephen Boyd 提交于
      Workqueues are preemptible even if works are queued on them with
      queue_work_on(). Let's use raw_smp_processor_id() here to silence
      the warning.
      
      BUG: using smp_processor_id() in preemptible [00000000] code: kworker/3:2/674
      caller is gov_queue_work+0x28/0xb0
      CPU: 0 PID: 674 Comm: kworker/3:2 Tainted: G        W    3.10.0 #30
      Workqueue: events od_dbs_timer
      [<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
      [<c0109dec>] (show_stack+0x10/0x14) from [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0)
      [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0) from [<c0635864>] (gov_queue_work+0x28/0xb0)
      [<c0635864>] (gov_queue_work+0x28/0xb0) from [<c0635618>] (od_dbs_timer+0x108/0x134)
      [<c0635618>] (od_dbs_timer+0x108/0x134) from [<c01aa8f8>] (process_one_work+0x25c/0x444)
      [<c01aa8f8>] (process_one_work+0x25c/0x444) from [<c01aaf88>] (worker_thread+0x200/0x344)
      [<c01aaf88>] (worker_thread+0x200/0x344) from [<c01b03bc>] (kthread+0xa0/0xb0)
      [<c01b03bc>] (kthread+0xa0/0xb0) from [<c01061b8>] (ret_from_fork+0x14/0x3c)
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      69320783
  3. 29 8月, 2013 3 次提交
    • S
      cpufreq: governor: Fix typos in comments · c4afc410
      Stratos Karafotis 提交于
       - 'Governer' should be 'Governor'.
       - 'S' is used for Siemens (electrical conductance) in SI units,
         so use small 's' for seconds.
      Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c4afc410
    • S
      cpufreq: governors: Remove duplicate check of target freq in supported range · 934dac1e
      Stratos Karafotis 提交于
      Function __cpufreq_driver_target() checks if target_freq is within
      policy->min and policy->max range. generic_powersave_bias_target() also
      checks if target_freq is valid via a cpufreq_frequency_table_target()
      call. So, drop the unnecessary duplicate check in *_check_cpu().
      Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      934dac1e
    • S
      cpufreq: Fix timer/workqueue corruption due to double queueing · 3617f2ca
      Stephen Boyd 提交于
      When a CPU is hot removed we'll cancel all the delayed work items
      via gov_cancel_work(). Normally this will just cancels a delayed
      timer on each CPU that the policy is managing and the work won't
      run, but if the work is already running the workqueue code will
      wait for the work to finish before continuing to prevent the
      work items from re-queuing themselves like they normally do. This
      scheme will work most of the time, except for the case where the
      work function determines that it should adjust the delay for all
      other CPUs that the policy is managing. If this scenario occurs,
      the canceling CPU will cancel its own work but queue up the other
      CPUs works to run. For example:
      
       CPU0                                        CPU1
       ----                                        ----
       cpu_down()
        ...
        __cpufreq_remove_dev()
         cpufreq_governor_dbs()
          case CPUFREQ_GOV_STOP:
           gov_cancel_work(dbs_data, policy);
            cpu0 work is canceled
             timer is canceled
             cpu1 work is canceled                    <work runs>
             <waits for cpu1>                         od_dbs_timer()
                                                       gov_queue_work(*, *, true);
       						  cpu0 work queued
       						  cpu1 work queued
      						  cpu2 work queued
      						  ...
             cpu1 work is canceled
             cpu2 work is canceled
             ...
      
      At the end of the GOV_STOP case cpu0 still has a work queued to
      run although the code is expecting all of the works to be
      canceled. __cpufreq_remove_dev() will then proceed to
      re-initialize all the other CPUs works except for the CPU that is
      going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
      will trample over the queued work and debugobjects will spit out
      a warning:
      
      WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
      ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
      Modules linked in:
      CPU: 0 PID: 1491 Comm: sh Tainted: G        W    3.10.0 #19
      [<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
      [<c0109dec>] (show_stack+0x10/0x14) from [<c01904cc>] (warn_slowpath_common+0x4c/0x6c)
      [<c01904cc>] (warn_slowpath_common+0x4c/0x6c) from [<c019056c>] (warn_slowpath_fmt+0x2c/0x3c)
      [<c019056c>] (warn_slowpath_fmt+0x2c/0x3c) from [<c0388a7c>] (debug_print_object+0x94/0xbc)
      [<c0388a7c>] (debug_print_object+0x94/0xbc) from [<c0388e34>] (__debug_object_init+0x2d0/0x340)
      [<c0388e34>] (__debug_object_init+0x2d0/0x340) from [<c019e3b0>] (init_timer_key+0x14/0xb0)
      [<c019e3b0>] (init_timer_key+0x14/0xb0) from [<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8)
      [<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8) from [<c06325a0>] (__cpufreq_governor+0xdc/0x1a4)
      [<c06325a0>] (__cpufreq_governor+0xdc/0x1a4) from [<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434)
      [<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434) from [<c08989f4>] (cpufreq_cpu_callback+0x60/0x80)
      [<c08989f4>] (cpufreq_cpu_callback+0x60/0x80) from [<c08a43c0>] (notifier_call_chain+0x38/0x68)
      [<c08a43c0>] (notifier_call_chain+0x38/0x68) from [<c01938e0>] (__cpu_notify+0x28/0x40)
      [<c01938e0>] (__cpu_notify+0x28/0x40) from [<c0892ad4>] (_cpu_down+0x7c/0x2c0)
      [<c0892ad4>] (_cpu_down+0x7c/0x2c0) from [<c0892d3c>] (cpu_down+0x24/0x40)
      [<c0892d3c>] (cpu_down+0x24/0x40) from [<c0893ea8>] (store_online+0x2c/0x74)
      [<c0893ea8>] (store_online+0x2c/0x74) from [<c04519d8>] (dev_attr_store+0x18/0x24)
      [<c04519d8>] (dev_attr_store+0x18/0x24) from [<c02a69d4>] (sysfs_write_file+0x100/0x148)
      [<c02a69d4>] (sysfs_write_file+0x100/0x148) from [<c0255c18>] (vfs_write+0xcc/0x174)
      [<c0255c18>] (vfs_write+0xcc/0x174) from [<c0255f70>] (SyS_write+0x38/0x64)
      [<c0255f70>] (SyS_write+0x38/0x64) from [<c0106120>] (ret_fast_syscall+0x0/0x30)
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      3617f2ca
  4. 26 8月, 2013 1 次提交
    • S
      cpufreq: imx6q: Fix clock enable balance · fae19b84
      Sascha Hauer 提交于
      For changing the cpu frequency the i.MX6q has to be switched to some
      intermediate clock during the PLL reprogramming. The driver tries
      to be clever to keep the enable count correct but gets it wrong. If
      the cpufreq is increased it calls clk_disable_unprepare twice
      on pll2_pfd2_396m. This puts all other devices which get their clock
      from pll2_pfd2_396m into a nonworking state.
      
      Fix this by removing the clk enabling/disabling altogether since the
      clk core will do this automatically during a reparent.
      Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      fae19b84
  5. 24 8月, 2013 1 次提交
  6. 21 8月, 2013 10 次提交
  7. 20 8月, 2013 5 次提交
  8. 18 8月, 2013 1 次提交
    • R
      Revert "cpufreq: Use cpufreq_policy_list for iterating over policies" · 878f6e07
      Rafael J. Wysocki 提交于
      Revert commit eb608521 (cpufreq: Use cpufreq_policy_list for iterating
      over policies), because it breaks system suspend/resume on multiple
      machines.
      
      It either causes resume to block indefinitely or causes the BUG_ON()
      in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq
      attributes.
      
      Conflicts:
      	drivers/cpufreq/cpufreq.c
      878f6e07
  9. 15 8月, 2013 7 次提交
  10. 12 8月, 2013 2 次提交
  11. 10 8月, 2013 3 次提交