1. 09 Jun 2014, 1 commit
    • cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info' · c8ae481b
      Committed by Viresh Kumar
      'copy_prev_load' was recently added by commit 18b46abd (cpufreq: governor: Be
      friendly towards latency-sensitive bursty workloads).

      It is actually a bit redundant, as we also have 'prev_load', which can store any
      integer value and can be used instead of 'copy_prev_load' by setting it to zero.
      
      True load can also turn out to be zero during long idle intervals (and hence the
      actual value of 'prev_load' and the overloaded value can clash). However this is
      not a problem because, if the true load was really zero in the previous
      interval, it makes sense to evaluate the load afresh for the current interval
      rather than copying the previous load.
      
      So, drop 'copy_prev_load' and use 'prev_load' instead.
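
      A minimal sketch of the resulting load evaluation, assuming the field
      names used in this changelog (the surrounding code in
      drivers/cpufreq/cpufreq_governor.c is abbreviated here):

      if (unlikely(wall_time > (2 * sampling_rate) &&
                   j_cdbs->prev_load)) {
              /* Wakeup after a long idle spell: reuse the previous load
               * once; zeroing prev_load makes the next interval evaluate
               * the load afresh. */
              load = j_cdbs->prev_load;
              j_cdbs->prev_load = 0;
      } else {
              load = 100 * (wall_time - idle_time) / wall_time;
              j_cdbs->prev_load = load;
      }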
      
      Update the comments as well to make this clearer.
      
      There is another change here, probably missed by Srivatsa during the last
      round of updates: the unlikely() in the 'if' statement covered only half of
      the condition, whereas the entire condition should be inside it.
      
      Also silence checkpatch, which was reporting this with the --strict option:
      
      CHECK: Alignment should match open parenthesis
      +		if (unlikely(wall_time > (2 * sampling_rate) &&
      +						j_cdbs->prev_load)) {
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c8ae481b
  2. 08 Jun 2014, 1 commit
    • cpufreq: governor: Be friendly towards latency-sensitive bursty workloads · 18b46abd
      Committed by Srivatsa S. Bhat
      Cpufreq governors like the ondemand governor calculate the load on the CPU
      periodically by employing deferrable timers. A deferrable timer won't fire
      if the CPU is completely idle (and there are no other timers to be run), in
      order to avoid unnecessary wakeups and thus save CPU power.
      
      However, the load calculation logic is agnostic to all this, and this can
      lead to the problem described below.
      
      Time (ms)               CPU 1
      
      100                Task-A running
      
      110                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.
      
      110.5              Task-A running
      
      120                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.

      125                Task-A went to sleep. With nothing else to do, CPU 1
                         went completely idle.

      200                Task-A woke up and started running again.

      200.5              Governor's deferred timer (which was originally programmed
                         to fire at time 130) fires now. It calculates load for the
                         time period 120 to 200.5, and finds the load is almost zero.
                         Hence it decreases the CPU frequency to the minimum.

      210                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.
      
      So, after the workload woke up and started running, the frequency was suddenly
      dropped to absolute minimum, and after that, there was an unnecessary delay of
      10ms (sampling period) to increase the CPU frequency back to a reasonable value.
      And this pattern repeats for every wake-up-from-cpu-idle for that workload.
      This can be quite undesirable for latency- or response-time sensitive bursty
      workloads. So we need to fix the governor's logic to detect such wake-up-from-
      cpu-idle scenarios and start the workload at a reasonably high CPU frequency.
      
      One extreme solution would be to fake a load of 100% in such scenarios. But
      that might lead to undesirable side-effects such as frequency spikes (which
      might also need voltage changes) especially if the previous frequency happened
      to be very low.
      
      We just want to avoid the stupidity of dropping down the frequency to a minimum
      and then enduring a needless (and long) delay before ramping it up back again.
      So, let us simply carry forward the previous load - that is, let us just pretend
      that the 'load' for the current time-window is the same as the load for the
      previous window. That way, the frequency and voltage will continue to be set
      to whatever values they were set at previously. This means that bursty workloads
      will get a chance to influence the CPU frequency at which they wake up from
      cpu-idle, based on their past execution history. Thus, they might be able to
      avoid suffering from slow wakeups and long response-times.
      
      However, we should take care not to over-do this. For example, such a "copy
      previous load" logic will benefit cases like this: (where # represents busy
      and . represents idle)
      
      ##########.........#########.........###########...........##########........
      
      but it will be detrimental in cases like the one shown below, because it will
      retain the high frequency (copied from the previous interval) even in a mostly
      idle system:
      
      ##########.........#.................#.....................#...............
      
      (i.e., the workload finished and the remaining tasks are such that their busy
      periods are smaller than the sampling interval, which causes the timer to
      always get deferred. So, this will make the copy-previous-load logic copy
      the initial high load to subsequent idle periods over and over again, thus
      keeping the frequency high unnecessarily).
      
      So, we modify this copy-previous-load logic such that it is used only once
      upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the
      previous load won't get blindly copied over; cpufreq will freshly evaluate the
      load in the second idle interval, thus ensuring that the system comes back to
      its normal state.
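
      A minimal sketch of this one-shot copy, assuming the names used in this
      changelog ('prev_load', 'copy_prev_load'); the real dbs_check_cpu()
      carries more context:

      if (unlikely(wall_time > (2 * sampling_rate) &&
                   j_cdbs->copy_prev_load)) {
              /* First sample after a long idle spell: carry the previous
               * load forward, but only once. */
              load = j_cdbs->prev_load;
              j_cdbs->copy_prev_load = false;
      } else {
              load = 100 * (wall_time - idle_time) / wall_time;
              j_cdbs->prev_load = load;
              j_cdbs->copy_prev_load = true;
      }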
      
      [ The right way to solve this whole problem is to teach the CPU frequency
      governors to also track load on a per-task basis, not just a per-CPU basis,
      and then use both the data sources intelligently to set the appropriate
      frequency on the CPUs. But that involves redesigning the cpufreq subsystem,
      so this patch should make the situation bearable until then. ]
      
      Experimental results:
      +-------------------+
      
      I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in
      between its execution such that its total utilization can be a user-defined
      value, say 10% or 20% (the higher the specified utilization, the fewer sleeps
      are injected). This ebizzy was run single-threaded, pinned to CPU 8.
      
      Behavior observed with tracing (sample taken from 40% utilization runs):
      ------------------------------------------------------------------------
      
      Without patch:
      ~~~~~~~~~~~~~~
      kworker/8:2-12137  416.335742: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.345744: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
            <...>-40753  416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40753  416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.505739: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.515742: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      
      Observation: Ebizzy went idle at 416.402202, and started running again at
      416.502130. But cpufreq noticed the long idle period, and dropped the frequency
      at 416.505739, only to increase it back again at 416.515742, realizing that the
      workload is in fact CPU-bound. Thus ebizzy needlessly ran at the lowest frequency
      for almost 13 milliseconds (more than one full sample period), and this pattern
      repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite
      a lot.
      
      With patch:
      ~~~~~~~~~~~
      
      kworker/8:2-29802  464.832535: cpu_frequency: state=2061000 cpu_id=8
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  464.972536: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-29802  464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40738  465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      
      Observation: Ebizzy went idle at 465.035797, and started running again at
      465.240178. Since ebizzy was the only real workload running on this CPU,
      cpufreq retained the frequency at 4.1 GHz throughout the run of ebizzy, no
      matter how many times ebizzy slept and woke up in between. Thus, ebizzy
      got the 10 ms worth of 4.1 GHz benefit during every sleep-wakeup (as compared
      to the run without the patch) and this boost gave a modest improvement in total
      throughput, as shown below.
      
      Sleeping-ebizzy records-per-second:
      -----------------------------------
      
      Utilization  Without patch  With patch  Difference (Absolute and % values)
          10%         274767        277046        +  2279 (+0.829%)
          20%         543429        553484        + 10055 (+1.850%)
          40%        1090744       1107959        + 17215 (+1.578%)
          60%        1634908       1662018        + 27110 (+1.658%)
      
      Even a rudimentary and only approximately latency-sensitive workload such as
      sleeping-ebizzy showed a consistent, noticeable performance improvement with
      this patch, so workloads that are truly latency-sensitive should benefit quite
      a bit from this change. Moreover, this is an overall win-win, since the patch
      does not hurt power savings at all: it does not reduce idle time or idle
      residency, and the CPU's frequency when it enters cpu-idle does not affect the
      power savings of deep idle states.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      18b46abd
  3. 20 May 2014, 1 commit
    • cpufreq: remove race while accessing cur_policy · c5450db8
      Committed by Bibek Basu
      cur_policy is accessed while handling the CPUFREQ_GOV_START,
      CPUFREQ_GOV_STOP and CPUFREQ_GOV_LIMITS events without consistently
      taking the same mutex, dbs_data->mutex, which leads to a race and data
      corruption while running a continuous suspend/resume test. This is seen
      with the ondemand governor in a suspend/resume test using rtcwake.
      
       Unable to handle kernel NULL pointer dereference at virtual address 00000028
       pgd = ed610000
       [00000028] *pgd=adf11831, *pte=00000000, *ppte=00000000
       Internal error: Oops: 17 [#1] PREEMPT SMP ARM
       Modules linked in: nvhost_vi
       CPU: 1 PID: 3243 Comm: rtcwake Not tainted 3.10.24-gf5cf9e5 #1
       task: ee708040 ti: ed61c000 task.ti: ed61c000
       PC is at cpufreq_governor_dbs+0x400/0x634
       LR is at cpufreq_governor_dbs+0x3f8/0x634
       pc : [<c05652b8>] lr : [<c05652b0>] psr: 600f0013
       sp : ed61dcb0 ip : 000493e0 fp : c1cc14f0
       r10: 00000000 r9 : 00000000 r8 : 00000000
       r7 : eb725280 r6 : c1cc1560 r5 : eb575200 r4 : ebad7740
       r3 : ee708040 r2 : ed61dca8 r1 : 001ebd24 r0 : 00000000
       Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
       Control: 10c5387d Table: ad61006a DAC: 00000015
       [<c05652b8>] (cpufreq_governor_dbs+0x400/0x634) from [<c055f700>] (__cpufreq_governor+0x98/0x1b4)
       [<c055f700>] (__cpufreq_governor+0x98/0x1b4) from [<c0560770>] (__cpufreq_set_policy+0x250/0x320)
       [<c0560770>] (__cpufreq_set_policy+0x250/0x320) from [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168)
       [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168) from [<c0561ed0>] (cpu_freq_notify+0x68/0xdc)
       [<c0561ed0>] (cpu_freq_notify+0x68/0xdc) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
       [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
       [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
       [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310)
       [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310) from [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70)
       [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70) from [<c004b4b8>] (tegra_pm_notify+0x114/0x134)
       [<c004b4b8>] (tegra_pm_notify+0x114/0x134) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
       [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
       [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
       [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34)
       [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34) from [<c00ad38c>] (enter_state+0xec/0x128)
       [<c00ad38c>] (enter_state+0xec/0x128) from [<c00ad400>] (pm_suspend+0x38/0xa4)
       [<c00ad400>] (pm_suspend+0x38/0xa4) from [<c00ac114>] (state_store+0x70/0xc0)
       [<c00ac114>] (state_store+0x70/0xc0) from [<c027b1e8>] (kobj_attr_store+0x14/0x20)
       [<c027b1e8>] (kobj_attr_store+0x14/0x20) from [<c019cd9c>] (sysfs_write_file+0x104/0x184)
       [<c019cd9c>] (sysfs_write_file+0x104/0x184) from [<c0143038>] (vfs_write+0xd0/0x19c)
       [<c0143038>] (vfs_write+0xd0/0x19c) from [<c0143414>] (SyS_write+0x4c/0x78)
       [<c0143414>] (SyS_write+0x4c/0x78) from [<c000f080>] (ret_fast_syscall+0x0/0x30)
       Code: e1a00006 eb084346 e59b0020 e5951024 (e5903028)
       ---[ end trace 0488523c8f6b0f9d ]---
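
      A minimal sketch of the kind of fix described above, assuming the usual
      cpufreq_governor_dbs() structure (other events abbreviated):

      case CPUFREQ_GOV_LIMITS:
              mutex_lock(&dbs_data->mutex);
              /* GOV_STOP may have cleared cur_policy; re-check it under
               * the same mutex before dereferencing. */
              if (!cpu_cdbs->cur_policy) {
                      mutex_unlock(&dbs_data->mutex);
                      break;
              }
              mutex_lock(&cpu_cdbs->timer_mutex);
              if (policy->max < cpu_cdbs->cur_policy->cur)
                      __cpufreq_driver_target(cpu_cdbs->cur_policy,
                                      policy->max, CPUFREQ_RELATION_H);
              else if (policy->min > cpu_cdbs->cur_policy->cur)
                      __cpufreq_driver_target(cpu_cdbs->cur_policy,
                                      policy->min, CPUFREQ_RELATION_L);
              dbs_check_cpu(dbs_data, cpu);
              mutex_unlock(&cpu_cdbs->timer_mutex);
              mutex_unlock(&dbs_data->mutex);
              break;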
      Signed-off-by: Bibek Basu <bbasu@nvidia.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c5450db8
  4. 06 Jan 2014, 1 commit
    • cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled · 6f1e4efd
      Committed by Jane Li
      When a CPU is hot removed we'll cancel all the delayed work items via
      gov_cancel_work(). Sometimes the delayed work function determines that
      it should adjust the delay for all other CPUs that the policy is
      managing. If this scenario occurs, the canceling CPU will cancel its own
      work but queue up the other CPUs' works to run.

      Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
      queueing) tried to fix this, but reading governor_enabled is not
      protected by cpufreq_governor_lock, so even though od_dbs_timer() checks
      governor_enabled before gov_queue_work(), this scenario can still occur.
      For example:
      
       CPU0                                        CPU1
       ----                                        ----
       cpu_down()
        ...                                        <work runs>
        __cpufreq_remove_dev()                     od_dbs_timer()
         __cpufreq_governor()                       policy->governor_enabled
          policy->governor_enabled = false;
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            gov_cancel_work(dbs_data, policy);
             cpu0 work is canceled
              timer is canceled
              cpu1 work is canceled
              <waits for cpu1>
                                                    gov_queue_work(*, *, true);
                                                     cpu0 work queued
                                                     cpu1 work queued
                                                     cpu2 work queued
                                                     ...
              cpu1 work is canceled
              cpu2 work is canceled
              ...
      
      At the end of the GOV_STOP case cpu0 still has a work queued to
      run although the code is expecting all of the works to be
      canceled. __cpufreq_remove_dev() will then proceed to
      re-initialize all the other CPUs works except for the CPU that is
      going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
      will trample over the queued work and debugobjects will spit out
      a warning:
      
      WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
      ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
      Modules linked in:
      CPU: 1 PID: 1205 Comm: sh Tainted: G        W    3.10.0 #200
      [<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
      [<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
      [<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
      [<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
      [<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
      [<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
      [<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
      [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
      [<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
      [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
      [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
      [<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
      [<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
      [<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
      [<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
      [<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
      [<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
      [<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
      [<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
      [<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)
      
      In gov_queue_work(), take cpufreq_governor_lock before checking
      governor_enabled and release it only after __gov_queue_work() has run.
      This guarantees that governor_enabled cannot change while
      gov_queue_work() executes, as sketched below.
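
      A minimal sketch of that locking, with the function shape assumed from
      the changelog (error handling abbreviated):

      void gov_queue_work(struct dbs_data *dbs_data,
                          struct cpufreq_policy *policy,
                          unsigned int delay, bool all_cpus)
      {
              int i;

              mutex_lock(&cpufreq_governor_lock);
              if (!policy->governor_enabled)
                      goto out_unlock;

              if (!all_cpus) {
                      __gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
              } else {
                      for_each_cpu(i, policy->cpus)
                              __gov_queue_work(i, dbs_data, delay);
              }

      out_unlock:
              mutex_unlock(&cpufreq_governor_lock);
      }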
      Signed-off-by: Jane Li <jiel@marvell.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      6f1e4efd
  5. 16 Nov 2013, 1 commit
  6. 30 Aug 2013, 1 commit
    • cpufreq: Don't use smp_processor_id() in preemptible context · 69320783
      Committed by Stephen Boyd
      Workqueues are preemptible even if works are queued on them with
      queue_work_on(). Let's use raw_smp_processor_id() here to silence
      the warning.
      
      BUG: using smp_processor_id() in preemptible [00000000] code: kworker/3:2/674
      caller is gov_queue_work+0x28/0xb0
      CPU: 0 PID: 674 Comm: kworker/3:2 Tainted: G        W    3.10.0 #30
      Workqueue: events od_dbs_timer
      [<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
      [<c0109dec>] (show_stack+0x10/0x14) from [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0)
      [<c03885a4>] (debug_smp_processor_id+0xbc/0xf0) from [<c0635864>] (gov_queue_work+0x28/0xb0)
      [<c0635864>] (gov_queue_work+0x28/0xb0) from [<c0635618>] (od_dbs_timer+0x108/0x134)
      [<c0635618>] (od_dbs_timer+0x108/0x134) from [<c01aa8f8>] (process_one_work+0x25c/0x444)
      [<c01aa8f8>] (process_one_work+0x25c/0x444) from [<c01aaf88>] (worker_thread+0x200/0x344)
      [<c01aaf88>] (worker_thread+0x200/0x344) from [<c01b03bc>] (kthread+0xa0/0xb0)
      [<c01b03bc>] (kthread+0xa0/0xb0) from [<c01061b8>] (ret_from_fork+0x14/0x3c)
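
      A minimal sketch of the change, assuming the gov_queue_work() shape of
      that era:

      if (!all_cpus) {
              /* The worker runs preemptible even though the work was
               * queued on a specific CPU, so smp_processor_id()'s debug
               * check would warn here; the raw variant skips the check. */
              __gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
      }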
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      69320783
  7. 29 Aug 2013, 2 commits
    • cpufreq: governor: Fix typos in comments · c4afc410
      Committed by Stratos Karafotis
       - 'Governer' should be 'Governor'.
       - 'S' is used for Siemens (electrical conductance) in SI units,
         so use small 's' for seconds.
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c4afc410
    • cpufreq: Fix timer/workqueue corruption due to double queueing · 3617f2ca
      Committed by Stephen Boyd
      When a CPU is hot removed we'll cancel all the delayed work items
      via gov_cancel_work(). Normally this just cancels a delayed timer
      on each CPU that the policy is managing and the work won't run,
      but if the work is already running, the workqueue code will wait
      for the work to finish before continuing, to prevent the work
      items from re-queuing themselves like they normally do. This
      scheme works most of the time, except for the case where the work
      function determines that it should adjust the delay for all other
      CPUs that the policy is managing. If this scenario occurs, the
      canceling CPU will cancel its own work but queue up the other
      CPUs' works to run. For example:
      
       CPU0                                        CPU1
       ----                                        ----
       cpu_down()
        ...
        __cpufreq_remove_dev()
         cpufreq_governor_dbs()
          case CPUFREQ_GOV_STOP:
           gov_cancel_work(dbs_data, policy);
            cpu0 work is canceled
             timer is canceled
             cpu1 work is canceled                    <work runs>
             <waits for cpu1>                         od_dbs_timer()
                                                 gov_queue_work(*, *, true);
                                                  cpu0 work queued
                                                  cpu1 work queued
                                                  cpu2 work queued
                                                  ...
             cpu1 work is canceled
             cpu2 work is canceled
             ...
      
      At the end of the GOV_STOP case cpu0 still has a work queued to
      run although the code is expecting all of the works to be
      canceled. __cpufreq_remove_dev() will then proceed to
      re-initialize all the other CPUs works except for the CPU that is
      going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
      will trample over the queued work and debugobjects will spit out
      a warning:
      
      WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
      ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
      Modules linked in:
      CPU: 0 PID: 1491 Comm: sh Tainted: G        W    3.10.0 #19
      [<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
      [<c0109dec>] (show_stack+0x10/0x14) from [<c01904cc>] (warn_slowpath_common+0x4c/0x6c)
      [<c01904cc>] (warn_slowpath_common+0x4c/0x6c) from [<c019056c>] (warn_slowpath_fmt+0x2c/0x3c)
      [<c019056c>] (warn_slowpath_fmt+0x2c/0x3c) from [<c0388a7c>] (debug_print_object+0x94/0xbc)
      [<c0388a7c>] (debug_print_object+0x94/0xbc) from [<c0388e34>] (__debug_object_init+0x2d0/0x340)
      [<c0388e34>] (__debug_object_init+0x2d0/0x340) from [<c019e3b0>] (init_timer_key+0x14/0xb0)
      [<c019e3b0>] (init_timer_key+0x14/0xb0) from [<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8)
      [<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8) from [<c06325a0>] (__cpufreq_governor+0xdc/0x1a4)
      [<c06325a0>] (__cpufreq_governor+0xdc/0x1a4) from [<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434)
      [<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434) from [<c08989f4>] (cpufreq_cpu_callback+0x60/0x80)
      [<c08989f4>] (cpufreq_cpu_callback+0x60/0x80) from [<c08a43c0>] (notifier_call_chain+0x38/0x68)
      [<c08a43c0>] (notifier_call_chain+0x38/0x68) from [<c01938e0>] (__cpu_notify+0x28/0x40)
      [<c01938e0>] (__cpu_notify+0x28/0x40) from [<c0892ad4>] (_cpu_down+0x7c/0x2c0)
      [<c0892ad4>] (_cpu_down+0x7c/0x2c0) from [<c0892d3c>] (cpu_down+0x24/0x40)
      [<c0892d3c>] (cpu_down+0x24/0x40) from [<c0893ea8>] (store_online+0x2c/0x74)
      [<c0893ea8>] (store_online+0x2c/0x74) from [<c04519d8>] (dev_attr_store+0x18/0x24)
      [<c04519d8>] (dev_attr_store+0x18/0x24) from [<c02a69d4>] (sysfs_write_file+0x100/0x148)
      [<c02a69d4>] (sysfs_write_file+0x100/0x148) from [<c0255c18>] (vfs_write+0xcc/0x174)
      [<c0255c18>] (vfs_write+0xcc/0x174) from [<c0255f70>] (SyS_write+0x38/0x64)
      [<c0255f70>] (SyS_write+0x38/0x64) from [<c0106120>] (ret_fast_syscall+0x0/0x30)
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      3617f2ca
  8. 08 Aug 2013, 2 commits
  9. 26 Jul 2013, 1 commit
    • cpufreq: ondemand: Change the calculation of target frequency · dfa5bb62
      Committed by Stratos Karafotis
      The ondemand governor calculates load in terms of frequency and
      increases it only if load_freq is greater than up_threshold
      multiplied by the current or average frequency.  This appears to
      produce oscillations of frequency between min and max because,
      for example, a relatively small load can easily saturate the
      minimum frequency and lead the CPU to the max; then, it will
      decrease back to the min due to a small load_freq.
      
      Change the calculation method of load and target frequency on the
      basis of the following two observations:
      
       - Load computation should not depend on the current or average
         measured frequency.  For example, absolute load of 80% at 100MHz
         is not necessarily equivalent to 8% at 1000MHz in the next
         sampling interval.
      
       - It should be possible to increase the target frequency to any
         value present in the frequency table proportional to the absolute
         load, rather than to the max only, so that:
      
         Target frequency = C * load

         where we take C = policy->cpuinfo.max_freq / 100 (see the sketch
         below).
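
      A minimal sketch of the proportional selection (the cpufreq core then
      snaps freq_next to a frequency-table entry via CPUFREQ_RELATION_L;
      variable names here are illustrative):

      unsigned int freq_next;

      /* Target frequency = C * load, with C = cpuinfo.max_freq / 100. */
      freq_next = load * policy->cpuinfo.max_freq / 100;

      __cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);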
      
      Tested on an Intel i7-3770 CPU @ 3.40GHz and on a quad-core 1500MHz Krait.
      The Phoronix Linux Kernel Compilation 3.1 benchmark shows a ~1.5% increase
      in performance. cpufreq_stats (time_in_state) shows that middle frequencies
      are used more with this patch, while the highest and lowest frequencies
      were used ~9% less.
      
      [rjw: We have run multiple other tests on kernels with this
       change applied and in the vast majority of cases it turns out
       that the resulting performance improvement also leads to reduced
       consumption of energy.  The change is additionally justified by
       the overall simplification of the code in question.]
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      dfa5bb62
  10. 17 Jul 2013, 1 commit
    • cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression · e8d05276
      Committed by Srivatsa S. Bhat
      commit 2f7021a8 "cpufreq: protect 'policy->cpus' from offlining
      during __gov_queue_work()" caused a regression in CPU hotplug,
      because it led to a deadlock between the cpufreq governor's worker
      thread and the CPU hotplug writer task.
      
      Lockdep splat corresponding to this deadlock is shown below:
      
      [   60.277396] ======================================================
      [   60.277400] [ INFO: possible circular locking dependency detected ]
      [   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
      [   60.277411] -------------------------------------------------------
      [   60.277417] bash/2225 is trying to acquire lock:
      [   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
      [   60.277444] but task is already holding lock:
      [   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
      [   60.277465] which lock already depends on the new lock.
      
      [   60.277472] the existing dependency chain (in reverse order) is:
      [   60.277477] -> #2 (cpu_hotplug.lock){+.+.+.}:
      [   60.277490]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
      [   60.277503]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
      [   60.277514]        [<ffffffff81042cbc>] get_online_cpus+0x3c/0x60
      [   60.277522]        [<ffffffff814b842a>] gov_queue_work+0x2a/0xb0
      [   60.277532]        [<ffffffff814b7891>] cs_dbs_timer+0xc1/0xe0
      [   60.277543]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
      [   60.277552]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
      [   60.277560]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
      [   60.277569]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
      [   60.277580] -> #1 (&j_cdbs->timer_mutex){+.+...}:
      [   60.277592]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
      [   60.277600]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
      [   60.277608]        [<ffffffff814b785d>] cs_dbs_timer+0x8d/0xe0
      [   60.277616]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
      [   60.277624]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
      [   60.277633]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
      [   60.277640]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
      [   60.277649] -> #0 ((&(&j_cdbs->work)->work)){+.+...}:
      [   60.277661]        [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
      [   60.277669]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
      [   60.277677]        [<ffffffff810621ed>] flush_work+0x3d/0x280
      [   60.277685]        [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
      [   60.277693]        [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
      [   60.277701]        [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
      [   60.277709]        [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
      [   60.277719]        [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
      [   60.277728]        [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
      [   60.277737]        [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
      [   60.277747]        [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
      [   60.277759]        [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
      [   60.277768]        [<ffffffff815a0a68>] _cpu_down+0x88/0x330
      [   60.277779]        [<ffffffff815a0d46>] cpu_down+0x36/0x50
      [   60.277788]        [<ffffffff815a2748>] store_online+0x98/0xd0
      [   60.277796]        [<ffffffff81452a28>] dev_attr_store+0x18/0x30
      [   60.277806]        [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
      [   60.277818]        [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
      [   60.277826]        [<ffffffff811686fc>] SyS_write+0x4c/0xa0
      [   60.277834]        [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
      [   60.277842] other info that might help us debug this:
      
      [   60.277848] Chain exists of:
        (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock
      
      [   60.277864]  Possible unsafe locking scenario:
      
      [   60.277869]        CPU0                    CPU1
      [   60.277873]        ----                    ----
      [   60.277877]   lock(cpu_hotplug.lock);
      [   60.277885]                                lock(&j_cdbs->timer_mutex);
      [   60.277892]                                lock(cpu_hotplug.lock);
      [   60.277900]   lock((&(&j_cdbs->work)->work));
      [   60.277907]  *** DEADLOCK ***
      
      [   60.277915] 6 locks held by bash/2225:
      [   60.277919]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff81168173>] vfs_write+0x1c3/0x1f0
      [   60.277937]  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811d9e3c>] sysfs_write_file+0x3c/0x150
      [   60.277954]  #2:  (s_active#61){.+.+.+}, at: [<ffffffff811d9ec3>] sysfs_write_file+0xc3/0x150
      [   60.277972]  #3:  (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81024cf7>] cpu_hotplug_driver_lock+0x17/0x20
      [   60.277990]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff815a0d32>] cpu_down+0x22/0x50
      [   60.278007]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
      [   60.278023] stack backtrace:
      [   60.278031] CPU: 3 PID: 2225 Comm: bash Not tainted 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744
      [   60.278037] Hardware name: Acer             Aspire 5741G    /Aspire 5741G    , BIOS V1.20 02/08/2011
      [   60.278042]  ffffffff8204e110 ffff88014df6b9f8 ffffffff815b3d90 ffff88014df6ba38
      [   60.278055]  ffffffff815b0a8d ffff880150ed3f60 ffff880150ed4770 3871c4002c8980b2
      [   60.278068]  ffff880150ed4748 ffff880150ed4770 ffff880150ed3f60 ffff88014df6bb00
      [   60.278081] Call Trace:
      [   60.278091]  [<ffffffff815b3d90>] dump_stack+0x19/0x1b
      [   60.278101]  [<ffffffff815b0a8d>] print_circular_bug+0x2b6/0x2c5
      [   60.278111]  [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
      [   60.278123]  [<ffffffff81067e08>] ? __kernel_text_address+0x58/0x80
      [   60.278134]  [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
      [   60.278142]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
      [   60.278151]  [<ffffffff810621ed>] flush_work+0x3d/0x280
      [   60.278159]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
      [   60.278169]  [<ffffffff810a9b14>] ? mark_held_locks+0x94/0x140
      [   60.278178]  [<ffffffff81062d77>] ? __cancel_work_timer+0x77/0x120
      [   60.278188]  [<ffffffff810a9cbd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
      [   60.278196]  [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
      [   60.278206]  [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
      [   60.278214]  [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
      [   60.278225]  [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
      [   60.278234]  [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
      [   60.278244]  [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
      [   60.278255]  [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
      [   60.278265]  [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
      [   60.278275]  [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
      [   60.278284]  [<ffffffff815a0a68>] _cpu_down+0x88/0x330
      [   60.278292]  [<ffffffff81024cf7>] ? cpu_hotplug_driver_lock+0x17/0x20
      [   60.278302]  [<ffffffff815a0d46>] cpu_down+0x36/0x50
      [   60.278311]  [<ffffffff815a2748>] store_online+0x98/0xd0
      [   60.278320]  [<ffffffff81452a28>] dev_attr_store+0x18/0x30
      [   60.278329]  [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
      [   60.278337]  [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
      [   60.278347]  [<ffffffff81185950>] ? fget_light+0x320/0x4b0
      [   60.278355]  [<ffffffff811686fc>] SyS_write+0x4c/0xa0
      [   60.278364]  [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
      [   60.280582] smpboot: CPU 1 is now offline
      
      The intention of that commit was to avoid warnings during CPU
      hotplug, which indicated that offline CPUs were getting IPIs from the
      cpufreq governor's work items.  But the real root-cause of that
      problem was commit a66b2e50 (cpufreq: Preserve sysfs files across
      suspend/resume) because it totally skipped all the cpufreq callbacks
      during CPU hotplug in the suspend/resume path, and hence it never
      actually shut down the cpufreq governor's worker threads during CPU
      offline in the suspend/resume path.
      
      Reflecting back, the reason we never suspected that commit as the
      root cause earlier was that the original issue was reported with
      just the halt command, and nobody had brought suspend/resume into
      the equation.

      The reason for _that_, in turn, as it turns out, is that halt/shutdown
      used to be performed by disabling non-boot CPUs while tasks were
      frozen, just like suspend/resume....  but commit cf7df378
      (reboot: migrate shutdown/reboot to boot cpu), which came in around
      that very same time, changed that logic: shutdown/halt no longer
      takes CPUs offline.  Thus, the test cases for reproducing the bug
      were vastly different and we went totally off the trail.
      
      Overall, it was one hell of a confusion, with so many commits
      affecting each other and also affecting the symptoms of the problems
      in subtle ways.  Now that the original problematic commit (a66b2e50)
      has been completely reverted, revert this intermediate fix too
      (2f7021a8) to fix the CPU hotplug deadlock.  Phew!
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: Peter Wu <lekensteyn@gmail.com>
      Cc: 3.10+ <stable@vger.kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      e8d05276
  11. 28 Jun 2013, 1 commit
  12. 05 Jun 2013, 1 commit
    • cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work() · 2f7021a8
      Committed by Michael Wang
      Jiri Kosina <jkosina@suse.cz> and Borislav Petkov <bp@alien8.de>
      reported the warning:
      
      [   51.616759] ------------[ cut here ]------------
      [   51.621460] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
      [   51.629638] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
      [   51.675581] CPU: 0 PID: 244 Comm: kworker/1:1 Tainted: G        W    3.10.0-rc1+ #10
      [   51.683407] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
      [   51.690901] Workqueue: events od_dbs_timer
      [   51.695069]  0000000000000009 ffff88043a2f5b68 ffffffff8161441c ffff88043a2f5ba8
      [   51.702602]  ffffffff8103e540 0000000000000033 0000000000000001 ffff88043d5f8000
      [   51.710136]  00000000ffff0ce1 0000000000000001 ffff88044fc4fc08 ffff88043a2f5bb8
      [   51.717691] Call Trace:
      [   51.720191]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
      [   51.725396]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
      [   51.731473]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
      [   51.737378]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
      [   51.744013]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
      [   51.749745]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
      [   51.755214]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
      [   51.761470]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
      [   51.767724]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
      [   51.773719]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
      [   51.779271]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
      [   51.784734]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
      [   51.790634]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
      [   51.796711]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
      [   51.802350]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
      [   51.808264]  [<ffffffff8106634a>] kthread+0xea/0xf0
      [   51.813200]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
      [   51.819644]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
      [   51.918165] nouveau E[     DRM] GPU lockup - switching to software fbcon
      [   51.930505]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
      [   51.936994] ---[ end trace f419538ada83b5c5 ]---
      
      The warning was caused by policy->cpus changing while
      __gov_queue_work() was in progress; in other words, a CPU went
      offline.

      Use get/put_online_cpus() to prevent CPUs from going offline while
      __gov_queue_work() is running, as sketched below.
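
      A minimal sketch of the fix (structure assumed from the changelog, with
      the single-CPU path omitted):

      static void gov_queue_work(struct dbs_data *dbs_data,
                                 struct cpufreq_policy *policy,
                                 unsigned int delay, bool all_cpus)
      {
              int i;

              /* Hold off CPU offline while the per-CPU works are queued,
               * so policy->cpus cannot change under us. */
              get_online_cpus();
              for_each_cpu(i, policy->cpus)
                      __gov_queue_work(i, dbs_data, delay);
              put_online_cpus();
      }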
      
      [rjw: The problem has been present since recent commit 031299b3
      (cpufreq: governors: Avoid unnecessary per cpu timer interrupts)]
      
      References: https://lkml.org/lkml/2013/6/5/88
      Reported-by: Borislav Petkov <bp@alien8.de>
      Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      2f7021a8
  13. 27 May 2013, 3 commits
  14. 12 May 2013, 1 commit
    • cpufreq: governors: Fix CPUFREQ_GOV_POLICY_{INIT|EXIT} notifiers · a97c98ad
      Committed by Viresh Kumar
      There are two types of INIT/EXIT activities that we need to do for
      governors:
       - Done only once per governor, regardless of how many instances of
         the governor there are, e.g. cpufreq_register_notifier() for the
         conservative governor.
       - Done once per governor instance, e.g. sysfs_{create|remove}_group().
      
      There were some corner cases where the current code could not handle
      the two separately and therefore failed some test cases.
      
      We now use two separate variables to keep track of the above two
      requirements (a sketch follows):
       - governor->initialized for the first one
       - dbs_data->usage_count for the per-instance case
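
      A minimal sketch of how the two counters split the work, assuming names
      from that era's cpufreq_governor.c ('cdata', 'cs_ops'); the real
      cpufreq_governor_dbs() is considerably larger:

      case CPUFREQ_GOV_POLICY_INIT:
              dbs_data->usage_count++;        /* per governor instance */

              /* Once per governor type, no matter how many policies use
               * it, e.g. the conservative governor's transition notifier: */
              if (cdata->governor == GOV_CONSERVATIVE &&
                  !policy->governor->initialized)
                      cpufreq_register_notifier(cs_ops->notifier_block,
                                      CPUFREQ_TRANSITION_NOTIFIER);
              break;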
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a97c98ad
  15. 01 Apr 2013, 3 commits
    • cpufreq: governors: Calculate iowait time only when necessary · 9366d840
      Committed by Stratos Karafotis
      Currently we always calculate the CPU iowait time and add it to idle
      time. If we are in ondemand and io_is_busy is set, we re-calculate
      the iowait time and subtract it from idle time.

      With this patch, iowait time is calculated only when necessary,
      avoiding the double call to get_cpu_iowait_time_us(). A parameter of
      get_cpu_idle_time() is used to distinguish whether the iowait time is
      added to idle time or not, without the need to keep prev_io_wait; a
      sketch follows.
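
      A minimal sketch of the resulting helper, mirroring the shape described
      above (exact types in the tree may differ):

      u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
      {
              u64 idle_time = get_cpu_idle_time_us(cpu, io_busy ? wall : NULL);

              if (idle_time == -1ULL)
                      /* NO_HZ disabled; fall back to jiffies. */
                      return get_cpu_idle_time_jiffy(cpu, wall);
              else if (!io_busy)
                      /* Count iowait as idle only when io_is_busy is off. */
                      idle_time += get_cpu_iowait_time_us(cpu, wall);

              return idle_time;
      }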
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      9366d840
    • cpufreq: governors: Avoid unnecessary per cpu timer interrupts · 031299b3
      Committed by Viresh Kumar
      The following commit introduced per-CPU timers/works for the ondemand
      and conservative governors:
      
      	commit 2abfa876
      	Author: Rickard Andersson <rickard.andersson@stericsson.com>
      	Date:   Thu Dec 27 14:55:38 2012 +0000
      
      	    cpufreq: handle SW coordinated CPUs
      
      This causes additional unnecessary interrupts on all CPUs when the load
      has just been evaluated by any other CPU, i.e. when CPU x has just
      evaluated the load, no other CPU really needs to evaluate it again for
      the next sampling_rate period.

      Some code is present to avoid that, but we are still getting timer
      interrupts on all CPUs. A good way of avoiding this is to modify the
      delays for all CPUs (policy->cpus) whenever any CPU has evaluated the
      load.

      This patch makes that change, along with some related code cleanup;
      a sketch of the key check follows.
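
      A minimal sketch of the "has anyone sampled recently?" check, assuming
      the naming of that era's code (treat it as illustrative):

      static bool need_load_eval(struct cpu_dbs_common_info *cdbs,
                                 unsigned int sampling_rate)
      {
              if (policy_is_shared(cdbs->cur_policy)) {
                      ktime_t time_now = ktime_get();
                      s64 delta_us = ktime_us_delta(time_now,
                                                    cdbs->time_stamp);

                      /* Another CPU in this policy evaluated the load less
                       * than half a sample period ago; skip this one. */
                      if (delta_us < (s64)(sampling_rate / 2))
                              return false;

                      cdbs->time_stamp = time_now;
              }

              return true;
      }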
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      031299b3
    • cpufreq: governor: Implement per policy instances of governors · 4d5dcc42
      Committed by Viresh Kumar
      Currently, there can't be multiple instances of a single governor type.
      If we have a multi-package system, where we have multiple instances
      of struct policy (one per package), we can't have multiple instances
      of the same governor, i.e. we can't have multiple instances of the
      ondemand governor for multiple packages.
      
      The governor's directory in sysfs is created at
      /sys/devices/system/cpu/cpufreq/governor-name/, which again reflects
      that there can be only one instance of a governor type in the system.
      
      This is a bottleneck for multi-cluster systems, where we want different
      packages to use the same governor type, but with different tunables.
      
      This patch uses the infrastructure provided by the earlier patch and
      implements per-policy init/exit routines for the ondemand and
      conservative governors, as sketched below.
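
      A minimal sketch of a per-policy init routine (shape assumed; the real
      od_init()/cs_init() set more defaults):

      static int od_init(struct dbs_data *dbs_data)
      {
              struct od_dbs_tuners *tuners;

              /* One tunables block per governor instance instead of a
               * single global set. */
              tuners = kzalloc(sizeof(*tuners), GFP_KERNEL);
              if (!tuners)
                      return -ENOMEM;

              tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
              tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;

              dbs_data->tuners = tuners;
              return 0;
      }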
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      4d5dcc42
  16. 09 Feb 2013, 2 commits
  17. 02 Feb 2013, 8 commits
  18. 24 Nov 2012, 1 commit
    • cpufreq: governors: Fix jiffies/cputime mixup (revisited) · a0e5af3c
      Committed by Rafael J. Wysocki
      This change was made before by commit 8636fd28 (cpufreq: fix
      jiffies/cputime mixup in conservative/ondemand governors), but it was
      then inadvertently reverted by commit 4471a34f (cpufreq: governors:
      remove redundant code).

      The changelog of commit 8636fd28 says:
      
        The function get_cpu_idle_time_jiffy in both the conservative and
        ondemand governors use jiffies_to_usecs to convert a cputime value
        to usecs which gives the wrong value on architectures where cputime
        and jiffies use different units.  Only matters if NO_HZ is
        disabled, since otherwise get_cpu_idle_time_us should already
        return a valid value, and get_cpu_idle_time_jiffy isn't actually
        called.
      
      Since we now have only one common get_cpu_idle_time_jiffy() used by
      both governors in question, modify it along the lines of commit
      8636fd28 to restore the correct behavior, as sketched below.
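
      A minimal sketch of the restored conversion, following the lines of
      commit 8636fd28 (kcpustat accounting abbreviated):

      static u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
      {
              u64 cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
              u64 busy_time, idle_time;

              busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER] +
                          kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM] +
                          kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ] +
                          kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ];

              idle_time = cur_wall_time - busy_time;
              if (wall)
                      /* These are cputime values, so convert with
                       * cputime_to_usecs(), not jiffies_to_usecs(). */
                      *wall = cputime_to_usecs(cur_wall_time);

              return cputime_to_usecs(idle_time);
      }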
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      a0e5af3c
  19. 15 Nov 2012, 4 commits