1. 06 1月, 2014 1 次提交
    • J
      cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled · 6f1e4efd
      Jane Li 提交于
      When a CPU is hot removed we'll cancel all the delayed work items via
      gov_cancel_work(). Sometimes the delayed work function determines that
      it should adjust the delay for all other CPUs that the policy is
      managing. If this scenario occurs, the canceling CPU will cancel its own
      work but queue up the other CPUs works to run.
      
      Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
      queueing) has tried to fix this, but reading governor_enabled is not
      protected by cpufreq_governor_lock. Even though od_dbs_timer() checks
      governor_enabled before gov_queue_work(), this scenario may occur. For
      example:
      
       CPU0                                        CPU1
       ----                                        ----
       cpu_down()
        ...                                        <work runs>
        __cpufreq_remove_dev()                     od_dbs_timer()
         __cpufreq_governor()                       policy->governor_enabled
          policy->governor_enabled = false;
          cpufreq_governor_dbs()
           case CPUFREQ_GOV_STOP:
            gov_cancel_work(dbs_data, policy);
             cpu0 work is canceled
              timer is canceled
              cpu1 work is canceled
              <waits for cpu1>
                                                    gov_queue_work(*, *, true);
                                                     cpu0 work queued
                                                     cpu1 work queued
                                                     cpu2 work queued
                                                     ...
              cpu1 work is canceled
              cpu2 work is canceled
              ...
      
      At the end of the GOV_STOP case cpu0 still has a work queued to
      run although the code is expecting all of the works to be
      canceled. __cpufreq_remove_dev() will then proceed to
      re-initialize all the other CPUs works except for the CPU that is
      going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
      will trample over the queued work and debugobjects will spit out
      a warning:
      
      WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
      ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
      Modules linked in:
      CPU: 1 PID: 1205 Comm: sh Tainted: G        W    3.10.0 #200
      [<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
      [<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
      [<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
      [<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
      [<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
      [<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
      [<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
      [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
      [<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
      [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
      [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
      [<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
      [<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
      [<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
      [<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
      [<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
      [<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
      [<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
      [<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
      [<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)
      
      In gov_queue_work(), lock cpufreq_governor_lock before gov_queue_work,
      and unlock it after __gov_queue_work(). In this way, governor_enabled
      is guaranteed not changed in gov_queue_work().
      Signed-off-by: NJane Li <jiel@marvell.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6f1e4efd
  2. 16 10月, 2013 1 次提交
  3. 29 8月, 2013 1 次提交
  4. 08 8月, 2013 3 次提交
  5. 26 7月, 2013 1 次提交
    • S
      cpufreq: ondemand: Change the calculation of target frequency · dfa5bb62
      Stratos Karafotis 提交于
      The ondemand governor calculates load in terms of frequency and
      increases it only if load_freq is greater than up_threshold
      multiplied by the current or average frequency.  This appears to
      produce oscillations of frequency between min and max because,
      for example, a relatively small load can easily saturate minimum
      frequency and lead the CPU to the max.  Then, it will decrease
      back to the min due to small load_freq.
      
      Change the calculation method of load and target frequency on the
      basis of the following two observations:
      
       - Load computation should not depend on the current or average
         measured frequency.  For example, absolute load of 80% at 100MHz
         is not necessarily equivalent to 8% at 1000MHz in the next
         sampling interval.
      
       - It should be possible to increase the target frequency to any
         value present in the frequency table proportional to the absolute
         load, rather than to the max only, so that:
      
         Target frequency = C * load
      
         where we take C = policy->cpuinfo.max_freq / 100.
      
      Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
      Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
      increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
      that middle frequencies are used more, with this patch.  Highest
      and lowest frequencies were used less by ~9%.
      
      [rjw: We have run multiple other tests on kernels with this
       change applied and in the vast majority of cases it turns out
       that the resulting performance improvement also leads to reduced
       consumption of energy.  The change is additionally justified by
       the overall simplification of the code in question.]
      Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      dfa5bb62
  6. 21 6月, 2013 1 次提交
  7. 27 5月, 2013 1 次提交
  8. 12 5月, 2013 1 次提交
    • V
      cpufreq: governors: Fix CPUFREQ_GOV_POLICY_{INIT|EXIT} notifiers · a97c98ad
      Viresh Kumar 提交于
      There are two types of INIT/EXIT activities that we need to do for
      governors:
       - Done only once per governor (doesn't depend how many instances of
         the governor there are). eg: cpufreq_register_notifier() for
         conservative governor.
       - Done per governor instance, eg: sysfs_{create|remove}_group().
      
      There were some corner cases where current code isn't able to handle
      them separately and so failing for some test cases.
      
      We use two separate variables now for keeping track of above two
      requirements.
       - governor->initialized for first one
       - dbs_data->usage_count for per governor instance
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a97c98ad
  9. 10 4月, 2013 1 次提交
  10. 02 4月, 2013 1 次提交
  11. 01 4月, 2013 4 次提交
    • S
      cpufreq: governors: Calculate iowait time only when necessary · 9366d840
      Stratos Karafotis 提交于
      Currently we always calculate the CPU iowait time and add it to idle time.
      If we are in ondemand and we use io_is_busy, we re-calculate iowait time
      and we subtract it from idle time.
      
      With this patch iowait time is calculated only when necessary avoiding
      the double call to get_cpu_iowait_time_us. We use a parameter in
      function get_cpu_idle_time to distinguish when the iowait time will be
      added to idle time or not, without the need of keeping the prev_io_wait.
      Signed-off-by: NStratos Karafotis <stratosk@semaphore.gr>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.,org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9366d840
    • V
      cpufreq: governors: Avoid unnecessary per cpu timer interrupts · 031299b3
      Viresh Kumar 提交于
      Following patch has introduced per cpu timers or works for ondemand and
      conservative governors.
      
      	commit 2abfa876
      	Author: Rickard Andersson <rickard.andersson@stericsson.com>
      	Date:   Thu Dec 27 14:55:38 2012 +0000
      
      	    cpufreq: handle SW coordinated CPUs
      
      This causes additional unnecessary interrupts on all cpus when the load is
      recently evaluated by any other cpu. i.e. When load is recently evaluated by cpu
      x, we don't really need any other cpu to evaluate this load again for the next
      sampling_rate time.
      
      Some sort of code is present to avoid that but we are still getting timer
      interrupts for all cpus. A good way of avoiding this would be to modify delays
      for all cpus (policy->cpus) whenever any cpu has evaluated load.
      
      This patch does this change and some related code cleanup.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      031299b3
    • V
      cpufreq: governor: Set MIN_LATENCY_MULTIPLIER to 20 · 98104ee2
      Viresh Kumar 提交于
      Currently MIN_LATENCY_MULTIPLIER is set defined as 100 and so on a system with
      transition latency of 1 ms, the minimum sampling time comes to be around 100 ms.
      That is quite big if you want to get better performance for your system.
      
      Redefine MIN_LATENCY_MULTIPLIER to 20 so that we can support 20ms sampling rate
      for such platforms.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      98104ee2
    • V
      cpufreq: governor: Implement per policy instances of governors · 4d5dcc42
      Viresh Kumar 提交于
      Currently, there can't be multiple instances of single governor_type.
      If we have a multi-package system, where we have multiple instances
      of struct policy (per package), we can't have multiple instances of
      same governor. i.e. We can't have multiple instances of ondemand
      governor for multiple packages.
      
      Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
      governor-name/. Which again reflects that there can be only one
      instance of a governor_type in the system.
      
      This is a bottleneck for multicluster system, where we want different
      packages to use same governor type, but with different tunables.
      
      This patch uses the infrastructure provided by earlier patch and
      implements init/exit routines for ondemand and conservative
      governors.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4d5dcc42
  12. 04 3月, 2013 1 次提交
  13. 09 2月, 2013 1 次提交
  14. 02 2月, 2013 5 次提交
  15. 15 11月, 2012 2 次提交