1. 10 Dec, 2015 2 commits
    • cpufreq: ondemand: update update_sampling_rate() to make it more efficient · f08f638b
      Viresh Kumar committed
      Currently update_sampling_rate() runs over each online CPU and
      cancels/queues timers on all policy->cpus every time. This should be
      done just once for any cpu belonging to a policy.
      
      Create a cpumask and keep on clearing it as and when we process
      policies, so that we don't have to traverse through all CPUs of the same
      policy.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
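The cpumask idea in this commit can be sketched in user-space C. This is only an illustration under stated assumptions: `struct policy_map`, `policy_cpus()` and `update_each_policy_once()` are hypothetical names, a plain 32-bit bitmask stands in for the kernel's cpumask, and the real timer handling is left as a comment.

```c
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical model: policy_of[cpu] names the policy a CPU belongs to. */
struct policy_map {
	int policy_of[NR_CPUS];
};

/* Bitmask of all CPUs that share a policy with 'cpu'. */
static uint32_t policy_cpus(const struct policy_map *map, int cpu)
{
	uint32_t mask = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (map->policy_of[i] == map->policy_of[cpu])
			mask |= 1u << i;
	return mask;
}

/*
 * Visit each policy once: start with every online CPU set in 'pending'
 * and clear a whole policy's CPUs after handling one of them.
 * Returns how many policies were processed.
 */
static int update_each_policy_once(const struct policy_map *map, uint32_t online)
{
	uint32_t pending = online;
	int processed = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(pending & (1u << cpu)))
			continue;	/* offline, or policy already handled */

		/* ... cancel and requeue this policy's timers here ... */
		processed++;

		pending &= ~policy_cpus(map, cpu);
	}
	return processed;
}
```

With eight online CPUs split across two policies, the loop body that touches timers runs twice rather than eight times.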
    • cpufreq: governor: replace per-CPU delayed work with timers · 70f43e5e
      Viresh Kumar committed
      cpufreq governors evaluate load at sampling rate and based on that they
      update frequency for a group of CPUs belonging to the same cpufreq
      policy.
      
      This must be done in a single thread for all policy->cpus, but
      because we don't want to wake up idle CPUs just for that, we use
      deferrable work. Had we used a single delayed deferrable work for
      the entire policy, the CPU required to run the handler could have
      been idle, and we might have ended up not changing the frequency
      for the entire group despite load variations.
      
      And so we were forced to keep per-cpu works, where only the one that
      expires first needs to do the real work and the others are rescheduled
      for the next sampling time.
      
      We have been using the more complex solution until now, where we used a
      delayed deferrable work for this, which is a combination of a timer and
      a work.
      
      This could be made lightweight by keeping per-cpu deferred timers with a
      single work item, which is scheduled by the first timer that expires.
      
      This patch does just that and here are important changes:
      - The timer handler will run in irq context and so we need to use a
        spin_lock instead of the timer_mutex. And so a separate timer_lock is
        created. This also makes the use of the mutex and lock quite clear, as
        we know what exactly they are protecting.
      - A new field 'skip_work' is added to track when the timer handlers can
        queue a work. More comments present in code.
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
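A rough single-threaded model of the 'skip_work' idea may help. The commit message defers the details to code comments, so the exact semantics below are an assumption: `struct shared_state`, `timer_fired()` and `work_done()` are hypothetical names, and the spinlock (the new timer_lock) that would guard 'skip_work' in irq context is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-policy shared governor data. */
struct shared_state {
	int skip_work;     /* non-zero while a work item is queued or running */
	int work_queued;   /* counts queue requests, for illustration only */
};

/* Models a timer handler; in the kernel this would run under timer_lock. */
static bool timer_fired(struct shared_state *s)
{
	if (s->skip_work)
		return false;  /* another CPU's timer already queued the work */
	s->skip_work++;
	s->work_queued++;  /* stands in for queuing the single work item */
	return true;
}

/* Models the work handler finishing: timers may queue work again. */
static void work_done(struct shared_state *s)
{
	s->skip_work--;
}
```

In this model, if three per-CPU timers expire back to back, only the first call queues work; the other two return without doing anything until `work_done()` runs.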
  2. 07 Dec, 2015 3 commits
  3. 28 Oct, 2015 1 commit
  4. 21 Jul, 2015 2 commits
  5. 18 Jul, 2015 2 commits
  6. 15 Jun, 2015 2 commits
    • cpufreq: governor: Serialize governor callbacks · 732b6d61
      Viresh Kumar committed
      There are several races reported in cpufreq core around governors (only
      ondemand and conservative) by different people.
      
      There are at least two race scenarios present in governor code:
       (a) Concurrent access/updates of governor internal structures.
      
       It is possible that fields such as 'dbs_data->usage_count', etc.  are
       accessed simultaneously for different policies using same governor
       structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
       because of this we can dereference bad pointers.
      
       For example consider a system with two CPUs with separate 'struct
       cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
       CPU0 switching to powersave and CPU1 to ondemand:
      	CPU0				CPU1
      
      	store*				store*
      
      	cpufreq_governor_exit()		cpufreq_governor_init()
      					dbs_data = cdata->gdbs_data;
      
      	if (!--dbs_data->usage_count)
      		kfree(dbs_data);
      
      					dbs_data->usage_count++;
      					*Bad pointer dereference*
      
       There are other races possible between EXIT and START/STOP/LIMIT as
       well. It's really complicated.
      
       (b) Switching governor state in bad sequence:
      
       For example trying to switch a governor to START state, when the
       governor is in EXIT state. There are some checks present in
       __cpufreq_governor() but they aren't sufficient as they compare events
       against 'policy->governor_enabled', whereas we need to take the governor's
       state into account, which can be used by multiple policies.
      
      These two issues need to be solved separately and the responsibility
      should be properly divided between cpufreq and governor core.
      
      The first problem is more about the governor core, as it needs to
      protect its structures properly. And the second problem should be fixed
      in cpufreq core instead of governor, as it's all about the sequence of
      events.
      
      This patch is trying to solve only the first problem.
      
      There are two types of data we need to protect,
      - 'struct common_dbs_data': No matter what, there is going to be a
        single copy of this per governor.
      - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
        will have per-policy copy of this data, otherwise a single copy.
      
      Because of such complexities, the mutex present in 'struct dbs_data' is
      insufficient to solve our problem. For example we need to protect
      fetching of 'dbs_data' from different structures at the beginning of
      cpufreq_governor_dbs(), to make sure it isn't currently being updated.
      
      This can be fixed if we can guarantee serialization of event parsing
      code for an individual governor. This is best solved with a mutex per
      governor, and the placeholder for that is 'struct common_dbs_data'.
      
      And so this patch moves the mutex from 'struct dbs_data' to 'struct
      common_dbs_data' and takes it at the beginning and drops it at the end
      of cpufreq_governor_dbs().
      
      Tested with and without following configuration options:
      
      CONFIG_LOCKDEP_SUPPORT=y
      CONFIG_DEBUG_RT_MUTEXES=y
      CONFIG_DEBUG_PI_LIST=y
      CONFIG_DEBUG_SPINLOCK=y
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      CONFIG_PROVE_LOCKING=y
      CONFIG_LOCKDEP=y
      CONFIG_DEBUG_ATOMIC_SLEEP=y
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
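The locking pattern described above (one mutex per governor, taken on entry to the event handler and dropped on exit) could look roughly like this in user space. It is a sketch under assumptions: a pthread mutex stands in for the kernel mutex, `governor_event()` is a hypothetical stand-in for cpufreq_governor_dbs(), and only the INIT and EXIT events for a shared (not per-policy) 'dbs_data' are modeled.

```c
#include <pthread.h>
#include <stdlib.h>

enum gov_event { GOV_INIT, GOV_EXIT };

struct dbs_data { int usage_count; };

/* One instance per governor; the mutex serializes all event handling. */
struct common_dbs_data {
	pthread_mutex_t mutex;
	struct dbs_data *gdbs_data;   /* shared copy when not per-policy */
};

static int governor_event(struct common_dbs_data *cdata, enum gov_event ev)
{
	int ret = 0;

	pthread_mutex_lock(&cdata->mutex);   /* taken at the very beginning */

	if (ev == GOV_INIT) {
		if (!cdata->gdbs_data)
			cdata->gdbs_data = calloc(1, sizeof(*cdata->gdbs_data));
		if (cdata->gdbs_data)
			cdata->gdbs_data->usage_count++;
		else
			ret = -1;
	} else if (cdata->gdbs_data && !--cdata->gdbs_data->usage_count) {
		free(cdata->gdbs_data);
		cdata->gdbs_data = NULL;     /* no stale pointer is left behind */
	}

	pthread_mutex_unlock(&cdata->mutex); /* dropped at the very end */
	return ret;
}
```

Because the fetch of 'gdbs_data' and the update of 'usage_count' happen under the same lock, an EXIT on one CPU cannot free the structure between another CPU's fetch and increment, which is the interleaving shown in scenario (a).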
    • cpufreq: governor: register notifier from cs_init() · 8e0484d2
      Viresh Kumar committed
      Notifiers are required only for conservative governor and the common
      governor code is unnecessarily polluted with that. Handle that from
      cs_init/exit() instead of cpufreq_governor_dbs().
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  7. 21 Jul, 2014 1 commit
    • cpufreq: ondemand: Eliminate the deadband effect · 6393d6a1
      Stratos Karafotis committed
      Currently, ondemand calculates the target frequency proportional to load
      using the formula:
      	Target frequency = C * load
      	where C = policy->cpuinfo.max_freq / 100
      
      However, in many cases the minimum available frequency is fairly high and
      the above calculation introduces a dead band from load 0 to
      100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target
      frequency is always calculated to less than policy->cpuinfo.min_freq and
      the minimum frequency is selected.
      
      For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 1600000
      and the policy->cpuinfo.max_freq = 3400000 (without turbo). Thus, the CPU
      starts to scale up at a load above 47.
      On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000
      and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale
      at load above 25.
      
      Change the calculation of target frequency to eliminate the above effect using
      the formula:
      
      	Target frequency = A + B * load
      	where A = policy->cpuinfo.min_freq and
      	      B = (policy->cpuinfo.max_freq - policy->cpuinfo.min_freq) / 100
      
      This will map load values 0 to 100 linearly to cpuinfo.min_freq to
      cpuinfo.max_freq.
      
      Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the
      closest frequency in frequency_table. This is necessary to avoid selection
      of minimum frequency only when load equals 0. It will also help with selection
      of frequencies using a more 'fair' criterion.
      
      Tables below show the difference in selected frequency for specific values
      of load without and with this patch. On Intel i7-3770 @ 3.40GHz:
      	Without			With
      Load	Target	Selected	Target	Selected
      0	0	1600000		1600000	1600000
      5	170050	1600000		1690050	1700000
      10	340100	1600000		1780100	1700000
      15	510150	1600000		1870150	1900000
      20	680200	1600000		1960200	2000000
      25	850250	1600000		2050250	2100000
      30	1020300	1600000		2140300	2100000
      35	1190350	1600000		2230350	2200000
      40	1360400	1600000		2320400	2400000
      45	1530450	1600000		2410450	2400000
      50	1700500	1900000		2500500	2500000
      55	1870550	1900000		2590550	2600000
      60	2040600	2100000		2680600	2600000
      65	2210650	2400000		2770650	2800000
      70	2380700	2400000		2860700	2800000
      75	2550750	2600000		2950750	3000000
      80	2720800	2800000		3040800	3000000
      85	2890850	2900000		3130850	3100000
      90	3060900	3100000		3220900	3300000
      95	3230950	3300000		3310950	3300000
      100	3401000	3401000		3401000	3401000
      
      On ARM quad core 1500MHz Krait:
      	Without			With
      Load	Target	Selected	Target	Selected
      0	0	384000		384000	384000
      5	75600	384000		440400	486000
      10	151200	384000		496800	486000
      15	226800	384000		553200	594000
      20	302400	384000		609600	594000
      25	378000	384000		666000	702000
      30	453600	486000		722400	702000
      35	529200	594000		778800	810000
      40	604800	702000		835200	810000
      45	680400	702000		891600	918000
      50	756000	810000		948000	918000
      55	831600	918000		1004400	1026000
      60	907200	918000		1060800	1026000
      65	982800	1026000		1117200	1134000
      70	1058400	1134000		1173600	1134000
      75	1134000	1134000		1230000	1242000
      80	1209600	1242000		1286400	1242000
      85	1285200	1350000		1342800	1350000
      90	1360800	1458000		1399200	1350000
      95	1436400	1458000		1455600	1458000
      100	1512000	1512000		1512000	1512000
      
      Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait
      (Android smartphone).
      Benchmarks on Intel i7 show a performance improvement on low and medium
      work loads with lower power consumption. Specifics:
      
      Phoronix Linux Kernel Compilation 3.1:
      Time: -0.40%, energy: -0.07%
      Phoronix Apache:
      Time: -4.98%, energy: -2.35%
      Phoronix FFMPEG:
      Time: -6.29%, energy: -4.02%
      
      Also, running mp3 decoding (very low load) shows no differences with and
      without this patch.
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
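The two mappings from this commit message can be written out as small C functions to reproduce the "Target" columns of the tables. The function names are hypothetical, and integer division is assumed when computing the constants B and C.

```c
/* Old mapping: Target frequency = C * load, C = max_freq / 100. */
static unsigned int target_old(unsigned int max_freq, unsigned int load)
{
	return max_freq / 100 * load;
}

/* New mapping: A + B * load, A = min_freq, B = (max_freq - min_freq) / 100. */
static unsigned int target_new(unsigned int min_freq, unsigned int max_freq,
			       unsigned int load)
{
	return min_freq + (max_freq - min_freq) / 100 * load;
}

/* Upper edge of the dead band under the old mapping, as a load percentage. */
static unsigned int dead_band_limit(unsigned int min_freq, unsigned int max_freq)
{
	return 100 * min_freq / max_freq;
}
```

For the i7 table row at load 5 (min 1600000, max 3401000), the old mapping gives 170050 and the new one gives 1690050. `dead_band_limit(1600000, 3400000)` evaluates to 47 and `dead_band_limit(384000, 1512000)` to 25, matching the thresholds quoted in the message.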
  8. 01 Nov, 2013 1 commit
  9. 29 Aug, 2013 1 commit
  10. 08 Aug, 2013 4 commits
  11. 26 Jul, 2013 1 commit
    • cpufreq: ondemand: Change the calculation of target frequency · dfa5bb62
      Stratos Karafotis committed
      The ondemand governor calculates load in terms of frequency and
      increases it only if load_freq is greater than up_threshold
      multiplied by the current or average frequency.  This appears to
      produce oscillations of frequency between min and max because,
      for example, a relatively small load can easily saturate minimum
      frequency and lead the CPU to the max.  Then, it will decrease
      back to the min due to small load_freq.
      
      Change the calculation method of load and target frequency on the
      basis of the following two observations:
      
       - Load computation should not depend on the current or average
         measured frequency.  For example, absolute load of 80% at 100MHz
         is not necessarily equivalent to 8% at 1000MHz in the next
         sampling interval.
      
       - It should be possible to increase the target frequency to any
         value present in the frequency table proportional to the absolute
         load, rather than to the max only, so that:
      
         Target frequency = C * load
      
         where we take C = policy->cpuinfo.max_freq / 100.
      
      Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
      The Phoronix Linux Kernel Compilation 3.1 benchmark shows an
      increase of ~1.5% in performance. cpufreq_stats (time_in_state) shows
      that middle frequencies are used more, with this patch.  Highest
      and lowest frequencies were used less by ~9%.
      
      [rjw: We have run multiple other tests on kernels with this
       change applied and in the vast majority of cases it turns out
       that the resulting performance improvement also leads to reduced
       consumption of energy.  The change is additionally justified by
       the overall simplification of the code in question.]
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
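The contrast between the old frequency-weighted rule and the proportional rule can be sketched as follows. This is a simplified model of what the message describes, not the governor code: the function names are hypothetical, the scale-down path of the old rule is left out, and the up_threshold value of 95 used in the note below is an assumption.

```c
/* Frequency-weighted load used by the old calculation. */
static unsigned long long load_freq(unsigned int load_pct, unsigned int freq)
{
	return (unsigned long long)load_pct * freq;
}

/* Old rule, as described: jump to max once load_freq exceeds the threshold. */
static unsigned int next_freq_old(unsigned int load_pct, unsigned int cur_freq,
				  unsigned int max_freq, unsigned int up_threshold)
{
	if (load_freq(load_pct, cur_freq) > (unsigned long long)up_threshold * cur_freq)
		return max_freq;
	return cur_freq;   /* the scale-down path is omitted in this sketch */
}

/* New rule: Target frequency = C * load, with C = max_freq / 100. */
static unsigned int next_freq_new(unsigned int load_pct, unsigned int max_freq)
{
	return max_freq / 100 * load_pct;
}
```

`load_freq(80, 100)` and `load_freq(8, 1000)` are equal, which is the equivalence the first observation argues against. With a threshold of 95, a load of 96 at the minimum frequency sends the old rule straight to the maximum, while the new rule picks a target proportional to the load.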
  12. 26 Jun, 2013 1 commit
  13. 13 May, 2013 1 commit
  14. 10 Apr, 2013 1 commit
  15. 01 Apr, 2013 4 commits
    • cpufreq: governors: Calculate iowait time only when necessary · 9366d840
      Stratos Karafotis committed
      Currently we always calculate the CPU iowait time and add it to idle time.
      If we are in ondemand and we use io_is_busy, we re-calculate iowait time
      and we subtract it from idle time.
      
      With this patch iowait time is calculated only when necessary avoiding
      the double call to get_cpu_iowait_time_us. We use a parameter in
      function get_cpu_idle_time to distinguish when the iowait time will be
      added to idle time or not, without the need of keeping the prev_io_wait.
      Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
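The parameter-based approach can be illustrated with a small user-space model. `struct cpu_times` and `idle_time()` are hypothetical; in the kernel the counters come from get_cpu_idle_time_us() and get_cpu_iowait_time_us(), which are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU counters, in microseconds. */
struct cpu_times {
	uint64_t idle_us;
	uint64_t iowait_us;
};

/*
 * Returns the idle time the governor should use. When io_busy is set,
 * iowait counts as busy time and is never read; otherwise it is added
 * to idle time exactly once.
 */
static uint64_t idle_time(const struct cpu_times *t, bool io_busy)
{
	uint64_t idle = t->idle_us;

	if (!io_busy)
		idle += t->iowait_us;
	return idle;
}
```

The flag replaces the earlier add-then-subtract sequence, so there is no need to remember a previous iowait value between samples.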
    • cpufreq: governors: Avoid unnecessary per cpu timer interrupts · 031299b3
      Viresh Kumar committed
      The following patch introduced per-cpu timers or works for the ondemand and
      conservative governors.
      
      	commit 2abfa876
      	Author: Rickard Andersson <rickard.andersson@stericsson.com>
      	Date:   Thu Dec 27 14:55:38 2012 +0000
      
      	    cpufreq: handle SW coordinated CPUs
      
      This causes additional unnecessary interrupts on all cpus when the load is
      recently evaluated by any other cpu. i.e. When load is recently evaluated by cpu
      x, we don't really need any other cpu to evaluate this load again for the next
      sampling_rate time.
      
      Some sort of code is present to avoid that but we are still getting timer
      interrupts for all cpus. A good way of avoiding this would be to modify delays
      for all cpus (policy->cpus) whenever any cpu has evaluated load.
      
      This patch does this change and some related code cleanup.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
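One way to picture "modify delays for all cpus (policy->cpus) whenever any cpu has evaluated load" is the model below. The structure and function names are hypothetical, and plain expiry timestamps stand in for the kernel's delayed work.

```c
#include <stdint.h>

#define NR_POLICY_CPUS 4

/* Hypothetical per-policy bookkeeping: one expiry time per CPU. */
struct policy_timers {
	uint64_t last_eval_us;
	uint64_t expires_us[NR_POLICY_CPUS];
};

/*
 * Called by whichever CPU has just evaluated load: every CPU in the
 * policy gets its next expiry pushed out by a full sampling period,
 * so none of them fires early just to find the load already evaluated.
 */
static void load_evaluated(struct policy_timers *p, uint64_t now_us,
			   uint64_t sampling_rate_us)
{
	p->last_eval_us = now_us;
	for (int cpu = 0; cpu < NR_POLICY_CPUS; cpu++)
		p->expires_us[cpu] = now_us + sampling_rate_us;
}
```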
    • cpufreq: ondemand: Don't update sample_type if we don't evaluate load again · 9d445920
      Viresh Kumar committed
      Because we now have per-cpu timers, we check whether load needs to be
      evaluated again (in case it was recently evaluated). Here the 2nd cpu that gets
      a timer interrupt updates core_dbs_info->sample_type irrespective of whether
      load evaluation is required. This is wrong, as the first cpu depends on this
      variable being set to an older value.
      
      Moreover, it would be best in this case to schedule the 2nd cpu's timer to
      sampling_rate instead of freq_lo or hi, as that must be managed by the other cpu.
      If the other cpu idles in between, we still wouldn't lose much power.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: governor: Implement per policy instances of governors · 4d5dcc42
      Viresh Kumar committed
      Currently, there can't be multiple instances of single governor_type.
      If we have a multi-package system, where we have multiple instances
      of struct policy (per package), we can't have multiple instances of
      same governor. i.e. We can't have multiple instances of ondemand
      governor for multiple packages.
      
      Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
      governor-name/. Which again reflects that there can be only one
      instance of a governor_type in the system.
      
      This is a bottleneck for multicluster systems, where we want different
      packages to use same governor type, but with different tunables.
      
      This patch uses the infrastructure provided by earlier patch and
      implements init/exit routines for ondemand and conservative
      governors.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  16. 09 Feb, 2013 2 commits
  17. 02 Feb, 2013 6 commits
  18. 27 Nov, 2012 1 commit
    • cpufreq: ondemand: update sampling rate only on right CPUs · 3e33ee9e
      Fabio Baltieri committed
      Fix cpufreq_gov_ondemand to skip CPUs where another governor is used.
      
      The bug presents itself as a NULL pointer access on the mutex_lock() call,
      and can be reproduced on an SMP machine by setting the default governor
      to anything other than ondemand, setting a single CPU's governor to
      ondemand, then changing the sample rate by writing to:
      
      > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
      
      Backtrace:
      
      Nov 26 17:36:54 balto kernel: [  839.585241] BUG: unable to handle kernel NULL pointer dereference at           (null)
      Nov 26 17:36:54 balto kernel: [  839.585311] IP: [<ffffffff8174e082>] __mutex_lock_slowpath+0xb2/0x170
      [snip]
      Nov 26 17:36:54 balto kernel: [  839.587005] Call Trace:
      Nov 26 17:36:54 balto kernel: [  839.587030]  [<ffffffff8174da82>] mutex_lock+0x22/0x40
      Nov 26 17:36:54 balto kernel: [  839.587067]  [<ffffffff81610b8f>] store_sampling_rate+0xbf/0x150
      Nov 26 17:36:54 balto kernel: [  839.587110]  [<ffffffff81031e9c>] ?  __do_page_fault+0x1cc/0x4c0
      Nov 26 17:36:54 balto kernel: [  839.587153]  [<ffffffff813309bf>] kobj_attr_store+0xf/0x20
      Nov 26 17:36:54 balto kernel: [  839.587192]  [<ffffffff811bb62d>] sysfs_write_file+0xcd/0x140
      Nov 26 17:36:54 balto kernel: [  839.587234]  [<ffffffff8114c12c>] vfs_write+0xac/0x180
      Nov 26 17:36:54 balto kernel: [  839.587271]  [<ffffffff8114c472>] sys_write+0x52/0xa0
      Nov 26 17:36:54 balto kernel: [  839.587306]  [<ffffffff810321ce>] ?  do_page_fault+0xe/0x10
      Nov 26 17:36:54 balto kernel: [  839.587345]  [<ffffffff81751202>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
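The shape of the fix (skip any CPU that is not managed by ondemand before touching its per-CPU state) can be modeled in user space. This is a sketch under assumptions: `struct cpu_gov` and this `update_sampling_rate()` are hypothetical, and a NULL 'sampling' pointer stands in for the ondemand state that was never initialized on the other CPUs.

```c
#include <stddef.h>
#include <string.h>

#define NR_CPUS 4

struct cpu_gov {
	const char *governor;      /* name of the governor managing this CPU */
	unsigned int *sampling;    /* NULL unless ondemand set it up */
};

/* Applies new_rate to ondemand CPUs only; returns how many were updated. */
static int update_sampling_rate(struct cpu_gov cpus[], unsigned int new_rate)
{
	int updated = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (strcmp(cpus[cpu].governor, "ondemand") != 0)
			continue;  /* not ours: its ondemand state is uninitialized */

		*cpus[cpu].sampling = new_rate;
		updated++;
	}
	return updated;
}
```

Without the governor check, the loop would dereference the NULL pointer on the first non-ondemand CPU, which is analogous to the mutex_lock() crash in the backtrace.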
  19. 24 Nov, 2012 1 commit
  20. 15 Nov, 2012 2 commits
  21. 15 Sep, 2012 1 commit