1. 15 Jul, 2013 4 commits
    • cpuidle: Make cpuidle's sysfs directory dynamically allocated · 728ce22b
      Committed by Daniel Lezcano
      The cpuidle sysfs code is designed to have a single instance of per
      CPU cpuidle directory.  It is not possible to remove the sysfs entry
      and create it again.  This is not a problem with the current code but
      future changes will add CPU hotplug support to enable/disable the
      device, so it will need to remove the sysfs entry like other
      subsystems do.  That won't be possible without this change, because
      the kobj is a static object which can't be reused for
      kobject_init_and_add().
      
      Add cpuidle_device_kobj to be allocated dynamically when
      adding/removing a sysfs entry which is consistent with the other
      cpuidle's sysfs entries.
      
      An added benefit is that the sysfs code is now more self-contained
      and the includes needed for sysfs can be moved from cpuidle.h
      directly into sysfs.c so as to reduce the total number of headers
      dragged along with cpuidle.h.
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Fix white space to follow CodingStyle · f89ae89e
      Committed by Daniel Lezcano
      Fix white space in the cpuidle code to follow the rules described in
      CodingStyle.
      
      No changes in behavior should result from this.
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Check cpuidle_enable_device() return value · 10b9d3f8
      Committed by Daniel Lezcano
      We previously changed the ordering of the cpuidle framework
      initialization so that the governors are registered before the
      drivers which can register their devices right from the start.
      
      Now, we can safely remove the __cpuidle_register_device() call hack
      in cpuidle_enable_device() and check if the driver has been
      registered before enabling it.  Then, cpuidle_register_device() can
      consistently check the cpuidle_enable_device() return value when
      enabling the device.
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Make it clear that governors cannot be modules · 137b944e
      Committed by Daniel Lezcano
      cpuidle governors are defined as modules in the code, but the Kconfig
      options do not allow them to be built as modules.  This is not really
      a problem, but the cpuidle init ordering runs the cpuidle init
      functions (framework and driver) first and the governors afterwards.
      That leads to some weirdness in the cpuidle framework.
      
      Namely, cpuidle_register_device() calls cpuidle_enable_device() which
      fails at the first attempt, because governors have not been registered
      yet.  When a governor is registered, the framework calls
      cpuidle_enable_device() again which runs __cpuidle_register_device()
      only then.  Of course, for that to work, the cpuidle_enable_device()
      return value has to be ignored by cpuidle_register_device().
      
      Instead of having this cyclic call graph and relying on the positive
      side effects of the hackish back-and-forth cpuidle_enable_device()
      calls, it is better to fix the cpuidle init ordering.
      
      To that end, replace the module init code with postcore_initcall()
      so we have:
      
       * cpuidle framework : core_initcall
       * cpuidle governors : postcore_initcall
       * cpuidle drivers   : device_initcall
      
      and remove the corresponding module exit code as it is dead anyway
      (governors can't be built as modules).
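
The ordering above can be illustrated with a small user-space sketch. It is not kernel code: the initcall table, the run_initcalls() helper and the three init functions are hypothetical stand-ins, and only the level numbers (core = 1, postcore = 2, device = 6) follow the kernel's initcall levels. It shows that a driver which needs a registered governor succeeds once governors run at an earlier level.

```c
#include <stddef.h>

/* Initcall levels, numbered as in the kernel's include/linux/init.h. */
enum { CORE_LEVEL = 1, POSTCORE_LEVEL = 2, DEVICE_LEVEL = 6, MAX_LEVEL = 7 };

/* Hypothetical table entry: an init function and the level it runs at. */
struct initcall {
	int level;
	int (*fn)(void);
};

static int governor_registered;
static int device_enabled;

static int framework_init(void)
{
	return 0;
}

static int governor_init(void)
{
	governor_registered = 1;
	return 0;
}

/* The driver can only enable its device once a governor exists. */
static int driver_init(void)
{
	if (!governor_registered)
		return -1;
	device_enabled = 1;
	return 0;
}

/* Deliberately listed out of order: the level decides, not the position. */
static const struct initcall cpuidle_initcalls[] = {
	{ DEVICE_LEVEL, driver_init },
	{ POSTCORE_LEVEL, governor_init },
	{ CORE_LEVEL, framework_init },
};

/* Run the initcalls level by level and stop at the first failure. */
int run_initcalls(const struct initcall *calls, size_t n)
{
	for (int level = 0; level <= MAX_LEVEL; level++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == level && calls[i].fn() != 0)
				return -1;
	return 0;
}
```

Calling run_initcalls(cpuidle_initcalls, 3) returns 0 and leaves device_enabled set, because the governor level runs before the driver level regardless of table order.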
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 24 Jun, 2013 1 commit
  3. 11 Jun, 2013 3 commits
    • cpuidle: Fix ARCH_NEEDS_CPU_IDLE_COUPLED dependency warning · b39b0981
      Committed by Daniel Lezcano
      Before commit d6f346f2 (cpuidle: improve governor Kconfig options),
      the CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED option didn't depend on
      CONFIG_CPU_IDLE but now it has been moved under the CPU_IDLE
      menuconfig.
      
      That raises the following warnings:
      
       warning: (ARCH_OMAP4 && ARCH_TEGRA_2x_SOC) selects ARCH_NEEDS_CPU_IDLE_COUPLED
       which has unmet direct dependencies (CPU_IDLE)
       warning: (ARCH_OMAP4 && ARCH_TEGRA_2x_SOC) selects ARCH_NEEDS_CPU_IDLE_COUPLED
       which has unmet direct dependencies (CPU_IDLE)
      
      because the tegra2 and omap4 Kconfig files select this option
      without checking if CPU_IDLE is set.
      
      Fix that by moving ARCH_NEEDS_CPU_IDLE_COUPLED outside of CPU_IDLE.
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Comment the driver's framework code · 6d19cb93
      Committed by Daniel Lezcano
      Add kerneldoc (and other) comments to the cpuidle driver's framework
      code.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: simplify multiple driver support · 82467a5a
      Committed by Daniel Lezcano
      Commit bf4d1b5d (cpuidle: support multiple drivers) introduced support
      for using multiple cpuidle drivers at the same time.  It added a
      couple of new APIs to register the driver per CPU, but that led to
      some unnecessary code complexity related to the kernel config options
      deciding whether or not the multiple driver support is enabled.  The
      code has to work as it did before when the multiple driver support is
      not enabled and the multiple driver support has to be compatible with
      the previously existing API.
      
      Remove the new API, not used by any driver in the tree yet (but
      needed for the HMP cpuidle drivers that will be submitted soon), and
      add a new cpumask pointer to the cpuidle driver structure that will
      point to the mask of CPUs handled by the given driver.  That will
      allow the cpuidle_[un]register_driver() API to be used for the
      multiple driver support along with the cpuidle_[un]register()
      functions added recently.
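
A minimal user-space sketch of the idea, assuming a plain bitmask in place of the kernel's cpumask pointer. The struct idle_driver type, NR_CPUS value and the register_driver(), unregister_driver() and get_cpu_driver() helpers are hypothetical names for illustration and are not the cpuidle API. Each driver carries the set of CPUs it handles, and one registration call binds it to exactly those CPUs.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8	/* assumption: a small fixed CPU count */

/* Hypothetical driver descriptor carrying the CPUs it handles. */
struct idle_driver {
	const char *name;
	uint32_t cpumask;	/* bit n set: the driver handles CPU n */
};

/* One driver pointer per CPU instead of a single global driver. */
static const struct idle_driver *cpu_driver[NR_CPUS];

/* Bind the driver to every CPU in its mask; fail if any is taken. */
int register_driver(const struct idle_driver *drv)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((drv->cpumask & (1u << cpu)) && cpu_driver[cpu])
			return -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (drv->cpumask & (1u << cpu))
			cpu_driver[cpu] = drv;
	return 0;
}

/* Release every CPU currently bound to this driver. */
void unregister_driver(const struct idle_driver *drv)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_driver[cpu] == drv)
			cpu_driver[cpu] = NULL;
}

/* Look up the driver bound to a CPU, or NULL if there is none. */
const struct idle_driver *get_cpu_driver(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS)
		return NULL;
	return cpu_driver[cpu];
}
```

With two drivers whose masks are 0x0F and 0xF0, both registrations succeed and each CPU resolves to its own driver, while a third driver with an overlapping mask is rejected.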
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  4. 05 Jun, 2013 1 commit
  5. 04 Jun, 2013 1 commit
    • cpuidle: improve governor Kconfig options · d6f346f2
      Committed by Daniel Lezcano
      Each governor suits a different kernel configuration: the menu governor
      is better suited to a tickless system, while the ladder governor is
      better suited to a system with a periodic timer tick.
      
      The Kconfig does not allow a governor to be [un]selected, so both are
      compiled into the kernel, but the init order makes the menu governor the
      last one to be registered, so it becomes the default. The only way to
      switch back to the ladder governor is to enable the sysfs governor switch
      on the kernel command line.
      
      Nobody seems to have complained about this, and the menu governor is used
      by default most of the time. Having both governors is not really necessary
      on a tickless system, but there is no config option to disable one or the
      other.
      
      Create a submenu for cpuidle and add a label for each governor, so we can see
      the option in the menu config and enable/disable it.
      
      The governors will be enabled depending on the CONFIG_NO_HZ option:
       - If CONFIG_NO_HZ is set, then the menu governor is selected and the ladder
         governor is optional, defaulting to 'yes'
       - If CONFIG_NO_HZ is not set, then the ladder governor is selected and the
         menu governor is optional, defaulting to 'yes'
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 30 May, 2013 1 commit
  7. 27 Apr, 2013 1 commit
    • cpuidle: add maintainer entry · a8e39c35
      Committed by Daniel Lezcano
      Currently cpuidle drivers are spread across different archs.
      
      As a result, there are several different paths for cpuidle patch
      submissions: cpuidle core changes go through linux-pm, ARM driver
      changes go to the arm-soc or SoC-specific trees, sh changes go
      through the sh arch tree, pseries changes go through the PowerPC tree
      and finally Intel changes go through Len's tree while ACPI idle
      changes go through linux-pm.
      
      That makes it difficult to consolidate code and to propagate
      modifications from the cpuidle core to the different drivers.
      
      Hopefully, a movement has started to put the majority of cpuidle
      drivers under drivers/cpuidle like cpuidle-calxeda.c and
      cpuidle-kirkwood.c.
      
      Add a maintainer entry for cpuidle to MAINTAINERS to clarify the
      situation and to indicate to new cpuidle driver authors that those
      drivers should not go into arch-specific directories.
      
      The upstreaming process is unchanged: Rafael takes patches for
      merging into his tree, but with an Acked-by: tag from the driver's
      maintainer, so indicate in the drivers' headers who maintains them.
      
      The arrangement will be the same as for cpufreq.
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Andrew Lunn <andrew@lunn.ch>  #for kirkwood
      Acked-by: Jason Cooper <jason@lakedaemon.net> #for kirkwood
      Acked-by: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 24 Apr, 2013 1 commit
  9. 23 Apr, 2013 4 commits
  10. 01 Apr, 2013 4 commits
  11. 01 Feb, 2013 1 commit
  12. 26 Jan, 2013 1 commit
    • PM / tracing: remove deprecated power trace API · 43720bd6
      Committed by Paul Gortmaker
      The text in Documentation said it would be removed in 2.6.41;
      the text in the Kconfig said removal in the 3.1 release.  Either
      way you look at it, we are well past both, so push it off a cliff.
      
      Note that the POWER_CSTATE and the POWER_PSTATE are part of the
      legacy tracing API.  Remove all tracepoints which use these flags.
      As can be seen from context, most already have a trace entry via
      trace_cpu_idle anyways.
      
      Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
      compared to the CSTATE ones which all have a clear start/stop.
      As part of this, the trace_power_frequency also becomes orphaned,
      so it too is deleted.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  13. 15 Jan, 2013 1 commit
  14. 12 Jan, 2013 1 commit
  15. 03 Jan, 2013 3 commits
  16. 27 Nov, 2012 1 commit
    • cpuidle: Measure idle state durations with monotonic clock · a474a515
      Committed by Julius Werner
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast from int to unsigned long long of the last_residency
      variable. The negative 32-bit integer will zero-extend and result in a
      forward time jump of roughly four billion microseconds, or about 1.2
      hours, on the idle state residency counter.
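
The effect can be reproduced in user space with a short sketch. The helper names add_zero_extended() and add_signed() are hypothetical, and the sketch assumes the negative value passes through a 32-bit unsigned representation before being widened, which is one way a zero-extension like the one described can arise. It is not the driver code itself.

```c
#include <stdint.h>

/*
 * Buggy accumulation: the negative residency is reinterpreted as a
 * 32-bit unsigned value and then widened, so the upper 32 bits are
 * filled with zeros and the counter jumps forward by almost 2^32.
 */
uint64_t add_zero_extended(uint64_t total_us, int32_t residency_us)
{
	return total_us + (uint64_t)(uint32_t)residency_us;
}

/*
 * Sign-preserving accumulation: the value is widened as a signed
 * quantity first, so adding it wraps around to a subtraction.
 */
uint64_t add_signed(uint64_t total_us, int32_t residency_us)
{
	return total_us + (uint64_t)(int64_t)residency_us;
}
```

With a counter of 1000 and a residency of -5, add_zero_extended() yields 4294968291 while add_signed() yields 995.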
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
      Signed-off-by: Julius Werner <jwerner@chromium.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  17. 23 Nov, 2012 1 commit
    • cpuidle: fix a suspicious RCU usage in menu governor · a093b93e
      Committed by Li Zhong
      I saw this suspicious RCU usage report on the linux-next tree of 11/15:
      
      [   67.123404] ===============================
      [   67.123413] [ INFO: suspicious RCU usage. ]
      [   67.123423] 3.7.0-rc5-next-20121115-dirty #1 Not tainted
      [   67.123434] -------------------------------
      [   67.123444] include/trace/events/timer.h:186 suspicious rcu_dereference_check() usage!
      [   67.123458]
      [   67.123458] other info that might help us debug this:
      [   67.123458]
      [   67.123474]
      [   67.123474] RCU used illegally from idle CPU!
      [   67.123474] rcu_scheduler_active = 1, debug_locks = 0
      [   67.123493] RCU used illegally from extended quiescent state!
      [   67.123507] 1 lock held by swapper/1/0:
      [   67.123516]  #0:  (&cpu_base->lock){-.-...}, at: [<c0000000000979b0>] .__hrtimer_start_range_ns+0x28c/0x524
      [   67.123555]
      [   67.123555] stack backtrace:
      [   67.123566] Call Trace:
      [   67.123576] [c0000001e2ccb920] [c00000000001275c] .show_stack+0x78/0x184 (unreliable)
      [   67.123599] [c0000001e2ccb9d0] [c0000000000c15a0] .lockdep_rcu_suspicious+0x120/0x148
      [   67.123619] [c0000001e2ccba70] [c00000000009601c] .enqueue_hrtimer+0x1c0/0x1c8
      [   67.123639] [c0000001e2ccbb00] [c000000000097aa0] .__hrtimer_start_range_ns+0x37c/0x524
      [   67.123660] [c0000001e2ccbc20] [c0000000005c9698] .menu_select+0x508/0x5bc
      [   67.123678] [c0000001e2ccbd20] [c0000000005c740c] .cpuidle_idle_call+0xa8/0x6e4
      [   67.123699] [c0000001e2ccbdd0] [c0000000000459a0] .pSeries_idle+0x10/0x34
      [   67.123717] [c0000001e2ccbe40] [c000000000014dc8] .cpu_idle+0x130/0x280
      [   67.123738] [c0000001e2ccbee0] [c0000000006ffa8c] .start_secondary+0x378/0x384
      [   67.123758] [c0000001e2ccbf90] [c00000000000936c] .start_secondary_prolog+0x10/0x14
      
      hrtimer_start was added in 198fd638 and ae515197. The patch below tries
      to use RCU_NONIDLE around it to avoid the above report.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  18. 15 Nov, 2012 10 commits
    • cpuidle: support multiple drivers · bf4d1b5d
      Committed by Daniel Lezcano
      With the new tegra3 and big.LITTLE [1] architectures, several CPUs
      with different characteristics (latencies and states) can coexist on the
      system.
      
      The cpuidle framework is limited to handling identical CPUs.
      
      This patch removes this limitation by introducing multiple driver support
      for cpuidle.
      
      This option is configurable at compile time and should be enabled for the
      architectures mentioned above, so there is no impact on the other
      platforms when the option is disabled. The option defaults to 'n'. Note
      that the multiple driver support is also compatible with the existing
      drivers: even if just one driver is needed, all the CPUs will be tied to
      this driver, using a small extra chunk of processor memory.
      
      The multiple driver support uses a per-CPU driver pointer instead of a
      global variable, and accesses to this variable are made from a CPU context.
      
      In order to keep compatibility with the existing drivers, the functions
      'cpuidle_register_driver' and 'cpuidle_unregister_driver' register
      the specified driver for all the CPUs.
      
      The semantics of the output of /sys/devices/system/cpu/cpuidle/current_driver
      remain the same, except that the driver name relates to the current CPU.
      
      The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added
      so that the per-CPU driver name can be read.
      
      [1] http://lwn.net/Articles/481055/
      
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: prepare the cpuidle core to handle multiple drivers · 13dd52f1
      Committed by Daniel Lezcano
      This patch is a preparation for the multiple cpuidle drivers support.
      
      As the next patch will introduce the multiple drivers with the Kconfig
      option and we want to keep the code clean and understandable, this patch
      defines a set of functions for encapsulating some common parts and splits
      what should be done under a lock from the rest.
      
      [rjw: Modified the subject and changelog slightly.]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: move driver checking within the lock section · 41682032
      Committed by Daniel Lezcano
      The code is racy and the check with cpuidle_curr_driver should be
      done under the lock.
      
      I have not found a path in the different drivers where that could
      happen, because the arch-specific drivers are written in such a way
      that it is not possible to register a driver while another one is
      being unregistered, except maybe in a very improbable case when
      "intel_idle" and "processor_idle" are competing: one could
      unregister a driver while the other one is registering.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: move driver's refcount to cpuidle · 42f67f2a
      Committed by Daniel Lezcano
      We want to support different cpuidle drivers co-existing together.
      In this case we should move the refcount to the cpuidle_driver
      structure to handle several drivers at a time.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: fixup device.h header in cpuidle.h · 8f3e9953
      Committed by Daniel Lezcano
      The "struct device" is only used in sysfs.c.
      
      The other .c files including the private header "cpuidle.h"
      do not need to pull the entire headers tree from there as they
      don't manipulate the "struct device".
      
      This patch fixes this by moving the header inclusion to sysfs.c
      and adding a forward declaration for the struct device.
      
      The number of lines generated by the preprocessor:
      Without this patch : 17269 loc
      With this patch : 16446 loc
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle / sysfs: move structure declaration into the sysfs.c file · 349631e0
      Committed by Daniel Lezcano
      The structure cpuidle_state_kobj is not used anywhere except
      in the sysfs.c file. The definition of this structure is not
      needed in the cpuidle header file. This patch moves it to the
      sysfs.c file in order to encapsulate the code a bit more.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Get typical recent sleep interval · c96ca4fb
      Committed by Youquan Song
      The function detect_repeating_patterns was not very useful for
      workloads with alternating long and short pauses, for example
      virtual machines handling network requests for each other (say
      a web and database server).
      
      Instead, try to find a recent sleep interval that is somewhere
      between the median and the mode sleep time, by discarding outliers
      to the up side and recalculating the average and standard deviation
      until that is no longer required.
      
      This should do something sane with a sleep interval series like:
      
      	200 180 210 10000 30 1000 170 200
      
      The current code would simply discard such a series, while the
      new code will guess a typical sleep interval just shy of 200.
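
The outlier-discarding loop can be sketched in user space as follows. This is not the menu governor's code: the function name typical_interval(), the acceptance rule (average at least six times the standard deviation) and the rule to give up once fewer than three quarters of the samples remain are assumptions chosen for illustration. The variance comparison avoids a square root.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return a typical interval for the n samples in v, or 0 if no stable
 * estimate is found. Samples above the threshold are ignored; each
 * round lowers the threshold to just below the largest kept sample.
 */
uint64_t typical_interval(const uint32_t *v, size_t n)
{
	uint64_t thresh = UINT64_MAX;

	for (;;) {
		uint64_t sum = 0, var = 0, max = 0, avg;
		size_t kept = 0;

		for (size_t i = 0; i < n; i++) {
			if (v[i] <= thresh) {
				sum += v[i];
				kept++;
				if (v[i] > max)
					max = v[i];
			}
		}

		/* Give up once fewer than 3/4 of the samples remain. */
		if (kept == 0 || kept * 4 < n * 3)
			return 0;
		avg = sum / kept;

		for (size_t i = 0; i < n; i++) {
			if (v[i] <= thresh) {
				int64_t diff = (int64_t)v[i] - (int64_t)avg;
				var += (uint64_t)(diff * diff);
			}
		}
		var /= kept;

		/* Accept when avg >= 6 * stddev, i.e. avg^2 >= 36 * var. */
		if (avg * avg >= 36 * var)
			return avg;

		if (max == 0)
			return 0;
		thresh = max - 1;	/* discard the largest outlier(s) */
	}
}
```

For the series 200 190 210 10000 205 195 200 5000 the two large outliers are dropped and the function returns 200. With these particular thresholds the series quoted in the commit message is rejected instead, because the low outlier 30 keeps the spread wide; a looser acceptance rule would behave differently.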
      
      The original patch comes from Rik van Riel <riel@redhat.com>.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Set residency to 0 if target Cstate not enter · d73d68dc
      Committed by Youquan Song
      When the cpuidle governor chooses a C-state for an idle CPU to enter but
      then notices that tasks are waiting to be executed, the idle CPU does not
      actually enter the target C-state and goes on to run the tasks instead.
      
      In this situation, the residency of the previously entered C-state is
      used, which is obviously not reasonable.
      
      So, this patch fixes it by setting the target C-state residency to 0.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Quickly notice prediction failure in general case · e11538d1
      Committed by Youquan Song
      Predicting the future is difficult, and when the cpuidle governor's
      prediction fails, the governor may choose a shallower C-state than it
      should. Noticing such a failure quickly is important for power saving.
      
      This patch extends the detection to the general case in which the
      prediction logic gets a small predicted residency and so chooses a shallow
      C-state even though the expected residency is large. Once the prediction
      fails, the CPU stays in the shallow C-state for a long time, although it
      actually had the chance to enter a deep C-state.
      So when the expected residency is long enough but the governor chooses a
      shallow C-state, a timer is added to monitor whether the prediction fails.
      
      When the CPU is woken up from the C-state before the timer expires, the
      timer is cancelled proactively. When the timer fires, the menu governor
      quickly notices the prediction failure and re-evaluates the possibility of
      deeper C-states.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea
      Committed by Youquan Song
      Predicting the future is difficult, and when the cpuidle governor's
      prediction fails, the governor may choose a shallower C-state than it
      should. Noticing such a failure quickly is important for power saving.
      
      The cpuidle menu governor has a method of predicting a repeat pattern: if
      there are 8 consecutive C-state residencies that are the same or very
      close, it predicts that the next C-state residency will be the same.
      
      There is a real case: the turbostat utility (tools/power/x86/turbostat)
      at kernel 3.3 or earlier. On Sandybridge, turbostat reads 10 registers one
      by one, so it generates 10 IPIs to wake up idle CPUs. The cpuidle menu
      governor then predicts repeat mode and expects another IPI to wake the
      idle CPU soon, so it keeps the idle CPU in the C1 state even though the
      CPU is totally idle. However, after reading the 10 registers, turbostat
      sleeps for 5 seconds by default, so the idle CPU stays in C1 for a long
      time, although it is idle, until a break event occurs.
      On an idle Sandybridge system running "./turbostat -v", deep C-state
      residency fluctuates between 70% and 99%. With the patched kernel, deep
      C-state residency stays above 99.98%.
      
      In the patch, a timer is added when the menu governor detects repeat mode
      and chooses a shallow C-state. The timer is set to a timeout value greater
      than the predicted time, and we conclude that the repeat mode prediction
      has failed if the timer fires. When repeat mode happens as expected, the
      timer does not fire; the CPU is woken up from the C-state and cancels the
      timer proactively. When repeat mode does not happen, the timer expires and
      the menu governor quickly notices that the repeat mode prediction failed
      and then re-evaluates the possibility of deeper C-states.
      
      Below is another case which clearly shows the benefit of the patch:
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <sys/time.h>
      #include <time.h>
      #include <pthread.h>
      
      volatile int * shutdown;
      volatile long * count;
      int delay = 20;
      int loop = 8;
      
      void usage(void)
      {
      	fprintf(stderr,
      		"Usage: idle_predict [options]\n"
      		"  --help	-h  Print this help\n"
      		"  --thread	-n  Thread number\n"
      		"  --loop     	-l  Loop times in shallow Cstate\n"
      		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
      }
      
      /* Thread body: short sleeps, then one long sleep every 'loop' rounds. */
      void *simple_loop(void *arg) {
      	int idle_num = 1;
      	while (!(*shutdown)) {
      		*count = *count + 1;
      
      		if (idle_num % loop)
      			usleep(delay);
      		else {
      			/* sleep 1 second */
      			usleep(1000000);
      			idle_num = 0;
      		}
      		idle_num++;
      	}
      
      	return NULL;
      }
      
      static void sighand(int sig)
      {
      	*shutdown = 1;
      }
      
      int main(int argc, char *argv[])
      {
      	sigset_t sigset;
      	int signum = SIGALRM;
      	int i, c, er = 0, thread_num = 8;
      	pthread_t pt[1024];
      
      	static char optstr[] = "n:l:t:h";
      
      	while ((c = getopt(argc, argv, optstr)) != -1)
      		switch (c) {
      			case 'n':
      				thread_num = atoi(optarg);
      				break;
      			case 'l':
      				loop = atoi(optarg);
      				break;
      			case 't':
      				delay = atoi(optarg);
      				break;
      			case 'h':
      			default:
      				usage();
      				exit(1);
      		}
      
      	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
      	count = malloc(sizeof(long));
      	shutdown = malloc(sizeof(int));
      	*count = 0;
      	*shutdown = 0;
      
      	sigemptyset(&sigset);
      	sigaddset(&sigset, signum);
      	sigprocmask (SIG_BLOCK, &sigset, NULL);
      	signal(SIGINT, sighand);
      	signal(SIGTERM, sighand);
      
      	for(i = 0; i < thread_num ; i++)
      		pthread_create(&pt[i], NULL, simple_loop, NULL);
      
      	for (i = 0; i < thread_num; i++)
      		pthread_join(pt[i], NULL);
      
      	exit(0);
      }
      
      Get powertop V2 from git://github.com/fenrus75/powertop and build it.
      Build the above test application, then run it.
      The test platform can be Intel Sandybridge or another recent platform.
      #./idle_predict -l 10 &
      #./powertop
      
      We find that deep C-state residency fluctuates between 40% and 100% and
      that much time is spent in the C1 state. This is because the menu governor
      wrongly predicts that repeat mode continues, so it chooses the shallow C1
      state even though it has the chance to sleep for 1 second in a deep
      C-state.
      
      With the patched kernel, deep C-state residency stays above 99.6%.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>