1. 01 Apr, 2013 4 commits
  2. 01 Feb, 2013 1 commit
  3. 26 Jan, 2013 1 commit
    • P
      PM / tracing: remove deprecated power trace API · 43720bd6
      Paul Gortmaker committed
      The text in Documentation said it would be removed in 2.6.41;
      the text in the Kconfig said removal in the 3.1 release.  Either
      way you look at it, we are well past both, so push it off a cliff.
      
      Note that the POWER_CSTATE and the POWER_PSTATE are part of the
      legacy tracing API.  Remove all tracepoints which use these flags.
      As can be seen from context, most already have a trace entry via
      trace_cpu_idle anyways.
      
      Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
      compared to the CSTATE ones which all have a clear start/stop.
      As part of this, the trace_power_frequency also becomes orphaned,
      so it too is deleted.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      43720bd6
  4. 15 Jan, 2013 1 commit
  5. 12 Jan, 2013 1 commit
  6. 03 Jan, 2013 3 commits
  7. 27 Nov, 2012 1 commit
    • J
      cpuidle: Measure idle state durations with monotonic clock · a474a515
      Julius Werner committed
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast from int to unsigned long long of the last_residency
      variable. The negative 32 bit integer will zero-extend and result in a
      forward time jump of roughly 4.3 billion microseconds, or about 1.2 hours,
      on the idle state residency counter.
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
      Signed-off-by: Julius Werner <jwerner@chromium.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a474a515
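As a rough userspace illustration of the two problems described in a474a515, the sketch below reads the monotonic clock instead of the wallclock and contrasts zero-extended with sign-preserving accumulation of a negative residency. It is not the kernel code: the function names are hypothetical, and the zero-extension is modelled by passing through a 32-bit unsigned value, which is an assumption about how the bad value could arise.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in microseconds (hypothetical helper).
 * Unlike CLOCK_REALTIME, this clock is not moved by wallclock adjustments. */
int64_t monotonic_now_us(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Buggy accumulation: the negative 32-bit value is zero-extended,
 * so -5 us is added as 4294967291 us (roughly 1.2 hours). */
uint64_t add_residency_zero_extended(uint64_t total_us, int32_t residency_us)
{
	return total_us + (uint32_t)residency_us;
}

/* Sign-preserving accumulation: a negative residency is subtracted. */
uint64_t add_residency_signed(uint64_t total_us, int32_t residency_us)
{
	return total_us + (uint64_t)(int64_t)residency_us;
}
```

Starting from a counter of 1000 us, adding a residency of -5 us gives 4294968291 us in the zero-extended variant and 995 us in the sign-preserving one.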
  8. 23 Nov, 2012 1 commit
    • L
      cpuidle: fix a suspicious RCU usage in menu governor · a093b93e
      Li Zhong committed
      I saw this suspicious RCU usage on the next tree of 11/15
      
      [   67.123404] ===============================
      [   67.123413] [ INFO: suspicious RCU usage. ]
      [   67.123423] 3.7.0-rc5-next-20121115-dirty #1 Not tainted
      [   67.123434] -------------------------------
      [   67.123444] include/trace/events/timer.h:186 suspicious rcu_dereference_check() usage!
      [   67.123458]
      [   67.123458] other info that might help us debug this:
      [   67.123458]
      [   67.123474]
      [   67.123474] RCU used illegally from idle CPU!
      [   67.123474] rcu_scheduler_active = 1, debug_locks = 0
      [   67.123493] RCU used illegally from extended quiescent state!
      [   67.123507] 1 lock held by swapper/1/0:
      [   67.123516]  #0:  (&cpu_base->lock){-.-...}, at: [<c0000000000979b0>] .__hrtimer_start_range_ns+0x28c/0x524
      [   67.123555]
      [   67.123555] stack backtrace:
      [   67.123566] Call Trace:
      [   67.123576] [c0000001e2ccb920] [c00000000001275c] .show_stack+0x78/0x184 (unreliable)
      [   67.123599] [c0000001e2ccb9d0] [c0000000000c15a0] .lockdep_rcu_suspicious+0x120/0x148
      [   67.123619] [c0000001e2ccba70] [c00000000009601c] .enqueue_hrtimer+0x1c0/0x1c8
      [   67.123639] [c0000001e2ccbb00] [c000000000097aa0] .__hrtimer_start_range_ns+0x37c/0x524
      [   67.123660] [c0000001e2ccbc20] [c0000000005c9698] .menu_select+0x508/0x5bc
      [   67.123678] [c0000001e2ccbd20] [c0000000005c740c] .cpuidle_idle_call+0xa8/0x6e4
      [   67.123699] [c0000001e2ccbdd0] [c0000000000459a0] .pSeries_idle+0x10/0x34
      [   67.123717] [c0000001e2ccbe40] [c000000000014dc8] .cpu_idle+0x130/0x280
      [   67.123738] [c0000001e2ccbee0] [c0000000006ffa8c] .start_secondary+0x378/0x384
      [   67.123758] [c0000001e2ccbf90] [c00000000000936c] .start_secondary_prolog+0x10/0x14
      
      hrtimer_start was added in 198fd638 and ae515197. The patch below tries
      to use RCU_NONIDLE around it to avoid the above report.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a093b93e
  9. 15 Nov, 2012 12 commits
    • D
      cpuidle: support multiple drivers · bf4d1b5d
      Daniel Lezcano committed
      With new architectures such as tegra3 and big.LITTLE [1], several cpus
      with different characteristics (latencies and states) can coexist on the
      system.
      
      The cpuidle framework has the limitation of handling only identical cpus.
      
      This patch removes this limitation by introducing multiple driver support
      for cpuidle.
      
      This option is configurable at compile time and should be enabled for the
      architectures mentioned above, so there is no impact on other platforms
      if the option is disabled. The option defaults to 'n'. Note that multiple
      driver support is also compatible with the existing drivers: even if just
      one driver is needed, all the cpus will be tied to this driver, using a
      small extra chunk of processor memory.
      
      Multiple driver support uses a per-cpu driver pointer instead of a global
      variable, and this variable is accessed from a cpu context.
      
      In order to keep compatibility with the existing drivers, the functions
      'cpuidle_register_driver' and 'cpuidle_unregister_driver' will register
      and unregister, respectively, the specified driver for all the cpus.
      
      The semantics of the output of /sys/devices/system/cpu/cpuidle/current_driver
      remain the same, except that the driver name will be related to the current cpu.
      
      The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added,
      allowing the per-cpu driver name to be read.
      
      [1] http://lwn.net/Articles/481055/
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      bf4d1b5d
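The idea in bf4d1b5d of replacing one global driver pointer with a per-cpu pointer can be sketched in plain userspace C as follows. Everything here is hypothetical (the struct, the function names, the fixed CPU count, the absence of locking); it only illustrates the shape of the data structure, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* assumed CPU count for the sketch */

struct idle_driver {
	const char *name;
};

/* One driver slot per cpu instead of a single global pointer. */
static struct idle_driver *cpu_driver[SKETCH_NR_CPUS];

/* Tie a driver to one cpu; fails if that cpu already has a driver. */
int register_driver_on_cpu(struct idle_driver *drv, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (drv == NULL || cpu_driver[cpu] != NULL)
		return -1;
	cpu_driver[cpu] = drv;
	return 0;
}

/* Compatibility path: register the same driver for all the cpus. */
int register_driver_all(struct idle_driver *drv)
{
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (register_driver_on_cpu(drv, cpu) != 0) {
			/* roll back the cpus registered so far */
			while (--cpu >= 0)
				cpu_driver[cpu] = NULL;
			return -1;
		}
	}
	return 0;
}

/* Look up the driver tied to a given cpu (NULL if none). */
struct idle_driver *get_cpu_driver(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return cpu_driver[cpu];
}
```

With this layout a "big" driver could be registered on cpu 0 and a "little" driver on cpu 1, while an existing single-driver platform would simply call the register-for-all path.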
    • D
      cpuidle: prepare the cpuidle core to handle multiple drivers · 13dd52f1
      Daniel Lezcano committed
      This patch is a preparation for the multiple cpuidle drivers support.
      
      As the next patch will introduce the multiple drivers with the Kconfig
      option and we want to keep the code clean and understandable, this patch
      defines a set of functions for encapsulating some common parts and splits
      what should be done under a lock from the rest.
      
      [rjw: Modified the subject and changelog slightly.]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      13dd52f1
    • D
      cpuidle: move driver checking within the lock section · 41682032
      Daniel Lezcano committed
      The code is racy and the check with cpuidle_curr_driver should be
      done under the lock.
      
      I could not find a path in the different drivers where that could happen,
      because the arch specific drivers are written in such a way that it is not
      possible to register a driver while it is being unregistered, except maybe
      in a very improbable case when "intel_idle" and "processor_idle" are
      competing. One could unregister a driver while the other one is
      registering.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      41682032
    • D
      cpuidle: move driver's refcount to cpuidle · 42f67f2a
      Daniel Lezcano committed
      We want to support different cpuidle drivers co-existing together.
      In this case we should move the refcount to the cpuidle_driver
      structure to handle several drivers at a time.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      42f67f2a
    • D
      cpuidle: fixup device.h header in cpuidle.h · 8f3e9953
      Daniel Lezcano committed
      The "struct device" is only used in sysfs.c.
      
      The other .c files including the private header "cpuidle.h"
      do not need to pull the entire headers tree from there as they
      don't manipulate the "struct device".
      
      This patch fixes this by moving the header inclusion to sysfs.c
      and adding a forward declaration for the struct device.
      
      The number of lines generated by the preprocessor:
      Without this patch : 17269 loc
      With this patch : 16446 loc
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      8f3e9953
    • D
      cpuidle / sysfs: move structure declaration into the sysfs.c file · 349631e0
      Daniel Lezcano committed
      The structure cpuidle_state_kobj is not used anywhere except
      in the sysfs.c file. The definition of this structure is not
      needed in the cpuidle header file. This patch moves it to the
      sysfs.c file in order to encapsulate the code a bit more.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      349631e0
    • Y
      cpuidle: Get typical recent sleep interval · c96ca4fb
      Youquan Song committed
      The function detect_repeating_patterns was not very useful for
      workloads with alternating long and short pauses, for example
      virtual machines handling network requests for each other (say
      a web and database server).
      
      Instead, try to find a recent sleep interval that is somewhere
      between the median and the mode sleep time, by discarding outliers
      to the up side and recalculating the average and standard deviation
      until that is no longer required.
      
      This should do something sane with a sleep interval series like:
      
      	200 180 210 10000 30 1000 170 200
      
      The current code would simply discard such a series, while the
      new code will guess a typical sleep interval just shy of 200.
      
      The original patch comes from Rik van Riel <riel@redhat.com>.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c96ca4fb
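The discard-and-recompute loop described in c96ca4fb can be sketched in userspace C as below. This is only an illustration under stated assumptions, not the kernel's function: the name typical_interval, the thresholds (standard deviation of at most 20 us, average more than six standard deviations, at least three quarters of the samples kept) and the "return 0 when giving up" convention are all choices made for the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate a typical interval (in us) from n samples
 * by repeatedly dropping the largest remaining sample and recomputing the
 * average and variance. Returns 0 when no typical interval can be found.
 * Assumes microsecond-scale values, far below the 32-bit limit.
 */
unsigned int typical_interval(const unsigned int *us, size_t n)
{
	unsigned int thresh = UINT_MAX;	/* samples above this are ignored */

	assert(us != NULL && n > 0);

	for (;;) {
		uint64_t sum = 0, avg, var = 0;
		unsigned int max = 0;
		size_t i, count = 0;

		/* average and maximum of the samples still in play */
		for (i = 0; i < n; i++) {
			if (us[i] > thresh)
				continue;
			sum += us[i];
			count++;
			if (us[i] > max)
				max = us[i];
		}
		if (count == 0)
			return 0;
		avg = sum / count;

		/* variance of the same samples */
		for (i = 0; i < n; i++) {
			if (us[i] <= thresh) {
				uint64_t d = us[i] > avg ? us[i] - avg : avg - us[i];
				var += d * d;
			}
		}
		var /= count;

		/* stddev <= 20, or avg > 6 * stddev with >= 3/4 of samples kept */
		if (var <= 400 || (avg * avg > 36 * var && count * 4 >= n * 3))
			return (unsigned int)avg;

		/* dropping another sample would leave too few: give up */
		if (count * 4 <= n * 3)
			return 0;

		/* exclude the current maximum and try again */
		thresh = max - 1;
	}
}
```

For example, seven samples of 200 us plus a single 10000 us outlier yield 200: the first pass rejects the series as too noisy, the outlier is dropped, and the second pass sees a variance of zero.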
    • Y
      cpuidle: Set residency to 0 if target Cstate not enter · d73d68dc
      Youquan Song committed
      When the cpuidle governor chooses a C-state for an idle CPU but then notices
      that there are tasks waiting to be executed, the idle CPU does not actually
      enter the target C-state and goes on to run the task instead.
      
      In this situation, the residency of the previously entered C-state is used,
      which is obviously not reasonable.
      
      So this patch fixes it by setting the target C-state residency to 0.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      d73d68dc
    • Y
      cpuidle: Quickly notice prediction failure in general case · e11538d1
      Youquan Song committed
      Predicting the future is difficult, and when the cpuidle governor's prediction
      fails, the governor may choose a shallower C-state than it should. Noticing
      the failure quickly is therefore important for power saving.
      
      This patch extends the mechanism to the general case in which the prediction
      logic produces a small predicted residency and so chooses a shallow C-state
      even though the expected residency is large. If the prediction fails, the CPU
      stays in the shallow C-state for a long time, although it actually had a
      chance to enter a deep C-state. So when the expected residency is long enough
      but the governor chooses a shallow C-state, a timer is added to monitor
      whether the prediction fails.
      
      When the CPU wakes up from the C-state before the timer fires, the timer is
      cancelled proactively. When the timer fires, the menu governor quickly notices
      the prediction failure and re-evaluates whether a deeper C-state is possible.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e11538d1
    • Y
      cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea
      Youquan Song committed
      Predicting the future is difficult, and when the cpuidle governor's prediction
      fails, the governor may choose a shallower C-state than it should. Noticing
      the failure quickly is therefore important for power saving.
      
      The cpuidle menu governor has a method to predict a repeat pattern: if there
      are 8 consecutive C-state residencies that are the same or very close, it
      predicts that the next C-state residency will be the same.
      
      A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3
      or earlier. On Sandybridge, turbostat reads 10 registers one by one, so it
      generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor then
      predicts repeat mode and expects another IPI to wake the idle CPU soon, so it
      keeps the idle CPU in the C1 state even though the CPU is totally idle.
      However, after reading the 10 registers turbostat sleeps for 5 seconds by
      default, so the idle CPU stays in C1 for a long time, although it is idle,
      until a break event occurs.
      On an idle Sandybridge system, running "./turbostat -v" shows deep C-state
      residency fluctuating between 70% and 99%. With the patched kernel, deep
      C-state residency stays above 99.98%.
      
      In the patch, a timer is added when the menu governor detects repeat mode and
      chooses a shallow C-state. The timer is set to a timeout value greater than
      the predicted time, and the repeat mode prediction is considered to have
      failed if the timer fires. When repeat mode happens as expected, the timer
      does not fire; the CPU wakes up from the C-state and cancels the timer
      proactively. When repeat mode does not happen, the timer times out and the
      menu governor quickly notices that the repeat mode prediction failed and then
      re-evaluates whether a deeper C-state is possible.
      
      Below is another case that clearly shows the benefit of the patch:
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <sys/time.h>
      #include <time.h>
      #include <pthread.h>
      
      volatile int * shutdown;
      volatile long * count;
      int delay = 20;
      int loop = 8;
      
      void usage(void)
      {
      	fprintf(stderr,
      		"Usage: idle_predict [options]\n"
      		"  --help	-h  Print this help\n"
      		"  --thread	-n  Thread number\n"
      		"  --loop     	-l  Loop times in shallow Cstate\n"
      		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
      }
      
      void *simple_loop(void *arg)
      {
      	int idle_num = 1;
      
      	(void)arg;	/* unused; required by the pthread start routine signature */
      	while (!(*shutdown)) {
      		*count = *count + 1;
      
      		if (idle_num % loop)
      			usleep(delay);
      		else {
      			/* sleep 1 second */
      			usleep(1000000);
      			idle_num = 0;
      		}
      		idle_num++;
      	}
      
      	return NULL;
      }
      
      static void sighand(int sig)
      {
      	*shutdown = 1;
      }
      
      int main(int argc, char *argv[])
      {
      	sigset_t sigset;
      	int signum = SIGALRM;
      	int i, c, er = 0, thread_num = 8;
      	pthread_t pt[1024];
      
      	static char optstr[] = "n:l:t:h";	/* -h takes no argument */
      
      	while ((c = getopt(argc, argv, optstr)) != -1)
      		switch (c) {
      			case 'n':
      				thread_num = atoi(optarg);
      				break;
      			case 'l':
      				loop = atoi(optarg);
      				break;
      			case 't':
      				delay = atoi(optarg);
      				break;
      			case 'h':
      			default:
      				usage();
      				exit(1);
      		}
      
      	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
      	count = malloc(sizeof(long));
      	shutdown = malloc(sizeof(int));
      	*count = 0;
      	*shutdown = 0;
      
      	sigemptyset(&sigset);
      	sigaddset(&sigset, signum);
      	sigprocmask (SIG_BLOCK, &sigset, NULL);
      	signal(SIGINT, sighand);
      	signal(SIGTERM, sighand);
      
      	for(i = 0; i < thread_num ; i++)
      		pthread_create(&pt[i], NULL, simple_loop, NULL);
      
      	for (i = 0; i < thread_num; i++)
      		pthread_join(pt[i], NULL);
      
      	exit(0);
      }
      
      Get powertop V2 from git://github.com/fenrus75/powertop and build it.
      After building the above test application, run it.
      The test platform can be Intel Sandybridge or another recent platform.
      #./idle_predict -l 10 &
      #./powertop
      
      Deep C-state residency fluctuates between 40% and 100%, and much time is
      spent in the C1 state. This is because the menu governor wrongly predicts
      that repeat mode persists, so it chooses the shallow C1 state even though it
      has a chance to sleep for 1 second in a deep C-state.
      
      With the patched kernel, deep C-state residency stays above 99.6%.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      69a37bea
    • D
      cpuidle / sysfs: move kobj initialization in the syfs file · e45a00d6
      Daniel Lezcano committed
      Move the kobj initialization and completion in the sysfs.c
      and encapsulate the code more.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e45a00d6
    • D
      cpuidle / sysfs: change function parameter · 1aef40e2
      Daniel Lezcano committed
      The function needs the cpuidle_device which is initially passed to the
      caller.
      
      The current code gets the struct device from the struct cpuidle_device and
      passes it to the cpuidle_add_sysfs function. This function calls
      per_cpu(cpuidle_devices, cpu) to get the cpuidle_device.
      
      This patch passes the cpuidle_device instead and simplifies the code.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aef40e2
  10. 08 Nov, 2012 1 commit
  11. 09 Oct, 2012 1 commit
    • S
      ACPI idle, CPU hotplug: Fix NULL pointer dereference during hotplug · cf31cd1a
      Srivatsa S. Bhat committed
      On a KVM guest, when a CPU is taken offline and brought back online, we hit
      the following NULL pointer dereference:
      
      [   45.400843] Unregister pv shared memory for cpu 1
      [   45.412331] smpboot: CPU 1 is now offline
      [   45.529894] SMP alternatives: lockdep: fixing up alternatives
      [   45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1
      [   45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock
      [   45.571370] KVM setup async PF for cpu 1
      [   45.572331] kvm-stealtime: cpu 1, msr 7d0e040
      [   45.575031] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [   45.576017] IP: [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017] PGD 5dfb067 PUD 5da8067 PMD 0
      [   45.576017] Oops: 0000 [#1] SMP
      [   45.576017] Modules linked in:
      [   45.576017] CPU 0
      [   45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs
      [   45.576017] RIP: 0010:[<ffffffff81519f98>]  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017] RSP: 0018:ffff880005d93ce8  EFLAGS: 00010286
      [   45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006
      [   45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000
      [   45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222
      [   45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001
      [   45.576017] FS:  00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
      [   45.576017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0
      [   45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40)
      [   45.576017] Stack:
      [   45.576017]  ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689
      [   45.576017]  ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01
      [   45.576017]  ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd
      [   45.576017] Call Trace:
      [   45.576017]  [<ffffffff813ac689>] acpi_processor_hotplug+0x55/0x97
      [   45.576017]  [<ffffffff813aa498>] acpi_cpu_soft_notify+0x93/0xce
      [   45.576017]  [<ffffffff816ae47d>] notifier_call_chain+0x5d/0x110
      [   45.576017]  [<ffffffff8109730e>] __raw_notifier_call_chain+0xe/0x10
      [   45.576017]  [<ffffffff81069050>] __cpu_notify+0x20/0x40
      [   45.576017]  [<ffffffff81069085>] cpu_notify+0x15/0x20
      [   45.576017]  [<ffffffff816978f1>] _cpu_up+0xee/0x137
      [   45.576017]  [<ffffffff81697983>] cpu_up+0x49/0x59
      [   45.576017]  [<ffffffff8168758d>] store_online+0x9d/0xe0
      [   45.576017]  [<ffffffff8140a9f8>] dev_attr_store+0x18/0x30
      [   45.576017]  [<ffffffff812322c0>] sysfs_write_file+0xe0/0x150
      [   45.576017]  [<ffffffff811b389c>] vfs_write+0xac/0x180
      [   45.576017]  [<ffffffff811b3be2>] sys_write+0x52/0xa0
      [   45.576017]  [<ffffffff816b31e9>] system_call_fastpath+0x16/0x1b
      [   45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb <f6> 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00
      [   45.576017] RIP  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
      [   45.576017]  RSP <ffff880005d93ce8>
      [   45.576017] CR2: 0000000000000000
      [   45.656079] ---[ end trace 433d6c9ac0b02cef ]---
      
      Analysis:
      Commit 3d339dcb (cpuidle / ACPI : move cpuidle_device field out of the
      acpi_processor_power structure()) made the allocation of the dev structure
      (struct cpuidle) of a CPU dynamic, whereas previously it was statically
      allocated. And this dynamic allocation occurs in acpi_processor_power_init()
      if pr->flags.power evaluates to non-zero.
      
      On KVM guests, pr->flags.power evaluates to zero, hence dev is never
      allocated. This causes the NULL pointer (dev) dereference in
      cpuidle_disable_device() during a subsequent CPU online operation. Fix this
      by ensuring that dev is non-NULL before dereferencing.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
      cf31cd1a
  12. 22 Sep, 2012 1 commit
    • D
      cpuidle: rename function name "__cpuidle_register_driver", v2 · ed953472
      Daniel Lezcano committed
      The name of the function __cpuidle_register_driver is confusing because,
      following the kernel coding style, it suggests that it registers the
      driver without taking a lock. Actually, it just fills the power fields
      of the different states with a decreasing value if the power has not been
      specified.
      
      Clarify the purpose of the function by changing its name and
      move the condition out of this function.
      
      This patch fixes nothing and does not change the behavior of the
      function. It is just for the sake of clarity.
      
      IMHO, reading in the code:
      
      +       if (!drv->power_specified)
      +               set_power_states(drv);
      
      is much more explicit than:
      
      -       __cpuidle_register_driver(drv);
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      ed953472
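A minimal userspace sketch of what ed953472 describes, filling the states' power fields with decreasing values only when the driver did not specify them, might look like this. The struct layout, the MAX_STATES bound, the prepare_driver wrapper and the particular values (-1, -2, ...) are assumptions for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 8	/* assumed upper bound for the sketch */

struct idle_state {
	int power_usage;
};

struct idle_driver {
	struct idle_state states[MAX_STATES];
	int state_count;
	bool power_specified;
};

/* Give each deeper state a strictly smaller placeholder power value. */
static void set_power_states(struct idle_driver *drv)
{
	int i;

	for (i = 0; i < drv->state_count; i++)
		drv->states[i].power_usage = -1 - i;
}

/* The condition lives in the caller, as in the snippet quoted above. */
void prepare_driver(struct idle_driver *drv)
{
	assert(drv->state_count >= 0 && drv->state_count <= MAX_STATES);
	if (!drv->power_specified)
		set_power_states(drv);
}
```

Keeping the `power_specified` test at the call site makes it visible to the reader that the helper is optional set-up rather than a lockless variant of registration.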
  13. 20 Sep, 2012 1 commit
  14. 04 Sep, 2012 2 commits
    • R
      PM / cpuidle: Make ladder governor use the "disabled" state flag · 66804c13
      Rafael J. Wysocki committed
      For the mechanism introduced by commit cbc9ef02 (PM / Domains: Add
      preliminary support for cpuidle, v2) to work with the ladder
      governor, that governor should respect the "disabled" state flag
      added by that commit.  Change the ladder governor accordingly.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      66804c13
    • C
      Honor state disabling in the cpuidle ladder governor · 62d6ae88
      Carsten Emde committed
      There are two cpuidle governors, ladder and menu. While the ladder
      governor is always available if CONFIG_CPU_IDLE is selected, the
      menu governor additionally requires CONFIG_NO_HZ.
      
      A particular C state can be disabled by writing to the sysfs file
      /sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
      is only implemented in the menu governor. Thus, in a system where
      CONFIG_NO_HZ is not selected, the ladder governor becomes default and
      always will walk through all sleep states - irrespective of whether the
      C state was disabled via sysfs or not. The only way to select a specific
      C state was to write the related latency to /dev/cpu_dma_latency and
      keep the file open as long as this setting was required - not very
      practical and not suitable for setting a single core in an SMP system.
      
      With this patch, the ladder governor only will promote to the next
      C state, if it has not been disabled, and it will demote, if the
      current C state was disabled.
      
      Note that the patch does not make the setting of the sysfs variable
      "disable" coherent, i.e. if one is disabling a light state, then all
      deeper states are disabled as well, but the "disable" variable does not
      reflect it. Likewise, if one enables a deep state but a lighter state
      still is disabled, then this has no effect. A related section has been
      added to the documentation.
      Signed-off-by: Carsten Emde <C.Emde@osadl.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      62d6ae88
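For readers who want to try the sysfs mechanism mentioned in 62d6ae88 from a program rather than a shell, a small userspace sketch follows. The helper names are hypothetical; the only thing taken from the commit message is the path pattern /sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, and writing to it normally requires root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the path of the per-state "disable" file for a cpu and state.
 * Returns the path length, or -1 if the buffer is too small.
 */
int format_disable_path(char *buf, size_t size, unsigned cpu, unsigned state)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, size,
		     "/sys/devices/system/cpu/cpu%u/cpuidle/state%u/disable",
		     cpu, state);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Write "1" (disable) or "0" (enable) to the given file; 0 on success. */
int write_flag(const char *path, int disable)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fputs(disable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would format the path for, say, cpu 1 and state 2 and then pass it to write_flag with a non-zero value to disable that C state.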
  15. 18 Aug, 2012 2 commits
  16. 11 Jul, 2012 1 commit
    • P
      PM / cpuidle: System resume hang fix with cpuidle · 8651f97b
      Preeti U Murthy committed
      On certain BIOSes, resume hangs if cpus are allowed to enter idle states
      during suspend [1].
      
      This was fixed in the ACPI idle driver [2], but the intel_idle driver does
      not have this fix. Thus, instead of replicating the fix in both idle
      drivers, or in more platform-specific idle drivers if needed, the
      more general cpuidle infrastructure could handle this.
      
      A suspend callback in cpuidle_driver could handle this fix. But
      a cpuidle_driver provides only basic functionality like platform idle
      state detection capability and mechanisms to support entry and exit
      into CPU idle states. All other cpuidle functions are found in the
      cpuidle generic infrastructure for the good reason that all cpuidle
      drivers, irrespective of their platforms, will support these functions.
      
      One option therefore would be to register a suspend callback in cpuidle
      which handles this fix. This could be called through a PM_SUSPEND_PREPARE
      notifier. But this is too generic a notifier for a driver to handle.
      
      Also, ideally the job of cpuidle is not to handle side effects of suspend.
      It should expose the interfaces which "handle cpuidle 'during' suspend"
      or any other operation, which the subsystems call during that respective
      operation.
      
      The fix demands that during suspend, no cpus should be allowed to enter
      deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
      ensures that. Not only that, it also kicks all the cpus which are already
      idle out of their idle states, which was previously done during cpu hotplug
      through CPU_DYING_FROZEN callbacks.
      
      Now the question arises of when during suspend
      cpuidle_uninstall_idle_handler() should be called. Since we are dealing with
      drivers, it seems best to call this function during dpm_suspend().
      Delaying the call until dpm_suspend_noirq() does no harm, as long as it is
      before cpu_hotplug_begin() to avoid race conditions with cpu hotplug
      operations. In dpm_suspend_noirq(), it would be wise to place this call
      before suspend_device_irqs() to avoid ugly interactions with the same.
      
      Analogously, during resume.
      
      References:
      [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
      [2] http://marc.info/?l=linux-pm&m=133958534231884&w=2
      Reported-and-tested-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      8651f97b
  17. 04 Jul, 2012 3 commits
    • R
      PM / Domains: Add preliminary support for cpuidle, v2 · cbc9ef02
      Rafael J. Wysocki committed
      On some systems there are CPU cores located in the same power
      domains as I/O devices.  Then, power can only be removed from the
      domain if all I/O devices in it are not in use and the CPU core
      is idle.  Add preliminary support for that to the generic PM domains
      framework.
      
      First, the platform is expected to provide a cpuidle driver with one
      extra state designated for use with the generic PM domains code.
      This state should be initially disabled and its exit_latency value
      should be set to whatever time is needed to bring up the CPU core
      itself after restoring power to it, not including the domain's
      power on latency.  Its .enter() callback should point to a procedure
      that will remove power from the domain containing the CPU core at
      the end of the CPU power transition.
      
      The remaining characteristics of the extra cpuidle state, referred to
      as the "domain" cpuidle state below, (e.g. power usage, target
      residency) should be populated in accordance with the properties of
      the hardware.
      
      Next, the platform should execute genpd_attach_cpuidle() on the PM
      domain containing the CPU core.  That will cause the generic PM
      domains framework to treat that domain in a special way such that:
      
       * When all devices in the domain have been suspended and it is about
         to be turned off, the states of the devices will be saved, but
         power will not be removed from the domain.  Instead, the "domain"
         cpuidle state will be enabled so that power can be removed from
         the domain when the CPU core is idle and the state has been chosen
         as the target by the cpuidle governor.
      
       * When the first I/O device in the domain is resumed and
         __pm_genpd_poweron() is called for the first time after
         power has been removed from the domain, the "domain" cpuidle
         state will be disabled to avoid subsequent surprise power removals
         via cpuidle.
      
      The effective exit_latency value of the "domain" cpuidle state
      depends on the time needed to bring up the CPU core itself after
      restoring power to it as well as on the power on latency of the
      domain containing the CPU core.  Thus the "domain" cpuidle state's
      exit_latency has to be recomputed every time the domain's power on
      latency is updated, which may happen every time power is restored
      to the domain, if the measured power on latency is greater than
      the latency stored in the corresponding generic_pm_domain structure.
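
The recomputation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the genpd code: `struct domain_idle_sketch` and `update_exit_latency()` are hypothetical names, and the latencies are in arbitrary time units.

```c
#include <stdint.h>

/* Hypothetical stand-in for the bookkeeping described in the message. */
struct domain_idle_sketch {
	uint64_t core_exit_latency;	/* time to bring up the CPU core itself */
	uint64_t power_on_latency;	/* stored power on latency of the domain */
	uint64_t exit_latency;		/* effective value of the "domain" state */
};

/*
 * Record a measured power on latency.  The stored latency only grows,
 * and the effective exit latency is the sum of both components.
 */
void update_exit_latency(struct domain_idle_sketch *s, uint64_t measured)
{
	if (measured > s->power_on_latency)
		s->power_on_latency = measured;
	s->exit_latency = s->core_exit_latency + s->power_on_latency;
}
```

In this sketch a measurement below the stored latency leaves the effective value unchanged, which matches the "if the measured power on latency is greater" condition in the text.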
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      cbc9ef02
    • R
      PM / cpuidle: Add driver reference counter · 6e797a07
      Rafael J. Wysocki committed
      Add a reference counter for the cpuidle driver, so that it can't
      be unregistered when it is in use.
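
The idea can be illustrated with a small sketch. The names below (`struct driver_sketch`, `driver_get()`, `driver_put()`, `driver_unregister()`) are hypothetical and are not the cpuidle API; locking is omitted, so a real implementation would have to serialize these operations.

```c
#include <stdbool.h>

/* Hypothetical driver object carrying a use count. */
struct driver_sketch {
	int refcnt;		/* number of current users */
	bool registered;
};

/* Take a reference before using the driver. */
void driver_get(struct driver_sketch *d)
{
	d->refcnt++;
}

/* Drop a reference once the user is done. */
void driver_put(struct driver_sketch *d)
{
	if (d->refcnt > 0)
		d->refcnt--;
}

/* Refuse to unregister while any user still holds a reference. */
bool driver_unregister(struct driver_sketch *d)
{
	if (d->refcnt > 0)
		return false;
	d->registered = false;
	return true;
}
```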
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      6e797a07
    • S
      cpuidle: move field disable from per-driver to per-cpu · dc7fd275
      ShuoX Liu committed
      Andrew J.Schorr reported that changing the disable setting on a
      single CPU affects all the other CPUs.  Currently, the disable field
      is per-driver instead of per-cpu, because all the C states of the
      same driver are shared by all CPUs in the same machine.
      
      The patch changes the `disable' field to per-cpu, so we could set this
      separately for each cpu.
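
A minimal sketch of the resulting layout, assuming a fixed number of cpus and states. The array `usage`, the two size macros and both helper functions are made up for illustration and do not reproduce the kernel structures: with a per-driver flag there would be one `disable` per state, while here each cpu has its own copy.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH   4	/* assumed number of cpus */
#define NR_STATES_SKETCH 3	/* assumed number of C states */

/* Per-cpu, per-state bookkeeping: one disable flag for each pair. */
struct percpu_usage_sketch {
	bool disable;
};

static struct percpu_usage_sketch usage[NR_CPUS_SKETCH][NR_STATES_SKETCH];

/* Disable or re-enable one state on one cpu only. */
void set_disable(int cpu, int state, bool value)
{
	usage[cpu][state].disable = value;
}

/* A state may be chosen on this cpu unless it was disabled there. */
bool state_usable(int cpu, int state)
{
	return !usage[cpu][state].disable;
}
```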
      Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
      Reported-by: Andrew J.Schorr <aschorr@telemetry-investments.com>
      Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      dc7fd275
  18. 02 Jun, 2012 3 commits
    • C
      cpuidle: coupled: add parallel barrier function · 20ff51a3
      Colin Cross committed
      Adds cpuidle_coupled_parallel_barrier, which can be used by coupled
      cpuidle state enter functions to handle resynchronization after
      determining if any cpu needs to abort.  The normal use case will
      be:
      
      static bool abort_flag;
      static atomic_t abort_barrier;
      
      int arch_cpuidle_enter(struct cpuidle_device *dev, ...)
      {
      	if (arch_turn_off_irq_controller()) {
      		/* returns an error if an irq is pending and would be lost
      		   if idle continued and turned off power */
      		abort_flag = true;
      	}
      
      	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
      
      	if (abort_flag) {
      		/* One of the cpus didn't turn off its irq controller */
      		arch_turn_on_irq_controller();
      		return -EINTR;
      	}
      
      	/* continue with idle */
      	...
      }
      
      This will cause all cpus to abort idle together if one of them needs
      to abort.
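
For readers unfamiliar with this kind of barrier, here is a userspace sketch of a two-phase spin barrier using C11 atomics. It is an analogue for illustration only, not the kernel function: `parallel_barrier_sketch()` is a hypothetical name, the participant count `n` is passed explicitly as an assumption, and the busy-wait loops stand in for whatever relax primitive real code would use.

```c
#include <stdatomic.h>

/* Two-phase spin barrier for n participants sharing one counter. */
void parallel_barrier_sketch(atomic_int *a, int n)
{
	/* Phase 1: announce arrival, then wait until all n have arrived. */
	atomic_fetch_add(a, 1);
	while (atomic_load(a) < n)
		;	/* busy-wait */

	/* Phase 2: the last participant to get here resets the counter. */
	if (atomic_fetch_add(a, 1) + 1 == 2 * n) {
		atomic_store(a, 0);
		return;
	}

	/* The others wait for the reset so the counter can be reused. */
	while (atomic_load(a) > n)
		;	/* busy-wait */
}
```

Each participant calls the function with the same counter and the same `n`; none of them gets past the first loop until all have arrived, so a flag written before the barrier (such as abort_flag in the example above) is set before any participant tests it afterwards.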
      Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Tested-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Colin Cross <ccross@android.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
      20ff51a3
    • C
      cpuidle: add support for states that affect multiple cpus · 4126c019
      Colin Cross committed
      On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
      cpus cannot be independently powered down, either due to
      sequencing restrictions (on Tegra 2, cpu 0 must be the last to
      power down), or due to HW bugs (on OMAP4460, a cpu powering up
      will corrupt the gic state unless the other cpu runs a work
      around).  Each cpu has a power state that it can enter without
      coordinating with the other cpu (usually Wait For Interrupt, or
      WFI), and one or more "coupled" power states that affect blocks
      shared between the cpus (L2 cache, interrupt controller, and
      sometimes the whole SoC).  Entering a coupled power state must
      be tightly controlled on both cpus.
      
      The easiest solution to implementing coupled cpu power states is
      to hotplug all but one cpu whenever possible, usually using a
      cpufreq governor that looks at cpu load to determine when to
      enable the secondary cpus.  This causes problems, as hotplug is an
      expensive operation, so the number of hotplug transitions must be
      minimized, leading to very slow response to loads, often on the
      order of seconds.
      
      This file implements an alternative solution, where each cpu will
      wait in the WFI state until all cpus are ready to enter a coupled
      state, at which point the coupled state function will be called
      on all cpus at approximately the same time.
      
      Once all cpus are ready to enter idle, they are woken by an smp
      cross call.  At this point, there is a chance that one of the
      cpus will find work to do, and choose not to enter idle.  A
      final pass is needed to guarantee that all cpus will call the
      power state enter function at the same time.  During this pass,
      each cpu will increment the ready counter, and continue once the
      ready counter matches the number of online coupled cpus.  If any
      cpu exits idle, the other cpus will decrement their counter and
      retry.
      
      To use coupled cpuidle states, a cpuidle driver must:
      
         Set struct cpuidle_device.coupled_cpus to the mask of all
         coupled cpus, usually the same as cpu_possible_mask if all cpus
         are part of the same cluster.  The coupled_cpus mask must be
         set in the struct cpuidle_device for each cpu.
      
         Set struct cpuidle_device.safe_state to a state that is not a
         coupled state.  This is usually WFI.
      
         Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
         state that affects multiple cpus.
      
         Provide a struct cpuidle_state.enter function for each state
         that affects multiple cpus.  This function is guaranteed to be
         called on all cpus at approximately the same time.  The driver
         should ensure that the cpus all abort together if any cpu tries
         to abort once the function is called.
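
The four requirements above can be restated as a small consistency check. This is a sketch built on stand-in types, not kernel code: `struct state_sketch`, `struct device_sketch`, `FLAG_COUPLED_SKETCH`, `enter_stub_sketch()` and `check_coupled_config()` are all hypothetical, the cpu mask is simplified to a 32-bit bitmask, and the rule that a cpu appears in its own mask is an assumption of the sketch.

```c
#include <stdint.h>

#define FLAG_COUPLED_SKETCH 0x1u	/* stands in for CPUIDLE_FLAG_COUPLED */

struct state_sketch {
	unsigned int flags;
	int (*enter)(int cpu, int index);
};

struct device_sketch {
	uint32_t coupled_cpus;	/* bitmask of the cpus coupled together */
	int safe_state;		/* index of a non-coupled state, e.g. WFI */
};

/* Placeholder enter function used only to fill in the table. */
int enter_stub_sketch(int cpu, int index)
{
	(void)cpu;
	return index;
}

/* Returns 0 if the setup follows the rules listed above, -1 otherwise. */
int check_coupled_config(const struct device_sketch *dev, int cpu,
			 const struct state_sketch *states, int nstates)
{
	int i, ncoupled = 0;

	if (dev->safe_state < 0 || dev->safe_state >= nstates)
		return -1;
	if (states[dev->safe_state].flags & FLAG_COUPLED_SKETCH)
		return -1;	/* the safe state must not be coupled */

	for (i = 0; i < nstates; i++) {
		if (!(states[i].flags & FLAG_COUPLED_SKETCH))
			continue;
		if (!states[i].enter)
			return -1;	/* coupled states need an enter function */
		ncoupled++;
	}

	/* Assumption: the mask of a cpu with coupled states includes itself. */
	if (ncoupled && !(dev->coupled_cpus & (1u << cpu)))
		return -1;
	return 0;
}
```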
      
      update1:
      
      cpuidle: coupled: fix count of online cpus
      
      online_count was never incremented on boot, and was also counting
      cpus that were not part of the coupled set.  Fix both issues by
      introducing a new function that counts online coupled cpus, and
      call it from register as well as the hotplug notifier.
      
      update2:
      
      cpuidle: coupled: fix decrementing ready count
      
      cpuidle_coupled_set_not_ready sometimes refuses to decrement the
      ready count in order to prevent a race condition.  This makes it
      unsuitable for use when finished with idle.  Add a new function
      cpuidle_coupled_set_done that decrements both the ready count and
      waiting count, and call it after idle is complete.
      
      Cc: Amit Kucheria <amit.kucheria@linaro.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Trinabh Gupta <g.trinabh@gmail.com>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Tested-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Colin Cross <ccross@android.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Len Brown <len.brown@intel.com>
      4126c019
    • C
      cpuidle: fix error handling in __cpuidle_register_device · 3af272ab
      Colin Cross committed
      Fix the error handling in __cpuidle_register_device to include
      the missing list_del.  Move it to a label, which will simplify
      the error handling when coupled states are added.
      Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Tested-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Colin Cross <ccross@android.com>
      Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Len Brown <len.brown@intel.com>
      3af272ab