1. 17 1月, 2013 1 次提交
  2. 15 11月, 2012 1 次提交
    • Y
      cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea
      Youquan Song 提交于
      The prediction for future is difficult and when the cpuidle governor prediction
      fails and govenor possibly choose the shallower C-state than it should. How to
      quickly notice and find the failure becomes important for power saving.
      
      cpuidle menu governor has a method to predict the repeat pattern if there are 8
      C-states residency which are continuous and the same or very close, so it will
      predict the next C-states residency will keep same residency time.
      
      There is a real case that turbostat utility (tools/power/x86/turbostat)
      at kernel 3.3 or early. turbostat utility will read 10 registers one by one at
      Sandybridge, so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu
       governor will predict it is repeat mode and there is another IPI wake up idle
       CPU soon, so it keeps idle CPU stay at C1 state even though CPU is totally
      idle. However, in the turbostat, following 10 registers reading is sleep 5
      seconds by default, so the idle CPU will keep at C1 for a long time though it is
       idle until break event occurs.
      In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
      C-state dangles between "70% ~ 99%". After patched the kernel, we will notice
      deep C-state stays at >99.98%.
      
      In the patch, a timer is added when menu governor detects a repeat mode and
      choose a shallow C-state. The timer is set to a time out value that greater
      than predicted time, and we conclude repeat mode prediction failure if timer is
      triggered. When repeat mode happens as expected, the timer is not triggered
      and CPU waken up from C-states and it will cancel the timer initiatively.
      When repeat mode does not happen, the timer will be time out and menu governor
      will quickly notice that the repeat mode prediction fails and then re-evaluates
      deeper C-states possibility.
      
      Below is another case which will clearly show the patch much benefit:
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <sys/time.h>
      #include <time.h>
      #include <pthread.h>
      
      volatile int * shutdown;
      volatile long * count;
      int delay = 20;
      int loop = 8;
      
      void usage(void)
      {
      	fprintf(stderr,
      		"Usage: idle_predict [options]\n"
      		"  --help	-h  Print this help\n"
      		"  --thread	-n  Thread number\n"
      		"  --loop     	-l  Loop times in shallow Cstate\n"
      		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
      }
      
      void *simple_loop() {
      	int idle_num = 1;
      	while (!(*shutdown)) {
      		*count = *count + 1;
      
      		if (idle_num % loop)
      			usleep(delay);
      		else {
      			/* sleep 1 second */
      			usleep(1000000);
      			idle_num = 0;
      		}
      		idle_num++;
      	}
      
      }
      
      static void sighand(int sig)
      {
      	*shutdown = 1;
      }
      
      int main(int argc, char *argv[])
      {
      	sigset_t sigset;
      	int signum = SIGALRM;
      	int i, c, er = 0, thread_num = 8;
      	pthread_t pt[1024];
      
      	static char optstr[] = "n:l:t:h:";
      
      	while ((c = getopt(argc, argv, optstr)) != EOF)
      		switch (c) {
      			case 'n':
      				thread_num = atoi(optarg);
      				break;
      			case 'l':
      				loop = atoi(optarg);
      				break;
      			case 't':
      				delay = atoi(optarg);
      				break;
      			case 'h':
      			default:
      				usage();
      				exit(1);
      		}
      
      	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
      	count = malloc(sizeof(long));
      	shutdown = malloc(sizeof(int));
      	*count = 0;
      	*shutdown = 0;
      
      	sigemptyset(&sigset);
      	sigaddset(&sigset, signum);
      	sigprocmask (SIG_BLOCK, &sigset, NULL);
      	signal(SIGINT, sighand);
      	signal(SIGTERM, sighand);
      
      	for(i = 0; i < thread_num ; i++)
      		pthread_create(&pt[i], NULL, simple_loop, NULL);
      
      	for (i = 0; i < thread_num; i++)
      		pthread_join(pt[i], NULL);
      
      	exit(0);
      }
      
      Get powertop V2 from git://github.com/fenrus75/powertop, build powertop.
      After build the above test application, then run it.
      Test plaform can be Intel Sandybridge or other recent platforms.
      #./idle_predict -l 10 &
      #./powertop
      
      We will find that deep C-state will dangle between 40%~100% and much time spent
      on C1 state. It is because menu governor wrongly predict that repeat mode
      is kept, so it will choose the C1 shallow C-state even though it has chance to
      sleep 1 second in deep C-state.
      
      While after patched the kernel, we find that deep C-state will keep >99.6%.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      69a37bea
  3. 14 11月, 2012 1 次提交
  4. 01 11月, 2012 1 次提交
  5. 24 10月, 2012 1 次提交
  6. 16 10月, 2012 3 次提交
  7. 05 10月, 2012 1 次提交
    • F
      nohz: Fix one jiffy count too far in idle cputime · 2b17c545
      Frederic Weisbecker 提交于
      When we stop the tick in idle, we save the current jiffies value
      in ts->idle_jiffies. This snapshot is substracted from the later
      value of jiffies when the tick is restarted and the resulting
      delta is accounted as idle cputime. This is how we handle the
      idle cputime accounting without the tick.
      
      But sometimes we need to schedule the next tick to some time in
      the future instead of completely stopping it. In this case, a
      tick may happen before we restart the periodic behaviour and
      from that tick we account one jiffy to idle cputime as usual but
      we also increment the ts->idle_jiffies snapshot by one so that
      when we compute the delta to account, we substract the one jiffy
      we just accounted.
      
      To prepare for stopping the tick outside idle, we introduced a
      check that prevents from fixing up that ts->idle_jiffies if we
      are not running the idle task. But we use idle_cpu() for that
      and this is a problem if we run the tick while another CPU
      remotely enqueues a ttwu to our runqueue:
      
      CPU 0:                            CPU 1:
      
      tick_sched_timer() {              ttwu_queue_remote()
             if (idle_cpu(CPU 0))
                 ts->idle_jiffies++;
      }
      
      Here, idle_cpu() notes that &rq->wake_list is not empty and
      hence won't consider the CPU as idle. As a result,
      ts->idle_jiffies won't be incremented. But this is wrong because
      we actually account the current jiffy to idle cputime. And that
      jiffy won't get substracted from the nohz time delta. So in the
      end, this jiffy is accounted twice.
      
      Fix this by changing idle_cpu(smp_processor_id()) with
      is_idle_task(current). This way the jiffy is substracted
      correctly even if a ttwu operation is enqueued on the CPU.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 3.5+
      Link: http://lkml.kernel.org/r/1349308004-3482-1-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2b17c545
  8. 23 9月, 2012 1 次提交
    • P
      time: RCU permitted to stop idle entry via softirq · 803b0eba
      Paul E. McKenney 提交于
      The can_stop_idle_tick() function complains if a softirq vector is
      raised too late in the idle-entry process, presumably in order to
      prevent dangling softirq invocations from being delayed across the
      full idle period, which might be indefinitely long -- and if softirq
      was asserted any later than the call to this function, such a delay
      might well happen.
      
      However, RCU needs to be able to use softirq to stop idle entry in
      order to be able to drain RCU callbacks from the current CPU, which in
      turn enables faster entry into dyntick-idle mode, which in turn reduces
      power consumption.  Because RCU takes this action at a well-defined
      point in the idle-entry path, it is safe for RCU to take this approach.
      
      This commit therefore silences the error message that is sometimes
      produced when the going-idle CPU suddenly finds that it has an RCU_SOFTIRQ
      to process.  The error message will continue to be issued for other
      softirq vectors.
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      803b0eba
  9. 13 9月, 2012 1 次提交
  10. 04 9月, 2012 1 次提交
  11. 06 7月, 2012 1 次提交
  12. 03 7月, 2012 1 次提交
  13. 12 6月, 2012 5 次提交
    • F
      nohz: Move next idle expiry time record into idle logic area · 84bf1bcc
      Frederic Weisbecker 提交于
      The next idle expiry time record and idle sleeps tracking are
      statistics that only concern idle.
      
      Since we want the nohz APIs to become usable further idle
      context, let's pull up the handling of these statistics to the
      callers in idle.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      84bf1bcc
    • F
      nohz: Move ts->idle_calls incrementation into strict idle logic · 5b39939a
      Frederic Weisbecker 提交于
      Since we want to prepare for making the nohz API to work further
      the idle case, we need to pull ts->idle_calls incrementation up to
      the callers in idle.
      
      To perform this, we split tick_nohz_stop_sched_tick() in two parts:
      a first one that checks if we can really stop the tick for idle,
      and another that actually stops it. Then from the callers in idle,
      we check if we can stop the tick and only then we increment idle_calls
      and finally relay to the nohz API that won't care about these details
      anymore.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      5b39939a
    • F
      nohz: Rename ts->idle_tick to ts->last_tick · f5d411c9
      Frederic Weisbecker 提交于
      Now that idle and nohz logics are going to be independant each others,
      ts->idle_tick becomes too much a biased name to describe the field that
      saves the last scheduled tick on top of which we re-calculate the next
      tick to schedule when the timer is restarted.
      
      We want to reuse this even to stop the tick outside idle cases. So let's
      rename it to some more generic name: ts->last_tick.
      
      This changes a bit the timer list stat export so we need to increase its
      version.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      f5d411c9
    • F
      nohz: Make nohz API agnostic against idle ticks cputime accounting · 2ac0d98f
      Frederic Weisbecker 提交于
      When the timer tick fires, it accounts the new jiffy as either part
      of system, user or idle time. This is how we record the cputime
      statistics.
      
      But when the tick is stopped from the idle task, we still need
      to record the number of jiffies spent tickless until we restart
      the tick and fall back to traditional tick-based cputime accounting.
      
      To do this, we take a snapshot of jiffies when the tick is stopped
      and compute the difference against the new value of jiffies when
      the tick is restarted. Then we account this whole difference to
      the idle cputime.
      
      However we are preparing to be able to stop the tick from other places
      than idle. So this idle time accounting needs to be performed from
      the callers of nohz APIs, not from the nohz APIs themselves because
      we now want them to be agnostic against places that stop/restart tick.
      
      Therefore, we pull the tickless idle time accounting out of generic
      nohz helpers up to idle entry/exit callers.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      2ac0d98f
    • F
      nohz: Separate idle sleeping time accounting from nohz logic · 19f5f736
      Frederic Weisbecker 提交于
      As we plan to be able to stop the tick outside the idle task, we
      need to prepare for separating nohz logic from idle. As a start,
      this pulls the idle sleeping time accounting out of the tick
      stop/restart API to the callers on idle entry/exit.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      19f5f736
  14. 07 6月, 2012 1 次提交
    • P
      rcu: Precompute RCU_FAST_NO_HZ timer offsets · aa9b1630
      Paul E. McKenney 提交于
      When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
      calls rcu_needs_cpu() see if RCU needs that CPU, and, if not, computes the
      next wakeup time based on the timer wheels.  Only later, when actually
      entering the idle loop, rcu_prepare_for_idle() will be invoked.  In some
      cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
      But all for naught: The next wakeup time for the CPU has already been
      computed, and posting a timer afterwards does not force that wakeup
      time to be recomputed.  This means that rcu_prepare_for_idle()'s have
      no effect.
      
      This is not a problem on a busy system because something else will wake
      up the CPU soon enough.  However, on lightly loaded systems, the CPU
      might stay asleep for a considerable length of time.  If that CPU has
      a callback that the rest of the system is waiting on, the system might
      run very slowly or (in theory) even hang.
      
      This commit avoids this problem by having rcu_needs_cpu() give
      tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
      to wake back up, which tick_nohz_stop_sched_tick() takes into account
      when programming the CPU's wakeup time.  An alternative approach is
      for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
      but timers are much more efficient than are hrtimers for frequently
      and repeatedly posting and cancelling a given timer, which is exactly
      what RCU_FAST_NO_HZ does.
      Reported-by: NPascal Chapperon <pascal.chapperon@wanadoo.fr>
      Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: NPascal Chapperon <pascal.chapperon@wanadoo.fr>
      aa9b1630
  15. 30 5月, 2012 1 次提交
  16. 25 5月, 2012 2 次提交
    • T
      tick: Move skew_tick option into the HIGH_RES_TIMER section · 62cf20b3
      Thomas Gleixner 提交于
      commit 5307c955 (tick: Add tick skew boot option) broke the
      !CONFIG_HIGH_RES_TIMERS build.
      
      Move the boot option parsing into the CONFIG_HIGH_RES_TIMERS section.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      62cf20b3
    • M
      tick: Add tick skew boot option · 5307c955
      Mike Galbraith 提交于
      Let the user decide whether power consumption or jitter is the
      more important consideration for their machines.
      
      Quoting removal commit af5ab277:
      
      "Historically, Linux has tried to make the regular timer tick on the
       various CPUs not happen at the same time, to avoid contention on
       xtime_lock.
          
       Nowadays, with the tickless kernel, this contention no longer happens
       since time keeping and updating are done differently. In addition,
       this skew is actually hurting power consumption in a measurable way on
       many-core systems."
      
      Problems:
      
      - Contrary to the above, systems do encounter contention on both
        xtime_lock and RCU structure locks when the tick is synchronized.
        
      - Moderate sized RT systems suffer intolerable jitter due to the tick
        being synchronized.
      
      - SGI reports the same for their large systems.
      
      - Fully utilized systems reap no power saving benefit from skew removal,
        but do suffer from resulting induced lock contention.
      
      - 0209f649 rcu: limit rcu_node leaf-level fanout
        This patch was born to combat lock contention which testing showed
        to have been _induced by_ skew removal.  Skew the tick, contention
        disappeared virtually completely.
      Signed-off-by: NMike Galbraith <mgalbraith@suse.de>
      Link: http://lkml.kernel.org/r/1336472458.21924.78.camel@marge.simpson.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5307c955
  17. 06 4月, 2012 1 次提交
    • N
      nohz: Fix stale jiffies update in tick_nohz_restart() · 6f103929
      Neal Cardwell 提交于
      Fix tick_nohz_restart() to not use a stale ktime_t "now" value when
      calling tick_do_update_jiffies64(now).
      
      If we reach this point in the loop it means that we crossed a tick
      boundary since we grabbed the "now" timestamp, so at this point "now"
      refers to a time in the old jiffy, so using the old value for "now" is
      incorrect, and is likely to give us a stale jiffies value.
      
      In particular, the first time through the loop the
      tick_do_update_jiffies64(now) call is always a no-op, since the
      caller, tick_nohz_restart_sched_tick(), will have already called
      tick_do_update_jiffies64(now) with that "now" value.
      
      Note that tick_nohz_stop_sched_tick() already uses the correct
      approach: when we notice we cross a jiffy boundary, grab a new
      timestamp with ktime_get(), and *then* update jiffies.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1332875377-23014-1-git-send-email-ncardwell@google.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      6f103929
  18. 15 2月, 2012 2 次提交
  19. 12 12月, 2011 4 次提交
    • F
      nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7
      Frederic Weisbecker 提交于
      Those two APIs were provided to optimize the calls of
      tick_nohz_idle_enter() and rcu_idle_enter() into a single
      irq disabled section. This way no interrupt happening in-between would
      needlessly process any RCU job.
      
      Now we are talking about an optimization for which benefits
      have yet to be measured. Let's start simple and completely decouple
      idle rcu and dyntick idle logics to simplify.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1268fbc7
    • F
      nohz: Allow rcu extended quiescent state handling seperately from tick stop · 2bbb6817
      Frederic Weisbecker 提交于
      It is assumed that rcu won't be used once we switch to tickless
      mode and until we restart the tick. However this is not always
      true, as in x86-64 where we dereference the idle notifiers after
      the tick is stopped.
      
      To prepare for fixing this, add two new APIs:
      tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
      
      If no use of RCU is made in the idle loop between
      tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
      must instead call the new *_norcu() version such that the arch doesn't
      need to call rcu_idle_enter() and rcu_idle_exit().
      
      Otherwise the arch must call tick_nohz_enter_idle() and
      tick_nohz_exit_idle() and also call explicitly:
      
      - rcu_idle_enter() after its last use of RCU before the CPU is put
      to sleep.
      - rcu_idle_exit() before the first use of RCU after the CPU is woken
      up.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      2bbb6817
    • F
      nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Frederic Weisbecker 提交于
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop to start the dytick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only few minor differences between both that
      are handled by that function, driven by the ts->inidle
      cpu variable and the inidle parameter. The whole guarantees
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter in RCU extended
      quiescent state from idle loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      into tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimize the irq exit path (no need
      for local_irq_save there). This also prepares for the split between
      dynticks and rcu extended quiescent state logics. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      280f0677
    • P
      rcu: Track idleness independent of idle tasks · 9b2e4f18
      Paul E. McKenney 提交于
      Earlier versions of RCU used the scheduling-clock tick to detect idleness
      by checking for the idle task, but handled idleness differently for
      CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
      critical sections in the idle task, for example, for tracing.  A more
      fine-grained detection of idleness is therefore required.
      
      This commit presses the old dyntick-idle code into full-time service,
      so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
      always invoked at the beginning of an idle loop iteration.  Similarly,
      rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
      at the end of an idle-loop iteration.  This allows the idle task to
      use RCU everywhere except between consecutive rcu_idle_enter() and
      rcu_idle_exit() calls, in turn allowing architecture maintainers to
      specify exactly where in the idle loop that RCU may be used.
      
      Because some of the userspace upcall uses can result in what looks
      to RCU like half of an interrupt, it is not possible to expect that
      the irq_enter() and irq_exit() hooks will give exact counts.  This
      patch therefore expands the ->dynticks_nesting counter to 64 bits
      and uses two separate bitfields to count process/idle transitions
      and interrupt entry/exit transitions.  It is presumed that userspace
      upcalls do not happen in the idle loop or from usermode execution
      (though usermode might do a system call that results in an upcall).
      The counter is hard-reset on each process/idle transition, which
      avoids the interrupt entry/exit error from accumulating.  Overflow
      is avoided by the 64-bitness of the ->dyntick_nesting counter.
      
      This commit also adds warnings if a non-idle task asks RCU to enter
      idle state (and these checks will need some adjustment before applying
      Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
      In addition, validation of ->dynticks and ->dynticks_nesting is added.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      9b2e4f18
  20. 06 12月, 2011 1 次提交
  21. 29 9月, 2011 1 次提交
  22. 08 9月, 2011 3 次提交
  23. 01 2月, 2011 1 次提交
  24. 20 1月, 2011 1 次提交
    • S
      hrtimers: Notify hrtimer users of switches to NOHZ mode · 2d0640b4
      Stephen Boyd 提交于
      When NOHZ=y and high res timers are disabled (via cmdline or
      Kconfig) tick_nohz_switch_to_nohz() will notify the user about
      switching into NOHZ mode. Nothing is printed for the case where
      HIGH_RES_TIMERS=y. Fix this for the HIGH_RES_TIMERS=y case by
      duplicating the printk from the low res NOHZ path in the high
      res NOHZ path.
      
      This confused me since I was thinking 'dmesg | grep -i NOHZ' would
      tell me if NOHZ was enabled, but if I have hrtimers there is
      nothing.
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1295419594-13085-1-git-send-email-sboyd@codeaurora.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2d0640b4
  25. 03 8月, 2010 1 次提交
  26. 17 7月, 2010 1 次提交
  27. 01 7月, 2010 1 次提交
    • P
      sched: Cure nr_iowait_cpu() users · 8c215bd3
      Peter Zijlstra 提交于
      Commit 0224cf4c (sched: Intoduce get_cpu_iowait_time_us())
      broke things by not making sure preemption was indeed disabled
      by the callers of nr_iowait_cpu() which took the iowait value of
      the current cpu.
      
      This resulted in a heap of preempt warnings. Cure this by making
      nr_iowait_cpu() take a cpu number and fix up the callers to pass
      in the right number.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: linux-pm@lists.linux-foundation.org
      LKML-Reference: <1277968037.1868.120.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8c215bd3