1. 07 10月, 2009 1 次提交
    • E
      NOHZ: update idle state also when NOHZ is inactive · fdc6f192
      Eero Nurkkala 提交于
      Commit f2e21c96 had unfortunate side
      effects with cpufreq governors on some systems.
      
      If the system did not switch into NOHZ mode ts->inidle is not set when
      tick_nohz_stop_sched_tick() is called from the idle routine. Therefor
      all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
      fail to call tick_nohz_start_idle(). This results in bogus idle
      accounting information which is passed to cpufreq governors.
      
      Set the inidle flag unconditionally of the NOHZ active state to keep
      the idle time accounting correct in any case.
      
      [ tglx: Added comment and tweaked the changelog ]
      Reported-by: NSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: NEero Nurkkala <ext-eero.nurkkala@nokia.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: stable@kernel.org
      LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      fdc6f192
  2. 27 5月, 2009 1 次提交
  3. 13 5月, 2009 1 次提交
  4. 15 1月, 2009 1 次提交
  5. 31 12月, 2008 2 次提交
    • M
      [PATCH] idle cputime accounting · 79741dd3
      Martin Schwidefsky 提交于
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong, the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition idle time is no more added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      79741dd3
    • M
      [PATCH] fix scaled & unscaled cputime accounting · 457533a7
      Martin Schwidefsky 提交于
      The utimescaled / stimescaled fields in the task structure and the
      global cpustat should be set on all architectures. On s390 the calls
      to account_user_time_scaled and account_system_time_scaled never have
      been added. In addition system time that is accounted as guest time
      to the user time of a process is accounted to the scaled system time
      instead of the scaled user time.
      To fix the bugs and to prevent future forgetfulness this patch merges
      account_system_time_scaled into account_system_time and
      account_user_time_scaled into account_user_time.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      457533a7
  6. 12 12月, 2008 2 次提交
    • W
      nohz: suppress needless timer reprogramming · 00147449
      Woodruff, Richard 提交于
      In my device I get many interrupts from a high speed USB device in a very
      short period of time.  The system spends a lot of time reprogramming the
      hardware timer which is in a slower timing domain as compared to the CPU. 
      This results in the CPU spending a huge amount of time waiting for the
      timer posting to be done.  All of this reprogramming is useless as the
      wake up time has not changed.
      
      As measured using ETM trace this drops my reprogramming penalty from
      almost 60% CPU load down to 15% during high interrupt rate.  I can send
      traces to show this.
      
      Suppress setting of duplicate timer event when timer already stopped. 
      Timer programming can be very costly and can result in long cpu stall/wait
      times.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [tglx@linutronix.de: move the check to the right place and avoid raising
      		     the softirq for nothing]
      Signed-off-by: NRichard Woodruff <r-woodruff2@ti.com>
      Cc: johnstul@us.ibm.com
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      00147449
    • H
      nohz: no softirq pending warnings for offline cpus · fa116ea3
      Heiko Carstens 提交于
      Impact: remove false positive warning
      
      After a cpu was taken down during cpu hotplug (read: disabled for interrupts)
      it still might have pending softirqs. However take_cpu_down makes sure
      that the idle task will run next instead of ksoftirqd on the taken down cpu.
      The idle task will call tick_nohz_stop_sched_tick which might warn about
      pending softirqs just before the cpu kills itself completely.
      
      However the pending softirqs on the dead cpu aren't a problem because they
      will be moved to an online cpu during CPU_DEAD handling.
      
      So make sure we warn only for online cpus.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa116ea3
  7. 25 11月, 2008 2 次提交
    • P
      hrtimer: removing all ur callback modes · ca109491
      Peter Zijlstra 提交于
      Impact: cleanup, move all hrtimer processing into hardirq context
      
      This is an attempt at removing some of the hrtimer complexity by
      reducing the number of callback modes to 1.
      
      This means that all hrtimer callback functions will be ran from HARD-irq
      context.
      
      I went through all the 30 odd hrtimer callback functions in the kernel
      and saw only one that I'm not quite sure of, which is the one in
      net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.
      
      Furthermore, the hrtimer core now calls callbacks directly with IRQs
      disabled in case you try to enqueue an expired timer. If this timer is a
      periodic timer (which should use hrtimer_forward() to advance its time)
      then it might be possible to end up in an inf. recursive loop due to the
      fact that hrtimer_forward() doesn't round up to the next timer
      granularity, and therefore keeps on calling the callback - obviously
      this needs a fix.
      
      Aside from that, this seems to compile and actually boot on my dual core
      test box - although I'm sure there are some bugs in, me not hitting any
      makes me certain :-)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ca109491
    • R
      sched: convert nohz_cpu_mask to cpumask_var_t. · 6a7b3dc3
      Rusty Russell 提交于
      Impact: (future) size reduction for large NR_CPUS.
      
      Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves
      space for small nr_cpu_ids but big CONFIG_NR_CPUS.  cpumask_var_t
      is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6a7b3dc3
  8. 11 11月, 2008 1 次提交
    • T
      nohz: disable tick_nohz_kick_tick() for now · ae99286b
      Thomas Gleixner 提交于
      Impact: nohz powersavings and wakeup regression
      
      commit fb02fbc1 (NOHZ: restart tick
      device from irq_enter()) causes a serious wakeup regression.
      
      While the patch is correct it does not take into account that spurious
      wakeups happen on x86. A fix for this issue is available, but we just
      revert to the .27 behaviour and let long running softirqs screw
      themself.
      
      Disable it for now.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      ae99286b
  9. 22 10月, 2008 1 次提交
    • T
      NOHZ: fix thinko in the timer restart code path · c4bd822e
      Thomas Gleixner 提交于
      commit fb02fbc1 (NOHZ: restart tick
      device from irq_enter())
      
      solves the problem of stale jiffies when long running softirqs happen
      in a long idle sleep period, but it has a major thinko in it:
      
      When the interrupt which came in _is_ the timer interrupt which should
      expire ts->sched_timer then we cancel and rearm the timer _before_ it
      gets expired in hrtimer_interrupt() to the next period. That means the
      call back function is not called. This game can go on for ever :(
      
      Prevent this by making sure to only rearm the timer when the expiry
      time is more than one tick_period away. Otherwise keep it running as
      it is either already expired or will expiry at the right point to
      update jiffies.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NVenkatesch Pallipadi <venkatesh.pallipadi@intel.com>
      c4bd822e
  10. 18 10月, 2008 3 次提交
    • T
      NOHZ: restart tick device from irq_enter() · fb02fbc1
      Thomas Gleixner 提交于
      We did not restart the tick device from irq_enter() to avoid double
      reprogramming and extra events in the return immediate to idle case.
      
      But long lasting softirqs can lead to a situation where jiffies become
      stale:
      
      idle()
        tick stopped (reprogrammed to next pending timer)
        halt()
         interrupt
           jiffies updated from irq_enter()
           interrupt handler
           softirq function 1 runs 20ms
           softirq function 2 arms a 10ms timer with a stale jiffies value
           jiffies updated from irq_exit()
           timer wheel has now an already expired timer
           (the one added in function 2)
           timer fires and timer softirq runs
      
      This was discovered when debugging a timer problem which happend only
      when the ath5k driver is active. The debugging proved that there is a
      softirq function running for more than 20ms, which is a bug by itself.
      
      To solve this we restart the tick timer right from irq_enter(), but do
      not go through the other functions which are necessary to return from
      idle when need_resched() is set.
      Reported-by: NElias Oltmanns <eo@nebensachen.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NElias Oltmanns <eo@nebensachen.de>
      fb02fbc1
    • T
      NOHZ: split tick_nohz_restart_sched_tick() · c34bec5a
      Thomas Gleixner 提交于
      Split out the clock event device reprogramming. Preparatory
      patch.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c34bec5a
    • T
      NOHZ: unify the nohz function calls in irq_enter() · 719254fa
      Thomas Gleixner 提交于
      We have two separate nohz function calls in irq_enter() for no good
      reason. Just call a single NOHZ function from irq_enter() and call
      the bits in the tick code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      719254fa
  11. 10 10月, 2008 1 次提交
  12. 29 9月, 2008 1 次提交
    • T
      hrtimer: prevent migration of per CPU hrtimers · ccc7dadf
      Thomas Gleixner 提交于
      Impact: per CPU hrtimers can be migrated from a dead CPU
      
      The hrtimer code has no knowledge about per CPU timers, but we need to
      prevent the migration of such timers and warn when such a timer is
      active at migration time.
      
      Explicitely mark the timers as per CPU and use a more understandable
      mode descriptor for the interrupts safe unlocked callback mode, which
      is used by hrtimer_sleeper and the scheduler code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      ccc7dadf
  13. 23 9月, 2008 2 次提交
    • T
      clockevents: prevent stale tick_next_period for onlining CPUs · 49d670fb
      Thomas Gleixner 提交于
      Impact: possible hang on CPU onlining in timer one shot mode.
      
      The tick_next_period variable is only used during boot on nohz/highres
      enabled systems, but for CPU onlining it needs to be maintained when
      the per cpu clock events device operates in one shot mode.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      49d670fb
    • T
      clockevents: prevent cpu online to interfere with nohz · 6441402b
      Thomas Gleixner 提交于
      Impact: rare hang which can be triggered on CPU online.
      
      tick_do_timer_cpu keeps track of the CPU which updates jiffies
      via do_timer. The value -1 is used to signal, that currently no
      CPU is doing this. There are two cases, where the variable can 
      have this state:
      
       boot:
          necessary for systems where the boot cpu id can be != 0
      
       nohz long idle sleep:
          When the CPU which did the jiffies update last goes into
          a long idle sleep it drops the update jiffies duty so
          another CPU which is not idle can pick it up and keep
          jiffies going.
      
      Using the same value for both situations is wrong, as the CPU online
      code can see the -1 state when the timer of the newly onlined CPU is
      setup. The setup for a newly onlined CPU goes through periodic mode
      and can pick up the do_timer duty without being aware of the nohz /
      highres mode of the already running system.
      
      Use two separate states and make them constants to avoid magic
      numbers confusion. 
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6441402b
  14. 06 9月, 2008 2 次提交
  15. 21 8月, 2008 1 次提交
  16. 11 8月, 2008 1 次提交
  17. 31 7月, 2008 1 次提交
    • I
      sched clock: revert various sched_clock() changes · e4e4e534
      Ingo Molnar 提交于
      Found an interactivity problem on a quad core test-system - simple
      CPU loops would occasionally delay the system un an unacceptable way.
      
      After much debugging with Peter Zijlstra it turned out that the problem
      is caused by the string of sched_clock() changes - they caused the CPU
      clock to jump backwards a bit - which confuses the scheduler arithmetics.
      
      (which is unsigned for performance reasons)
      
      So revert:
      
       # c300ba25: sched_clock: and multiplier for TSC to gtod drift
       # c0c87734: sched_clock: only update deltas with local reads.
       # af52a90a: sched_clock: stop maximum check on NO HZ
       # f7cce27f: sched_clock: widen the max and min time
      
      This solves the interactivity problems.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NMike Galbraith <efault@gmx.de>
      e4e4e534
  18. 19 7月, 2008 1 次提交
    • T
      nohz: prevent tick stop outside of the idle loop · b8f8c3cf
      Thomas Gleixner 提交于
      Jack Ren and Eric Miao tracked down the following long standing
      problem in the NOHZ code:
      
      	scheduler switch to idle task
      	enable interrupts
      
      Window starts here
      
      	----> interrupt happens (does not set NEED_RESCHED)
      	      	irq_exit() stops the tick
      
      	----> interrupt happens (does set NEED_RESCHED)
      
      	return from schedule()
      	
      	cpu_idle(): preempt_disable();
      
      Window ends here
      
      The interrupts can happen at any point inside the race window. The
      first interrupt stops the tick, the second one causes the scheduler to
      rerun and switch away from idle again and we end up with the tick
      disabled.
      
      The fact that it needs two interrupts where the first one does not set
      NEED_RESCHED and the second one does made the bug obscure and extremly
      hard to reproduce and analyse. Kudos to Jack and Eric.
      
      Solution: Limit the NOHZ functionality to the idle loop to make sure
      that we can not run into such a situation ever again.
      
      cpu_idle()
      {
      	preempt_disable();
      
      	while(1) {
      		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
      		 			          are in the idle loop
      
      		 while (!need_resched())
      		       halt();
      
      		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
      		 preempt_enable_no_resched();
      		 schedule();
      		 preempt_disable();
      	}
      }
      
      In hindsight we should have done this forever, but ... 
      
      /me grabs a large brown paperbag.
      
      Debugged-by: Jack Ren <jack.ren@marvell.com>, 
      Debugged-by: Neric miao <eric.y.miao@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b8f8c3cf
  19. 11 7月, 2008 2 次提交
    • S
      sched_clock: stop maximum check on NO HZ · af52a90a
      Steven Rostedt 提交于
      Working with ftrace I would get large jumps of 11 millisecs or more with
      the clock tracer. This killed the latencing timings of ftrace and also
      caused the irqoff self tests to fail.
      
      What was happening is with NO_HZ the idle would stop the jiffy counter and
      before the jiffy counter was updated the sched_clock would have a bad
      delta jiffies to compare with the gtod with the maximum.
      
      The jiffies would stop and the last sched_tick would record the last gtod.
      On wakeup, the sched clock update would compare the gtod + delta jiffies
      (which would be zero) and compare it to the TSC. The TSC would have
      correctly (with a stable TSC) moved forward several jiffies. But because the
      jiffies has not been updated yet the clock would be prevented from moving
      forward because it would appear that the TSC jumped too far ahead.
      
      The clock would then virtually stop, until the jiffies are updated. Then
      the next sched clock update would see that the clock was very much behind
      since the delta jiffies is now correct. This would then jump the clock
      forward by several jiffies.
      
      This caused ftrace to report several milliseconds of interrupts off
      latency at every resume from NO_HZ idle.
      
      This patch adds hooks into the nohz code to disable the checking of the
      maximum clock update when nohz is in effect. It resumes the max check
      when nohz has updated the jiffies again.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      af52a90a
    • H
      nohz: don't stop idle tick if softirqs are pending. · 857f3fd7
      Heiko Carstens 提交于
      In case a cpu goes idle but softirqs are pending only an error message is
      printed to the console. It may take a very long time until the pending
      softirqs will finally be executed. Worst case would be a hanging system.
      
      With this patch the timer tick just continues and the softirqs will be
      executed after the next interrupt. Still a delay but better than a
      hanging system.
      
      Currently we have at least two device drivers on s390 which under certain
      circumstances schedule a tasklet from process context. This is a reason
      why we can end up with pending softirqs when going idle. Fixing these
      drivers seems to be non-trivial.
      However there is no question that the drivers should be fixed.
      This patch shouldn't be considered as a bug fix. It just is intended to
      keep a system running even if device drivers are buggy.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Glauber <jan.glauber@de.ibm.com>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      857f3fd7
  20. 30 5月, 2008 2 次提交
  21. 25 4月, 2008 1 次提交
    • I
      softlockup: fix NOHZ wakeup · 126e01bf
      Ingo Molnar 提交于
      David Miller reported:
      
      |--------------->
      the following commit:
      
      | commit 27ec4407
      | Author: Ingo Molnar <mingo@elte.hu>
      | Date:   Thu Feb 28 21:00:21 2008 +0100
      |
      |     sched: make cpu_clock() globally synchronous
      |
      |     Alexey Zaytsev reported (and bisected) that the introduction of
      |     cpu_clock() in printk made the timestamps jump back and forth.
      |
      |     Make cpu_clock() more reliable while still keeping it fast when it's
      |     called frequently.
      |
      |     Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      causes watchdog triggers when a cpu exits NOHZ state when it has been
      there for >= the soft lockup threshold, for example here are some
      messages from a 128 cpu Niagara2 box:
      
      [  168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239]
      [  168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0]
      [  168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511]
      [  168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0]
      [  169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0]
      [  169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515]
      [  169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0]
      [  169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0]
      [  169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0]
      [  169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0]
      [  169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239]
      [  169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0]
      [  169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239]
      [  169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565]
      [  169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0]
      [  169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0]
      [  169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0]
      [  169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0]
      
      It's simple enough to trigger this by doing a 10 minute sleep after a
      fresh bootup then starting a parallel kernel build.
      
      I suspect this might be reintroducing a problem we've had and fixed
      before, see the thread:
      
      http://marc.info/?l=linux-kernel&m=119546414004065&w=2
      <---------------|
      
      touch the softlockup watchdog when exiting NOHZ state - we are
      obviously not locked up.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      126e01bf
  22. 20 4月, 2008 1 次提交
  23. 17 4月, 2008 1 次提交
  24. 09 3月, 2008 1 次提交
  25. 01 3月, 2008 1 次提交
  26. 09 2月, 2008 1 次提交
  27. 02 2月, 2008 1 次提交
  28. 30 1月, 2008 3 次提交
  29. 26 1月, 2008 1 次提交