1. Feb 15, 2014 (1 commit)
    • timer: Spare IPI when deferrable timer is queued on idle remote targets · 8ba14654
      Authored by Viresh Kumar
      When a timer is enqueued or modified on a remote target, the latter is
      expected to see and handle this timer on its next tick. However if the
      target is idle and CONFIG_NO_HZ_IDLE=y, the CPU may be sleeping tickless
      and the timer may be ignored.
      
      wake_up_nohz_cpu() takes care of that by setting TIF_NEED_RESCHED and
      sending an IPI to idle targets so that the tick is reevaluated on the
      idle loop through the tick_nohz_idle_*() APIs.
      
      Now this is all performed regardless of the power properties of the
      timer. If the timer is deferrable, idle targets don't need to be woken
      up. Only the next busy tick needs to care about it, and no IPI kick
      is needed for that to happen.
      
      So let's spare the IPI on idle targets when the timer is deferrable.
      
      Meanwhile we keep the current behaviour on full dynticks targets. We
      could spare IPIs on idle full dynticks targets as well, but some tricky
      races against idle_cpu() must be dealt with to make sure that the timer
      is properly handled after idle exit. We can deal with that later, since
      NO_HZ_FULL already has more important powersaving issues.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/CAKohpomMZ0TAN2e6N76_g4ZRzxd5vZ1XfuZfxrP7GMxfTNiLVw@mail.gmail.com
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      8ba14654
  2. Nov 19, 2013 (1 commit)
  3. Sep 25, 2013 (1 commit)
  4. Jul 15, 2013 (1 commit)
    • kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Authored by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  The fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  5. Jun 28, 2013 (1 commit)
  6. May 14, 2013 (1 commit)
    • timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE · 42a5cf46
      Authored by Tirupathi Reddy
      An inactive timer's base can refer to an offline cpu's base.
      
      In the current code, a cpu_base's lock is blindly reinitialized each
      time a CPU is brought up. If a CPU is brought online while another
      thread is modifying an inactive timer on that CPU and holding its
      timer base lock, the lock is reinitialized under that thread's feet.
      This leads to the following SPIN_BUG():
      
      <0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
      <0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
      <4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
      <4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
      <4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
      <4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
      <4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
      <4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
      <4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
      <4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
      <4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
      <4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
      <4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
      <4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
      <4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)
      
      As an example, this particular crash occurred when CPU #3 was executing
      mod_timer() on an inactive timer whose base referred to the offlined CPU
      #2.  The code locked the timer_base corresponding to CPU #2. Before it
      could proceed, CPU #2 came online and reinitialized the spinlock
      corresponding to its base. Thus CPU #3 now held a lock which had been
      reinitialized. When CPU #3 finally ended up unlocking the old cpu_base
      corresponding to CPU #2, we hit the above SPIN_BUG().
      
      CPU #0		CPU #3				       CPU #2
      ------		-------				       -------
      .....		 ......				      <Offline>
      		mod_timer()
      		 lock_timer_base
      		   spin_lock_irqsave(&base->lock)
      
      cpu_up(2)	 .....				        ......
      							init_timers_cpu()
      ....		 .....				    	spin_lock_init(&base->lock)
      .....		   spin_unlock_irqrestore(&base->lock)  ......
      		   <spin_bug>
      
      Allocation of the per_cpu timer vector bases is done only once, under
      the "tvec_base_done[]" check. In the current code, the spinlock
      initialization of base->lock isn't under this check, so the base lock
      is reinitialized every time a CPU comes up. Move the base spinlock
      initialization under the check.
      Signed-off-by: Tirupathi Reddy <tirupath@codeaurora.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      42a5cf46
  7. May 01, 2013 (3 commits)
  8. Apr 03, 2013 (1 commit)
    • nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024
      Authored by Frederic Weisbecker
      We are planning to convert the dynticks Kconfig options layout
      into a choice menu. The user must be able to easily pick
      any of the following implementations: constant periodic tick,
      idle dynticks, full dynticks.
      
      As this implies a mutual exclusion, the two dynticks implementations
      need to converge on the selection of a common Kconfig option in order
      to ease the sharing of a common infrastructure.
      
      It would thus seem pretty natural to reuse CONFIG_NO_HZ to
      that end. It already implements all the idle dynticks code
      and the full dynticks depends on all that code for now.
      So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
      CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
      
      On the other hand we want to stay backward compatible: if
      CONFIG_NO_HZ is set in an older config file, we want to
      enable CONFIG_NO_HZ_IDLE by default.
      
      But we can't afford both at the same time or we run into
      a circular dependency:
      
      1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
         CONFIG_NO_HZ
      2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
      
      We might be able to support that from Kconfig/Kbuild but it
      may not be wise to introduce such a confusing behaviour.
      
      So to solve this, create a new CONFIG_NO_HZ_COMMON option
      which gathers the common code between idle and full dynticks
      (that common code for now is simply the idle dynticks code)
      and select it from their referring Kconfig.
      
      Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
      to it for backward compatibility.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3451d024
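A simplified Kconfig sketch of the layout this commit works toward (hypothetical fragment, not the real definitions, which live in the time Kconfig files; NO_HZ_EXTENDED was later renamed NO_HZ_FULL). Both flavours select the common option, while legacy NO_HZ maps to NO_HZ_IDLE without creating a select cycle:

```kconfig
# Common infrastructure shared by both dynticks flavours
config NO_HZ_COMMON
	bool

config NO_HZ_IDLE
	bool "Idle dynticks system (tickless idle)"
	select NO_HZ_COMMON

config NO_HZ_EXTENDED
	bool "Full dynticks system"
	select NO_HZ_COMMON

# Backward compatibility: old configs that set NO_HZ get the idle
# dynticks behaviour, without NO_HZ itself being a select target.
config NO_HZ
	bool "Old Idle dynticks config"
	select NO_HZ_IDLE
```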
  9. Mar 21, 2013 (1 commit)
    • nohz: Wake up full dynticks CPUs when a timer gets enqueued · 1c20091e
      Authored by Frederic Weisbecker
      Wake up a CPU when a timer list timer is enqueued there and
      the target is part of the full dynticks range. Sending an IPI
      to it makes it reconsider the next timer to program on top
      of recent updates.
      
      This may later be improved by checking if the tick is really
      stopped on the target. This would need some careful
      synchronization though. So deal with such optimization later
      and start simple.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1c20091e
  10. Feb 08, 2013 (1 commit)
  11. Nov 18, 2012 (1 commit)
    • printk: Wake up klogd using irq_work · 74876a98
      Authored by Frederic Weisbecker
      klogd is woken up asynchronously from the tick so that the wakeup
      can be done safely.
      
      However if printk is called while the tick is stopped, the reader
      won't be woken up until the next interrupt, which might not fire
      for a while. As a result, the user may miss some messages.
      
      To fix this, let's implement the printk tick using a lazy irq work.
      This subsystem takes care of the timer tick state and can
      fix up accordingly.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      74876a98
  12. Oct 10, 2012 (1 commit)
  13. Aug 21, 2012 (3 commits)
    • timer: Implement TIMER_IRQSAFE · c5f66e99
      Authored by Tejun Heo
      Timer internals are protected with irq-safe locks but timer execution
      isn't, so a timer being dequeued for execution and its execution
      aren't atomic against IRQs.  This makes it impossible to wait for its
      completion from IRQ handlers and difficult to shoot down a timer from
      IRQ handlers.
      
      This caused problems for the delayed_work interface.  Because
      there's no way to reliably shoot down delayed_work->timer from IRQ
      handlers, __cancel_delayed_work() can't share the logic to steal the
      target delayed_work with cancel_delayed_work_sync(), and can only
      steal delayed_works which are queued on the timer.  Similarly, the
      pending mod_delayed_work() can't be used from IRQ handlers.
      
      This patch adds a new timer flag, TIMER_IRQSAFE, which causes the
      timer to be executed without enabling IRQs after dequeueing, such
      that its dequeueing and execution are atomic against IRQ handlers.
      
      This makes it safe to wait for the timer's completion from IRQ
      handlers, for example, using del_timer_sync().  It can never be
      executing on the local CPU and if executing on other CPUs it won't be
      interrupted until done.
      
      This will enable simplifying delayed_work cancel/mod interface.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: torvalds@linux-foundation.org
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c5f66e99
    • timer: Clean up timer initializers · fc683995
      Authored by Tejun Heo
      Over time, the timer initializers became messy, with unnecessarily
      duplicated code inconsistently spread across timer.h and
      timer.c.
      
      This patch cleans up timer initializers.
      
      * timer.c::__init_timer() is renamed to do_init_timer().
      
      * __TIMER_INITIALIZER() added.  It takes @flags and all initializers
        are wrappers around it.
      
      * init_timer[_on_stack]_key() now take @flags.
      
      * __init_timer[_on_stack]() added.  They take @flags and all init
        macros are wrappers around them.
      
      * __setup_timer[_on_stack]() added.  It uses __init_timer() and takes
        @flags.  All setup macros are wrappers around the two.
      
      Note that this patch doesn't add missing init/setup combinations -
      e.g. init_timer_deferrable_on_stack().  Adding missing ones is
      trivial.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: torvalds@linux-foundation.org
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      fc683995
    • timer: Generalize timer->base flags handling · e52b1db3
      Authored by Tejun Heo
      To prepare for addition of another flag, generalize timer->base flags
      handling.
      
      * Rename from TBASE_*_FLAG to TIMER_* and make them LU constants.
      
      * Define and use TIMER_FLAG_MASK for flags masking so that multiple
        flags can be handled correctly.
      
      * Don't dereference timer->base directly even if
        !tbase_get_deferrable().  Both such places are already passed
        @base, so use it instead.
      
      * Make sure tvec_base's alignment is large enough for timer->base
        flags using BUILD_BUG_ON().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: torvalds@linux-foundation.org
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e52b1db3
  14. Aug 19, 2012 (1 commit)
  15. Jun 06, 2012 (4 commits)
    • timers: Improve get_next_timer_interrupt() · e40468a5
      Authored by Thomas Gleixner
      Gilad reported at
      
       http://lkml.kernel.org/r/1336056962-10465-2-git-send-email-gilad@benyossef.com
      
      "Current timer code fails to correctly return a value meaning that
       there is no future timer event, with the result that the timer keeps
       getting re-armed in HZ one shot mode even when we could turn it off,
       generating unneeded interrupts.
      
       What is happening is that when __next_timer_interrupt() wishes
       to return a value that signifies "there is no future timer
       event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).
      
       However, the code in tick_nohz_stop_sched_tick(), which called
       __next_timer_interrupt() via get_next_timer_interrupt(),
       compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
       to see if the timer needs to be re-armed.
      
       base->timer_jiffies != last_jiffies, so tick_nohz_stop_sched_tick()
       interprets the return value as an indication that there is a distant
       future event 12 days from now and programs the timer to fire next
       after KTIME_MAX nsecs instead of avoiding arming it. This ends up
       causing a needless interrupt once every KTIME_MAX nsecs."
      
      Fix this by using the new active timer accounting. This completely
      avoids scans when no active timer is enqueued, so we don't have to
      rely on base->timer_next and base->timer_jiffies anymore.
      Reported-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.317535385@linutronix.de
      e40468a5
    • timers: Add accounting of non deferrable timers · 99d5f3aa
      Authored by Thomas Gleixner
      The code in get_next_timer_interrupt() is suboptimal as it has to run
      through the cascade to find the next expiring timer. On a completely
      idle core we should only do that when there is an active timer
      enqueued and base->next_timer does not give us a fast answer.
      
      Add accounting of the active timers to the now consolidated
      attach/detach code. I deliberately avoided sanity checks because the
      code is fully symmetric and any fiddling with timers w/o using the API
      functions will lead to cute explosions anyway. ulong is big enough
      even on 32bit, and if we really run into the situation of having more
      than 1<<32 timers enqueued there, then we are definitely not in a
      state to go idle and run through that code.
      
      This allows us to fix another shortcoming of get_next_timer_interrupt().
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.236377028@linutronix.de
      99d5f3aa
    • timers: Consolidate base->next_timer update · facbb4a7
      Authored by Thomas Gleixner
      Another bunch of mindlessly copied code. All callers of
      internal_add_timer() except the recascading code update
      base->next_timer.
      
      Move this into internal_add_timer() and let the cascading code call
      __internal_add_timer().
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.189946224@linutronix.de
      facbb4a7
    • timers: Create detach_if_pending() and use it · ec44bc7a
      Authored by Thomas Gleixner
      Most callers of detach_timer() have the same pattern around
      them: check whether the timer is pending and eventually update
      base->next_timer.
      
      Create detach_if_pending() and replace the duplicated code.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20120525214819.131246037@linutronix.de
      ec44bc7a
  16. May 15, 2012 (1 commit)
    • lockdep: fix oops in processing workqueue · 4d82a1de
      Authored by Peter Zijlstra
      Under memory load, on x86_64, with lockdep enabled, the workqueue's
      process_one_work() has been seen to oops in __lock_acquire(), barfing
      on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].
      
      Because it's permissible to free a work_struct from its callout function,
      the map used is an onstack copy of the map given in the work_struct: and
      that copy is made without any locking.
      
      Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
      "rep movsq" for that structure copy: which might race with a workqueue
      user's wait_on_work() doing lock_map_acquire() on the source of the
      copy, putting a pointer into the class_cache[], but only in time for
      the top half of that pointer to be copied to the destination map.
      
      Boom when process_one_work() subsequently does lock_map_acquire()
      on its onstack copy of the lockdep_map.
      
      Fix this, and a similar instance in call_timer_fn(), with a
      lockdep_copy_map() function which additionally NULLs the class_cache[].
      
      Note: this oops was actually seen on 3.4-next, where flush_work() newly
      does the racing lock_map_acquire(); but Tejun points out that 3.4 and
      earlier are already vulnerable to the same through wait_on_work().
      
      * Patch originally from Peter.  Hugh modified it a bit and wrote the
        description.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Hugh Dickins <hughd@google.com>
      LKML-Reference: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      4d82a1de
  17. May 03, 2012 (1 commit)
  18. Apr 27, 2012 (1 commit)
  19. Dec 09, 2011 (1 commit)
  20. Nov 24, 2011 (2 commits)
  21. Oct 31, 2011 (1 commit)
  22. Jun 03, 2011 (1 commit)
  23. Mar 08, 2011 (1 commit)
    • debugobjects: Add hint for better object identification · 99777288
      Authored by Stanislaw Gruszka
      In complex subsystems like mac80211 structures can contain several
      timers and work structs, so identifying a specific instance from the
      call trace and object type output of debugobjects can be hard.
      
      Allow the subsystems which support debugobjects to provide a hint
      function. This function returns a pointer to a kernel address
      (preferably the object's callback function) which is printed along
      with the debugobjects type.
      
      Add hint methods for timer_list, work_struct and hrtimer.
      
      [ tglx: Massaged changelog, made it compile ]
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <20110307085809.GA9334@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      99777288
  24. Feb 16, 2011 (1 commit)
  25. Feb 08, 2011 (1 commit)
  26. Feb 04, 2011 (1 commit)
  27. Jan 31, 2011 (1 commit)
  28. Dec 13, 2010 (1 commit)
    • timers: Use this_cpu_read · 7496351a
      Authored by Christoph Lameter
      Eric asked for this.
      
      [tglx: Because it generates faster code, according to Eric ]
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: linux-mm@kvack.org
      LKML-Reference: <alpine.DEB.2.00.1011301404490.4039@router.home>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7496351a
  29. Dec 09, 2010 (2 commits)
    • nohz: Fix get_next_timer_interrupt() vs cpu hotplug · dbd87b5a
      Authored by Heiko Carstens
      This fixes a bug as seen on 2.6.32 based kernels where timers got
      enqueued on offline cpus.
      
      If a cpu goes offline it might still have pending timers. These will
      be migrated during CPU_DEAD handling after the cpu is offline.
      However while the cpu is going offline it will schedule the idle task
      which will then call tick_nohz_stop_sched_tick().
      
      That function in turn will call get_next_timer_interrupt() to figure
      out if the tick of the cpu can be stopped or not. If it turns out that
      the next tick is just one jiffy off (delta_jiffies == 1)
      tick_nohz_stop_sched_tick() incorrectly assumes that the tick should
      not stop and takes an early exit and thus it won't update the load
      balancer cpu.
      
      Just afterwards the cpu will be killed and the load balancer cpu could
      be the offline cpu.
      
      On 2.6.32 based kernel get_nohz_load_balancer() gets called to decide
      on which cpu a timer should be enqueued (see __mod_timer()). Which
      leads to the possibility that timers get enqueued on an offline cpu.
      These will never expire and can cause a system hang.
      
      This has been observed on 2.6.32 kernels. On current kernels,
      __mod_timer() uses get_nohz_timer_target(), which doesn't have that
      problem. However there might be other problems because of the too
      early exit of tick_nohz_stop_sched_tick() in case a cpu goes offline.
      
      The easiest and probably safest fix seems to be to let
      get_next_timer_interrupt() just lie and let it say there isn't any
      pending timer if the current cpu is offline.
      
      I also thought of moving migrate_[hr]timers() from CPU_DEAD to
      CPU_DYING, but seeing that there already have been fixes at least in
      the hrtimer code in this area I'm afraid that this could add new
      subtle bugs.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      dbd87b5a
    • sched: Cure more NO_HZ load average woes · 0f004f5a
      Authored by Peter Zijlstra
      There's a long-running regression that proved difficult to fix and
      which is hitting certain people and is rather annoying in its effects.
      
      Damien reported that after 74f5187a (sched: Cure load average vs
      NO_HZ woes) his load average is unnaturally high; he also noted that
      even with that patch reverted the load average numbers are not
      correct.
      
      The problem is that the previous patch only solved half the NO_HZ
      problem: it addressed the part of going into NO_HZ mode, not of
      coming out of NO_HZ mode. This patch implements that missing half.
      
      When coming out of NO_HZ mode there are two important things to take
      care of:
      
       - Folding the pending idle delta into the global active count.
       - Correctly aging the averages for the idle-duration.
      
      So with this patch the NO_HZ interaction should be complete and
      behaviour between CONFIG_NO_HZ=[yn] should be equivalent.
      
      Furthermore, this patch slightly changes the load average computation
      by adding a rounding term to the fixed point multiplication.
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Reported-by: Tim McGrath <tmhikaru@gmail.com>
      Tested-by: Damien Wyart <damien.wyart@free.fr>
      Tested-by: Orion Poplawski <orion@cora.nwra.com>
      Tested-by: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Cc: Chase Douglas <chase.douglas@canonical.com>
      LKML-Reference: <1291129145.32004.874.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0f004f5a
  30. Oct 22, 2010 (2 commits)