1. 09 Dec, 2010 (3 commits)
    • nohz: Fix get_next_timer_interrupt() vs cpu hotplug · dbd87b5a
      Heiko Carstens authored
      This fixes a bug as seen on 2.6.32-based kernels where timers got
      enqueued on offline cpus.
      
      If a cpu goes offline it might still have pending timers. These will
      be migrated during CPU_DEAD handling after the cpu is offline.
      However, while the cpu is going offline it will schedule the idle task
      which will then call tick_nohz_stop_sched_tick().
      
      That function in turn will call get_next_timer_interrupt() to figure
      out if the tick of the cpu can be stopped or not. If it turns out that
      the next tick is just one jiffy away (delta_jiffies == 1),
      tick_nohz_stop_sched_tick() incorrectly assumes that the tick should
      not be stopped, takes an early exit, and thus doesn't update the load
      balancer cpu.
      
      Just afterwards the cpu will be killed and the load balancer cpu could
      be the offline cpu.
      
      On 2.6.32-based kernels, get_nohz_load_balancer() is called to decide
      on which cpu a timer should be enqueued (see __mod_timer()), which
      leads to the possibility that timers get enqueued on an offline cpu.
      These will never expire and can cause a system hang.
      
      This has been observed on 2.6.32 kernels. On current kernels
      __mod_timer() uses get_nohz_timer_target(), which doesn't have that
      problem. However, there might be other problems because of the too
      early exit from tick_nohz_stop_sched_tick() in case a cpu goes offline.
      
      The easiest and probably safest fix seems to be to let
      get_next_timer_interrupt() just lie and let it say there isn't any
      pending timer if the current cpu is offline.
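
      A minimal sketch of the shape of such a guard (cpu_is_offline(),
      smp_processor_id() and NEXT_TIMER_MAX_DELTA follow the timer code
      of that era; this is illustrative, not quoted from the patch):

          unsigned long get_next_timer_interrupt(unsigned long now)
          {
                  /*
                   * Sketch: pretend there is no pending timer when this
                   * cpu is offline. Its timers will be migrated away
                   * during CPU_DEAD handling, and reporting them here
                   * would only trigger the early exit in
                   * tick_nohz_stop_sched_tick().
                   */
                  if (cpu_is_offline(smp_processor_id()))
                          return now + NEXT_TIMER_MAX_DELTA;

                  /* ... otherwise walk the timer wheel for the real expiry ... */
          }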
      
      I also thought of moving migrate_[hr]timers() from CPU_DEAD to
      CPU_DYING, but seeing that there already have been fixes at least in
      the hrtimer code in this area I'm afraid that this could add new
      subtle bugs.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Sched: fix skip_clock_update optimization · f26f9aff
      Mike Galbraith authored
      idle_balance() drops/retakes rq->lock, leaving the previous task
      vulnerable to set_tsk_need_resched().  Clear it after we return
      from balancing instead, and in setup_thread_stack() as well, so
      no successfully descheduled or never scheduled task has it set.
      
      A pending need_resched confused the skip_clock_update logic, which
      assumes that the next call to update_rq_clock() will come nearly
      immediately after being set.  Make the optimization robust against
      the case where a sleeper is woken before it successfully deschedules
      by checking that the current task has not been dequeued before
      setting the flag, since it is that useless clock update we're trying
      to save, and clear it unconditionally in schedule() proper instead
      of conditionally in put_prev_task().
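
      A minimal sketch of the two pieces just described (field and helper
      names such as rq->skip_clock_update and se.on_rq follow the
      scheduler code of that era and are assumptions, not quotes from the
      patch):

          /*
           * Sketch: when a queue event is about to force a reschedule,
           * only ask to skip the next clock update if current is still
           * queued; a task that was already dequeued won't produce the
           * back-to-back update_rq_clock() call we are trying to save.
           */
          if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
                  rq->skip_clock_update = 1;

          /*
           * And in schedule() proper, clear both unconditionally so a
           * task that never ran or already descheduled cannot keep a
           * stale flag set.
           */
          clear_tsk_need_resched(prev);
          rq->skip_clock_update = 0;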
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Reported-by: Bjoern B. Brandenburg <bbb.lst@gmail.com>
      Tested-by: Yong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      LKML-Reference: <1291802742.1417.9.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Cure more NO_HZ load average woes · 0f004f5a
      Peter Zijlstra authored
      There's a long-running regression that has proved difficult to fix;
      it hits certain people and is rather annoying in its effects.
      
      Damien reported that after 74f5187a (sched: Cure load average vs
      NO_HZ woes) his load average was unnaturally high; he also noted that
      even with that patch reverted the load average numbers were not
      correct.
      
      The problem is that the previous patch only solved half the NO_HZ
      problem: it addressed going into NO_HZ mode, but not coming out of
      it.  This patch implements the missing half.
      
      When coming out of NO_HZ mode there are two important things to take
      care of:
      
       - Folding the pending idle delta into the global active count.
       - Correctly aging the averages for the idle-duration.
      
      So with this patch the NO_HZ interaction should be complete and
      behaviour between CONFIG_NO_HZ=[yn] should be equivalent.
      
      Furthermore, this patch slightly changes the load average computation
      by adding a rounding term to the fixed point multiplication.
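
      As a sketch, the fixed-point decay step with such a rounding term,
      plus aging across n missed ticks via a_n = a_0 * e^n + a * (1 - e^n)
      (FSHIFT/FIXED_1 are the kernel's standard load-average fixed-point
      constants; calc_load_n and fixed_power_int are illustrative names,
      assumed rather than quoted from the patch):

          #define FSHIFT   11                   /* bits of fixed-point precision */
          #define FIXED_1  (1 << FSHIFT)        /* 1.0 in fixed point */

          /* one decay step; the final addend rounds to nearest instead of down */
          static unsigned long
          calc_load(unsigned long load, unsigned long exp, unsigned long active)
          {
                  load *= exp;
                  load += active * (FIXED_1 - exp);
                  load += 1UL << (FSHIFT - 1);
                  return load >> FSHIFT;
          }

          /* x^n in fixed point, by repeated squaring with the same rounding */
          static unsigned long
          fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
          {
                  unsigned long result = 1UL << frac_bits;

                  while (n) {
                          if (n & 1) {
                                  result *= x;
                                  result += 1UL << (frac_bits - 1);
                                  result >>= frac_bits;
                          }
                          x *= x;
                          x += 1UL << (frac_bits - 1);
                          x >>= frac_bits;
                          n >>= 1;
                  }
                  return result;
          }

          /* age the average across n idle ticks in a single step */
          static unsigned long
          calc_load_n(unsigned long load, unsigned long exp,
                      unsigned long active, unsigned int n)
          {
                  return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
          }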
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Reported-by: Tim McGrath <tmhikaru@gmail.com>
      Tested-by: Damien Wyart <damien.wyart@free.fr>
      Tested-by: Orion Poplawski <orion@cora.nwra.com>
      Tested-by: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Cc: Chase Douglas <chase.douglas@canonical.com>
      LKML-Reference: <1291129145.32004.874.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 08 Dec, 2010 (18 commits)
  3. 07 Dec, 2010 (12 commits)
  4. 06 Dec, 2010 (7 commits)