1. 10 Jul, 2007: 25 commits
  2. 24 Jun, 2007: 1 commit
  3. 19 Jun, 2007: 2 commits
    • Fix possible runqueue lock starvation in wait_task_inactive() · fa490cfd
      Committed by Linus Torvalds
      Miklos Szeredi reported very long pauses (several seconds, sometimes
      more) on his T60 (with a Core2Duo) which he managed to track down to
      wait_task_inactive()'s open-coded busy-loop.
      
      He observed that an interrupt on one core tries to acquire the
      runqueue-lock but does not succeed in doing so for a very long time -
      while wait_task_inactive() on the other core loops waiting for the first
      core to deschedule a task (which it won't do while spinning in an
      interrupt handler).
      
      This rewrites wait_task_inactive() to do all its waiting optimistically
      without any locks taken at all, and then just double-check the end
      result with the proper runqueue lock held over just a very short
      section.  If there were races in the optimistic wait, or a preemption
      event scheduled the process away, we simply re-synchronize, and start
      over.
      
      So the code now looks like this:
      
      	repeat:
      		/* Unlocked, optimistic looping! */
      		rq = task_rq(p);
      		while (task_running(rq, p))
      			cpu_relax();
      
      		/* Get the *real* values */
      		rq = task_rq_lock(p, &flags);
      		running = task_running(rq, p);
      		array = p->array;
      		task_rq_unlock(rq, &flags);
      
      		/* Check them.. */
      		if (unlikely(running)) {
      			cpu_relax();
      			goto repeat;
      		}
      
      		/* Preempted away? Yield if so.. */
      		if (unlikely(array)) {
      			yield();
      			goto repeat;
      		}
      
      Basically, that first "while()" loop is done entirely without any
      locking at all (and doesn't check for the case where the target process
      might have been preempted away), and so it's possibly "incorrect", but
      we don't really care.  Both the runqueue used, and the "task_running()"
      check might be the wrong tests, but they won't oops - they just mean
      that we could possibly get the wrong results due to lack of locking and
      exit the loop early in the case of a race condition.
      
      So once we've exited the loop, we then get the proper (and careful) rq
      lock, and check the running/runnable state _safely_.  And if it turns
      out that our quick-and-dirty and unsafe loop was wrong after all, we
      just go back and try it all again.
      
      (The patch also adds a lot of comments, which is the actual bulk of it
      all, to make it more obvious why we can do these things without holding
      the locks).
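The optimistic-wait-then-verify pattern described above can be sketched in user-space C11. This is a minimal model under stated assumptions, not the kernel code: `struct fake_task`, `rq_lock`, `fake_task_init()`, `wait_task_inactive_sketch()` and `simulate_other_cpu()` are hypothetical names, an `atomic_flag` spinlock stands in for the runqueue lock, and the other CPU is simulated by a callback so the sketch runs deterministically on one thread (a real multi-threaded caller would pass a callback that calls something like `sched_yield()`).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task as observed from another CPU. */
struct fake_task {
    atomic_bool running;   /* currently executing on its CPU */
    atomic_bool queued;    /* still on a runqueue (e.g. preempted away) */
};

/* Hypothetical stand-in for the runqueue spinlock. */
static atomic_flag rq_lock = ATOMIC_FLAG_INIT;

static void rq_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&rq_lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void rq_lock_release(void)
{
    atomic_flag_clear_explicit(&rq_lock, memory_order_release);
}

static void fake_task_init(struct fake_task *t, bool running, bool queued)
{
    atomic_init(&t->running, running);
    atomic_init(&t->queued, queued);
}

/* Simulated other CPU: a running task gets preempted (descheduled but still
 * queued); a queued task then blocks and leaves the runqueue. */
static void simulate_other_cpu(struct fake_task *t)
{
    if (atomic_load(&t->running)) {
        atomic_store(&t->running, false);
        atomic_store(&t->queued, true);
    } else if (atomic_load(&t->queued)) {
        atomic_store(&t->queued, false);
    }
}

/* Wait until 't' is neither running nor queued. 'relax' plays the role of
 * cpu_relax()/yield(). Returns the number of locked checks performed. */
static int wait_task_inactive_sketch(struct fake_task *t,
                                     void (*relax)(struct fake_task *))
{
    int locked_checks = 0;
    for (;;) {
        /* Unlocked, optimistic loop: may see stale state, never holds the lock. */
        while (atomic_load_explicit(&t->running, memory_order_relaxed))
            relax(t);

        /* Re-read the real state with the lock held only for two loads. */
        rq_lock_acquire();
        bool running = atomic_load_explicit(&t->running, memory_order_relaxed);
        bool queued = atomic_load_explicit(&t->queued, memory_order_relaxed);
        rq_lock_release();
        locked_checks++;

        if (running) {          /* optimistic loop exited too early: retry */
            relax(t);
            continue;
        }
        if (queued) {           /* preempted away: let it run, then retry */
            relax(t);
            continue;
        }
        return locked_checks;
    }
}
```

Starting from a running task, `simulate_other_cpu()` first preempts it (leaving it queued) and then lets it leave the runqueue, so the call needs two locked checks: one that still sees the task queued and one that sees it fully inactive. The lock is never held while spinning, which is the property the commit is after.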
      
      Thanks to Miklos for all the testing and tracking it down.
      Tested-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: fix SysRq-N (normalize RT tasks) · a0f98a1c
      Committed by Ingo Molnar
      Gene Heskett reported the following problem while testing CFS: SysRq-N
      is not always effective in normalizing tasks back to SCHED_OTHER.
      
      The reason for that turns out to be the following bug:
      
       - normalize_rt_tasks() uses for_each_process() to iterate through all
         tasks in the system.  The problem is, this method does not iterate
         through all tasks, it iterates through all thread groups.
      
      The proper mechanism to enumerate over all threads is to use a
      do_each_thread() + while_each_thread() loop.
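The difference between the two iteration styles can be modelled in plain C. This is a hypothetical model, not kernel code: `struct thread_group`, the `POLICY_*` values and the three functions are invented for illustration. Visiting one task per thread group, as `for_each_process()` does, leaves the non-leader threads untouched, while the nested group-then-thread loop structure that `do_each_thread()` + `while_each_thread()` provide reaches every thread.

```c
#include <assert.h>
#include <stddef.h>

enum policy { POLICY_OTHER, POLICY_FIFO };

/* Hypothetical model: a thread group (process) owning up to 4 threads. */
struct thread_group {
    enum policy threads[4];  /* threads[0] is the group leader */
    size_t nr_threads;
};

/* Buggy shape: one task per thread group, like for_each_process(). */
static void normalize_leaders_only(struct thread_group *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        g[i].threads[0] = POLICY_OTHER;
}

/* Fixed shape: outer loop over groups, inner loop over every thread in the
 * group, the same nesting as do_each_thread() ... while_each_thread(). */
static void normalize_all_threads(struct thread_group *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < g[i].nr_threads; j++)
            g[i].threads[j] = POLICY_OTHER;
}

/* Count threads that still have a real-time policy. */
static size_t count_rt(const struct thread_group *g, size_t n)
{
    size_t rt = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < g[i].nr_threads; j++)
            rt += (g[i].threads[j] != POLICY_OTHER);
    return rt;
}
```

With one three-thread process and one single-threaded process, all SCHED_FIFO, the leader-only pass leaves two real-time threads behind (the SysRq-N symptom), and the nested pass leaves none.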
      Reported-by: Gene Heskett <gene.heskett@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 24 May, 2007: 1 commit
    • Prevent going idle with softirq pending · 98d82567
      Committed by Thomas Gleixner
      The NOHZ patch contains a check for softirqs pending when a CPU goes idle.
      The BUG is unrelated to NOHZ; it was just made visible by the NOHZ patch.
      The BUG showed up mainly on P4 / hyperthreading-enabled machines, which led
      the investigation in the wrong direction at first.  The real
      cause is in cond_resched_softirq():
      
      cond_resched_softirq() is enabling softirqs without invoking the softirq
      daemon when softirqs are pending.  This leads to the warning message in the
      NOHZ idle code:
      
      t1 runs softirq disabled code on CPU#0
      interrupt happens, softirq is raised, but deferred (softirqs disabled)
      t1 calls cond_resched_softirq()
      	enables softirqs via _local_bh_enable()
      	calls schedule()
      t2 runs
      t1 is migrated to CPU#1
      t2 is done and invokes idle()
      NOHZ detects the pending softirq
      
      Fix: change _local_bh_enable() to local_bh_enable() so the softirq
      daemon is invoked.
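The difference between `_local_bh_enable()` and `local_bh_enable()` can be illustrated with a single-CPU model. The names here (`struct cpu_bh`, `raise_softirq_sketch()`, `bh_enable_no_run()`, `bh_enable_and_run()`, `idle_would_warn()`) are hypothetical, and the model is heavily simplified: it processes pending softirqs inline rather than going through the kernel's real softirq machinery or the softirq daemon. It only shows why re-enabling softirqs without looking at the pending mask leaves work behind for the idle path to detect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model of bottom-half (softirq) state. */
struct cpu_bh {
    int disable_count;   /* > 0 while softirqs are disabled */
    unsigned pending;    /* bitmask of raised but unprocessed softirqs */
    int handled;         /* number of softirqs processed so far */
};

static void do_pending_softirqs(struct cpu_bh *c)
{
    while (c->pending) {
        unsigned bit = c->pending & -c->pending;  /* lowest set bit */
        c->pending &= ~bit;
        c->handled++;
    }
}

/* An interrupt raises softirq 'nr'; it is deferred while bh is disabled. */
static void raise_softirq_sketch(struct cpu_bh *c, unsigned nr)
{
    c->pending |= 1u << nr;
    if (c->disable_count == 0)
        do_pending_softirqs(c);
}

/* Like _local_bh_enable(): only re-enables, pending work stays pending. */
static void bh_enable_no_run(struct cpu_bh *c)
{
    assert(c->disable_count > 0);
    c->disable_count--;
}

/* Like local_bh_enable(): re-enables and then handles pending softirqs. */
static void bh_enable_and_run(struct cpu_bh *c)
{
    assert(c->disable_count > 0);
    if (--c->disable_count == 0 && c->pending)
        do_pending_softirqs(c);
}

/* The NOHZ idle check: going idle with softirqs pending triggers the warning. */
static bool idle_would_warn(const struct cpu_bh *c)
{
    return c->pending != 0;
}
```

In the buggy sequence, the softirq raised while softirqs were disabled is still pending when the CPU would go idle; in the fixed sequence, re-enabling processes it first.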
      
      Thanks to Anant Nitya for debugging this with great patience!
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 10 May, 2007: 2 commits
  6. 09 May, 2007: 8 commits
  7. 08 May, 2007: 1 commit