1. 24 Jun 2007, 1 commit
  2. 19 Jun 2007, 2 commits
    • Fix possible runqueue lock starvation in wait_task_inactive() · fa490cfd
      Committed by Linus Torvalds
      Miklos Szeredi reported very long pauses (several seconds, sometimes
      more) on his T60 (with a Core2Duo) which he managed to track down to
      wait_task_inactive()'s open-coded busy-loop.
      
      He observed that an interrupt on one core tries to acquire the
      runqueue-lock but does not succeed in doing so for a very long time -
      while wait_task_inactive() on the other core loops waiting for the first
      core to deschedule a task (which it won't do while spinning in an
      interrupt handler).
      
      This rewrites wait_task_inactive() to do all its waiting optimistically
      without any locks taken at all, and then just double-check the end
      result with the proper runqueue lock held over just a very short
      section.  If there were races in the optimistic wait, or a preemption
      event scheduled the process away, we simply re-synchronize, and start
      over.
      
      So the code now looks like this:
      
      	repeat:
      		/* Unlocked, optimistic looping! */
      		rq = task_rq(p);
      		while (task_running(rq, p))
      			cpu_relax();
      
      		/* Get the *real* values */
      		rq = task_rq_lock(p, &flags);
      		running = task_running(rq, p);
      		array = p->array;
      		task_rq_unlock(rq, &flags);
      
      		/* Check them.. */
      		if (unlikely(running)) {
      			cpu_relax();
      			goto repeat;
      		}
      
      		/* Preempted away? Yield if so.. */
      		if (unlikely(array)) {
      			yield();
      			goto repeat;
      		}
      
      Basically, that first "while()" loop is done entirely without any
      locking at all (and doesn't check for the case where the target process
      might have been preempted away), and so it's possibly "incorrect", but
      we don't really care.  Both the runqueue used, and the "task_running()"
      check might be the wrong tests, but they won't oops - they just mean
      that we could possibly get the wrong results due to lack of locking and
      exit the loop early in the case of a race condition.
      
      So once we've exited the loop, we then get the proper (and careful) rq
      lock, and check the running/runnable state _safely_.  And if it turns
      out that our quick-and-dirty and unsafe loop was wrong after all, we
      just go back and try it all again.
      
      (The patch also adds a lot of comments, which is the actual bulk of it
      all, to make it more obvious why we can do these things without holding
      the locks).
      
      Thanks to Miklos for all the testing and tracking it down.
      Tested-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: fix SysRq-N (normalize RT tasks) · a0f98a1c
      Committed by Ingo Molnar
      Gene Heskett reported the following problem while testing CFS: SysRq-N
      is not always effective in normalizing tasks back to SCHED_OTHER.
      
      The reason for that turns out to be the following bug:
      
       - normalize_rt_tasks() uses for_each_process() to iterate through all
      tasks in the system.  The problem is that this method does not iterate
      through all tasks; it iterates through all thread groups, visiting only
      one task per group.
      
      The proper mechanism to enumerate over all threads is to use a
      do_each_thread() + while_each_thread() loop.
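      
      As a minimal sketch (assuming the 2.6.21-era tasklist API; the loop body
      is illustrative, not the actual normalize_rt_tasks() code), the corrected
      iteration looks like this:
      
      	struct task_struct *g, *p;
      
      	read_lock_irq(&tasklist_lock);
      	do_each_thread(g, p) {
      		/* visits every thread, not just thread-group leaders */
      		if (!rt_task(p))
      			continue;
      		/* ... switch p back to SCHED_NORMAL here ... */
      	} while_each_thread(g, p);
      	read_unlock_irq(&tasklist_lock);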
      Reported-by: Gene Heskett <gene.heskett@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 24 May 2007, 1 commit
    • Prevent going idle with softirq pending · 98d82567
      Committed by Thomas Gleixner
      The NOHZ patch contains a check for softirqs pending when a CPU goes idle.
      The BUG is unrelated to NOHZ; it was just made visible by the NOHZ patch.
      The BUG showed up mainly on P4 / hyperthreading-enabled machines, which led
      the investigation in the wrong direction in the first place.  The real
      cause is in cond_resched_softirq():
      
      cond_resched_softirq() is enabling softirqs without invoking the softirq
      daemon when softirqs are pending.  This leads to the warning message in the
      NOHZ idle code:
      
      t1 runs softirq disabled code on CPU#0
      interrupt happens, softirq is raised, but deferred (softirqs disabled)
      t1 calls cond_resched_softirq()
      	enables softirqs via _local_bh_enable()
      	calls schedule()
      t2 runs
      t1 is migrated to CPU#1
      t2 is done and invokes idle()
      NOHZ detects the pending softirq
      
      Fix: change _local_bh_enable() to local_bh_enable() so the softirq
      daemon is invoked.
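      
      As a hedged sketch, the resulting function looks roughly like this (the
      surrounding details are paraphrased from that era's sched.c; the point of
      the patch is only the _local_bh_enable() -> local_bh_enable() change):
      
      	int __sched cond_resched_softirq(void)
      	{
      		BUG_ON(!in_softirq());
      
      		if (need_resched() && system_state == SYSTEM_RUNNING) {
      			/* was _local_bh_enable(): local_bh_enable() also
      			 * runs pending softirqs / wakes the softirq daemon */
      			local_bh_enable();
      			__cond_resched();
      			local_bh_disable();
      			return 1;
      		}
      		return 0;
      	}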
      
      Thanks to Anant Nitya for debugging this with great patience!
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 10 May 2007, 2 commits
  5. 09 May 2007, 8 commits
  6. 08 May 2007, 1 commit
  7. 28 Apr 2007, 1 commit
  8. 08 Apr 2007, 2 commits
  9. 05 Mar 2007, 1 commit
    • [PATCH] sched: remove SMT nice · 69f7c0a1
      Committed by Con Kolivas
      Remove the SMT-nice feature, which idles sibling cpus on SMT cpus to
      facilitate nice working properly where cpu power is shared.  The idling of
      cpus in the presence of runnable tasks is considered too fragile and too
      easy to break with outside code, and the complexity of managing this
      system would become unworkable if an architecture comes along with many
      logical cores sharing cpu power.
      
      Remove the associated per_cpu_gain variable in sched_domains used only by
      this code.
      
      Also:
      
        The reason is that with dynticks enabled, this code breaks without
        further tweaks, so dynticks brought on its rapid demise.  So either we
        tweak this code or kill it off entirely.  It was Ingo's preference to
        kill it off.  Either way this needs to happen for 2.6.21, since dynticks
        has gone in.
      Signed-off-by: Con Kolivas <kernel@kolivas.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 02 Mar 2007, 2 commits
  11. 13 Feb 2007, 2 commits
    • [PATCH] i386: paravirt CPU hypercall batching mode · 9226d125
      Committed by Zachary Amsden
      The VMI ROM has a mode where hypercalls can be queued and batched.  This turns
      out to be a significant win during context switch, but must be done at a
      specific point before side effects to CPU state are visible to subsequent
      instructions.  This is similar to the MMU batching hooks already provided.
      The same hooks could be used by the Xen backend to implement a context switch
      multicall.
      
      To explain a bit more about lazy modes in the paravirt patches, basically, the
      idea is that only one of lazy CPU or MMU mode can be active at any given time.
       Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of
      multiple PTE updates (say, inside a remap loop), but to avoid keeping some
      kind of state machine about when to flush cpu or mmu updates, we just allow
      one or the other to be active.  Although there is no real reason a more
      comprehensive scheme could not be implemented, there is also no demonstrated
      need for this extra complexity.
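      
      As a rough sketch of how such a batching window is used (the hook names
      follow the arch_enter/leave_lazy_cpu_mode() pattern this patch is about,
      but the wrapper function and its body are purely illustrative):
      
      	static inline void load_next_cpu_state(struct thread_struct *next)
      	{
      		/* open the batching window: the hypervisor may queue the
      		 * following hypercalls instead of issuing them one by one */
      		arch_enter_lazy_cpu_mode();
      
      		/* ... update segment registers, descriptors, etc. for 'next',
      		 * before any later instruction depends on the new CPU state ... */
      
      		/* close the window: flush the queued batch in one go */
      		arch_leave_lazy_cpu_mode();
      	}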
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] sched: avoid div in rebalance_tick · ff91691b
      Committed by Nick Piggin
      Avoid expensive integer divide 3 times per CPU per tick.
      
      A userspace test of this loop went from 26ns, down to 19ns on a G5; and
      from 123ns down to 28ns on a P3.
      
      (Also avoid a variable bit shift, as suggested by Alan. The effect
      of this wasn't noticeable on the CPUs I tested with).
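      
      A hedged sketch of the idea (variable names follow the cpu_load[] update
      in rebalance_tick() of that era, but this is illustrative, not the diff):
      
      	/* scale doubles each step, so scale == 1 << i and the old
      	 * "/ scale" integer divide can become a shift by i */
      	for (i = 0, scale = 1; i < 3; i++, scale += scale) {
      		unsigned long old_load = this_rq->cpu_load[i];
      		unsigned long new_load = this_load;
      
      		/* was: (old_load*(scale-1) + new_load) / scale */
      		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
      	}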
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 12 Feb 2007, 2 commits
  13. 12 Jan 2007, 1 commit
  14. 31 Dec 2006, 1 commit
    • [PATCH] sched: fix cond_resched_softirq() offset · 9414232f
      Committed by Ingo Molnar
      Remove the __resched_legal() check: it is conceptually broken.  The biggest
      problem it had is that it can mask buggy cond_resched() calls.  A
      cond_resched() call is only legal if we are not in an atomic context, with
      two narrow exceptions:
      
       - if the system is booting
       - a reacquire_kernel_lock() down() done while PREEMPT_ACTIVE is set
      
      But __resched_legal() hid this and just silently returned whenever
      these primitives were called from invalid contexts. (Same goes for
      cond_resched_lock() and cond_resched_softirq()).
      
      Furthermore, the __resched_legal(0) call was buggy in that it caused
      unnecessarily long softirq latencies via cond_resched_softirq().  (which is
      only called from softirq-off sections, hence the code did nothing.)
      
      The fix is to resurrect the efficiency of the might_sleep checks and to
      only allow the narrow exceptions.
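      
      A minimal sketch of what the narrowed check amounts to (the helper name
      cond_resched_allowed() is hypothetical, used only to illustrate the rule
      and the two exceptions above):
      
      	static inline int cond_resched_allowed(void)
      	{
      		/* exception 1: the system is still booting */
      		if (system_state != SYSTEM_RUNNING)
      			return 1;
      		/* exception 2: reacquire_kernel_lock() does its down()
      		 * with PREEMPT_ACTIVE set */
      		if (preempt_count() & PREEMPT_ACTIVE)
      			return 1;
      		/* otherwise only legal outside atomic context */
      		return !in_atomic();
      	}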
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  15. 23 Dec 2006, 2 commits
  16. 21 Dec 2006, 1 commit
    • [PATCH] sched: improve efficiency of sched_fork() · bc947631
      Committed by Peter Williams
      Problem:
        sched_fork() has always called scheduler_tick() in some (unlikely)
        circumstances in order to update the current task in light of those
        circumstances.  It has always been the case that the work done by
        scheduler_tick() was more than was required to handle the problem in
        hand but no harm was done except for the waste of a few CPU cycles.
      
        However, the splitting of scheduler_tick() into two procedures in
        2.6.20-rc1 enables the wasted cycles to be saved as the new procedure
        task_running_tick() does all the work that is required to rectify the
        problem being handled.
      
      Solution:
        Replace the call to scheduler_tick() in sched_fork() with a call to
        task_running_tick().
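      
      A hedged sketch of the affected spot in sched_fork() (the surrounding
      lines are paraphrased from the 2.6.20-rc1 era, not quoted from the diff):
      
      	if (unlikely(!current->time_slice)) {
      		/*
      		 * The parent has only a single jiffy left of its timeslice;
      		 * only the per-task tick work is needed to charge it, not a
      		 * full scheduler_tick().
      		 */
      		current->time_slice = 1;
      		task_running_tick(cpu_rq(cpu), current);  /* was: scheduler_tick() */
      	}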
      Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  17. 14 Dec 2006, 1 commit
  18. 11 Dec 2006, 9 commits