1. 06 June 2008, 11 commits
• sched: CPU hotplug events must not destroy scheduler domains created by the cpusets · 5c8e1ed1
  Authored by Max Krasnyansky
First issue is not related to the cpusets: we're simply leaking doms_cur.
It's allocated in arch_init_sched_domains(), which is called for every
hotplug event, so we just keep reallocating doms_cur without freeing it.
I introduced a free_sched_domains() function that cleans things up.
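A minimal sketch of such a helper, assuming doms_cur is a kmalloc()-ed
cpumask array and that the static fallback_doms must never be freed (the
exact signature is an assumption):

    static void free_sched_domains(cpumask_t *doms, unsigned int ndoms)
    {
            /* fallback_doms is static storage and must not be kfree()d */
            if (doms != &fallback_doms)
                    kfree(doms);
    }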
      
Second issue is that sched domains created by the cpusets are
completely destroyed by CPU hotplug events. On every CPU hotplug
event the scheduler attaches all CPUs to the NULL domain and then puts
them all into a single domain, thereby destroying the domains created
by the cpusets (partition_sched_domains).
The solution is simple: when cpusets are enabled, the scheduler should
not create the default domain and should instead let the cpusets do it,
which is exactly what this patch does.
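A rough sketch of that approach: the scheduler's hotplug notifier only
rebuilds the single default domain when cpusets are not configured. The
notifier name, cases, and helpers are assumptions based on the sched.c
of that period:

    static int update_sched_domains(struct notifier_block *nfb,
                                    unsigned long action, void *hcpu)
    {
    #ifndef CONFIG_CPUSETS
            switch (action) {
            case CPU_ONLINE:
            case CPU_DEAD:
                    /* rebuild the single default domain */
                    detach_destroy_domains(&cpu_online_map);
                    arch_init_sched_domains(&cpu_online_map);
                    break;
            }
    #endif
            /* with cpusets enabled, the cpuset hotplug path calls
             * partition_sched_domains() instead */
            return NOTIFY_OK;
    }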
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
      Cc: pj@sgi.com
      Cc: menage@google.com
      Cc: rostedt@goodmis.org
      Cc: mingo@elte.hu
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: fix cpupri hotplug support · 1f11eb6a
  Authored by Gregory Haskins
The RT folks over at Red Hat found a hotplug-support issue that was
traced to problems with the cpupri infrastructure in the scheduler:
      
      https://bugzilla.redhat.com/show_bug.cgi?id=449676
      
      This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel.  This patch
      applies to 25.4-rt4, though it should trivially apply to most cpupri enabled
      kernels mentioned above.
      
      It turned out that the issue was that offline cpus could get inadvertently
      registered with cpupri so that they were erroneously selected during
      migration decisions.  The end result would be an OOPS as the offline cpu
      had tasks routed to it.
      
      This patch generalizes the old join/leave domain interface into an
      online/offline interface, and adjusts the root-domain/hotplug code to
      utilize it.
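A sketch of the generalized interface, using the rq_online/rq_offline
naming with simplified RT-class bodies (the cpupri helper names follow
the cpupri code of this series and are assumptions here):

    /* per-class hooks called when a runqueue's CPU enters or leaves
     * the active set of its root domain */
    struct sched_class {
            /* other callbacks elided in this sketch */
            void (*rq_online)(struct rq *rq);
            void (*rq_offline)(struct rq *rq);
    };

    static void rq_online_rt(struct rq *rq)
    {
            /* re-advertise this CPU's priority for migration decisions */
            cpupri_set(&rq->rd->cpupri, rq->cpu, rq->rt.highest_prio);
    }

    static void rq_offline_rt(struct rq *rq)
    {
            /* ensure cpupri can never select an offline CPU */
            cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);
    }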
      
I was able to easily reproduce the issue prior to this patch, and am no
longer able to reproduce it after this patch.  I can offline cpus
indefinitely and everything seems to be in working order.
      
      Thanks to Arnaldo (acme), Thomas, and Peter for doing the legwork to point
      me in the right direction.  Also thank you to Peter for reviewing the
      early iterations of this patch.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: print the sd->level in sched_domain_debug code · 099f98c8
  Authored by Gautham R Shenoy
      While printing out the visual representation of the sched-domains, print
      the level (MC, SMT, CPU, NODE, ... ) of each of the sched_domains.
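A sketch of the kind of helper such a printout needs, assuming the
SD_LV_* level enum of that kernel:

    static const char *sd_level_to_string(enum sched_domain_level lvl)
    {
            switch (lvl) {
            case SD_LV_NONE:        return "NONE";
            case SD_LV_SIBLING:     return "SIBLING";   /* SMT */
            case SD_LV_MC:          return "MC";        /* multi-core */
            case SD_LV_CPU:         return "CPU";
            case SD_LV_NODE:        return "NODE";
            case SD_LV_ALLNODES:    return "ALLNODES";
            default:                return "UNKNOWN";
            }
    }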
      
      Credit: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: add comments for ifdefs in sched.c · 6d6bc0ad
  Authored by Dhaval Giani
  Make sched.c easier to read.
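For example, annotating the #else/#endif of long conditional blocks
(illustrative):

    #ifdef CONFIG_SMP
    /* SMP-only code */
    #else /* !CONFIG_SMP */
    /* UP fallbacks */
    #endif /* CONFIG_SMP */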
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: print module list in the "scheduling while atomic" warning · e21f5b15
  Authored by Arjan van de Ven
For the normal WARN_ON() etc. we already added a print of the module
list, which is very useful for finding candidates for certain types of bugs.
      
      This patch adds the same print to the "scheduling while atomic" BUG warning,
      for the same reason: when we get here it's very useful to see which modules
      are loaded, to narrow down the candidate code list.
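A sketch of the change, assuming the warning is emitted from
__schedule_bug() as in the sched.c of that era:

    static noinline void __schedule_bug(struct task_struct *prev)
    {
            printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
                    prev->comm, prev->pid, preempt_count());
            debug_show_held_locks(prev);
            print_modules();                /* new: list loaded modules */
            if (irqs_disabled())
                    print_irqtrace_events(prev);
            dump_stack();
    }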
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: fix defined-but-unused warning · 81d41d7e
  Authored by Rabin Vincent
      Fix this warning, which appears with !CONFIG_SMP:
      kernel/sched.c:1216: warning: `init_hrtick' defined but not used
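The usual fix for this class of warning is to build the helper only for
the configuration that uses it and stub it out otherwise; a sketch (the
notifier body is an assumption):

    #ifdef CONFIG_SMP
    static void init_hrtick(void)
    {
            hotcpu_notifier(hotplug_hrtick, 0);
    }
    #else
    static inline void init_hrtick(void)
    {
    }
    #endif /* CONFIG_SMP */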
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• namespacecheck: fixes in kernel/sched.c · f7dcd80b
  Authored by Thomas Gleixner
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: check for SD_SERIALIZE atomically in rebalance_domains() · d07355f5
  Authored by Dmitry Adamushko
Nothing really serious here, mainly just a matter of nit-picking :-/

For CONFIG_SCHED_DEBUG && CONFIG_SYSCTL configs, sd->flags can be altered
while being manipulated in rebalance_domains(). Let's do an atomic check.
We rely here on the atomicity of read/write accesses for aligned words.
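A sketch of the resulting pattern inside rebalance_domains(): read
sd->flags once into a local, so the trylock and unlock decisions cannot
disagree if the flags change in between (loop context simplified):

    int need_serialize = sd->flags & SD_SERIALIZE;  /* single aligned read */

    if (need_serialize) {
            if (!spin_trylock(&balancing))
                    continue;       /* another CPU is serializing */
    }

    load_balance(cpu, rq, sd, idle, &balance);

    if (need_serialize)
            spin_unlock(&balancing);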
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: use a 2-d bitmap for searching lowest-pri CPU · 6e0534f2
  Authored by Gregory Haskins
The current code uses a linear algorithm which causes scaling issues
on larger SMP machines.  This patch replaces that algorithm with a
2-dimensional bitmap to reduce latencies in the wake-up path.
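A sketch of the structure behind that, following the cpupri naming of
this series: one CPU mask per priority level plus a bitmap of non-empty
levels, so the search becomes a find-first-set over the level bitmap
followed by a mask lookup:

    struct cpupri_vec {
            atomic_t        count;  /* CPUs currently at this level */
            cpumask_t       mask;   /* which CPUs they are */
    };

    struct cpupri {
            struct cpupri_vec pri_to_cpu[CPUPRI_NR_PRIORITIES];
            long              pri_active[CPUPRI_NR_PRI_WORDS]; /* non-empty levels */
            int               cpu_to_pri[NR_CPUS];  /* current level per CPU */
    };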
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: make !hrtick faster · f333fdc9
  Authored by Mike Galbraith
  It is safe to ignore the timers and flags when the feature is disabled.
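A sketch of the guard, assuming a sched_feat(HRTICK) feature bit as in
that kernel: with the feature off, hrtick paths bail out before touching
any timer state:

    static inline int hrtick_enabled(struct rq *rq)
    {
            if (!sched_feat(HRTICK))
                    return 0;       /* feature off: skip all hrtick work */
            return hrtimer_is_hres_active(&rq->hrtick_timer);
    }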
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
• sched: prioritize non-migratable tasks over migratable ones · 45c01e82
  Authored by Gregory Haskins
      Dmitry Adamushko pointed out a known flaw in the rt-balancing algorithm
      that could allow suboptimal balancing if a non-migratable task gets
      queued behind a running migratable one.  It is discussed in this thread:
      
      http://lkml.org/lkml/2008/4/22/296
      
      This issue has been further exacerbated by a recent checkin to
      sched-devel (git-id 5eee63a5ebc19a870ac40055c0be49457f3a89a3).
      
From a pure priority standpoint, the run-queue is doing the "right"
thing. Using Dmitry's nomenclature, if T0 is on cpu1 first, and T1
wakes up at equal or lower priority (affined only to cpu1) later, it
*should* wait for T0 to finish.  However, in reality that is likely
suboptimal from a system perspective if there are other cores that
could allow T0 and T1 to run concurrently.  Since T1 cannot migrate,
the only choice for higher concurrency is to try to move T0.  This is
not something we addressed in the recent rt-balancing re-work.
      
This patch tries to enhance the balancing algorithm by accommodating this
scenario.  It accomplishes this by incorporating the migratability of a
task into its priority calculation.  Within a numerical tsk->prio, a
non-migratable task is logically higher than a migratable one.  We
maintain this by introducing a new per-priority queue (xqueue, or
exclusive-queue) for holding non-migratable tasks.  The scheduler will
draw from the xqueue over the standard shared-queue (squeue) when
available.
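A sketch of the queue layout described above (the pick_queue() helper is
hypothetical, for illustration):

    struct rt_prio_array {
            DECLARE_BITMAP(bitmap, MAX_RT_PRIO + 1); /* +1 for the delimiter */
            struct list_head xqueue[MAX_RT_PRIO];    /* non-migratable tasks */
            struct list_head squeue[MAX_RT_PRIO];    /* migratable tasks */
    };

    /* drain the exclusive queue first, so a pinned task logically
     * outranks an equal-priority migratable one */
    static struct list_head *pick_queue(struct rt_prio_array *array, int prio)
    {
            if (!list_empty(&array->xqueue[prio]))
                    return &array->xqueue[prio];
            return &array->squeue[prio];
    }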
      
      There are several details for utilizing this properly.
      
1) During task-wake-up, we not only need to check if the priority
   preempts the current task, but we also need to check for this
   non-migratable condition.  Therefore, if a non-migratable task wakes
   up and sees an equal priority migratable task already running, it
   will attempt to preempt it *if* there is a likelihood that the
   current task will find an immediate home.
      
      2) Tasks only get this non-migratable "priority boost" on wake-up.  Any
         requeuing will result in the non-migratable task being queued to the
         end of the shared queue.  This is an attempt to prevent the system
         from being completely unfair to migratable tasks during things like
         SCHED_RR timeslicing.
      
I am sure this patch introduces potentially "odd" behavior if you
concoct a scenario where a bunch of non-migratable threads could starve
migratable ones given the right pattern.  I am not yet convinced that
this is a problem, since we are talking about tasks of equal RT priority
anyway, and there never is much in the way of guarantees against
starvation in that scenario anyway (e.g. you could come up with a
similar scenario with a specific timing environment versus an affinity
environment).  I can be convinced otherwise, but for now I think this is
"ok".
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2. 29 May 2008, 4 commits
3. 15 May 2008, 1 commit
4. 12 May 2008, 1 commit
• Add new 'cond_resched_bkl()' helper function · c3921ab7
  Authored by Linus Torvalds
      It acts exactly like a regular 'cond_resched()', but will not get
      optimized away when CONFIG_PREEMPT is set.
      
Normal kernel code is already preemptible in the presence of
CONFIG_PREEMPT, so cond_resched() is optimized away (see commit
02b67cc3 "sched: do not do cond_resched() when CONFIG_PREEMPT").
      
But when wanting to conditionally reschedule while holding a lock, you
need to use "cond_resched_lock(lock)", and the new function is the BKL
equivalent of that.
      
      Also make fs/locks.c use it.
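A hedged sketch of the helper's shape, assuming the same internal
__cond_resched() wrapper that cond_resched() uses; the point is simply
that nothing here is compiled out under CONFIG_PREEMPT, because
BKL-holding code is not preemptible:

    int __sched cond_resched_bkl(void)
    {
            if (!need_resched())
                    return 0;
            __cond_resched();       /* assumed internal wrapper around schedule() */
            return 1;
    }
    EXPORT_SYMBOL(cond_resched_bkl);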
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5. 11 May 2008, 1 commit
• BKL: revert back to the old spinlock implementation · 8e3e076c
  Authored by Linus Torvalds
      The generic semaphore rewrite had a huge performance regression on AIM7
      (and potentially other BKL-heavy benchmarks) because the generic
      semaphores had been rewritten to be simple to understand and fair.  The
      latter, in particular, turns a semaphore-based BKL implementation into a
      mess of scheduling.
      
      The attempt to fix the performance regression failed miserably (see the
      previous commit 00b41ec2 'Revert
      "semaphore: fix"'), and so for now the simple and sane approach is to
      instead just go back to the old spinlock-based BKL implementation that
      never had any issues like this.
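For reference, a simplified sketch of the spinlock-based BKL
(lib/kernel_lock.c): a recursive lock built on a per-task depth counter,
with the scheduler's drop-and-reacquire across schedule() elided:

    static __cacheline_aligned_in_smp DEFINE_SPINLOCK(kernel_flag);

    void __lockfunc lock_kernel(void)
    {
            int depth = current->lock_depth + 1;

            if (likely(!depth))             /* first acquisition by this task */
                    spin_lock(&kernel_flag);
            current->lock_depth = depth;
    }

    void __lockfunc unlock_kernel(void)
    {
            BUG_ON(current->lock_depth < 0);
            if (likely(--current->lock_depth < 0))  /* outermost release */
                    spin_unlock(&kernel_flag);
    }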
      
According to Yanmin Zhang, this patch also has the advantage of fixing
the regression completely, unlike the semaphore hack, which still left
a regression of a couple of percentage points.
      
      As a spinlock, the BKL obviously has the potential to be a latency
      issue, but it's not really any different from any other spinlock in that
      respect.  We do want to get rid of the BKL asap, but that has been the
      plan for several years.
      
      These days, the biggest users are in the tty layer (open/release in
      particular) and Alan holds out some hope:
      
        "tty release is probably a few months away from getting cured - I'm
         afraid it will almost certainly be the very last user of the BKL in
         tty to get fixed as it depends on everything else being sanely locked."
      
      so while we're not there yet, we do have a plan of action.
Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Alexander Viro <viro@ftp.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6. 06 May 2008, 10 commits
7. 01 May 2008, 1 commit
• rename div64_64 to div64_u64 · 6f6d6a1a
  Authored by Roman Zippel
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so the name clearly includes the type of the divide.  Move its
definition to math64.h, as currently no architecture overrides the generic
implementation.  They can still override it of course, but the duplicated
declarations are avoided.
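For reference, a sketch of the generic fallback: when the divisor does
not fit in 32 bits, shift both operands down so it does, then use the
exact 64/32 divide (the result is approximate in that case). The details
of the generic version are an assumption:

    u64 div64_u64(u64 dividend, u64 divisor)
    {
            u32 high = divisor >> 32;
            u32 d;

            if (high) {
                    /* scale both down so the divisor fits in 32 bits;
                     * the quotient is then an approximation */
                    unsigned int shift = fls(high);

                    d = divisor >> shift;
                    dividend >>= shift;
            } else {
                    d = divisor;
            }
            return div_u64(dividend, d);    /* exact 64/32 divide */
    }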
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8. 29 April 2008, 2 commits
9. 25 April 2008, 4 commits
10. 23 April 2008, 1 commit
11. 20 April 2008, 4 commits