1. 25 November 2008, 3 commits
  2. 23 November 2008, 2 commits
  3. 21 November 2008, 1 commit
    • sched: update comment for move_task_off_dead_cpu · 957ad016
      Committed by Vegard Nossum
      Impact: cleanup
      
      This commit:
      
      commit f7b4cddc
      Author: Oleg Nesterov <oleg@tv-sign.ru>
      Date:   Tue Oct 16 23:30:56 2007 -0700
      
          do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
      
          Currently move_task_off_dead_cpu() is called under
          write_lock_irq(tasklist).  This means it can't use task_lock() which is
          needed to improve migrating to take task's ->cpuset into account.
      
          Change the code to call move_task_off_dead_cpu() with irqs enabled, and
          change migrate_live_tasks() to use read_lock(tasklist).
      
      ...forgot to update the comment in front of move_task_off_dead_cpu.
      
      Reference: http://lkml.org/lkml/2008/6/23/135
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      957ad016
  4. 20 November 2008, 1 commit
    • sched: fix inconsistency when redistributing per-cpu tg->cfs_rq shares · ec4e0e2f
      Committed by Ken Chen
      Impact: make load-balancing more consistent
      
      In the update_shares() path leading to tg_shares_up(), the calculation of
      per-cpu cfs_rq shares is rather erratic even under a moderate task wake-up
      rate.  The problem is that the per-cpu tg->cfs_rq load weights used for the
      sd_rq_weight aggregation and for the actual redistribution of cfs_rq->shares
      are collected at different times.  Under moderate system load, we've seen
      quite a bit of variation in cfs_rq->shares, which ultimately wildly affects
      each sched_entity's load weight.
      
      This patch caches the initial per-cpu load weight when doing the sum
      calculation, and then passes it down to update_group_shares_cpu() for
      redistributing the per-cpu cfs_rq shares.  This keeps the total cfs_rq
      shares consistent across all CPUs, and it also simplifies the rounding and
      the zero-load-weight check (see the sketch after this entry).
      Signed-off-by: Ken Chen <kenchen@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ec4e0e2f
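
      A hedged, self-contained sketch of the caching idea described above (plain
      C, not the kernel code itself; the names weight_snapshot, redistribute_shares
      and NR_CPUS_SKETCH are illustrative):

          /* Sketch only: snapshot each CPU's load weight once, then use the same
           * snapshot for both the aggregate sum and the redistribution, so the
           * per-cpu ratios stay mutually consistent. */
          #define NR_CPUS_SKETCH 4

          static unsigned long weight_snapshot[NR_CPUS_SKETCH];

          static void redistribute_shares(const unsigned long cpu_load[],
                                          unsigned long total_shares,
                                          unsigned long shares_out[])
          {
              unsigned long rq_weight = 0;
              int i;

              /* Pass 1: cache the per-cpu weights and sum them. */
              for (i = 0; i < NR_CPUS_SKETCH; i++) {
                  weight_snapshot[i] = cpu_load[i];
                  rq_weight += weight_snapshot[i];
              }

              /* Pass 2: redistribute from the cached values; even if cpu_load[]
               * changes concurrently, the shares still sum to total_shares
               * (modulo integer rounding). */
              for (i = 0; i < NR_CPUS_SKETCH; i++)
                  shares_out[i] = rq_weight ?
                          total_shares * weight_snapshot[i] / rq_weight : 0;
          }
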
  5. 18 November 2008, 1 commit
    • cpuset: fix regression when failing to generate sched domains · 700018e0
      Committed by Li Zefan
      Impact: properly rebuild sched-domains on kmalloc() failure
      
      When cpuset fails to generate sched domains due to a kmalloc()
      failure, the scheduler should fall back to the single partition
      'fallback_doms' and rebuild the sched domains, but currently it
      only destroys them without rebuilding.
      
      The regression was introduced by:
      
      | commit dfb512ec
      | Author: Max Krasnyansky <maxk@qualcomm.com>
      | Date:   Fri Aug 29 13:11:41 2008 -0700
      |
      |    sched: arch_reinit_sched_domains() must destroy domains to force rebuild
      
      After the above commit, partition_sched_domains(0, NULL, NULL) only
      destroys the sched domains, while partition_sched_domains(1, NULL, NULL)
      creates the default sched domain (see the sketch after this entry).
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      700018e0
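
      A hedged, kernel-context sketch of the fallback described above
      (generate_domains_sketch() is a made-up stand-in for the cpuset code that
      kmalloc()s the domain masks; only partition_sched_domains() is a real
      interface):

          static void rebuild_after_cpuset_change(void)
          {
              cpumask_t *doms;
              int ndoms = generate_domains_sketch(&doms);  /* may fail on kmalloc() */

              if (ndoms < 1 || !doms) {
                  /* Passing ndoms == 0 would only destroy the current domains.
                   * Pass ndoms == 1 with a NULL mask instead, so the scheduler
                   * rebuilds the single 'fallback_doms' partition. */
                  partition_sched_domains(1, NULL, NULL);
                  return;
              }

              partition_sched_domains(ndoms, doms, NULL);
          }
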
  6. 17 November 2008, 1 commit
  7. 16 November 2008, 1 commit
    • tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() · 7e066fb8
      Committed by Mathieu Desnoyers
      Impact: API *CHANGE*. Must update all tracepoint users.
      
      Add DEFINE_TRACE() to tracepoints so the tracepoint structure is defined
      in a single spot for the whole kernel. This helps reduce memory
      consumption, especially when declaring a lot of tracepoints, e.g. for
      kmalloc tracing.
      
      *API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
      tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
      to do it. The name previously used was misleading.
      
      Update the scheduler instrumentation to follow this API change (see the
      sketch after this entry).
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7e066fb8
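
      A hedged usage sketch of the new split (kernel context assumed; the
      tracepoint name sched_sketch_event is made up, and TPPROTO/TPARGS are the
      wrapper-macro names of this era, later renamed TP_PROTO/TP_ARGS):

          /* In a shared header: declare the tracepoint once for all users. */
          DECLARE_TRACE(sched_sketch_event,
                  TPPROTO(struct task_struct *p),
                  TPARGS(p));

          /* In exactly one .c file: emit the tracepoint structure. */
          DEFINE_TRACE(sched_sketch_event);

          /* At the instrumentation site: fire the event. */
          static void sketch_instrumentation_site(struct task_struct *p)
          {
              trace_sched_sketch_event(p);
          }
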
  8. 13 November 2008, 1 commit
    • sched: fix init_idle()'s use of sched_clock() · 5cbd54ef
      Committed by Ingo Molnar
      Maciej Rutecki reported:
      
      > I have this bug during suspend to disk:
      >
      > [  188.592151] Enabling non-boot CPUs ...
      > [  188.592151] SMP alternatives: switching to SMP code
      > [  188.666058] BUG: using smp_processor_id() in preemptible
      > [00000000]
      > code: suspend_to_disk/2934
      > [  188.666064] caller is native_sched_clock+0x2b/0x80
      
      Which, as noted by Linus, was caused by me, via:
      
        7cbaef9c "sched: optimize sched_clock() a bit"
      
      Move the rq locking a bit earlier in the initialization sequence, so that
      the sched_clock() call in init_idle() is non-preemptible (see the sketch
      after this entry).
      Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5cbd54ef
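
      A hedged, simplified sketch of the ordering change in init_idle()
      (condensed; only the relevant lines are shown):

          void init_idle_sketch(struct task_struct *idle, int cpu)
          {
              struct rq *rq = cpu_rq(cpu);
              unsigned long flags;

              /* Take the rq lock first: this disables interrupts and preemption,
               * so the sched_clock() call below can no longer trigger the
               * smp_processor_id()-in-preemptible warning. */
              spin_lock_irqsave(&rq->lock, flags);

              __sched_fork(idle);
              idle->se.exec_start = sched_clock();

              /* ... rest of the idle-task setup ... */

              spin_unlock_irqrestore(&rq->lock, flags);
          }
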
  9. 12 November 2008, 1 commit
  10. 11 November 2008, 2 commits
  11. 10 November 2008, 1 commit
  12. 07 November 2008, 4 commits
  13. 05 November 2008, 1 commit
    • sched: backward looking buddy · 4793241b
      Committed by Peter Zijlstra
      Impact: improve/change/fix wakeup-buddy scheduling
      
      Currently we only have a forward-looking buddy: we prefer to schedule
      the task we last woke up, under the presumption that it is going to
      consume the data we just produced and will therefore have cache-hot
      benefits.
      
      This allows co-waking producer/consumer task pairs to run ahead of the
      pack for a little while, keeping their cache warm. Without this, we
      would interleave all pairs, utterly thrashing the cache.
      
      This patch introduces a backward-looking buddy. Suppose that, in the
      above scenario, the consumer preempts the producer before it can go to
      sleep; we then miss the wakeup from consumer to producer (it is already
      running, after all), breaking the cycle and reverting to the
      cache-thrashing interleaved schedule pattern.
      
      The backward buddy will try to schedule back to the task that woke us
      up when the forward buddy is not available, under the assumption that
      the task that woke us is the most cache-hot task around, barring
      current.
      
      This will basically allow a task to continue after it got preempted.
      
      In order to avoid starvation, we allow either buddy to get wakeup_gran
      ahead of the pack (see the sketch after this entry).
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4793241b
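
      A hedged sketch of the resulting pick logic (simplified from the
      fair-scheduler code of this era; the exact naming and ordering in the real
      patch may differ). wakeup_preempt_entity() returning < 1 is what enforces
      the "at most wakeup_gran ahead of the pack" fairness limit:

          static struct sched_entity *pick_next_entity_sketch(struct cfs_rq *cfs_rq)
          {
              struct sched_entity *se = __pick_next_entity(cfs_rq); /* leftmost */

              /* Forward buddy: the task we last woke up. */
              if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, se) < 1)
                  return cfs_rq->next;

              /* Backward buddy: the task that woke us up, so a preempted
               * producer can continue once the consumer is done. */
              if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, se) < 1)
                  return cfs_rq->last;

              return se;
          }
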
  14. 04 November 2008, 3 commits
  15. 30 October 2008, 1 commit
  16. 29 October 2008, 1 commit
  17. 24 October 2008, 2 commits
  18. 23 October 2008, 1 commit
  19. 20 October 2008, 1 commit
    • sched: optimize group load balancer · ffda12a1
      Committed by Peter Zijlstra
      I noticed that tg_shares_up() unconditionally takes rq-locks for all cpus
      in the sched_domain. This hurts.
      
      We need the rq-locks whenever we change the weight of the per-cpu group
      sched entities. To alleviate this a little, only change the weight when the
      new weight is at least shares_thresh away from the old value (see the
      sketch after this entry).
      
      This avoids the rq-lock for the top-level entries, since those will never
      be re-weighted, and fuzzes the lower-level entries a little to gain
      performance in semi-stable situations.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ffda12a1
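
      A hedged, condensed sketch of the threshold check (sysctl_sched_shares_thresh
      is the tunable this patch adds; the surrounding bookkeeping of the real
      update_group_shares_cpu() is omitted):

          static void set_group_shares_sketch(struct task_group *tg, int cpu,
                                              unsigned long shares)
          {
              /* Only take the rq-lock when the weight moves by more than the
               * threshold; small fluctuations are simply ignored. */
              if (abs(shares - tg->se[cpu]->load.weight) >
                              sysctl_sched_shares_thresh) {
                  struct rq *rq = cpu_rq(cpu);
                  unsigned long flags;

                  spin_lock_irqsave(&rq->lock, flags);
                  __set_se_shares(tg->se[cpu], shares);
                  spin_unlock_irqrestore(&rq->lock, flags);
              }
          }
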
  20. 16 October 2008, 3 commits
  21. 14 October 2008, 1 commit
    • tracing, sched: LTTng instrumentation - scheduler · 0a16b607
      Committed by Mathieu Desnoyers
      Instrument the scheduler activity (sched_switch, migration, wakeups,
      waiting for a task, signal delivery) and process/thread
      creation/destruction (fork, exit, kthread stop). Kthread creation is not
      instrumented in this patch because it is architecture dependent. The
      instrumentation allows connecting tracers such as ftrace, which detect
      scheduling latencies and good/bad scheduler decisions. Tools like LTTng
      can export this scheduler information, along with instrumentation of the
      rest of the kernel activity, to perform post-mortem analysis of scheduler
      behaviour (see the sketch after this entry).
      
      Regarding the performance impact of tracepoints (which is comparable to
      that of markers), even without immediate-values optimizations, tests done
      by Hideo Aoki on ia64 show no regression. His test case used hackbench on
      a kernel where scheduler instrumentation (about 5 events in the scheduler
      code) was added. See the "Tracepoints" patch header for detailed
      performance results.
      
      Changelog:
      
      - Change the instrumentation location and parameters to match the ftrace
        instrumentation previously done with kernel markers.
      
      [ mingo@elte.hu: conflict resolutions ]
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0a16b607
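
      A hedged sketch of how a tracer might attach to one of the new scheduler
      tracepoints (kernel context assumed; the probe name is made up, and the
      sched_switch prototype of this era still carried the rq argument):

          static void probe_sched_switch_sketch(struct rq *rq,
                                                struct task_struct *prev,
                                                struct task_struct *next)
          {
              /* record prev->pid -> next->pid, a timestamp, etc. */
          }

          static int __init sketch_tracer_init(void)
          {
              /* register_trace_sched_switch() is generated by the tracepoint
               * machinery for the sched_switch event. */
              return register_trace_sched_switch(probe_sched_switch_sketch);
          }

          static void __exit sketch_tracer_exit(void)
          {
              unregister_trace_sched_switch(probe_sched_switch_sketch);
              tracepoint_synchronize_unregister();
          }
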
  22. 09 October 2008, 1 commit
    • sched debug: add name to sched_domain sysctl entries · a5d8c348
      Committed by Ingo Molnar
      Add /proc/sys/kernel/sched_domain/cpu0/domain0/name, to make it easier
      to see which specific scheduler domain remained at that entry.
      
      Since we process the scheduler domain tree and simplify it, it's not
      always immediately clear during debugging which domain came from where.
      
      Depends on CONFIG_SCHED_DEBUG=y.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a5d8c348
  23. 06 October 2008, 1 commit
  24. 30 September 2008, 2 commits
    • sched: improve preempt debugging · 7317d7b8
      Committed by Nick Piggin
      This patch helped me out with a problem I recently had...
      
      Basically, when the kernel lock is held, a preempt_count underflow does not
      get detected until the lock is released, which may be a long time later
      (and the task may be rescheduled at arbitrary points in between). If the
      BKL is released at schedule(), the resulting output is actually fairly
      cryptic...
      
      With any other lock that elevates preempt_count, it is illegal to schedule
      under it (so an underflow would be found pretty quickly). The BKL allows
      scheduling with preempt_count elevated, which makes underflows hard to
      debug (see the sketch after this entry).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7317d7b8
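
      A hedged sketch of the kind of check this implies, assuming it lives in
      sub_preempt_count() (the exact form of the real patch may differ):

          void sub_preempt_count_sketch(int val)
          {
              /* Underflow?  Discount the BKL's contribution (if held) so that
               * an underflow trips the warning immediately, instead of only
               * when the BKL is finally released. */
              if (DEBUG_LOCKS_WARN_ON(val > preempt_count() - (!!kernel_locked())))
                  return;

              preempt_count() -= val;
          }
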
    • timers: fix itimer/many thread hang, fix · 1508487e
      Committed by Ingo Molnar
      Fix a bogus rq dereference: v3 removed the locking but also removed the rq
      initialization.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1508487e
  25. 29 September 2008, 1 commit
    • hrtimer: prevent migration of per CPU hrtimers · ccc7dadf
      Committed by Thomas Gleixner
      Impact: per CPU hrtimers can be migrated from a dead CPU
      
      The hrtimer code has no knowledge of per CPU timers, but we need to
      prevent the migration of such timers and warn when such a timer is
      active at migration time.
      
      Explicitly mark the timers as per CPU and use a more understandable
      mode descriptor for the interrupt-safe unlocked callback mode, which
      is used by hrtimer_sleeper and the scheduler code (see the sketch after
      this entry).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ccc7dadf
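
      A hedged, condensed sketch of the migration-time handling (the real code
      in migrate_hrtimer_list() differs in detail; HRTIMER_CB_IRQSAFE_PERCPU is
      the new, more descriptive callback-mode name this patch introduces):

          static void migrate_one_hrtimer_sketch(struct hrtimer *timer,
                                                 struct hrtimer_clock_base *new_base)
          {
              if (timer->cb_mode == HRTIMER_CB_IRQSAFE_PERCPU) {
                  /* Per CPU timers must not follow their (dead) CPU; warn if
                   * one is still active at migration time. */
                  WARN_ON_ONCE(hrtimer_active(timer));
                  return;
              }

              /* Ordinary timers are simply moved over to the new base. */
              timer->base = new_base;
          }
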
  26. 28 September 2008, 1 commit
  27. 23 September 2008, 1 commit