1. 25 December 2008, 1 commit
  2. 19 December 2008, 1 commit
    • tracing: fix warnings in kernel/trace/trace_sched_switch.c · c71dd42d
      Committed by Ingo Molnar
      Fix these warnings:
      
        kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_register’:
        kernel/trace/trace_sched_switch.c:96: warning: passing argument 1 of ‘register_trace_sched_wakeup_new’ from incompatible pointer type
        kernel/trace/trace_sched_switch.c:112: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type
        kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_unregister’:
        kernel/trace/trace_sched_switch.c:121: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type
      
      These warnings are triggered because the sched_wakeup_new tracepoints need
      the same trace signature as sched_wakeup, which was changed recently
      (a sketch of this signature-mismatch pattern follows this entry).
      
      Fix it.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c71dd42d
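      The warning class above comes from registering a probe whose C signature no
      longer matches the tracepoint prototype. Below is a minimal userspace
      illustration of that pattern; it is not the kernel code, and the type,
      function and probe names are invented stand-ins.

      #include <stdio.h>

      /* Hypothetical "new" probe signature: the tracepoint grew an extra argument. */
      typedef void (*wakeup_probe_t)(void *rq, void *task, int success);

      static wakeup_probe_t registered_probe;

      /* Stand-in for the generated register_trace_*() helper: it only accepts
       * probes with exactly the tracepoint's signature. */
      static void register_trace_sched_wakeup_new(wakeup_probe_t probe)
      {
              registered_probe = probe;
      }

      /* Old-style probe, one argument short: passing it to the helper above is
       * what produces "passing argument 1 ... from incompatible pointer type". */
      static void probe_old(void *rq, void *task)
      {
              (void)rq; (void)task;
      }

      /* Updated probe that matches the current signature: registers cleanly. */
      static void probe_new(void *rq, void *task, int success)
      {
              (void)rq; (void)task;
              printf("wakeup_new fired: success=%d\n", success);
      }

      int main(void)
      {
              probe_old(NULL, NULL);       /* the old probe itself is fine to call */
              /* register_trace_sched_wakeup_new(probe_old);   <- would warn      */
              register_trace_sched_wakeup_new(probe_new);
              registered_probe(NULL, NULL, 1);  /* simulate the tracepoint firing */
              return 0;
      }

      Re-enabling the commented-out registration and building with -Wall reproduces
      the same class of warning as in the log above.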
  3. 18 December 2008, 1 commit
  4. 16 December 2008, 3 commits
  5. 12 December 2008, 3 commits
  6. 10 December 2008, 1 commit
    • sched: CPU remove deadlock fix · 9a2bd244
      Committed by Brian King
      Impact: fix possible deadlock in CPU hot-remove path
      
      This patch fixes a possible deadlock scenario in the CPU remove path.
      migration_call() grabs rq->lock, then wakes up everything on rq->migration_queue
      with the lock held. One of the tasks on the migration queue then ends up
      calling tg_shares_up(), which also tries to acquire the same rq->lock
      (a userspace analogue of this self-deadlock follows this entry).
      
      [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
      [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
      [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
      [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
      [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
      [c000000058eab640] c00000000007ada4 .complete+0x54/0x80
      [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
      [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
      [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
      [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
      [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
      [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
      [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
      [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
      [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
      [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9a2bd244
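      The failure mode is the classic one of re-acquiring a non-recursive lock on
      the same call chain. A tiny userspace analogue follows; nothing in it is
      kernel code, and the function names merely mirror the trace above.

      #include <pthread.h>
      #include <stdio.h>

      static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Stand-in for the wakeup path that ends up in tg_shares_up(). */
      static void tg_shares_up_analogue(void)
      {
              pthread_mutex_lock(&rq_lock);    /* second acquisition: blocks forever */
              pthread_mutex_unlock(&rq_lock);
      }

      /* Stand-in for migration_call(): does the wakeup while still holding the lock. */
      static void migration_call_analogue(void)
      {
              pthread_mutex_lock(&rq_lock);
              tg_shares_up_analogue();         /* deadlock: rq_lock is already held */
              pthread_mutex_unlock(&rq_lock);  /* never reached */
      }

      int main(void)
      {
              puts("about to self-deadlock; interrupt with Ctrl-C");
              migration_call_analogue();
              return 0;
      }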
  7. 08 December 2008, 2 commits
  8. 02 December 2008, 1 commit
  9. 30 November 2008, 1 commit
    • sched: prevent divide by zero error in cpu_avg_load_per_task, update · af6d596f
      Committed by Ingo Molnar
      Regarding the bug addressed in:
      
        4cd42620: sched: prevent divide by zero error in cpu_avg_load_per_task
      
      Linus points out that the fix is not complete:
      
      > There's nothing that keeps gcc from deciding not to reload
      > rq->nr_running.
      >
      > Of course, in _practice_, I don't think gcc ever will (if it decides
      > that it will spill, gcc is likely going to decide that it will
      > literally spill the local variable to the stack rather than decide to
      > reload off the pointer), but it's a valid compiler optimization, and
      > it even has a name (rematerialization).
      >
      > So I suspect that your patch does fix the bug, but it still leaves the
      > fairly unlikely _potential_ for it to re-appear at some point.
      >
      > We have ACCESS_ONCE() as a macro to guarantee that the compiler
      > doesn't rematerialize a pointer access. That also would clarify
      > the fact that we access something unsafe outside a lock.
      
      So make sure our nr_running value is immutable and cannot change
      after we check it for nonzero (a sketch of the ACCESS_ONCE() pattern
      follows this entry).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      af6d596f
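      A sketch of the resulting pattern, with a userspace stand-in for
      ACCESS_ONCE() and simplified types (illustrative, not the kernel source):

      #include <stdio.h>

      /* Userspace stand-in for the kernel's ACCESS_ONCE(): force a single read
       * through a volatile-qualified pointer so the compiler cannot
       * rematerialize the load. */
      #define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

      struct rq {
              unsigned long nr_running;
              unsigned long load_weight;
      };

      static unsigned long cpu_avg_load_per_task(struct rq *rq)
      {
              /* One read only: the value checked for zero is the value divided by. */
              unsigned long nr_running = ACCESS_ONCE(rq->nr_running);

              if (nr_running)
                      return rq->load_weight / nr_running;
              return 0;
      }

      int main(void)
      {
              struct rq rq = { .nr_running = 4, .load_weight = 4096 };

              printf("avg load per task: %lu\n", cpu_avg_load_per_task(&rq));
              return 0;
      }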
  10. 29 November 2008, 1 commit
    • sched: move double_unlock_balance() higher · 70574a99
      Committed by Alexey Dobriyan
      Move double_lock_balance()/double_unlock_balance() higher to fix the following
      build failure with gcc-3.4.6 (an illustration of the old-gcc inlining order
      issue follows this entry):
      
         CC      kernel/sched.o
       In file included from kernel/sched.c:1605:
       kernel/sched_rt.c: In function `find_lock_lowest_rq':
       kernel/sched_rt.c:914: sorry, unimplemented: inlining failed in call to 'double_unlock_balance': function body not available
       kernel/sched_rt.c:1077: sorry, unimplemented: called from here
       make[2]: *** [kernel/sched.o] Error 1
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      70574a99
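      For context, the error is an old-gcc limitation: gcc-3.4 could refuse to
      inline a function whose body it had not yet seen. The minimal shape of the
      ordering problem looks like this (modern compilers accept it; this is an
      illustration, not the scheduler code):

      /* Caller appears before the inline function's body, mirroring how
       * sched_rt.c used double_unlock_balance() before sched.c defined it. */
      static inline void double_unlock_balance(int *this_rq, int *busiest);

      static void find_lock_lowest_rq(int *this_rq, int *busiest)
      {
              /* gcc-3.4: "sorry, unimplemented: ... function body not available" */
              double_unlock_balance(this_rq, busiest);
      }

      static inline void double_unlock_balance(int *this_rq, int *busiest)
      {
              (void)this_rq;
              (void)busiest;
      }

      int main(void)
      {
              int a = 0, b = 0;

              find_lock_lowest_rq(&a, &b);
              return 0;
      }

      Moving the definition above its callers, as the commit does, sidesteps the
      limitation without changing behaviour.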
  11. 27 November 2008, 1 commit
    • sched: prevent divide by zero error in cpu_avg_load_per_task · 4cd42620
      Committed by Steven Rostedt
      Impact: fix divide by zero crash in scheduler rebalance irq
      
      While testing the branch profiler, I hit this crash:
      
      divide error: 0000 [#1] PREEMPT SMP
      [...]
      RIP: 0010:[<ffffffff8024a008>]  [<ffffffff8024a008>] cpu_avg_load_per_task+0x50/0x7f
      [...]
      Call Trace:
       <IRQ> <0> [<ffffffff8024fd43>] find_busiest_group+0x3e5/0xcaa
       [<ffffffff8025da75>] rebalance_domains+0x2da/0xa21
       [<ffffffff80478769>] ? find_next_bit+0x1b2/0x1e6
       [<ffffffff8025e2ce>] run_rebalance_domains+0x112/0x19f
       [<ffffffff8026d7c2>] __do_softirq+0xa8/0x232
       [<ffffffff8020ea7c>] call_softirq+0x1c/0x3e
       [<ffffffff8021047a>] do_softirq+0x94/0x1cd
       [<ffffffff8026d5eb>] irq_exit+0x6b/0x10e
       [<ffffffff8022e6ec>] smp_apic_timer_interrupt+0xd3/0xff
       [<ffffffff8020e4b3>] apic_timer_interrupt+0x13/0x20
      
      The code for cpu_avg_load_per_task has:
      
      	if (rq->nr_running)
      		rq->avg_load_per_task = rq->load.weight / rq->nr_running;
      
      The runqueue lock is not held here, and nothing prevents rq->nr_running
      from going to zero after it passes the if condition.
      
      The branch profiler simply made the race window bigger.
      
      This patch saves rq->nr_running to a local variable and uses that for both
      the condition and the division (see the sketch after this entry).
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4cd42620
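      A sketch of the fixed shape described above, using simplified stand-in
      types (this is not the actual kernel/sched.c code):

      #include <stdio.h>

      struct load_weight { unsigned long weight; };

      struct rq {
              unsigned long      nr_running;
              struct load_weight load;
              unsigned long      avg_load_per_task;
      };

      static unsigned long cpu_avg_load_per_task(struct rq *rq)
      {
              /* Take one snapshot and use it for both the test and the division,
               * so a concurrent drop to zero cannot slip in between. */
              unsigned long nr_running = rq->nr_running;

              if (nr_running)
                      rq->avg_load_per_task = rq->load.weight / nr_running;

              return rq->avg_load_per_task;
      }

      int main(void)
      {
              struct rq rq = { .nr_running = 2, .load = { .weight = 2048 } };

              printf("avg load per task: %lu\n", cpu_avg_load_per_task(&rq));
              return 0;
      }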
  12. 26 November 2008, 1 commit
  13. 25 November 2008, 1 commit
    • hrtimer: removing all ur callback modes · ca109491
      Committed by Peter Zijlstra
      Impact: cleanup, move all hrtimer processing into hardirq context
      
      This is an attempt at removing some of the hrtimer complexity by
      reducing the number of callback modes to 1.
      
      This means that all hrtimer callback functions will be run from HARD-irq
      context.
      
      I went through all the 30 odd hrtimer callback functions in the kernel
      and saw only one that I'm not quite sure of, which is the one in
      net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.
      
      Furthermore, the hrtimer core now calls callbacks directly with IRQs
      disabled in case you try to enqueue an expired timer. If this timer is a
      periodic timer (which should use hrtimer_forward() to advance its time),
      it might be possible to end up in an infinite recursive loop, because
      hrtimer_forward() doesn't round up to the next timer granularity and
      therefore the callback keeps getting called; obviously this needs a fix.
      (A schematic periodic hrtimer callback follows this entry.)
      
      Aside from that, this seems to compile and actually boot on my dual core
      test box, although I'm sure there are some bugs in it; me not hitting any
      makes me certain :-)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ca109491
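      For reference, a schematic periodic-callback module of the kind the
      paragraph worries about. This is a hedged sketch, not code from this
      commit; it assumes the long-standing hrtimer_init()/hrtimer_start()/
      hrtimer_forward_now() API, and the 100 ms period is chosen only for
      illustration.

      #include <linux/module.h>
      #include <linux/hrtimer.h>
      #include <linux/ktime.h>

      static struct hrtimer demo_timer;

      static enum hrtimer_restart demo_periodic_cb(struct hrtimer *timer)
      {
              /* Push the expiry forward by whole periods and ask to be restarted;
               * the corner case discussed above is a forwarded expiry that is
               * still effectively expired, which the core would now re-run
               * directly. */
              hrtimer_forward_now(timer, ktime_set(0, 100 * NSEC_PER_MSEC));
              return HRTIMER_RESTART;
      }

      static int __init demo_init(void)
      {
              hrtimer_init(&demo_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
              demo_timer.function = demo_periodic_cb;
              hrtimer_start(&demo_timer, ktime_set(0, 100 * NSEC_PER_MSEC),
                            HRTIMER_MODE_REL);
              return 0;
      }

      static void __exit demo_exit(void)
      {
              hrtimer_cancel(&demo_timer);
      }

      module_init(demo_init);
      module_exit(demo_exit);
      MODULE_LICENSE("GPL");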
  14. 23 November 2008, 2 commits
  15. 21 November 2008, 1 commit
    • sched: update comment for move_task_off_dead_cpu · 957ad016
      Committed by Vegard Nossum
      Impact: cleanup
      
      This commit:
      
      commit f7b4cddc
      Author: Oleg Nesterov <oleg@tv-sign.ru>
      Date:   Tue Oct 16 23:30:56 2007 -0700
      
          do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
      
          Currently move_task_off_dead_cpu() is called under
          write_lock_irq(tasklist).  This means it can't use task_lock() which is
          needed to improve migrating to take task's ->cpuset into account.
      
          Change the code to call move_task_off_dead_cpu() with irqs enabled, and
          change migrate_live_tasks() to use read_lock(tasklist).
      
      ...forgot to update the comment in front of move_task_off_dead_cpu.
      
      Reference: http://lkml.org/lkml/2008/6/23/135
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      957ad016
  16. 20 November 2008, 1 commit
    • sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares · ec4e0e2f
      Committed by Ken Chen
      Impact: make load-balancing more consistent
      
      In the update_shares() path leading to tg_shares_up(), the calculation of
      per-cpu cfs_rq shares is rather erratic even under moderate task wake up
      rate.  The problem is that the per-cpu tg->cfs_rq load weight used in the
      sd_rq_weight aggregation and in the actual redistribution of cfs_rq->shares
      is collected at different times.  Under moderate system load, we've seen
      quite a bit of variation in cfs_rq->shares, which ultimately wildly
      affects the sched_entity's load weight.
      
      This patch caches the initial per-cpu load weight when doing the sum
      calculation, and then passes it down to update_group_shares_cpu() for
      redistributing per-cpu cfs_rq shares.  This keeps the total cfs_rq shares
      consistent across all CPUs.  It also simplifies the rounding and the zero
      load weight check.  (A simplified sketch of this snapshot-and-redistribute
      idea follows this entry.)
      Signed-off-by: Ken Chen <kenchen@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ec4e0e2f
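      A much-simplified userspace sketch of the snapshot-and-redistribute idea;
      the array and constants stand in for per-cpu tg->cfs_rq weights and the
      group's share pool, and none of this is the scheduler code.

      #include <stdio.h>

      #define NR_CPUS      4
      #define GROUP_SHARES 1024UL

      int main(void)
      {
              unsigned long weight[NR_CPUS] = { 300, 100, 0, 600 };  /* example */
              unsigned long snapshot[NR_CPUS];
              unsigned long total = 0;
              int cpu;

              /* Pass 1: cache each per-cpu weight and build the sum from the cache. */
              for (cpu = 0; cpu < NR_CPUS; cpu++) {
                      snapshot[cpu] = weight[cpu];
                      total += snapshot[cpu];
              }

              /* Pass 2: redistribute from the same cached values, so the per-cpu
               * shares stay consistent with the total even if weight[] changes
               * in the meantime. */
              for (cpu = 0; cpu < NR_CPUS; cpu++) {
                      unsigned long shares = total ?
                              GROUP_SHARES * snapshot[cpu] / total : 0;
                      printf("cpu%d: shares=%lu\n", cpu, shares);
              }
              return 0;
      }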
  17. 18 November 2008, 1 commit
    • cpuset: fix regression when failed to generate sched domains · 700018e0
      Committed by Li Zefan
      Impact: properly rebuild sched-domains on kmalloc() failure
      
      When cpuset fails to generate sched domains due to a kmalloc()
      failure, the scheduler should fall back to the single partition
      'fallback_doms' and rebuild the sched domains; currently it only
      destroys them without rebuilding them.
      
      The regression was introduced by:
      
      | commit dfb512ec
      | Author: Max Krasnyansky <maxk@qualcomm.com>
      | Date:   Fri Aug 29 13:11:41 2008 -0700
      |
      |    sched: arch_reinit_sched_domains() must destroy domains to force rebuild
      
      After the above commit, partition_sched_domains(0, NULL, NULL) will
      only destroy sched domains and partition_sched_domains(1, NULL, NULL)
      will create the default sched domain.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      700018e0
  18. 17 November 2008, 1 commit
  19. 16 November 2008, 1 commit
    • tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() · 7e066fb8
      Committed by Mathieu Desnoyers
      Impact: API *CHANGE*. Must update all tracepoint users.
      
      Add DEFINE_TRACE() to tracepoints so the tracepoint structure is declared
      in a single spot for the whole kernel.  This helps reduce memory
      consumption, especially when declaring a lot of tracepoints, e.g. for
      kmalloc tracing.
      
      *API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
      tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
      to do it. The name previously used was misleading.
      
      Updates the scheduler instrumentation to follow this API change (a hedged
      sketch of the new convention follows this entry).
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7e066fb8
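      A hedged sketch of the new convention. The tracepoint name is invented,
      and the TPPROTO()/TPARGS() helpers are assumed to be the 2.6.28-era
      spellings from <linux/tracepoint.h>:

      #include <linux/tracepoint.h>

      struct task_struct;

      /* In a header shared by all users: declaration only.  This emits the
       * inline trace_subsys_event() stub plus the register/unregister helpers. */
      DECLARE_TRACE(subsys_event,
              TPPROTO(struct task_struct *p, int value),
              TPARGS(p, value));

      /* In exactly one .c file: the single definition of the tracepoint
       * structure itself -- the part that previously lived in the header. */
      DEFINE_TRACE(subsys_event);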
  20. 14 November 2008, 3 commits
  21. 13 November 2008, 1 commit
    • sched: fix init_idle()'s use of sched_clock() · 5cbd54ef
      Committed by Ingo Molnar
      Maciej Rutecki reported:
      
      > I have this bug during suspend to disk:
      >
      > [  188.592151] Enabling non-boot CPUs ...
      > [  188.592151] SMP alternatives: switching to SMP code
      > [  188.666058] BUG: using smp_processor_id() in preemptible
      > [00000000]
      > code: suspend_to_disk/2934
      > [  188.666064] caller is native_sched_clock+0x2b/0x80
      
      Which, as noted by Linus, was caused by me, via:
      
        7cbaef9c "sched: optimize sched_clock() a bit"
      
      Move the rq locking a bit earlier in the initialization sequence,
      which makes the sched_clock() call in init_idle() non-preemptible.
      Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5cbd54ef
  22. 12 November 2008, 1 commit
  23. 11 November 2008, 2 commits
  24. 10 November 2008, 1 commit
  25. 07 November 2008, 4 commits
  26. 05 November 2008, 1 commit
    • sched: backward looking buddy · 4793241b
      Committed by Peter Zijlstra
      Impact: improve/change/fix wakeup-buddy scheduling
      
      Currently we only have a forward-looking buddy; that is, we prefer to
      schedule the task we last woke up, under the presumption that it is
      going to consume the data we just produced and therefore will have
      cache-hot benefits.
      
      This allows co-waking producer/consumer task pairs to run ahead of the
      pack for a little while, keeping their cache warm. Without this, we
      would interleave all pairs, utterly trashing the cache.
      
      This patch introduces a backward-looking buddy. Suppose that, in the
      above scenario, the consumer preempts the producer before it can go to
      sleep: we then miss the wakeup from consumer to producer (it is already
      running, after all), breaking the cycle and reverting to the
      cache-trashing interleaved schedule pattern.
      
      The backward buddy will try to schedule back to the task that woke us
      up when the forward buddy is not available, under the assumption that,
      barring current, the last task is the most cache-hot task around.
      
      This will basically allow a task to continue after it got preempted.
      
      In order to avoid starvation, we allow either buddy to get wakeup_gran
      ahead of the pack (a toy sketch of this pick order follows this entry).
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4793241b
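      A toy, self-contained sketch of the pick order described above. All the
      types, helper names and the wakeup_gran value are invented for
      illustration; this is not the CFS implementation.

      #include <stdio.h>
      #include <stddef.h>

      struct entity { const char *name; unsigned long long vruntime; };

      static const unsigned long long wakeup_gran = 1000;   /* illustrative */

      /* A buddy may be picked only if it is within wakeup_gran of the leftmost
       * (fairest) task, so neither buddy can starve the rest of the pack. */
      static int buddy_ok(const struct entity *buddy, const struct entity *leftmost)
      {
              return buddy && buddy->vruntime <= leftmost->vruntime + wakeup_gran;
      }

      static const struct entity *pick_next(const struct entity *leftmost,
                                            const struct entity *next,
                                            const struct entity *last)
      {
              if (buddy_ok(next, leftmost))
                      return next;      /* forward buddy: the task we just woke  */
              if (buddy_ok(last, leftmost))
                      return last;      /* backward buddy: the task that woke us */
              return leftmost;          /* otherwise, the normal fair pick       */
      }

      int main(void)
      {
              struct entity fair     = { "leftmost task",   100 };
              struct entity producer = { "producer (last)", 900 };

              /* consumer preempted the producer: no "next" buddy, so fall back
               * to the "last" buddy */
              printf("picked: %s\n", pick_next(&fair, NULL, &producer)->name);
              return 0;
      }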
  27. 04 November 2008, 2 commits