  1. 20 Apr, 2008 23 commits
  2. 26 Mar, 2008 1 commit
    • NOHZ: reevaluate idle sleep length after add_timer_on() · 06d8308c
      Thomas Gleixner committed
      add_timer_on() can add a timer on a CPU which is currently in a long
      idle sleep, but the timer wheel is not reevaluated by the nohz code on
      that CPU. So a timer can be delayed for quite a long time. This
      triggered a false positive in the clocksource watchdog code.
      
      To avoid this we need to wake up the idle CPU and enforce the
      reevaluation of the timer wheel for the next timer event.
      
      Add a function, which checks a given CPU for idle state, marks the
      idle task with NEED_RESCHED and sends a reschedule IPI to notify the
      other CPU of the change in the timer wheel.
      
      Call this function from add_timer_on().
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      
      --
       include/linux/sched.h |    6 ++++++
       kernel/sched.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
       kernel/timer.c        |   10 +++++++++-
       3 files changed, 58 insertions(+), 1 deletion(-)
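The wake-up step described in the NOHZ commit above can be sketched in userspace C. This is a minimal model, not the kernel implementation: `struct cpu_model`, its fields, and `wake_idle_cpu_model()` are hypothetical names, and the "IPI" is only a counter.

```c
#include <stdbool.h>

/* Hypothetical userspace model of one CPU; not the kernel's struct rq. */
struct cpu_model {
    bool running_idle_task;  /* CPU is currently executing its idle task */
    bool need_resched;       /* stands in for NEED_RESCHED on the idle task */
    int  resched_ipis;       /* counts reschedule IPIs "sent" to this CPU */
};

/*
 * If the target CPU is idle, mark its idle task and notify it, so that it
 * leaves its long sleep and reevaluates its timer wheel.
 * Returns true if the CPU was kicked.
 */
static bool wake_idle_cpu_model(struct cpu_model *target)
{
    if (!target->running_idle_task)
        return false;            /* a busy CPU needs no kick */

    target->need_resched = true; /* mark the idle task */
    target->resched_ipis++;      /* model the reschedule IPI */
    return true;
}
```

In this model, the code that queues a timer on another CPU would call `wake_idle_cpu_model()` on that CPU right after adding the timer.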
  3. 21 Mar, 2008 4 commits
  4. 19 Mar, 2008 2 commits
    • sched: wakeup-buddy tasks are cache-hot · f540a608
      Ingo Molnar committed
      Wakeup-buddy tasks are cache-hot - this makes it a bit harder
      for the load-balancer to tear them apart. (but it's still possible,
      if the load is sufficiently asymmetric)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: improve affine wakeups · 4ae7d5ce
      Ingo Molnar committed
      Improve affine wakeups. Maintain the 'overlap' metric based on CFS's
      sum_exec_runtime - which means the amount of time a task executes
      after it wakes up some other task.
      
      Use the 'overlap' for the wakeup decisions: if the 'overlap' is short,
      it means there's strong workload coupling between this task and the
      woken up task. If the 'overlap' is large then the workload is decoupled
      and the scheduler will move them to separate CPUs more easily.
      
      ( Also slightly move the preempt_check within try_to_wake_up() - this has
        no effect on functionality but allows 'early wakeups' (for still-on-rq
        tasks) to be correctly accounted as well.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
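A rough sketch of how an 'overlap' metric like the one in the affine-wakeup commit above could be tracked. The struct, the function names, the 1/8 averaging weight, and the threshold parameter are all assumptions made for illustration; the commit message only says that the metric is derived from sum_exec_runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping; field names are illustrative. */
struct task_model {
    uint64_t sum_exec_runtime;  /* total CPU time consumed, in ns */
    uint64_t last_wakeup;       /* sum_exec_runtime when it last woke a task */
    uint64_t avg_overlap;       /* running average of the overlap, in ns */
};

/* The task wakes another task now: remember how much it had run so far. */
static void note_wakeup(struct task_model *t)
{
    t->last_wakeup = t->sum_exec_runtime;
}

/* The waker stops running: fold the new overlap sample into the average. */
static void update_overlap(struct task_model *t)
{
    int64_t sample = (int64_t)(t->sum_exec_runtime - t->last_wakeup);
    int64_t diff = sample - (int64_t)t->avg_overlap;

    /* Assumed weighting: each new sample contributes 1/8. */
    t->avg_overlap = (uint64_t)((int64_t)t->avg_overlap + diff / 8);
}

/* Short overlap on both sides suggests the two tasks are strongly coupled. */
static bool tightly_coupled(const struct task_model *waker,
                            const struct task_model *wakee,
                            uint64_t threshold_ns)
{
    return waker->avg_overlap < threshold_ns &&
           wakee->avg_overlap < threshold_ns;
}
```

A wakeup path built on this model would prefer to keep the wakee on the waker's CPU when `tightly_coupled()` is true, and would let the load balancer spread the tasks otherwise.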
  5. 15 Mar, 2008 4 commits
    • sched: fix overload performance: buddy wakeups · aa2ac252
      Peter Zijlstra committed
      Currently we schedule to the leftmost task in the runqueue. When the
      runtimes are very short because of some server/client ping-pong,
      especially in over-saturated workloads, this will cycle through all
      tasks, thrashing the cache.
      
      Reduce cache thrashing by keeping dependent tasks together by running
      newly woken tasks first. However, by not running the leftmost task first
      we could starve tasks because the wakee can gain unlimited runtime.
      
      Therefore we only run the wakee if it is within a small
      (wakeup_granularity) window of the leftmost task. This preserves
      fairness, but does alternate server/client task groups.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
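The selection rule from the buddy-wakeup commit above can be sketched as follows. This is a simplified model under stated assumptions: `struct entity_model` and `pick_next_model()` are hypothetical names, fairness is reduced to a single vruntime value per task, `leftmost` is assumed to be non-NULL, and counter wraparound is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical schedulable entity: smaller vruntime means "more owed". */
struct entity_model {
    uint64_t vruntime;
};

/*
 * Prefer the freshly woken task (wakee) over the leftmost task, but only
 * while it stays within wakeup_granularity of the leftmost task, so the
 * wakee cannot accumulate unlimited runtime.
 */
static const struct entity_model *
pick_next_model(const struct entity_model *leftmost,
                const struct entity_model *wakee,
                uint64_t wakeup_granularity)
{
    if (wakee == NULL)
        return leftmost;                 /* no buddy to favor */
    if (wakee->vruntime <= leftmost->vruntime)
        return wakee;                    /* the wakee is owed at least as much */
    if (wakee->vruntime - leftmost->vruntime < wakeup_granularity)
        return wakee;                    /* inside the fairness window */
    return leftmost;                     /* too far ahead: stay fair */
}
```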
    • sched: fix calc_delta_mine() · 27d11726
      Ingo Molnar committed
      lw->weight can be 0 for a short time during bootup.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
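To illustrate why a zero lw->weight matters, here is a small sketch of scaling a delta through a cached inverse weight. It is not the kernel's calc_delta_mine(): `struct load_model`, `scale_delta_model()`, the 32-bit shift, and the choice of substituting 1 for a zero divisor are assumptions, and the arithmetic is only valid for small inputs that do not overflow 64 bits.

```c
#include <stdint.h>

#define INV_SHIFT 32

/* Hypothetical load descriptor with a cached inverse of its weight. */
struct load_model {
    uint32_t weight;      /* total load weight; may transiently be 0 */
    uint64_t inv_weight;  /* cached 2^32 / weight; 0 means "not computed" */
};

/* Compute roughly delta * weight / lw->weight without dividing each time. */
static uint64_t scale_delta_model(uint64_t delta, uint32_t weight,
                                  struct load_model *lw)
{
    if (lw->inv_weight == 0) {
        /* Guard: a zero weight must not become a zero divisor. */
        uint32_t divisor = lw->weight ? lw->weight : 1;

        lw->inv_weight = (1ULL << INV_SHIFT) / divisor;
    }
    return (delta * weight * lw->inv_weight) >> INV_SHIFT;
}
```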
    • sched: fix update_load_add()/sub() · e89996ae
      Ingo Molnar committed
      Clear the cached inverse value when updating load. This is needed for
      calc_delta_mine() to work correctly when using the rq load.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
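The cache invalidation described in the update_load_add()/sub() commit above can be modeled like this. The names `struct load_model`, `load_add()`, `load_sub()`, and `load_inverse()` are hypothetical, and the inverse is assumed to be 2^32 divided by the weight.

```c
#include <stdint.h>

/* Hypothetical load descriptor with a lazily computed inverse. */
struct load_model {
    uint32_t weight;
    uint64_t inv_weight;  /* cached 2^32 / weight; 0 means "stale" */
};

/* Return the inverse, recomputing it only when the cache is stale. */
static uint64_t load_inverse(struct load_model *lw)
{
    if (lw->inv_weight == 0 && lw->weight != 0)
        lw->inv_weight = (1ULL << 32) / lw->weight;
    return lw->inv_weight;
}

/* Every weight change clears the cache so the next reader recomputes it. */
static void load_add(struct load_model *lw, uint32_t inc)
{
    lw->weight += inc;
    lw->inv_weight = 0;
}

static void load_sub(struct load_model *lw, uint32_t dec)
{
    lw->weight -= dec;
    lw->inv_weight = 0;
}
```

Without the reset in `load_add()` and `load_sub()`, a reader would keep using an inverse that belongs to the old weight.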
    • sched: fix race in schedule() · 0e1f3483
      Hiroshi Shimamoto committed
      Fix a hard to trigger crash seen in the -rt kernel that also affects
      the vanilla scheduler.
      
      There is a race condition between schedule() and some dequeue/enqueue
      functions: rt_mutex_setprio(), __setscheduler() and sched_move_task().
      
      When scheduling to idle, idle_balance() is called to pull tasks from
      other busy processors, and it might drop the rq lock. Those three
      functions can then encounter a task with on_rq=0 and running=1. The
      current task must still be put while it is running.
      
      Here is a possible scenario:
      
         CPU0                               CPU1
          |                              schedule()
          |                              ->deactivate_task()
          |                              ->idle_balance()
          |                              -->load_balance_newidle()
      rt_mutex_setprio()                     |
          |                              --->double_lock_balance()
          *get lock                          *rel lock
          * on_rq=0, running=1               |
          * sched_class is changed           |
          *rel lock                          *get lock
          :                                  |
                                             :
                                         ->put_prev_task_rt()
                                         ->pick_next_task_fair()
                                             => panic
      
      The current process on CPU1 (P1) is scheduling out. P1 is deactivated,
      and because CPU1 is about to go idle, the scheduler looks for another
      process on other CPUs' runqueues. idle_balance(), load_balance_newidle()
      and double_lock_balance() are called, and double_lock_balance() may drop
      the rq lock. Meanwhile, CPU0 is trying to boost the priority of P1. The
      boost changes only P1's prio and sched_class (to RT); the sched entities
      of P1 and P1's group are never put. This leaves the cfs_rq invalid: it
      has a curr but no leaf. When pick_next_task_fair() is then called, the
      kernel panics.
      Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
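The invariant behind the schedule() race fix above, that a running task must be put by its old class even when on_rq is 0, can be illustrated with a toy model. Everything here is hypothetical (`struct task_model`, `put_prev()`, `set_curr()`, `change_class_model()`); it tracks only which class believes the task is its current one, and it omits locking and the dequeue/enqueue steps.

```c
#include <stdbool.h>

enum class_model { CLASS_FAIR, CLASS_RT };

/* Hypothetical task state; not the kernel's task_struct. */
struct task_model {
    enum class_model sched_class;
    bool on_rq;      /* queued on the runqueue */
    bool running;    /* is the runqueue's current task */
    bool fair_curr;  /* the fair class tracks it as current */
    bool rt_curr;    /* the RT class tracks it as current */
};

/* The task's class releases it as the current task. */
static void put_prev(struct task_model *p)
{
    if (p->sched_class == CLASS_FAIR)
        p->fair_curr = false;
    else
        p->rt_curr = false;
}

/* The task's class adopts it as the current task. */
static void set_curr(struct task_model *p)
{
    if (p->sched_class == CLASS_FAIR)
        p->fair_curr = true;
    else
        p->rt_curr = true;
}

/* Switch class; key the put/set on 'running', not on 'on_rq'. */
static void change_class_model(struct task_model *p, enum class_model next)
{
    if (p->running)
        put_prev(p);   /* the old class lets go, even when on_rq == 0 */
    p->sched_class = next;
    if (p->running)
        set_curr(p);   /* the new class takes over */
}
```

In the scenario from the commit message, P1 has on_rq=0 and running=1. Keying the put on `running` leaves the fair class with no stale current task by the time the RT class puts P1.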
  6. 11 Mar, 2008 2 commits
    • keep rd->online and cpu_online_map in sync · 08f503b0
      Gregory Haskins committed
      It is possible to allow the root-domain cache of online cpus to
      become out of sync with the global cpu_online_map.  This is because we
      currently trigger removal of cpus too early in the notifier chain.
      Other DOWN_PREPARE handlers may in fact run and reconfigure the
      root-domain topology, thereby stomping on our own offline handling.
      
      The end result is that rd->online may become out of sync with
      cpu_online_map, which results in potential task misrouting.
      
      So change the offline handling to be more tightly coupled with the
      global offline process by triggering on CPU_DYING instead of
      CPU_DOWN_PREPARE.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
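A toy notifier showing the change in trigger point described in the commit above. The event enum, `struct root_domain_model`, and `rd_hotplug_notify()` are hypothetical stand-ins for the kernel's hotplug notifier machinery, and the online set is modeled as a 64-bit mask (so cpu must be below 64).

```c
#include <stdint.h>

/* Hypothetical subset of hotplug events, in the order they would fire. */
enum hotplug_event_model { EV_ONLINE, EV_DOWN_PREPARE, EV_DYING };

/* Hypothetical root domain: one bit per online CPU. */
struct root_domain_model {
    uint64_t online;
};

static void rd_hotplug_notify(struct root_domain_model *rd,
                              enum hotplug_event_model ev, unsigned int cpu)
{
    switch (ev) {
    case EV_ONLINE:
        rd->online |= 1ULL << cpu;
        break;
    case EV_DOWN_PREPARE:
        /* Too early: other handlers may still reconfigure the topology. */
        break;
    case EV_DYING:
        /* The CPU is really going away; drop it from the online set now. */
        rd->online &= ~(1ULL << cpu);
        break;
    }
}
```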
    • Revert "cpu hotplug: adjust root-domain->online span in response to hotplug event" · 1f94ef59
      Gregory Haskins committed
      This reverts commit 393d94d9.
      
      Let's fix this right.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 10 Mar, 2008 1 commit
  8. 07 Mar, 2008 3 commits