1. 21 1月, 2010 1 次提交
  2. 17 12月, 2009 2 次提交
    • P
      sched: Remove the cfs_rq dependency from set_task_cpu() · 88ec22d3
      Peter Zijlstra 提交于
      In order to remove the cfs_rq dependency from set_task_cpu() we
      need to ensure the task is cfs_rq invariant for all callsites.
      
      The simple approach is to substract cfs_rq->min_vruntime from
      se->vruntime on dequeue, and add cfs_rq->min_vruntime on
      enqueue.
      
      However, this has the downside of breaking FAIR_SLEEPERS since
      we loose the old vruntime as we only maintain the relative
      position.
      
      To solve this, we observe that we only migrate runnable tasks,
      we do this using deactivate_task(.sleep=0) and
      activate_task(.wakeup=0), therefore we can restrain the
      min_vruntime invariance to that state.
      
      The only other case is wakeup balancing, since we want to
      maintain the old vruntime we cannot make it relative on dequeue,
      but since we don't migrate inactive tasks, we can do so right
      before we activate it again.
      
      This is where we need the new pre-wakeup hook, we need to call
      this while still holding the old rq->lock. We could fold it into
      ->select_task_rq(), but since that has multiple callsites and
      would obfuscate the locking requirements, that seems like a
      fudge.
      
      This leaves the fork() case, simply make sure that ->task_fork()
      leaves the ->vruntime in a relative state.
      
      This covers all cases where set_task_cpu() gets called, and
      ensures it sees a relative vruntime.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20091216170518.191697025@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      88ec22d3
    • P
      sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE · e4f42888
      Peter Zijlstra 提交于
      We should skip !SD_LOAD_BALANCE domains.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20091216170517.653578430@chello.nl>
      CC: stable@kernel.org
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e4f42888
  3. 15 12月, 2009 1 次提交
  4. 09 12月, 2009 9 次提交
  5. 24 11月, 2009 1 次提交
    • T
      sched: Optimize branch hint in pick_next_task_fair() · 36ace27e
      Tim Blechmann 提交于
      Branch hint profiling on my nehalem machine showed 90%
      incorrect branch hints:
      
        15728471 158903754  90 pick_next_task_fair
        sched_fair.c    1555
      Signed-off-by: NTim Blechmann <tim@klingt.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0BBBB1.2050100@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      36ace27e
  6. 13 11月, 2009 2 次提交
  7. 05 11月, 2009 2 次提交
    • M
      sched: Fix affinity logic in select_task_rq_fair() · fd210738
      Mike Galbraith 提交于
      Ingo Molnar reported:
      
      [   26.804000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
      [   26.808000] caller is vmstat_update+0x26/0x70
      [   26.812000] Pid: 10, comm: events/1 Not tainted 2.6.32-rc5 #6887
      [   26.816000] Call Trace:
      [   26.820000]  [<c1924a24>] ? printk+0x28/0x3c
      [   26.824000]  [<c13258a0>] debug_smp_processor_id+0xf0/0x110
      [   26.824000] mount used greatest stack depth: 1464 bytes left
      [   26.828000]  [<c111d086>] vmstat_update+0x26/0x70
      [   26.832000]  [<c1086418>] worker_thread+0x188/0x310
      [   26.836000]  [<c10863b7>] ? worker_thread+0x127/0x310
      [   26.840000]  [<c108d310>] ? autoremove_wake_function+0x0/0x60
      [   26.844000]  [<c1086290>] ? worker_thread+0x0/0x310
      [   26.848000]  [<c108cf0c>] kthread+0x7c/0x90
      [   26.852000]  [<c108ce90>] ? kthread+0x0/0x90
      [   26.856000]  [<c100c0a7>] kernel_thread_helper+0x7/0x10
      [   26.860000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
      [   26.864000] caller is vmstat_update+0x3c/0x70
      
      Because this commit:
      
        a1f84a3a: sched: Check for an idle shared cache in select_task_rq_fair()
      
      broke ->cpus_allowed.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: arjan@infradead.org
      Cc: <stable@kernel.org>
      LKML-Reference: <1257415066.12867.1.camel@marge.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fd210738
    • M
      sched: Check for an idle shared cache in select_task_rq_fair() · a1f84a3a
      Mike Galbraith 提交于
      When waking affine, check for an idle shared cache, and if
      found, wake to that CPU/sibling instead of the waker's CPU.
      
      This improves pgsql+oltp ramp up by roughly 8%. Possibly more
      for other loads, depending on overlap. The trade-off is a
      roughly 1% peak downturn if tasks are truly synchronous.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <1256654138.17752.7.camel@marge.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a1f84a3a
  8. 24 10月, 2009 1 次提交
    • M
      sched: Strengthen buddies and mitigate buddy induced latencies · f685ceac
      Mike Galbraith 提交于
      This patch restores the effectiveness of LAST_BUDDY in preventing
      pgsql+oltp from collapsing due to wakeup preemption. It also
      switches LAST_BUDDY to exclusively do what it does best, namely
      mitigate the effects of aggressive wakeup preemption, which
      improves vmark throughput markedly, and restores mysql+oltp
      scalability.
      
      Since buddies are about scalability, enable them beginning at the
      point where we begin expanding sched_latency, namely
      sched_nr_latency. Previously, buddies were cleared aggressively,
      which seriously reduced their effectiveness. Not clearing
      aggressively however, produces a small drop in mysql+oltp
      throughput immediately after peak, indicating that LAST_BUDDY is
      actually doing some harm. This is right at the point where X on the
      desktop in competition with another load wants low latency service.
      Ergo, do not enable until we need to scale.
      
      To mitigate latency induced by buddies, or by a task just missing
      wakeup preemption, check latency at tick time.
      
      Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
      CACHE_HOT_BUDDY.
      
      Supporting performance tests:
      
       tip   = v2.6.32-rc5-1497-ga525b32
       tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
       tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs
      
      (Three run averages except where noted.)
      
       vmark:
       ------
       tip           108466 messages per second
       tip+          125307 messages per second
       tip+x         125335 messages per second
       tipx          117781 messages per second
       2.6.31.3      122729 messages per second
      
       mysql+oltp:
       -----------
       clients          1        2        4        8       16       32       64        128    256
       ..........................................................................................
       tip        9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
       tip+      10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
       tipx       9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
       2.6.31.3   8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86
      
       pgsql+oltp:
       -----------
       clients          1        2        4        8       16       32       64      128      256
       ..........................................................................................
       tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
       tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
       tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
       tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
       2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f685ceac
  9. 14 10月, 2009 1 次提交
    • P
      sched: Do less agressive buddy clearing · 92f6a5e3
      Peter Zijlstra 提交于
      Yanmin reported a hackbench regression due to:
      
       > commit de69a80b
       > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
       > Date:   Thu Sep 17 09:01:20 2009 +0200
       >
       >     sched: Stop buddies from hogging the system
      
      I really liked de69a80b, and it affecting hackbench shows I wasn't
      crazy ;-)
      
      So hackbench is a multi-cast, with one sender spraying multiple
      receivers, who in their turn don't spray back.
      
      This would be exactly the scenario that patch 'cures'. Previously
      we would not clear the last buddy after running the next task,
      allowing the sender to get back to work sooner than it otherwise
      ought to have been, increasing latencies for other tasks.
      
      Now, since those receivers don't poke back, they don't enforce the
      buddy relation, which means there's nothing to re-elect the sender.
      
      Cure this by less agressively clearing the buddy stats. Only clear
      buddies when they were not chosen. It should still avoid a buddy
      sticking around long after its served its time.
      Reported-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1255084986.8802.46.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      92f6a5e3
  10. 24 9月, 2009 1 次提交
  11. 21 9月, 2009 1 次提交
  12. 19 9月, 2009 1 次提交
  13. 18 9月, 2009 1 次提交
  14. 17 9月, 2009 3 次提交
    • P
      sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE · 29cd8bae
      Peter Zijlstra 提交于
      The SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL code can break out of
      the domain iteration early, making us miss the SD_WAKE_AFFINE bits.
      
      Fix this by continuing iteration until there is no need for a
      larger domain.
      
      This also cleans up the cgroup stuff a bit, but not having two
      update_shares() invocations.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      29cd8bae
    • P
      sched: Stop buddies from hogging the system · de69a80b
      Peter Zijlstra 提交于
      Clear buddies more agressively.
      
      The (theoretical, haven't actually observed any of this) problem is
      that when we do not select either buddy in pick_next_entity()
      because they are too far ahead of the left-most task, we do not
      clear the buddies.
      
      This means that as soon as we service the left-most task, these
      same buddies will be tried again on the next schedule. Now if the
      left-most task was a pure hog, it wouldn't have done any wakeups
      and it wouldn't have set buddies of its own. That leads to the old
      buddies dominating, which would lead to bad latencies.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      de69a80b
    • P
      sched: Add new wakeup preemption mode: WAKEUP_RUNNING · ad4b78bb
      Peter Zijlstra 提交于
      Create a new wakeup preemption mode, preempt towards tasks that run
      shorter on avg. It sets next buddy to be sure we actually run the task
      we preempted for.
      
      Test results:
      
       root@twins:~# while :; do :; done &
       [1] 6537
       root@twins:~# while :; do :; done &
       [2] 6538
       root@twins:~# while :; do :; done &
       [3] 6539
       root@twins:~# while :; do :; done &
       [4] 6540
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max          4750 usec
              Avg           497 usec
              Stdev         737 usec
      
       root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max            14 usec
              Avg             5 usec
              Stdev           3 usec
      
      Disabled by default - needs more testing.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
      ad4b78bb
  15. 16 9月, 2009 6 次提交
  16. 15 9月, 2009 7 次提交