  1. 09 Dec 2009, 1 commit
  2. 24 Nov 2009, 1 commit
    • sched: Optimize branch hint in pick_next_task_fair() · 36ace27e
      Committed by Tim Blechmann
      Branch hint profiling on my Nehalem machine showed 90%
      incorrect branch hints:
      
         correct  incorrect   %  Function             File          Line
        15728471  158903754  90  pick_next_task_fair  sched_fair.c  1555
      Signed-off-by: Tim Blechmann <tim@klingt.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0BBBB1.2050100@klingt.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
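      For reference, the 90 in the profile line is the share of wrong hints. The
      sketch below reproduces that arithmetic, assuming the profiler reports
      incorrect * 100 / (correct + incorrect) with integer division. The
      likely()/unlikely() macros are spelled the way the kernel builds them on
      GCC's __builtin_expect(); branch_incorrect_percent() is a hypothetical
      helper, not a kernel function.

      ```c
      #include <stdint.h>

      /* Branch hints on top of GCC/Clang's __builtin_expect(). */
      #define likely(x)   __builtin_expect(!!(x), 1)
      #define unlikely(x) __builtin_expect(!!(x), 0)

      /* Hypothetical helper: share of mispredicted hints in whole percent
       * (integer division, as assumed above). Returns -1 with no samples. */
      static int branch_incorrect_percent(uint64_t correct, uint64_t incorrect)
      {
          uint64_t total = correct + incorrect;

          if (unlikely(total == 0))
              return -1;
          return (int)(incorrect * 100 / total);
      }
      ```

      With nine out of ten predictions wrong, the hint lays the code out in
      favor of the rare path, so dropping or inverting it is the cheaper option.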
  3. 13 Nov 2009, 2 commits
  4. 05 Nov 2009, 2 commits
    • sched: Fix affinity logic in select_task_rq_fair() · fd210738
      Committed by Mike Galbraith
      Ingo Molnar reported:
      
      [   26.804000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
      [   26.808000] caller is vmstat_update+0x26/0x70
      [   26.812000] Pid: 10, comm: events/1 Not tainted 2.6.32-rc5 #6887
      [   26.816000] Call Trace:
      [   26.820000]  [<c1924a24>] ? printk+0x28/0x3c
      [   26.824000]  [<c13258a0>] debug_smp_processor_id+0xf0/0x110
      [   26.824000] mount used greatest stack depth: 1464 bytes left
      [   26.828000]  [<c111d086>] vmstat_update+0x26/0x70
      [   26.832000]  [<c1086418>] worker_thread+0x188/0x310
      [   26.836000]  [<c10863b7>] ? worker_thread+0x127/0x310
      [   26.840000]  [<c108d310>] ? autoremove_wake_function+0x0/0x60
      [   26.844000]  [<c1086290>] ? worker_thread+0x0/0x310
      [   26.848000]  [<c108cf0c>] kthread+0x7c/0x90
      [   26.852000]  [<c108ce90>] ? kthread+0x0/0x90
      [   26.856000]  [<c100c0a7>] kernel_thread_helper+0x7/0x10
      [   26.860000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
      [   26.864000] caller is vmstat_update+0x3c/0x70
      
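      This happens because commit

        a1f84a3a: sched: Check for an idle shared cache in select_task_rq_fair()

      broke ->cpus_allowed: an idle cache sibling could be chosen even if it
      was outside the task's allowed mask, so per-CPU threads such as
      events/1 could end up running on the wrong CPU.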
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: arjan@infradead.org
      Cc: <stable@kernel.org>
      LKML-Reference: <1257415066.12867.1.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Check for an idle shared cache in select_task_rq_fair() · a1f84a3a
      Committed by Mike Galbraith
      When waking affine, check for an idle shared cache, and if
      found, wake to that CPU/sibling instead of the waker's CPU.
      
      This improves pgsql+oltp ramp up by roughly 8%. Possibly more
      for other loads, depending on overlap. The trade-off is a
      roughly 1% peak downturn if tasks are truly synchronous.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <1256654138.17752.7.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
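      A minimal model of the idle-sibling choice, assuming CPUs are tracked as
      bits in a 64-bit word; pick_idle_cache_sibling() and its parameters are
      hypothetical. It also applies the task's allowed mask, which the
      follow-up fix fd210738 shows is required: an idle cache sibling is only
      a valid target if the task may run there.

      ```c
      #include <stdint.h>

      /* Hypothetical model: bit n set means CPU n is in the set (<= 64 CPUs). */
      typedef uint64_t cpu_bits;

      /* Return the lowest-numbered idle CPU that shares a cache with the waker
       * and is in the task's allowed mask; otherwise fall back to waker_cpu.
       * Assumes the caller already checked that waker_cpu itself is allowed. */
      static int pick_idle_cache_sibling(int waker_cpu, cpu_bits cache_siblings,
                                         cpu_bits idle, cpu_bits cpus_allowed)
      {
          /* Without "& cpus_allowed" a pinned task could be sent to a CPU it
           * is not allowed on, which is the failure mode fd210738 reports. */
          cpu_bits candidates = cache_siblings & idle & cpus_allowed;

          for (int cpu = 0; cpu < 64; cpu++)
              if (candidates & ((cpu_bits)1 << cpu))
                  return cpu;
          return waker_cpu;
      }
      ```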
  5. 24 Oct 2009, 1 commit
    • sched: Strengthen buddies and mitigate buddy induced latencies · f685ceac
      Committed by Mike Galbraith
      This patch restores the effectiveness of LAST_BUDDY in preventing
      pgsql+oltp from collapsing due to wakeup preemption. It also
      switches LAST_BUDDY to exclusively do what it does best, namely
      mitigate the effects of aggressive wakeup preemption, which
      improves vmark throughput markedly, and restores mysql+oltp
      scalability.
      
      Since buddies are about scalability, enable them beginning at the
      point where we begin expanding sched_latency, namely
      sched_nr_latency. Previously, buddies were cleared aggressively,
      which seriously reduced their effectiveness. Not clearing
      aggressively, however, produces a small drop in mysql+oltp
      throughput immediately after peak, indicating that LAST_BUDDY is
      actually doing some harm. This is right at the point where X on the
      desktop in competition with another load wants low latency service.
      Ergo, do not enable until we need to scale.
      
      To mitigate latency induced by buddies, or by a task just missing
      wakeup preemption, check latency at tick time.
      
      The last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
      CACHE_HOT_BUDDY.
      
      Supporting performance tests:
      
       tip   = v2.6.32-rc5-1497-ga525b32
       tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
       tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs
      
      (Three run averages except where noted.)
      
       vmark:
       ------
       tip           108466 messages per second
       tip+          125307 messages per second
       tip+x         125335 messages per second
       tipx          117781 messages per second
       2.6.31.3      122729 messages per second
      
       mysql+oltp:
       -----------
       clients          1        2        4        8       16       32       64      128      256
       ..........................................................................................
       tip        9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
       tip+      10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
       tipx       9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
       2.6.31.3   8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86
      
       pgsql+oltp:
       -----------
       clients          1        2        4        8       16       32       64      128      256
       ..........................................................................................
       tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
       tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
       tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
       tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
       2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
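      The two mechanisms described above, gating buddies on run-queue length
      and a tick-time latency check, can be sketched as plain predicates. This
      is a simplified model under stated assumptions: ideal_runtime stands in
      for the task's slice, min_gran for the minimum preemption granularity,
      and both function names are hypothetical; the exact conditions in the
      patch may differ.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Buddies only once sched_latency starts being stretched, i.e. once
       * nr_running has reached sched_nr_latency. */
      static bool buddies_enabled(unsigned nr_running, unsigned sched_nr_latency)
      {
          return nr_running >= sched_nr_latency;
      }

      /* Tick-time check (sketch): preempt the current task if it has used up
       * its slice, or, once it has run for at least min_gran, if its vruntime
       * is more than a slice ahead of the leftmost waiting task. */
      static bool tick_should_preempt(uint64_t delta_exec, uint64_t ideal_runtime,
                                      uint64_t min_gran, int64_t curr_vruntime,
                                      int64_t leftmost_vruntime,
                                      unsigned nr_running)
      {
          if (delta_exec > ideal_runtime)
              return true;
          if (delta_exec < min_gran || nr_running < 2)
              return false;
          return curr_vruntime - leftmost_vruntime > (int64_t)ideal_runtime;
      }
      ```

      The second test catches a task that slipped past wakeup preemption or
      was favored as a buddy but has drifted far ahead of the queue.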
  6. 14 Oct 2009, 1 commit
    • sched: Do less agressive buddy clearing · 92f6a5e3
      Committed by Peter Zijlstra
      Yanmin reported a hackbench regression due to:
      
       > commit de69a80b
       > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
       > Date:   Thu Sep 17 09:01:20 2009 +0200
       >
       >     sched: Stop buddies from hogging the system
      
      I really liked de69a80b, and it affecting hackbench shows I wasn't
      crazy ;-)
      
      So hackbench is a multi-cast, with one sender spraying multiple
      receivers, who in their turn don't spray back.
      
      This would be exactly the scenario that patch 'cures'. Previously
      we would not clear the last buddy after running the next task,
      allowing the sender to get back to work sooner than it otherwise
      ought to have been, increasing latencies for other tasks.
      
      Now, since those receivers don't poke back, they don't enforce the
      buddy relation, which means there's nothing to re-elect the sender.
      
      Cure this by less aggressively clearing the buddy stats. Only clear
      buddies when they were not chosen. It should still avoid a buddy
      sticking around long after it has served its time.
      Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1255084986.8802.46.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
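      A simplified model of the two clearing policies discussed in de69a80b and
      this commit, assuming a single "last" buddy and a fixed granularity gran.
      pick_next() and struct rq_model are hypothetical and are not the kernel's
      pick_next_entity(). A buddy wins if it is at most gran ahead of the
      leftmost entity in vruntime.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      struct entity { int64_t vruntime; };

      struct rq_model {
          struct entity *leftmost; /* entity with the smallest vruntime */
          struct entity *last;     /* "last buddy" hint, or NULL */
          int64_t gran;            /* how far ahead a buddy may be and still win */
      };

      /* aggressive: clear the hint on every pick (roughly de69a80b).
       * !aggressive: clear it only when it was passed over (this commit). */
      static struct entity *pick_next(struct rq_model *rq, bool aggressive)
      {
          struct entity *buddy = rq->last;

          if (buddy == NULL)
              return rq->leftmost;
          if (buddy->vruntime - rq->leftmost->vruntime <= rq->gran) {
              if (aggressive)
                  rq->last = NULL;
              return buddy;        /* buddy chosen */
          }
          rq->last = NULL;         /* too far ahead: drop the stale hint */
          return rq->leftmost;
      }
      ```

      In the lazy mode a chosen buddy survives the pick, so a sender that no
      receiver wakes back can still be re-elected, while a buddy that fell too
      far ahead is still dropped, keeping de69a80b's protection.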
  7. 24 Sep 2009, 1 commit
  8. 21 Sep 2009, 1 commit
  9. 19 Sep 2009, 1 commit
  10. 18 Sep 2009, 1 commit
  11. 17 Sep 2009, 3 commits
    • sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE · 29cd8bae
      Committed by Peter Zijlstra
      The SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL code can break out of
      the domain iteration early, making us miss the SD_WAKE_AFFINE bits.
      
      Fix this by continuing iteration until there is no need for a
      larger domain.
      
      This also cleans up the cgroup stuff a bit, by not having two
      update_shares() invocations.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
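      A minimal sketch of the iteration problem, assuming domains are walked
      from smallest to largest and collapsing SD_POWERSAVING_BALANCE|
      SD_PREFER_LOCAL into one hypothetical flag F_PREFER_LOCAL. The flag
      values, struct domain_model and find_affine_domain() are illustrative
      only.

      ```c
      #include <stddef.h>

      /* Hypothetical flag bits for this model only (not the kernel's values). */
      #define F_WAKE_AFFINE  0x1u
      #define F_PREFER_LOCAL 0x2u

      struct domain_model {
          unsigned flags;
          int spans_prev_cpu;  /* 1 if the domain contains the task's prev CPU */
      };

      /* Walk domains from smallest to largest; return the index of the first
       * one usable for an affine wakeup, or -1. With stop_at_prefer_local set,
       * the walk ends at the first prefer-local domain, so larger affine
       * domains are never examined (the early break described above). */
      static int find_affine_domain(const struct domain_model *d, size_t n,
                                    int stop_at_prefer_local)
      {
          for (size_t i = 0; i < n; i++) {
              if ((d[i].flags & F_WAKE_AFFINE) && d[i].spans_prev_cpu)
                  return (int)i;
              if (stop_at_prefer_local && (d[i].flags & F_PREFER_LOCAL))
                  break;
          }
          return -1;
      }
      ```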
    • sched: Stop buddies from hogging the system · de69a80b
      Committed by Peter Zijlstra
      Clear buddies more aggressively.
      
      The (theoretical, haven't actually observed any of this) problem is
      that when we do not select either buddy in pick_next_entity()
      because they are too far ahead of the left-most task, we do not
      clear the buddies.
      
      This means that as soon as we service the left-most task, these
      same buddies will be tried again on the next schedule. Now if the
      left-most task was a pure hog, it wouldn't have done any wakeups
      and it wouldn't have set buddies of its own. That leads to the old
      buddies dominating, which would lead to bad latencies.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Add new wakeup preemption mode: WAKEUP_RUNNING · ad4b78bb
      Committed by Peter Zijlstra
      Create a new wakeup preemption mode that preempts in favor of tasks
      that run for shorter periods on average. It sets the next buddy to
      make sure we actually run the task we preempted for.
      
      Test results:
      
       root@twins:~# while :; do :; done &
       [1] 6537
       root@twins:~# while :; do :; done &
       [2] 6538
       root@twins:~# while :; do :; done &
       [3] 6539
       root@twins:~# while :; do :; done &
       [4] 6540
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max          4750 usec
              Avg           497 usec
              Stdev         737 usec
      
       root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max            14 usec
              Avg             5 usec
              Stdev           3 usec
      
      Disabled by default - needs more testing.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
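      One way to model the WAKEUP_RUNNING decision, assuming each task keeps a
      running average of how long it runs per activation, updated with a 1/8
      weight. The structs, avg_update() and wakeup_running_preempt() are
      hypothetical names for this sketch, and the real patch's averaging and
      tie-breaking may differ.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      struct task_model {
          int64_t avg_running;  /* averaged run length per activation, ns */
      };

      struct rq_model {
          struct task_model *curr;
          struct task_model *next_buddy;
      };

      /* Fold a new run-length sample into the average with weight 1/8. */
      static void avg_update(int64_t *avg, int64_t sample)
      {
          *avg += (sample - *avg) / 8;
      }

      /* Preempt when the woken task typically runs shorter than the current
       * one, and mark it as next buddy so the following pick selects it. */
      static bool wakeup_running_preempt(struct rq_model *rq,
                                         struct task_model *woken)
      {
          if (woken->avg_running >= rq->curr->avg_running)
              return false;
          rq->next_buddy = woken;
          return true;
      }
      ```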
  12. 16 Sep 2009, 6 commits
  13. 15 Sep 2009, 12 commits
  14. 14 Sep 2009, 1 commit
    • perf_counter, sched: Add sched_stat_runtime tracepoint · f977bb49
      Committed by Ingo Molnar
      This allows more precise tracking of how the scheduler accounts
      (and acts upon) a task having spent N nanoseconds of CPU time.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 11 Sep 2009, 1 commit
    • sched: Fix sched::sched_stat_wait tracepoint field · e1f84508
      Committed by Ingo Molnar
      This weird perf trace output:
      
        cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]
      
      is caused by setting one component field of the delta to zero
      too early. Move the reset to a later point.
      
      ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
        it's just a reporting bug in essence. )
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nikos Chantziaras <realnc@arcor.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <4AA93D34.8040500@arcor.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
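      The reported wait of 2801938766276 ns is about 2801.9 s, close to the
      2802.06 s timestamp on the same trace line, which is consistent with a
      delta taken against a start stamp that was already zeroed. A minimal
      model of that ordering bug follows; struct wait_model, wait_end_buggy()
      and wait_end_fixed() are hypothetical, and which field the real code
      cleared early is not spelled out in the commit.

      ```c
      #include <stdint.h>

      struct wait_model {
          uint64_t wait_start;  /* clock value when the task started waiting */
      };

      /* Buggy order: the stamp is zeroed before the delta is taken, so the
       * reported "wait" is now - 0, i.e. the raw clock value. */
      static uint64_t wait_end_buggy(struct wait_model *w, uint64_t now)
      {
          w->wait_start = 0;
          return now - w->wait_start;
      }

      /* Fixed order: take (and report) the delta first, then reset the stamp. */
      static uint64_t wait_end_fixed(struct wait_model *w, uint64_t now)
      {
          uint64_t delta = now - w->wait_start;

          w->wait_start = 0;
          return delta;
      }
      ```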
  16. 09 Sep 2009, 2 commits
  17. 08 Sep 2009, 3 commits
    • sched: Ensure that a child can't gain time over it's parent after fork() · b5d9d734
      Committed by Mike Galbraith
      A fork/exec load is usually "pass the baton", so the child
      should never be placed behind the parent.  With START_DEBIT we
      make room for the new task, but with child_runs_first, that
      room comes out of the _parent's_ hide. There's nothing to say
      that the parent wasn't ahead of min_vruntime at fork() time,
      which means that the "baton carrier", who is essentially the
      parent in drag, can gain time and increase scheduling latencies
      for waiters.
      
      With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first
      enabled, we essentially pass the sleeper fairness off to the
      child, which is fine, but if we don't base placement on the
      parent's updated vruntime, we can end up compounding latency
      woes if the child itself then does fork/exec.  The debit
      incurred at fork doesn't hurt the parent who is then going to
      sleep and maybe exit, but the child who acquires the error
      harms all comers.
      
      This improves latencies of make -j<n> kernel build workloads.
      Reported-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
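      A simplified sketch of the placement rule, assuming vruntime values that
      never wrap and a fixed START_DEBIT amount debit; place_child() and
      child_runs_first() are hypothetical helpers. The child starts from the
      parent's updated vruntime and is pushed to at least min_vruntime + debit,
      so it never ends up with less vruntime than the parent; the
      child_runs_first step then swaps the two values when the child would
      otherwise run later.

      ```c
      #include <stdint.h>

      static int64_t max_vr(int64_t a, int64_t b) { return a > b ? a : b; }

      /* Child inherits the parent's vruntime, then pays the START_DEBIT slice
       * relative to min_vruntime; never placed before the parent. */
      static int64_t place_child(int64_t parent_vruntime, int64_t min_vruntime,
                                 int64_t debit)
      {
          return max_vr(parent_vruntime, min_vruntime + debit);
      }

      /* child_runs_first: if the parent would run first, swap the values so
       * the debit lands on the parent, which is likely to sleep or exit. */
      static void child_runs_first(int64_t *parent_vr, int64_t *child_vr)
      {
          if (*parent_vr < *child_vr) {
              int64_t tmp = *parent_vr;
              *parent_vr = *child_vr;
              *child_vr = tmp;
          }
      }
      ```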
    • sched: Deal with low-load in wake_affine() · 71a29aa7
      Committed by Peter Zijlstra
      wake_affine() would always fail under low-load situations where
      both prev and this were idle, because adding a single task will
      always be a significant imbalance, even if there's nothing
      around that could balance it.
      
      Deal with this by allowing imbalance when there's nothing you
      can do about it.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
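      A minimal version of the balance check, assuming loads in arbitrary units
      and an imbalance_pct of 125 meaning 25% slack. affine_balanced() is
      hypothetical and omits the effective-load corrections the real
      wake_affine() applies. The early return is the low-load escape hatch:
      with nothing running on this CPU there is nothing to balance against,
      whereas the plain inequality fails whenever prev_load is zero.

      ```c
      #include <stdbool.h>

      static bool affine_balanced(unsigned long this_load, unsigned long prev_load,
                                  unsigned long task_load, unsigned imbalance_pct)
      {
          if (this_load == 0)   /* idle here: accept the imbalance */
              return true;
          /* Would adding the task keep this CPU within the allowed slack? */
          return 100 * (this_load + task_load) <= imbalance_pct * prev_load;
      }
      ```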
    • sched: Remove short cut from select_task_rq_fair() · cdd2ab3d
      Committed by Peter Zijlstra
      select_task_rq_fair() incorrectly skips the wake_affine()
      logic; remove this short cut.

      When prev_cpu == this_cpu, the code jumps straight to the
      wake_idle() logic, which doesn't give the wake_affine() logic
      the chance to pin the task to this cpu.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>