1. 12 Mar 2010 (6 commits)
    • sched: Fix select_idle_sibling() · 8b911acd
      Committed by Mike Galbraith
      Don't bother with selection when the current cpu is idle.  Recent load
      balancing changes also make it no longer necessary to check wake_affine()
      success before returning the selected sibling, so we now always use it.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301369.6785.36.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
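      A minimal toy model of the shortcut above, in plain userspace C (pick_idle_sibling(),
      cpu_idle[] and NR_CPUS are made-up illustrations, not the kernel's select_idle_sibling()
      code): if the CPU the wakeup lands on is already idle it is returned straight away, and
      any idle sibling that is found is used without re-checking wake_affine().

      #include <stdbool.h>
      #include <stdio.h>

      #define NR_CPUS 4

      /* pretend idle map for the CPUs sharing a cache with the target */
      static bool cpu_idle[NR_CPUS] = { false, true, true, false };

      static int pick_idle_sibling(int target)
      {
          if (cpu_idle[target])           /* current cpu idle: no search needed */
              return target;

          for (int cpu = 0; cpu < NR_CPUS; cpu++)
              if (cpu_idle[cpu])
                  return cpu;             /* use it; no wake_affine() re-check */

          return target;                  /* nothing idle: stay put */
      }

      int main(void)
      {
          printf("wakeup targeting cpu 2 -> run on cpu %d\n", pick_idle_sibling(2));
          printf("wakeup targeting cpu 0 -> run on cpu %d\n", pick_idle_sibling(0));
          return 0;
      }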
    • sched: Tweak sched_latency and min_granularity · 21406928
      Committed by Mike Galbraith
      Allow LAST_BUDDY to kick in sooner, improving cache utilization as soon as
      a second buddy pair arrives on the scene.  The cost is latency starting to climb
      sooner; the benefit for tbench 8 on my Q6600 box is ~2%.  No detrimental
      effects noted in normal desktop usage.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301285.6785.34.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
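      The reason a larger min_granularity lets LAST_BUDDY kick in sooner is the ratio of the
      two knobs: the scheduler derives an nr_latency threshold of roughly
      sched_latency / sched_min_granularity, and in the code of that era the buddy heuristic
      only engages once the runqueue holds at least that many tasks.  A hedged toy calculation
      follows (the values are examples, not necessarily the exact ones in the patch):

      #include <stdio.h>

      static unsigned int sched_nr_latency(unsigned int latency_ns,
                                           unsigned int min_granularity_ns)
      {
          return latency_ns / min_granularity_ns;
      }

      /* returns 1 if the buddy heuristic would engage at this runqueue depth */
      static int last_buddy_engages(unsigned int nr_running,
                                    unsigned int latency_ns,
                                    unsigned int min_granularity_ns)
      {
          return nr_running >= sched_nr_latency(latency_ns, min_granularity_ns);
      }

      int main(void)
      {
          /* same runqueue depth, different min_granularity */
          printf("%d\n", last_buddy_engages(4, 6000000, 1000000));  /* 4 >= 6 -> 0 */
          printf("%d\n", last_buddy_engages(4, 6000000, 2000000));  /* 4 >= 3 -> 1 */
          return 0;
      }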
    • sched: Cleanup/optimize clock updates · a64692a3
      Committed by Mike Galbraith
      Now that we no longer depend on the clock being updated prior to enqueueing
      on migratory wakeup, we can clean up a bit, placing calls to update_rq_clock()
      exactly where they are needed, i.e. on enqueue, dequeue and schedule events.
      
      In the case of a freshly enqueued task immediately preempting, we can skip the
      update during preemption, as the clock was just updated by the enqueue event.
      We also save an unneeded call during a migratory wakeup by not updating the
      previous runqueue, where update_curr() won't be invoked.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301199.6785.32.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
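      A toy sketch of the clock-update placement described above, in userspace C with made-up
      names (the real change lives in the kernel's runqueue code): the clock is refreshed on
      enqueue, dequeue and schedule, and a freshly enqueued task that will preempt immediately
      marks the clock as already current so the following schedule event can skip its own update.

      #include <stdbool.h>
      #include <stdio.h>

      struct toy_rq {
          unsigned long long clock;
          bool skip_next_update;      /* set when the clock was updated a moment ago */
      };

      static unsigned long long read_hw_clock(void)
      {
          static unsigned long long t;
          return t += 1000;           /* pretend time advances 1us per read */
      }

      static void toy_update_rq_clock(struct toy_rq *rq)
      {
          if (rq->skip_next_update) { /* freshly updated by the enqueue: skip once */
              rq->skip_next_update = false;
              return;
          }
          rq->clock = read_hw_clock();
      }

      static void toy_enqueue(struct toy_rq *rq, bool will_preempt_now)
      {
          toy_update_rq_clock(rq);    /* exactly one update per enqueue event */
          if (will_preempt_now)
              rq->skip_next_update = true;
      }

      static void toy_schedule(struct toy_rq *rq)
      {
          toy_update_rq_clock(rq);    /* no-op when the enqueue just did the work */
          printf("schedule sees clock %llu\n", rq->clock);
      }

      int main(void)
      {
          struct toy_rq rq = { 0, false };
          toy_enqueue(&rq, true);     /* fresh task will preempt right away */
          toy_schedule(&rq);          /* reuses the enqueue-time clock */
          return 0;
      }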
    • sched: Remove avg_overlap · e12f31d3
      Committed by Mike Galbraith
      Both avg_overlap and avg_wakeup had an inherent problem in that their accuracy
      was detrimentally affected by cross-cpu wakeups, because we are missing the
      necessary call to update_curr() there.  This can't be fixed without increasing
      overhead in our already too fat fastpath.
      
      Additionally, with recent load balancing changes making us prefer to place tasks
      in an idle cache domain (which is good for compute bound loads), communicating
      tasks suffer when a sync wakeup, which would enable affine placement, is turned
      into a non-sync wakeup by SYNC_LESS.  With one task on the runqueue, wake_affine()
      rejects the affine wakeup request, leaving the unfortunate task where it was
      placed, taking frequent cache misses.
      
      Remove it, and recover some fastpath cycles.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301121.6785.30.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
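      A toy illustration of the wake_affine() scenario in the second paragraph (the kernel's
      actual decision weighs load averages and is more involved; wake_affine_toy() is invented
      for this sketch): with the sync hint the soon-to-sleep waker is discounted and the affine
      request passes, while the SYNC_LESS-stripped wakeup sees the same single-task runqueue as
      busy and gets rejected, leaving the wakee on its remote CPU.

      #include <stdbool.h>
      #include <stdio.h>

      static bool wake_affine_toy(unsigned int nr_running_here, bool sync)
      {
          /* a sync wakeup means the waker is about to sleep, so discount it */
          unsigned int effective = nr_running_here - (sync ? 1u : 0u);

          /* pull the wakee next to the waker only if this CPU then looks free */
          return effective == 0;
      }

      int main(void)
      {
          printf("sync wakeup, waker alone on rq:  affine? %d\n", wake_affine_toy(1, true));
          printf("SYNC_LESS strips the sync hint:  affine? %d\n", wake_affine_toy(1, false));
          return 0;
      }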
    • sched: Remove avg_wakeup · b42e0c41
      Committed by Mike Galbraith
      Testing the load which led to this heuristic (nfs4 kbuild) shows that it has
      outlived its usefulness.  With intervening load balancing changes, I cannot
      see any difference with/without, so recover those fastpath cycles.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301062.6785.29.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Rate-limit nohz · 39c0cbe2
      Committed by Mike Galbraith
      Entering nohz code on every micro-idle is costing ~10% throughput for netperf
      TCP_RR when scheduling cross-cpu.  Rate limiting entry fixes this, but raises
      ticks a bit.  On my Q6600, an idle box goes from ~85 interrupts/sec to 128.
      
      The higher the context switch rate, the more nohz entry costs.  With this patch
      and some cycle recovery patches in my tree, max cross cpu context switch rate is
      improved by ~16%, a large portion of which is this ratelimiting.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301003.6785.28.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
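      A toy sketch of the rate limiting idea (the names and the jiffies-based test are
      illustrative; the patch's actual check differs in detail): only enter nohz when the CPU
      has stayed idle for at least one full tick since it was last busy, so micro-idles skip
      the tick-stop path entirely.

      #include <stdbool.h>
      #include <stdio.h>

      static unsigned long jiffies;           /* advanced by the simulated tick */
      static unsigned long last_busy_stamp;   /* last jiffy the CPU did real work */

      static void cpu_goes_busy(void)
      {
          last_busy_stamp = jiffies;
      }

      static bool can_enter_nohz(void)
      {
          /* skip the expensive tick-stop machinery for micro-idles */
          return jiffies - last_busy_stamp >= 1;
      }

      int main(void)
      {
          cpu_goes_busy();
          printf("idle right after work: enter nohz? %d\n", can_enter_nohz());  /* 0 */
          jiffies += 2;                        /* two ticks of genuine idleness */
          printf("idle two ticks later:  enter nohz? %d\n", can_enter_nohz());  /* 1 */
          return 0;
      }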
  2. 11 Mar 2010 (3 commits)
  3. 02 Mar 2010 (31 commits)