1. 15 Sep 2009, 6 commits
  2. 11 Sep 2009, 1 commit
    • sched: Fix sched::sched_stat_wait tracepoint field · e1f84508
      Authored by Ingo Molnar
      This weird perf trace output:
      
        cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]
      
      Is caused by setting one component field of the delta to zero
      a bit too early. Move it to later.
      
      ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
        it's just a reporting bug in essence. )
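      The ordering bug can be sketched in plain C (illustrative names, not the actual sched_stats code): the wait delta must be computed before the wait-start field is cleared, otherwise the subtraction sees zero and reports the raw timestamp, as in the bogus trace above.

```c
#include <assert.h>

typedef unsigned long long u64;

/* Broken order: clearing wait_start before computing the delta makes the
 * reported "wait time" equal to the raw timestamp. */
u64 report_wait_broken(u64 now, u64 *wait_start)
{
    *wait_start = 0;            /* zeroed a bit too early ... */
    return now - *wait_start;   /* ... so this is just 'now' */
}

/* Fixed order: compute the delta first, clear the field afterwards. */
u64 report_wait_fixed(u64 now, u64 *wait_start)
{
    u64 delta = now - *wait_start;
    *wait_start = 0;
    return delta;
}
```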
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nikos Chantziaras <realnc@arcor.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <4AA93D34.8040500@arcor.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 09 Sep 2009, 2 commits
  4. 08 Sep 2009, 3 commits
    • sched: Ensure that a child can't gain time over its parent after fork() · b5d9d734
      Authored by Mike Galbraith
      A fork/exec load is usually "pass the baton", so the child
      should never be placed behind the parent.  With START_DEBIT we
      make room for the new task, but with child_runs_first, that
      room comes out of the _parent's_ hide. There's nothing to say
      that the parent wasn't ahead of min_vruntime at fork() time,
      which means that the "baton carrier", who is essentially the
      parent in drag, can gain time and increase scheduling latencies
      for waiters.
      
      With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first
      enabled, we essentially pass the sleeper fairness off to the
      child, which is fine, but if we don't base placement on the
      parent's updated vruntime, we can end up compounding latency
      woes if the child itself then does fork/exec.  The debit
      incurred at fork doesn't hurt the parent who is then going to
      sleep and maybe exit, but the child who acquires the error
      harms all comers.
      
      This improves latencies of make -j<n> kernel build workloads.
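      A minimal sketch of the rule this patch enforces (illustrative names; the real placement happens in the fork path of the fair class): whatever debit the child receives, its start vruntime is clamped so it never lands ahead of the parent's updated vruntime.

```c
#include <assert.h>

typedef unsigned long long u64;

/* Illustrative sketch, not the kernel's code: the child may be pushed
 * right by START_DEBIT, but must never start left of (i.e. gain time
 * over) the parent's freshly updated vruntime. */
u64 place_child(u64 parent_vruntime, u64 debited_start)
{
    return debited_start > parent_vruntime ? debited_start : parent_vruntime;
}
```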
      Reported-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Deal with low-load in wake_affine() · 71a29aa7
      Authored by Peter Zijlstra
      wake_affine() would always fail under low-load situations where
      both prev and this were idle, because adding a single task will
      always be a significant imbalance, even if there's nothing
      around that could balance it.
      
      Deal with this by allowing imbalance when there's nothing you
      can do about it.
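      The idea can be sketched with a hypothetical helper (greatly simplified from the weighted-load comparison the kernel actually does): when neither CPU carries any load, declare the wakeup affine even though adding one task is, formally, a 100% imbalance.

```c
#include <assert.h>

typedef unsigned long long u64;

/* 1 = pull the waking task to this_cpu, 0 = leave it on prev_cpu.
 * Crude stand-in for the real weighted-load balance test. */
int wake_affine_sketch(u64 this_load, u64 prev_load, u64 task_load)
{
    /* both CPUs idle: any imbalance we create cannot be balanced away,
     * so allow the affine wakeup anyway */
    if (this_load == 0 && prev_load == 0)
        return 1;

    /* otherwise require that this_cpu plus the task stays no more
     * loaded than prev_cpu was */
    return this_load + task_load <= prev_load;
}
```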
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Remove short cut from select_task_rq_fair() · cdd2ab3d
      Authored by Peter Zijlstra
      select_task_rq_fair() incorrectly skips the wake_affine() logic; remove
      this shortcut.
      
      When prev_cpu == this_cpu, the code jumps straight to the
      wake_idle() logic, which never gives the wake_affine() logic
      a chance to pin the task to this cpu.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 02 Sep 2009, 2 commits
  6. 02 Aug 2009, 3 commits
  7. 18 Jul 2009, 1 commit
  8. 11 Jul 2009, 1 commit
  9. 18 Jun 2009, 1 commit
  10. 09 Apr 2009, 1 commit
  11. 11 Feb 2009, 1 commit
  12. 01 Feb 2009, 3 commits
  13. 16 Jan 2009, 1 commit
  14. 15 Jan 2009, 3 commits
    • sched: fix update_min_vruntime · e17036da
      Authored by Peter Zijlstra
      Impact: fix SCHED_IDLE latency problems
      
      OK, so we have 1 running task A (which is obviously curr and the tree is
      equally obviously empty).
      
      'A' nicely chugs along, doing its thing, carrying min_vruntime along as it
      goes.
      
      Then some whacko speed-freak SCHED_IDLE task gets inserted due to SMP
      balancing, and very likely lands far right in the tree. In that case
      
      update_curr
        update_min_vruntime
          cfs_rq->rb_leftmost := true (the crazy task sitting in a tree)
            vruntime = se->vruntime
      
      and voila, min_vruntime is waaay right of where it ought to be.
      
      OK, so why did I write it like that to begin with...
      
      Aah, yes.
      
      Say we've just dequeued current
      
      schedule
        deactivate_task(prev)
          dequeue_entity
            update_min_vruntime
      
      Then we'll set
      
        vruntime = cfs_rq->min_vruntime;
      
      we find !cfs_rq->curr, but do find someone in the tree. Then we _must_
      do vruntime = se->vruntime, because
      
       vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)
      
      will not advance vruntime, and causes lag the other way around (which we
      fixed with the initial patch 1af5f730,
      "sched: more accurate min_vruntime accounting").
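      The fixed logic, reduced to a standalone sketch (flags replace the rbtree and curr pointers; min_vr()/max_vr() model the kernel's wrap-safe signed comparison): min_vruntime only follows the leftmost entity when there is no current task, otherwise the leftmost entity may only clamp, never drag min_vruntime rightward.

```c
#include <assert.h>

typedef unsigned long long u64;

/* wrap-safe comparisons on monotonically increasing virtual time */
u64 min_vr(u64 a, u64 b) { return (long long)(a - b) < 0 ? a : b; }
u64 max_vr(u64 a, u64 b) { return (long long)(a - b) > 0 ? a : b; }

/* has_curr/curr_vr model cfs_rq->curr; has_left/left_vr model the
 * leftmost entity in the tree.  Illustrative, not kernel source. */
u64 update_min_vruntime_sketch(u64 min_vruntime,
                               int has_curr, u64 curr_vr,
                               int has_left, u64 left_vr)
{
    u64 vruntime = min_vruntime;

    if (has_curr)
        vruntime = curr_vr;

    if (has_left) {
        if (!has_curr)
            vruntime = left_vr;                   /* just dequeued curr */
        else
            vruntime = min_vr(vruntime, left_vr); /* clamp, never drag right */
    }

    /* monotonic: never move min_vruntime backwards */
    return max_vr(min_vruntime, vruntime);
}
```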
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: SCHED_OTHER vs SCHED_IDLE isolation · 6bc912b7
      Authored by Peter Zijlstra
      Stronger SCHED_IDLE isolation:
      
       - no SCHED_IDLE buddies
       - never let SCHED_IDLE preempt on wakeup
       - always preempt SCHED_IDLE on wakeup
       - limit SLEEPER fairness for SCHED_IDLE.
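      The two wakeup-preemption rules above can be sketched as a small decision function (illustrative names; the real checks sit in the fair class's wakeup-preemption path):

```c
#include <assert.h>

enum pol { POL_OTHER, POL_IDLE };

/* 1 = waking task preempts curr, 0 = never, -1 = fall through to the
 * normal vruntime-based preemption check */
int idle_preempt_sketch(enum pol curr, enum pol waking)
{
    if (waking == POL_IDLE && curr != POL_IDLE)
        return 0;   /* never let SCHED_IDLE preempt on wakeup */
    if (curr == POL_IDLE && waking != POL_IDLE)
        return 1;   /* always preempt SCHED_IDLE on wakeup */
    return -1;      /* same class: decide on vruntime as usual */
}
```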
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: prefer wakers · e52fb7c0
      Authored by Peter Zijlstra
      Prefer tasks that wake other tasks to preempt quickly. This improves
      performance because more work is available sooner.
      
      The workload that prompted this patch was a kernel build over NFS4 (for
      some curious and not yet understood reason we had to revert commit
      18de9735 to make any progress at all).
      
      Without this patch a make -j8 bzImage (of x86-64 defconfig) would take
      3m30-ish, with this patch we're down to 2m50-ish.
      
      psql-sysbench/mysql-sysbench show a slight improvement in peak performance as
      well, tbench and vmark seemed to not care.
      
      It is possible to improve upon the build time (to 2m20-ish) but that seriously
      destroys other benchmarks (just shows that there's more room for tinkering).
      
      Much thanks to Mike who put in a lot of effort to benchmark things and proved
      a worthy opponent with a competing patch.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 09 Jan 2009, 1 commit
  16. 03 Jan 2009, 1 commit
  17. 19 Dec 2008, 1 commit
    • sched: bias task wakeups to preferred semi-idle packages · 7eb52dfa
      Authored by Vaidyanathan Srinivasan
      Impact: tweak task wakeup to save power more aggressively
      
      Preferred wakeup cpu (from a semi idle package) has been
      nominated in find_busiest_group() in the previous patch.  Use
      this information in sched_mc_preferred_wakeup_cpu in function
      wake_idle() to bias task wakeups if the following conditions
      are satisfied:
      
              - The present cpu that is trying to wakeup the process is
                idle and waking the target process on this cpu will
                potentially wakeup a completely idle package
              - The previous cpu on which the target process ran is
                also idle and hence selecting the previous cpu may
                wakeup a semi idle cpu package
              - The task being woken up is allowed to run in the
                nominated cpu (cpu affinity and restrictions)
      
      Basically, if both the current cpu and the previous cpu on
      which the task ran are idle, select the nominated cpu from the semi
      idle cpu package for running the new task that is waking up.
      
      Cache hotness is considered since the actual biasing happens
      in wake_idle() only if the application is cache cold.
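      The conditions above collapse into a simple guard; a hypothetical sketch (all parameters are illustrative stand-ins for the checks the commit describes, which in reality live in wake_idle()):

```c
#include <assert.h>

/* Return the CPU to wake the task on.  Bias towards the nominated
 * semi-idle package only when every condition holds; otherwise fall
 * back to the normal selection. */
int biased_wakeup_cpu(int this_cpu_idle, int prev_cpu_idle,
                      int allowed_on_nominated, int cache_cold,
                      int nominated_cpu, int default_cpu)
{
    if (this_cpu_idle && prev_cpu_idle && allowed_on_nominated && cache_cold)
        return nominated_cpu;   /* consolidate onto the semi-idle package */
    return default_cpu;         /* normal wake_idle() selection */
}
```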
      
      This technique will effectively move short-running bursty jobs in
      a mostly idle system.
      
      Wakeup biasing for power savings gets automatically disabled if
      system utilisation increases due to the fact that the probability
      of finding both this_cpu and prev_cpu idle decreases.
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 16 Dec 2008, 2 commits
  19. 25 Nov 2008, 2 commits
    • sched: convert remaining old-style cpumask operators · 96f874e2
      Authored by Rusty Russell
      Impact: Trivial API conversion
      
        NR_CPUS -> nr_cpu_ids
        cpumask_t -> struct cpumask
        sizeof(cpumask_t) -> cpumask_size()
        cpumask_a = cpumask_b -> cpumask_copy(&cpumask_a, &cpumask_b)
      
        cpu_set() -> cpumask_set_cpu()
        first_cpu() -> cpumask_first()
        cpumask_of_cpu() -> cpumask_of()
        cpus_* -> cpumask_*
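      A toy model of the converted API's shape, assuming a single-word mask (the kernel's struct cpumask is an arbitrary-length bitmap; these one-liners only illustrate the renamed operations, not the kernel implementation):

```c
#include <assert.h>

#define TOY_NR_CPUS 64

struct cpumask { unsigned long bits; };

void cpumask_set_cpu(int cpu, struct cpumask *m)  { m->bits |= 1UL << cpu; }
int  cpumask_test_cpu(int cpu, const struct cpumask *m)
                                                  { return (m->bits >> cpu) & 1; }
/* first set bit, or TOY_NR_CPUS when the mask is empty */
int  cpumask_first(const struct cpumask *m)
{
    return m->bits ? __builtin_ctzl(m->bits) : TOY_NR_CPUS;
}
void cpumask_copy(struct cpumask *dst, const struct cpumask *src) { *dst = *src; }
```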
      
      There are some FIXMEs where we need all archs to complete the
      infrastructure (patches have been sent):
      
        cpu_coregroup_map -> cpu_coregroup_mask
        node_to_cpumask* -> cpumask_of_node
      
      There is also one FIXME where we pass an array of cpumasks to
      partition_sched_domains(): this implies knowing the definition of
      'struct cpumask' and the size of a cpumask.  This will be fixed in a
      future patch.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: wrap sched_group and sched_domain cpumask accesses. · 758b2cdc
      Authored by Rusty Russell
      Impact: trivial wrap of member accesses
      
      This eases the transition in the next patch.
      
      We also get rid of a temporary cpumask in find_idlest_cpu() thanks to
      for_each_cpu_and, and in sched_balance_self() by getting the weight
      before setting sd to NULL.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 11 Nov 2008, 1 commit
  21. 05 Nov 2008, 3 commits
    • sched: fix buddies for group scheduling · 02479099
      Authored by Peter Zijlstra
      Impact: scheduling order fix for group scheduling
      
      For each level in the hierarchy, set the buddy to point to the right entity.
      Therefore, when we do the hierarchical schedule, we have a fair chance of
      ending up where we meant to.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: backward looking buddy · 4793241b
      Authored by Peter Zijlstra
      Impact: improve/change/fix wakeup-buddy scheduling
      
      Currently we only have a forward looking buddy; that is, we prefer to
      schedule the task we last woke up, under the presumption that it's
      going to consume the data we just produced, and therefore will have
      cache-hot benefits.
      
      This allows co-waking producer/consumer task pairs to run ahead of the
      pack for a little while, keeping their cache warm. Without this, we
      would interleave all pairs, utterly thrashing the cache.
      
      This patch introduces a backward looking buddy; that is, suppose that
      in the above scenario the consumer preempts the producer before it
      can go to sleep. We will therefore miss the wakeup from consumer to
      producer (it's already running, after all), breaking the cycle and
      reverting to the cache-thrashing interleaved schedule pattern.
      
      The backward buddy will try to schedule back to the task that woke us
      up in case the forward buddy is not available, under the assumption
      that that task will be the most cache-hot one around,
      barring current.
      
      This will basically allow a task to continue after it got preempted.
      
      In order to avoid starvation, we allow either buddy to get wakeup_gran
      ahead of the pack.
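      The resulting pick order can be sketched as follows (indices stand in for sched entities; the *_within_gran flags stand in for the wakeup_gran starvation check the kernel performs against the leftmost task):

```c
#include <assert.h>

/* -1 means "no such buddy".  Prefer the forward buddy (the task we last
 * woke), then the backward buddy (the task that woke us), but only while
 * each stays within wakeup_gran of the leftmost task; otherwise fall
 * back to the leftmost (fairest) entity. */
int pick_next_sketch(int leftmost,
                     int next_buddy, int next_within_gran,
                     int last_buddy, int last_within_gran)
{
    if (next_buddy >= 0 && next_within_gran)
        return next_buddy;
    if (last_buddy >= 0 && last_within_gran)
        return last_buddy;
    return leftmost;
}
```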
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix fair preempt check · d95f98d0
      Authored by Peter Zijlstra
      Impact: fix cross-class preemption
      
      Inter-class wakeup preemptions should follow class order.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>