1. 02 Sep 2009, 2 commits
  2. 02 Aug 2009, 3 commits
  3. 18 Jul 2009, 1 commit
  4. 11 Jul 2009, 1 commit
  5. 18 Jun 2009, 1 commit
  6. 09 Apr 2009, 1 commit
  7. 11 Feb 2009, 1 commit
  8. 01 Feb 2009, 3 commits
  9. 16 Jan 2009, 1 commit
  10. 15 Jan 2009, 3 commits
    • sched: fix update_min_vruntime · e17036da
      Authored by Peter Zijlstra
      Impact: fix SCHED_IDLE latency problems
      
      OK, so we have 1 running task A (which is obviously curr and the tree is
      equally obviously empty).
      
      'A' nicely chugs along, doing its thing, carrying min_vruntime along as it
      goes.
      
      Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP
      balancing, and it very likely lands far to the right of everything else.
      In that case:
      
      update_curr
        update_min_vruntime
          cfs_rq->rb_leftmost := true (the crazy task sitting in a tree)
            vruntime = se->vruntime
      
      and voila, min_vruntime is waaay right of where it ought to be.
      
      OK, so why did I write it like that to begin with...
      
      Aah, yes.
      
      Say we've just dequeued current
      
      schedule
        deactivate_task(prev)
          dequeue_entity
            update_min_vruntime
      
      Then we'll set
      
        vruntime = cfs_rq->min_vruntime;
      
      we find !cfs_rq->curr, but do find someone in the tree. Then we _must_
      do vruntime = se->vruntime, because
      
       vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)
      
      will not advance vruntime, and will cause lag the other way around (which we
      fixed with the earlier patch 1af5f730
      ("sched: more accurate min_vruntime accounting")).
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e17036da
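
      [Editor's note] A simplified, self-contained sketch of the fixed
      update_min_vruntime() logic described above. The struct layouts and the
      min_vrt()/max_vrt() helpers are reduced stand-ins for the kernel's
      cfs_rq/sched_entity and min_vruntime()/max_vruntime(), not the actual code:

        #include <stdint.h>

        struct sched_entity { uint64_t vruntime; };

        struct cfs_rq {
            uint64_t min_vruntime;
            struct sched_entity *curr;      /* running entity, may be NULL        */
            struct sched_entity *leftmost;  /* leftmost entity in the tree, or NULL */
        };

        /* wrap-safe comparisons, standing in for min_vruntime()/max_vruntime() */
        static uint64_t min_vrt(uint64_t a, uint64_t b)
        {
            return (int64_t)(a - b) < 0 ? a : b;
        }

        static uint64_t max_vrt(uint64_t a, uint64_t b)
        {
            return (int64_t)(a - b) > 0 ? a : b;
        }

        static void update_min_vruntime(struct cfs_rq *cfs_rq)
        {
            uint64_t vruntime = cfs_rq->min_vruntime;

            if (cfs_rq->curr)
                vruntime = cfs_rq->curr->vruntime;

            if (cfs_rq->leftmost) {
                struct sched_entity *se = cfs_rq->leftmost;

                if (!cfs_rq->curr)
                    /* dequeue path: nobody running, follow the tree */
                    vruntime = se->vruntime;
                else
                    /* a far-right SCHED_IDLE entity must not drag
                       min_vruntime past the running task */
                    vruntime = min_vrt(vruntime, se->vruntime);
            }

            /* min_vruntime may only move forward */
            cfs_rq->min_vruntime = max_vrt(cfs_rq->min_vruntime, vruntime);
        }
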
    • sched: SCHED_OTHER vs SCHED_IDLE isolation · 6bc912b7
      Authored by Peter Zijlstra
      Stronger SCHED_IDLE isolation:
      
       - no SCHED_IDLE buddies
       - never let SCHED_IDLE preempt on wakeup
       - always preempt SCHED_IDLE on wakeup
       - limit SLEEPER fairness for SCHED_IDLE.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6bc912b7
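
      [Editor's note] An illustrative sketch of the two wakeup-preemption rules
      listed above. The reduced struct and the task_is_idle_policy() helper are
      assumptions standing in for the kernel's SCHED_IDLE checks, and the normal
      fairness/buddy decision is elided:

        #include <stdbool.h>

        #define SCHED_IDLE 5

        struct task { int policy; };    /* reduced stand-in for task_struct */

        static bool task_is_idle_policy(const struct task *t)
        {
            return t->policy == SCHED_IDLE;
        }

        /* May a freshly woken task @p preempt the running task @curr? */
        static bool wakeup_may_preempt(const struct task *curr, const struct task *p)
        {
            /* always preempt SCHED_IDLE on wakeup of a non-idle task */
            if (task_is_idle_policy(curr) && !task_is_idle_policy(p))
                return true;

            /* never let SCHED_IDLE preempt a non-idle task on wakeup */
            if (task_is_idle_policy(p) && !task_is_idle_policy(curr))
                return false;

            /* otherwise fall back to the usual fairness/buddy logic (elided) */
            return false;
        }
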
    • sched: prefer wakers · e52fb7c0
      Authored by Peter Zijlstra
      Prefer tasks that wake other tasks, letting them preempt quickly. This improves
      performance because more work becomes available sooner.
      
      The workload that prompted this patch was a kernel build over NFS4 (for some
      curious and not yet understood reason we had to revert commit
      18de9735 to make any progress at all).
      
      Without this patch a make -j8 bzImage (of x86-64 defconfig) would take
      3m30-ish; with this patch we're down to 2m50-ish.
      
      psql-sysbench/mysql-sysbench show a slight improvement in peak performance as
      well; tbench and vmark did not seem to care.
      
      It is possible to improve the build time further (to 2m20-ish), but that
      seriously destroys other benchmarks (which just shows that there's more room
      for tinkering).
      
      Much thanks to Mike who put in a lot of effort to benchmark things and proved
      a worthy opponent with a competing patch.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e52fb7c0
  11. 09 Jan 2009, 1 commit
  12. 03 Jan 2009, 1 commit
  13. 19 Dec 2008, 1 commit
    • sched: bias task wakeups to preferred semi-idle packages · 7eb52dfa
      Authored by Vaidyanathan Srinivasan
      Impact: tweak task wakeup to save power more aggressively
      
      The preferred wakeup cpu (from a semi-idle package) was
      nominated in find_busiest_group() in the previous patch.  Use
      this information, stored in sched_mc_preferred_wakeup_cpu, in
      wake_idle() to bias task wakeups when the following conditions
      are satisfied:
      
              - The present cpu that is trying to wake up the process is
                idle, and waking the target process on this cpu would
                potentially wake up a completely idle package
              - The previous cpu on which the target process ran is
                also idle, so selecting the previous cpu might
                wake up a semi-idle cpu package
              - The task being woken up is allowed to run on the
                nominated cpu (cpu affinity and restrictions)
      
      Basically, if both the current cpu and the previous cpu on
      which the task ran are idle, select the nominated cpu from the
      semi-idle cpu package for running the task that is waking up.
      
      Cache hotness is considered since the actual biasing happens
      in wake_idle() only if the application is cache cold.
      
      This technique will effectively move short-running, bursty jobs onto
      the nominated package in a mostly idle system.
      
      Wakeup biasing for power savings is automatically disabled as system
      utilisation increases, because the probability of finding both this_cpu
      and prev_cpu idle decreases.
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7eb52dfa
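
      [Editor's note] An illustrative sketch of the three conditions above, not
      the kernel's actual wake_idle(). The reduced struct, the bitmask affinity
      test and the cpu_is_idle() stub are assumptions for illustration; only
      sched_mc_preferred_wakeup_cpu is named in the commit itself:

        #include <stdbool.h>

        struct task { unsigned long cpus_allowed; };  /* bit i set => may run on cpu i */

        static int sched_mc_preferred_wakeup_cpu;     /* nominated by find_busiest_group() */

        /* stub standing in for the scheduler's idle-cpu check */
        static bool cpu_is_idle(int cpu)
        {
            (void)cpu;
            return false;
        }

        /* Pick a wakeup cpu for @p given the waking cpu and the cpu it last ran on. */
        static int choose_wakeup_cpu(const struct task *p, int this_cpu, int prev_cpu)
        {
            int preferred = sched_mc_preferred_wakeup_cpu;

            if (cpu_is_idle(this_cpu) &&                   /* waker's cpu is idle          */
                cpu_is_idle(prev_cpu) &&                   /* wakee's previous cpu is idle */
                (p->cpus_allowed & (1UL << preferred)))    /* wakee may run on the nominee */
                return preferred;    /* consolidate onto the semi-idle package */

            return prev_cpu;         /* otherwise keep the usual placement */
        }
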
  14. 16 Dec 2008, 2 commits
  15. 25 Nov 2008, 2 commits
    • sched: convert remaining old-style cpumask operators · 96f874e2
      Authored by Rusty Russell
      Impact: Trivial API conversion
      
        NR_CPUS -> nr_cpu_ids
        cpumask_t -> struct cpumask
        sizeof(cpumask_t) -> cpumask_size()
        cpumask_a = cpumask_b -> cpumask_copy(&cpumask_a, &cpumask_b)
      
        cpu_set() -> cpumask_set_cpu()
        first_cpu() -> cpumask_first()
        cpumask_of_cpu() -> cpumask_of()
        cpus_* -> cpumask_*
      
      There are some FIXMEs where we need all archs to complete the
      infrastructure (patches have been sent):
      
        cpu_coregroup_map -> cpu_coregroup_mask
        node_to_cpumask* -> cpumask_of_node
      
      There is also one FIXME where we pass an array of cpumasks to
      partition_sched_domains(): this implies knowing the definition of
      'struct cpumask' and the size of a cpumask.  This will be fixed in a
      future patch.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      96f874e2
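
      [Editor's note] A small kernel-context illustration of the conversions
      listed above, with the old-style forms shown in comments. The function and
      the mask values are hypothetical; the cpumask_* calls are the replacement
      API named in the commit:

        #include <linux/cpumask.h>

        static void cpumask_conversion_example(struct cpumask *dst,
                                               const struct cpumask *src)
        {
            int cpu;

            /* cpumask_a = cpumask_b      ->  cpumask_copy()    */
            cpumask_copy(dst, src);

            /* cpu_set(3, mask)           ->  cpumask_set_cpu() */
            cpumask_set_cpu(3, dst);

            /* first_cpu(mask)            ->  cpumask_first()   */
            cpu = cpumask_first(dst);

            /* cpus_and(dst, dst, src)    ->  cpumask_and()     */
            cpumask_and(dst, dst, src);

            /* cpus_empty(mask)           ->  cpumask_empty()   */
            if (cpumask_empty(dst))
                return;

            (void)cpu;
        }
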
    • sched: wrap sched_group and sched_domain cpumask accesses. · 758b2cdc
      Authored by Rusty Russell
      Impact: trivial wrap of member accesses
      
      This eases the transition in the next patch.
      
      We also get rid of a temporary cpumask in find_idlest_cpu() thanks to
      for_each_cpu_and, and of one in sched_balance_self() by getting the
      weight before setting sd to NULL.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      758b2cdc
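
      [Editor's note] A sketch of the temporary-cpumask removal mentioned above:
      instead of cpus_and() into a scratch mask followed by a walk over it,
      for_each_cpu_and() iterates the intersection directly. cpu_load_of() is an
      assumed stand-in for the real per-cpu load lookup, and the function only
      loosely mirrors the shape of find_idlest_cpu():

        #include <linux/cpumask.h>
        #include <linux/kernel.h>
        #include <linux/sched.h>

        unsigned long cpu_load_of(int cpu);    /* assumed helper */

        static int find_idlest_cpu_sketch(const struct cpumask *group_mask,
                                          struct task_struct *p)
        {
            unsigned long min_load = ULONG_MAX;
            int cpu, idlest = -1;

            /* walk only cpus that are both in the group and allowed for @p,
               without building a temporary cpumask first */
            for_each_cpu_and(cpu, group_mask, &p->cpus_allowed) {
                unsigned long load = cpu_load_of(cpu);

                if (load < min_load) {
                    min_load = load;
                    idlest = cpu;
                }
            }
            return idlest;
        }
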
  16. 11 Nov 2008, 1 commit
  17. 05 Nov 2008, 4 commits
  18. 24 Oct 2008, 4 commits
  19. 22 Oct 2008, 1 commit
  20. 20 Oct 2008, 2 commits
    • sched: revert back to per-rq vruntime · f9c0b095
      Authored by Peter Zijlstra
      Vatsa rightly points out that having the runqueue weight in the vruntime
      calculations can cause unfairness in the face of task joins/leaves.
      
      Suppose: dv = dt * rw / w
      
      Then take 10 tasks t_n, each of similar weight. If the first one runs for 1,
      its vruntime will increase by 10 (rw/w = 10 while all 10 tasks are queued).
      Now, if the next 8 tasks leave after having run their 1, then the last task
      will only get a vruntime increase of 2 after having run 1 (rw/w has dropped
      to 2).
      
      That leaves us with 2 tasks of equal weight and equal runtime, of which
      one will not be scheduled for 8/2 = 4 units of time (the 8-unit vruntime
      gap closes at a rate of rw/w = 2).
      
      Ergo, we cannot do that and must use: dv = dt / w.
      
      This means we cannot have a global vruntime based on effective priority, but
      must instead go back to the vruntime per rq model we started out with.
      
      This patch was lightly tested by starting while loops on each nice level
      and observing their execution time, and with a simple group scenario of
      1:2:3 pinned to a single cpu.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f9c0b095
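
      [Editor's note] A tiny standalone calculation mirroring the example above,
      just to make the asymmetry concrete. The numbers are the ones from the
      commit message; the program itself is only an illustration:

        #include <stdio.h>

        int main(void)
        {
            double w = 1.0, dt = 1.0;

            /* dv = dt * rw / w while all 10 tasks are queued (rw = 10w) */
            double dv_first = dt * (10.0 * w) / w;

            /* same rule after 8 tasks have left (rw = 2w) */
            double dv_last = dt * (2.0 * w) / w;

            /* equal work, unequal vruntime advance: 10 vs 2 */
            printf("rq-weight rule: %.0f vs %.0f\n", dv_first, dv_last);

            /* per-task rule dv = dt / w: both advances are equal */
            printf("per-task rule:  %.0f vs %.0f\n", dt / w, dt / w);
            return 0;
        }
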
    • sched: fair scheduler should not resched rt tasks · a4c2f00f
      Authored by Peter Zijlstra
      Using ftrace, Steven noticed that some RT tasks got rescheduled due
      to sched_fair interaction.
      
      What happens is that we reprogram the hrtick from enqueue/dequeue_fair_task()
      because that can change nr_running, and thus the current task's ideal runtime.
      However, it's possible the current task isn't a fair_sched_class task, and thus
      doesn't have an hrtick set to change.
      
      Fix this by wrapping those hrtick_start_fair() calls in a hrtick_update()
      function, which will check for the right conditions.
      Reported-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a4c2f00f
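
      [Editor's note] A kernel-context sketch of the hrtick_update() wrapper
      described above, with details such as the nr_running check abbreviated;
      the shape follows the commit description, not necessarily the exact code:

        static void hrtick_update(struct rq *rq)
        {
            struct task_struct *curr = rq->curr;

            /* an RT (or otherwise non-fair) current task has no fair
               hrtick slice to reprogram, so leave it alone */
            if (curr->sched_class != &fair_sched_class)
                return;

            hrtick_start_fair(rq, curr);
        }

        /* enqueue_task_fair()/dequeue_task_fair() then call hrtick_update(rq)
           instead of calling hrtick_start_fair(rq, rq->curr) directly. */
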
  21. 17 Oct 2008, 1 commit
  22. 08 Oct 2008, 1 commit
  23. 30 Sep 2008, 1 commit
  24. 25 Sep 2008, 1 commit