1. 20 10月, 2008 2 次提交
    • P
      sched: revert back to per-rq vruntime · f9c0b095
      Peter Zijlstra 提交于
      Vatsa rightly points out that having the runqueue weight in the vruntime
      calculations can cause unfairness in the face of task joins/leaves.
      
      Suppose: dv = dt * rw / w
      
      Then take 10 tasks t_n, each of similar weight. If the first will run 1
      then its vruntime will increase by 10. Now, if the next 8 tasks leave after
      having run their 1, then the last task will get a vruntime increase of 2
      after having run 1.
      
      Which will leave us with 2 tasks of equal weight and equal runtime, of which
      one will not be scheduled for 8/2=4 units of time.
      
      Ergo, we cannot do that and must use: dv = dt / w.
      
      This means we cannot have a global vruntime based on effective priority, but
      must instead go back to the vruntime per rq model we started out with.
      
      This patch was lightly tested by doing starting while loops on each nice level
      and observing their execution time, and a simple group scenario of 1:2:3 pinned
      to a single cpu.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f9c0b095
    • P
      sched: fair scheduler should not resched rt tasks · a4c2f00f
      Peter Zijlstra 提交于
      With use of ftrace Steven noticed that some RT tasks got rescheduled due
      to sched_fair interaction.
      
      What happens is that we reprogram the hrtick from enqueue/dequeue_fair_task()
      because that can change nr_running, and thus a current tasks ideal runtime.
      However, its possible the current task isn't a fair_sched_class task, and thus
      doesn't have a hrtick set to change.
      
      Fix this by wrapping those hrtick_start_fair() calls in a hrtick_update()
      function, which will check for the right conditions.
      Reported-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4c2f00f
  2. 17 10月, 2008 1 次提交
  3. 08 10月, 2008 1 次提交
  4. 30 9月, 2008 1 次提交
  5. 25 9月, 2008 1 次提交
  6. 23 9月, 2008 4 次提交
  7. 22 9月, 2008 1 次提交
  8. 06 9月, 2008 1 次提交
    • G
      sched: fix __load_balance_iterator() for cfq with only one task · 38736f47
      Gautham R Shenoy 提交于
      The __load_balance_iterator() returns a NULL when there's only one
      sched_entity which is a task. It is caused by the following code-path.
      
      	/* Skip over entities that are not tasks */
      	do {
      		se = list_entry(next, struct sched_entity, group_node);
      		next = next->next;
      	} while (next != &cfs_rq->tasks && !entity_is_task(se));
      
      	if (next == &cfs_rq->tasks)
      		return NULL;
      	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            This will return NULL even when se is a task.
      
      As a side-effect, there was a regression in sched_mc behavior since 2.6.25,
      since iter_move_one_task() when it calls load_balance_start_fair(),
      would not get any tasks to move!
      
      Fix this by checking if the last entity was a task or not.
      Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      38736f47
  9. 28 8月, 2008 1 次提交
  10. 11 8月, 2008 1 次提交
    • M
      sched: fix mysql+oltp regression · 77ae6513
      Mike Galbraith 提交于
      Defer commit 6d299f1b to the next release.
      
      Testing of the tip/sched/clock tree revealed a mysql+oltp regression
      which bisection eventually traced back to this commit in mainline.
      
      Pertinent test results:  Three run sysbench averages, throughput units
      in read/write requests/sec.
      
      clients         1     2     4     8    16    32    64
      6e0534f2      9646 17876 34774 33868 32230 30767 29441
      2.6.26.1     9112 17936 34652 33383 31929 30665 29232
      6d299f1b      9112 14637 28370 33339 32038 30762 29204
      
      Note: subsequent commits hide the majority of this regression until you
      apply the clock fixes, at which time it reemerges at full magnitude.
      
      We cannot see anything bad about the change itself so we defer it to the
      next release until this problem is fully analysed.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      77ae6513
  11. 28 7月, 2008 1 次提交
  12. 20 7月, 2008 1 次提交
    • P
      sched, x86: clean up hrtick implementation · 31656519
      Peter Zijlstra 提交于
      random uvesafb failures were reported against Gentoo:
      
        http://bugs.gentoo.org/show_bug.cgi?id=222799
      
      and Mihai Moldovan bisected it back to:
      
      > 8f4d37ec is first bad commit
      > commit 8f4d37ec
      > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
      > Date:   Fri Jan 25 21:08:29 2008 +0100
      >
      >    sched: high-res preemption tick
      
      Linus suspected it to be hrtick + vm86 interaction and observed:
      
      > Btw, Peter, Ingo: I think that commit is doing bad things. They aren't
      > _incorrect_ per se, but they are definitely bad.
      >
      > Why?
      >
      > Using random _TIF_WORK_MASK flags is really impolite for doing
      > "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S
      > special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of
      > vm86 mode unnecessarily.
      >
      > See the "work_notifysig_v86" label, and how it does that
      > "save_v86_state()" thing etc etc.
      
      Right, I never liked having to fiddle with those TIF flags. Initially I
      needed it because the hrtimer base lock could not nest in the rq lock.
      That however is fixed these days.
      
      Currently the only reason left to fiddle with the TIF flags is remote
      wakeups. We cannot program a remote cpu's hrtimer. I've been thinking
      about using the new and improved IPI function call stuff to implement
      hrtimer_start_on().
      
      However that does require that smp_call_function_single(.wait=0) works
      from interrupt context - /me looks at the latest series from Jens - Yes
      that does seem to be supported, good.
      
      Here's a stab at cleaning this stuff up ...
      
      Mihai reported test success as well.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NMihai Moldovan <ionic@ionic.de>
      Cc: Michal Januszewski <spock@gentoo.org>
      Cc: Antonino Daplas <adaplas@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      31656519
  13. 18 7月, 2008 1 次提交
    • M
      cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) · e761b772
      Max Krasnyansky 提交于
      This is based on Linus' idea of creating cpu_active_map that prevents
      scheduler load balancer from migrating tasks to the cpu that is going
      down.
      
      It allows us to simplify domain management code and avoid unecessary
      domain rebuilds during cpu hotplug event handling.
      
      Please ignore the cpusets part for now. It needs some more work in order
      to avoid crazy lock nesting. Although I did simplfy and unify domain
      reinitialization logic. We now simply call partition_sched_domains() in
      all the cases. This means that we're using exact same code paths as in
      cpusets case and hence the test below cover cpusets too.
      Cpuset changes to make rebuild_sched_domains() callable from various
      contexts are in the separate patch (right next after this one).
      
      This not only boots but also easily handles
      	while true; do make clean; make -j 8; done
      and
      	while true; do on-off-cpu 1; done
      at the same time.
      (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).
      
      Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
      this on right now in gnome-terminal and things are moving just fine.
      
      Also this is running with most of the debug features enabled (lockdep,
      mutex, etc) no BUG_ONs or lockdep complaints so far.
      
      I believe I addressed all of the Dmitry's comments for original Linus'
      version. I changed both fair and rt balancer to mask out non-active cpus.
      And replaced cpu_is_offline() with !cpu_active() in the main scheduler
      code where it made sense (to me).
      Signed-off-by: NMax Krasnyanskiy <maxk@qualcomm.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NGregory Haskins <ghaskins@novell.com>
      Cc: dmitry.adamushko@gmail.com
      Cc: pj@sgi.com
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e761b772
  14. 04 7月, 2008 1 次提交
    • G
      sched: add avg-overlap support to RT tasks · 2087a1ad
      Gregory Haskins 提交于
      We have the notion of tracking process-coupling (a.k.a. buddy-wake) via
      the p->se.last_wake / p->se.avg_overlap facilities, but it is only used
      for cfs to cfs interactions.  There is no reason why an rt to cfs
      interaction cannot share in establishing a relationhip in a similar
      manner.
      
      Because PREEMPT_RT runs many kernel threads as FIFO priority, we often
      times have heavy interaction between RT threads waking CFS applications.
      This patch offers a substantial boost (50-60%+) in perfomance under those
      circumstances.
      Signed-off-by: NGregory Haskins <ghaskins@novell.com>
      Cc: npiggin@suse.de
      Cc: rostedt@goodmis.org
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2087a1ad
  15. 27 6月, 2008 18 次提交
  16. 06 6月, 2008 1 次提交
  17. 29 5月, 2008 3 次提交