1. 23 Apr 2010, 1 commit
  2. 03 Apr 2010, 2 commits
    • sched: Add enqueue/dequeue flags · 371fd7e7
      Committed by Peter Zijlstra
      In order to reduce the dependency on TASK_WAKING rework the enqueue
      interface to support a proper flags field.
      
      Replace the int wakeup, bool head arguments with an int flags argument
      and create the following flags:
      
        ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
        ENQUEUE_WAKING - the enqueue has relative vruntime due to
                         having sched_class::task_waking() called,
        ENQUEUE_HEAD - the waking task should be placed on the head
                       of the priority queue (where appropriate).
      
      For symmetry also convert sched_class::dequeue() to a flags scheme.
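      
      A minimal sketch of the resulting interface in kernel-style C; the
      flag names come from the commit, while the numeric values and the
      struct shown here are illustrative:
      
          /* Illustrative flag values; names are from the commit text. */
          #define ENQUEUE_WAKEUP  0x01  /* wakeup of a sleeping task */
          #define ENQUEUE_WAKING  0x02  /* vruntime is relative: ->task_waking() ran */
          #define ENQUEUE_HEAD    0x04  /* place task at the head of the queue */
          
          #define DEQUEUE_SLEEP   0x01  /* task is dequeued to go to sleep */
          
          struct rq;
          struct task_struct;
          
          /* The reworked hooks: a single int flags argument replaces the
           * old "int wakeup" / "bool head" parameters. */
          struct sched_class_hooks {
                  void (*enqueue_task)(struct rq *rq, struct task_struct *p, int flags);
                  void (*dequeue_task)(struct rq *rq, struct task_struct *p, int flags);
          };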
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      371fd7e7
    • sched: Fix TASK_WAKING vs fork deadlock · 0017d735
      Committed by Peter Zijlstra
      Oleg noticed a few races with the TASK_WAKING usage on fork.
      
       - since TASK_WAKING is basically a spinlock, it should be IRQ safe
       - since we set TASK_WAKING (*) without holding rq->lock, there
         could still be an rq->lock holder, so full serialization is
         not actually provided.
      
      (*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.
      
      Cure the second issue by not setting TASK_WAKING in sched_fork(), but
      only temporarily in wake_up_new_task() while calling select_task_rq().
      
      Cure the first by holding rq->lock around the select_task_rq()
      call; this disables IRQs, but it requires pushing the rq->lock
      release down into select_task_rq_fair()'s cgroup code.
      
      Because select_task_rq_fair() still needs to drop the rq->lock,
      we cannot fully get rid of TASK_WAKING.
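      
      A hypothetical sketch of the resulting fork wakeup path, using
      the usual kernel/sched.c helpers; the exact code differs, this
      only illustrates the locking and state transitions described
      above:
      
          void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
          {
                  unsigned long flags;
                  struct rq *rq;
                  int cpu;
          
                  rq = task_rq_lock(p, &flags);   /* rq->lock held, IRQs off */
          
                  /* TASK_WAKING is set only here, temporarily, instead of
                   * in sched_fork(); rq->lock makes the window IRQ safe. */
                  p->state = TASK_WAKING;
                  cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
                  /* select_task_rq_fair() may still drop and retake
                   * rq->lock for its cgroup handling, which is why
                   * TASK_WAKING cannot go away entirely. */
                  set_task_cpu(p, cpu);
                  p->state = TASK_RUNNING;
          
                  activate_task(rq, p, 0);
                  task_rq_unlock(rq, &flags);
          }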
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0017d735
  3. 12 Mar 2010, 10 commits
  4. 11 Mar 2010, 1 commit
  5. 01 Mar 2010, 1 commit
  6. 26 Feb 2010, 1 commit
    • sched: Fix SCHED_MC regression caused by change in sched cpu_power · dd5feea1
      Committed by Suresh Siddha
      On platforms like a dual-socket quad-core system, the scheduler
      load balancer fails to detect load imbalances in certain
      scenarios. This can leave one socket completely busy (all 4 cores
      running 4 tasks) while the other socket sits completely idle,
      which hurts performance: the 4 tasks share the memory controller,
      last-level cache bandwidth, etc., and we also do not benefit from
      turbo mode as much as we would like.
      
      Some comparisons in the scheduler load-balancing code pit the
      "weighted cpu load that is scaled wrt sched_group's cpu_power"
      against the "weighted average load per task that is not scaled
      wrt sched_group's cpu_power". While this has probably been broken
      for longer (for multi-socket NUMA nodes, etc.), the problem was
      aggravated by this recent change:
      
       |
       |  commit f93e65c1
       |  Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
       |  Date:   Tue Sep 1 10:34:32 2009 +0200
       |
       |	sched: Restore __cpu_power to a straight sum of power
       |
      
      Also with this change, the sched group cpu power alone no longer reflects
      the group capacity that is needed to implement MC, MT performance
      (default) and power-savings (user-selectable) policies.
      
      We need to use the computed group capacity (sgs.group_capacity,
      computed via the SD_PREFER_SIBLING logic in update_sd_lb_stats())
      to determine whether the group with the maximum load is above its
      capacity, how much load to move, and so on.
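      
      A rough, self-contained illustration of the unit mismatch
      (made-up numbers and SCHED_LOAD_SCALE-style scaling as described
      above; this is not kernel code):
      
          #include <stdio.h>
          
          #define SCHED_LOAD_SCALE 1024UL
          
          int main(void)
          {
                  /* One quad-core group, 4 runnable tasks of weight 1024
                   * each, cpu_power scaled to 90% (e.g. RT activity). */
                  unsigned long group_load = 4 * 1024;  /* raw weighted load */
                  unsigned long cpu_power  = 4 * SCHED_LOAD_SCALE * 9 / 10;
                  unsigned long nr_running = 4;
          
                  /* Scaled wrt the group's cpu_power ... */
                  unsigned long avg_load = group_load * SCHED_LOAD_SCALE / cpu_power;
                  /* ... yet compared against an unscaled per-task average: */
                  unsigned long avg_load_per_task = group_load / nr_running;
          
                  /* 1137 vs 1024: different units, so the comparison
                   * misjudges the imbalance.  The fix checks the group
                   * against its computed capacity instead: */
                  unsigned long group_capacity = 4;  /* from update_sd_lb_stats() */
          
                  printf("avg_load=%lu avg_load_per_task=%lu over_capacity=%d\n",
                         avg_load, avg_load_per_task,
                         (int)(nr_running > group_capacity));
                  return 0;
          }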
      Reported-by: Ma Ling <ling.ma@intel.com>
      Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      [ -v2: build fix ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org> # [2.6.32.x, 2.6.33.x]
      LKML-Reference: <1266970432.11588.22.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      dd5feea1
  7. 23 Jan 2010, 1 commit
  8. 21 Jan 2010, 11 commits
  9. 17 Jan 2010, 1 commit
  10. 17 Dec 2009, 2 commits
    • sched: Remove the cfs_rq dependency from set_task_cpu() · 88ec22d3
      Committed by Peter Zijlstra
      In order to remove the cfs_rq dependency from set_task_cpu() we
      need to ensure the task is cfs_rq invariant for all callsites.
      
      The simple approach is to subtract cfs_rq->min_vruntime from
      se->vruntime on dequeue, and add cfs_rq->min_vruntime on
      enqueue.
      
      However, this has the downside of breaking FAIR_SLEEPERS, since
      we lose the old vruntime and only maintain the relative
      position.
      
      To solve this, observe that we only migrate runnable tasks; we
      do this using deactivate_task(.sleep=0) and
      activate_task(.wakeup=0), so we can restrict the min_vruntime
      invariance to that state.
      
      The only other case is wakeup balancing, since we want to
      maintain the old vruntime we cannot make it relative on dequeue,
      but since we don't migrate inactive tasks, we can do so right
      before we activate it again.
      
      This is where we need the new pre-wakeup hook; it must be called
      while still holding the old rq->lock. We could fold it into
      ->select_task_rq(), but since that has multiple callsites and
      would obscure the locking requirements, that seems like a fudge.
      
      This leaves the fork() case: simply make sure that ->task_fork()
      leaves ->vruntime in a relative state.
      
      This covers all cases where set_task_cpu() gets called, and
      ensures it sees a relative vruntime.
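      
      A minimal sketch of that relative-vruntime scheme; the field
      names follow the kernel, while the function names and reduced
      types here are hypothetical:
      
          struct cfs_rq { unsigned long long min_vruntime; };  /* u64 in the kernel */
          struct sched_entity { unsigned long long vruntime; };
          
          /* Dequeue of a runnable task (sleep == 0) may precede a
           * migration, so make vruntime relative; sleepers keep their
           * absolute value so FAIR_SLEEPERS keeps working. */
          static void dequeue_normalize(struct cfs_rq *cfs_rq,
                                        struct sched_entity *se, int sleep)
          {
                  if (!sleep)
                          se->vruntime -= cfs_rq->min_vruntime;
          }
          
          /* The matching enqueue (wakeup == 0) makes it absolute again
           * relative to the new runqueue's min_vruntime. */
          static void enqueue_normalize(struct cfs_rq *cfs_rq,
                                        struct sched_entity *se, int wakeup)
          {
                  if (!wakeup)
                          se->vruntime += cfs_rq->min_vruntime;
          }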
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20091216170518.191697025@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      88ec22d3
    • sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE · e4f42888
      Committed by Peter Zijlstra
      We should skip !SD_LOAD_BALANCE domains.
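      
      A minimal sketch of the fix, assuming the usual for_each_domain()
      walk inside select_task_rq_fair(); only the skip is the point
      here, the surrounding loop body is elided:
      
          for_each_domain(cpu, tmp) {
                  /* Domains with load balancing disabled must not feed
                   * placement decisions; skip them. */
                  if (!(tmp->flags & SD_LOAD_BALANCE))
                          continue;
          
                  /* ... existing affine / idle-sibling selection ... */
          }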
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20091216170517.653578430@chello.nl>
      Cc: stable@kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e4f42888
  11. 15 Dec 2009, 1 commit
  12. 09 Dec 2009, 8 commits