1. 09 Dec 2009, 1 commit
  2. 04 Nov 2009, 1 commit
  3. 21 Sep 2009, 1 commit
  4. 15 Sep 2009, 3 commits
  5. 04 Sep 2009, 1 commit
  6. 02 Aug 2009, 4 commits
    • sched: Fix cpupri build on !CONFIG_SMP · bcf08df3
      Ingo Molnar committed
      This build bug:
      
       In file included from kernel/sched.c:1765:
       kernel/sched_rt.c: In function ‘has_pushable_tasks’:
       kernel/sched_rt.c:1069: error: ‘struct rt_rq’ has no member named ‘pushable_tasks’
       kernel/sched_rt.c: In function ‘pick_next_task_rt’:
       kernel/sched_rt.c:1084: error: ‘struct rq’ has no member named ‘post_schedule’
      
      Triggers because both pushable_tasks and post_schedule are
      SMP-only fields.
      
      Move pushable_tasks() to the SMP section and #ifdef the post_schedule use.
      
      Cc: Gregory Haskins <ghaskins@novell.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Add debug check to task_of() · 8f48894f
      Peter Zijlstra committed
      A frequent mistake appears to be to call task_of() on a
      scheduler entity that is not actually a task, which can result
      in a wild pointer.
      
      Add a check to catch these mistakes.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fully integrate cpus_active_map and root-domain code · 00aec93d
      Gregory Haskins committed
      Reflect "active" cpus in the rq->rd->online field, instead of
      the online_map.
      
      The motivation is that things that use the root-domain code
      (such as cpupri) only care about cpus classified as "active"
      anyway. By synchronizing the root-domain state with the active
      map, we allow several optimizations.
      
      For instance, we can remove an extra cpumask_and from the
      scheduler hotpath by utilizing rq->rd->online (since it is now
      a cached version of cpu_active_map & rq->rd->span).
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Max Krasnyansky <maxk@qualcomm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Enhance the pre/post scheduling logic · 3f029d3c
      Gregory Haskins committed
      We currently have an explicit "needs_post" vtable method which
      returns a stack variable for whether we should later run
      post-schedule.  This leads to an awkward exchange of the
      variable as it bubbles back up out of the context switch. Peter
      Zijlstra observed that this information could be stored in the
      run-queue itself instead of handled on the stack.
      
      Therefore, we revert to the method of having context_switch
      return void, and update an internal rq->post_schedule variable
      when we require further processing.
      
      In addition, we fix a race condition where we try to access
      current->sched_class without holding the rq->lock.  This is
      technically racy, as the sched-class could change out from
      under us.  Instead, we reference the per-rq post_schedule
      variable with the runqueue unlocked, but with preemption
      disabled to see if we need to reacquire the rq->lock.
      
      Finally, we clean the code up slightly by removing the #ifdef
      CONFIG_SMP conditionals from the schedule() call, and implement
      some inline helper functions instead.
      
      This patch passes checkpatch and rt-migrate.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 10 Jul 2009, 1 commit
  8. 09 Jun 2009, 1 commit
  9. 01 Apr 2009, 1 commit
  10. 01 Feb 2009, 1 commit
  11. 16 Jan 2009, 1 commit
    • sched: make plist a library facility · ceacc2c1
      Peter Zijlstra committed
      Ingo Molnar wrote:
      
      > here's a new build failure with tip/sched/rt:
      >
      >   LD      .tmp_vmlinux1
      > kernel/built-in.o: In function `set_curr_task_rt':
      > sched.c:(.text+0x3675): undefined reference to `plist_del'
      > kernel/built-in.o: In function `pick_next_task_rt':
      > sched.c:(.text+0x37ce): undefined reference to `plist_del'
      > kernel/built-in.o: In function `enqueue_pushable_task':
      > sched.c:(.text+0x381c): undefined reference to `plist_del'
      
      Eliminate the plist library kconfig and make it available
      unconditionally.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 14 Jan 2009, 2 commits
  13. 12 Jan 2009, 1 commit
  14. 04 Jan 2009, 1 commit
    • sched: put back some stack hog changes that were undone in kernel/sched.c · 6ca09dfc
      Mike Travis committed
      Impact: prevents panic from stack overflow on numa-capable machines.
      
      Some of the "removal of stack hogs" changes in kernel/sched.c by using
      node_to_cpumask_ptr were undone by the early cpumask API updates, and
      causes a panic due to stack overflow.  This patch undoes those changes
      by using cpumask_of_node() which returns a 'const struct cpumask *'.
      
      In addition, cpu_coregroup_map is replaced with cpu_coregroup_mask, further
      reducing stack usage.  (Both of these updates removed 9 FIXMEs!)
      
      Also:
         Pick up some remaining changes from the old 'cpumask_t' functions to
         the new 'struct cpumask *' functions.
      
         Optimize memory traffic by allocating each percpu local_cpu_mask on the
         same node as the referring cpu.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 29 Dec 2008, 8 commits
    • RT: fix push_rt_task() to handle dequeue_pushable properly · 1563513d
      Gregory Haskins committed
      Chirag Jog discovered a panic where a BUG_ON sanity check
      in the new "pushable_task" logic would trigger under
      certain circumstances:
      
      http://lkml.org/lkml/2008/9/25/189
      
      Gilles Carry discovered that the root cause was attributed to the
      pushable_tasks list getting corrupted in the push_rt_task logic.
      This was the result of a dropped rq lock in double_lock_balance
      allowing a task in the process of being pushed to potentially migrate
      away, and thus corrupt the pushable_tasks() list.
      
      I traced the problem back to the recently merged pushable_tasks
      patch.  There is a "retry" path in push_rt_task()
      that actually had a compound conditional to decide whether to
      retry or exit.  I missed the rationale behind the virtual
      "if(!task) goto out;" portion of the compound statement and
      thus did not handle it properly.  The new pushable_tasks logic
      actually creates three distinct conditions:
      
      1) an untouched and unpushable task should be dequeued
      2) a migrated task where more pushable tasks remain should be retried
      3) a migrated task where no more pushable tasks exist should exit
      
      The original logic mushed (1) and (3) together, resulting in the
      system dequeuing a migrated task (against an unlocked foreign run-queue
      nonetheless).
      
      To fix this, we get rid of the notion of "paranoid" and we support the
      three unique conditions properly.  The paranoid feature is no longer
      relevant with the new pushable logic (since pushable naturally limits
      the loop) anyway, so let's just remove it.
      Reported-by: Chirag Jog <chirag@linux.vnet.ibm.com>
      Found-by: Gilles Carry <gilles.carry@bull.net>
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: create "pushable_tasks" list to limit pushing to one attempt · 917b627d
      Gregory Haskins committed
      The RT scheduler employs a "push/pull" design to actively balance tasks
      within the system (on a per disjoint cpuset basis).  When a task is
      awoken, it is immediately determined if there are any lower priority
      cpus which should be preempted.  This is opposed to the way normal
      SCHED_OTHER tasks behave, which will wait for a periodic rebalancing
      operation to occur before spreading out load.
      
      When a particular RQ has more than 1 active RT task, it is said to
      be in an "overloaded" state.  Once this occurs, the system enters
      the active balancing mode, where it will try to push the task away,
      or persuade a different cpu to pull it over.  The system stays
      in this state until it falls back to at most one queued RT
      task per RQ.
      
      However, the current implementation suffers from a limitation in the
      push logic.  Once overloaded, all tasks (other than current) on the
      RQ are analyzed on every push operation, even if it was previously
      unpushable (due to affinity, etc.).  What's more, the operation stops
      at the first task that is unpushable and will not look at items
      lower in the queue.  This causes two problems:
      
      1) We can have the same tasks analyzed over and over again during each
         push, which extends out the fast path in the scheduler for no
         gain.  Consider an RQ that has dozens of tasks that are bound to a
         core.  Each one of those tasks will be encountered and skipped
         for each push operation while they are queued.
      
      2) There may be lower-priority tasks under the unpushable task that
         could have been successfully pushed, but will never be considered
         until either the unpushable task is cleared, or a pull operation
         succeeds.  The net result is a potential source of latency for
         mid-priority tasks.
      
      This patch aims to rectify these two conditions by introducing a new
      priority-sorted list: "pushable_tasks".  A task is added to the list
      each time it is activated or preempted, and removed from the
      list any time it is deactivated, made current, or fails to push.
      
      This works because a push only needs to be attempted once per task.
      After an initial failure to push, the other cpus will eventually try to
      pull the task when the conditions are proper.  This also solves the
      problem that we don't completely analyze all tasks due to encountering
      an unpushable task.  Now every task will have a push attempted (when
      appropriate).
      
      This reduces latency both by shortening the critical section of the
      rq->lock for certain workloads, and by making sure the algorithm
      considers all eligible tasks in the system.
      
      [ rostedt: added a couple more BUG_ONs ]
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Steven Rostedt <srostedt@redhat.com>
    • sched: add sched_class->needs_post_schedule() member · 967fc046
      Gregory Haskins committed
      We currently run class->post_schedule() outside of the rq->lock, which
      means that we need to test for the need to post_schedule outside of
      the lock to avoid a forced reacquisition.  This is currently not a problem
      as we only look at rq->rt.overloaded.  However, we want to enhance this
      going forward to look at more state to reduce the need to post_schedule to
      a bare minimum set.  Therefore, we introduce a new member-func called
      needs_post_schedule() which tests for the post_schedule condition without
      actually performing the work.  Therefore it is safe to call this
      function before the rq->lock is released, because we are guaranteed not
      to drop the lock at an intermediate point (such as what post_schedule()
      may do).
      
      We will use this later in the series.
      
      [ rostedt: removed paranoid BUG_ON ]
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: only try to push a task on wakeup if it is migratable · 777c2f38
      Gregory Haskins committed
      There is no sense in wasting time trying to push a task away that
      cannot move anywhere else.  We gain no benefit from trying to push
      other tasks at this point, so if the task being woken up is non
      migratable, just skip the whole operation.  This reduces overhead
      in the wakeup path for certain tasks.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: use highest_prio.next to optimize pull operations · 74ab8e4f
      Gregory Haskins committed
      We currently take the rq->lock for every cpu in an overload state during
      pull_rt_tasks().  However, we now have enough information via the
      highest_prio.[curr|next] fields to determine if there are any tasks of
      interest to warrant the overhead of the rq->lock, before we actually take
      it.  So we use this information to reduce lock contention during the
      pull for the case where the source-rq doesn't have tasks that preempt
      the current task.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: use highest_prio.curr for pull threshold · a8728944
      Gregory Haskins committed
      highest_prio.curr is actually a more accurate way to keep track of
      the pull_rt_task() threshold since it is always up to date, even
      if the "next" task migrates during double_lock.  Therefore, stop
      looking at the "next" task object and simply use the highest_prio.curr.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      a8728944
    • sched: track the next-highest priority on each runqueue · e864c499
      Gregory Haskins committed
      We will use this later in the series to reduce the amount of rq-lock
      contention during a pull operation.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      e864c499
    • sched: cleanup inc/dec_rt_tasks · 4d984277
      Gregory Haskins committed
      Move some common definitions up to the function prologue to simplify the
      body logic.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      4d984277
  16. 17 Dec 2008, 1 commit
  17. 29 Nov 2008, 1 commit
    • sched: move double_unlock_balance() higher · 70574a99
      Alexey Dobriyan committed
      Move double_lock_balance()/double_unlock_balance() higher to fix the following
      with gcc-3.4.6:
      
         CC      kernel/sched.o
       In file included from kernel/sched.c:1605:
       kernel/sched_rt.c: In function `find_lock_lowest_rq':
       kernel/sched_rt.c:914: sorry, unimplemented: inlining failed in call to 'double_unlock_balance': function body not available
       kernel/sched_rt.c:1077: sorry, unimplemented: called from here
       make[2]: *** [kernel/sched.o] Error 1
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      70574a99
  18. 26 Nov 2008, 1 commit
  19. 25 Nov 2008, 5 commits
  20. 07 Nov 2008, 1 commit
    • sched, lockdep: inline double_unlock_balance() · cf7f8690
      Sripathi Kodi committed
      We have a test case which measures the variation in the amount of time
      needed to perform a fixed amount of work on the preempt_rt kernel. We
      started seeing deterioration in its performance recently.  The test
      should never take more than 10 microseconds, but we started seeing a
      5-10% failure rate.
      
      Using the elimination method, we traced the problem to commit
      1b12bbc7 (lockdep: re-annotate
      scheduler runqueues).
      
      When LOCKDEP is disabled, this patch only adds an additional function
      call to double_unlock_balance(). Hence I inlined double_unlock_balance()
      and the problem went away. Here is a patch to make this change.
      Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cf7f8690
  21. 03 Nov 2008, 1 commit
    • sched/rt: small optimization to update_curr_rt() · e113a745
      Dimitri Sivanich committed
      Impact: micro-optimization to SCHED_FIFO/RR scheduling
      
      A very minor improvement, but might it be better to check sched_rt_runtime(rt_rq)
      before taking the rt_runtime_lock?
      
      Peter Zijlstra observes:
      
      > Yes, I think its ok to do so.
      >
      > Like pointed out in the other thread, there are two races:
      >
      >  - sched_rt_runtime() going to RUNTIME_INF, and that will be handled
      >    properly by sched_rt_runtime_exceeded()
      >
      >  - sched_rt_runtime() going to !RUNTIME_INF, and here we can miss an
      >    accounting cycle, but I don't think that is something to worry too
      >    much about.
      Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      --
      
       kernel/sched_rt.c |    4 ++--
       1 file changed, 2 insertions(+), 2 deletions(-)
  22. 22 Oct 2008, 1 commit
  23. 04 Oct 2008, 1 commit
    • sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq · f6121f4f
      Dario Faggioli committed
      While working on the new version of the code for SCHED_SPORADIC I
      noticed something strange in the present throttling mechanism. More
      specifically in the throttling timer handler in sched_rt.c
      (do_sched_rt_period_timer()) and in rt_rq_enqueue().
      
      The problem is that, when unthrottling a runqueue, rt_rq_enqueue() only
      asks for rescheduling if the runqueue has a sched_entity associated
      with it (i.e., rt_rq->rt_se != NULL).
      Now, if the runqueue is the root rq (which has rt_se == NULL),
      rescheduling does not take place, and it is delayed to some undefined
      instant in the future.
      
      This implies some random bandwidth usage by the RT tasks under throttling.
      For instance, setting rt_runtime_us/rt_period_us = 950ms/1000ms, an RT
      task will get less than 95%.  In our tests we got something varying
      between 70% and 95%.
      Using smaller time values, e.g., 95ms/100ms, things are even worse, and
      I can see values going down to 20-25%!
      
      The tests we performed are simply running 'yes' as a SCHED_FIFO task,
      and checking the CPU usage with top, but we can investigate thoroughly
      if you think it is needed.
      
      Things go much better, for us, with the attached patch... Don't know if
      it is the best approach, but it solved the issue for us.
      Signed-off-by: Dario Faggioli <raistlin@linux.it>
      Signed-off-by: Michael Trimarchi <trimarchimichael@yahoo.it>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>