1. 04 Nov, 2014: 3 commits
    • sched/deadline: Add deadline rq status print · acb32132
      Wanpeng Li committed
      This patch adds a deadline rq status print.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414708776-124078-3-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Fix artificial overrun introduced by yield_task_dl() · 80496880
      Wanpeng Li committed
      The yield semantic of the deadline class is to reduce the remaining runtime
      to zero, and then update_curr_dl() will stop it. However, the consumed
      bandwidth is subtracted from the budget of the yielded task again even
      though it has already been set to zero, which leads to an artificial
      overrun. This patch fixes it by making sure we don't steal any more time
      from the task that yielded in update_curr_dl().
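      A minimal sketch of the accounting change, in userspace C. The struct and
      the *_sketch helpers are hypothetical, heavily simplified stand-ins, not the
      kernel's definitions. The sketch assumes the only state that matters is the
      remaining runtime and a dl_yielded flag. Once a task has yielded and its
      runtime has been set to zero, further execution time is not charged to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a deadline scheduling entity. */
struct dl_entity_sketch {
	int64_t runtime;  /* remaining budget in the current period, ns */
	bool dl_yielded;  /* set once the task has yielded */
};

/* Yield semantics: give up whatever budget is left. */
static void yield_sketch(struct dl_entity_sketch *dl_se)
{
	dl_se->dl_yielded = true;
	dl_se->runtime = 0;
}

/*
 * Charge delta_exec ns of execution to the entity. Without the
 * dl_yielded check, a yielded task would drop from 0 to -delta_exec,
 * which later looks like an (artificial) overrun.
 */
static void charge_sketch(struct dl_entity_sketch *dl_se, int64_t delta_exec)
{
	assert(delta_exec >= 0);
	dl_se->runtime -= dl_se->dl_yielded ? 0 : delta_exec;
}

/* A budget of zero or less means the entity must be throttled. */
static bool runtime_exceeded_sketch(const struct dl_entity_sketch *dl_se)
{
	return dl_se->runtime <= 0;
}
```

      With this change, a yielded task ends up with a runtime of exactly zero. It
      no longer gets a negative value that would be mistaken for an overrun.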
      Suggested-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414708776-124078-2-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl() · 67dfa1b7
      Kirill Tkhai committed
      The currently used hrtimer_try_to_cancel() is racy:
      
      raw_spin_lock(&rq->lock)
      ...                            dl_task_timer                 raw_spin_lock(&rq->lock)
      ...                               raw_spin_lock(&rq->lock)   ...
         switched_from_dl()             ...                        ...
            hrtimer_try_to_cancel()     ...                        ...
         switched_to_fair()             ...                        ...
      ...                               ...                        ...
      ...                               ...                        ...
      raw_spin_unlock(&rq->lock)        ...                        (acquired)
      ...                               ...                        ...
      ...                               ...                        ...
      do_exit()                         ...                        ...
         schedule()                     ...                        ...
            raw_spin_lock(&rq->lock)    ...                        raw_spin_unlock(&rq->lock)
            ...                         ...                        ...
            raw_spin_unlock(&rq->lock)  ...                        raw_spin_lock(&rq->lock)
            ...                         ...                        (acquired)
            put_task_struct()           ...                        ...
                free_task_struct()      ...                        ...
            ...                         ...                        raw_spin_unlock(&rq->lock)
      ...                               (acquired)                 ...
      ...                               ...                        ...
      ...                               (use after free)           ...
      
      So, let's implement a 100% guaranteed way to cancel the timer and make
      sure we are safe even in very unlikely situations.
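      The cancellation pattern can be modeled in userspace C, as below. This is a
      sketch under stated assumptions, not kernel code. The names fake_rq,
      fake_timer, try_to_cancel() and cancel_wait() are hypothetical stubs that
      only record state, and the callback is assumed to finish while we wait.
      First, try the non-blocking cancel with the rq lock held. If the callback
      is already running, and so may be spinning on that lock, drop the lock,
      wait for the callback to finish, and then take the lock again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real code uses raw_spinlock_t and struct hrtimer. */
struct fake_rq {
	bool locked;
};

struct fake_timer {
	bool queued;           /* armed but not yet fired */
	bool callback_running; /* callback currently executing */
};

static void rq_lock(struct fake_rq *rq)   { assert(!rq->locked); rq->locked = true; }
static void rq_unlock(struct fake_rq *rq) { assert(rq->locked); rq->locked = false; }

/* Models hrtimer_try_to_cancel(): -1 if the callback is running,
 * 1 if a queued timer was removed, 0 if it was not active. */
static int try_to_cancel(struct fake_timer *t)
{
	if (t->callback_running)
		return -1;
	if (t->queued) {
		t->queued = false;
		return 1;
	}
	return 0;
}

/* Models hrtimer_cancel(): waits for a running callback to finish.
 * Here we simply assume the callback completed while we waited. */
static void cancel_wait(struct fake_timer *t)
{
	t->callback_running = false;
	t->queued = false;
}

/* Called with rq locked; returns with rq locked and the timer inactive. */
static void cancel_dl_timer_sketch(struct fake_rq *rq, struct fake_timer *t)
{
	assert(rq->locked);
	if (try_to_cancel(t) == -1) {
		/* The callback may need rq->lock: release it so it can finish. */
		rq_unlock(rq);
		cancel_wait(t);
		rq_lock(rq);
	}
}
```

      The caller sees the rq lock held on entry and on exit. However, it must
      tolerate the lock having been dropped briefly, which is what the rest of
      this changelog discusses.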
      
      Unlocking rq does not limit where switched_from_dl() can be used, because
      this has already been possible in pull_dl_task() below.
      
      Let's consider the safety of this unlocking. The new code in the patch
      runs only when hrtimer_try_to_cancel() fails. This means the callback
      is running. In this case hrtimer_cancel() just waits until the
      callback is finished. Two cases are possible:
      
      1) Since we are in switched_from_dl(), the new class is not dl_sched_class
      and the new prio is not less than MAX_DL_PRIO. So, the callback returns
      early, right after the !dl_task() check. After that, hrtimer_cancel()
      returns too.
      
      The above is:
      
      raw_spin_lock(rq->lock);                  ...
      ...                                       dl_task_timer()
      ...                                          raw_spin_lock(rq->lock);
         switched_from_dl()                        ...
             hrtimer_try_to_cancel()               ...
                raw_spin_unlock(rq->lock);         ...
                hrtimer_cancel()                   ...
                ...                                raw_spin_unlock(rq->lock);
                ...                                return HRTIMER_NORESTART;
                ...                             ...
                raw_spin_lock(rq->lock);        ...
      
      2) But the below is also possible:
                                         dl_task_timer()
                                            raw_spin_lock(rq->lock);
                                            ...
                                            raw_spin_unlock(rq->lock);
      raw_spin_lock(rq->lock);              ...
         switched_from_dl()                 ...
             hrtimer_try_to_cancel()        ...
             ...                            return HRTIMER_NORESTART;
             raw_spin_unlock(rq->lock);  ...
             hrtimer_cancel();           ...
             raw_spin_lock(rq->lock);    ...
      
      In this case hrtimer_cancel() returns immediately. This case is very
      unlikely; it is mentioned only for completeness.
      
      Nobody can manipulate the task, because check_class_changed() is
      always called with pi_lock held. For the same reason, nobody can force
      the task to participate in (concurrent) priority inheritance schemes.
      
      All concurrent task operations require pi_lock, which is held by us.
      No deadlocks with dl_task_timer() are possible, because it returns
      right after the !dl_task() check (it does nothing).

      If a new dl_task arrives while rq is unlocked, we simply don't have
      to call pull_dl_task() in switched_from_dl() afterwards.
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      [ Added comments]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 28 Oct, 2014: 7 commits
  3. 24 Sep, 2014: 2 commits
  4. 19 Sep, 2014: 1 commit
  5. 07 Sep, 2014: 1 commit
    • sched/deadline: Fix a precision problem in the microseconds range · 177ef2a6
      xiaofeng.yan committed
      An overrun could happen in function start_hrtick_dl()
      when a task with SCHED_DEADLINE runs in the microseconds
      range.
      
      For example, if a task with SCHED_DEADLINE has the following parameters:
      
        Task  runtime  deadline  period
         P1   200us     500us    500us
      
      The deadline and period of task P1 are less than 1ms.
      
      In order to achieve microsecond precision, we enable the HRTICK feature
      and run the test with the following commands:
      
        PC#echo "HRTICK" > /sys/kernel/debug/sched_features
        PC#trace-cmd record -e sched_switch &
        PC#./schedtool -E -t 200000:500000:500000 -e ./test
      
      The test binary runs an endless while(1) loop.
      Some excerpts from trace.dat follow:
      
        <idle>-0   157.603157: sched_switch: :R ==> 2481:4294967295: test
        test-2481  157.603203: sched_switch:  2481:R ==> 0:120: swapper/2
        <idle>-0   157.605657: sched_switch:  :R ==> 2481:4294967295: test
        test-2481  157.608183: sched_switch:  2481:R ==> 2483:120: trace-cmd
        trace-cmd-2483 157.609656: sched_switch:2483:R==>2481:4294967295: test
      
      We can get the runtime of P1 from the information above:
      
        runtime = 157.608183 - 157.605657
        runtime = 0.002526(2.526ms)
      
      The correct runtime should be at most 200us per period.
      
      The problem is caused by the conditional check "delta > 10000"
      in the function start_hrtick_dl().

      No hrtimer is started to control the rest of the runtime
      when the remaining runtime is less than 10us.

      So the process will continue to run until the next tick arrives.
      
      Move the code that enforces the minimum time slice from
      hrtick_start_fair() to hrtick_start(), because the EDF
      scheduling class also needs it in start_hrtick_dl().
      
      To fix this problem, we call hrtick_start() unconditionally in
      start_hrtick_dl(), and make sure the scheduling slice won't be smaller
      than 10us in hrtick_start().
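      The new control flow can be sketched as follows. A hypothetical
      arm_hrtick() stub stands in for actually programming an hrtimer.
      start_hrtick_dl() now always arms the tick for the remaining runtime. The
      10us floor lives in hrtick_start(), so both the fair and the deadline
      classes get it.

```c
#include <assert.h>
#include <stdint.h>

#define HRTICK_MIN_DELAY_NS 10000LL /* 10us floor, as in the changelog */

static int64_t armed_delay_ns = -1; /* what the stub was last asked to arm */

/* Hypothetical stub standing in for programming the hrtick timer. */
static void arm_hrtick(int64_t delay_ns)
{
	armed_delay_ns = delay_ns;
}

/* hrtick_start(): never program a slice shorter than 10us. */
static void hrtick_start_sketch(int64_t delay_ns)
{
	if (delay_ns < HRTICK_MIN_DELAY_NS)
		delay_ns = HRTICK_MIN_DELAY_NS;
	arm_hrtick(delay_ns);
}

/* start_hrtick_dl(): arm unconditionally for the remaining runtime. */
static void start_hrtick_dl_sketch(int64_t remaining_runtime_ns)
{
	hrtick_start_sketch(remaining_runtime_ns);
}
```

      According to the analysis in this changelog, a small remaining runtime
      previously armed no timer at all, so the task ran on until the next
      periodic tick. In this sketch the timer is armed for 10us instead.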
      Signed-off-by: Xiaofeng Yan <xiaofeng.yan@huawei.com>
      Reviewed-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1409022941-5880-1-git-send-email-xiaofeng.yan@huawei.com
      [ Massaged the changelog and the code. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 28 Aug, 2014: 1 commit
    • percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t · 4ba29684
      Christoph Lameter committed
      __get_cpu_var can paper over differences in the definitions of
      cpumask_var_t and either use the address of the cpumask variable
      directly or perform a fetch of the address of the struct cpumask
      allocated elsewhere. This is important particularly when using per cpu
      cpumask_var_t declarations because in one case we have an offset into
      a per cpu area to handle and in the other case we need to fetch a
      pointer from the offset.
      
      This patch introduces a new macro
      
      this_cpu_cpumask_var_ptr()
      
      that is defined where cpumask_var_t is defined and performs the proper
      actions. All use cases where __get_cpu_var is used with cpumask_var_t
      are converted to the use of this_cpu_cpumask_var_ptr().
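      The ambiguity can be illustrated with a userspace model. A plain array
      indexed by a hypothetical cur_cpu stands in for the per-CPU area, and all
      *_sketch names and both accessor macros are illustrative only. In the
      on-stack flavour the mask lives inside the per-CPU slot, so the accessor
      must take its address. In the off-stack flavour the slot only holds a
      pointer, so the accessor must fetch it. As an assumption, the real macro
      chooses between this_cpu_ptr() and this_cpu_read() based on
      CONFIG_CPUMASK_OFFSTACK.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4

struct cpumask_sketch {
	unsigned long bits;
};

static int cur_cpu; /* hypothetical "current CPU" index */

/* Works on either flavour once a plain struct cpumask_sketch * is obtained. */
static void set_cpu_sketch(struct cpumask_sketch *m, int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(m->bits)));
	m->bits |= 1UL << cpu;
}

/* On-stack flavour: the type is an array of one mask, stored in the slot. */
typedef struct cpumask_sketch onstack_var_t[1];
static onstack_var_t onstack_mask[NR_CPUS_SKETCH];
/* Need the ADDRESS of the mask inside this CPU's slot. */
#define onstack_var_ptr(v) (&(v)[cur_cpu][0])

/* Off-stack flavour: the slot holds a pointer to a mask allocated elsewhere. */
typedef struct cpumask_sketch *offstack_var_t;
static offstack_var_t offstack_mask[NR_CPUS_SKETCH];
/* Need to FETCH the pointer stored in this CPU's slot. */
#define offstack_var_ptr(v) ((v)[cur_cpu])
```

      In both cases the caller ends up with a plain struct cpumask pointer. That
      is the point of hiding the choice behind a single macro.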
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 20 Aug, 2014: 1 commit
    • sched: Add wrapper for checking task_struct::on_rq · da0c1e65
      Kirill Tkhai committed
      Implement task_on_rq_queued() and use it everywhere instead of
      on_rq check. No functional changes.
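      A minimal sketch of such a wrapper follows. It assumes that on_rq holds
      the value 1 while the task is queued. The trimmed-down struct is a
      hypothetical stand-in for struct task_struct.

```c
#include <assert.h>
#include <stdbool.h>

#define TASK_ON_RQ_QUEUED 1

/* Hypothetical, trimmed-down stand-in for struct task_struct. */
struct task_sketch {
	int on_rq;
};

/* Callers ask "is it queued?" instead of testing the raw field. */
static inline bool task_on_rq_queued(const struct task_sketch *p)
{
	return p->on_rq == TASK_ON_RQ_QUEUED;
}
```

      One benefit of routing every check through a helper is that the meaning of
      on_rq can later be refined in a single place.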
      
      The only exception is that we do not use the wrapper in
      check_for_tasks(), because that would require exporting
      task_on_rq_queued() in global header files. The next patch in the
      series brings it back, so we do not move it back and forth here.
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 16 Jul, 2014: 2 commits
  9. 05 Jun, 2014: 4 commits
  10. 22 May, 2014: 2 commits
  11. 07 May, 2014: 1 commit
  12. 17 Apr, 2014: 1 commit
  13. 11 Mar, 2014: 1 commit
  14. 27 Feb, 2014: 2 commits
    • sched/deadline: Prevent rt_time growth to infinity · faa59937
      Juri Lelli committed
      Kirill Tkhai noted:
      
        Since deadline tasks share rt bandwidth, we must care about
        bandwidth timer set. Otherwise rt_time may grow up to infinity
        in update_curr_dl(), if there are no other available RT tasks
        on top level bandwidth.
      
      RT tasks were in fact throttled right after they got enqueued,
      and never executed again (rt_time never again went below rt_runtime).
      
      Peter then proposed to accrue DL execution on rt_time only when the
      rt timer is active, and proposed a patch (this patch is a slight
      modification of it) to implement that behavior. While this
      solves Kirill's problem, it has a drawback.
      
      Indeed, Kirill noted again:
      
        It looks like we may get into a situation where all CPU time is shared
        between RT and DL tasks:
      
        rt_runtime = n
        rt_period  = 2n
      
        | RT working, DL sleeping  | DL working, RT sleeping      |
        -----------------------------------------------------------
        | (1)     duration = n     | (2)     duration = n         | (repeat)
        |--------------------------|------------------------------|
        | (rt_bw timer is running) | (rt_bw timer is not running) |
      
        No time for fair tasks at all.
      
      While this can happen during the first period, if rq is always backlogged,
      RT tasks won't have the opportunity to execute anymore: rt_time reached
      rt_runtime during (1); suppose that after (2) an RT task is enqueued again:
      it gets throttled since the rt timer didn't fire, and from then on the
      replenishment is eaten up by DL tasks that accrue their execution on
      rt_time (while the rt timer is active, because an RT task is waiting for
      replenishment). FAIR tasks are not touched after this first period. OK,
      this is not ideal, and the situation is even worse!
      
      What is described above (the nice case) practically never happens in
      reality, where your rt timer is not aligned to task periods, tasks are in
      general not periodic, etc. Long story short, you always risk overloading
      your system.
      
      This patch is based on Peter's idea, but exploits an additional fact:
      if you don't have RT tasks enqueued, it makes little sense to continue
      incrementing rt_time once you reached the upper limit (DL tasks have their
      own mechanism for throttling).
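      The accounting rule can be sketched as follows. This is a simplified
      userspace model: the struct and the rt_timer_active flag are hypothetical
      stand-ins for the RT bandwidth state and its period timer. DL execution is
      charged to rt_time only while the RT period timer is active, or while
      rt_time is still below rt_runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified RT bandwidth state. */
struct rt_bw_sketch {
	int64_t rt_runtime;   /* allowed RT (and DL) time per period, ns */
	int64_t rt_time;      /* time consumed in the current period, ns */
	bool rt_timer_active; /* is the RT period (replenishment) timer running? */
};

/* Should DL execution be accrued on rt_time? */
static bool rt_bandwidth_account_sketch(const struct rt_bw_sketch *b)
{
	return b->rt_timer_active || b->rt_time < b->rt_runtime;
}

/* Called from the DL runtime update path with the execution delta. */
static void account_dl_on_rt_sketch(struct rt_bw_sketch *b, int64_t delta_exec)
{
	assert(delta_exec >= 0);
	if (rt_bandwidth_account_sketch(b))
		b->rt_time += delta_exec;
}
```

      When no RT task is waiting, the timer is inactive. rt_time then stops
      growing once it crosses rt_runtime, overshooting by at most one accounting
      step. This is the "slightly above rt_runtime" behaviour described in this
      changelog.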
      
      This cures both problems:
      
       - no matter how many DL instances ran in the past, you'll have an rt_time
         slightly above rt_runtime when an RT task is enqueued, and from that
         point on (after the first replenishment), the task will execute normally;

       - you can still eat up all the bandwidth during the first period, but not
         anymore after that; remember that DL execution will increment rt_time
         only until the upper limit is reached.
      
      The situation is still not perfect! But, we have a simple solution for now,
      that limits how much you can jeopardize your system, as we keep working
      towards the right answer: RT groups scheduled using deadline servers.
      Reported-by: Kirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140225151515.617714e2f2cd6c558531ba61@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Cleanup RT leftovers from {inc/dec}_dl_migration · 3908ac13
      Kirill Tkhai committed
      In the deadline class we do not have group scheduling.

      So, let's remove the unnecessary

      	X = X;

      assignments.
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Link: http://lkml.kernel.org/r/1393343543.4089.5.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 22 Feb, 2014: 4 commits
    • sched: Remove some #ifdeffery · dc877341
      Peter Zijlstra committed
      Remove a few gratuitous #ifdefs in pick_next_task*().
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-nnzddp5c4fijyzzxxrwlxghf@git.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched: Fix hotplug task migration · 3f1d2a31
      Peter Zijlstra committed
      Dan Carpenter reported:
      
      > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338)
      > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005)
      
      Kirill also spotted that migrate_tasks() will have an instant NULL
      deref because pick_next_task() will immediately deref prev.
      
      Instead of fixing all the corner cases that arise because migrate_tasks()
      can pass in a NULL prev task in the unlikely case of hot-unplug, provide
      a fake task so that we can remove all the NULL checks from the far
      more common paths.
      
      A further problem, not previously spotted, is that because we pushed
      pre_schedule() and idle_balance() into pick_next_task(), we now need to
      avoid those getting called and pulling more tasks onto our dying CPU.
      
      We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1.
      We also note that since we call pick_next_task() exactly as many
      times as there are runnable tasks, we should never land in
      idle_balance().
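      Why MAX_PRIO+1 avoids both pulls can be illustrated with the standard
      priority ranges: deadline below 0, RT below 100, normal below 140. The two
      predicates below are a hypothetical model based on an assumed reading of
      the pull conditions, not the kernel's code. The dl pull is modeled as
      happening only when prev was a deadline task. The rt pull is modeled as
      happening only when switching away from prev lowers the rq's priority. A
      fake prev with prio MAX_PRIO+1 satisfies neither condition.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DL_PRIO 0
#define MAX_RT_PRIO 100
#define MAX_PRIO    (MAX_RT_PRIO + 40)

/* Deadline tasks use negative prio values. */
static bool dl_prio_sketch(int prio)
{
	return prio < MAX_DL_PRIO;
}

/* Modeled dl pull condition: the previous task was a deadline task. */
static bool need_pull_dl_sketch(int prev_prio)
{
	return dl_prio_sketch(prev_prio);
}

/*
 * Modeled rt pull condition: switching away from prev lowers this rq's
 * priority, i.e. the best queued RT prio is numerically larger than prev's.
 * With no RT task queued, rq_highest_rt_prio is assumed to be MAX_RT_PRIO.
 */
static bool need_pull_rt_sketch(int rq_highest_rt_prio, int prev_prio)
{
	return rq_highest_rt_prio > prev_prio;
}
```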
      
      Fixes: 38033c37 ("sched: Push down pre_schedule() and idle_balance()")
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Kirill Tkhai <tkhai@yandex.ru>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched/deadline: Remove useless dl_nr_total · 995b9ea4
      Kirill Tkhai committed
      In the deadline class we do not have group scheduling like in RT.
      
      dl_nr_total is the same as dl_nr_running. So, one of them should
      be removed.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/368631392675853@web20h.yandex.ru
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched/deadline: Fix bad accounting of nr_running · 3d5f35bd
      Juri Lelli committed
      Rostedt writes:
      
      My test suite was locking up hard when enabling mmiotracer. This was due
      to the mmiotracer placing all but one CPU offline. I found this out
      when I was able to reproduce the bug with just my stress-cpu-hotplug
      test. This bug baffled me because it would not always trigger, and
      would only trigger on the first run after boot up. The
      stress-cpu-hotplug test would crash hard the first run, or never crash
      at all. But a new reboot may cause it to crash on the first run again.
      
      I spent all week bisecting this, as I couldn't find a consistent
      reproducer. I finally narrowed it down to the sched deadline patches,
      and even more peculiar, to the commit that added the sched
      deadline boot up self test to the latency tracer. Then it dawned on me
      what the bug was.
      
      All it took was to run a task under sched deadline to screw up the CPU
      hot plugging. This explained why it would lock up only on the first run
      of the stress-cpu-hotplug test. The bug happened when the boot up self
      test of the schedule latency tracer would test a deadline task. The
      deadline task would corrupt something that would cause CPU hotplug to
      fail. If it didn't corrupt it, the stress test would always work
      (there's no other sched deadline tasks that would run to cause
      problems). If it did corrupt on boot up, the first test would lockup
      hard.
      
      I proved this theory by running my deadline test program on another box,
      and then running the stress-cpu-hotplug test, and it would now consistently
      lock up. I could run stress-cpu-hotplug over and over with no problem,
      but once I ran the deadline test, the next run of the
      stress-cpu-hotplug would lock hard.
      
      After adding lots of tracing to the code, I found the cause. The
      function tracer showed that migrate_tasks() was stuck in an infinite
      loop, where rq->nr_running never equaled 1 to break out of it. When I
      added a trace_printk() to see what that number was, it was 335 and
      never decrementing!
      
      Looking at the deadline code I found:
      
      static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
      	dequeue_dl_entity(&p->dl);
      	dequeue_pushable_dl_task(rq, p);
      }
      
      static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
      	update_curr_dl(rq);
      	__dequeue_task_dl(rq, p, flags);
      
      	dec_nr_running(rq);
      }
      
      And this:
      
      	if (dl_runtime_exceeded(rq, dl_se)) {
      		__dequeue_task_dl(rq, curr, 0);
      		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
      			dl_se->dl_throttled = 1;
      		else
      			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
      
      		if (!is_leftmost(curr, &rq->dl))
      			resched_task(curr);
      	}
      
      Notice how we call __dequeue_task_dl() and in the else case we
      call enqueue_task_dl()? Also notice that __dequeue_task_dl() has
      underscores where enqueue_task_dl() does not. The enqueue_task_dl()
      calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is
      where we get nr_running out of sync.
      
      [snip]
      
      Another point where nr_running can get out of sync is when the dl_timer
      fires:
      
      	dl_se->dl_throttled = 0;
      	if (p->on_rq) {
      		enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
      		if (task_has_dl_policy(rq->curr))
      			check_preempt_curr_dl(rq, p, 0);
      		else
      			resched_task(rq->curr);
      
      This patch does two things:
      
       - correctly accounts for throttled tasks (that are now considered
         !running);
      
       - fixes the bug, updating nr_running from {inc,dec}_dl_tasks(),
         since we risk updating it twice in some situations (e.g., a
         task is dequeued while it has exceeded its budget).
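      The drift can be reproduced with a small counter model. Everything here is
      hypothetical and heavily simplified: nr_running is a plain integer, and the
      helpers stand in for the real enqueue and dequeue paths. In the buggy
      variant, only the task-level enqueue increments the counter, while the
      throttling path dequeues without decrementing it. In the fixed variant,
      the counter moves together with the entity, as the second bullet
      describes.

```c
#include <assert.h>

static int nr_running; /* models rq->nr_running */

/* Buggy model: only the task-level enqueue touches the counter. */
static void dequeue_entity_buggy(void)
{
	/* like __dequeue_task_dl(): removes the entity, no decrement */
}

static void enqueue_task_buggy(void)
{
	nr_running++; /* like enqueue_task_dl(): increments */
}

/* Budget exceeded but the timer could not be armed: dequeue, re-enqueue. */
static void replenish_path_buggy(void)
{
	dequeue_entity_buggy();
	enqueue_task_buggy();
}

/* Fixed model: the counter follows the entity enqueue/dequeue itself. */
static void dequeue_entity_fixed(void)
{
	assert(nr_running > 0);
	nr_running--;
}

static void enqueue_entity_fixed(void)
{
	nr_running++;
}

static void replenish_path_fixed(void)
{
	dequeue_entity_fixed();
	enqueue_entity_fixed();
}
```

      In the buggy model, every pass through that path leaks one count. This is
      consistent with migrate_tasks() waiting forever for an nr_running of 1.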
      
      Cc: mingo@redhat.com
      Cc: torvalds@linux-foundation.org
      Cc: akpm@linux-foundation.org
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1392884379-13744-1-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 11 Feb, 2014: 1 commit
    • sched: Push down pre_schedule() and idle_balance() · 38033c37
      Peter Zijlstra committed
      This patch merges idle_balance() and pre_schedule() and pushes
      both of them into pick_next_task().
      
      Conceptually pre_schedule() and idle_balance() are rather similar,
      both are used to pull more work onto the current CPU.
      
      We cannot however first move idle_balance() into pre_schedule_fair()
      since there is no guarantee the last runnable task is a fair task, and
      thus we would miss newidle balances.
      
      Similarly, the dl and rt pre_schedule calls must be run before
      idle_balance() since their respective tasks have higher priority and
      it would not do to delay their execution searching for less important
      tasks first.
      
      However, by noticing that pick_next_task() already traverses the
      sched_class hierarchy in the right order, we can get the right
      behaviour and do away with both calls.
      
      We must however change the special-case optimization to also require
      that prev is of fair_sched_class, otherwise we can miss doing a dl or
      rt pull where we needed one.
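      The resulting pick logic can be modeled as follows. This is a hypothetical
      userspace sketch: the class table, the per-class counters and remote_rt are
      stand-ins, and only the RT pull is modeled. The fast path goes straight to
      the fair class only when prev was fair and every runnable task is fair.
      Otherwise, each class is asked in priority order, so the RT class gets to
      pull work when prev was an RT task.

```c
#include <assert.h>

/* Hypothetical scheduling classes, highest priority first. */
enum class_id { DL, RT, FAIR, IDLE, NR_CLASSES };

struct rq_sketch {
	int nr_queued[NR_CLASSES]; /* runnable tasks per class on this CPU */
	int nr_running;            /* total runnable tasks on this CPU */
	int remote_rt;             /* RT tasks on other CPUs that could be pulled */
};

/*
 * A class's pick step. The RT class first pulls remote work when prev
 * was an RT task (the pull that used to live in pre_schedule()).
 * Returns nonzero if the class has a task to run; idle always does.
 */
static int class_pick(struct rq_sketch *rq, enum class_id c, enum class_id prev)
{
	if (c == RT && prev == RT && rq->remote_rt > 0) {
		rq->remote_rt--;
		rq->nr_queued[RT]++;
		rq->nr_running++;
	}
	return c == IDLE || rq->nr_queued[c] > 0;
}

/* Returns the class that supplies the next task. */
static enum class_id pick_next_class(struct rq_sketch *rq, enum class_id prev)
{
	/* Fast path: only if prev was fair AND every runnable task is fair. */
	if (prev == FAIR && rq->nr_running == rq->nr_queued[FAIR] &&
	    class_pick(rq, FAIR, prev))
		return FAIR;

	/* Slow path: ask each class in priority order. */
	for (int c = DL; c < NR_CLASSES; c++)
		if (class_pick(rq, (enum class_id)c, prev))
			return (enum class_id)c;

	assert(!"unreachable: the idle class always has a task");
	return IDLE;
}
```

      Suppose the prev == FAIR condition were dropped. An rq whose only local
      tasks are fair would then take the fast path right after an RT task ran,
      and the pending RT pull would be skipped.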
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/n/tip-a8k6vvaebtn64nie345kx1je@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 10 Feb, 2014: 1 commit
  18. 09 Feb, 2014: 1 commit
  19. 28 Jan, 2014: 1 commit
  20. 16 Jan, 2014: 1 commit
  21. 13 Jan, 2014: 2 commits
    • sched/deadline: Remove the sysctl_sched_dl knobs · 1724813d
      Peter Zijlstra committed
      Remove the deadline specific sysctls for now. The problem with them is
      that the interaction with the existing rt knobs is nearly impossible
      to get right.
      
      The current (as per before this patch) situation is that the rt and dl
      bandwidth is completely separate and we enforce rt+dl < 100%. This is
      undesirable because this means that the rt default of 95% leaves us
      hardly any room, even though dl tasks are safer than rt tasks.
      
      Another proposed solution (a discarded patch) was to have the dl
      bandwidth be a fraction of the rt bandwidth. This is highly
      confusing imo.
      
      Furthermore, neither proposal is consistent with the situation we
      actually want, which is rt tasks run from a dl server, in which case
      the rt bandwidth is a direct subset of dl.
      
      So whichever way we go, the introduction of dl controls at this point
      is painful. Therefore remove them and instead share the rt budget.
      
      This means that for now the rt knobs are used for dl admission control
      and the dl runtime is accounted against the rt runtime. I realise that
      this isn't entirely desirable either; but whatever we do we appear to
      need to change the interface later, so better have a small interface
      for now.
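      Using the rt budget for dl admission can be sketched as a utilization
      test. This is a simplified, single-CPU model with several assumptions:
      - Bandwidths are fixed-point fractions with 20 fractional bits, in the
        spirit of the kernel's to_ratio(). The exact representation is an
        assumption.
      - The limit comes from the rt knobs, sched_rt_runtime_us and
        sched_rt_period_us. Their defaults are 950000 and 1000000, which gives
        the 95% mentioned above.
      - A task is admitted only if the total dl bandwidth stays within that
        limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT_SKETCH 20 /* fixed-point fraction bits (assumption) */

/* runtime/period as a fixed-point fraction. */
static uint64_t to_ratio_sketch(uint64_t period, uint64_t runtime)
{
	assert(period > 0);
	return (runtime << BW_SHIFT_SKETCH) / period;
}

struct dl_bw_sketch {
	uint64_t limit;    /* derived from the rt knobs */
	uint64_t total_bw; /* sum of admitted dl bandwidths */
};

/* Try to admit a task with the given runtime/period; true on success. */
static bool dl_admit_sketch(struct dl_bw_sketch *bw, uint64_t runtime, uint64_t period)
{
	uint64_t new_bw = to_ratio_sketch(period, runtime);

	if (bw->total_bw + new_bw > bw->limit)
		return false;
	bw->total_bw += new_bw;
	return true;
}
```

      With the default knobs, two tasks with runtime 200000 and period 500000
      (40% each) are admitted, and a third one is rejected.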
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap · 6bfd6d72
      Juri Lelli committed
      Data from tests confirmed that the original active load balancing
      logic didn't scale, either in the number of CPUs or in the number of
      tasks (as sched_rt does).
      
      Here we provide a global data structure to keep track of deadlines
      of the running tasks in the system. The structure is composed of
      a bitmask showing the free CPUs and a max-heap, needed when the system
      is heavily loaded.
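      The push-side lookup can be sketched as below. This is a simplified,
      single-threaded model rather than the actual structure. The names, the
      fixed CPU count and the array-based heap are assumptions, and deadlines are
      compared with a plain < that ignores wraparound. A CPU that runs no
      deadline task is preferred. Otherwise, the heap root is the CPU with the
      latest deadline, and it is a push target only if the pushed task's
      deadline is earlier.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 8

struct heap_item {
	uint64_t dl; /* absolute deadline of the task running on cpu */
	int cpu;
};

struct cpudl_sketch {
	uint32_t free_cpus;       /* bit i set: CPU i runs no deadline task */
	struct heap_item h[NCPU]; /* max-heap on dl over busy CPUs */
	int size;
};

static void swap_items(struct heap_item *a, struct heap_item *b)
{
	struct heap_item t = *a;
	*a = *b;
	*b = t;
}

/* Mark cpu as running a deadline task with deadline dl (heap insert). */
static void cpudl_set_busy(struct cpudl_sketch *cp, int cpu, uint64_t dl)
{
	assert(cp->size < NCPU);
	cp->free_cpus &= ~(UINT32_C(1) << cpu);

	int i = cp->size++;
	cp->h[i] = (struct heap_item){ .dl = dl, .cpu = cpu };
	/* Sift up: parents hold later (larger) deadlines than children. */
	while (i > 0 && cp->h[(i - 1) / 2].dl < cp->h[i].dl) {
		swap_items(&cp->h[(i - 1) / 2], &cp->h[i]);
		i = (i - 1) / 2;
	}
}

/* Best CPU to push a task with deadline dl to, or -1 if there is none. */
static int cpudl_find_sketch(const struct cpudl_sketch *cp, uint64_t dl)
{
	for (int cpu = 0; cpu < NCPU; cpu++)
		if (cp->free_cpus & (UINT32_C(1) << cpu))
			return cpu;        /* a CPU with no deadline task wins */
	if (cp->size > 0 && dl < cp->h[0].dl)
		return cp->h[0].cpu;   /* preempt the latest deadline */
	return -1;
}
```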
      
      The implementation and concurrent access scheme are kept simple by
      design. However, our measurements show that we can compete with sched_rt
      on large multi-CPU machines [1].
      
      Only the push path is addressed; the extension to use this structure
      also for pull decisions is straightforward. However, we are currently
      evaluating different data structures (in order to decrease/avoid
      contention) to possibly solve both problems. We are also going to re-run
      tests considering recent changes inside cpupri [2].
      
       [1] http://retis.sssup.it/~jlelli/papers/Ospert11Lelli.pdf
       [2] http://www.spinics.net/lists/linux-rt-users/msg06778.html
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-14-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>