1. 16 Nov 2014, 4 commits
    • sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK · 36ce9881
      Committed by Wanpeng Li
      Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK to align with
      the fair class.
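
      A minimal sketch of the resulting shape (assuming the usual kernel
      pattern of pairing the real helper with an empty stub, as fair.c does
      for hrtick; not the exact diff):

      	#ifdef CONFIG_SCHED_HRTICK
      	static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
      	{
      		hrtick_start(rq, p->dl.runtime);
      	}
      	#else /* !CONFIG_SCHED_HRTICK */
      	static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
      	{
      		/* Empty stub: callers stay free of #ifdef clutter. */
      	}
      	#endif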
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1415670747-58726-1-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task() · c51b8ab5
      Committed by Wanpeng Li
      Do not call dequeue_pushable_dl_task() when failing to push an eligible
      task, as it remains pushable, merely not at this particular moment.
      
      This is the same behavior as commit 311e800e ("sched, rt: Fix
      rq->rt.pushable_tasks bug in push_rt_task()") on the -rt side.
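
      A sketch of the retry path in push_dl_task() after the fix (assuming it
      mirrors its rt.c counterpart; not the exact diff):

      	task = pick_next_pushable_dl_task(rq);
      	if (task == next_task) {
      		/*
      		 * The task is still there. We don't try again: some other
      		 * CPU will pull it when ready. Crucially, we no longer call
      		 * dequeue_pushable_dl_task() here - the task stays pushable.
      		 */
      		goto out;
      	}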
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1415258564-8573-1-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Move p->nr_cpus_allowed check to select_task_rq() · 6c1d9410
      Committed by Wanpeng Li
      Move the p->nr_cpus_allowed check into kernel/sched/core.c: select_task_rq().
      This change will make fair.c, rt.c, and deadline.c all start with the
      same logic.
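
      A sketch of the consolidated check (hypothetical shape of the core.c
      helper; not the exact diff):

      	/* kernel/sched/core.c */
      	static inline
      	int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
      	{
      		/* Pinned tasks never need a class-specific CPU search. */
      		if (p->nr_cpus_allowed > 1)
      			cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);

      		/* ... fall back to a CPU allowed by p->cpus_allowed ... */
      		return cpu;
      	}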
      Suggested-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "pang.xunlei" <pang.xunlei@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1415150077-59053-1-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency · 6e998916
      Committed by Stanislaw Gruszka
      Commit d670ec13 ("posix-cpu-timers: Cure SMP wobbles") fixes one glibc
      test case at the cost of breaking another one. After that commit,
      calling clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y)
      can result in Y being smaller than X.
      
      A reproducer/tester can be found further below; it can be compiled and
      run with:
      
      	gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread
      	while ./tst-cpuclock2 ; do : ; done
      
      This reproducer, when running on a buggy kernel, will complain
      about "clock_gettime difference too small".
      
      The issue happens because, on start, thread_group_cputimer() initializes
      the cputimer's sum_exec_runtime with runtime the threads have not yet
      had accounted, and then that runtime is added to the running cputimer
      again on the scheduler tick, making its sum_exec_runtime bigger than the
      threads' actual runtime.
      
      KOSAKI Motohiro posted a fix for this problem, but that patch was never
      applied: https://lkml.org/lkml/2013/5/26/191 .
      
      This patch takes a different approach to cure the problem. It calls
      update_curr() when the cputimer starts, which assures we have up-to-date
      stats for the running threads, so on the next scheduler tick we account
      only the runtime that elapsed since the cputimer started. It also
      assures a consistent state between the CPU times of the individual
      threads and the CPU time of the process composed of those threads.
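
      Conceptually, the fix looks like this (a hedged sketch assuming a
      per-class update_curr() hook; not the exact upstream diff):

      	unsigned long long task_sched_runtime(struct task_struct *p)
      	{
      		unsigned long flags;
      		struct rq *rq;
      		u64 ns;

      		rq = task_rq_lock(p, &flags);
      		/*
      		 * Fold in the runtime elapsed since the last tick, so the
      		 * sum taken when the cputimer starts can never exceed a
      		 * later per-thread clock_gettime() reading.
      		 */
      		if (task_current(rq, p) && p->on_rq) {
      			update_rq_clock(rq);
      			p->sched_class->update_curr(rq);
      		}
      		ns = p->se.sum_exec_runtime;
      		task_rq_unlock(rq, p, &flags);

      		return ns;
      	}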
      
      Full reproducer (tst-cpuclock2.c):
      
      	#define _GNU_SOURCE
      	#include <unistd.h>
      	#include <sys/syscall.h>
      	#include <stdio.h>
      	#include <time.h>
      	#include <pthread.h>
      	#include <stdint.h>
      	#include <inttypes.h>
      
      	/* Parameters for the Linux kernel ABI for CPU clocks.  */
      	#define CPUCLOCK_SCHED          2
      	#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
      		((~(clockid_t) (pid) << 3) | (clockid_t) (clock))
      
      	static pthread_barrier_t barrier;
      
      	/* Help advance the clock.  */
      	static void *chew_cpu(void *arg)
      	{
      		pthread_barrier_wait(&barrier);
      		while (1) ;
      
      		return NULL;
      	}
      
      	/* Don't use the glibc wrapper.  */
      	static int do_nanosleep(int flags, const struct timespec *req)
      	{
      		clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED);
      
      		return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL);
      	}
      
      	static int64_t tsdiff(const struct timespec *before, const struct timespec *after)
      	{
      		int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec;
      		int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec;
      
      		return after_i - before_i;
      	}
      
      	int main(void)
      	{
      		int result = 0;
      		pthread_t th;
      
      		pthread_barrier_init(&barrier, NULL, 2);
      
      		if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) {
      			perror("pthread_create");
      			return 1;
      		}
      
      		pthread_barrier_wait(&barrier);
      
      		/* The test.  */
      		struct timespec before, after, sleeptimeabs;
      		int64_t sleepdiff, diffabs;
      		const struct timespec sleeptime = { .tv_sec = 0, .tv_nsec = 100000000 };
      
      		/* The relative nanosleep.  Not sure why this is needed, but its presence
      		   seems to make it easier to reproduce the problem.  */
      		if (do_nanosleep(0, &sleeptime) != 0) {
      			perror("clock_nanosleep");
      			return 1;
      		}
      
      		/* Get the current time.  */
      		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) {
      			perror("clock_gettime[2]");
      			return 1;
      		}
      
      		/* Compute the absolute sleep time based on the current time.  */
      		uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec;
      		sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000;
      		sleeptimeabs.tv_nsec = nsec % 1000000000;
      
      		/* Sleep for the computed time.  */
      		if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) {
      			perror("absolute clock_nanosleep");
      			return 1;
      		}
      
      		/* Get the time after the sleep.  */
      		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) {
      			perror("clock_gettime[3]");
      			return 1;
      		}
      
      		/* The time after sleep should always be equal to or after the absolute sleep
      		   time passed to clock_nanosleep.  */
      		sleepdiff = tsdiff(&sleeptimeabs, &after);
      		if (sleepdiff < 0) {
      			printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff);
      			result = 1;
      
      			printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec);
      			printf("After  %llu.%09llu\n", after.tv_sec, after.tv_nsec);
      			printf("Sleep  %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec);
      		}
      
      		/* The difference between the timestamps taken before and after the
      		   clock_nanosleep call should be equal to or more than the duration of the
      		   sleep.  */
      		diffabs = tsdiff(&before, &after);
      		if (diffabs < sleeptime.tv_nsec) {
      			printf("clock_gettime difference too small: %" PRId64 "\n", diffabs);
      			result = 1;
      		}
      
      		pthread_cancel(th);
      
      		return result;
      	}
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 04 Nov 2014, 6 commits
  3. 28 Oct 2014, 7 commits
  4. 24 Sep 2014, 2 commits
  5. 19 Sep 2014, 1 commit
  6. 07 Sep 2014, 1 commit
    • sched/deadline: Fix a precision problem in the microseconds range · 177ef2a6
      Committed by xiaofeng.yan
      An overrun could happen in function start_hrtick_dl()
      when a task with SCHED_DEADLINE runs in the microseconds
      range.
      
      For example, if a task with SCHED_DEADLINE has the following parameters:
      
        Task  runtime  deadline  period
         P1   200us     500us    500us
      
      The deadline and period from task P1 are less than 1ms.
      
      In order to achieve microsecond precision, we need to enable the HRTICK
      feature with the following commands:
      
        PC#echo "HRTICK" > /sys/kernel/debug/sched_features
        PC#trace-cmd record -e sched_switch &
        PC#./schedtool -E -t 200000:500000:500000 -e ./test
      
      The binary test is in an endless while(1) loop here.
      Some pieces of trace.dat are as follows:
      
        <idle>-0   157.603157: sched_switch: :R ==> 2481:4294967295: test
        test-2481  157.603203: sched_switch:  2481:R ==> 0:120: swapper/2
        <idle>-0   157.605657: sched_switch:  :R ==> 2481:4294967295: test
        test-2481  157.608183: sched_switch:  2481:R ==> 2483:120: trace-cmd
        trace-cmd-2483 157.609656: sched_switch:2483:R==>2481:4294967295: test
      
      We can get the runtime of P1 from the information above:
      
        runtime = 157.608183 - 157.605657
        runtime = 0.002526(2.526ms)
      
      The correct runtime should be less than or equal to 200us at some point.
      
      The problem is caused by the conditional check "delta > 10000" in
      start_hrtick_dl(): no hrtimer is started to bound the remaining runtime
      when that remainder is less than 10us, so the task keeps running until
      the next tick period arrives.
      
      Move the code enforcing the minimum time slice from hrtick_start_fair()
      to hrtick_start(), because the EDF scheduling class also needs it in
      start_hrtick_dl().
      
      To fix this problem, we call hrtick_start() unconditionally in
      start_hrtick_dl(), and make sure the scheduling slice won't be smaller
      than 10us in hrtick_start().
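
      A sketch of the clamp after the move (simplified to the UP flavour of
      hrtick_start(); not the exact diff):

      	void hrtick_start(struct rq *rq, u64 delay)
      	{
      		/*
      		 * Don't schedule slices shorter than 10000ns, that just
      		 * doesn't make sense and can cause timer DoS.
      		 */
      		delay = max_t(u64, delay, 10000LL);
      		hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay),
      			      HRTIMER_MODE_REL_PINNED);
      	}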
      Signed-off-by: Xiaofeng Yan <xiaofeng.yan@huawei.com>
      Reviewed-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1409022941-5880-1-git-send-email-xiaofeng.yan@huawei.com
      [ Massaged the changelog and the code. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 28 Aug 2014, 1 commit
    • percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t · 4ba29684
      Committed by Christoph Lameter
      __get_cpu_var can paper over differences in the definitions of
      cpumask_var_t and either use the address of the cpumask variable
      directly or perform a fetch of the address of the struct cpumask
      allocated elsewhere. This is particularly important for per-CPU
      cpumask_var_t declarations, because in one case we have an offset into
      a per-CPU area to handle, and in the other we need to fetch a pointer
      from the offset.
      
      This patch introduces a new macro
      
      this_cpu_cpumask_var_ptr()
      
      that is defined where cpumask_var_t is defined and performs the proper
      actions. All use cases where __get_cpu_var is used with cpumask_var_t
      are converted to the use of this_cpu_cpumask_var_ptr().
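
      A sketch of the two definitions (one per cpumask_var_t flavour; not the
      exact diff):

      	#ifdef CONFIG_CPUMASK_OFFSTACK
      	/* cpumask_var_t is a struct cpumask *: read the stored pointer. */
      	#define this_cpu_cpumask_var_ptr(x)	this_cpu_read(x)
      	#else
      	/* cpumask_var_t is a struct cpumask[1]: take its per-cpu address. */
      	#define this_cpu_cpumask_var_ptr(x)	this_cpu_ptr(x)
      	#endif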
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  8. 20 Aug 2014, 1 commit
    • sched: Add wrapper for checking task_struct::on_rq · da0c1e65
      Committed by Kirill Tkhai
      Implement task_on_rq_queued() and use it everywhere instead of the raw
      on_rq check. No functional changes.
      
      The only exception is that we do not use the wrapper in
      check_for_tasks(), because that would require exporting
      task_on_rq_queued() in global header files. The next patch in the
      series brings it back, so we do not shuffle it from here to there.
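
      A sketch of the wrapper (assuming a TASK_ON_RQ_QUEUED state constant;
      not the exact diff):

      	/* task_struct::on_rq states: */
      	#define TASK_ON_RQ_QUEUED	1

      	static inline int task_on_rq_queued(struct task_struct *p)
      	{
      		return p->on_rq == TASK_ON_RQ_QUEUED;
      	}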
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 16 Jul 2014, 2 commits
  10. 05 Jun 2014, 4 commits
  11. 22 May 2014, 2 commits
  12. 07 May 2014, 1 commit
  13. 17 Apr 2014, 1 commit
  14. 11 Mar 2014, 1 commit
  15. 27 Feb 2014, 2 commits
    • sched/deadline: Prevent rt_time growth to infinity · faa59937
      Committed by Juri Lelli
      Kirill Tkhai noted:
      
        Since deadline tasks share rt bandwidth, we must care about
        bandwidth timer set. Otherwise rt_time may grow up to infinity
        in update_curr_dl(), if there are no other available RT tasks
        on top level bandwidth.
      
      RT tasks were in fact throttled right after they got enqueued, and
      never executed again (rt_time never again went below rt_runtime).
      
      Peter then proposed to accrue DL execution on rt_time only when the
      rt timer is active, and proposed a patch (this patch is a slight
      modification of that) to implement that behavior. While this solves
      Kirill's problem, it has a drawback.
      
      Indeed, Kirill noted again:
      
        It looks we may get into a situation, when all CPU time is shared
        between RT and DL tasks:
      
        rt_runtime = n
        rt_period  = 2n
      
        | RT working, DL sleeping  | DL working, RT sleeping      |
        -----------------------------------------------------------
        | (1)     duration = n     | (2)     duration = n         | (repeat)
        |--------------------------|------------------------------|
        | (rt_bw timer is running) | (rt_bw timer is not running) |
      
        No time for fair tasks at all.
      
      While this can happen during the first period, if rq is always backlogged,
      RT tasks won't have the opportunity to execute anymore: rt_time reached
      rt_runtime during (1), suppose after (2) RT is enqueued back, it gets
      throttled since rt timer didn't fire, replenishment is from now on eaten up
      by DL tasks that accrue their execution on rt_time (while rt timer is
      active - we have an RT task waiting for replenishment). FAIR tasks are
      not touched after this first period. Ok, this is not ideal, and the situation
      is even worse!
      
      The above (the nice case) practically never happens in reality, where
      the rt timer is not aligned to task periods, tasks are in general not
      periodic, etc. Long story short, you always risk overloading your
      system.
      
      This patch is based on Peter's idea, but exploits an additional fact:
      if you don't have RT tasks enqueued, it makes little sense to continue
      incrementing rt_time once you reached the upper limit (DL tasks have their
      own mechanism for throttling).
      
      This cures both problems:
      
       - no matter how many DL instances in the past, you'll have an rt_time
         slightly above rt_runtime when an RT task is enqueued, and from that
         point on (after the first replenishment), the task will normally execute;
      
       - you can still eat up all bandwidth during the first period, but not
         anymore after that, remember that DL execution will increment rt_time
         till the upper limit is reached.
      
      The situation is still not perfect! But, we have a simple solution for now,
      that limits how much you can jeopardize your system, as we keep working
      towards the right answer: RT groups scheduled using deadline servers.
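
      A sketch of the guarded accounting in update_curr_dl() (assuming a
      sched_rt_bandwidth_account() helper; not the exact diff):

      	if (rt_bandwidth_enabled()) {
      		struct rt_rq *rt_rq = &rq->rt;

      		raw_spin_lock(&rt_rq->rt_runtime_lock);
      		/*
      		 * Charge DL execution to rt_time only while the rt_bw timer
      		 * is active or rt_time is still below rt_runtime; once the
      		 * upper limit is hit with no RT task waiting, stop growing.
      		 */
      		if (sched_rt_bandwidth_account(rt_rq))
      			rt_rq->rt_time += delta_exec;
      		raw_spin_unlock(&rt_rq->rt_runtime_lock);
      	}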
      Reported-by: Kirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140225151515.617714e2f2cd6c558531ba61@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Cleanup RT leftovers from {inc/dec}_dl_migration · 3908ac13
      Committed by Kirill Tkhai
      In the deadline class we do not have group scheduling.
      
      So, let's remove the unnecessary
      
      	X = X;
      
      assignments.
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Link: http://lkml.kernel.org/r/1393343543.4089.5.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 22 Feb 2014, 4 commits
    • sched: Remove some #ifdeffery · dc877341
      Committed by Peter Zijlstra
      Remove a few gratuitous #ifdefs in pick_next_task*().
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-nnzddp5c4fijyzzxxrwlxghf@git.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched: Fix hotplug task migration · 3f1d2a31
      Committed by Peter Zijlstra
      Dan Carpenter reported:
      
      > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338)
      > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005)
      
      Kirill also spotted that migrate_tasks() will have an instant NULL
      deref because pick_next_task() will immediately deref prev.
      
      Instead of fixing all the corner cases because migrate_tasks() can
      pass in a NULL prev task in the unlikely case of hot-un-plug, provide
      a fake task such that we can remove all the NULL checks from the far
      more common paths.
      
      A further problem; not previously spotted; is that because we pushed
      pre_schedule() and idle_balance() into pick_next_task() we now need to
      avoid those getting called and pulling more tasks on our dying CPU.
      
      We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1.
      We also note that since we call pick_next_task() exactly the amount of
      times we have runnable tasks present, we should never land in
      idle_balance().
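
      A sketch of the fake switch frame (hypothetical shape in core.c; not
      the exact diff):

      	static void put_prev_task_fake(struct rq *rq, struct task_struct *prev)
      	{
      	}

      	static const struct sched_class fake_sched_class = {
      		.put_prev_task = put_prev_task_fake,
      	};

      	static struct task_struct fake_task = {
      		/* MAX_PRIO + 1 is outside the RT/DL range: no pulling. */
      		.prio		= MAX_PRIO + 1,
      		.sched_class	= &fake_sched_class,
      	};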
      
      Fixes: 38033c37 ("sched: Push down pre_schedule() and idle_balance()")
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Kirill Tkhai <tkhai@yandex.ru>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched/deadline: Remove useless dl_nr_total · 995b9ea4
      Committed by Kirill Tkhai
      In deadline class we do not have group scheduling like in RT.
      
      dl_nr_total is the same as dl_nr_running. So, one of them should
      be removed.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/368631392675853@web20h.yandex.ru
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • sched/deadline: Fix bad accounting of nr_running · 3d5f35bd
      Committed by Juri Lelli
      Rostedt writes:
      
      My test suite was locking up hard when enabling mmiotracer. This was due
      to the mmiotracer placing all but one CPU offline. I found this out
      when I was able to reproduce the bug with just my stress-cpu-hotplug
      test. This bug baffled me because it would not always trigger, and
      would only trigger on the first run after boot up. The
      stress-cpu-hotplug test would crash hard the first run, or never crash
      at all. But a new reboot may cause it to crash on the first run again.
      
      I spent all week bisecting this, as I couldn't find a consistent
      reproducer. I finally narrowed it down to the sched deadline patches,
      and even more peculiar, to the commit that added the sched
      deadline boot up self test to the latency tracer. Then it dawned on me
      to what the bug was.
      
      All it took was to run a task under sched deadline to screw up the CPU
      hot plugging. This explained why it would lock up only on the first run
      of the stress-cpu-hotplug test. The bug happened when the boot up self
      test of the schedule latency tracer would test a deadline task. The
      deadline task would corrupt something that would cause CPU hotplug to
      fail. If it didn't corrupt it, the stress test would always work
      (there's no other sched deadline tasks that would run to cause
      problems). If it did corrupt on boot up, the first test would lockup
      hard.
      
      I proved this theory by running my deadline test program on another box,
      and then run the stress-cpu-hotplug test, and it would now consistently
      lock up. I could run stress-cpu-hotplug over and over with no problem,
      but once I ran the deadline test, the next run of the
      stress-cpu-hotplug would lock hard.
      
      After adding lots of tracing to the code, I found the cause. The
      function tracer showed that migrate_tasks() was stuck in an infinite
      loop, where rq->nr_running never equaled 1 to break out of it. When I
      added a trace_printk() to see what that number was, it was 335 and
      never decrementing!
      
      Looking at the deadline code I found:
      
      static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
      {
      	dequeue_dl_entity(&p->dl);
      	dequeue_pushable_dl_task(rq, p);
      }
      
      static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
      {
      	update_curr_dl(rq);
      	__dequeue_task_dl(rq, p, flags);
      
      	dec_nr_running(rq);
      }
      
      And this:
      
      	if (dl_runtime_exceeded(rq, dl_se)) {
      		__dequeue_task_dl(rq, curr, 0);
      		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
      			dl_se->dl_throttled = 1;
      		else
      			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
      
      		if (!is_leftmost(curr, &rq->dl))
      			resched_task(curr);
      	}
      
      Notice how we call __dequeue_task_dl() and in the else case we
      call enqueue_task_dl()? Also notice that dequeue_task_dl() has
      underscores where enqueue_task_dl() does not. The enqueue_task_dl()
      calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is
      where we get nr_running out of sync.
      
      [snip]
      
      Another point where nr_running can get out of sync is when the dl_timer
      fires:
      
      	dl_se->dl_throttled = 0;
      	if (p->on_rq) {
      		enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
      		if (task_has_dl_policy(rq->curr))
      			check_preempt_curr_dl(rq, p, 0);
      		else
      			resched_task(rq->curr);
      
      This patch does two things:
      
       - correctly accounts for throttled tasks (that are now considered
         !running);
      
       - fixes the bug by updating nr_running from {inc,dec}_dl_tasks(),
         since we otherwise risk updating it twice in some situations (e.g., a
         task is dequeued while it has exceeded its budget); see the sketch
         below.
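
      A sketch of the second point (accounting moved into the common
      {inc,dec}_dl_tasks() helpers; not the exact diff):

      	static inline
      	void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
      	{
      		dl_rq->dl_nr_running++;
      		inc_nr_running(rq_of_dl_rq(dl_rq));
      		/* ... migration/pushable bookkeeping ... */
      	}

      	static inline
      	void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
      	{
      		dl_rq->dl_nr_running--;
      		dec_nr_running(rq_of_dl_rq(dl_rq));
      		/* ... migration/pushable bookkeeping ... */
      	}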
      
      Cc: mingo@redhat.com
      Cc: torvalds@linux-foundation.org
      Cc: akpm@linux-foundation.org
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1392884379-13744-1-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>