1. 02 Apr, 2015 1 commit
  2. 27 Mar, 2015 4 commits
  3. 23 Mar, 2015 1 commit
    • sched: Fix RLIMIT_RTTIME when PI-boosting to RT · 746db944
      Brian Silverman authored
      When non-realtime tasks get priority-inheritance boosted to a realtime
      scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
      counter used for checking this (the same one used for SCHED_RR
      timeslices) was not getting reset. This meant that tasks running in a
      non-realtime scheduling class which were repeatedly boosted to a realtime
      one, but never blocked while running realtime, eventually hit the
      timeout without ever continuously running over the limit. This patch
      resets the realtime timeslice counter when un-PI-boosting from an RT to
      a non-RT scheduling class.
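
      A hedged sketch of the shape of the fix (modeled on the de-boost path in
      rt_mutex_setprio() in kernel/sched/core.c of that era; 'p' and 'oldprio'
      are the surrounding function's task and previous priority):

      	/* When dropping a PI boost back to a non-RT class, clear the
      	 * RT timeslice counter that RLIMIT_RTTIME is checked against,
      	 * so the next boost starts counting from zero again. */
      	if (rt_prio(oldprio))
      		p->rt.timeout = 0;
      	p->sched_class = &fair_sched_class;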
      
      I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
      mutex; it induces priority boosting and spins while boosted, and it gets
      killed by SIGXCPU on unfixed kernels but not with this patch applied. The
      kill happens much faster with a CONFIG_PREEMPT_RT kernel, and does happen
      eventually with PREEMPT_VOLUNTARY kernels.
      Signed-off-by: Brian Silverman <brian@peloton-tech.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: austin@peloton-tech.com
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      746db944
  4. 19 Feb, 2015 1 commit
  5. 18 Feb, 2015 6 commits
  6. 14 Feb, 2015 1 commit
  7. 04 Feb, 2015 4 commits
  8. 02 Feb, 2015 1 commit
    • sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Linus Torvalds authored
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can reset the thread state to TASK_RUNNING,
      which then causes the outer sleep loop to not actually sleep when it
      calls schedule().
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all: now the nested sleep doesn't just
      sometimes cause the outer one to not block, it causes that to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep).
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions), instead use '->task_state_change'
         that is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
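
      A sketch of the second part, assuming the 3.19-era helper (this matches
      the description above: only the debug bookkeeping field is touched,
      never the task state itself):

      	/* include/linux/kernel.h: annotate a knowingly-nested sleep.
      	 * Clearing ->task_state_change silences the debug check while
      	 * leaving ->state alone, so the outer wait loop still blocks. */
      	#define sched_annotate_sleep()	(current->task_state_change = 0)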
      Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
      Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  9. 31 Jan, 2015 1 commit
  10. 28 Jan, 2015 1 commit
  11. 14 Jan, 2015 6 commits
  12. 23 Dec, 2014 1 commit
  13. 11 Dec, 2014 1 commit
  14. 08 Dec, 2014 1 commit
  15. 04 Dec, 2014 1 commit
  16. 16 Nov, 2014 4 commits
    • sched: Move p->nr_cpus_allowed check to select_task_rq() · 6c1d9410
      Wanpeng Li authored
      Move the p->nr_cpus_allowed check into kernel/sched/core.c: select_task_rq().
      This change will make fair.c, rt.c, and deadline.c all start with the
      same logic.
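
      Roughly, the consolidated entry point then looks like this sketch
      (hedged; argument names follow the 3.18-era signature in
      kernel/sched/core.c):

      	/* One common gate for all scheduling classes: only ask the
      	 * class for a target CPU if the task may run on more than one. */
      	static inline int
      	select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
      	{
      		if (p->nr_cpus_allowed > 1)
      			cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);

      		/* ...fallback checks (allowed and online CPU) follow... */
      		return cpu;
      	}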
      Suggested-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "pang.xunlei" <pang.xunlei@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1415150077-59053-1-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6c1d9410
    • sched/fair: Kill task_struct::numa_entry and numa_group::task_list · 75389918
      Kirill Tkhai authored
      Nobody iterates over numa_group::task_list; it just confuses readers.
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1415358456.28592.17.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      75389918
    • sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency · 6e998916
      Stanislaw Gruszka authored
      Commit d670ec13 "posix-cpu-timers: Cure SMP wobbles" fixes one glibc
      test case at the cost of breaking another one. After that commit, calling
      clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result
      in time Y being smaller than time X.
      
      A reproducer/tester can be found further below; it can be compiled and run by:
      
      	gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread
      	while ./tst-cpuclock2 ; do : ; done
      
      This reproducer, when running on a buggy kernel, will complain
      about "clock_gettime difference too small".
      
      The issue happens because, on start, thread_group_cputimer() initializes
      the cputimer's sum_exec_runtime with thread runtime that has not yet been
      accounted, and then the scheduler tick adds the threads' runtime to the
      running cputimer again, making its sum_exec_runtime bigger than the
      actual threads' runtime.
      
      KOSAKI Motohiro posted a fix for this problem, but that patch was never
      applied: https://lkml.org/lkml/2013/5/26/191 .
      
      This patch takes a different approach to cure the problem. It calls
      update_curr() when the cputimer starts, which ensures we will have
      updated stats for the running threads, and on the next scheduler tick we
      will account only the runtime that elapsed since the cputimer started.
      That also ensures the CPU times of individual threads are consistent
      with the CPU time of the process consisting of those threads.
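
      A hedged sketch of that idea, modeled on the task_sched_runtime() path
      touched by this patch (helper names per the 3.18-era scheduler code):

      	/* If p is running right now, fold its not-yet-accounted delta
      	 * into p->se.sum_exec_runtime before the value is sampled, so
      	 * the cputimer and the per-thread clocks agree. */
      	rq = task_rq_lock(p, &flags);
      	if (task_current(rq, p) && task_on_rq_queued(p)) {
      		update_rq_clock(rq);
      		p->sched_class->update_curr(rq);
      	}
      	ns = p->se.sum_exec_runtime;
      	task_rq_unlock(rq, p, &flags);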
      
      Full reproducer (tst-cpuclock2.c):
      
      	#define _GNU_SOURCE
      	#include <unistd.h>
      	#include <sys/syscall.h>
      	#include <stdio.h>
      	#include <time.h>
      	#include <pthread.h>
      	#include <stdint.h>
      	#include <inttypes.h>
      
      	/* Parameters for the Linux kernel ABI for CPU clocks.  */
      	#define CPUCLOCK_SCHED          2
      	#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
      		((~(clockid_t) (pid) << 3) | (clockid_t) (clock))
      
      	static pthread_barrier_t barrier;
      
      	/* Help advance the clock.  */
      	static void *chew_cpu(void *arg)
      	{
      		pthread_barrier_wait(&barrier);
      		while (1) ;
      
      		return NULL;
      	}
      
      	/* Don't use the glibc wrapper.  */
      	static int do_nanosleep(int flags, const struct timespec *req)
      	{
      		clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED);
      
      		return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL);
      	}
      
      	static int64_t tsdiff(const struct timespec *before, const struct timespec *after)
      	{
      		int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec;
      		int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec;
      
      		return after_i - before_i;
      	}
      
      	int main(void)
      	{
      		int result = 0;
      		pthread_t th;
      
      		pthread_barrier_init(&barrier, NULL, 2);
      
      		if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) {
      			perror("pthread_create");
      			return 1;
      		}
      
      		pthread_barrier_wait(&barrier);
      
      		/* The test.  */
      		struct timespec before, after, sleeptimeabs;
      		int64_t sleepdiff, diffabs;
      		const struct timespec sleeptime = { .tv_sec = 0, .tv_nsec = 100000000 };
      
      		/* The relative nanosleep.  Not sure why this is needed, but its presence
      		   seems to make it easier to reproduce the problem.  */
      		if (do_nanosleep(0, &sleeptime) != 0) {
      			perror("clock_nanosleep");
      			return 1;
      		}
      
      		/* Get the current time.  */
      		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) {
      			perror("clock_gettime[2]");
      			return 1;
      		}
      
      		/* Compute the absolute sleep time based on the current time.  */
      		uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec;
      		sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000;
      		sleeptimeabs.tv_nsec = nsec % 1000000000;
      
      		/* Sleep for the computed time.  */
      		if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) {
      			perror("absolute clock_nanosleep");
      			return 1;
      		}
      
      		/* Get the time after the sleep.  */
      		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) {
      			perror("clock_gettime[3]");
      			return 1;
      		}
      
      		/* The time after sleep should always be equal to or after the absolute sleep
      		   time passed to clock_nanosleep.  */
      		sleepdiff = tsdiff(&sleeptimeabs, &after);
      		if (sleepdiff < 0) {
      			printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff);
      			result = 1;
      
      			printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec);
      			printf("After  %llu.%09llu\n", after.tv_sec, after.tv_nsec);
      			printf("Sleep  %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec);
      		}
      
      		/* The difference between the timestamps taken before and after the
      		   clock_nanosleep call should be equal to or more than the duration of the
      		   sleep.  */
      		diffabs = tsdiff(&before, &after);
      		if (diffabs < sleeptime.tv_nsec) {
      			printf("clock_gettime difference too small: %" PRId64 "\n", diffabs);
      			result = 1;
      		}
      
      		pthread_cancel(th);
      
      		return result;
      	}
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6e998916
    • sched/cputime: Fix cpu_timer_sample_group() double accounting · 23cfa361
      Peter Zijlstra authored
      While looking over the cpu-timer code I found that we appear to add
      the delta for the calling task twice, through:
      
        cpu_timer_sample_group()
          thread_group_cputimer()
            thread_group_cputime()
              times->sum_exec_runtime += task_sched_runtime();
      
          *sample = cputime.sum_exec_runtime + task_delta_exec();
      
      This would make the sample run ahead, cutting the sleep short.
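
      The fix, as a sketch (the CPUCLOCK_SCHED case of cpu_timer_sample_group()
      in kernel/posix-cpu-timers.c stops adding the caller's delta a second
      time):

      	case CPUCLOCK_SCHED:
      		/* thread_group_cputime() above already folded the
      		 * caller's pending runtime into sum_exec_runtime via
      		 * task_sched_runtime(), so don't add it again: */
      		*sample = cputime.sum_exec_runtime;  /* was: + task_delta_exec(p) */
      		break;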
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      23cfa361
  17. 10 Nov, 2014 1 commit
    • sched/numa: Fix out of bounds read in sched_init_numa() · c123588b
      Andrey Ryabinin authored
      On the latest mm + KASan patchset I've got this:
      
          ==================================================================
          BUG: AddressSanitizer: out of bounds access in sched_init_smp+0x3ba/0x62c at addr ffff88006d4bee6c
          =============================================================================
          BUG kmalloc-8 (Not tainted): kasan error
          -----------------------------------------------------------------------------
      
          Disabling lock debugging due to kernel taint
          INFO: Allocated in alloc_vfsmnt+0xb0/0x2c0 age=75 cpu=0 pid=0
           __slab_alloc+0x4b4/0x4f0
           __kmalloc_track_caller+0x15f/0x1e0
           kstrdup+0x44/0x90
           alloc_vfsmnt+0xb0/0x2c0
           vfs_kern_mount+0x35/0x190
           kern_mount_data+0x25/0x50
           pid_ns_prepare_proc+0x19/0x50
           alloc_pid+0x5e2/0x630
           copy_process.part.41+0xdf5/0x2aa0
           do_fork+0xf5/0x460
           kernel_thread+0x21/0x30
           rest_init+0x1e/0x90
           start_kernel+0x522/0x531
           x86_64_start_reservations+0x2a/0x2c
           x86_64_start_kernel+0x15b/0x16a
          INFO: Slab 0xffffea0001b52f80 objects=24 used=22 fp=0xffff88006d4befc0 flags=0x100000000004080
          INFO: Object 0xffff88006d4bed20 @offset=3360 fp=0xffff88006d4bee70
      
          Bytes b4 ffff88006d4bed10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
          Object ffff88006d4bed20: 70 72 6f 63 00 6b 6b a5                          proc.kk.
          Redzone ffff88006d4bed28: cc cc cc cc cc cc cc cc                          ........
          Padding ffff88006d4bee68: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
          CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B          3.18.0-rc3-mm1+ #108
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
           ffff88006d4be000 0000000000000000 ffff88006d4bed20 ffff88006c86fd18
           ffffffff81cd0a59 0000000000000058 ffff88006d404240 ffff88006c86fd48
           ffffffff811fa3a8 ffff88006d404240 ffffea0001b52f80 ffff88006d4bed20
          Call Trace:
          dump_stack (lib/dump_stack.c:52)
          print_trailer (mm/slub.c:645)
          object_err (mm/slub.c:652)
          ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
          kasan_report_error (mm/kasan/report.c:102 mm/kasan/report.c:178)
          ? kasan_poison_shadow (mm/kasan/kasan.c:48)
          ? kasan_unpoison_shadow (mm/kasan/kasan.c:54)
          ? kasan_poison_shadow (mm/kasan/kasan.c:48)
          ? kasan_kmalloc (mm/kasan/kasan.c:311)
          __asan_load4 (mm/kasan/kasan.c:371)
          ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
          sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
          kernel_init_freeable (init/main.c:869 init/main.c:997)
          ? finish_task_switch (kernel/sched/sched.h:1036 kernel/sched/core.c:2248)
          ? rest_init (init/main.c:924)
          kernel_init (init/main.c:929)
          ? rest_init (init/main.c:924)
          ret_from_fork (arch/x86/kernel/entry_64.S:348)
          ? rest_init (init/main.c:924)
          Read of size 4 by task swapper/0:
          Memory state around the buggy address:
           ffff88006d4beb80: fc fc fc fc fc fc fc fc fc fc 00 fc fc fc fc fc
           ffff88006d4bec00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bed00: fc fc fc fc 00 fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          >ffff88006d4bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 04 fc
                                                                    ^
           ffff88006d4bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bef80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
           ffff88006d4bf000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
           ffff88006d4bf080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
          ==================================================================
      
      A zero 'level' (e.g. on a non-NUMA system) causes an out-of-bounds
      access in this line:
      
           sched_max_numa_distance = sched_domains_numa_distance[level - 1];
      
      Fix this by exiting from sched_init_numa() earlier.
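
      A sketch of the guard (sched_init_numa() in kernel/sched/core.c):

      	/* On a machine where no extra NUMA levels were found, 'level'
      	 * stays 0 and indexing [level - 1] reads out of bounds, so
      	 * bail out before touching the distance array. */
      	if (!level)
      		return;

      	sched_max_numa_distance = sched_domains_numa_distance[level - 1];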
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Fixes: 9942f79b ("sched/numa: Export info needed for NUMA balancing on complex topologies")
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1415372020-1871-1-git-send-email-a.ryabinin@samsung.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c123588b
  18. 04 Nov, 2014 4 commits
    • sched: Refactor task_struct to use numa_faults instead of numa_* pointers · 44dba3d5
      Iulia Manda authored
      This patch simplifies task_struct by removing the four numa_* pointers
      that pointed into the same array and replacing them with a single pointer
      to that array. By doing this, the size of task_struct on x86_64 is
      reduced by 3 ulong pointers (24 bytes).
      
      A new parameter is added to the task_faults_idx() function so that it can
      return an index at the correct offset, corresponding to the old
      precalculated pointers.
      
      All of the code in sched/ that depended on task_faults_idx and numa_* was
      changed in order to match the new logic.
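
      A sketch of the reshaped helper, as described above (names per the patch;
      the new stats parameter selects which of the former four regions of the
      flat numa_faults array the index lands in):

      	static inline int task_faults_idx(enum numa_faults_stats s, int nid, int priv)
      	{
      		/* One flat array replaces four pointers: 's' picks the
      		 * logical sub-array, nid/priv pick the slot inside it. */
      		return NR_NUMA_HINT_FAULT_TYPES * (s * nr_node_ids + nid) + priv;
      	}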
      Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: mgorman@suse.de
      Cc: dave@stgolabs.net
      Cc: riel@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      44dba3d5
    • sched/core: Use dl_bw_of() under rcu_read_lock_sched() · 75e23e49
      Juri Lelli authored
      As per commit f10e00f4 ("sched/dl: Use dl_bw_of() under
      rcu_read_lock_sched()"), dl_bw_of() has to be protected by
      rcu_read_lock_sched().
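
      The usage pattern, sketched (a hedged example; 'dl_b' and the surrounding
      code are illustrative):

      	/* dl_bw_of() dereferences the cpu's root_domain, which is
      	 * freed via sched-RCU, so the whole use must sit inside a
      	 * sched-RCU read-side critical section: */
      	rcu_read_lock_sched();
      	dl_b = dl_bw_of(cpu);
      	/* ... read or update the deadline bandwidth ... */
      	rcu_read_unlock_sched();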
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414497286-28824-1-git-send-email-juri.lelli@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      75e23e49
    • sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl() · 67dfa1b7
      Kirill Tkhai authored
      The currently used hrtimer_try_to_cancel() is racy:
      
      raw_spin_lock(&rq->lock)
      ...                            dl_task_timer                 raw_spin_lock(&rq->lock)
      ...                               raw_spin_lock(&rq->lock)   ...
         switched_from_dl()             ...                        ...
            hrtimer_try_to_cancel()     ...                        ...
         switched_to_fair()             ...                        ...
      ...                               ...                        ...
      ...                               ...                        ...
      raw_spin_unlock(&rq->lock)        ...                        (acquired)
      ...                               ...                        ...
      ...                               ...                        ...
      do_exit()                         ...                        ...
         schedule()                     ...                        ...
            raw_spin_lock(&rq->lock)    ...                        raw_spin_unlock(&rq->lock)
            ...                         ...                        ...
            raw_spin_unlock(&rq->lock)  ...                        raw_spin_lock(&rq->lock)
            ...                         ...                        (acquired)
            put_task_struct()           ...                        ...
                free_task_struct()      ...                        ...
            ...                         ...                        raw_spin_unlock(&rq->lock)
      ...                               (acquired)                 ...
      ...                               ...                        ...
      ...                               (use after free)           ...
      
      So, let's implement a 100% guaranteed way to cancel the timer and be
      sure we are safe even in very unlikely situations.
      
      Unlocking the rq does not limit where switched_from_dl() can be used,
      because this has already been possible in pull_dl_task() below.
      
      Let's consider the safety of this unlocking. The new code in the patch
      comes into play when hrtimer_try_to_cancel() fails, which means the
      callback is running. In that case hrtimer_cancel() just waits until the
      callback is finished. Two cases are possible:
      
      1) Since we are in switched_from_dl(), the new class is not dl_sched_class
      and the new prio is not less than MAX_DL_PRIO. So the callback returns
      early, right after the !dl_task() check; after that, hrtimer_cancel()
      returns too.
      
      The above is:
      
      raw_spin_lock(rq->lock);                  ...
      ...                                       dl_task_timer()
      ...                                          raw_spin_lock(rq->lock);
         switched_from_dl()                        ...
             hrtimer_try_to_cancel()               ...
                raw_spin_unlock(rq->lock);         ...
                hrtimer_cancel()                   ...
                ...                                raw_spin_unlock(rq->lock);
                ...                                return HRTIMER_NORESTART;
                ...                             ...
                raw_spin_lock(rq->lock);        ...
      
      2) But the below is also possible:
                                         dl_task_timer()
                                            raw_spin_lock(rq->lock);
                                            ...
                                            raw_spin_unlock(rq->lock);
      raw_spin_lock(rq->lock);              ...
         switched_from_dl()                 ...
             hrtimer_try_to_cancel()        ...
             ...                            return HRTIMER_NORESTART;
             raw_spin_unlock(rq->lock);  ...
             hrtimer_cancel();           ...
             raw_spin_lock(rq->lock);    ...
      
      In this case hrtimer_cancel() returns immediately. A very unlikely case,
      mentioned just for completeness.
      
      Nobody can manipulate the task, because check_class_changed() is
      always called with pi_lock locked. Nobody can force the task to
      participate in (concurrent) priority inheritance schemes (the same reason).
      
      All concurrent task operations require pi_lock, which is held by us.
      No deadlocks with dl_task_timer() are possible, because it returns
      right after !dl_task() check (it does nothing).
      
      If we receive a new dl_task while the rq is unlocked, we simply no longer
      have to do pull_dl_task() in switched_from_dl().
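
      The new helper, sketched per the reasoning above (kernel/sched/deadline.c):

      	static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
      	{
      		struct hrtimer *timer = &p->dl.dl_timer;

      		if (hrtimer_active(timer)) {
      			/* -1 means the callback is running; it takes
      			 * rq->lock itself, so drop ours and wait it out. */
      			if (hrtimer_try_to_cancel(timer) == -1) {
      				raw_spin_unlock(&rq->lock);
      				hrtimer_cancel(timer);
      				raw_spin_lock(&rq->lock);
      			}
      		}
      	}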
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      [ Added comments ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      67dfa1b7
    • sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test · e7097e8b
      Peter Zijlstra authored
      In some cases this can trigger a true flood of output.
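
      A sketch of the change, assuming the __might_sleep() check of that era
      (the exact message text is illustrative):

      	/* WARN_ONCE() prints only the first time, but still returns the
      	 * condition on every call, so the caller keeps working while a
      	 * looping offender can no longer flood the log. */
      	if (WARN_ONCE(current->state != TASK_RUNNING,
      		      "do not call blocking ops when !TASK_RUNNING; "
      		      "state=%lx", current->state))
      		__set_current_state(TASK_RUNNING);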
      Requested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e7097e8b