提交 · 1f8a7633094b7886c0677b78ba60b82e501f3ce6 · openeuler / raspberrypi-kernel

14 1月, 2015 2 次提交

sched/debug: Fix potential call to __ffs(0) in sched_show_task() · 1f8a7633

由 Tetsuo Handa 提交于 12月 05, 2014

"struct task_struct"->state is "volatile long" and __ffs() warns that
"Undefined if no bit exists, so code should check against 0 first."

Therefore, at expression

state = p->state ? __ffs(p->state) + 1 : 0;

in sched_show_task(), CPU might see "p->state" before "?" as "non-zero"
but "p->state" after "?" as "zero", which could result in
"state >= sizeof(stat_nam)" being true and bogus '?' is printed.

This patch changes "state" from "unsigned int" to "unsigned long" and
save "p->state" before calling __ffs(), in order to avoid potential call
to __ffs(0).
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/201412052131.GCE35924.FVHFOtLOJOMQFS@I-love.SAKURA.ne.jpSigned-off-by: NIngo Molnar <mingo@kernel.org>

1f8a7633

sched/debug: Check for stack overflow in ___might_sleep() · a8b686b3

由 Eric Sandeen 提交于 12月 16, 2014

Sometimes a "BUG: sleeping function called from invalid context"
message is not indicative of locking problems, but is the result
of a stack overflow corrupting the thread info.

Witness http://oss.sgi.com/archives/xfs/2014-02/msg00325.html
for example, which took a few go-rounds to sort out.

If we're printing the warning, things are wonky already, and
it'd be informative to check for the stack end corruption at this
point, too.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/5490B158.4060005@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a8b686b3

23 12月, 2014 1 次提交

sched: Fix KMALLOC_MAX_SIZE overflow during cpumask allocation · b74e6278

由 Alex Thorlton 提交于 12月 18, 2014

When allocating space for load_balance_mask, in sched_init, when
CPUMASK_OFFSTACK is set, we've managed to spill over
KMALLOC_MAX_SIZE on our 6144 core machine.  The patch below
breaks up the allocations so that they don't overflow the max
alloc size.  It also allocates the masks on the the node from
which they'll most commonly be accessed, to minimize remote
accesses on NUMA machines.
Suggested-by: NGeorge Beshers <gbeshers@sgi.com>
Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
Cc: George Beshers <gbeshers@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418928270-148543-1-git-send-email-athorlton@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

b74e6278

11 12月, 2014 1 次提交

sched_show_task: fix unsafe usage of ->real_parent · a90e984c

由 Oleg Nesterov 提交于 12月 10, 2014

rcu_read_lock() can not protect p->real_parent if release_task(p) was
already called, change sched_show_task() to check pis_alive() like other
users do.

Note: we need some helpers to cleanup the code like this.  And it seems
that that the usage of cpu_curr(cpu) in dump_cpu_task() is not safe too.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Cc: Sterling Alexander <stalexan@redhat.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Roland McGrath <roland@hack.frob.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a90e984c

08 12月, 2014 1 次提交

sched: Add missing rcu protection to wake_up_all_idle_cpus · fd7de1e8

由 Andy Lutomirski 提交于 11月 29, 2014

Locklessly doing is_idle_task(rq->curr) is only okay because of
RCU protection. The older variant of the broken code checked
rq->curr == rq->idle instead and therefore didn't need RCU.

Fixes: f6be8af1 ("sched: Add new API wake_up_if_idle() to wake up the idle cpu")
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Reviewed-by: NChuansheng Liu <chuansheng.liu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/729365dddca178506dfd0a9451006344cd6808bc.1417277372.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

fd7de1e8

04 12月, 2014 1 次提交

context_tracking: Restore previous state in schedule_user · 7cc78f8f

由 Andy Lutomirski 提交于 12月 03, 2014

It appears that some SCHEDULE_USER (asm for schedule_user) callers
in arch/x86/kernel/entry_64.S are called from RCU kernel context,
and schedule_user will return in RCU user context.  This causes RCU
warnings and possible failures.

This is intended to be a minimal fix suitable for 3.18.
Reported-and-tested-by: NDave Jones <davej@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7cc78f8f

16 11月, 2014 4 次提交

sched: Move p->nr_cpus_allowed check to select_task_rq() · 6c1d9410

由 Wanpeng Li 提交于 11月 05, 2014

Move the p->nr_cpus_allowed check into kernel/sched/core.c: select_task_rq().
This change will make fair.c, rt.c, and deadline.c all start with the
same logic.
Suggested-and-Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: "pang.xunlei" <pang.xunlei@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415150077-59053-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6c1d9410

sched/fair: Kill task_struct::numa_entry and numa_group::task_list · 75389918

由 Kirill Tkhai 提交于 11月 07, 2014

Nobody iterates over numa_group::task_list, this just confuses the readers.
Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1415358456.28592.17.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>

75389918

sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency · 6e998916

由 Stanislaw Gruszka 提交于 11月 12, 2014

Commit d670ec13 "posix-cpu-timers: Cure SMP wobbles" fixes one glibc
test case in cost of breaking another one. After that commit, calling
clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result
of Y time being smaller than X time.

Reproducer/tester can be found further below, it can be compiled and ran by:

	gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread
	while ./tst-cpuclock2 ; do : ; done

This reproducer, when running on a buggy kernel, will complain
about "clock_gettime difference too small".

Issue happens because on start in thread_group_cputimer() we initialize
sum_exec_runtime of cputimer with threads runtime not yet accounted and
then add the threads runtime to running cputimer again on scheduler
tick, making it's sum_exec_runtime bigger than actual threads runtime.

KOSAKI Motohiro posted a fix for this problem, but that patch was never
applied: https://lkml.org/lkml/2013/5/26/191 .

This patch takes different approach to cure the problem. It calls
update_curr() when cputimer starts, that assure we will have updated
stats of running threads and on the next schedule tick we will account
only the runtime that elapsed from cputimer start. That also assure we
have consistent state between cpu times of individual threads and cpu
time of the process consisted by those threads.

Full reproducer (tst-cpuclock2.c):

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <stdio.h>
	#include <time.h>
	#include <pthread.h>
	#include <stdint.h>
	#include <inttypes.h>

	/* Parameters for the Linux kernel ABI for CPU clocks.  */
	#define CPUCLOCK_SCHED          2
	#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
		((~(clockid_t) (pid) << 3) | (clockid_t) (clock))

	static pthread_barrier_t barrier;

	/* Help advance the clock.  */
	static void *chew_cpu(void *arg)
	{
		pthread_barrier_wait(&barrier);
		while (1) ;

		return NULL;
	}

	/* Don't use the glibc wrapper.  */
	static int do_nanosleep(int flags, const struct timespec *req)
	{
		clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED);

		return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL);
	}

	static int64_t tsdiff(const struct timespec *before, const struct timespec *after)
	{
		int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec;
		int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec;

		return after_i - before_i;
	}

	int main(void)
	{
		int result = 0;
		pthread_t th;

		pthread_barrier_init(&barrier, NULL, 2);

		if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) {
			perror("pthread_create");
			return 1;
		}

		pthread_barrier_wait(&barrier);

		/* The test.  */
		struct timespec before, after, sleeptimeabs;
		int64_t sleepdiff, diffabs;
		const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 };

		/* The relative nanosleep.  Not sure why this is needed, but its presence
		   seems to make it easier to reproduce the problem.  */
		if (do_nanosleep(0, &sleeptime) != 0) {
			perror("clock_nanosleep");
			return 1;
		}

		/* Get the current time.  */
		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) {
			perror("clock_gettime[2]");
			return 1;
		}

		/* Compute the absolute sleep time based on the current time.  */
		uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec;
		sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000;
		sleeptimeabs.tv_nsec = nsec % 1000000000;

		/* Sleep for the computed time.  */
		if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) {
			perror("absolute clock_nanosleep");
			return 1;
		}

		/* Get the time after the sleep.  */
		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) {
			perror("clock_gettime[3]");
			return 1;
		}

		/* The time after sleep should always be equal to or after the absolute sleep
		   time passed to clock_nanosleep.  */
		sleepdiff = tsdiff(&sleeptimeabs, &after);
		if (sleepdiff < 0) {
			printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff);
			result = 1;

			printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec);
			printf("After  %llu.%09llu\n", after.tv_sec, after.tv_nsec);
			printf("Sleep  %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec);
		}

		/* The difference between the timestamps taken before and after the
		   clock_nanosleep call should be equal to or more than the duration of the
		   sleep.  */
		diffabs = tsdiff(&before, &after);
		if (diffabs < sleeptime.tv_nsec) {
			printf("clock_gettime difference too small: %" PRId64 "\n", diffabs);
			result = 1;
		}

		pthread_cancel(th);

		return result;
	}
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6e998916

sched/cputime: Fix cpu_timer_sample_group() double accounting · 23cfa361

由 Peter Zijlstra 提交于 11月 12, 2014

While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:

  cpu_timer_sample_group()
    thread_group_cputimer()
      thread_group_cputime()
        times->sum_exec_runtime += task_sched_runtime();

    *sample = cputime.sum_exec_runtime + task_delta_exec();

Which would make the sample run ahead, making the sleep short.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

23cfa361

10 11月, 2014 1 次提交

sched/numa: Fix out of bounds read in sched_init_numa() · c123588b

由 Andrey Ryabinin 提交于 11月 07, 2014

On latest mm + KASan patchset I've got this:

    ==================================================================
    BUG: AddressSanitizer: out of bounds access in sched_init_smp+0x3ba/0x62c at addr ffff88006d4bee6c
    =============================================================================
    BUG kmalloc-8 (Not tainted): kasan error
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in alloc_vfsmnt+0xb0/0x2c0 age=75 cpu=0 pid=0
     __slab_alloc+0x4b4/0x4f0
     __kmalloc_track_caller+0x15f/0x1e0
     kstrdup+0x44/0x90
     alloc_vfsmnt+0xb0/0x2c0
     vfs_kern_mount+0x35/0x190
     kern_mount_data+0x25/0x50
     pid_ns_prepare_proc+0x19/0x50
     alloc_pid+0x5e2/0x630
     copy_process.part.41+0xdf5/0x2aa0
     do_fork+0xf5/0x460
     kernel_thread+0x21/0x30
     rest_init+0x1e/0x90
     start_kernel+0x522/0x531
     x86_64_start_reservations+0x2a/0x2c
     x86_64_start_kernel+0x15b/0x16a
    INFO: Slab 0xffffea0001b52f80 objects=24 used=22 fp=0xffff88006d4befc0 flags=0x100000000004080
    INFO: Object 0xffff88006d4bed20 @offset=3360 fp=0xffff88006d4bee70

    Bytes b4 ffff88006d4bed10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
    Object ffff88006d4bed20: 70 72 6f 63 00 6b 6b a5                          proc.kk.
    Redzone ffff88006d4bed28: cc cc cc cc cc cc cc cc                          ........
    Padding ffff88006d4bee68: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
    CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B          3.18.0-rc3-mm1+ #108
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
     ffff88006d4be000 0000000000000000 ffff88006d4bed20 ffff88006c86fd18
     ffffffff81cd0a59 0000000000000058 ffff88006d404240 ffff88006c86fd48
     ffffffff811fa3a8 ffff88006d404240 ffffea0001b52f80 ffff88006d4bed20
    Call Trace:
    dump_stack (lib/dump_stack.c:52)
    print_trailer (mm/slub.c:645)
    object_err (mm/slub.c:652)
    ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
    kasan_report_error (mm/kasan/report.c:102 mm/kasan/report.c:178)
    ? kasan_poison_shadow (mm/kasan/kasan.c:48)
    ? kasan_unpoison_shadow (mm/kasan/kasan.c:54)
    ? kasan_poison_shadow (mm/kasan/kasan.c:48)
    ? kasan_kmalloc (mm/kasan/kasan.c:311)
    __asan_load4 (mm/kasan/kasan.c:371)
    ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
    sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
    kernel_init_freeable (init/main.c:869 init/main.c:997)
    ? finish_task_switch (kernel/sched/sched.h:1036 kernel/sched/core.c:2248)
    ? rest_init (init/main.c:924)
    kernel_init (init/main.c:929)
    ? rest_init (init/main.c:924)
    ret_from_fork (arch/x86/kernel/entry_64.S:348)
    ? rest_init (init/main.c:924)
    Read of size 4 by task swapper/0:
    Memory state around the buggy address:
     ffff88006d4beb80: fc fc fc fc fc fc fc fc fc fc 00 fc fc fc fc fc
     ffff88006d4bec00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bed00: fc fc fc fc 00 fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    >ffff88006d4bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 04 fc
                                                              ^
     ffff88006d4bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     ffff88006d4bef80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
     ffff88006d4bf000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     ffff88006d4bf080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

Zero 'level' (e.g. on non-NUMA system) causing out of bounds
access in this line:

     sched_max_numa_distance = sched_domains_numa_distance[level - 1];

Fix this by exiting from sched_init_numa() earlier.
Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Fixes: 9942f79b ("sched/numa: Export info needed for NUMA balancing on complex topologies")
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1415372020-1871-1-git-send-email-a.ryabinin@samsung.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c123588b

04 11月, 2014 6 次提交

sched: Refactor task_struct to use numa_faults instead of numa_* pointers · 44dba3d5

由 Iulia Manda 提交于 10月 31, 2014

This patch simplifies task_struct by removing the four numa_* pointers
in the same array and replacing them with the array pointer. By doing this,
on x86_64, the size of task_struct is reduced by 3 ulong pointers (24 bytes on
x86_64).

A new parameter is added to the task_faults_idx function so that it can return
an index to the correct offset, corresponding with the old precalculated
pointers.

All of the code in sched/ that depended on task_faults_idx and numa_* was
changed in order to match the new logic.
Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: dave@stgolabs.net
Cc: riel@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfellSigned-off-by: NIngo Molnar <mingo@kernel.org>

44dba3d5

sched/core: Use dl_bw_of() under rcu_read_lock_sched() · 75e23e49

由 Juri Lelli 提交于 10月 28, 2014

As per commit f10e00f4 ("sched/dl: Use dl_bw_of() under
rcu_read_lock_sched()"), dl_bw_of() has to be protected by
rcu_read_lock_sched().
Signed-off-by: NJuri Lelli <juri.lelli@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414497286-28824-1-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

75e23e49

sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl() · 67dfa1b7

由 Kirill Tkhai 提交于 10月 27, 2014

Currently used hrtimer_try_to_cancel() is racy:

raw_spin_lock(&rq->lock)
...                            dl_task_timer                 raw_spin_lock(&rq->lock)
...                               raw_spin_lock(&rq->lock)   ...
   switched_from_dl()             ...                        ...
      hrtimer_try_to_cancel()     ...                        ...
   switched_to_fair()             ...                        ...
...                               ...                        ...
...                               ...                        ...
raw_spin_unlock(&rq->lock)        ...                        (asquired)
...                               ...                        ...
...                               ...                        ...
do_exit()                         ...                        ...
   schedule()                     ...                        ...
      raw_spin_lock(&rq->lock)    ...                        raw_spin_unlock(&rq->lock)
      ...                         ...                        ...
      raw_spin_unlock(&rq->lock)  ...                        raw_spin_lock(&rq->lock)
      ...                         ...                        (asquired)
      put_task_struct()           ...                        ...
          free_task_struct()      ...                        ...
      ...                         ...                        raw_spin_unlock(&rq->lock)
...                               (asquired)                 ...
...                               ...                        ...
...                               (use after free)           ...

So, let's implement 100% guaranteed way to cancel the timer and let's
be sure we are safe even in very unlikely situations.

rq unlocking does not limit the area of switched_from_dl() use, because
this has already been possible in pull_dl_task() below.

Let's consider the safety of of this unlocking. New code in the patch
is working when hrtimer_try_to_cancel() fails. This means the callback
is running. In this case hrtimer_cancel() is just waiting till the
callback is finished. Two

1) Since we are in switched_from_dl(), new class is not dl_sched_class and
new prio is not less MAX_DL_PRIO. So, the callback returns early; it's
right after !dl_task() check. After that hrtimer_cancel() returns back too.

The above is:

raw_spin_lock(rq->lock);                  ...
...                                       dl_task_timer()
...                                          raw_spin_lock(rq->lock);
   switched_from_dl()                        ...
       hrtimer_try_to_cancel()               ...
          raw_spin_unlock(rq->lock);         ...
          hrtimer_cancel()                   ...
          ...                                raw_spin_unlock(rq->lock);
          ...                                return HRTIMER_NORESTART;
          ...                             ...
          raw_spin_lock(rq->lock);        ...

2) But the below is also possible:
                                   dl_task_timer()
                                      raw_spin_lock(rq->lock);
                                      ...
                                      raw_spin_unlock(rq->lock);
raw_spin_lock(rq->lock);              ...
   switched_from_dl()                 ...
       hrtimer_try_to_cancel()        ...
       ...                            return HRTIMER_NORESTART;
       raw_spin_unlock(rq->lock);  ...
       hrtimer_cancel();           ...
       raw_spin_lock(rq->lock);    ...

In this case hrtimer_cancel() returns immediately. Very unlikely case,
just to mention.

Nobody can manipulate the task, because check_class_changed() is
always called with pi_lock locked. Nobody can force the task to
participate in (concurrent) priority inheritance schemes (the same reason).

All concurrent task operations require pi_lock, which is held by us.
No deadlocks with dl_task_timer() are possible, because it returns
right after !dl_task() check (it does nothing).

If we receive a new dl_task during the time of unlocked rq, we just
don't have to do pull_dl_task() in switched_from_dl() further.
Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
[ Added comments]
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJuri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>

67dfa1b7

sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test · e7097e8b

由 Peter Zijlstra 提交于 10月 29, 2014

In some cases this can trigger a true flood of output.
Requested-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

e7097e8b

sched: Remove lockdep check in sched_move_task() · f7b8a47d

由 Kirill Tkhai 提交于 10月 28, 2014

sched_move_task() is the only interface to change sched_task_group:
cpu_cgrp_subsys methods and autogroup_move_group() use it.

Everything is synchronized by task_rq_lock(), so cpu_cgroup_attach()
is ordered with other users of sched_move_task(). This means we do no
need RCU here: if we've dereferenced a tg here, the .attach method
hasn't been called for it yet.

Thus, we should pass "true" to task_css_check() to silence lockdep
warnings.

Fixes: eeb61e53 ("sched: Fix race between task_group and sched_task_group")
Reported-by: NOleg Nesterov <oleg@redhat.com>
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414473874.8574.2.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>

f7b8a47d

rcu: Remove "cpu" argument to rcu_note_context_switch() · 38200cf2

由 Paul E. McKenney 提交于 10月 21, 2014

The "cpu" argument to rcu_note_context_switch() is always the current
CPU, so drop it.  This in turn allows the "cpu" argument to
rcu_preempt_note_context_switch() to be removed, which allows the sole
use of "cpu" in both functions to be replaced with a this_cpu_ptr().
Again, the anticipated cross-CPU uses of these functions has been
replaced by NO_HZ_FULL.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

38200cf2

28 10月, 2014 11 次提交

sched: Exclude cond_resched() from nested sleep test · 3427445a

由 Peter Zijlstra 提交于 9月 24, 2014

cond_resched() is a preemption point, not strictly a blocking
primitive, so exclude it from the ->state test.

In particular, preemption preserves task_struct::state.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Alex Elder <alex.elder@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Lin <axel.lin@ingics.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140924082242.656559952@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

3427445a

sched: Debug nested sleeps · 8eb23b9f

由 Peter Zijlstra 提交于 9月 24, 2014

Validate we call might_sleep() with TASK_RUNNING, which catches places
where we nest blocking primitives, eg. mutex usage in a wait loop.

Since all blocking is arranged through task_struct::state, nesting
this will cause the inner primitive to set TASK_RUNNING and the outer
will thus not block.

Another observed problem is calling a blocking function from
schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
then destroy the task state for the actual __schedule() call that
comes after it.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140924082242.591637616@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

8eb23b9f

sched/deadline: Ensure that updates to exclusive cpusets don't break AC · f82f8042

由 Juri Lelli 提交于 10月 07, 2014

How we deal with updates to exclusive cpusets is currently broken.
As an example, suppose we have an exclusive cpuset composed of
two cpus: A[cpu0,cpu1]. We can assign SCHED_DEADLINE task to it
up to the allowed bandwidth. If we want now to modify cpusetA's
cpumask, we have to check that removing a cpu's amount of
bandwidth doesn't break AC guarantees. This thing isn't checked
in the current code.

This patch fixes the problem above, denying an update if the
new cpumask won't have enough bandwidth for SCHED_DEADLINE tasks
that are currently active.
Signed-off-by: NJuri Lelli <juri.lelli@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/5433E6AF.5080105@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f82f8042

sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets · 7f51412a

由 Juri Lelli 提交于 9月 19, 2014

Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
affinity (performing what is commonly called clustered scheduling).
Unfortunately, such thing is currently broken for two reasons:

 - No check is performed when the user tries to attach a task to
   an exlusive cpuset (recall that exclusive cpusets have an
   associated maximum allowed bandwidth).

 - Bandwidths of source and destination cpusets are not correctly
   updated after a task is migrated between them.

This patch fixes both things at once, as they are opposite faces
of the same coin.

The check is performed in cpuset_can_attach(), as there aren't any
points of failure after that function. The updated is split in two
halves. We first reserve bandwidth in the destination cpuset, after
we pass the check in cpuset_can_attach(). And we then release
bandwidth from the source cpuset when the task's affinity is
actually changed. Even if there can be time windows when sched_setattr()
may erroneously fail in the source cpuset, we are fine with it, as
we can't perfom an atomic update of both cpusets at once.
Reported-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: NVincent Legout <vincent@legout.info>
Signed-off-by: NJuri Lelli <juri.lelli@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: michael@amarulasolutions.com
Cc: luca.abeni@unitn.it
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/1411118561-26323-3-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7f51412a

sched: Kill task_preempt_count() · e2336f6e

由 Oleg Nesterov 提交于 10月 08, 2014

task_preempt_count() is pointless if preemption counter is per-cpu,
currently this is x86 only. It is only valid if the task is not
running, and even in this case the only info it can provide is the
state of PREEMPT_ACTIVE bit.

Change its single caller to check p->on_rq instead, this should be
the same if p->state != TASK_RUNNING, and kill this helper.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e2336f6e

sched: Make finish_task_switch() return 'struct rq *' · dfa50b60

由 Oleg Nesterov 提交于 10月 09, 2014

Both callers of finish_task_switch() need to recalculate this_rq()
and pass it as an argument, plus __schedule() does this again after
context_switch().

It would be simpler to call this_rq() once in finish_task_switch()
and return the this rq to the callers.

Note: probably "int cpu" in __schedule() should die; it is not used
and both rcu_note_context_switch() and wq_worker_sleeping() do not
really need this argument.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141009193232.GB5408@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

dfa50b60

sched: Fix schedule_tail() to disable preemption · 1a43a14a

由 Oleg Nesterov 提交于 10月 08, 2014

finish_task_switch() enables preemption, so post_schedule(rq) can be
called on the wrong (and even dead) CPU. Afaics, nothing really bad
can happen, but in this case we can wrongly clear rq->post_schedule
on that CPU. And this simply looks wrong in any case.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141008193644.GA32055@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

1a43a14a

sched/numa: Classify the NUMA topology of a system · e3fe70b1

由 Rik van Riel 提交于 10月 17, 2014

Smaller NUMA systems tend to have all NUMA nodes directly connected
to each other. This includes the degenerate case of a system with just
one node, ie. a non-NUMA system.

Larger systems can have two kinds of NUMA topology, which affects how
tasks and memory should be placed on the system.

On glueless mesh systems, nodes that are not directly connected to
each other will bounce traffic through intermediary nodes. Task groups
can be run closer to each other by moving tasks from a node to an
intermediary node between it and the task's preferred node.

On NUMA systems with backplane controllers, the intermediary hops
are incapable of running programs. This creates "islands" of nodes
that are at an equal distance to anywhere else in the system.

Each kind of topology requires a slightly different placement
algorithm; this patch provides the mechanism to detect the kind
of NUMA topology of a system.
Signed-off-by: NRik van Riel <riel@redhat.com>
Tested-by: NChegu Vinod <chegu_vinod@hp.com>
[ Changed to use kernel/sched/sched.h ]
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Link: http://lkml.kernel.org/r/1413530994-9732-3-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e3fe70b1

sched/numa: Export info needed for NUMA balancing on complex topologies · 9942f79b

由 Rik van Riel 提交于 10月 17, 2014

Export some information that is necessary to do placement of
tasks on systems with multi-level NUMA topologies.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1413530994-9732-2-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9942f79b

sched: stop the unbound recursion in preempt_schedule_context() · 009f60e2

由 Oleg Nesterov 提交于 10月 05, 2014

preempt_schedule_context() does preempt_enable_notrace() at the end
and this can call the same function again; exception_exit() is heavy
and it is quite possible that need-resched is true again.

1. Change this code to dec preempt_count() and check need_resched()
   by hand.

2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
   the enable/disable dance around __schedule(). But in this case
   we need to move into sched/core.c.

3. Cosmetic, but x86 forgets to declare this function. This doesn't
   really matter because it is only called by asm helpers, still it
   make sense to add the declaration into asm/preempt.h to match
   preempt_schedule().
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

009f60e2

sched: Fix race between task_group and sched_task_group · eeb61e53

由 Kirill Tkhai 提交于 10月 27, 2014

The race may happen when somebody is changing task_group of a forking task.
Child's cgroup is the same as parent's after dup_task_struct() (there just
memory copying). Also, cfs_rq and rt_rq are the same as parent's.

But if parent changes its task_group before it's called cgroup_post_fork(),
we do not reflect this situation on child. Child's cfs_rq and rt_rq remain
the same, while child's task_group changes in cgroup_post_fork().

To fix this we introduce fork() method, which calls sched_move_task() directly.
This function changes sched_task_group on appropriate (also its logic has
no problem with freshly created tasks, so we shouldn't introduce something
special; we are able just to use it).

Possibly, this decides the Burke Libbey's problem: https://lkml.org/lkml/2014/10/24/456Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414405105.19914.169.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>

eeb61e53

03 10月, 2014 1 次提交

sched/dl: Use dl_bw_of() under rcu_read_lock_sched() · f10e00f4

由 Kirill Tkhai 提交于 9月 30, 2014

rq->rd is freed using call_rcu_sched(), so rcu_read_lock() to access it
is not enough. We should use either rcu_read_lock_sched() or preempt_disable().
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Fixes: 66339c31 "sched: Use dl_bw_of() under RCU read lock"
Link: http://lkml.kernel.org/r/1412065417.20287.24.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>

f10e00f4

24 9月, 2014 7 次提交

sched: Use rq->rd in sched_setaffinity() under RCU read lock · f1e3a093

由 Kirill Tkhai 提交于 9月 22, 2014

Probability of use-after-free isn't zero in this place.
Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140922183636.11015.83611.stgit@localhostSigned-off-by: NIngo Molnar <mingo@kernel.org>

f1e3a093

sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask' · 16303ab2

由 Kirill Tkhai 提交于 9月 22, 2014

Nothing is locked there, so label's name only confuses a reader.
Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140922183630.11015.59500.stgit@localhostSigned-off-by: NIngo Molnar <mingo@kernel.org>

16303ab2

sched: Use dl_bw_of() under RCU read lock · 66339c31

由 Kirill Tkhai 提交于 9月 22, 2014

dl_bw_of() dereferences rq->rd which has to have RCU read lock held.
Probability of use-after-free isn't zero here.

Also add lockdep assert into dl_bw_cpus().
Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140922183624.11015.71558.stgit@localhostSigned-off-by: NIngo Molnar <mingo@kernel.org>

66339c31

sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW · c55f5158

由 Peter Zijlstra 提交于 9月 23, 2014

Kirill found that there's a subtle race in the
__ARCH_WANT_UNLOCKED_CTXSW code, and instead of fixing it, remove the
entire exception because neither arch that uses it seems to actually
still require it.

Boot tested on mips64el (qemu) only.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NKirill Tkhai <tkhai@yandex.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: oleg@redhat.com
Cc: linux@roeck-us.net
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20140923150641.GH3312@worktop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

c55f5158

sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock() · 3472eaa1

由 Oleg Nesterov 提交于 9月 21, 2014

1. read_lock(tasklist_lock) does not need to disable irqs.

2. ->mm != NULL is a common mistake, use PF_KTHREAD.

3. The second ->mm check can be simply removed.

4. task_rq_lock() looks better than raw_spin_lock(&p->pi_lock) +
   __task_rq_lock().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140921193338.GA28621@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

3472eaa1

sched: Fix the task-group check in tg_has_rt_tasks() · 8651c658

由 Oleg Nesterov 提交于 9月 21, 2014

tg_has_rt_tasks() wants to find an RT task in this task_group, but
task_rq(p)->rt.tg wrongly checks the root rt_rq.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20140921193336.GA28618@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8651c658

sched/deadline: Clear dl_entity params when setscheduling to different class · a5e7be3b

由 Juri Lelli 提交于 9月 19, 2014

When a task is using SCHED_DEADLINE and the user setschedules it to a
different class its sched_dl_entity static parameters are not cleaned
up. This causes a bug if the user sets it back to SCHED_DEADLINE with
the same parameters again.  The problem resides in the check we
perform at the very beginning of dl_overflow():

	if (new_bw == p->dl.dl_bw)
		return 0;

This condition is met in the case depicted above, so the function
returns and dl_b->total_bw is not updated (the p->dl.dl_bw is not
added to it). After this, admission control is broken.

This patch fixes the thing, properly clearing static parameters for a
task that ceases to use SCHED_DEADLINE.
Reported-by: NDaniele Alessandrelli <daniele.alessandrelli@gmail.com>
Reported-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: NVincent Legout <vincent@legout.info>
Tested-by: NLuca Abeni <luca.abeni@unitn.it>
Tested-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Tested-by: NVincent Legout <vincent@legout.info>
Signed-off-by: NJuri Lelli <juri.lelli@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1411118561-26323-2-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a5e7be3b

21 9月, 2014 1 次提交

sched: Clean up some typos and grammatical errors in code/comments · 9c58c79a

由 Zhihui Zhang 提交于 9月 20, 2014

Signed-off-by: NZhihui Zhang <zzhsuny@gmail.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1411262676-19928-1-git-send-email-zzhsuny@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9c58c79a

19 9月, 2014 3 次提交

sched: Add default-disabled option to BUG() when stack end location is overwritten · 0d9e2632

由 Aaron Tomlin 提交于 9月 12, 2014

Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

This patch checks for a stack overrun and takes appropriate
action since the damage is already done, there is no point
in continuing.
Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dzickus@redhat.com
Cc: bmr@redhat.com
Cc: jcastillo@redhat.com
Cc: oleg@redhat.com
Cc: riel@redhat.com
Cc: prarit@redhat.com
Cc: jgh@redhat.com
Cc: minchan@kernel.org
Cc: mpe@ellerman.id.au
Cc: tglx@linutronix.de
Cc: rostedt@goodmis.org
Cc: hannes@cmpxchg.org
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lubomir Rintel <lkundrak@v3.sk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1410527779-8133-4-git-send-email-atomlin@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

0d9e2632

sched: Do not stop cpu in set_cpus_allowed_ptr() if task is not running · a15b12ac

由 Kirill Tkhai 提交于 9月 12, 2014

If a task is queued but not running on it rq, we can simply migrate
it without migration thread and switching of context.
Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1410519814.3569.7.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>

a15b12ac

sched/core: Use put_prev_task() accessor where possible · f3cd1c4e

由 Kirill Tkhai 提交于 9月 12, 2014

Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1410529300.3569.25.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>

f3cd1c4e