1. 05 Jun, 2014 - 3 commits
  2. 22 May, 2014 - 1 commit
  3. 18 Apr, 2014 - 5 commits
  4. 17 Apr, 2014 - 1 commit
  5. 11 Mar, 2014 - 2 commits
  6. 27 Feb, 2014 - 2 commits
    • sched: Guarantee task priority in pick_next_task() · 37e117c0
      Committed by Peter Zijlstra
      Michael spotted that the idle_balance() push down created a task
      priority problem.
      
      Previously, when we called idle_balance() before pick_next_task() it
      wasn't a problem when -- because of the rq->lock droppage -- an rt/dl
      task slipped in.
      
      Similarly for pre_schedule(), rt pre-schedule could have a dl task
      slip in.
      
      But by pulling it into the pick_next_task() loop, we'll not try a
      higher task priority again.
      
      Cure this by creating a re-start condition in pick_next_task(); and
      triggering this from pick_next_task_{rt,fair}().
      
      It also fixes a live-lock where we get stuck in pick_next_task_fair()
      due to idle_balance() seeing !0 nr_running but there not actually
      being any fair tasks about.
      Reported-by: Michael Wang <wangyun@linux.vnet.ibm.com>
      Fixes: 38033c37 ("sched: Push down pre_schedule() and idle_balance()")
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140224121218.GR15586@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      37e117c0
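A small, self-contained model of the restart condition described in the commit above (plain user-space C, not the kernel code; the pick functions, the task struct and the RETRY_TASK sentinel are simplified stand-ins): a lower class that had to drop the runqueue lock signals a restart, and the caller re-walks the classes from the top so a higher-priority task that slipped in is not missed.

```c
#include <stddef.h>
#include <stdio.h>

struct task { const char *name; };

/* Sentinel a class returns after dropping the lock: "try again from the top". */
#define RETRY_TASK ((struct task *)-1UL)

typedef struct task *(*pick_fn)(void);

static struct task rt_task   = { "rt" };
static struct task fair_task = { "fair" };

static int rt_queued;	/* becomes 1 while the "lock" is dropped below */

static struct task *pick_rt(void)
{
	return rt_queued ? &rt_task : NULL;
}

static struct task *pick_fair(void)
{
	static int dropped_lock_once;

	if (!dropped_lock_once) {
		/*
		 * Model of idle_balance(): the rq lock is dropped and an
		 * rt task slips in meanwhile; ask the caller to restart.
		 */
		dropped_lock_once = 1;
		rt_queued = 1;
		return RETRY_TASK;
	}
	return &fair_task;
}

static struct task *pick_next_task(void)
{
	pick_fn classes[] = { pick_rt, pick_fair };	/* priority order */
	struct task *p;
	size_t i;

again:
	for (i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
		p = classes[i]();
		if (p == RETRY_TASK)
			goto again;	/* re-walk from the highest class */
		if (p)
			return p;
	}
	return NULL;	/* nothing runnable: idle */
}

int main(void)
{
	struct task *p = pick_next_task();

	printf("picked: %s\n", p ? p->name : "idle");
	return 0;
}
```

Run as written, the model picks the rt task on the second walk instead of returning the fair task, which is the behaviour the restart condition is there to guarantee.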
    • sched/deadline: Prevent rt_time growth to infinity · faa59937
      Committed by Juri Lelli
      Kirill Tkhai noted:
      
        Since deadline tasks share rt bandwidth, we must care about
        the bandwidth timer being set. Otherwise rt_time may grow up
        to infinity in update_curr_dl(), if there are no other
        available RT tasks on the top-level bandwidth.
      
      RT tasks were in fact throttled right after they got enqueued,
      and never executed again (rt_time never again went below rt_runtime).
      
      Peter then proposed to accrue DL execution on rt_time only when
      rt timer is active, and proposed a patch (this patch is a slight
      modification of that) to implement that behavior. While this
      solves Kirill's problem, it has a drawback.
      
      Indeed, Kirill noted again:
      
        It looks we may get into a situation, when all CPU time is shared
        between RT and DL tasks:
      
        rt_runtime = n
        rt_period  = 2n
      
        | RT working, DL sleeping  | DL working, RT sleeping      |
        -----------------------------------------------------------
        | (1)     duration = n     | (2)     duration = n         | (repeat)
        |--------------------------|------------------------------|
        | (rt_bw timer is running) | (rt_bw timer is not running) |
      
        No time for fair tasks at all.
      
      While this can happen during the first period, if rq is always backlogged,
      RT tasks won't have the opportunity to execute anymore: rt_time reached
      rt_runtime during (1), suppose after (2) RT is enqueued back, it gets
      throttled since rt timer didn't fire, replenishment is from now on eaten up
      by DL tasks that accrue their execution on rt_time (while rt timer is
      active - we have an RT task waiting for replenishment). FAIR tasks are
      not touched after this first period. Ok, this is not ideal, and the situation
      is even worse!
      
      What is described above (the nice case) practically never happens in
      reality, where your rt timer is not aligned to task periods, tasks are
      in general not periodic, etc. Long story short, you always risk
      overloading your system.
      
      This patch is based on Peter's idea, but exploits an additional fact:
      if you don't have RT tasks enqueued, it makes little sense to continue
      incrementing rt_time once you reached the upper limit (DL tasks have their
      own mechanism for throttling).
      
      This cures both problems:
      
       - no matter how many DL instances ran in the past, you'll have an rt_time
         slightly above rt_runtime when an RT task is enqueued, and from that
         point on (after the first replenishment), the task will normally execute;
      
       - you can still eat up all the bandwidth during the first period, but
         not anymore after that; remember that DL execution will increment
         rt_time until the upper limit is reached.
      
      The situation is still not perfect! But we have a simple solution for
      now that limits how much you can jeopardize your system, as we keep
      working towards the right answer: RT groups scheduled using deadline
      servers.
      Reported-by: Kirill Tkhai <tkhai@yandex.ru>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140225151515.617714e2f2cd6c558531ba61@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      faa59937
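A toy model of the accounting rule described in the commit above (illustrative user-space C, not the kernel implementation; the struct and function names are invented): deadline execution keeps being charged to rt_time only while the rt period timer is running, or until rt_time reaches rt_runtime, so rt_time can no longer grow without bound when no RT task is waiting for replenishment.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented stand-in for the rt bandwidth state of one runqueue. */
struct rt_rq_model {
	uint64_t rt_time;	/* rt bandwidth consumed this period */
	uint64_t rt_runtime;	/* rt bandwidth allowed per period   */
	bool	 timer_active;	/* is the rt period timer running?   */
};

/* Should deadline execution keep being charged to rt_time? */
static bool rt_bandwidth_account(const struct rt_rq_model *rt_rq)
{
	return rt_rq->timer_active || rt_rq->rt_time < rt_rq->rt_runtime;
}

/* Called for every chunk of time a deadline task has just executed. */
static void dl_charge(struct rt_rq_model *rt_rq, uint64_t delta_exec)
{
	if (rt_bandwidth_account(rt_rq))
		rt_rq->rt_time += delta_exec;
	/* else: no RT task is waiting, rt_time stays just above rt_runtime */
}

int main(void)
{
	struct rt_rq_model rq = { .rt_runtime = 100, .timer_active = false };

	for (int i = 0; i < 10; i++)
		dl_charge(&rq, 30);

	/* rt_time stops slightly above rt_runtime instead of growing forever. */
	printf("rt_time = %llu\n", (unsigned long long)rq.rt_time);
	return 0;
}
```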
  7. 23 Feb, 2014 - 1 commit
  8. 22 Feb, 2014 - 2 commits
  9. 11 Feb, 2014 - 1 commit
    • sched: Push down pre_schedule() and idle_balance() · 38033c37
      Committed by Peter Zijlstra
      This patch merges idle_balance() and pre_schedule() and pushes
      both of them into pick_next_task().
      
      Conceptually pre_schedule() and idle_balance() are rather similar,
      both are used to pull more work onto the current CPU.
      
      We cannot however first move idle_balance() into pre_schedule_fair()
      since there is no guarantee the last runnable task is a fair task, and
      thus we would miss newidle balances.
      
      Similarly, the dl and rt pre_schedule calls must be run before
      idle_balance() since their respective tasks have higher priority and
      it would not do to delay their execution searching for less important
      tasks first.
      
      However, by noticing that pick_next_task() already traverses the
      sched_class hierarchy in the right order, we can get the right
      behaviour and do away with both calls.
      
      We must however change the special case optimization to also require
      that prev is of the fair sched class, otherwise we can miss doing a dl
      or rt pull where we needed one.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/n/tip-a8k6vvaebtn64nie345kx1je@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      38033c37
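The special-case optimization mentioned in the last paragraph of the commit above can be sketched as follows (simplified stand-in types, not the kernel structures): the fair-only shortcut is legitimate only when every runnable task is a fair task and prev itself was in the fair class; otherwise the full priority-ordered class walk must run so the dl/rt pull logic gets its chance.

```c
#include <stdio.h>

/* Invented stand-ins for the runqueue counters and prev's class. */
struct rq_model {
	unsigned int nr_running;	/* all runnable tasks  */
	unsigned int cfs_nr_running;	/* runnable fair tasks */
};

enum class_id { CLASS_DL, CLASS_RT, CLASS_FAIR };

/*
 * Take the fair-only shortcut only when everything runnable is a fair
 * task AND prev itself was a fair task; if prev was rt or dl, the full
 * class walk must run so the rt/dl pull logic is not skipped.
 */
static int fair_fast_path_ok(const struct rq_model *rq, enum class_id prev)
{
	return prev == CLASS_FAIR && rq->nr_running == rq->cfs_nr_running;
}

int main(void)
{
	struct rq_model rq = { .nr_running = 3, .cfs_nr_running = 3 };

	printf("prev fair: fast path ok = %d\n",
	       fair_fast_path_ok(&rq, CLASS_FAIR));
	printf("prev rt:   fast path ok = %d\n",
	       fair_fast_path_ok(&rq, CLASS_RT));
	return 0;
}
```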
  10. 10 Feb, 2014 - 1 commit
  11. 13 Jan, 2014 - 1 commit
    • sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic · 1baca4ce
      Committed by Juri Lelli
      Introduces the data structures relevant for implementing dynamic
      migration of -deadline tasks, the logic for checking if runqueues
      are overloaded with -deadline tasks, and the logic for choosing
      where a task should migrate when that is the case.
      
      Also adds dynamic migrations to SCHED_DEADLINE, so that tasks can
      be moved among CPUs when necessary. It is also possible to bind a
      task to a (set of) CPU(s), thus restricting its capability of
      migrating, or forbidding migrations at all.
      
      The very same approach used in sched_rt is utilised:
       - -deadline tasks are kept into CPU-specific runqueues,
       - -deadline tasks are migrated among runqueues to achieve the
         following:
          * on an M-CPU system the M earliest deadline ready tasks
            are always running;
          * affinity/cpusets settings of all the -deadline tasks are
            always respected.
      
      Therefore, this very special form of "load balancing" is done with
      an active method, i.e., the scheduler pushes or pulls tasks between
      runqueues when they are woken up and/or (de)scheduled.
      IOW, every time a preemption occurs, the descheduled task might be sent
      to some other CPU (depending on its deadline) to continue executing
      (push). On the other hand, every time a CPU becomes idle, it might pull
      the second earliest deadline ready task from some other CPU.
      
      To enforce this, a pull operation is always attempted before taking any
      scheduling decision (pre_schedule()), as well as a push one after each
      scheduling decision (post_schedule()). In addition, when a task arrives
      or wakes up, the best CPU where to resume it is selected taking into
      account its affinity mask, the system topology, but also its deadline.
      E.g., from the scheduling point of view, the best CPU where to wake
      up (and also where to push) a task is the one which is running the task
      with the latest deadline among the M executing ones.
      
      In order to facilitate these decisions, per-runqueue "caching" of the
      deadlines of the currently running and of the first ready task is used.
      Queued but not running tasks are also parked in another rb-tree to
      speed-up pushes.
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Dario Faggioli <raistlin@linux.it>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-5-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1baca4ce
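A toy illustration of the push-target rule described in the commit above (user-space C with invented names, not the kernel's cpudl/rb-tree code): among the CPUs allowed by the task's affinity mask, prefer one with no -deadline task running, otherwise the one whose cached running deadline is the latest, and only if that deadline is later than the deadline of the task being pushed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 4

/* Invented per-CPU cache of the running -deadline task's deadline. */
struct cpu_dl_state {
	bool	 has_dl_task;	/* is a -deadline task running here?   */
	uint64_t curr_deadline;	/* cached deadline of the running task */
};

static int find_push_target(const struct cpu_dl_state cpus[NR_CPUS],
			    const bool allowed[NR_CPUS],
			    uint64_t my_deadline)
{
	int best = -1;
	uint64_t latest = my_deadline;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!allowed[cpu])
			continue;		/* affinity mask is respected */
		if (!cpus[cpu].has_dl_task)
			return cpu;		/* a dl-idle CPU always wins */
		if (cpus[cpu].curr_deadline > latest) {
			latest = cpus[cpu].curr_deadline;
			best = cpu;		/* latest deadline so far */
		}
	}
	return best;	/* -1: nowhere better to push this task */
}

int main(void)
{
	struct cpu_dl_state cpus[NR_CPUS] = {
		{ true, 100 }, { true, 500 }, { true, 300 }, { true, 200 },
	};
	bool allowed[NR_CPUS] = { true, true, true, false };

	/* A task with deadline 250 should be pushed to CPU 1 (deadline 500). */
	printf("push to CPU %d\n", find_push_target(cpus, allowed, 250));
	return 0;
}
```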
  12. 17 Dec, 2013 - 1 commit
  13. 26 Oct, 2013 - 1 commit
  14. 16 Oct, 2013 - 1 commit
    • sched/rt: Add missing rmb() · 7c3f2ab7
      Committed by Peter Zijlstra
      While discussing the proposed SCHED_DEADLINE patches, which in parts
      mimic the existing FIFO code, it was noticed that the wmb in
      rt_set_overloaded() didn't have a matching barrier.
      
      The only site using rt_overloaded() to test rto_count is
      pull_rt_task(), and we should issue a matching rmb there before
      assuming there's an rto_mask bit set.
      
      Without that smp_rmb() in there we could actually miss seeing the
      rto_mask bit.
      
      Also, change to using smp_[wr]mb(), even though this is SMP only code;
      memory barriers without smp_ always make me think they're against
      hardware of some sort.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: vincent.guittot@linaro.org
      Cc: luca.abeni@unitn.it
      Cc: bruce.ashfield@windriver.com
      Cc: dhaval.giani@gmail.com
      Cc: rostedt@goodmis.org
      Cc: hgu1972@gmail.com
      Cc: oleg@redhat.com
      Cc: fweisbec@gmail.com
      Cc: darren@dvhart.com
      Cc: johan.eker@ericsson.com
      Cc: p.faure@akatech.ch
      Cc: paulmck@linux.vnet.ibm.com
      Cc: raistlin@linux.it
      Cc: claudio@evidence.eu.com
      Cc: insop.song@gmail.com
      Cc: michael@amarulasolutions.com
      Cc: liming.wang@windriver.com
      Cc: fchecconi@gmail.com
      Cc: jkacur@redhat.com
      Cc: tommaso.cucinotta@sssup.it
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: harald.gustafsson@ericsson.com
      Cc: nicola.manica@disi.unitn.it
      Cc: tglx@linutronix.de
      Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7c3f2ab7
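The pairing can be modelled in user space with C11 atomics (this is a sketch of the ordering argument, not the kernel code; the kernel uses smp_wmb()/smp_rmb() and a cpumask rather than the fences and plain bitmask below): the writer must publish the rto_mask bit before bumping rto_count, and the reader must order its read of rto_count before its read of the mask, otherwise it can observe the count without the bit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic int rto_count;
static _Atomic unsigned long rto_mask;

static void rt_set_overloaded(int cpu)
{
	/* Publish the mask bit first ... */
	atomic_fetch_or_explicit(&rto_mask, 1UL << cpu, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* kernel: smp_wmb() */
	/* ... then make the overload visible via the counter. */
	atomic_fetch_add_explicit(&rto_count, 1, memory_order_relaxed);
}

static bool pull_rt_task(void)
{
	if (!atomic_load_explicit(&rto_count, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_acquire);	/* kernel: smp_rmb() */
	/* Only now is it safe to trust the contents of rto_mask. */
	return atomic_load_explicit(&rto_mask, memory_order_relaxed) != 0;
}

int main(void)
{
	rt_set_overloaded(2);
	printf("overloaded seen: %d\n", pull_rt_task());
	return 0;
}
```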
  15. 09 Oct, 2013 - 1 commit
  16. 06 Oct, 2013 - 1 commit
  17. 19 Jun, 2013 - 1 commit
  18. 28 May, 2013 - 2 commits
  19. 10 May, 2013 - 1 commit
  20. 08 Feb, 2013 - 1 commit
  21. 04 Feb, 2013 - 1 commit
  22. 31 Jan, 2013 - 1 commit
  23. 25 Jan, 2013 - 3 commits
    • sched/rt: Avoid updating RT entry timeout twice within one tick period · 57d2aa00
      Committed by Ying Xue
      The issue below was found in 2.6.34-rt rather than in the mainline rt
      kernel, but it still exists upstream as well.
      
      So please let me describe how it was noticed on 2.6.34-rt:
      
      On this version, each softirq has its own thread, which means there
      is at least one RT FIFO task per cpu. The priority of these
      tasks is set to 49 by default. If a user launches an RT FIFO task
      with a priority lower than the softirq RT tasks' 49, it's possible
      for two RT FIFO tasks to be enqueued on one cpu runqueue at the
      same moment. Under the current strategy of balancing RT tasks, we
      really need to push them off to a CPU that they can run on as soon
      as possible. Even if it means a bit of cache line flushing, we want
      RT tasks to be run with the least latency.
      
      While the user RT FIFO task which was just launched is
      running, the sched timer tick of the current cpu fires. In this
      tick period, the timeout value of the user RT task will be
      updated once. Subsequently, we try to wake up one softirq RT
      task on its local cpu. As the priority of the current user RT task
      is lower than that of the softirq RT task, the current task will be
      preempted by the higher priority softirq RT task. Before
      preemption, we check to see if current can readily move to a
      different cpu. If so, we will reschedule to allow the RT push logic
      to try to move current somewhere else. Whenever the woken
      softirq RT task runs, it first tries to migrate the user FIFO RT
      task over to a cpu that is running a task of lesser priority. If
      migration is done, it will send a reschedule request to the found
      cpu by IPI interrupt. Once the target cpu responds to the IPI
      interrupt, it will pick the migrated user RT task to preempt its
      current task. When the user RT task is running on the new cpu,
      the sched timer tick of the cpu fires. So it will tick the user
      RT task again. This also means the RT task timeout value will be
      updated again. Since the migration may complete within one tick
      period, the user RT task's timeout value may be updated twice
      within one tick.
      
      If we set a limit on the amount of cpu time for the user RT task
      by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
      upon reaching the soft limit.
      
      But exactly when the SIGXCPU signal should be sent depends on the
      RT task timeout value. In fact the timeout mechanism of sending
      the SIGXCPU signal assumes the RT task timeout is increased once
      every tick.
      
      However, currently the timeout value may be added twice per
      tick. So it results in the SIGXCPU signal being sent earlier
      than expected.
      
      To solve this issue, we prevent the timeout value from increasing
      twice within one tick by remembering the jiffies value of the
      last timeout update. As long as the RT task's recorded jiffies
      value is different from the global jiffies value, we allow its
      timeout to be updated.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      57d2aa00
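A toy version of the fix described in the commit above (plain C with invented field names, not the kernel's watchdog() code): the per-task state remembers the jiffies value at which the timeout was last bumped, so a second tick handled within the same jiffy, for example right after a migration, does not charge the task twice.

```c
#include <stdio.h>

/* Invented stand-in for the per-task RLIMIT_RTTIME accounting state. */
struct rt_task_model {
	unsigned long timeout;		/* ticks of RT runtime consumed */
	unsigned long watchdog_stamp;	/* jiffies at last timeout bump */
};

static void watchdog_tick(struct rt_task_model *p, unsigned long jiffies)
{
	/* Bump the timeout at most once per jiffy. */
	if (p->watchdog_stamp != jiffies) {
		p->timeout++;
		p->watchdog_stamp = jiffies;
	}
}

int main(void)
{
	struct rt_task_model p = { 0, 0 };

	watchdog_tick(&p, 1);	/* tick on the old CPU             */
	watchdog_tick(&p, 1);	/* tick on the new CPU, same jiffy */
	watchdog_tick(&p, 2);	/* next jiffy                      */

	printf("timeout = %lu (expected 2, not 3)\n", p.timeout);
	return 0;
}
```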
    • sched/rt: Use root_domain of rt_rq not current processor · aa7f6730
      Committed by Shawn Bohrer
      When the system has multiple domains, do_sched_rt_period_timer()
      can run on any CPU and may iterate over all rt_rq in
      cpu_online_mask.  This means when balance_runtime() is run for a
      given rt_rq that rt_rq may be in a different rd than the current
      processor.  Thus if we use smp_processor_id() to get rd in
      do_balance_runtime() we may borrow runtime from a rt_rq that is
      not part of our rd.
      
      This changes do_balance_runtime() to get the rd from the passed-in
      rt_rq, ensuring that we borrow runtime only from the correct rd
      for the given rt_rq.
      
      This fixes a BUG at kernel/sched/rt.c:687! in __disable_runtime
      when we try to reclaim runtime lent to other rt_rqs but the
      runtime has been lent to an rt_rq in another rd.
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Mike Galbraith <bitbucket@online.de>
      Cc: peterz@infradead.org
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1358186131-29494-1-git-send-email-sbohrer@rgmadvisors.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aa7f6730
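A minimal sketch of the fix described in the commit above (simplified stand-in types, not the kernel structures): the balancing code derives the root domain from the rt_rq it was handed, rather than from whichever CPU the bandwidth timer happens to be running on.

```c
#include <stdio.h>

/* Invented, simplified stand-ins for the kernel structures. */
struct root_domain_model { const char *name; };

struct rt_rq_model {
	const char *name;
	struct root_domain_model *rd;	/* root domain this rt_rq belongs to */
};

static struct root_domain_model *balance_domain(struct rt_rq_model *rt_rq)
{
	/*
	 * Derive the root domain from the rt_rq being balanced, not from
	 * the current CPU's runqueue (the old bug), which may sit in a
	 * different root domain entirely when the timer runs elsewhere.
	 */
	return rt_rq->rd;
}

int main(void)
{
	struct root_domain_model rd_a = { "rd-A" };
	struct rt_rq_model rq = { "rt_rq of cpu7", &rd_a };

	printf("%s borrows runtime within %s\n",
	       rq.name, balance_domain(&rq)->name);
	return 0;
}
```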
    • sched/rt: Add reschedule check to switched_from_rt() · 1158ddb5
      Committed by Kirill Tkhai
      Reschedule rq->curr if the first RT task has just been
      pulled to the rq.
      Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tkhai Kirill <tkhai@yandex.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/118761353614535@web28f.yandex.ru
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1158ddb5
  24. 13 Sep, 2012 - 1 commit
  25. 04 Sep, 2012 - 1 commit
  26. 14 Aug, 2012 - 1 commit
  27. 06 Jun, 2012 - 1 commit
  28. 30 May, 2012 - 1 commit