提交 · 8e3fabfde445a872c8aec2296846badf24d7c8b4 · openeuler / raspberrypi-kernel

13 3月, 2012 5 次提交

由 Peter Zijlstra 提交于 3月 06, 2012

Suggested-by: NJoe Perches <joe@perches.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331056466.11248.327.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

8e3fabfd

printk/sched: Introduce special printk_sched() for those awkward moments · 3ccf3e83

由 Peter Zijlstra 提交于 2月 27, 2012

There's a few awkward printk()s inside of scheduler guts that people
prefer to keep but really are rather deadlock prone. Fudge around it
by storing the text in a per-cpu buffer and poll it using the existing
printk_tick() handler.

This will drop output when its more frequent than once a tick, however
only the affinity thing could possible go that fast and for that just
one should suffice to notify the admin he's done something silly..
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-wua3lmkt3dg8nfts66o6brne@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

3ccf3e83

sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer · 554cecaf

由 Diwakar Tundlam 提交于 3月 07, 2012

The 'next_balance' field of 'nohz' idle balancer must be initialized
to jiffies. Since jiffies is initialized to negative 300 seconds the
'nohz' idle balancer does not run for the first 300s (5mins) after
bootup. If no new processes are spawed or no idle cycles happen, the
load on the cpus will remain unbalanced for that duration.
Signed-off-by: NDiwakar Tundlam <dtundlam@nvidia.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1DD7BFEDD3147247B1355BEFEFE4665237994F30EF@HQMAIL04.nvidia.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

554cecaf

sched: Cleanup cpu_active madness · 5fbd036b

由 Peter Zijlstra 提交于 12月 15, 2011

Stepan found:

CPU0		CPUn

_cpu_up()
  __cpu_up()

		boostrap()
		  notify_cpu_starting()
		  set_cpu_online()
		  while (!cpu_active())
		    cpu_relax()

<PREEMPT-out>

smp_call_function(.wait=1)
  /* we find cpu_online() is true */
  arch_send_call_function_ipi_mask()

  /* wait-forever-more */

<PREEMPT-in>
		  local_irq_enable()

  cpu_notify(CPU_ONLINE)
    sched_cpu_active()
      set_cpu_active()

Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.

On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.
Reported-by: NStepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

5fbd036b

sched: Fix load-balance wreckage · 5d6523eb

由 Peter Zijlstra 提交于 3月 10, 2012

Commit 367456c7 ("sched: Ditch per cgroup task lists for
load-balancing") completely wrecked load-balancing due to
a few silly mistakes.

Correct those and remove more pointless code.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-zk04ihygwxn7qqrlpaf73b0r@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

5d6523eb

02 3月, 2012 1 次提交

sched: Clean up parameter passing of proc_sched_autogroup_set_nice() · 2e5b5b3a

由 Hiroshi Shimamoto 提交于 2月 23, 2012

Pass nice as a value to proc_sched_autogroup_set_nice().

No side effect is expected, and the variable err will be overwritten with
the return value.
Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F45FBB7.5090607@ct.jp.nec.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

2e5b5b3a

01 3月, 2012 10 次提交

sched: Ditch per cgroup task lists for load-balancing · 367456c7

由 Peter Zijlstra 提交于 2月 20, 2012

Per cgroup load-balance has numerous problems, chief amongst them that
there is no real sane order in them. So stop pretending it makes sense
and enqueue all tasks on a single list.

This also allows us to more easily fix the fwd progress issue
uncovered by the lock-break stuff. Rotate the list on failure to
migreate and limit the total iterations to nr_running (which with
releasing the lock isn't strictly accurate but close enough).

Also add a filter that skips very light tasks on the first attempt
around the list, this attempts to avoid shooting whole cgroups around
without affecting over balance.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Link: http://lkml.kernel.org/n/tip-tx8yqydc7eimgq7i4rkc3a4g@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

367456c7

sched: Rename load-balancing fields · ddcdf6e7

由 Peter Zijlstra 提交于 2月 22, 2012

 s/env->this_/env->dst_/g
 s/env->busiest_/env->src_/g
 s/pull_task/move_task/g

Makes everything clearer.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Link: http://lkml.kernel.org/n/tip-0yvgms8t8x962drpvl0fu0kk@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

ddcdf6e7

sched: Move load-balancing arguments into helper struct · 8e45cb54

由 Peter Zijlstra 提交于 2月 22, 2012

Passing large sets of similar arguments all around the load-balancer
gets tiresom when you want to modify something. Stick them all in a
helper structure and pass the structure around.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Link: http://lkml.kernel.org/n/tip-5slqz0vhsdzewrfk9eza1aon@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

8e45cb54

sched/rt: Do not submit new work when PI-blocked · 3c7d5184

由 Thomas Gleixner 提交于 7月 17, 2011

When we are PI-blocked then we want to get things done ASAP.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-vw8et3445km5b8mpihf4trae@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

3c7d5184

sched/rt: Prevent idle task boosting · 1c4dd99b

由 Thomas Gleixner 提交于 6月 06, 2011

Idle task boosting is a nono in general. There is one
exception, when PREEMPT_RT and NOHZ is active:

The idle task calls get_next_timer_interrupt() and holds
the timer wheel base->lock on the CPU and another CPU wants
to access the timer (probably to cancel it). We can safely
ignore the boosting request, as the idle CPU runs this code
with interrupts disabled and will complete the lock
protected section without being interrupted. So there is no
real need to boost.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-755rvsosz7sdzot12a3gbha6@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

1c4dd99b

sched/wait: Add __wake_up_all_locked() API · 63b20011

由 Thomas Gleixner 提交于 12月 01, 2011

For code which protects the waitqueue itself with another lock it
makes no sense to acquire the waitqueue lock for wakeup all. Provide
__wake_up_all_locked().

This is an optimization on the vanilla kernel (to be used by the
PCI code) and an important semantic distinction on -rt.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-ux6m4b8jonb9inx8xafh77ds@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

63b20011

sched/rt: Document scheduler related skip-resched-check sites · ba74c144

由 Thomas Gleixner 提交于 3月 21, 2011

Create a distinction between scheduler related preempt_enable_no_resched()
calls and the nearly one hundred other places in the kernel that do not
want to reschedule, for one reason or another.

This distinction matters for -rt, where the scheduler and the non-scheduler
preempt models (and checks) are different. For upstream it's purely
documentational.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-gs88fvx2mdv5psnzxnv575ke@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

ba74c144

sched/rt: Add schedule_preempt_disabled() · c5491ea7

由 Thomas Gleixner 提交于 3月 21, 2011

Add helper to get rid of the ever repeating:

    preempt_enable_no_resched();
    schedule();
    preempt_disable();

patterns.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-wxx7btox7coby6ifv5vzhzgp@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

c5491ea7

sched/rt: Do not throttle when PI boosting · 7abc63b1

由 Peter Zijlstra 提交于 10月 18, 2011

When a runqueue has rt_runtime_us = 0 then the only way it can
accumulate rt_time is via PI boosting. That causes the runqueue
to be throttled and replenishing does not change anything due to
rt_runtime_us = 0. So avoid that situation by clearing rt_time and
skip the throttling alltogether.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
[ Changelog ]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-7x70cypsotjb4jvcor3edctk@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

7abc63b1

sched/rt: Keep period timer ticking when rt throttling is active · 42c62a58

由 Peter Zijlstra 提交于 10月 18, 2011

When a runqueue is throttled we cannot disable the period timer
because that timer is the only way to undo the throttling.

We got stale throttling entries when a rq was throttled and then the
global sysctl was disabled, which stopped the timer.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
[ Added changelog ]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-nuj34q52p6ro7szapuz84i0v@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

42c62a58

22 2月, 2012 3 次提交

sched: Make initial SCHED_RR timeslace DEF_TIMESLICE · de5bdff7

由 Hiroshi Shimamoto 提交于 2月 16, 2012

Current the initial SCHED_RR timeslice of init_task is HZ, which means
1s, and is not same as the default SCHED_RR timeslice DEF_TIMESLICE.

Change that initial timeslice to the DEF_TIMESLICE.
Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
[ s/DEF_TIMESLICE/RR_TIMESLICE/g ]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F3C9995.3010800@ct.jp.nec.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

de5bdff7

sched: Remove rcu_read_lock/unlock() from select_idle_sibling() · 62f6536a

由 Nikunj A. Dadhania 提交于 2月 17, 2012

select_idle_sibling() is called from select_task_rq_fair(), which
already has the RCU read lock held.
Signed-off-by: NNikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120217030409.11748.12491.stgit@abhimanyuSigned-off-by: NIngo Molnar <mingo@elte.hu>

62f6536a

sched/events: Revert trace_sched_stat_sleeptime() · 8c79a045

由 Peter Zijlstra 提交于 1月 30, 2012

Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
added a new sched:sched_stat_sleeptime tracepoint.

It's broken: the first sample we get on a task might be bad because
of a stale sleep_start value that wasn't reset at the last task switch
because the tracepoint was not active.

It also breaks the existing schedstat samples due to the side
effects of:

-               se->statistics.sleep_start = 0;
...
-               se->statistics.block_start = 0;

Nor do I see means to fix it without adding overhead to the scheduler
fast path, which I'm not willing to for the sake of redundant
instrumentation.

Most importantly, sleep time information can already be constructed
by tracing context switches and wakeups, and taking the timestamp
difference between the schedule-out, the wakeup and the schedule-in.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

8c79a045

31 1月, 2012 1 次提交

sched: Move SMP-only variable into the SMP section · ed387b78

由 Hiroshi Shimamoto 提交于 1月 31, 2012

This also fixes the following compilation warning on !SMP:

CC kernel/sched/fair.o
kernel/sched/fair.c:218:36: warning: 'max_load_balance_interval' defined but not used [-Wunused-variable]
Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F2754A0.9090306@ct.jp.nec.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

ed387b78

27 1月, 2012 7 次提交

sched: Remove sched_switch · 30fd049a

由 Rakib Mullick 提交于 1月 24, 2012

Currently we don't utilize the sched_switch field anymore.

But, simply removing sched_switch field from the middle of the
sched_stat output will break tools.

So, to stay compatible we hardcode it to zero and remove the
field from the scheduler data structures.

Update the schedstat documentation accordingly.
Signed-off-by: NRakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327422836.27181.5.camel@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@elte.hu>

30fd049a

sched: Ensure cpu_power periodic update · 4ec4412e

由 Vincent Guittot 提交于 12月 12, 2011

With a lot of small tasks, the softirq sched is nearly never called
when no_hz is enabled. In this case load_balance() is mainly called
with the newly_idle mode which doesn't update the cpu_power.

Add a next_update field which ensure a maximum update period when
there is short activity.

Having stale cpu_power information can skew the load-balancing
decisions, this is cured by the guaranteed update.
Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323717668-2143-1-git-send-email-vincent.guittot@linaro.org

4ec4412e

sched, block: Unify cache detection · 39be3501

由 Peter Zijlstra 提交于 1月 26, 2012

The block layer has some code trying to determine if two CPUs share a
cache, the scheduler has a similar function. Expose the function used
by the scheduler and make the block layer use it, thereby removing the
block layers usage of CONFIG_SCHED* and topology bits.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NJens Axboe <axboe@kernel.dk>
Link: http://lkml.kernel.org/r/1327579450.2446.95.camel@twins

39be3501

sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW · cb297a3e

由 Chanho Min 提交于 1月 05, 2012

This issue happens under the following conditions:

 1. preemption is off
 2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
 3. RT scheduling class
 4. SMP system

Sequence is as follows:

 1.suppose current task is A. start schedule()
 2.task A is enqueued pushable task at the entry of schedule()
   __schedule
    prev = rq->curr;
    ...
    put_prev_task
     put_prev_task_rt
      enqueue_pushable_task
 4.pick the task B as next task.
   next = pick_next_task(rq);
 3.rq->curr set to task B and context_switch is started.
   rq->curr = next;
 4.At the entry of context_swtich, release this cpu's rq->lock.
   context_switch
    prepare_task_switch
     prepare_lock_switch
      raw_spin_unlock_irq(&rq->lock);
 5.Shortly after rq->lock is released, interrupt is occurred and start IRQ context
 6.try_to_wake_up() which called by ISR acquires rq->lock
    try_to_wake_up
     ttwu_remote
      rq = __task_rq_lock(p)
      ttwu_do_wakeup(rq, p, wake_flags);
        task_woken_rt
 7.push_rt_task picks the task A which is enqueued before.
   task_woken_rt
    push_rt_tasks(rq)
     next_task = pick_next_pushable_task(rq)
 8.At find_lock_lowest_rq(), If double_lock_balance() returns 0,
   lowest_rq can be the remote rq.
  (But,If preemption is on, double_lock_balance always return 1 and it
   does't happen.)
   push_rt_task
    find_lock_lowest_rq
     if (double_lock_balance(rq, lowest_rq))..
 9.find_lock_lowest_rq return the available rq. task A is migrated to
   the remote cpu/rq.
   push_rt_task
    ...
    deactivate_task(rq, next_task, 0);
    set_task_cpu(next_task, lowest_rq->cpu);
    activate_task(lowest_rq, next_task, 0);
 10. But, task A is on irq context at this cpu.
     So, task A is scheduled by two cpus at the same time until restore from IRQ.
     Task A's stack is corrupted.

To fix it, don't migrate an RT task if it's still running.
Signed-off-by: NChanho Min <chanho.min@lge.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

cb297a3e

sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug · 71325960

由 Suresh Siddha 提交于 1月 19, 2012

With the recent nohz scheduler changes, rq's nohz flag
'NOHZ_TICK_STOPPED' and its associated state doesn't get cleared
immediately after the cpu exits idle. This gets cleared as part
of the next tick seen on that cpu.

For the cpu offline support, we need to clear this state
manually. Fix it by registering a cpu notifier, which clears the
nohz idle load balance state for this rq explicitly during the
CPU_DYING notification.

There won't be any nohz updates for that cpu, after the
CPU_DYING notification. But lets be extra paranoid and skip
updating the nohz state in the select_nohz_load_balancer() if
the cpu is not in active state anymore.
Reported-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-and-tested-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327026538.16150.40.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

71325960

sched/s390: Fix compile error in sched/core.c · db7e527d

由 Christian Borntraeger 提交于 1月 11, 2012

Commit 029632fb ("sched: Make
separate sched*.c translation units") removed the include of
asm/mutex.h from sched.c.

This breaks the combination of:

 CONFIG_MUTEX_SPIN_ON_OWNER=yes
 CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX=yes

like s390 without mutex debugging:

  CC      kernel/sched/core.o
  kernel/sched/core.c: In function ‘mutex_spin_on_owner’:
  kernel/sched/core.c:3287: error: implicit declaration of function ‘arch_mutex_cpu_relax’

Lets re-add the include to kernel/sched/core.c
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1326268696-30904-1-git-send-email-borntraeger@de.ibm.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

db7e527d

sched: Fix rq->nr_uninterruptible update race · 4ca9b72b

由 Peter Zijlstra 提交于 1月 25, 2012

KOSAKI Motohiro noticed the following race:

 > CPU0                    CPU1
 > --------------------------------------------------------
 > deactivate_task()
 >                         task->state = TASK_UNINTERRUPTIBLE;
 > activate_task()
 >    rq->nr_uninterruptible--;
 >
 >                         schedule()
 >                           deactivate_task()
 >                             rq->nr_uninterruptible++;
 >

Kosaki-San's scenario is possible when CPU0 runs
__sched_setscheduler() against CPU1's current @task.

__sched_setscheduler() does a dequeue/enqueue in order to move
the task to its new queue (position) to reflect the newly provided
scheduling parameters. However it should be completely invariant to
nr_uninterruptible accounting, sched_setscheduler() doesn't affect
readyness to run, merely policy on when to run.

So convert the inappropriate activate/deactivate_task usage to
enqueue/dequeue_task, which avoids the nr_uninterruptible accounting.

Also convert the two other sites: __migrate_task() and
normalize_task() that still use activate/deactivate_task. These sites
aren't really a problem since __migrate_task() will only be called on
non-running task (and therefore are immume to the described problem)
and normalize_task() isn't ever used on regular systems.

Also remove the comments from activate/deactivate_task since they're
misleading at best.
Reported-by: NKOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327486224.2614.45.camel@laptopSigned-off-by: NIngo Molnar <mingo@elte.hu>

4ca9b72b

24 1月, 2012 1 次提交

kernel-doc: fix kernel-doc warnings in sched · fa757281

由 Randy Dunlap 提交于 1月 21, 2012

Fix new kernel-doc notation warnings:

Warning(include/linux/sched.h:2094): No description found for parameter 'p'
Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task'
Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri'
Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set'
Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa757281

12 1月, 2012 1 次提交

sched: Fix lockup by limiting load-balance retries on lock-break · bced76ae

由 Peter Zijlstra 提交于 1月 11, 2012

Eric and David reported dead machines and traced it to commit
a195f004 ("sched: Fix load-balance lock-breaking"), it turns out
there's still a scenario where we can end up re-trying forever.

Since there is no strict forward progress guarantee in the
load-balance iteration we can get stuck re-retrying the same
task-set over and over.

Creating a forward progress guarantee with the existing
structure is somewhat non-trivial, for now simply terminate the
retry loop after a few tries.
Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
Reported-by: NDavid Ahern <dsahern@gmail.com>
[ logic cleanup as suggested by Eric ]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

bced76ae

10 1月, 2012 1 次提交

sched: Remove empty #ifdefs · 9b9fb610

由 Hiroshi Shimamoto 提交于 1月 10, 2012

Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F0B8525.8070901@ct.jp.nec.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

9b9fb610

24 12月, 2011 1 次提交

sched/tracing: Add a new tracepoint for sleeptime · 1ac9bc69

由 Arun Sharma 提交于 12月 21, 2011

If CONFIG_SCHEDSTATS is defined, the kernel maintains
information about how long the task was sleeping or
in the case of iowait, blocking in the kernel before
getting woken up.

This will be useful for sleep time profiling.

Note: this information is only provided for sched_fair.
Other scheduling classes may choose to provide this in
the future.

Note: the delay includes the time spent on the runqueue
as well.
Signed-off-by: NArun Sharma <asharma@fb.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

1ac9bc69

23 12月, 2011 1 次提交

sched: Disable scheduler warnings during oopses · 664dfa65

由 Dave Jones 提交于 12月 22, 2011

The panic-on-framebuffer code seems to cause a schedule
to occur during an oops. This causes a bunch of extra
spew as can be seen in:

   https://bugzilla.redhat.com/attachment.cgi?id=549230

Don't do scheduler debug checks when we are oopsing already.
Signed-off-by: NDave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/20111222213929.GA4722@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

664dfa65

21 12月, 2011 7 次提交

sched: Fix cgroup movement of waking process · 62af3783

由 Daisuke Nishimura 提交于 12月 15, 2011

There is a small race between try_to_wake_up() and sched_move_task(),
which is trying to move the process being woken up.

    try_to_wake_up() on CPU0       sched_move_task() on CPU1
--------------------------------+---------------------------------
  raw_spin_lock_irqsave(p->pi_lock)
  task_waking_fair()
    ->p.se.vruntime -= cfs_rq->min_vruntime
  ttwu_queue()
    ->send reschedule IPI to CPU1
  raw_spin_unlock_irqsave(p->pi_lock)
                                   task_rq_lock()
                                     -> tring to aquire both p->pi_lock and
                                        rq->lock with IRQ disabled
                                   task_move_group_fair()
                                     -> p.se.vruntime
                                          -= (old)cfs_rq->min_vruntime
                                          += (new)cfs_rq->min_vruntime
                                   task_rq_unlock()

                                   (via IPI)
                                   sched_ttwu_pending()
                                     raw_spin_lock(rq->lock)
                                     ttwu_do_activate()
                                       ...
                                       enqueue_entity()
                                         child.se->vruntime += cfs_rq->min_vruntime
                                     raw_spin_unlock(rq->lock)

As a result, vruntime of the process becomes far bigger than min_vruntime,
if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.

This patch fixes this problem by just ignoring such process in
task_move_group_fair(), because the vruntime has already been normalized in
task_waking_fair().
Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20111215143741.df82dd50.nishimura@mxp.nes.nec.co.jpSigned-off-by: NIngo Molnar <mingo@elte.hu>

62af3783

sched: Fix cgroup movement of newly created process · 7ceff013

由 Daisuke Nishimura 提交于 12月 15, 2011

There is a small race between do_fork() and sched_move_task(), which is
trying to move the child.

            do_fork()                 sched_move_task()
--------------------------------+---------------------------------
  copy_process()
    sched_fork()
      task_fork_fair()
        -> vruntime of the child is initialized
           based on that of the parent.
  -> we can see the child in "tasks" file now.
                                    task_rq_lock()
                                    task_move_group_fair()
                                      -> child.se.vruntime
                                           -= (old)cfs_rq->min_vruntime
                                           += (new)cfs_rq->min_vruntime
                                    task_rq_unlock()
  wake_up_new_task()
    ...
    enqueue_entity()
      child.se.vruntime += cfs_rq->min_vruntime

As a result, vruntime of the child becomes far bigger than min_vruntime,
if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.

This patch fixes this problem by just ignoring such process in
task_move_group_fair(), because the vruntime has already been normalized in
task_fork_fair().
Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20111215143607.2ee12c5d.nishimura@mxp.nes.nec.co.jpSigned-off-by: NIngo Molnar <mingo@elte.hu>

7ceff013

sched: Fix cgroup movement of forking process · 4fc420c9

由 Daisuke Nishimura 提交于 12月 15, 2011

There is a small race between task_fork_fair() and sched_move_task(),
which is trying to move the parent.

        task_fork_fair()                 sched_move_task()
--------------------------------+---------------------------------
  cfs_rq = task_cfs_rq(current)
    -> cfs_rq is the "old" one.
  curr = cfs_rq->curr
    -> curr is set to the parent.
                                    task_rq_lock()
                                    dequeue_task()
                                      ->parent.se.vruntime -= (old)cfs_rq->min_vruntime
                                    enqueue_task()
                                      ->parent.se.vruntime += (new)cfs_rq->min_vruntime
                                    task_rq_unlock()
  raw_spin_lock_irqsave(rq->lock)
  se->vruntime = curr->vruntime
    -> vruntime of the child is set to that of the parent
       which has already been updated by sched_move_task().
  se->vruntime -= (old)cfs_rq->min_vruntime.
  raw_spin_unlock_irqrestore(rq->lock)

As a result, vruntime of the child becomes far bigger than expected,
if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.

This patch fixes this problem by setting "cfs_rq" and "curr" after
holding the rq->lock.
Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20111215143655.662676b0.nishimura@mxp.nes.nec.co.jpSigned-off-by: NIngo Molnar <mingo@elte.hu>

4fc420c9

sched: Remove cfs bandwidth period check in tg_set_cfs_period() · 11534ec5

由 Kamalesh Babulal 提交于 12月 10, 2011

Remove cfs bandwidth period check from tg_set_cfs_period.
Invalid bandwidth period's lower/upper limits are denoted
by min_cfs_quota_period/max_cfs_quota_period repsectively,
and are checked against valid period in tg_set_cfs_bandwidth().

As pjt pointed out, negative input will result in very large unsigned
numbers and will be caught by the max allowed period test.
Signed-off-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: NPaul Turner <pjt@google.com>
[ammended changelog to mention negative values]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111210135925.GA14593@linux.vnet.ibm.com
--
 kernel/sched/core.c |    3 ---
 1 file changed, 3 deletions(-)
Signed-off-by: NIngo Molnar <mingo@elte.hu>

11534ec5

sched: Fix load-balance lock-breaking · a195f004

由 Peter Zijlstra 提交于 9月 22, 2011

The current lock break relies on contention on the rq locks, something
which might never come because we've got IRQs disabled. Or will be
very likely because on anything with more than 2 cpus a synchronized
load-balance pass will very likely cause contention on the rq locks.

Also the sched_nr_migrate thing fails when it gets trapped the loops
of either the cgroup muck in load_balance_fair() or the move_tasks()
load condition.

Instead, use the new lb_flags field to propagate break/abort
conditions for all these loops and create a new loop outside the irq
disabled on the break being required.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-tsceb6w61q0gakmsccix6xxi@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

a195f004

sched: Replace all_pinned with a generic flags field · 5b54b56b

由 Peter Zijlstra 提交于 9月 22, 2011

Replace the all_pinned argument with a flags field so that we can add
some extra controls throughout that entire call chain.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-33kevm71m924ok1gpxd720v3@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

5b54b56b

sched: Only queue remote wakeups when crossing cache boundaries · 518cd623

由 Peter Zijlstra 提交于 12月 07, 2011

Mike reported a 13% drop in netperf TCP_RR performance due to the
new remote wakeup code. Suresh too noticed some performance issues
with it.

Reducing the IPIs to only cross cache domains solves the observed
performance issues.
Reported-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Reported-by: NMike Galbraith <efault@gmx.de>
Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Dave Kleikamp <dave.kleikamp@oracle.com>
Link: http://lkml.kernel.org/r/1323338531.17673.7.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

518cd623

16 12月, 2011 1 次提交

sched: Add missing rcu_dereference() around ->real_parent usage · 07cde260

由 Kees Cook 提交于 12月 15, 2011

Wrap another ->real_parent dereference while under rcu_read_lock.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20111215164918.GA13003@www.outflux.net
[ tidied up the changelog ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

07cde260