- 12 March 2010, 5 commits
-
-
Committed by Mike Galbraith
Sync wakeups are critical functionality with a long history. Remove it, we don't need the branch or icache footprint.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301817.6785.47.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Mike Galbraith
Now that we no longer depend on the clock being updated prior to enqueueing on migratory wakeup, we can clean up a bit, placing calls to update_rq_clock() exactly where they are needed, i.e. on enqueue, dequeue and schedule events. In the case of a freshly enqueued task immediately preempting, we can skip the update during preemption, as the clock was just updated by the enqueue event. We also save an unneeded call during a migratory wakeup by not updating the previous runqueue, where update_curr() won't be invoked.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301199.6785.32.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Mike Galbraith
Both avg_overlap and avg_wakeup had an inherent problem in that their accuracy was detrimentally affected by cross-cpu wakeups, because we are missing the necessary call to update_curr(). This can't be fixed without increasing overhead in our already too fat fastpath. Additionally, with recent load balancing changes making us prefer to place tasks in an idle cache domain (which is good for compute bound loads), communicating tasks suffer when a sync wakeup, which would enable affine placement, is turned into a non-sync wakeup by SYNC_LESS. With one task on the runqueue, wake_affine() rejects the affine wakeup request, leaving the unfortunate task where it was placed, taking frequent cache misses. Remove it, and recover some fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301121.6785.30.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Mike Galbraith
Testing the load which led to this heuristic (nfs4 kbuild) shows that it has outlived its usefulness. With intervening load balancing changes, I cannot see any difference with/without it, so recover those fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301062.6785.29.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Mike Galbraith
Entering nohz code on every micro-idle is costing ~10% throughput for netperf TCP_RR when scheduling cross-cpu. Rate limiting entry fixes this, but raises ticks a bit. On my Q6600, an idle box goes from ~85 interrupts/sec to 128. The higher the context switch rate, the more nohz entry costs. With this patch and some cycle recovery patches in my tree, max cross-cpu context switch rate is improved by ~16%, a large portion of which is this ratelimiting.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301003.6785.28.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
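A minimal sketch of the rate-limiting idea described above, with invented names and simplified types (the real kernel keeps the stamp in struct rq and uses its own clock/tick constants):

```c
#include <stdint.h>

/*
 * Illustrative only: remember when this cpu last tried to enter nohz and
 * skip the (expensive) nohz machinery if that was less than one tick ago.
 */
struct nohz_state {
	uint64_t stamp_ns;	/* clock value at the previous nohz attempt */
};

static int nohz_ratelimited(struct nohz_state *ns, uint64_t now_ns,
			    uint64_t tick_ns)
{
	uint64_t delta = now_ns - ns->stamp_ns;

	ns->stamp_ns = now_ns;
	return delta < tick_ns;	/* non-zero => skip entering nohz this time */
}
```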
-
- 11 March 2010, 2 commits
-
-
Committed by Lucas De Marchi
Put all statistic fields of sched_entity in one struct, sched_statistics, and embed it into sched_entity. This change allows the sched_statistics to be memset to 0 when needed (for instance when forking), avoiding bugs caused by uninitialized fields.

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268275065-18542-1-git-send-email-lucas.de.marchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
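A simplified sketch of the idea; the field list is abbreviated and the helper name is made up, but it shows why grouping the statistics lets fork-time code clear them in one call:

```c
#include <string.h>

/* Abbreviated sketch; the real sched_statistics has many more fields. */
struct sched_statistics {
	unsigned long long wait_start;
	unsigned long long wait_max;
	unsigned long long sleep_start;
	unsigned long long block_start;
};

struct sched_entity {
	unsigned long load_weight;		/* stand-in for other state */
	struct sched_statistics statistics;	/* embedded, not a pointer */
};

/* At fork time, clearing every statistic is now a single memset: */
static void init_entity_stats(struct sched_entity *se)
{
	memset(&se->statistics, 0, sizeof(se->statistics));
}
```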
-
Committed by Dan Carpenter
We haven't used the "orig_rq" variable since 055a0086 "Fix/add missing update_rq_clock() calls".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: efault@gmx.de
LKML-Reference: <20100306111752.GL4958@bicker>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 25 February 2010, 2 commits
-
-
Committed by Paul E. McKenney
As suggested by Peter Zijlstra, give for_each_domain_rd() a better name containing "rcu_dereference", given that it is but a wrapper for rcu_dereference_check(). The name rcu_dereference_check_sched_domain() does that and provides a separate per-subsystem name space.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-7-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Paul E. McKenney
Update the rcu_dereference() usages to take advantage of the new lockdep-based checking.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-6-git-send-email-paulmck@linux.vnet.ibm.com>
[ -v2: fix allmodconfig missing symbol export build failure on x86 ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 17 February 2010, 1 commit
-
-
Committed by Thomas Gleixner
setscheduler() saves task->sched_class outside of the rq->lock held region for a check after the setscheduler changes have become effective. That might result in checking a stale value. rtmutex_setprio() has the same problem, though it is protected by p->pi_lock against setscheduler(), but for correctness' sake (and to avoid bad examples) it needs to be fixed as well. Retrieve task->sched_class inside the rq->lock held region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
-
- 16 February 2010, 2 commits
-
-
Committed by Peter Zijlstra
Thomas found that due to ttwu() changing a task's cpu without holding the rq->lock, task_rq_lock() might end up locking the wrong rq. Avoid this by serializing against TASK_WAKING.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266241712.15770.420.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Suresh Siddha
Fix an SMT scheduler performance regression that leads to a scenario where the SMT threads in one core are completely idle while both SMT threads in another core (on the same socket) are busy.

This is caused by this commit (with the problematic code highlighted):

    commit bdb94aa5
    Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Date:   Tue Sep 1 10:34:38 2009 +0200

        sched: Try to deal with low capacity

    @@ -4203,15 +4223,18 @@ find_busiest_queue()
     ...
        for_each_cpu(i, sched_group_cpus(group)) {
    +       unsigned long power = power_of(i);
     ...
    -       wl = weighted_cpuload(i);
    +       wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
    +       wl /= power;

    -       if (rq->nr_running == 1 && wl > imbalance)
    +       if (capacity && rq->nr_running == 1 && wl > imbalance)
                continue;

On an SMT system, the power of an HT logical cpu will be 589 and the scheduler load imbalance (for scenarios like the one mentioned above) can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling the weighted load with the power results in "wl > imbalance" and ultimately in find_busiest_queue() returning NULL, causing load_balance() to think that the load is well balanced. But in fact one of the tasks can be moved to the idle core for optimal performance.

We don't need to use the weighted load (wl) scaled by the cpu power to compare with imbalance. In that condition, we already know there is only a single task ("rq->nr_running == 1") and the comparison between imbalance and wl is to make sure that we select the correct priority thread which matches imbalance. So we really need to compare the imbalance with the original weighted load of the cpu and not the scaled load.

But in other conditions, where we want the most hammered (busiest) cpu, we can use the scaled load to ensure that we consider the cpu power in addition to the actual load on that cpu, so that we can move the load away from the guy that is getting most hammered with respect to the actual capacity, as compared with the rest of the cpus in that busiest group. Fix it.

Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
Cc: stable@kernel.org [2.6.32.x]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
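To make the failure concrete with the numbers from the report (this is only an illustration of the arithmetic, not additional measurement): with power = 589 and a single task contributing a weighted load of 1024, the scaled value is wl = 1024 * 1024 / 589 ≈ 1780, so "wl > imbalance" (1780 > ~1024) holds and the only candidate queue is skipped. Comparing the unscaled load instead gives 1024 > ~1024, which is false, so the queue is correctly considered for balancing.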
-
- 08 February 2010, 2 commits
-
-
Committed by Anton Blanchard
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are enabled we can call cpuacct_update_stats with values much larger than percpu_counter_batch. This means the call to percpu_counter_add will always add to the global count, which is protected by a spinlock, and we end up with a global spinlock in the scheduler.

Based on an idea by KOSAKI Motohiro, this patch scales the batch value by cputime_one_jiffy such that we have the same batch limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled. His patch did this once at boot, but that initialisation happened too early on PowerPC (before time_init) and it was never updated at runtime as a result of a hotplug cpu add/remove. This patch instead scales percpu_counter_batch by cputime_one_jiffy at runtime, which keeps the batch correct even after cpu hotplug operations. We cap it at INT_MAX in case of overflow.

For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1 and gcc is smart enough to optimise min(s32 percpu_counter_batch, INT_MAX) to just percpu_counter_batch, at least on x86 and PowerPC. So there is no need to add an #ifdef.

On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT disabled kernel:

    CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
    CONFIG_CGROUP_CPUACCT enabled:     61720 ctx switches/sec
    CONFIG_CGROUP_CPUACCT + patch:  16663217 ctx switches/sec

Tested with:

    wget http://ozlabs.org/~anton/junkcode/context_switch.c
    make context_switch
    for i in `seq 0 63`; do taskset -c $i ./context_switch & done
    vmstat 1

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
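A rough sketch of the batch computation described above; the helper is invented for illustration and is not the exact kernel expression:

```c
#include <limits.h>

/*
 * Illustration of the scaling: multiply the generic per-cpu counter batch
 * by the cputime value of one jiffy, capping at INT_MAX so the result
 * still fits an s32 batch argument.  With CONFIG_VIRT_CPU_ACCOUNTING
 * disabled, cputime_one_jiffy is 1 and this degenerates to the plain
 * percpu_counter_batch.
 */
static int cpuacct_batch(int percpu_counter_batch, long cputime_one_jiffy)
{
	long long batch = (long long)percpu_counter_batch * cputime_one_jiffy;

	return batch < INT_MAX ? (int)batch : INT_MAX;
}
```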
-
Committed by Andrew Morton
On UP:

    kernel/sched.c: In function 'wake_up_new_task':
    kernel/sched.c:2631: warning: unused variable 'cpu'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 04 February 2010, 1 commit
-
-
Committed by Yong Zhang
It's a duplicate of tg->rt_se[cpu] and its only usage is in sched_rt_rq_dequeue() and sched_rt_rq_enqueue(). After the first patch to those two functions, rt_se can be removed.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001282258q38781619u653ca4a7dd267347@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 02 February 2010, 1 commit
-
-
Committed by Peter Zijlstra
Commit f492e12e ("sched: Remove load_balance_newidle()") removed the only user of this function, so remove it too.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1265019219.24455.128.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 23 January 2010, 2 commits
-
-
Committed by Thomas Gleixner
rtmutex_set_prio() is used to implement priority inheritance for futexes. When a task is deboosted it gets enqueued at the tail of its RT priority list. This violates the POSIX scheduling semantics:

    rt priority list X contains two runnable tasks A and B

    task A runs with priority X and holds mutex M
    task C preempts A and is blocked on mutex M
        -> task A is boosted to priority of task C (Y)

    task A unlocks the mutex M and deboosts itself
        -> A is dequeued from rt priority list Y
        -> A is enqueued to the tail of rt priority list X

    task C schedules away

    task B runs

This is wrong, as task A did not schedule away and it therefore violates the POSIX scheduling semantics. Enqueue the task to the head of the priority list instead.

Reported-by: Mathias Weber <mathias.weber.mw1@roche.com>
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.809074113@linutronix.de>
-
Committed by Thomas Gleixner
The ability to enqueue a task to the head of a SCHED_FIFO priority list is required to fix some violations of POSIX scheduling policy. Extend the related functions with a "head" argument.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.734886007@linutronix.de>
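A minimal sketch of what such a "head" argument enables, using hypothetical struct names on top of the kernel's list helpers (the real enqueue paths carry priority arrays and more state):

```c
#include <linux/list.h>

/* Hypothetical types for illustration only. */
struct rt_prio_list { struct list_head queue; };
struct rt_task      { struct list_head node;  };

/*
 * Enqueue either at the head (preserving the task's place in line, as the
 * deboost case above requires) or at the tail (normal behaviour).
 */
static void enqueue_rt_task(struct rt_prio_list *plist, struct rt_task *t,
			    int head)
{
	if (head)
		list_add(&t->node, &plist->queue);
	else
		list_add_tail(&t->node, &plist->queue);
}
```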
-
- 22 January 2010, 1 commit
-
-
Committed by Peter Zijlstra
There are a number of issues:

1) TASK_WAKING vs cgroup_clone (cpusets)

    copy_process():
      sched_fork()
        child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
      if (current->nsproxy != p->nsproxy)
        ns_cgroup_clone()
          cgroup_clone()
            mutex_lock(inode->i_mutex)
            mutex_lock(cgroup_mutex)
            cgroup_attach_task()
              ss->can_attach()
              ss->attach() [ -> cpuset_attach() ]
                cpuset_attach_task()
                  set_cpus_allowed_ptr();
                    while (child->state == TASK_WAKING)
                      cpu_relax();

will deadlock the system.

2) cgroup_clone (cpusets) vs copy_process

So even if the above would work we still have:

    copy_process():
      if (current->nsproxy != p->nsproxy)
        ns_cgroup_clone()
          cgroup_clone()
            mutex_lock(inode->i_mutex)
            mutex_lock(cgroup_mutex)
            cgroup_attach_task()
              ss->can_attach()
              ss->attach() [ -> cpuset_attach() ]
                cpuset_attach_task()
                  set_cpus_allowed_ptr();
      ...
      p->cpus_allowed = current->cpus_allowed

over-writing the modified cpus_allowed.

3) fork() vs hotplug

If we unplug the child's cpu after the sanity check, when the child gets attached to the task_list but before wake_up_new_task(), shit will meet with fan.

Solve all these issues by moving fork cpu selection into wake_up_new_task().

Reported-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264106190.4283.1314.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 21 January 2010, 4 commits
-
-
Committed by Dhaval Giani
Remove the USER_SCHED feature. It has been scheduled to be removed in 2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2

Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263990378.24844.3.camel@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Take out the sched_class methods for load-balancing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Straightforward code movement. Since none of the load-balance abstractions are used anymore, do away with them and simplify the code some. In preparation, move the code around.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Yong Zhang
Assume an A->B schedule is in progress. If B has acquired the BKL earlier and needs to reschedule this time, then in B's context it will go to need_resched_nonpreemptible for the reschedule. But at that point, prev and switch_count still refer to A. That is wrong and leads to incorrect scheduler statistics.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001102238w7b0ddcadref00d345e2181d11@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 13 January 2010, 1 commit
-
-
Committed by Jamie Iles
perf_event_task_sched_in() expects interrupts to be disabled, but on architectures with __ARCH_WANT_INTERRUPTS_ON_CTXSW defined this isn't true. If it is defined, disable irqs around the call in finish_task_switch().

Signed-off-by: Jamie Iles <jamie.iles@picochip.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
LKML-Reference: <1262964453-27370-1-git-send-email-jamie.iles@picochip.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
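Conceptually, the guard looks something like the sketch below; this illustrates the described fix rather than the literal hunk in finish_task_switch(), and the helper name is invented:

```c
/*
 * Illustrative sketch: on architectures that context-switch with
 * interrupts enabled (__ARCH_WANT_INTERRUPTS_ON_CTXSW), disable irqs
 * just around the perf hook, which expects them to be off.
 */
static inline void perf_sched_in_irqsafe(struct task_struct *curr)
{
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
	local_irq_disable();
#endif
	perf_event_task_sched_in(curr);
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
	local_irq_enable();
#endif
}
```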
-
- 28 December 2009, 2 commits
-
-
Committed by Simon Kagstrom
Fixes a warning when building with g++:

    warning: deprecated conversion from string constant to 'char*'

The file parameter is only ever used as a constant, so mark it as such.

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Cc: peterz@infradead.org
LKML-Reference: <20091223110818.442d848e@marrow.netinsight.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Since we only ever schedule the local cpu, there is no need to pass the cpu number to the perf sched hooks. This micro-optimizes things a bit.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 23 December 2009, 1 commit
-
-
Committed by Peter Zijlstra
Effectively reverts 738d2be4. As demonstrated by Eric, we really need to call __set_task_cpu() early in the fork() path to properly initialize the various task state -- specifically the cgroup state, through set_task_rq().

[ We could probably fix this by explicitly calling __set_task_cpu() from sched_fork(), but let's try that for the next cycle and simply revert to the old behaviour for now. ]

Reported-by: Eric Paris <eparis@redhat.com>
Tested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: efault@gmx.de
LKML-Reference: <1261492999.4937.36.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 21 December 2009, 2 commits
-
-
Committed by Peter Zijlstra
The hot-unplug kstopmachine usage does a wakeup after deactivating the cpu, hence we cannot use cpu_active() here but must rely on the good old online check.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <1261326987.4314.24.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Revert the braindead pr_* crap (commit 663997d4 "sched: Use pr_fmt() and pr_<level>()"). It's dumb and causes stupid "sched: " strings all over the place.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Mike Galbraith <efault@gmx.de>
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1261315437.4314.6.camel@laptop>
[ I don't mind the pr_*() patterns that much - but Peter dislikes them with a vengeance. ]
[ -v2: remove spurious diffstat from changelog :-/ ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 17 December 2009, 11 commits
-
-
Committed by Peter Zijlstra
There's a preemption race in the set_task_cpu() debug check in that when we get preempted after setting task->state we'd still be on the rq proper, but fail the test. Check for preempted tasks, since those are always on the rq.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20091217121830.137155561@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Frederic Weisbecker
In practice, it is harmless to voluntarily sleep in a rcu_read_lock() section if we are running under preemptible rcu, but it is illegal if we build a kernel running non-preemptible rcu. Currently, might_sleep() doesn't notice sleepable operations inside rcu_read_lock() sections if we are running under preemptible rcu, because preempt_count() is left untouched after rcu_read_lock() in that case. But we want developers who test their changes under such a config to notice the "sleeping while atomic" issues. So we add rcu_read_lock_nesting to the preempt_count() value used in the might_sleep() checks.

[ v2: Handle rcu-tiny ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1260991265-8451-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
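A simplified sketch of the resulting check; the helper name is invented and the real kernel condition differs in detail, but it shows the RCU nesting depth being folded into the atomicity test:

```c
/*
 * Sketch: treat the RCU read-side nesting depth (rcu_read_lock_nesting,
 * exposed here as rcu_preempt_depth()) as part of the "atomic" count
 * that might_sleep() compares against the expected preemption offset.
 * Non-zero return means "complain".
 */
static inline int might_sleep_atomic(int preempt_offset)
{
	int nested = preempt_count() + rcu_preempt_depth();

	return nested != preempt_offset;
}
```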
-
Committed by Ingo Molnar
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.807938893@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Rearrange the code a bit now that it is a simpler function.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.269101883@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
In order to remove the cfs_rq dependency from set_task_cpu() we need to ensure the task is cfs_rq invariant for all callsites.

The simple approach is to subtract cfs_rq->min_vruntime from se->vruntime on dequeue, and add cfs_rq->min_vruntime on enqueue. However, this has the downside of breaking FAIR_SLEEPERS since we lose the old vruntime as we only maintain the relative position.

To solve this, we observe that we only migrate runnable tasks; we do this using deactivate_task(.sleep=0) and activate_task(.wakeup=0), therefore we can restrain the min_vruntime invariance to that state. The only other case is wakeup balancing: since we want to maintain the old vruntime we cannot make it relative on dequeue, but since we don't migrate inactive tasks, we can do so right before we activate it again. This is where we need the new pre-wakeup hook; we need to call this while still holding the old rq->lock. We could fold it into ->select_task_rq(), but since that has multiple callsites and would obfuscate the locking requirements, that seems like a fudge.

This leaves the fork() case: simply make sure that ->task_fork() leaves the ->vruntime in a relative state.

This covers all cases where set_task_cpu() gets called, and ensures it sees a relative vruntime.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.191697025@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
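The min_vruntime bookkeeping described in the second paragraph can be sketched as follows; the structs are reduced to the bare minimum and the helper names are illustrative only:

```c
/* Reduced sketch of the "relative vruntime across migration" idea. */
struct cfs_rq_min       { unsigned long long min_vruntime; };
struct sched_entity_min { unsigned long long vruntime;     };

/* dequeue for migration: store vruntime relative to the old queue */
static void dequeue_for_migration(struct sched_entity_min *se,
				  struct cfs_rq_min *old_rq)
{
	se->vruntime -= old_rq->min_vruntime;
}

/* enqueue on the new cpu: rebase onto the new queue's min_vruntime */
static void enqueue_after_migration(struct sched_entity_min *se,
				    struct cfs_rq_min *new_rq)
{
	se->vruntime += new_rq->min_vruntime;
}
```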
-
Committed by Peter Zijlstra
As will be apparent in the next patch, we need a pre-wakeup hook for sched_fair task migration, hence rename the post-wakeup hook and add a pre-wakeup one.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.114746117@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Since kthread_bind() lost its dependencies on sched.c, move it back where it came from.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.039524041@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Since select_task_rq() is now responsible for guaranteeing ->cpus_allowed and cpu_active_mask, we need to verify this. select_task_rq_rt() can blindly return smp_processor_id()/task_cpu() without checking the valid masks, and select_task_rq_fair() can do the same in the rare case that all SD_ flags are disabled.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.961475466@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
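A sketch of the validation described above, modelled on the description rather than the literal patch; the wrapper name is invented and select_fallback_rq() is assumed here as the recovery helper:

```c
/*
 * Sketch only: validate the cpu returned by the class-specific
 * ->select_task_rq() against the task's affinity mask and the set of
 * active cpus, falling back when the class returned something stale.
 */
static int checked_select_task_rq(struct task_struct *p, int sd_flags,
				  int wake_flags)
{
	int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);

	if (!cpumask_test_cpu(cpu, &p->cpus_allowed) || !cpu_active(cpu))
		cpu = select_fallback_rq(task_cpu(p), p);

	return cpu;
}
```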
-
Committed by Peter Zijlstra
Since we access ->cpus_allowed without holding rq->lock we need a retry loop to validate the result; this comes for near free when we merge sched_migrate_task() into sched_exec(), since that already does the needed check.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.884743662@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
In order to clean up the set_task_cpu() rq dependencies we need to ensure it is never called on blocked tasks, because such usage does not pair with consistent rq->lock usage. This puts the migration burden on ttwu().

Furthermore, we need to close a race against changing ->cpus_allowed, since select_task_rq() runs with only preemption disabled. For sched_fork() this is safe because the child isn't in the tasklist yet; for wakeup we fix this by synchronizing set_cpus_allowed_ptr() against TASK_WAKING, which leaves sched_exec() to be a problem.

This also closes a hole in (6ad4c188 "sched: Fix balance vs hotplug race") where ->select_task_rq() doesn't validate the result against the sched_domain/root_domain.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.807938893@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
For later convenience, use TASK_WAKING for fresh tasks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.732561278@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-