1. 25 Nov 2008 (10 commits)
  2. 23 Nov 2008 (2 commits)
  3. 21 Nov 2008 (1 commit)
    • sched: update comment for move_task_off_dead_cpu · 957ad016
      Authored by Vegard Nossum
      Impact: cleanup
      
      This commit:
      
      commit f7b4cddc
      Author: Oleg Nesterov <oleg@tv-sign.ru>
      Date:   Tue Oct 16 23:30:56 2007 -0700
      
          do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
      
          Currently move_task_off_dead_cpu() is called under
          write_lock_irq(tasklist).  This means it can't use task_lock(), which is
          needed to improve migration so that it takes the task's ->cpuset into account.
      
          Change the code to call move_task_off_dead_cpu() with irqs enabled, and
          change migrate_live_tasks() to use read_lock(tasklist).
      
      ...forgot to update the comment in front of move_task_off_dead_cpu.
      
      Reference: http://lkml.org/lkml/2008/6/23/135
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      957ad016
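      A minimal sketch of the locking pattern this fix documents, condensed from
      the CPU-hotplug path of kernel/sched.c of that era (the helper names are
      real, but the body here is illustrative, not the exact kernel code):

          static void migrate_live_tasks_sketch(int dead_cpu)
          {
                  struct task_struct *g, *t;

                  /* read_lock with irqs enabled is sufficient here (it used to be
                   * write_lock_irq); this lets the per-task migration path take
                   * task_lock() and honour the task's ->cpuset. */
                  read_lock(&tasklist_lock);
                  do_each_thread(g, t) {
                          if (t == current)
                                  continue;
                          if (task_cpu(t) == dead_cpu)
                                  move_task_off_dead_cpu(dead_cpu, t);
                  } while_each_thread(g, t);
                  read_unlock(&tasklist_lock);
          }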
  4. 20 Nov 2008 (1 commit)
    • sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares · ec4e0e2f
      Authored by Ken Chen
      Impact: make load-balancing more consistent
      
      In the update_shares() path leading to tg_shares_up(), the calculation of
      per-cpu cfs_rq shares is rather erratic even under a moderate task wake-up
      rate.  The problem is that the per-cpu tg->cfs_rq load weight used in the
      sd_rq_weight aggregation and the actual redistribution of the cfs_rq->shares
      are collected at different times.  Under moderate system load, we've seen
      quite a bit of variation in cfs_rq->shares, which ultimately has a wild
      effect on each sched_entity's load weight.
      
      This patch caches the initial per-cpu load weight while doing the sum
      calculation, and then passes it down to update_group_shares_cpu() for
      redistributing the per-cpu cfs_rq shares.  This keeps the total cfs_rq
      shares consistent across all CPUs.  It also simplifies the rounding and
      the zero-load-weight check.
      Signed-off-by: Ken Chen <kenchen@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ec4e0e2f
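      A condensed sketch of the approach as I read it: the weight that goes into
      the sum is cached (here in a cfs_rq field) and the redistribution reuses
      that cached snapshot instead of re-reading the live load weight, so the two
      phases cannot disagree.  Field and helper names follow kernel/sched.c of
      that era, but the body is simplified:

          static void tg_shares_up_sketch(struct task_group *tg, struct sched_domain *sd)
          {
                  unsigned long weight, rq_weight = 0, shares = 0;
                  int i;

                  for_each_cpu_mask(i, sd->span) {
                          weight = tg->cfs_rq[i]->load.weight;
                          /* pretend an idle cpu carries one task of average load,
                           * so a task waking up there is not starved */
                          if (!weight)
                                  weight = NICE_0_LOAD;

                          tg->cfs_rq[i]->rq_weight = weight;  /* cache what we sum */
                          rq_weight += weight;
                          shares += tg->cfs_rq[i]->shares;
                  }

                  if ((!shares && rq_weight) || shares > tg->shares)
                          shares = tg->shares;

                  /* redistribute using the cached snapshot, not a fresh read */
                  for_each_cpu_mask(i, sd->span)
                          update_group_shares_cpu(tg, i, shares, rq_weight);
          }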
  5. 18 Nov 2008 (1 commit)
    • cpuset: fix regression when failed to generate sched domains · 700018e0
      Authored by Li Zefan
      Impact: properly rebuild sched-domains on kmalloc() failure
      
      When cpuset fails to generate sched domains due to a kmalloc()
      failure, the scheduler should fall back to the single partition
      'fallback_doms' and rebuild the sched domains, but currently it only
      destroys the sched domains without rebuilding them.
      
      The regression was introduced by:
      
      | commit dfb512ec
      | Author: Max Krasnyansky <maxk@qualcomm.com>
      | Date:   Fri Aug 29 13:11:41 2008 -0700
      |
      |    sched: arch_reinit_sched_domains() must destroy domains to force rebuild
      
      After the above commit, partition_sched_domains(0, NULL, NULL) will
      only destroy sched domains and partition_sched_domains(1, NULL, NULL)
      will create the default sched domain.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      700018e0
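      A hedged sketch of the fallback path the commit message describes, grounded
      in the partition_sched_domains() semantics quoted above; the surrounding
      control flow is condensed and the wrapper function is invented for
      illustration:

          static void rebuild_sched_domains_sketch(void)
          {
                  struct sched_domain_attr *attr = NULL;
                  cpumask_t *doms;
                  int ndoms;

                  ndoms = generate_sched_domains(&doms, &attr);
                  if (!doms) {
                          /* kmalloc() failed while building the cpuset-derived
                           * domain set: ndoms_new == 1 with doms_new == NULL asks
                           * the scheduler to rebuild the single default partition
                           * ('fallback_doms'), instead of merely destroying the
                           * old domains (which is what ndoms_new == 0 would do). */
                          partition_sched_domains(1, NULL, NULL);
                          return;
                  }

                  partition_sched_domains(ndoms, doms, attr);
          }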
  6. 17 Nov 2008 (1 commit)
  7. 16 Nov 2008 (1 commit)
    • tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() · 7e066fb8
      Authored by Mathieu Desnoyers
      Impact: API *CHANGE*. Must update all tracepoint users.
      
      Add DEFINE_TRACE() to tracepoints so that the tracepoint structure is
      declared in a single spot for the whole kernel. This helps reduce memory
      consumption, especially when declaring a lot of tracepoints, e.g. for
      kmalloc tracing.
      
      *API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
      tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
      to do it. The name previously used was misleading.
      
      Updates scheduler instrumentation to follow this API change.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7e066fb8
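      A hedged sketch of the resulting split, using an invented tracepoint name
      (sched_wakeup_sketch); the prototype wrappers were spelled TPPROTO()/TPARGS()
      in kernels of this era and TP_PROTO()/TP_ARGS() later:

          /* in a shared header, e.g. include/trace/sched.h: declaration only */
          #include <linux/tracepoint.h>

          DECLARE_TRACE(sched_wakeup_sketch,
                  TPPROTO(struct rq *rq, struct task_struct *p),
                  TPARGS(rq, p));

          /* in exactly one .c file: this emits the tracepoint structure itself */
          DEFINE_TRACE(sched_wakeup_sketch);

          /* instrumentation sites are unchanged and simply call the stub */
          static void wakeup_hook_sketch(struct rq *rq, struct task_struct *p)
          {
                  trace_sched_wakeup_sketch(rq, p);
          }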
  8. 13 Nov 2008 (1 commit)
    • sched: fix init_idle()'s use of sched_clock() · 5cbd54ef
      Authored by Ingo Molnar
      Maciej Rutecki reported:
      
      > I have this bug during suspend to disk:
      >
      > [  188.592151] Enabling non-boot CPUs ...
      > [  188.592151] SMP alternatives: switching to SMP code
      > [  188.666058] BUG: using smp_processor_id() in preemptible [00000000] code: suspend_to_disk/2934
      > [  188.666064] caller is native_sched_clock+0x2b/0x80
      
      Which, as noted by Linus, was caused by me, via:
      
        7cbaef9c "sched: optimize sched_clock() a bit"
      
      Move the rq locking a bit earlier in the initialization sequence; that
      makes the sched_clock() call in init_idle() non-preemptible.
      Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5cbd54ef
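      A condensed sketch of the shape of the fix: take the runqueue lock (which
      disables interrupts, and with them preemption) before anything that ends up
      calling sched_clock()/smp_processor_id().  Helper names follow kernel/sched.c
      of that era, but the body is abbreviated:

          static void init_idle_sketch(struct task_struct *idle, int cpu)
          {
                  struct rq *rq = cpu_rq(cpu);
                  unsigned long flags;

                  spin_lock_irqsave(&rq->lock, flags);    /* moved earlier: preemption now off */

                  __sched_fork(idle);
                  idle->se.exec_start = sched_clock();    /* safe: cannot migrate here */
                  idle->cpus_allowed = cpumask_of_cpu(cpu);
                  __set_task_cpu(idle, cpu);

                  /* ... rest of the idle-task setup ... */

                  spin_unlock_irqrestore(&rq->lock, flags);
          }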
  9. 12 Nov 2008 (1 commit)
  10. 11 Nov 2008 (2 commits)
  11. 10 Nov 2008 (1 commit)
  12. 07 Nov 2008 (4 commits)
  13. 05 Nov 2008 (1 commit)
    • sched: backward looking buddy · 4793241b
      Authored by Peter Zijlstra
      Impact: improve/change/fix wakeup-buddy scheduling
      
      Currently we only have a forward-looking buddy; that is, we prefer to
      switch to the task we last woke up, under the presumption that it is
      going to consume the data we just produced, and therefore will have
      cache-hot benefits.

      This allows co-waking producer/consumer task pairs to run ahead of the
      pack for a little while, keeping their cache warm. Without this, we
      would interleave all pairs, utterly thrashing the cache.
      
      This patch introduces a backward-looking buddy. Suppose that in the
      above scenario the consumer preempts the producer before it can go to
      sleep: we will then miss the wakeup from consumer to producer (it is
      already running, after all), breaking the cycle and reverting to the
      cache-thrashing interleaved schedule pattern.

      The backward buddy will try to schedule back to the task that woke us
      up in case the forward buddy is not available, under the assumption
      that, barring current, the task that woke us is the most cache-hot
      task around.
      
      This basically allows a task to continue after it gets preempted.

      In order to avoid starvation, we allow either buddy to get at most
      wakeup_gran ahead of the pack.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4793241b
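      A sketch of the resulting pick order, in the style of CFS's
      pick_next_entity(); the wakeup_preempt_entity() test is the guard that keeps
      a buddy from running more than wakeup_gran ahead of the leftmost task:

          static struct sched_entity *pick_next_entity_sketch(struct cfs_rq *cfs_rq)
          {
                  struct sched_entity *se = __pick_next_entity(cfs_rq);  /* leftmost */

                  /* forward buddy: the task we just woke up */
                  if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, se) < 1)
                          return cfs_rq->next;

                  /* backward buddy: the task that woke us up */
                  if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, se) < 1)
                          return cfs_rq->last;

                  return se;
          }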
  14. 04 Nov 2008 (3 commits)
  15. 30 Oct 2008 (1 commit)
  16. 29 Oct 2008 (1 commit)
  17. 24 Oct 2008 (2 commits)
  18. 23 Oct 2008 (1 commit)
  19. 20 Oct 2008 (1 commit)
    • sched: optimize group load balancer · ffda12a1
      Authored by Peter Zijlstra
      I noticed that tg_shares_up() unconditionally takes rq-locks for all cpus
      in the sched_domain. This hurts.
      
      We need the rq-locks whenever we change the weight of the per-cpu group sched
      entities. To alleviate this a little, only change the weight when the new
      weight is at least shares_thresh away from the old value.
      
      This avoids the rq-lock for the top level entries, since those will never
      be re-weighted, and fuzzes the lower level entries a little to gain performance
      in semi-stable situations.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ffda12a1
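      A hedged sketch of the threshold check described above, condensed from the
      update_group_shares_cpu() of that era (zero-weight and boost handling
      omitted); sysctl_sched_shares_thresh plays the role of the shares_thresh
      knob:

          static void update_group_shares_cpu_sketch(struct task_group *tg, int cpu,
                                                     unsigned long sd_shares,
                                                     unsigned long sd_rq_weight)
          {
                  unsigned long rq_weight = tg->cfs_rq[cpu]->load.weight;
                  unsigned long shares;

                  if (!sd_rq_weight)
                          return;

                  /* this cpu's proportional slice of the group's total shares */
                  shares = (sd_shares * rq_weight) / sd_rq_weight;
                  shares = clamp_t(unsigned long, shares, MIN_SHARES, MAX_SHARES);

                  /* only take the rq lock (and reweight) when the change is large
                   * enough to matter */
                  if (abs(shares - tg->se[cpu]->load.weight) > sysctl_sched_shares_thresh) {
                          struct rq *rq = cpu_rq(cpu);
                          unsigned long flags;

                          spin_lock_irqsave(&rq->lock, flags);
                          tg->cfs_rq[cpu]->shares = shares;
                          __set_se_shares(tg->se[cpu], shares);
                          spin_unlock_irqrestore(&rq->lock, flags);
                  }
          }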
  20. 16 Oct 2008 (3 commits)
  21. 14 Oct 2008 (1 commit)
    • tracing, sched: LTTng instrumentation - scheduler · 0a16b607
      Authored by Mathieu Desnoyers
      Instrument the scheduler activity (sched_switch, migration, wakeups,
      waiting for a task, signal delivery) and process/thread
      creation/destruction (fork, exit, kthread stop). Kthread creation is
      not instrumented in this patch because it is architecture-dependent.
      This allows connecting tracers such as ftrace, which detect scheduling
      latencies and good/bad scheduler decisions. Tools like LTTng can
      export this scheduler information along with instrumentation of the
      rest of the kernel to perform post-mortem analysis of the scheduler
      activity.
      
      Regarding the performance impact of tracepoints (which is comparable to
      that of markers), even without the immediate-values optimization, tests
      done by Hideo Aoki on ia64 show no regression. His test case used
      hackbench on a kernel where scheduler instrumentation (about 5 events in
      core scheduler code) was added. See the "Tracepoints" patch header for
      detailed performance results.
      
      Changelog:
      
      - Change instrumentation location and parameter to match ftrace
        instrumentation, previously done with kernel markers.
      
      [ mingo@elte.hu: conflict resolutions ]
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0a16b607
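      An illustrative placement of one such scheduler tracepoint; the wrapper
      function is invented, and the trace_sched_switch() arguments shown here
      follow kernels of this era (later kernels changed the tracepoint
      signatures):

          static void context_switch_sketch(struct rq *rq, struct task_struct *prev,
                                            struct task_struct *next)
          {
                  trace_sched_switch(rq, prev, next);     /* fires just before switch_to() */
                  /* ... switch_mm() / switch_to() ... */
          }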