1. 06 Feb 2009 (1 commit)
    • wait: prevent exclusive waiter starvation · 777c6c5f
      By Johannes Weiner
      With exclusive waiters, every process woken up through the wait queue must
      ensure that the next waiter down the line is woken when it has finished.
      
      Interruptible waiters don't do that when aborting due to a signal.  And if
      an aborting waiter is concurrently woken up through the waitqueue, no one
      will ever wake up the next waiter.
      
      This has been observed with __wait_on_bit_lock() used by
      lock_page_killable(): the first contender on the queue was aborting when
      the actual lock holder woke it up concurrently.  The aborted contender
      didn't acquire the lock and therefore never did an unlock followed by
      waking up the next waiter.
      
      Add abort_exclusive_wait() which removes the process' wait descriptor from
      the waitqueue, iff still queued, or wakes up the next waiter otherwise.
      It does so under the waitqueue lock.  Racing with a wake up means the
      aborting process is either already woken (removed from the queue) and will
      wake up the next waiter, or it will remove itself from the queue and the
      concurrent wake up will apply to the next waiter after it.
      
      Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
      __wait_on_bit_lock() when they were interrupted by means other than a
      wake up through the queue (a sketch of the helper follows this entry).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Reported-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Mentored-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chuck Lever <cel@citi.umich.edu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		["after some testing"]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      777c6c5f
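      A minimal sketch of the new helper along the lines described above,
      using 2.6.29-era waitqueue primitives; __wake_up_locked_key() is
      assumed here to be a wake-up variant that runs under the already-held
      q->lock. Treat this as an illustration, not the verbatim patch:

        /* kernel/wait.c style; needs <linux/wait.h> and <linux/sched.h> */
        void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
                                  unsigned int mode, void *key)
        {
                unsigned long flags;

                __set_current_state(TASK_RUNNING);
                spin_lock_irqsave(&q->lock, flags);
                if (!list_empty(&wait->task_list))
                        /* not woken yet: just dequeue ourselves */
                        list_del_init(&wait->task_list);
                else if (waitqueue_active(q))
                        /* a wake up already consumed our entry: pass it on */
                        __wake_up_locked_key(q, mode, key);
                spin_unlock_irqrestore(&q->lock, flags);
        }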
  2. 01 Feb 2009 (2 commits)
  3. 15 Jan 2009 (2 commits)
    • sched: SCHED_IDLE weight change · cce7ade8
      By Peter Zijlstra
      Increase the SCHED_IDLE weight from 2 to 3; this gives much more stable
      vruntime numbers (a sketch of the change follows this entry).
      
      time advanced in 100ms:

        weight=2:  64765.988352   67012.881408   88501.412352
        weight=3:  35496.181411   34130.971298   35497.411573
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cce7ade8
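      A plausible shape of the change, assuming the SCHED_IDLE weight lives
      in the WEIGHT_IDLEPRIO/WMULT_IDLEPRIO constants of kernel/sched.c
      (hedged from recollection, not quoted from the patch):

        /* SCHED_IDLE entities get a fixed, tiny load weight */
        #define WEIGHT_IDLEPRIO         3               /* was 2 */
        #define WMULT_IDLEPRIO          1431655765      /* 2^32 / 3, the
                                                           inverse weight */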
    • sched: fix bandwidth validation for UID grouping · 98a4826b
      By Peter Zijlstra
      Impact: make rt-limit tunables work again
      
      Mark Glines reported:
      
      > I've got an issue on x86-64 where I can't configure the system to allow
      > RT tasks for a non-root user.
      >
      > In 2.6.26.5, I was able to do the following to set things up nicely:
      > echo 450000 >/sys/kernel/uids/0/cpu_rt_runtime
      > echo 450000 >/sys/kernel/uids/1000/cpu_rt_runtime
      >
      > Seems like every value I try to echo into the /sys files returns EINVAL.
      
      For UID grouping we initialize the root group with infinite bandwidth,
      which by default is more than the global limit, so the bandwidth check
      always fails.

      Because the root group is a phantom group (for UID grouping) we cannot
      adjust it at runtime; instead we let it reflect the global bandwidth
      settings (sketched after this entry).
      Reported-by: Mark Glines <mark@glines.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      98a4826b
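      A hedged sketch of the core of such a fix (not the verbatim patch):
      the hierarchical admission test reads the global limits for the
      phantom root group instead of its own infinite defaults.

        /* inside the per-group bandwidth check, tg_schedulable()-like code */
        if (tg == &root_task_group) {
                /* phantom root group under UID grouping: mirror the
                 * global sysctl knobs */
                period  = global_rt_period();
                runtime = global_rt_runtime();
        } else {
                period  = ktime_to_ns(tg->rt_bandwidth.rt_period);
                runtime = tg->rt_bandwidth.rt_runtime;
        }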
  4. 14 Jan 2009 (3 commits)
  5. 12 Jan 2009 (1 commit)
  6. 11 Jan 2009 (2 commits)
  7. 07 Jan 2009 (1 commit)
    • sched: fix possible recursive rq->lock · da8d5089
      By Peter Zijlstra
      Vaidyanathan Srinivasan reported:
      
       > =============================================
       > [ INFO: possible recursive locking detected ]
       > 2.6.28-autotest-tip-sv #1
       > ---------------------------------------------
       > klogd/5062 is trying to acquire lock:
       >  (&rq->lock){++..}, at: [<ffffffff8022aca2>] task_rq_lock+0x45/0x7e
       >
       > but task is already holding lock:
       >  (&rq->lock){++..}, at: [<ffffffff805f7354>] schedule+0x158/0xa31
      
      This happens with sched_mc set to 2 (it is default-off).
      
      Strictly speaking we'll not deadlock, because ttwu will not be able to
      place the migration task on our rq, but since the code can deal with
      both rqs getting unlocked, this seems the easiest way out.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      da8d5089
  8. 06 Jan 2009 (2 commits)
  9. 05 Jan 2009 (2 commits)
  10. 04 Jan 2009 (1 commit)
    • sched: put back some stack hog changes that were undone in kernel/sched.c · 6ca09dfc
      By Mike Travis
      Impact: prevents panic from stack overflow on numa-capable machines.
      
      Some of the "removal of stack hogs" changes in kernel/sched.c that used
      node_to_cpumask_ptr were undone by the early cpumask API updates,
      causing a panic due to stack overflow.  This patch restores those
      changes by using cpumask_of_node(), which returns a
      'const struct cpumask *' (illustrated after this entry).

      In addition, cpu_coregroup_map is replaced with cpu_coregroup_mask,
      further reducing stack usage.  (Both of these updates removed 9 FIXMEs!)
      
      Also:
         Pick up some remaining changes from the old 'cpumask_t' functions to
         the new 'struct cpumask *' functions.
      
         Optimize memory traffic by allocating each percpu local_cpu_mask on the
         same node as the referring cpu.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6ca09dfc
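      An illustrative before/after of the stack saving (my own minimal
      example, not lines from the patch):

        /* before: a full cpumask_t lands on the stack; with NR_CPUS=4096
         * that is 512 bytes per frame, easily overflowed on big NUMA boxes */
        cpumask_t mask = node_to_cpumask(node);

        /* after: only a pointer; the mask itself is preallocated per node */
        const struct cpumask *mask = cpumask_of_node(node);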
  11. 31 Dec 2008 (2 commits)
    • [PATCH] idle cputime accounting · 79741dd3
      By Martin Schwidefsky
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time.  This is plain wrong; the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work.  The first
      is accounted with account_idle_time and the second with
      account_system_time.

      The architectures that use the account_xxx_time interface directly and
      not the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code.  In particular, to improve the system vs.
      true idle time accounting, the arch code needs to measure the true idle
      time instead of just testing for the idle process.  To improve the
      tick-based accounting as well, we would need an architecture primitive
      that can tell us if the pt_regs of the interrupted context points to the
      magic instruction that halts the cpu.
      
      In addition, idle time is no longer added to the stime of the idle
      process.  This field now contains the system time of the idle process,
      as it should.  On systems without VIRT_CPU_ACCOUNTING this will always
      be zero, as every tick that occurs while idle is running will be
      accounted as idle time.
      
      This patch contains the common code changes necessary to distinguish
      idle system time from true idle time.  The architectures with support
      for VIRT_CPU_ACCOUNTING need some changes to exploit this (a hedged
      sketch of the arch-side check follows this entry).
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      79741dd3
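      A hedged sketch of the arch-side dispatch this asks for, assuming the
      architecture can measure a true-idle delta (in_true_idle, delta and
      delta_scaled are placeholder names of mine):

        /* accounting path on a VIRT_CPU_ACCOUNTING=y architecture */
        if (in_true_idle)
                /* the cpu was actually halted: real idle time */
                account_idle_time(delta);
        else
                /* the idle task was doing work: system time, not idle */
                account_system_time(current, hardirq_offset,
                                    delta, delta_scaled);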
    • [PATCH] fix scaled & unscaled cputime accounting · 457533a7
      By Martin Schwidefsky
      The utimescaled / stimescaled fields in the task structure and the
      global cpustat should be set on all architectures.  On s390 the calls
      to account_user_time_scaled and account_system_time_scaled have never
      been added.  In addition, system time that is accounted as guest time
      to the user time of a process is added to the scaled system time
      instead of the scaled user time.

      To fix the bugs and to prevent future forgetfulness, this patch merges
      account_system_time_scaled into account_system_time and
      account_user_time_scaled into account_user_time (the resulting
      signatures are sketched after this entry).
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      457533a7
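      The post-merge signatures as I understand them (hedged): the scaled
      value now travels with the unscaled one, so an architecture can no
      longer forget it.

        void account_user_time(struct task_struct *p, cputime_t cputime,
                               cputime_t cputime_scaled);
        void account_system_time(struct task_struct *p, int hardirq_offset,
                                 cputime_t cputime, cputime_t cputime_scaled);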
  12. 26 Dec 2008 (1 commit)
  13. 25 Dec 2008 (1 commit)
  14. 24 Dec 2008 (1 commit)
  15. 19 Dec 2008 (6 commits)
    • sched: fix warning in kernel/sched.c · 9924da43
      By Ingo Molnar
      Impact: fix cpumask conversion bug
      
      This warning:
      
        kernel/sched.c: In function ‘find_busiest_group’:
        kernel/sched.c:3429: warning: passing argument 1 of ‘__first_cpu’ from incompatible pointer type
      
      shows that we forgot to convert a new patch to the new cpumask APIs.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9924da43
    • sched: activate active load balancing in new idle cpus · ad273b32
      By Vaidyanathan Srinivasan
      Impact: tweak task balancing to save power more aggressively
      
      Active load balancing is a process by which the migration thread is
      woken up on the target CPU in order to pull a task currently running
      on another package into this newly idle package.
      
      This method is already in use with normal load_balance(); this patch
      introduces it for newly idle cpus when sched_mc is set to
      POWERSAVINGS_BALANCE_WAKEUP (a hedged sketch follows this entry).

      This logic provides effective consolidation of short-running daemon
      jobs on an almost idle system.
      
      The side effect of this patch may be ping-ponging of tasks if the
      system is moderately utilised.  We may need to adjust the number of
      iterations before triggering.
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ad273b32
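      A hedged sketch of the mechanism, modelled on the existing
      active-balance path of load_balance(); field and helper names follow
      my recollection of 2.6.28 and may differ from the patch:

        /* in load_balance_newidle(): nothing could be pulled normally */
        if (!pulled_task &&
            sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP &&
            !busiest->active_balance) {
                busiest->active_balance = 1;
                busiest->push_cpu = this_cpu;
                /* the migration thread on the busy cpu pushes its
                 * running task over to this newly idle cpu */
                wake_up_process(busiest->migration_thread);
        }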
    • sched: nominate preferred wakeup cpu · 7a09b1a2
      By Vaidyanathan Srinivasan
      Impact: extend load-balancing code (no change in behavior yet)
      
      When system utilisation is low and more cpus are idle, a process
      waking up from sleep should prefer an idle cpu in a semi-idle cpu
      package (multi-core package) rather than one in a completely idle cpu
      package, which would waste power.
      
      Use the sched_mc balance logic in find_busiest_group() to
      nominate a preferred wakeup cpu.
      
      This info could be stored in the appropriate sched_domain, but
      updating it in all copies of sched_domain is not practical.  Hence it
      is stored in the root_domain struct, of which there is one copy per
      partitioned sched domain, reachable from each cpu's runqueue (sketched
      after this entry).
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7a09b1a2
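      A hedged sketch of where the nomination lives (the field name is my
      recollection and may differ from the patch):

        struct root_domain {
                /* existing members (refcount, span, online, ...) */

                /* cpu nominated by find_busiest_group() as the preferred
                 * power-aware wakeup target for this partitioned domain */
                unsigned int sched_mc_preferred_wakeup_cpu;
        };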
    • sched: favour lower logical cpu number for sched_mc balance · d5679bd1
      By Vaidyanathan Srinivasan
      Impact: change load-balancing direction to match that of irqbalanced
      
      In case two groups have identical load, prefer to move load to the
      lower logical cpu number rather than, as the present logic does, to
      the higher one.
      
      find_busiest_group() tries to look for a group_leader that has spare
      capacity to take more tasks and to free up an appropriate least-loaded
      group.  In case of a tie, where the load is equal, the group with the
      higher logical number is currently favoured.  This conflicts with the
      user-space irqbalance daemon, which moves interrupts to the lower
      logical number when system utilisation is very low (the tie-break is
      sketched after this entry).
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d5679bd1
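      A hedged sketch of the tie-break alone (illustrative; not the patch's
      exact conditions):

        /* scanning for the least-loaded group in find_busiest_group() */
        if (sum_load < min_load ||
            (sum_load == min_load &&
             cpumask_first(sched_group_cpus(group)) <
             cpumask_first(sched_group_cpus(group_min)))) {
                min_load  = sum_load;
                group_min = group;      /* prefer the lower logical cpu */
        }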
    • sched: framework for sched_mc/smt_power_savings=N · afb8a9b7
      By Gautham R Shenoy
      Impact: extend range of /sys/devices/system/cpu/sched_mc_power_savings
      
      Currently the sched_mc/smt_power_savings variable is a boolean, which
      either enables or disables topology-based power savings.  This patch
      extends the variable from a boolean to a multivalued one, so that its
      value decides how aggressively we perform power-savings balance at the
      appropriate sched domain based on topology.
      
      Variable levels of the power-savings tunable let the end user match
      the required power-savings vs. performance trade-off to the system
      configuration and workload.
      
      This version lets the sched_mc_power_savings global variable take more
      values (0, 1, 2), as sketched after this entry.  Later versions can
      have a single tunable called sched_power_savings instead of
      sched_{mc,smt}_power_savings.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      afb8a9b7
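      The three levels, as I recall them from this patch series (hedged;
      names may differ slightly):

        enum powersavings_balance_level {
                POWERSAVINGS_BALANCE_NONE = 0,  /* no power-aware balancing */
                POWERSAVINGS_BALANCE_BASIC,     /* consolidate load, but no
                                                   active migration */
                POWERSAVINGS_BALANCE_WAKEUP,    /* also bias wakeups and kick
                                                   active balance on newly
                                                   idle cpus */
                MAX_POWERSAVINGS_BALANCE_LEVELS
        };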
    • tracing: fix warnings in kernel/trace/trace_sched_switch.c · c71dd42d
      By Ingo Molnar
      These warnings:
      
        kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_register’:
        kernel/trace/trace_sched_switch.c:96: warning: passing argument 1 of ‘register_trace_sched_wakeup_new’ from incompatible pointer type
        kernel/trace/trace_sched_switch.c:112: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type
        kernel/trace/trace_sched_switch.c: In function ‘tracing_sched_unregister’:
        kernel/trace/trace_sched_switch.c:121: warning: passing argument 1 of ‘unregister_trace_sched_wakeup_new’ from incompatible pointer type
      
      They trigger because the sched_wakeup_new tracepoints need the same
      trace signature as sched_wakeup, which was changed recently.
      
      Fix it.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c71dd42d
  16. 18 Dec 2008 (1 commit)
  17. 16 Dec 2008 (3 commits)
  18. 13 Dec 2008 (1 commit)
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse,
      and cpulist_scnprintf to take pointers · 29c0177e
      By Rusty Russell
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_; these have been replaced by
      cpumask_ ones which take struct cpumask pointers, as expected.

      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over (example after this
      entry).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
      29c0177e
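      An illustrative caller-side change (my own example; assuming the new
      functions take const struct cpumask *):

        cpumask_t mask = CPU_MASK_ALL;  /* legacy on-stack mask */

        /* before: the old macro took the mask by name */
        len = cpumask_scnprintf(buf, sizeof(buf), mask);

        /* after: the function takes a pointer */
        len = cpumask_scnprintf(buf, sizeof(buf), &mask);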
  19. 12 Dec 2008 (3 commits)
  20. 10 Dec 2008 (1 commit)
    • sched: CPU remove deadlock fix · 9a2bd244
      By Brian King
      Impact: fix possible deadlock in CPU hot-remove path
      
      This patch fixes a possible deadlock scenario in the CPU remove path.
      migration_call() grabs rq->lock, then wakes up everything on
      rq->migration_queue with the lock held.  One of the tasks on the
      migration queue then ends up calling tg_shares_up(), which also tries
      to acquire the same rq->lock (the shape of a fix is sketched after the
      trace below).
      
      [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
      [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
      [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
      [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
      [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
      [c000000058eab640] c00000000007ada4 .complete+0x54/0x80
      [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
      [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
      [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
      [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
      [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
      [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
      [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
      [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
      [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
      [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9a2bd244
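      One plausible shape of such a fix (hedged; not verified against the
      actual patch): drain rq->migration_queue under rq->lock, but run the
      completions only after the lock is dropped, so that try_to_wake_up()
      -> tg_shares_up() can retake rq->lock safely.

        LIST_HEAD(local);
        struct migration_req *req, *next;

        spin_lock_irq(&rq->lock);
        /* detach all pending requests while holding the lock ... */
        list_splice_init(&rq->migration_queue, &local);
        spin_unlock_irq(&rq->lock);

        /* ... but complete the waiters without it */
        list_for_each_entry_safe(req, next, &local, list) {
                list_del_init(&req->list);
                complete(&req->done);
        }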
  21. 08 Dec 2008 (2 commits)
  22. 02 Dec 2008 (1 commit)