- 06 June 2008, 11 commits
-
-
Committed by Max Krasnyansky

The first issue is not related to cpusets. We're simply leaking doms_cur: it's allocated in arch_init_sched_domains(), which is called for every hotplug event, so we keep reallocating doms_cur without ever freeing it. I introduced a free_sched_domains() function that cleans things up.

The second issue is that sched domains created by cpusets are completely destroyed by CPU hotplug events. For all CPU hotplug events the scheduler attaches all CPUs to the NULL domain and then puts them all into a single domain, thereby destroying the domains created by cpusets (partition_sched_domains). The solution is simple: when cpusets are enabled, the scheduler should not create the default domain and should instead let cpusets do that. Which is exactly what this patch does.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: menage@google.com
Cc: rostedt@goodmis.org
Cc: mingo@elte.hu
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Gregory Haskins

The RT folks over at Red Hat found an issue w.r.t. hotplug support which was traced to problems with the cpupri infrastructure in the scheduler: https://bugzilla.redhat.com/show_bug.cgi?id=449676

This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel. This patch applies to 25.4-rt4, though it should trivially apply to most cpupri-enabled kernels mentioned above.

It turned out that offline cpus could get inadvertently registered with cpupri, so that they were erroneously selected during migration decisions. The end result would be an OOPS as the offline cpu had tasks routed to it.

This patch generalizes the old join/leave domain interface into an online/offline interface, and adjusts the root-domain/hotplug code to utilize it. I was able to easily reproduce the issue prior to this patch, and am no longer able to reproduce it after this patch. I can offline cpus indefinitely and everything seems to be in working order.

Thanks to Arnaldo (acme), Thomas, and Peter for doing the legwork to point me in the right direction. Also thank you to Peter for reviewing the early iterations of this patch.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Gautham R Shenoy

While printing out the visual representation of the sched-domains, print the level (MC, SMT, CPU, NODE, ...) of each of the sched_domains.

Credit: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Dhaval Giani

Make sched.c easier to read.

Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Arjan van de Ven

For the normal WARN_ON() etc we already added a print of the modules list, which is very useful for figuring out candidates for certain types of bugs. This patch adds the same print to the "scheduling while atomic" BUG warning, for the same reason: when we get here it's very useful to see which modules are loaded, to narrow down the candidate code list.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Rabin Vincent

Fix this warning, which appears with !CONFIG_SMP:

    kernel/sched.c:1216: warning: `init_hrtick' defined but not used

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Thomas Gleixner

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Dmitry Adamushko

Nothing really serious here, mainly just a matter of nit-picking :-/

From: Dmitry Adamushko <dmitry.adamushko@gmail.com>

For CONFIG_SCHED_DEBUG && CONFIG_SYSCTL configs, sd->flags can be altered while being manipulated in rebalance_domains(). Let's do an atomic check. We rely here on the atomicity of read/write accesses for aligned words.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
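As a rough illustration of the pattern described here (a userspace-style sketch, not the kernel code; the flag values and names are made up), the idea is to read the flags word once into a local variable and test bits on that snapshot, relying on the atomicity of an aligned word read, rather than re-reading a field that a concurrent sysctl write may change mid-function:

    #include <stdio.h>

    #define SD_LOAD_BALANCE     0x01   /* illustrative values, not the kernel's */
    #define SD_BALANCE_NEWIDLE  0x02

    struct sched_domain { unsigned int flags; };

    static void rebalance_domains(struct sched_domain *sd)
    {
            /* Single aligned-word read; later tests use the snapshot, so a
             * concurrent writer cannot change the answer mid-function. */
            unsigned int flags = sd->flags;

            if (flags & SD_LOAD_BALANCE)
                    printf("balancing\n");
            if (flags & SD_BALANCE_NEWIDLE)
                    printf("newidle balancing\n");
    }

    int main(void)
    {
            struct sched_domain sd = { .flags = SD_LOAD_BALANCE };
            rebalance_domains(&sd);
            return 0;
    }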
-
Committed by Gregory Haskins

The current code uses a linear algorithm which causes scaling issues on larger SMP machines. This patch replaces that algorithm with a 2-dimensional bitmap to reduce latencies in the wake-up path.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
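A self-contained sketch of the two-dimensional-bitmap idea (the real cpupri code differs in detail; sizes and names here are illustrative): one word of per-priority "occupied" bits plus a CPU mask per priority, so the lookup becomes a couple of find-first-set operations instead of a linear walk over every CPU.

    #include <stdint.h>
    #include <stdio.h>
    #include <strings.h>        /* ffs() */

    #define NR_PRIO  8          /* illustrative: the RT code tracks many more levels */
    #define NR_CPUS  16

    struct cpupri {
            uint32_t prio_active;           /* bit p set => some CPU runs at prio p */
            uint32_t cpu_mask[NR_PRIO];     /* CPUs currently at each priority      */
    };

    static void cpupri_set(struct cpupri *cp, int cpu, int prio)
    {
            cp->cpu_mask[prio] |= 1u << cpu;
            cp->prio_active    |= 1u << prio;
    }

    /* Find a CPU running at the lowest priority level: one ffs() on the
     * priority word plus one mask lookup, instead of scanning every CPU. */
    static int cpupri_find_lowest(const struct cpupri *cp)
    {
            int prio = ffs(cp->prio_active);
            if (!prio)
                    return -1;
            return ffs(cp->cpu_mask[prio - 1]) - 1;
    }

    int main(void)
    {
            struct cpupri cp = { 0 };
            cpupri_set(&cp, 3, 5);
            cpupri_set(&cp, 7, 2);
            printf("lowest-prio cpu: %d\n", cpupri_find_lowest(&cp));   /* 7 */
            return 0;
    }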
-
Committed by Mike Galbraith

It is safe to ignore timers and flags when the feature is disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Gregory Haskins

Dmitry Adamushko pointed out a known flaw in the rt-balancing algorithm that could allow suboptimal balancing if a non-migratable task gets queued behind a running migratable one. It is discussed in this thread: http://lkml.org/lkml/2008/4/22/296

This issue has been further exacerbated by a recent checkin to sched-devel (git-id 5eee63a5ebc19a870ac40055c0be49457f3a89a3).

From a pure priority standpoint, the run-queue is doing the "right" thing. Using Dmitry's nomenclature, if T0 is on cpu1 first, and T1 wakes up at equal or lower priority (affined only to cpu1) later, it *should* wait for T0 to finish. However, in reality that is likely suboptimal from a system perspective if there are other cores that could allow T0 and T1 to run concurrently. Since T1 cannot migrate, the only choice for higher concurrency is to try to move T0. This is not something we addressed in the recent rt-balancing re-work.

This patch tries to enhance the balancing algorithm by accommodating this scenario. It accomplishes this by incorporating the migratability of a task into its priority calculation. Within a numerical tsk->prio, a non-migratable task is logically higher than a migratable one. We maintain this by introducing a new per-priority queue (xqueue, or exclusive-queue) for holding non-migratable tasks. The scheduler will draw from the xqueue over the standard shared-queue (squeue) when available; a simplified sketch of this preference follows below.

There are several details for utilizing this properly.

1) During task wake-up, we not only need to check if the priority preempts the current task, but we also need to check for this non-migratable condition. Therefore, if a non-migratable task wakes up and sees an equal-priority migratable task already running, it will attempt to preempt it *if* there is a likelihood that the current task will find an immediate home.

2) Tasks only get this non-migratable "priority boost" on wake-up. Any requeuing will result in the non-migratable task being queued to the end of the shared queue. This is an attempt to prevent the system from being completely unfair to migratable tasks during things like SCHED_RR timeslicing.

I am sure this patch introduces potentially "odd" behavior if you concoct a scenario where a bunch of non-migratable threads could starve migratable ones given the right pattern. I am not yet convinced that this is a problem since we are talking about tasks of equal RT priority anyway, and there never is much in the way of guarantees against starvation under that scenario anyway. (e.g. you could come up with a similar scenario with a specific timing environment versus an affinity environment). I can be convinced otherwise, but for now I think this is "ok".

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
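A simplified, self-contained illustration of the exclusive-queue idea (not the actual rt_rq code; all names and structures are made up for this sketch): each priority level keeps two lists, and the pick path drains the non-migratable "xqueue" before the shared "squeue".

    #include <stdio.h>
    #include <string.h>

    #define MAX_TASKS 8

    struct prio_queue {
            const char *xqueue[MAX_TASKS];  /* non-migratable tasks, preferred */
            const char *squeue[MAX_TASKS];  /* migratable tasks                */
            int xcount, scount;
    };

    static void enqueue(struct prio_queue *q, const char *task, int migratable)
    {
            if (migratable)
                    q->squeue[q->scount++] = task;
            else
                    q->xqueue[q->xcount++] = task;
    }

    /* Pick the next task: non-migratable tasks are logically "higher" within
     * the same numerical priority, so drain the xqueue first. */
    static const char *pick_next(struct prio_queue *q)
    {
            if (q->xcount)
                    return q->xqueue[--q->xcount];
            if (q->scount)
                    return q->squeue[--q->scount];
            return NULL;
    }

    int main(void)
    {
            struct prio_queue q;
            memset(&q, 0, sizeof(q));
            enqueue(&q, "T0 (migratable)", 1);
            enqueue(&q, "T1 (pinned to this cpu)", 0);
            printf("next: %s\n", pick_next(&q));    /* T1 runs here, T0 can be moved */
            return 0;
    }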
-
- 29 May 2008, 4 commits
-
-
Committed by Ingo Molnar

Yanmin Zhang reported: Comparing with 2.6.25, volanoMark has a big regression with kernel 2.6.26-rc1. It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito. With bisect, I located the following patch:

| 18d95a28 is first bad commit
| commit 18d95a28
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair-group: SMP-nice for group scheduling

Revert it so that we get v2.6.25 behavior.

Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Ingo Molnar

Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Roel Kluin

Removes obfuscation and may improve assembly.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Ingo Molnar

Yanmin Zhang reported: Comparing with kernel 2.6.25, sysbench+mysql (oltp, readonly) has many regressions with 2.6.26-rc1: 1) 8-core stoakley: 28%; 2) 16-core tigerton: 20%; 3) Itanium Montvale: 50%. Bisect located this patch:

| 8f1bc385 is first bad commit
| commit 8f1bc385
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Sat Apr 19 19:45:00 2008 +0200
|
| sched: fair: weight calculations

Revert it to the 2.6.25 state.

Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 15 May 2008, 1 commit
-
-
Committed by Mirco Tischler

The return type of cpu_rt_runtime_write() should be int instead of ssize_t.

Signed-off-by: Mirco Tischler <mt-ml@gmx.de>
Acked-by: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 May 2008, 1 commit
-
-
Committed by Linus Torvalds

It acts exactly like a regular cond_resched(), but will not get optimized away when CONFIG_PREEMPT is set. Normal kernel code is already preemptable in the presence of CONFIG_PREEMPT, so cond_resched() is optimized away (see commit 02b67cc3 "sched: do not do cond_resched() when CONFIG_PREEMPT"). But when wanting to conditionally reschedule while holding a lock, you need to use "cond_resched_lock(lock)", and the new function is the BKL equivalent of that.

Also make fs/locks.c use it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
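A hedged usage sketch of the new helper (a fragment only, not taken from fs/locks.c; everything except lock_kernel()/unlock_kernel() and cond_resched_bkl() is hypothetical): long-running work under the BKL can yield periodically even on CONFIG_PREEMPT kernels, where a bare cond_resched() compiles to nothing.

    /* Illustrative fragment -- process_item() and nr_items are made up. */
    lock_kernel();
    for (i = 0; i < nr_items; i++) {
            process_item(i);        /* hypothetical per-item work */
            cond_resched_bkl();     /* yields even with CONFIG_PREEMPT=y */
    }
    unlock_kernel();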
-
- 11 May 2008, 1 commit
-
-
Committed by Linus Torvalds

The generic semaphore rewrite had a huge performance regression on AIM7 (and potentially other BKL-heavy benchmarks) because the generic semaphores had been rewritten to be simple to understand and fair. The latter, in particular, turns a semaphore-based BKL implementation into a mess of scheduling.

The attempt to fix the performance regression failed miserably (see the previous commit 00b41ec2 'Revert "semaphore: fix"'), and so for now the simple and sane approach is to instead just go back to the old spinlock-based BKL implementation that never had any issues like this.

This patch also has the advantage of being reported to fix the regression completely according to Yanmin Zhang, unlike the semaphore hack which still left a couple of percentage points of regression.

As a spinlock, the BKL obviously has the potential to be a latency issue, but it's not really any different from any other spinlock in that respect. We do want to get rid of the BKL asap, but that has been the plan for several years.

These days, the biggest users are in the tty layer (open/release in particular) and Alan holds out some hope: "tty release is probably a few months away from getting cured - I'm afraid it will almost certainly be the very last user of the BKL in tty to get fixed as it depends on everything else being sanely locked." So while we're not there yet, we do have a plan of action.

Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alexander Viro <viro@ftp.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 06 May 2008, 10 commits
-
-
Committed by Peter Zijlstra

This replaces the rq->clock stuff (and possibly cpu_clock()).

- architectures that have an 'imperfect' hardware clock can set CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
- the 'jiffie' window might be superfluous when we update tick_gtod before the __update_sched_clock() call in sched_clock_tick()
- cpu_clock() might be implemented as: sched_clock_cpu(smp_processor_id()) if the accuracy proves good enough - how far can TSC drift in a single jiffie when considering the filtering and idle hooks?

[ mingo@elte.hu: various fixes and cleanups ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Ingo Molnar

David Miller pointed out that nothing in cpu_clock() sets prev_cpu_time. This caused __sync_cpu_clock() to be called all the time - against the intention of this code. The result was that in practice we hit a global spinlock every time cpu_clock() is called - which, even though cpu_clock() is only used for tracing and debugging, is suboptimal.

While at it, also:

- move the irq disabling to the outermost layer; this should make cpu_clock() warp-free when called with irqs enabled.
- use long long instead of cycles_t - for platforms where cycles_t is 32-bit.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Miao Xie

When I echoed 0 into the "cpu.shares" file, a Div0 error occurred. We found it is caused by the following call chain:

    sched_group_set_shares(tg, shares)
        set_se_shares(tg->se[i], shares/nr_cpu_ids)
            __set_se_shares(se, shares)
                div64_64((1ULL<<32), shares)

When the echoed value was less than the number of processors, the result of the expression "shares/nr_cpu_ids" was 0, and then the system called div64() to divide by that result, and the Div0 error occurred.

It is unnecessary for the shares value to be divided by nr_cpu_ids, I think, because in the functions __update_group_shares_cpu() and init_tg_cfs_entry() the shares value isn't divided by nr_cpu_ids when setting the shares of the sched entity. This patch fixes this bug. Echoing the ULONG_MAX value into cpu.shares also causes a Div0 error, so we add a macro MAX_SHARES to limit the max value of shares.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
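A small userspace demonstration of the failure mode described above (constants are illustrative; the real code uses div64_64 and per-cpu sched entities): integer division truncates shares/nr_cpu_ids to zero whenever the written value is below the CPU count, and the later divide by that result traps.

    #include <stdio.h>
    #include <stdint.h>

    #define NR_CPUS 64

    int main(void)
    {
            unsigned long shares  = 10;                 /* "echo 10 > cpu.shares" */
            unsigned long per_cpu = shares / NR_CPUS;   /* integer division -> 0  */

            printf("shares/nr_cpu_ids = %lu\n", per_cpu);

            if (per_cpu == 0) {
                    /* This is where the kernel hit the Div0: (1ULL<<32)/per_cpu.
                     * The fix skips the per-cpu division and caps writes with
                     * a MAX_SHARES-style limit. */
                    printf("would divide by zero\n");
                    return 1;
            }
            printf("inv_weight = %llu\n",
                   (unsigned long long)((1ULL << 32) / per_cpu));
            return 0;
    }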
-
Committed by Heiko Carstens

Concurrent calls to detach_destroy_domains() and arch_init_sched_domains() were prevented by the old scheduler subsystem cpu hotplug mutex. When this got converted to get_online_cpus() the locking got broken. Unlike before, several processes can now concurrently enter the critical sections that were protected by the old lock. So use the already present doms_cur_mutex to protect these sections again.

Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Ingo Molnar

Make time_sync_thresh tunable by architecture code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by David Simner

http://bugzilla.kernel.org/show_bug.cgi?id=10545

sched_stats.h says in a comment that __sched_info_switch() is "called when prev != next". sched.c should therefore only call it in that case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra

Gautham R Shenoy reported:

> While running the usual CPU-Hotplug stress tests on linux-2.6.25,
> I noticed the following in the console logs.
>
> This is a wee bit difficult to reproduce. In the past 10 runs I hit this
> only once.
>
> ------------[ cut here ]------------
>
> WARNING: at kernel/sched.c:962 hrtick+0x2e/0x65()
>
> Just wondering if we are doing a good job at handling the cancellation
> of any per-cpu scheduler timers during CPU-Hotplug.

It looks like the timer is indeed not cancelled at all and migrates to another cpu. Fix it via a proper hotplug notifier mechanism.

Reported-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Harvey Harrison

Noticed by sparse:

    kernel/sched.c:760:20: warning: symbol 'sched_feat_names' was not declared. Should it be static?
    kernel/sched.c:767:5: warning: symbol 'sched_feat_open' was not declared. Should it be static?
    kernel/sched_fair.c:845:3: warning: returning void-valued expression
    kernel/sched.c:4386:3: warning: returning void-valued expression

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Andrew Morton

Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: "Justin Mattock" <justinmattock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra

Joel noticed that the !lw->inv_weight condition isn't unlikely anymore, so remove the unlikely annotation. Also remove the two div64_u64() inv_weight calculations, which makes them rely on the calc_delta_mine() path as well.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
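For context, a standalone sketch of the fixed-point trick that the calc_delta_mine() path relies on (weights and structure here are illustrative, and the real code has additional overflow handling): instead of dividing by the load weight directly, multiply by a cached inverse weight scaled by 2^32 and shift back down, computing the inverse lazily when it is still zero.

    #include <stdio.h>
    #include <stdint.h>

    struct load_weight {
            unsigned long weight;
            uint32_t inv_weight;    /* ~ 2^32 / weight, computed lazily */
    };

    /* delta * weight / lw->weight, using the cached inverse to avoid a
     * divide on the hot path. */
    static uint64_t calc_delta(uint64_t delta, unsigned long weight,
                               struct load_weight *lw)
    {
            if (!lw->inv_weight)
                    lw->inv_weight = (uint32_t)((1ULL << 32) / lw->weight);

            return (delta * weight * lw->inv_weight) >> 32;
    }

    int main(void)
    {
            struct load_weight lw = { .weight = 3072, .inv_weight = 0 };

            /* 1,000,000 * 1024 / 3072 ~= 333,333 */
            printf("%llu\n",
                   (unsigned long long)calc_delta(1000000ULL, 1024, &lw));
            return 0;
    }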
-
- 01 May 2008, 1 commit
-
-
Committed by Roman Zippel

Rename div64_64 to div64_u64 to make it consistent with the other divide functions, so it clearly includes the type of the divide. Move its definition to math64.h, as currently no architecture overrides the generic implementation. They can still override it of course, but the duplicated declarations are avoided.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
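To illustrate the naming convention, here is a hedged userspace stub with the renamed helper's shape, where both the dividend and divisor types are spelled out in the name (on 64-bit builds the generic version is a plain divide; 32-bit architectures need a software fallback, not shown):

    #include <stdio.h>
    #include <stdint.h>

    typedef uint64_t u64;

    /* Userspace stand-in for the kernel's div64_u64(dividend, divisor). */
    static inline u64 div64_u64(u64 dividend, u64 divisor)
    {
            return dividend / divisor;
    }

    int main(void)
    {
            u64 total_ns = 123456789012ULL;
            u64 count    = 37;

            printf("average = %llu ns\n",
                   (unsigned long long)div64_u64(total_ns, count));
            return 0;
    }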
-
- 29 April 2008, 2 commits
-
-
Committed by Paul Menage

This removes some filesystem boilerplate from the CFS cgroup subsystem.

Signed-off-by: Paul Menage <menage@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Paul Menage

Several people have justifiably complained that the "_uint" suffix is inappropriate for functions that handle u64 values, so this patch just renames all these functions and their users to have the suffix _u64.

[peterz@infradead.org: build fix]

Signed-off-by: Paul Menage <menage@google.com>
Cc: "Li Zefan" <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 April 2008, 4 commits
-
-
Committed by David Miller

There is no guarantee that there is physical ram below 4GB, and in fact many boxes don't have exactly that.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra

Fix the __aggregate_redistribute_shares() related lockup reported by David S. Miller.

The problem this code tries to solve is 'accurately' calculating the 'fair' share of the group weight for each cpu. The current code falls back to a global group rebalance in case the sched_domain's span it looks at has no shares, but does have tasks. The reason it gets stuck here is that it's inherently racy - if someone steals the last task after we compute the agg->rq_weight, but before we rebalance, we'll never get out of the loop.

We could of course go fix that, but while looking at this issue I found that this 'fallback' wasn't nearly as rare as I'd hoped it to be. In fact it's quite common - and given it walks the whole machine, that's very bad.

The new approach is simple (why didn't I think of it before?): we set the aggregate shares to the full task group weight, and each larger sched domain that encounters an aggregate share larger than the weight clips it (it already re-distributes anyway). This nicely converges to the desired global picture where the sum of all shares equals the task group weight.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Mike Travis

Regression caused by commit 434d53b0.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Committed by Russ Anderson

A recent change prevents SGI Altix from booting. This patch fixes the problem. The regression was introduced in commit 434d53b0.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
- 23 April 2008, 1 commit
-
-
Committed by Randy Dunlap

Add missing kernel-doc in kernel/sched.c:

    Warning(linux-2.6.25-git3//kernel/sched.c:7044): No description found for parameter 'span'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 20 April 2008, 4 commits
-
-
Committed by Ingo Molnar

Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra

Provide a text-based interface to the scheduler features; this saves the 'user' from setting bits using decimal arithmetic.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
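A toy, self-contained illustration of the text-based toggle (the feature names, bit layout, and function names here are made up, not the kernel's actual list or interface): parse a name with an optional "NO_" prefix and set or clear the corresponding bit.

    #include <stdio.h>
    #include <string.h>

    static const char *feat_names[] = { "FEAT_A", "FEAT_B", "FEAT_C" };
    static unsigned int sched_feat = 0x7;   /* all three enabled */

    /* Write side of the interface: "FEAT_B" sets the bit, "NO_FEAT_B" clears it. */
    static int sched_feat_write(const char *buf)
    {
            int neg = 0, i;

            if (strncmp(buf, "NO_", 3) == 0) {
                    neg = 1;
                    buf += 3;
            }
            for (i = 0; i < 3; i++) {
                    if (strcmp(buf, feat_names[i]) == 0) {
                            if (neg)
                                    sched_feat &= ~(1u << i);
                            else
                                    sched_feat |= 1u << i;
                            return 0;
                    }
            }
            return -1;      /* unknown feature name */
    }

    int main(void)
    {
            sched_feat_write("NO_FEAT_B");
            printf("features: %#x\n", sched_feat);  /* bit 1 cleared -> 0x5 */
            return 0;
    }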
-
Committed by Ingo Molnar

Unused at the moment.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra

In order to level the hierarchy, we need to calculate load based on the root view. That is, each task's load is in the same unit.

         A
        / \
       B   1
      / \
     2   3

To compute 1's load we do:

    weight(1)
    ------------
    rq_weight(A)

To compute 2's load we do:

    weight(2)      weight(B)
    ------------ * ------------
    rq_weight(B)   rq_weight(A)

This yields load fractions in comparable units.

The consequence is that it changes virtual time. We used to have:

                  time_{i}
    vtime_{i} = ------------
                 weight_{i}

    vtime = \Sum vtime_{i} = time / rq_weight

But with the new way of load calculation we get that vtime equals time.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
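A small numeric check of the formulas above (the weights are chosen purely for illustration): with the tree A -> {B, 1} and B -> {2, 3}, the per-task load fractions computed against the root all land in the same unit and sum to 1.

    #include <stdio.h>

    int main(void)
    {
            /* Illustrative weights: every entity weighs 1024, so
             * rq_weight(B) = 2048 and rq_weight(A) = weight(B) + weight(1) = 3072. */
            double w1 = 1024.0, w2 = 1024.0, w3 = 1024.0, wB = 2048.0;
            double rqA = wB + w1, rqB = w2 + w3;

            double load1 = w1 / rqA;                 /* 1024/3072 = 1/3        */
            double load2 = (w2 / rqB) * (wB / rqA);  /* (1/2) * (2/3) = 1/3    */
            double load3 = (w3 / rqB) * (wB / rqA);  /* 1/3                    */

            printf("load(1)=%.3f load(2)=%.3f load(3)=%.3f sum=%.3f\n",
                   load1, load2, load3, load1 + load2 + load3);
            return 0;
    }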
-