提交 · 75c28ace9f2b2f403674e045939424a77c95b47c · openanolis / cloud-kernel

15 10月, 2007 10 次提交

sched: revert recent removal of set_curr_task() · 83b699ed

由 Srivatsa Vaddagiri 提交于 10月 15, 2007

Revert removal of set_curr_task.
Use put_prev_task/set_curr_task when changing groups/policies

Signed-off-by: Srivatsa Vaddagiri < vatsa@linux.vnet.ibm.com>
Signed-off-by: NDhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>

83b699ed

sched: rework enqueue/dequeue_entity() to get rid of set_curr_task() · f6b53205

由 Dmitry Adamushko 提交于 10月 15, 2007

rework enqueue/dequeue_entity() to get rid of 
sched_class::set_curr_task(). This simplifies sched_setscheduler(), 
rt_mutex_setprio() and sched_move_tasks().

   text    data     bss     dec     hex filename
  24330    2734      20   27084    69cc sched.o.before
  24233    2730      20   26983    6967 sched.o.after
Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

f6b53205

sched: simplify sched_class::yield_task() · 4530d7ab

由 Dmitry Adamushko 提交于 10月 15, 2007

the 'p' (task_struct) parameter in the sched_class :: yield_task() is
redundant as the caller is always the 'current'. Get rid of it.

   text    data     bss     dec     hex filename
  24341    2734      20   27095    69d7 sched.o.before
  24330    2734      20   27084    69cc sched.o.after
Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

4530d7ab

sched: do not keep current in the tree and get rid of sched_entity::fair_key · 30cfdcfc

由 Dmitry Adamushko 提交于 10月 15, 2007

Get rid of 'sched_entity::fair_key'.

As a side effect, 'current' is not kept withing the tree for 
SCHED_NORMAL/BATCH tasks anymore. This simplifies some parts of code 
(e.g. entity_tick() and yield_task_fair()) and also somewhat optimizes 
them (e.g. a single update_curr() now vs. dequeue/enqueue() before in 
entity_tick()).
Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

30cfdcfc

sched: remove wait_runtime fields and features · bbdba7c0

由 Ingo Molnar 提交于 10月 15, 2007

remove wait_runtime based fields and features, now that the CFS
math has been changed over to the vruntime metric.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

bbdba7c0

sched: remove wait_runtime limit · e22f5bbf

由 Ingo Molnar 提交于 10月 15, 2007

remove the wait_runtime-limit fields and the code depending on it, now
that the math has been changed over to rely on the vruntime metric.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

e22f5bbf

sched: introduce se->vruntime · e9acbff6

由 Ingo Molnar 提交于 10月 15, 2007

introduce se->vruntime as a sum of weighted delta-exec's, and use that
as the key into the tree.

the idea to use absolute virtual time as the basic metric of scheduling
has been first raised by William Lee Irwin, advanced by Tong Li and first
prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset.

also see:

   http://lkml.org/lkml/2007/9/2/76

for a simpler variant of this patch.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

e9acbff6

sched: remove stat_gran · 8ebc91d9

由 Ingo Molnar 提交于 10月 15, 2007

remove the stat_gran code - it was disabled by default and it causes
unnecessary overhead.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

8ebc91d9

sched: use constants if !CONFIG_SCHED_DEBUG · 2bd8e6d4

由 Ingo Molnar 提交于 10月 15, 2007

use constants if !CONFIG_SCHED_DEBUG.

this speeds up the code and reduces code-size:

    text    data     bss     dec     hex filename
   27464    3014      16   30494    771e sched.o.before
   26929    3010      20   29959    7507 sched.o.after
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

2bd8e6d4

sched: debug: track maximum 'slice' · eba1ed4b

由 Ingo Molnar 提交于 10月 15, 2007

track the maximum amount of time a task has executed while
the CPU load was at least 2x. (i.e. at least two nice-0
tasks were runnable)
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

eba1ed4b

11 10月, 2007 2 次提交

E
[NETNS]: CLONE_NEWNET don't use the same clone flag as the pid namespace. · 169e3674
由 Eric W. Biederman 提交于 9月 27, 2007
```
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
169e3674

[NET]: Add network namespace clone & unshare support. · 9dd776b6

由 Eric W. Biederman 提交于 9月 26, 2007

This patch allows you to create a new network namespace
using sys_clone, or sys_unshare.

As the network namespace is still experimental and under development
clone and unshare support is only made available when CONFIG_NET_NS is
selected at compile time.

As this patch introduces network namespace support into code paths
that exist when the CONFIG_NET is not selected there are a few
additions made to net_namespace.h to allow a few more functions
to be used when the networking stack is not compiled in.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9dd776b6

08 10月, 2007 1 次提交

Don't do load-average calculations at even 5-second intervals · 0c2043ab

由 Linus Torvalds 提交于 10月 07, 2007

It turns out that there are a few other five-second timers in the
kernel, and if the timers get in sync, the load-average can get
artificially inflated by events that just happen to coincide.

So just offset the load average calculation it by a timer tick.

Noticed by Anders Boström, for whom the coincidence started triggering
on one of his machines with the JBD jiffies rounding code (JBD is one of
the subsystems that also end up using a 5-second timer by default).
Tested-by: NAnders Boström <anders@bostrom.dyndns.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0c2043ab

21 9月, 2007 1 次提交

signalfd simplification · b8fceee1

由 Davide Libenzi 提交于 9月 20, 2007

This simplifies signalfd code, by avoiding it to remain attached to the
sighand during its lifetime.

In this way, the signalfd remain attached to the sighand only during
poll(2) (and select and epoll) and read(2).  This also allows to remove
all the custom "tsk == current" checks in kernel/signal.c, since
dequeue_signal() will only be called by "current".

I think this is also what Ben was suggesting time ago.

The external effect of this, is that a thread can extract only its own
private signals and the group ones.  I think this is an acceptable
behaviour, in that those are the signals the thread would be able to
fetch w/out signalfd.
Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b8fceee1

20 9月, 2007 3 次提交

sched: add /proc/sys/kernel/sched_compat_yield · 1799e35d

由 Ingo Molnar 提交于 9月 19, 2007

add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
more agressive, by moving the yielding task to the last position
in the rbtree.

with sched_compat_yield=0:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
  2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop

with sched_compat_yield=1:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
  2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>

1799e35d

Fix user namespace exiting OOPs · 28f300d2

由 Pavel Emelyanov 提交于 9月 18, 2007

It turned out, that the user namespace is released during the do_exit() in
exit_task_namespaces(), but the struct user_struct is released only during the
put_task_struct(), i.e.  MUCH later.

On debug kernels with poisoned slabs this will cause the oops in
uid_hash_remove() because the head of the chain, which resides inside the
struct user_namespace, will be already freed and poisoned.

Since the uid hash itself is required only when someone can search it, i.e.
when the namespace is alive, we can safely unhash all the user_struct-s from
it during the namespace exiting.  The subsequent free_uid() will complete the
user_struct destruction.

For example simple program

   #include <sched.h>

   char stack[2 * 1024 * 1024];

   int f(void *foo)
   {
   	return 0;
   }

   int main(void)
   {
   	clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
   	return 0;
   }

run on kernel with CONFIG_USER_NS turned on will oops the
kernel immediately.

This was spotted during OpenVZ kernel testing.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
Acked-by: N"Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

28f300d2

Convert uid hash to hlist · 735de223

由 Pavel Emelyanov 提交于 9月 18, 2007

Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses
list_heads, thus occupying twice as much place as it could.  Convert it to
hlist_heads.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

735de223

28 8月, 2007 1 次提交

sched: make the scheduler converge to the ideal latency · f6cf891c

由 Ingo Molnar 提交于 8月 28, 2007

de-HZ-ification of the granularity defaults unearthed a pre-existing
property of CFS: while it correctly converges to the granularity goal,
it does not prevent run-time fluctuations in the range of
[-gran ... 0 ... +gran].

With the increase of the granularity due to the removal of HZ
dependencies, this becomes visible in chew-max output (with 5 tasks
running):

 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
 out:  27 . 27. 32 | flu:  0 .  0 | ran:   17 .   13 | per:   44 .   40
 out:  27 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   36 .   40
 out:  29 . 27. 32 | flu:  2 .  0 | ran:   17 .   13 | per:   46 .   40
 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
 out:  29 . 27. 32 | flu:  0 .  0 | ran:   18 .   13 | per:   47 .   40
 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40

average slice is the ideal 13 msecs and the period is picture-perfect 40
msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no
mechanism in CFS to keep that from happening: it's a perfectly valid
solution that CFS finds.

to fix this we add a granularity/preemption rule that knows about
the "target latency", which makes tasks that run longer than the ideal
latency run a bit less. The simplest approach is to simply decrease the
preemption granularity when a task overruns its ideal latency. For this
we have to track how much the task executed since its last preemption.

( this adds a new field to task_struct, but we can eliminate that
  overhead in 2.6.24 by putting all the scheduler timestamps into an
  anonymous union. )

with this change in place, chew-max output is fluctuation-less all
around:

 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40

this patch has no impact on any fastpath or on any globally observable
scheduling property. (unless you have sharp enough eyes to see
millisecond-level ruckles in glxgears smoothness :-)
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>

f6cf891c

26 8月, 2007 2 次提交

sched: cleanup, sched_granularity -> sched_min_granularity · 172ac3db

由 Ingo Molnar 提交于 8月 25, 2007

due to adaptive granularity scheduling the role of sched_granularity
has changed to "minimum granularity", so rename the variable (and the
tunable) accordingly.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>

172ac3db

sched: adaptive scheduler granularity · 21805085

由 Peter Zijlstra 提交于 8月 25, 2007

Instead of specifying the preemption granularity, specify the wanted
latency. By fixing the granlarity to a constany the wakeup latency
it a function of the number of running tasks on the rq.

Invert this relation.

sysctl_sched_granularity becomes a minimum for the dynamic granularity
computed from the new sysctl_sched_latency.

Then use this latency to do more intelligent granularity decisions: if
there are fewer tasks running then we can schedule coarser. This helps
performance while still always keeping the latency target.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

21805085

23 8月, 2007 2 次提交

sched: fix broken SMT/MC optimizations · f8700df7

由 Suresh Siddha 提交于 8月 23, 2007

On a four package system with HT - HT load balancing optimizations were
broken.  For example, if two tasks end up running on two logical threads
of one of the packages, scheduler is not able to pull one of the tasks
to a completely idle package.

In this scenario, for nice-0 tasks, imbalance calculated by scheduler
will be 512 and find_busiest_queue() will return 0 (as each cpu's load
is 1024 > imbalance and has only one task running).

Similarly MC scheduler optimizations also get fixed with this patch.

[ mingo@elte.hu: restored fair balancing by increasing the fuzz and
                 adding it back to the power decision, without the /2
                 factor. ]
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f8700df7

sched: sched_clock_idle_[sleep|wakeup]_event() · 2aa44d05

由 Ingo Molnar 提交于 8月 23, 2007

construct a more or less wall-clock time out of sched_clock(), by
using ACPI-idle's existing knowledge about how much time we spent
idling. This allows the rq clock to work around TSC-stops-in-C2,
TSC-gets-corrupted-in-C3 type of problems.

( Besides the scheduler's statistics this also benefits blktrace and
  printk-timestamps as well. )

Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup
callbacks allow the scheduler to get out the most of the period where
the CPU has a reliable TSC. This results in slightly more precise
task statistics.

the ACPI bits were acked by Len.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NLen Brown <len.brown@intel.com>

2aa44d05

09 8月, 2007 8 次提交

sched: remove the 'u64 now' parameter from ->task_new() · ee0827d8

由 Ingo Molnar 提交于 8月 09, 2007

remove the 'u64 now' parameter from ->task_new().

( identity transformation that causes no change in functionality. )
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ee0827d8

sched: remove the 'u64 now' parameter from ->put_prev_task() · 31ee529c

由 Ingo Molnar 提交于 8月 09, 2007

remove the 'u64 now' parameter from ->put_prev_task().

( identity transformation that causes no change in functionality. )
Signed-off-by: NIngo Molnar <mingo@elte.hu>

31ee529c

sched: remove the 'u64 now' parameter from ->pick_next_task() · fb8d4724

由 Ingo Molnar 提交于 8月 09, 2007

remove the 'u64 now' parameter from ->pick_next_task().

( identity transformation that causes no change in functionality. )
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fb8d4724

sched: remove the 'u64 now' parameter from ->dequeue_task() · f02231e5

由 Ingo Molnar 提交于 8月 09, 2007

remove the 'u64 now' parameter from ->dequeue_task().

( identity transformation that causes no change in functionality. )
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f02231e5

sched: remove the 'u64 now' parameter from ->enqueue_task() · fd390f6a

由 Ingo Molnar 提交于 8月 09, 2007

remove the 'u64 now' parameter from ->enqueue_task().

( identity transformation that causes no change in functionality. )
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fd390f6a

sched: remove the 'u64 now' parameter from print_cfs_rq() · 5cef9eca

由 Ingo Molnar 提交于 8月 09, 2007

remove the 'u64 now' parameter from print_cfs_rq().

( identity transformation that causes no change in functionality. )
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5cef9eca

sched: fix bug in balance_tasks() · a4ac01c3

由 Peter Williams 提交于 8月 09, 2007

There are two problems with balance_tasks() and how it used:

1. The variables best_prio and best_prio_seen (inherited from the old
move_tasks()) were only required to handle problems caused by the
active/expired arrays, the order in which they were processed and the
possibility that the task with the highest priority could be on either.
These issues are no longer present and the extra overhead associated
with their use is unnecessary (and possibly wrong).

2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same
this_best_prio variable needs to be used by all scheduling classes or
there is a risk of moving too much load. E.g. if the highest priority
task on this at the beginning is a fairly low priority task and the rt
class migrates a task (during its turn) then that moved task becomes the
new highest priority task on this_rq but when the sched_fair class
initializes its copy of this_best_prio it will get the priority of the
original highest priority task as, due to the run queue locks being
held, the reschedule triggered by pull_task() will not have taken place.
This could result in inappropriate overriding of skip_for_load and
excessive load being moved.

The attached patch addresses these problems by deleting all reference to
best_prio and best_prio_seen and making this_best_prio a reference
parameter to the various functions involved.

load_balance_fair() has also been modified so that this_best_prio is
only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should
preserve the effect of helping spread groups' higher priority tasks
around the available CPUs while improving system performance when
CONFIG_FAIR_GROUP_SCHED isn't set.
Signed-off-by: NPeter Williams <pwil3058@bigpond.net.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a4ac01c3

sched: simplify move_tasks() · 43010659

由 Peter Williams 提交于 8月 09, 2007

The move_tasks() function is currently multiplexed with two distinct
capabilities:

1. attempt to move a specified amount of weighted load from one run
queue to another; and
2. attempt to move a specified number of tasks from one run queue to
another.

The first of these capabilities is used in two places, load_balance()
and load_balance_idle(), and in both of these cases the return value of
move_tasks() is used purely to decide if tasks/load were moved and no
notice of the actual number of tasks moved is taken.

The second capability is used in exactly one place,
active_load_balance(), to attempt to move exactly one task and, as
before, the return value is only used as an indicator of success or failure.

This multiplexing of sched_task() was introduced, by me, as part of the
smpnice patches and was motivated by the fact that the alternative, one
function to move specified load and one to move a single task, would
have led to two functions of roughly the same complexity as the old
move_tasks() (or the new balance_tasks()).  However, the new modular
design of the new CFS scheduler allows a simpler solution to be adopted
and this patch addresses that solution by:

1. adding a new function, move_one_task(), to be used by
active_load_balance(); and
2. making move_tasks() a single purpose function that tries to move a
specified weighted load and returns 1 for success and 0 for failure.

One of the consequences of these changes is that neither move_one_task()
or the new move_tasks() care how many tasks sched_class.load_balance()
moves and this enables its interface to be simplified by returning the
amount of load moved as its result and removing the load_moved pointer
from the argument list.  This helps simplify the new move_tasks() and
slightly reduces the amount of work done in each of
sched_class.load_balance()'s implementations.

Further simplification, e.g. changes to balance_tasks(), are possible
but (slightly) complicated by the special needs of load_balance_fair()
so I've left them to a later patch (if this one gets accepted).

NB Since move_tasks() gets called with two run queue locks held even
small reductions in overhead are worthwhile.

[ mingo@elte.hu ]

this change also reduces code size nicely:

   text    data     bss     dec     hex filename
   39216    3618      24   42858    a76a sched.o.before
   39173    3618      24   42815    a73f sched.o.after
Signed-off-by: NPeter Williams <pwil3058@bigpond.net.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

43010659

02 8月, 2007 3 次提交

[PATCH] sched: reduce task_struct size · 94c18227

由 Ingo Molnar 提交于 8月 02, 2007

more task_struct size reduction, by moving the debugging/instrumentation
fields to under CONFIG_SCHEDSTATS:

 (i386, nodebug):

                          size
                          ----
     pre-CFS              1328
         CFS              1472
         CFS+patch        1376
Signed-off-by: NIngo Molnar <mingo@elte.hu>

94c18227

[PATCH] sched: ->task_new cleanup · cad60d93

由 Ingo Molnar 提交于 8月 02, 2007

make sched_class.task_new == NULL a 'default method', this
allows the removal of task_rt_new.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cad60d93

[PATCH] sched: remove cache_hot_time · 362a7016

由 Ingo Molnar 提交于 8月 02, 2007

remove the last unused remains of cache_hot_time.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

362a7016

26 7月, 2007 3 次提交

[PATCH] sched: add above_background_load() function · d02c7a8c

由 Con Kolivas 提交于 7月 26, 2007

Add an above_background_load() function which can be used by other
subsystems to detect if there is anything besides niced tasks running.

Place it in sched.h to allow it to be compiled out if not used.

Unused for now, but it is a useful hint to the IO scheduler and to
swap-prefetch.
Signed-off-by: NCon Kolivas <kernel@kolivas.org>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d02c7a8c

[PATCH] sched: arch preempt notifier mechanism · e107be36

由 Avi Kivity 提交于 7月 26, 2007

This adds a general mechanism whereby a task can request the scheduler to
notify it whenever it is preempted or scheduled back in.  This allows the
task to swap any special-purpose registers like the fpu or Intel's VT
registers.
Signed-off-by: NAvi Kivity <avi@qumranet.com>
[ mingo@elte.hu: fixes, cleanups ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e107be36

[PATCH] sched: increase SCHED_LOAD_SCALE_FUZZ · b47e8608

由 Ingo Molnar 提交于 7月 26, 2007

increase SCHED_LOAD_SCALE_FUZZ that adds a small amount of
over-balancing: to help distribute CPU-bound tasks more fairly on SMP
systems.

the problem of unfair balancing was noticed and reported by Tong N Li.

10 CPU-bound tasks running on 8 CPUs, v2.6.23-rc1:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2572 mingo     20   0  1576  244  196 R  100  0.0   1:03.61 loop
 2578 mingo     20   0  1576  248  196 R  100  0.0   1:03.59 loop
 2576 mingo     20   0  1576  248  196 R  100  0.0   1:03.52 loop
 2571 mingo     20   0  1576  244  196 R  100  0.0   1:03.46 loop
 2569 mingo     20   0  1576  244  196 R   99  0.0   1:03.36 loop
 2570 mingo     20   0  1576  244  196 R   95  0.0   1:00.55 loop
 2577 mingo     20   0  1576  248  196 R   50  0.0   0:31.88 loop
 2574 mingo     20   0  1576  248  196 R   50  0.0   0:31.87 loop
 2573 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop
 2575 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop

v2.6.23-rc1 + patch:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2681 mingo     20   0  1576  244  196 R   85  0.0   3:51.68 loop
 2688 mingo     20   0  1576  244  196 R   81  0.0   3:46.35 loop
 2682 mingo     20   0  1576  244  196 R   80  0.0   3:43.68 loop
 2685 mingo     20   0  1576  248  196 R   80  0.0   3:45.97 loop
 2683 mingo     20   0  1576  248  196 R   80  0.0   3:40.25 loop
 2679 mingo     20   0  1576  244  196 R   80  0.0   3:33.53 loop
 2680 mingo     20   0  1576  244  196 R   79  0.0   3:43.53 loop
 2686 mingo     20   0  1576  244  196 R   79  0.0   3:39.31 loop
 2687 mingo     20   0  1576  244  196 R   78  0.0   3:33.31 loop
 2684 mingo     20   0  1576  244  196 R   77  0.0   3:27.52 loop

so they now nicely converge to the expected 80% long-term CPU usage.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b47e8608

20 7月, 2007 3 次提交

[PATCH] sched: implement cpu_clock(cpu) high-speed time source · e436d800

由 Ingo Molnar 提交于 7月 19, 2007

Implement the cpu_clock(cpu) interface for kernel-internal use:
high-speed (but slightly incorrect) per-cpu clock constructed from
sched_clock().

This API, unused at the moment, will be used in the future by blktrace,
by the softlockup-watchdog, by printk and by lockstat.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e436d800

coredump masking: add an interface for core dump filter · 3cb4a0bb

由 Kawai, Hidehiro 提交于 7月 19, 2007

This patch adds an interface to set/reset flags which determines each memory
segment should be dumped or not when a core file is generated.

/proc/<pid>/coredump_filter file is provided to access the flags.  You can
change the flag status for a particular process by writing to or reading from
the file.

The flag status is inherited to the child process when it is created.
Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3cb4a0bb

coredump masking: reimplementation of dumpable using two flags · 6c5d5238

由 Kawai, Hidehiro 提交于 7月 19, 2007

This patch changes mm_struct.dumpable to a pair of bit flags.

set_dumpable() converts three-value dumpable to two flags and stores it into
lower two bits of mm_struct.flags instead of mm_struct.dumpable.
get_dumpable() behaves in the opposite way.

[akpm@linux-foundation.org: export set_dumpable]
Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6c5d5238

17 7月, 2007 1 次提交

user namespace: add unshare · 77ec739d

由 Serge E. Hallyn 提交于 7月 15, 2007

This patch enables the unshare of user namespaces.

It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
resets the current user_struct and adds a new root user (uid == 0)

For now, unsharing the user namespace allows a process to reset its
user_struct accounting and uid 0 in the new user namespace should be contained
using appropriate means, for instance selinux

The plan, when the full support is complete (all uid checks covered), is to
keep the original user's rights in the original namespace, and let a process
become uid 0 in the new namespace, with full capabilities to the new
namespace.
Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
Acked-by: NPavel Emelianov <xemul@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Andrew Morgan <agm@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

77ec739d

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功