Commits · ff95f3df54609d9d4b9572f8a67d09922a645043 · openanolis / cloud-kernel

09 Aug, 2007 · 6 commits

sched: remove the 'u64 now' parameter from ->pick_next_task() · fb8d4724

Committed by Ingo Molnar on Aug 09, 2007

remove the 'u64 now' parameter from ->pick_next_task().

( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove the 'u64 now' parameter from ->dequeue_task() · f02231e5

Committed by Ingo Molnar on Aug 09, 2007

remove the 'u64 now' parameter from ->dequeue_task().

( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove the 'u64 now' parameter from ->enqueue_task() · fd390f6a

Committed by Ingo Molnar on Aug 09, 2007

remove the 'u64 now' parameter from ->enqueue_task().

( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove the 'u64 now' parameter from print_cfs_rq() · 5cef9eca

Committed by Ingo Molnar on Aug 09, 2007

remove the 'u64 now' parameter from print_cfs_rq().

( identity transformation that causes no change in functionality. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>

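Each of the four commits above applies the same identity transformation: a callback stops receiving a precomputed `now` and, where it still needs a timestamp, reads it from the run queue it already has. A minimal userspace sketch, assuming a hypothetical `struct rq` whose `clock` field holds the current time (the real `sched_class` methods take more arguments than shown here):

```c
#include <stdint.h>

/* Hypothetical, heavily simplified run queue: only the clock matters here. */
struct rq {
	uint64_t clock;     /* kept up to date by the scheduler core */
	uint64_t last_seen; /* the timestamp the callback last used */
};

/* Before: the caller computes "now" and passes it down. */
static void enqueue_task_old(struct rq *rq, uint64_t now)
{
	rq->last_seen = now;
}

/* After: the callee reads the time from the run queue itself. */
static void enqueue_task_new(struct rq *rq)
{
	rq->last_seen = rq->clock;
}
```

As long as every caller used to pass the run queue's clock as `now`, both versions observe the same value, which is what makes the change functionally neutral.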

sched: fix bug in balance_tasks() · a4ac01c3

Committed by Peter Williams on Aug 09, 2007

There are two problems with balance_tasks() and how it is used:

1. The variables best_prio and best_prio_seen (inherited from the old
move_tasks()) were only required to handle problems caused by the
active/expired arrays, the order in which they were processed and the
possibility that the task with the highest priority could be on either.
These issues are no longer present and the extra overhead associated
with their use is unnecessary (and possibly wrong).

2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same
this_best_prio variable needs to be used by all scheduling classes or
there is a risk of moving too much load. E.g. if the highest priority
task on this_rq at the beginning is a fairly low priority task and the rt
class migrates a task (during its turn) then that moved task becomes the
new highest priority task on this_rq but when the sched_fair class
initializes its copy of this_best_prio it will get the priority of the
original highest priority task as, due to the run queue locks being
held, the reschedule triggered by pull_task() will not have taken place.
This could result in inappropriate overriding of skip_for_load and
excessive load being moved.

The attached patch addresses these problems by deleting all reference to
best_prio and best_prio_seen and making this_best_prio a reference
parameter to the various functions involved.

load_balance_fair() has also been modified so that this_best_prio is
only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should
preserve the effect of helping spread groups' higher priority tasks
around the available CPUs while improving system performance when
CONFIG_FAIR_GROUP_SCHED isn't set.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

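The failure mode in point 2 can be reproduced in a toy model. The sketch below is a hypothetical simplification, not the kernel's `balance_tasks()`: each class pass skips a task that exceeds the remaining load budget (`skip_for_load`) unless the task beats `*this_best_prio`, and every pulled task may update that shared value. Priorities follow the kernel convention that a lower number means a higher priority; `balance_class()` and `struct task` here are illustrative names.

```c
#include <stdbool.h>

/* Lower prio value = higher priority, as in the kernel. */
struct task {
	int prio;
	unsigned long weight;
};

/*
 * Hypothetical per-class balancing pass. A task heavier than the remaining
 * budget is skipped unless it beats *this_best_prio; every pulled task may
 * lower *this_best_prio, which later passes then see. Returns load moved.
 */
static unsigned long balance_class(const struct task *tasks, int n,
				   unsigned long max_load_move,
				   int *this_best_prio)
{
	unsigned long moved = 0;

	for (int i = 0; i < n; i++) {
		unsigned long rem = moved < max_load_move ? max_load_move - moved : 0;
		bool skip_for_load = tasks[i].weight > rem;

		if (skip_for_load && tasks[i].prio >= *this_best_prio)
			continue;

		moved += tasks[i].weight;
		if (tasks[i].prio < *this_best_prio)
			*this_best_prio = tasks[i].prio;
	}
	return moved;
}
```

If the RT pass and the fair pass share one `int`, the fair pass sees the priority of the RT task that was just pulled and skips a heavy, less important task. A fresh copy initialized from the stale priority lets that task through even with a zero remaining budget, which is the "excessive load being moved" described above.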

sched: simplify move_tasks() · 43010659

Committed by Peter Williams on Aug 09, 2007

The move_tasks() function is currently multiplexed with two distinct
capabilities:

1. attempt to move a specified amount of weighted load from one run
queue to another; and
2. attempt to move a specified number of tasks from one run queue to
another.

The first of these capabilities is used in two places, load_balance()
and load_balance_idle(), and in both of these cases the return value of
move_tasks() is used purely to decide if tasks/load were moved and no
notice of the actual number of tasks moved is taken.

The second capability is used in exactly one place,
active_load_balance(), to attempt to move exactly one task and, as
before, the return value is only used as an indicator of success or failure.

This multiplexing of move_tasks() was introduced, by me, as part of the
smpnice patches and was motivated by the fact that the alternative, one
function to move specified load and one to move a single task, would
have led to two functions of roughly the same complexity as the old
move_tasks() (or the new balance_tasks()). However, the new modular
design of the new CFS scheduler allows a simpler solution to be adopted
and this patch adopts that solution by:

1. adding a new function, move_one_task(), to be used by
active_load_balance(); and
2. making move_tasks() a single purpose function that tries to move a
specified weighted load and returns 1 for success and 0 for failure.

One of the consequences of these changes is that neither move_one_task()
nor the new move_tasks() cares how many tasks sched_class.load_balance()
moves and this enables its interface to be simplified by returning the
amount of load moved as its result and removing the load_moved pointer
from the argument list.  This helps simplify the new move_tasks() and
slightly reduces the amount of work done in each of
sched_class.load_balance()'s implementations.

Further simplification, e.g. changes to balance_tasks(), are possible
but (slightly) complicated by the special needs of load_balance_fair()
so I've left them to a later patch (if this one gets accepted).

NB: Since move_tasks() is called with two run queue locks held, even
small reductions in overhead are worthwhile.

[ mingo@elte.hu ]

this change also reduces code size nicely:

   text    data     bss     dec     hex filename
  39216    3618      24   42858    a76a sched.o.before
  39173    3618      24   42815    a73f sched.o.after
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

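The new division of labor can be sketched as follows, assuming hypothetical names and a reduced argument list (the real hook also receives the run queues, the sched domain, the idle type and more): each class's `load_balance()` returns the load it moved, and `move_tasks()` only accumulates that value and reports success or failure.

```c
#include <stddef.h>

/* Hypothetical class hook: after the patch it returns the load it moved. */
struct sched_class_sketch {
	unsigned long (*load_balance)(unsigned long max_load_move);
};

/* Single-purpose move_tasks(): 1 if any weighted load moved, else 0. */
static int move_tasks(const struct sched_class_sketch *classes, size_t nr,
		      unsigned long max_load_move)
{
	unsigned long total_load_moved = 0;

	for (size_t i = 0; i < nr && total_load_moved < max_load_move; i++)
		total_load_moved +=
			classes[i].load_balance(max_load_move - total_load_moved);

	return total_load_moved > 0;
}

/* Toy classes for illustration only. */
static unsigned long toy_rt_balance(unsigned long max)
{
	(void)max;
	return 0;                       /* nothing movable */
}

static unsigned long toy_fair_balance(unsigned long max)
{
	return max < 300 ? max : 300;   /* can move at most 300 units */
}
```

Because no caller needs a task count any more, the `load_moved` out-parameter disappears and each class simply returns a number.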

02 Aug, 2007 · 3 commits

[PATCH] sched: reduce task_struct size · 94c18227

Committed by Ingo Molnar on Aug 02, 2007

more task_struct size reduction, by moving the debugging/instrumentation
fields to under CONFIG_SCHEDSTATS:

 (i386, nodebug):

                          size
                          ----
     pre-CFS              1328
         CFS              1472
         CFS+patch        1376
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[PATCH] sched: ->task_new cleanup · cad60d93

Committed by Ingo Molnar on Aug 02, 2007

make sched_class.task_new == NULL a 'default method'; this
allows the removal of task_rt_new.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

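A 'default method' means the caller checks for a NULL hook and falls back to generic behavior, so a class such as the RT class no longer needs its own stub. A sketch with hypothetical names; the default shown here (a plain enqueue) is an assumption made for illustration:

```c
#include <stddef.h>

struct task {
	int on_rq;          /* 1 once the task is enqueued */
	int class_placed;   /* 1 if a class-specific hook placed it */
};

struct sched_class_sketch {
	void (*task_new)(struct task *p);   /* NULL means "use the default" */
};

/* Hypothetical default: simply enqueue the new task. */
static void default_task_new(struct task *p)
{
	p->on_rq = 1;
}

/* Example class that wants custom placement of new tasks. */
static void toy_fair_task_new(struct task *p)
{
	p->on_rq = 1;
	p->class_placed = 1;
}

static void wake_up_new_task_sketch(const struct sched_class_sketch *class,
				    struct task *p)
{
	if (class->task_new)
		class->task_new(p);
	else
		default_task_new(p);
}
```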

[PATCH] sched: remove cache_hot_time · 362a7016

Committed by Ingo Molnar on Aug 02, 2007

remove the last unused remains of cache_hot_time.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

26 Jul, 2007 · 3 commits

[PATCH] sched: add above_background_load() function · d02c7a8c

Committed by Con Kolivas on Jul 26, 2007

Add an above_background_load() function which can be used by other
subsystems to detect if there is anything besides niced tasks running.

Place it in sched.h to allow it to be compiled out if not used.

Unused for now, but it is a useful hint to the IO scheduler and to
swap-prefetch.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

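A userspace sketch of the idea, assuming each CPU's weighted load is supplied as an array and that "anything besides niced tasks" means some CPU whose load reaches the weight of one nice-0 task (`SCHED_LOAD_SCALE`, taken here as 1024). The exact threshold and the array interface are assumptions, not the kernel's code:

```c
#include <stddef.h>

#define SCHED_LOAD_SCALE 1024UL   /* load weight of one nice-0 task (assumed) */

/*
 * Returns 1 if any CPU carries at least one nice-0 task's worth of
 * weighted load, i.e. something beyond niced background work is running.
 */
static int above_background_load(const unsigned long *cpu_load, size_t nr_cpus)
{
	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_load[cpu] >= SCHED_LOAD_SCALE)
			return 1;
	}
	return 0;
}
```

Keeping such a helper `static inline` in a header, as the commit suggests for sched.h, lets the compiler drop it entirely when nothing calls it.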

[PATCH] sched: arch preempt notifier mechanism · e107be36

Committed by Avi Kivity on Jul 26, 2007

This adds a general mechanism whereby a task can request the scheduler to
notify it whenever it is preempted or scheduled back in.  This allows the
task to swap any special-purpose registers like the fpu or Intel's VT
registers.
Signed-off-by: Avi Kivity <avi@qumranet.com>
[ mingo@elte.hu: fixes, cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

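The mechanism is a per-task list of callbacks that the scheduler invokes on switch-out and switch-in. The sketch below is loosely modeled on the kernel's `struct preempt_ops` / `struct preempt_notifier`, but the list handling and signatures are simplified assumptions (for instance, the kernel's `sched_out` hook also receives the next task), and `fire_sched_in()` / `fire_sched_out()` are hypothetical stand-ins for the scheduler's call sites:

```c
#include <stddef.h>

struct preempt_notifier;

/* Callbacks a task registers (simplified). */
struct preempt_ops {
	void (*sched_in)(struct preempt_notifier *n, int cpu);
	void (*sched_out)(struct preempt_notifier *n);
};

struct preempt_notifier {
	struct preempt_notifier *next;   /* singly linked per-task list */
	const struct preempt_ops *ops;
};

struct task {
	struct preempt_notifier *notifiers;
};

static void notifier_register(struct task *t, struct preempt_notifier *n)
{
	n->next = t->notifiers;
	t->notifiers = n;
}

/* Called by the (toy) scheduler when t is switched out ... */
static void fire_sched_out(struct task *t)
{
	for (struct preempt_notifier *n = t->notifiers; n; n = n->next)
		n->ops->sched_out(n);
}

/* ... and when t runs again on the given CPU. */
static void fire_sched_in(struct task *t, int cpu)
{
	for (struct preempt_notifier *n = t->notifiers; n; n = n->next)
		n->ops->sched_in(n, cpu);
}

/* Toy user: counts how often special registers would be saved/restored. */
static int saves, restores, last_cpu = -1;

static void toy_sched_in(struct preempt_notifier *n, int cpu)
{
	(void)n;
	restores++;
	last_cpu = cpu;
}

static void toy_sched_out(struct preempt_notifier *n)
{
	(void)n;
	saves++;
}

static const struct preempt_ops toy_ops = { toy_sched_in, toy_sched_out };
```

A user such as a virtualization module would save its special registers in `sched_out` and reload them in `sched_in`, possibly on a different CPU than before.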

[PATCH] sched: increase SCHED_LOAD_SCALE_FUZZ · b47e8608

Committed by Ingo Molnar on Jul 26, 2007

increase SCHED_LOAD_SCALE_FUZZ, which adds a small amount of
over-balancing, to help distribute CPU-bound tasks more fairly on SMP
systems.

the problem of unfair balancing was noticed and reported by Tong N Li.

10 CPU-bound tasks running on 8 CPUs, v2.6.23-rc1:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2572 mingo     20   0  1576  244  196 R  100  0.0   1:03.61 loop
 2578 mingo     20   0  1576  248  196 R  100  0.0   1:03.59 loop
 2576 mingo     20   0  1576  248  196 R  100  0.0   1:03.52 loop
 2571 mingo     20   0  1576  244  196 R  100  0.0   1:03.46 loop
 2569 mingo     20   0  1576  244  196 R   99  0.0   1:03.36 loop
 2570 mingo     20   0  1576  244  196 R   95  0.0   1:00.55 loop
 2577 mingo     20   0  1576  248  196 R   50  0.0   0:31.88 loop
 2574 mingo     20   0  1576  248  196 R   50  0.0   0:31.87 loop
 2573 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop
 2575 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop

v2.6.23-rc1 + patch:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2681 mingo     20   0  1576  244  196 R   85  0.0   3:51.68 loop
 2688 mingo     20   0  1576  244  196 R   81  0.0   3:46.35 loop
 2682 mingo     20   0  1576  244  196 R   80  0.0   3:43.68 loop
 2685 mingo     20   0  1576  248  196 R   80  0.0   3:45.97 loop
 2683 mingo     20   0  1576  248  196 R   80  0.0   3:40.25 loop
 2679 mingo     20   0  1576  244  196 R   80  0.0   3:33.53 loop
 2680 mingo     20   0  1576  244  196 R   79  0.0   3:43.53 loop
 2686 mingo     20   0  1576  244  196 R   79  0.0   3:39.31 loop
 2687 mingo     20   0  1576  244  196 R   78  0.0   3:33.31 loop
 2684 mingo     20   0  1576  244  196 R   77  0.0   3:27.52 loop

so they now nicely converge to the expected 80% long-term CPU usage.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

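The 80% target is simple arithmetic: 10 CPU-bound tasks sharing 8 CPUs should each get 8/10 of a CPU. Before the patch the total was already about 800% (six tasks at or near 100% plus four at 50%), so the CPUs were fully used but unevenly shared. A trivial, hypothetical helper for the expected share:

```c
/* Expected long-term share of one CPU (percent) per CPU-bound task. */
static unsigned int fair_share_pct(unsigned int nr_cpus, unsigned int nr_tasks)
{
	if (nr_tasks == 0)
		return 0;
	if (nr_tasks <= nr_cpus)
		return 100;              /* each task can own a CPU */
	return 100 * nr_cpus / nr_tasks;
}
```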

20 Jul, 2007 · 3 commits

[PATCH] sched: implement cpu_clock(cpu) high-speed time source · e436d800

Committed by Ingo Molnar on Jul 19, 2007

Implement the cpu_clock(cpu) interface for kernel-internal use:
high-speed (but slightly incorrect) per-cpu clock constructed from
sched_clock().

This API, unused at the moment, will be used in the future by blktrace,
by the softlockup-watchdog, by printk and by lockstat.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

coredump masking: add an interface for core dump filter · 3cb4a0bb

Committed by Kawai, Hidehiro on Jul 19, 2007

This patch adds an interface to set/reset flags which determine whether each
memory segment should be dumped when a core file is generated.

The /proc/<pid>/coredump_filter file is provided to access the flags.  You can
read or change the flag status of a particular process by reading from or
writing to the file.

The flag status is inherited by the child process when it is created.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

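The flags form a bitmask with one bit per kind of memory segment. A small decoder, assuming the four-bit layout documented for this file in core(5) (bit 0 anonymous private, bit 1 anonymous shared, bit 2 file-backed private, bit 3 file-backed shared); `describe_filter()` is a hypothetical helper, and later kernels define further bits:

```c
#include <stddef.h>
#include <stdio.h>

/* Bits 0-3 of /proc/<pid>/coredump_filter as documented in core(5). */
static const char *const filter_bits[] = {
	"anonymous private",
	"anonymous shared",
	"file-backed private",
	"file-backed shared",
};

/*
 * Writes a comma-separated list of the segment types selected by mask
 * into buf (len must be > 0) and returns how many types are selected.
 */
static int describe_filter(unsigned long mask, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;

	buf[0] = '\0';
	for (size_t bit = 0; bit < sizeof filter_bits / sizeof filter_bits[0]; bit++) {
		if (!(mask & (1UL << bit)))
			continue;
		count++;
		if (used < len) {
			/* snprintf() truncates safely; once full, stop appending. */
			int n = snprintf(buf + used, len - used, "%s%s",
					 used ? "," : "", filter_bits[bit]);
			used += n > 0 ? (size_t)n : 0;
		}
	}
	return count;
}
```

From a shell, core(5) describes setting a mask by writing a value such as `0x3` to the file; because the setting is inherited, a child started afterwards uses the same mask.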

coredump masking: reimplementation of dumpable using two flags · 6c5d5238

Committed by Kawai, Hidehiro on Jul 19, 2007

This patch changes mm_struct.dumpable to a pair of bit flags.

set_dumpable() converts the three-valued dumpable to two flags and stores them
in the lower two bits of mm_struct.flags instead of mm_struct.dumpable.
get_dumpable() performs the reverse conversion.

[akpm@linux-foundation.org: export set_dumpable]
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

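The encoding can be sketched in userspace. The bit numbers below (bit 0 "dumpable", bit 1 "dump securely", i.e. root-only) and the `struct mm_sketch` type are assumptions for illustration; the point is the mapping between the three values and the two low bits, and that unrelated flag bits are preserved:

```c
/* Bit numbers assumed for this sketch. */
#define MMF_DUMPABLE      0   /* core dumps allowed */
#define MMF_DUMP_SECURELY 1   /* ...but only readable by root */
#define MMF_DUMPABLE_MASK ((1UL << MMF_DUMPABLE) | (1UL << MMF_DUMP_SECURELY))

struct mm_sketch {
	unsigned long flags;   /* other bits must be left untouched */
};

/* value: 0 = not dumpable, 1 = dumpable, 2 = dumpable, root-only.
 * Any other value clears both bits in this sketch. */
static void set_dumpable(struct mm_sketch *mm, int value)
{
	mm->flags &= ~MMF_DUMPABLE_MASK;
	if (value == 1)
		mm->flags |= 1UL << MMF_DUMPABLE;
	else if (value == 2)
		mm->flags |= MMF_DUMPABLE_MASK;
}

/* Inverse mapping; any value >= 2 in the low bits reads back as 2. */
static int get_dumpable(const struct mm_sketch *mm)
{
	int ret = (int)(mm->flags & MMF_DUMPABLE_MASK);

	return ret >= 2 ? 2 : ret;
}
```

This single-threaded sketch rewrites both bits at once; in the kernel the bits are updated with atomic bit operations ordered so that a concurrent reader does not observe a more permissive intermediate state.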

17 Jul, 2007 · 4 commits

user namespace: add unshare · 77ec739d

Committed by Serge E. Hallyn on Jul 15, 2007

This patch enables the unshare of user namespaces.

It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns(), which
resets the current user_struct and adds a new root user (uid == 0).

For now, unsharing the user namespace allows a process to reset its
user_struct accounting, and uid 0 in the new user namespace should be contained
by appropriate means, for instance SELinux.

The plan, when the full support is complete (all uid checks covered), is to
keep the original user's rights in the original namespace, and let a process
become uid 0 in the new namespace, with full capabilities to the new
namespace.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Andrew Morgan <agm@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

user namespace: add the framework · acce292c

Committed by Cedric Le Goater on Jul 15, 2007

Basically, it will allow a process to unshare its user_struct table,
resetting at the same time its own user_struct and all the associated
accounting.

A new root user (uid == 0) is added to the user namespace upon creation.
Such root users have full privileges and it seems that these privileges
should be controlled through some means (process capabilities?).

The unshare is not included in this patch.

Changes since [try #4]:
	- Updated get_user_ns and put_user_ns to accept NULL, and
	  get_user_ns to return the namespace.

Changes since [try #3]:
	- moved struct user_namespace to files user_namespace.{c,h}

Changes since [try #2]:
	- removed struct user_namespace* argument from find_user()

Changes since [try #1]:
	- removed struct user_namespace* argument from find_user()
	- added a root_user per user namespace
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Andrew Morgan <agm@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Audit: add TTY input auditing · 522ed776

Committed by Miloslav Trmac on Jul 15, 2007

Add TTY input auditing, used to audit system administrator's actions.  This is
required by various security standards such as DCID 6/3 and PCI to provide
non-repudiation of administrator's actions and to allow a review of past
actions if the administrator seems to overstep their duties or if the system
becomes misconfigured for unknown reasons.  These requirements do not make it
necessary to audit TTY output as well.

Compared to a user-space keylogger, this approach records TTY input using the
audit subsystem, correlated with other audit events, and it is completely
transparent to the user-space application (e.g.  the console ioctls still
work).

TTY input auditing works on a higher level than auditing all system calls
within the session, which would produce an overwhelming amount of mostly
useless audit events.

Add an "audit_tty" attribute, inherited across fork ().  Data read from TTYs
by a process with the attribute is sent to the audit subsystem by the kernel.
The audit netlink interface is extended to allow modifying the audit_tty
attribute, and to allow sending explanatory audit events from user-space (for
example, a shell might send an event containing the final command, after the
interactive command-line editing and history expansion is performed, which
might be difficult to decipher from the TTY input alone).

Because the "audit_tty" attribute is inherited across fork (), it would be set
e.g.  for sshd restarted within an audited session.  To prevent this, the
audit_tty attribute is cleared when a process with no open TTY file
descriptors (e.g.  after daemon startup) opens a TTY.

See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
more detailed rationale document for an older version of this patch.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Miloslav Trmac <mitr@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

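The inheritance and clearing rule from the last paragraph can be modeled in a few lines. This is a hypothetical sketch of the rule only; `struct proc_sketch` and `open_tty()` are not kernel names:

```c
#include <stdbool.h>

/* Hypothetical per-process state relevant to TTY input auditing. */
struct proc_sketch {
	bool audit_tty;     /* inherited across fork() */
	int open_tty_fds;   /* TTY file descriptors currently open */
};

/*
 * Opening a TTY: a process that holds no TTY descriptors (e.g. a daemon
 * after startup) loses the inherited audit_tty attribute first.
 */
static void open_tty(struct proc_sketch *p)
{
	if (p->open_tty_fds == 0)
		p->audit_tty = false;
	p->open_tty_fds++;
}
```

For example, an sshd restarted from an audited shell inherits `audit_tty`, closes its descriptors while daemonizing, and loses the attribute when it later opens a TTY; the interactive shell itself, which still holds its TTY, keeps it.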

Use boot based time for process start time and boot time in /proc · 924b42d5

Committed by Tomas Janousek on Jul 15, 2007

Commit 411187fb caused boot time to move and
process start times to become invalid after suspend.  Using boot based time
for those restores the old behaviour and fixes the issue.

[akpm@linux-foundation.org: little cleanup]
Signed-off-by: Tomas Janousek <tjanouse@redhat.com>
Cc: Tomas Smetana <tsmetana@redhat.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

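User space can observe the same distinction today. The sketch below is not what this 2007 patch did (the patch changed how the kernel computes the values shown in /proc); it uses the later `CLOCK_BOOTTIME` clock (Linux 2.6.39 and newer), which, unlike `CLOCK_MONOTONIC`, keeps counting while the system is suspended. `clock_seconds()` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <time.h>

/* Seconds on the given clock, or -1.0 on error. */
static double clock_seconds(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1.0;
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

After a suspend/resume cycle, `clock_seconds(CLOCK_BOOTTIME)` exceeds `clock_seconds(CLOCK_MONOTONIC)` by roughly the time spent suspended; timestamps meant to relate to the boot time, like the process start times discussed above, need the former.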

10 Jul, 2007 · 19 commits

sched: micro-optimize mmdrop() · 6fb43d7b

Committed by Ingo Molnar on Jul 09, 2007

micro-optimize mmdrop(). Improves schedule()'s assembly a bit.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: scheduler debugging, core · 43ae34cb

Committed by Ingo Molnar on Jul 09, 2007

scheduler debugging core: implement /proc/sched_debug and
/proc/<PID>/sched files for scheduler debugging.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove old cpu accounting field · 7dd59360

Committed by Ingo Molnar on Jul 09, 2007

remove the old cpu-accounting field from signal_struct, now
that the code is using CFS's stats.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove batch_task() · 0c57d589

Committed by Ingo Molnar on Jul 09, 2007

batch_task() in sched.h is now unused - remove it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove interactivity types from sched.h · 50e645a8

Committed by Ingo Molnar on Jul 09, 2007

remove now-unused types/fields used by the old scheduler.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up fastcall uses of sched_fork()/sched_exit() · ad46c2c4

Committed by Ingo Molnar on Jul 09, 2007

sched_fork()/sched_exit() do not need to specify fastcall anymore,
as the x86 kernel defaults to regparm3, and no assembly code calls
these functions.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: update delay-accounting to use CFS's precise stats · 172ba844

Committed by Balbir Singh on Jul 09, 2007

update delay-accounting to use CFS's precise stats.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: x86, track TSC-unstable events · bb29ab26

Committed by Ingo Molnar on Jul 09, 2007

track TSC-unstable events and propagate them to the scheduler code.
Also allow sched_clock() to be used when the TSC is unstable;
the rq_clock() wrapper creates a reliable clock out of it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

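How a wrapper can turn an occasionally unreliable `sched_clock()` into a usable clock: a sketch modeled loosely on the run-queue clock logic of CFS-era kernels, in which a backward warp advances the clock by only 1 ns and an implausibly large forward jump is capped at one tick past the last timer tick. The structure, field names and the assumed 1 ms `TICK_NSEC` are illustrative, not the kernel's exact code:

```c
#include <stdint.h>

#define TICK_NSEC 1000000ULL   /* assumed 1 ms tick (HZ=1000) */

struct rq_clock_sketch {
	uint64_t prev_raw;        /* last raw sched_clock() reading */
	uint64_t clock;           /* sanitized, never-decreasing clock */
	uint64_t tick_timestamp;  /* clock value at the last timer tick */
};

/* Feed a new raw reading; returns the sanitized clock value. */
static uint64_t rq_clock_update(struct rq_clock_sketch *rq, uint64_t raw_now)
{
	int64_t delta = (int64_t)(raw_now - rq->prev_raw);
	uint64_t clock = rq->clock;
	uint64_t limit = rq->tick_timestamp + TICK_NSEC;

	if (delta < 0) {
		clock++;                                   /* raw clock went backwards */
	} else if (clock + (uint64_t)delta > limit) {
		clock = clock < limit ? limit : clock + 1; /* cap a huge forward jump */
	} else {
		clock += (uint64_t)delta;                  /* normal case */
	}

	rq->prev_raw = raw_now;
	rq->clock = clock;
	return clock;
}
```

The result stays monotonic and cannot run far ahead of the tick, at the cost of being slightly inaccurate whenever the raw clock misbehaves.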

sched: remove sleep_type · f2ac58ee

Committed by Ingo Molnar on Jul 09, 2007

remove the sleep_type heuristics from the core scheduler - scheduling
policy is implemented in the scheduling-policy modules. (and CFS does
not use this type of sleep-type heuristics)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up the rt priority macros · e05606d3

Committed by Ingo Molnar on Jul 09, 2007

clean up the rt priority macros, pointed out by Andrew Morton.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: make posix-cpu-timers use CFS's accounting information · 41b86e9c

Committed by Ingo Molnar on Jul 09, 2007

update the posix-cpu-timers code to use CFS's CPU accounting information.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs, core data types · 20b8a59f

Committed by Ingo Molnar on Jul 09, 2007

add the CFS data types to sched.h.

(the old scheduler is still fully intact.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs core, kernel/sched_fair.c · bf0f6f24

Committed by Ingo Molnar on Jul 09, 2007

add kernel/sched_fair.c - which implements the bulk of CFS's
behavioral changes for SCHED_OTHER tasks.

see Documentation/sched-design-CFS.txt for details.

Authors:

 Ingo Molnar <mingo@elte.hu>
 Dmitry Adamushko <dmitry.adamushko@gmail.com>
 Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
 Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

sched: increase the resolution of smpnice · 9aa7b369

Committed by Ingo Molnar on Jul 09, 2007

increase SMP-nice's resolution. This is needed by CFS to
implement SCHED_IDLE and cleaned up nice level support.

no behavioral changes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add init_idle_bootup_task() · 1df21055

Committed by Ingo Molnar on Jul 09, 2007

add the init_idle_bootup_task() callback to the bootup thread,
unused at the moment. (CFS will use it to switch the scheduling
class of the boot thread to the idle class)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: uninline set_task_cpu() · c65cc870

Committed by Ingo Molnar on Jul 09, 2007

uninline set_task_cpu(): CFS will add more code to it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: zap the migration init / cache-hot balancing code · 0437e109

Committed by Ingo Molnar on Jul 09, 2007

the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.

this code is fundamentally fragile: the boot-time migration cost detector
doesn't really work on systems that have large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty nondeterministic as well.

(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)

under CFS the same purpose of cache affinity can be achieved without
any cache-hot special case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

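The selection rule in the last paragraph amounts to "take the task with the smallest timeline key". A minimal sketch, assuming a plain array in place of the red-black tree that CFS actually uses (where the leftmost entry is cheap to find); the names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct task_sketch {
	int pid;
	int64_t key;   /* position on the timeline: smaller = further left */
};

/*
 * Returns the leftmost task, i.e. the one with the smallest key, or NULL
 * if there are no tasks. This is the task that has waited the longest,
 * so it is also the one least likely to still have a warm cache.
 */
static const struct task_sketch *pick_leftmost(const struct task_sketch *tasks,
					       size_t n)
{
	const struct task_sketch *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!best || tasks[i].key < best->key)
			best = &tasks[i];
	}
	return best;
}
```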

sched: add SCHED_IDLE policy · 0e6aca43

Committed by Ingo Molnar on Jul 09, 2007

this patch adds the SCHED_IDLE policy to sched.h.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: rename idle_type/SCHED_IDLE · d15bcfdb

Committed by Ingo Molnar on Jul 09, 2007

enum idle_type (used by the load-balancer) clashes with the
SCHED_IDLE name that we want to introduce. 'CPU_IDLE' instead
of 'SCHED_IDLE' is more descriptive as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

09 Jun, 2007 · 1 commit

pi-futex: fix exit races and locking problems · 778e9a9c

Committed by Alexey Kuznetsov on Jun 08, 2007

1. New entries can be added to tsk->pi_state_list after the task has completed
   exit_pi_state_list(). The result is memory leakage and deadlocks.

2. handle_mm_fault() is called under spinlock. The result is obvious.

3. Results in a self-inflicted deadlock inside glibc.
   Sometimes futex_lock_pi returns -ESRCH when it is not expected,
   and glibc enters a for(;;) sleep() loop to simulate a deadlock. This problem
   is quite obvious and I think the patch is right. Though it looks like
   each "if" in futex_lock_pi() got some stupid special case "else if". :-)

4. sometimes futex_lock_pi() returns -EDEADLK,
   when nobody has the lock. The reason is also obvious (see comment
   in the patch), but the correct fix is far beyond my comprehension.
   I guess someone already saw this, the chunk:

                        if (rt_mutex_trylock(&q.pi_state->pi_mutex))
                                ret = 0;

   is obviously from the same opera. But it does not work, because the
   rtmutex is really taken at this point: wake_futex_pi() of previous
   owner reassigned it to us. My fix works. But it looks very stupid.
   I would think about removal of shift of ownership in wake_futex_pi()
   and making all the work in context of process taking lock.

From: Thomas Gleixner <tglx@linutronix.de>

Fix 1) Avoid the tasklist lock variant of the exit race fix by adding
    an additional state transition to the exit code.

    This also fixes the issue where a task with recursive segfaults
    is not able to release the futexes.

Fix 2) Clean up the lookup_pi_state() failure path and solve the -ESRCH
    problem finally.

Fix 3) Solve the fixup_pi_state_owner() problem which needs to do the fixup
    in the lock protected section by using the in_atomic userspace access
    functions.

    This also removes the ugly lock drop / unqueue inside of fixup_pi_state().

Fix 4) Fix a stale lock in the error path of futex_wake_pi()

Added some error checks for verification.

The -EDEADLK problem is solved by the rtmutex fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24 May, 2007 · 1 commit

recalc_sigpending_tsk fixes · 7bb44ade

Committed by Roland McGrath on May 23, 2007

Steve Hawkes discovered a problem where recalc_sigpending_tsk was called in
do_sigaction but no signal_wake_up call was made, preventing later signals
from waking up blocked threads with TIF_SIGPENDING already set.

In fact, the few other calls to recalc_sigpending_tsk outside the signals
code are also subject to this problem in other race conditions.

This change makes recalc_sigpending_tsk private to the signals code.  It
changes the outside calls, as well as do_sigaction, to use the new
recalc_sigpending_and_wake instead.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: <Steve.Hawkes@motorola.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

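The shape of the new helper, as a userspace sketch: recalculating the pending state and waking the task are tied together, so no outside caller can do one without the other. All types and the one-argument `signal_wake_up()` here are hypothetical stand-ins; the kernel version also looks at shared pending signals and other conditions:

```c
#include <stdbool.h>

/* Hypothetical task model for the signal-pending logic. */
struct task_sketch {
	unsigned long pending;   /* bitmask of pending signals */
	unsigned long blocked;   /* bitmask of blocked signals */
	bool tif_sigpending;     /* stands in for TIF_SIGPENDING */
	int wakeups;             /* counts signal_wake_up() calls */
};

/* Sets the flag if an unblocked signal is pending; never clears it. */
static bool recalc_sigpending_tsk(struct task_sketch *t)
{
	if (t->pending & ~t->blocked) {
		t->tif_sigpending = true;
		return true;
	}
	return false;
}

static void signal_wake_up(struct task_sketch *t)
{
	t->wakeups++;   /* the real one kicks the task out of its sleep */
}

/* The fix: whoever recalculates from outside must also wake the task. */
static void recalc_sigpending_and_wake(struct task_sketch *t)
{
	if (recalc_sigpending_tsk(t))
		signal_wake_up(t);
}
```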