- 23 8月, 2007 2 次提交
-
-
由 Suresh Siddha 提交于
On a four package system with HT - HT load balancing optimizations were broken. For example, if two tasks end up running on two logical threads of one of the packages, scheduler is not able to pull one of the tasks to a completely idle package. In this scenario, for nice-0 tasks, imbalance calculated by scheduler will be 512 and find_busiest_queue() will return 0 (as each cpu's load is 1024 > imbalance and has only one task running). Similarly MC scheduler optimizations also get fixed with this patch. [ mingo@elte.hu: restored fair balancing by increasing the fuzz and adding it back to the power decision, without the /2 factor. ] Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
construct a more or less wall-clock time out of sched_clock(), by using ACPI-idle's existing knowledge about how much time we spent idling. This allows the rq clock to work around TSC-stops-in-C2, TSC-gets-corrupted-in-C3 type of problems. ( Besides the scheduler's statistics this also benefits blktrace and printk-timestamps as well. ) Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup callbacks allow the scheduler to get out the most of the period where the CPU has a reliable TSC. This results in slightly more precise task statistics. the ACPI bits were acked by Len. Signed-off-by: NIngo Molnar <mingo@elte.hu> Acked-by: NLen Brown <len.brown@intel.com>
-
- 09 8月, 2007 8 次提交
-
-
由 Ingo Molnar 提交于
remove the 'u64 now' parameter from ->task_new(). ( identity transformation that causes no change in functionality. ) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the 'u64 now' parameter from ->put_prev_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the 'u64 now' parameter from ->pick_next_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the 'u64 now' parameter from ->dequeue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the 'u64 now' parameter from ->enqueue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the 'u64 now' parameter from print_cfs_rq(). ( identity transformation that causes no change in functionality. ) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Williams 提交于
There are two problems with balance_tasks() and how it used: 1. The variables best_prio and best_prio_seen (inherited from the old move_tasks()) were only required to handle problems caused by the active/expired arrays, the order in which they were processed and the possibility that the task with the highest priority could be on either. These issues are no longer present and the extra overhead associated with their use is unnecessary (and possibly wrong). 2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same this_best_prio variable needs to be used by all scheduling classes or there is a risk of moving too much load. E.g. if the highest priority task on this at the beginning is a fairly low priority task and the rt class migrates a task (during its turn) then that moved task becomes the new highest priority task on this_rq but when the sched_fair class initializes its copy of this_best_prio it will get the priority of the original highest priority task as, due to the run queue locks being held, the reschedule triggered by pull_task() will not have taken place. This could result in inappropriate overriding of skip_for_load and excessive load being moved. The attached patch addresses these problems by deleting all reference to best_prio and best_prio_seen and making this_best_prio a reference parameter to the various functions involved. load_balance_fair() has also been modified so that this_best_prio is only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should preserve the effect of helping spread groups' higher priority tasks around the available CPUs while improving system performance when CONFIG_FAIR_GROUP_SCHED isn't set. Signed-off-by: NPeter Williams <pwil3058@bigpond.net.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Williams 提交于
The move_tasks() function is currently multiplexed with two distinct capabilities: 1. attempt to move a specified amount of weighted load from one run queue to another; and 2. attempt to move a specified number of tasks from one run queue to another. The first of these capabilities is used in two places, load_balance() and load_balance_idle(), and in both of these cases the return value of move_tasks() is used purely to decide if tasks/load were moved and no notice of the actual number of tasks moved is taken. The second capability is used in exactly one place, active_load_balance(), to attempt to move exactly one task and, as before, the return value is only used as an indicator of success or failure. This multiplexing of sched_task() was introduced, by me, as part of the smpnice patches and was motivated by the fact that the alternative, one function to move specified load and one to move a single task, would have led to two functions of roughly the same complexity as the old move_tasks() (or the new balance_tasks()). However, the new modular design of the new CFS scheduler allows a simpler solution to be adopted and this patch addresses that solution by: 1. adding a new function, move_one_task(), to be used by active_load_balance(); and 2. making move_tasks() a single purpose function that tries to move a specified weighted load and returns 1 for success and 0 for failure. One of the consequences of these changes is that neither move_one_task() or the new move_tasks() care how many tasks sched_class.load_balance() moves and this enables its interface to be simplified by returning the amount of load moved as its result and removing the load_moved pointer from the argument list. This helps simplify the new move_tasks() and slightly reduces the amount of work done in each of sched_class.load_balance()'s implementations. Further simplification, e.g. changes to balance_tasks(), are possible but (slightly) complicated by the special needs of load_balance_fair() so I've left them to a later patch (if this one gets accepted). NB Since move_tasks() gets called with two run queue locks held even small reductions in overhead are worthwhile. [ mingo@elte.hu ] this change also reduces code size nicely: text data bss dec hex filename 39216 3618 24 42858 a76a sched.o.before 39173 3618 24 42815 a73f sched.o.after Signed-off-by: NPeter Williams <pwil3058@bigpond.net.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 02 8月, 2007 3 次提交
-
-
由 Ingo Molnar 提交于
more task_struct size reduction, by moving the debugging/instrumentation fields to under CONFIG_SCHEDSTATS: (i386, nodebug): size ---- pre-CFS 1328 CFS 1472 CFS+patch 1376 Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
make sched_class.task_new == NULL a 'default method', this allows the removal of task_rt_new. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the last unused remains of cache_hot_time. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 26 7月, 2007 3 次提交
-
-
由 Con Kolivas 提交于
Add an above_background_load() function which can be used by other subsystems to detect if there is anything besides niced tasks running. Place it in sched.h to allow it to be compiled out if not used. Unused for now, but it is a useful hint to the IO scheduler and to swap-prefetch. Signed-off-by: NCon Kolivas <kernel@kolivas.org> Cc: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Avi Kivity 提交于
This adds a general mechanism whereby a task can request the scheduler to notify it whenever it is preempted or scheduled back in. This allows the task to swap any special-purpose registers like the fpu or Intel's VT registers. Signed-off-by: NAvi Kivity <avi@qumranet.com> [ mingo@elte.hu: fixes, cleanups ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
increase SCHED_LOAD_SCALE_FUZZ that adds a small amount of over-balancing: to help distribute CPU-bound tasks more fairly on SMP systems. the problem of unfair balancing was noticed and reported by Tong N Li. 10 CPU-bound tasks running on 8 CPUs, v2.6.23-rc1: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2572 mingo 20 0 1576 244 196 R 100 0.0 1:03.61 loop 2578 mingo 20 0 1576 248 196 R 100 0.0 1:03.59 loop 2576 mingo 20 0 1576 248 196 R 100 0.0 1:03.52 loop 2571 mingo 20 0 1576 244 196 R 100 0.0 1:03.46 loop 2569 mingo 20 0 1576 244 196 R 99 0.0 1:03.36 loop 2570 mingo 20 0 1576 244 196 R 95 0.0 1:00.55 loop 2577 mingo 20 0 1576 248 196 R 50 0.0 0:31.88 loop 2574 mingo 20 0 1576 248 196 R 50 0.0 0:31.87 loop 2573 mingo 20 0 1576 248 196 R 50 0.0 0:31.86 loop 2575 mingo 20 0 1576 248 196 R 50 0.0 0:31.86 loop v2.6.23-rc1 + patch: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2681 mingo 20 0 1576 244 196 R 85 0.0 3:51.68 loop 2688 mingo 20 0 1576 244 196 R 81 0.0 3:46.35 loop 2682 mingo 20 0 1576 244 196 R 80 0.0 3:43.68 loop 2685 mingo 20 0 1576 248 196 R 80 0.0 3:45.97 loop 2683 mingo 20 0 1576 248 196 R 80 0.0 3:40.25 loop 2679 mingo 20 0 1576 244 196 R 80 0.0 3:33.53 loop 2680 mingo 20 0 1576 244 196 R 79 0.0 3:43.53 loop 2686 mingo 20 0 1576 244 196 R 79 0.0 3:39.31 loop 2687 mingo 20 0 1576 244 196 R 78 0.0 3:33.31 loop 2684 mingo 20 0 1576 244 196 R 77 0.0 3:27.52 loop so they now nicely converge to the expected 80% long-term CPU usage. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 7月, 2007 3 次提交
-
-
由 Ingo Molnar 提交于
Implement the cpu_clock(cpu) interface for kernel-internal use: high-speed (but slightly incorrect) per-cpu clock constructed from sched_clock(). This API, unused at the moment, will be used in the future by blktrace, by the softlockup-watchdog, by printk and by lockstat. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Kawai, Hidehiro 提交于
This patch adds an interface to set/reset flags which determines each memory segment should be dumped or not when a core file is generated. /proc/<pid>/coredump_filter file is provided to access the flags. You can change the flag status for a particular process by writing to or reading from the file. The flag status is inherited to the child process when it is created. Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kawai, Hidehiro 提交于
This patch changes mm_struct.dumpable to a pair of bit flags. set_dumpable() converts three-value dumpable to two flags and stores it into lower two bits of mm_struct.flags instead of mm_struct.dumpable. get_dumpable() behaves in the opposite way. [akpm@linux-foundation.org: export set_dumpable] Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 7月, 2007 4 次提交
-
-
由 Serge E. Hallyn 提交于
This patch enables the unshare of user namespaces. It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which resets the current user_struct and adds a new root user (uid == 0) For now, unsharing the user namespace allows a process to reset its user_struct accounting and uid 0 in the new user namespace should be contained using appropriate means, for instance selinux The plan, when the full support is complete (all uid checks covered), is to keep the original user's rights in the original namespace, and let a process become uid 0 in the new namespace, with full capabilities to the new namespace. Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com> Signed-off-by: NCedric Le Goater <clg@fr.ibm.com> Acked-by: NPavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Andrew Morgan <agm@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cedric Le Goater 提交于
Basically, it will allow a process to unshare its user_struct table, resetting at the same time its own user_struct and all the associated accounting. A new root user (uid == 0) is added to the user namespace upon creation. Such root users have full privileges and it seems that theses privileges should be controlled through some means (process capabilities ?) The unshare is not included in this patch. Changes since [try #4]: - Updated get_user_ns and put_user_ns to accept NULL, and get_user_ns to return the namespace. Changes since [try #3]: - moved struct user_namespace to files user_namespace.{c,h} Changes since [try #2]: - removed struct user_namespace* argument from find_user() Changes since [try #1]: - removed struct user_namespace* argument from find_user() - added a root_user per user namespace Signed-off-by: NCedric Le Goater <clg@fr.ibm.com> Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com> Acked-by: NPavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Andrew Morgan <agm@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Miloslav Trmac 提交于
Add TTY input auditing, used to audit system administrator's actions. This is required by various security standards such as DCID 6/3 and PCI to provide non-repudiation of administrator's actions and to allow a review of past actions if the administrator seems to overstep their duties or if the system becomes misconfigured for unknown reasons. These requirements do not make it necessary to audit TTY output as well. Compared to an user-space keylogger, this approach records TTY input using the audit subsystem, correlated with other audit events, and it is completely transparent to the user-space application (e.g. the console ioctls still work). TTY input auditing works on a higher level than auditing all system calls within the session, which would produce an overwhelming amount of mostly useless audit events. Add an "audit_tty" attribute, inherited across fork (). Data read from TTYs by process with the attribute is sent to the audit subsystem by the kernel. The audit netlink interface is extended to allow modifying the audit_tty attribute, and to allow sending explanatory audit events from user-space (for example, a shell might send an event containing the final command, after the interactive command-line editing and history expansion is performed, which might be difficult to decipher from the TTY input alone). Because the "audit_tty" attribute is inherited across fork (), it would be set e.g. for sshd restarted within an audited session. To prevent this, the audit_tty attribute is cleared when a process with no open TTY file descriptors (e.g. after daemon startup) opens a TTY. See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a more detailed rationale document for an older version of this patch. [akpm@linux-foundation.org: build fix] Signed-off-by: NMiloslav Trmac <mitr@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Fulghum <paulkf@microgate.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Steve Grubb <sgrubb@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tomas Janousek 提交于
Commit 411187fb caused boot time to move and process start times to become invalid after suspend. Using boot based time for those restores the old behaviour and fixes the issue. [akpm@linux-foundation.org: little cleanup] Signed-off-by: NTomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Acked-by: NJohn Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 7月, 2007 17 次提交
-
-
由 Ingo Molnar 提交于
micro-optimize mmdrop(). Improves schedule()'s assembly a bit. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
scheduler debugging core: implement /proc/sched_debug and /proc/<PID>/sched files for scheduler debugging. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the old cpu-accounting field from signal_struct, now that the code is using CFS's stats. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
batch_task() in sched.h is now unused - remove it. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove now-unused types/fields used by the old scheduler. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
sched_fork()/sched_exit() does not need to specify fastcall anymore, as the x86 kernel defaults to regparm3, and no assembly code calls these functions. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Balbir Singh 提交于
update delay-accounting to use CFS's precise stats. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
track TSC-unstable events and propagate it to the scheduler code. Also allow sched_clock() to be used when the TSC is unstable, the rq_clock() wrapper creates a reliable clock out of it. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the sleep_type heuristics from the core scheduler - scheduling policy is implemented in the scheduling-policy modules. (and CFS does not use this type of sleep-type heuristics) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
clean up the rt priority macros, pointed out by Andrew Morton. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
update the posix-cpu-timers code to use CFS's CPU accounting information. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
add the CFS data types to sched.h. (the old scheduler is still fully intact.) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
add kernel/sched_fair.c - which implements the bulk of CFS's behavioral changes for SCHED_OTHER tasks. see Documentation/sched-design-CFS.txt about details. Authors: Ingo Molnar <mingo@elte.hu> Dmitry Adamushko <dmitry.adamushko@gmail.com> Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Mike Galbraith <efault@gmx.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
-
由 Ingo Molnar 提交于
increase SMP-nice's resolution. This is needed by CFS to implement SCHED_IDLE and cleaned up nice level support. no behavioral changes. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
add the init_idle_bootup_task() callback to the bootup thread, unused at the moment. (CFS will use it to switch the scheduling class of the boot thread to the idle class) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
uninline set_task_cpu(): CFS will add more code to it. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
the SMP load-balancer uses the boot-time migration-cost estimation code to attempt to improve the quality of balancing. The reason for this code is that the discrete priority queues do not preserve the order of scheduling accurately, so the load-balancer skips tasks that were running on a CPU 'recently'. this code is fundamental fragile: the boot-time migration cost detector doesnt really work on systems that had large L3 caches, it caused boot delays on large systems and the whole cache-hot concept made the balancing code pretty undeterministic as well. (and hey, i wrote most of it, so i can say it out loud that it sucks ;-) under CFS the same purpose of cache affinity can be achieved without any special cache-hot special-case: tasks are sorted in the 'timeline' tree and the SMP balancer picks tasks from the left side of the tree, thus the most cache-cold task is balanced automatically. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-