- 26 3月, 2008 1 次提交
-
-
由 Thomas Gleixner 提交于
add_timer_on() can add a timer on a CPU which is currently in a long idle sleep, but the timer wheel is not reevaluated by the nohz code on that CPU. So a timer can be delayed for quite a long time. This triggered a false positive in the clocksource watchdog code. To avoid this we need to wake up the idle CPU and enforce the reevaluation of the timer wheel for the next timer event. Add a function, which checks a given CPU for idle state, marks the idle task with NEED_RESCHED and sends a reschedule IPI to notify the other CPU of the change in the timer wheel. Call this function from add_timer_on(). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: stable@kernel.org -- include/linux/sched.h | 6 ++++++ kernel/sched.c | 43 +++++++++++++++++++++++++++++++++++++++++++ kernel/timer.c | 10 +++++++++- 3 files changed, 58 insertions(+), 1 deletion(-)
-
- 21 3月, 2008 1 次提交
-
-
由 Heiko Carstens 提交于
Needed so it can be called from outside of sched.c. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 19 3月, 2008 1 次提交
-
-
由 Ingo Molnar 提交于
improve affine wakeups. Maintain the 'overlap' metric based on CFS's sum_exec_runtime - which means the amount of time a task executes after it wakes up some other task. Use the 'overlap' for the wakeup decisions: if the 'overlap' is short, it means there's strong workload coupling between this task and the woken up task. If the 'overlap' is large then the workload is decoupled and the scheduler will move them to separate CPUs more easily. ( Also slightly move the preempt_check within try_to_wake_up() - this has no effect on functionality but allows 'early wakeups' (for still-on-rq tasks) to be correctly accounted as well.) Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 3月, 2008 1 次提交
-
-
由 Peter Zijlstra 提交于
Kei Tokunaga reported an interactivity problem when moving tasks between control groups. Tasks would retain their old vruntime when moved between groups, this can cause funny lags. Re-set the vruntime on group move to fit within the new tree. Reported-by: NKei Tokunaga <tokunaga.keiich@jp.fujitsu.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 05 3月, 2008 1 次提交
-
-
由 Peter Zijlstra 提交于
The following commits cause a number of regressions: commit 58e2d4ca Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Date: Fri Jan 25 21:08:00 2008 +0100 sched: group scheduling, change how cpu load is calculated commit 6b2d7700 Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Date: Fri Jan 25 21:08:00 2008 +0100 sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups Namely: - very frequent wakeups on SMP, reported by PowerTop users. - cacheline trashing on (large) SMP - some latencies larger than 500ms While there is a mergeable patch to fix the latter, the former issues are not fixable in a manner suitable for .25 (we're at -rc3 now). Hence we revert them and try again in v2.6.26. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Tested-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 26 2月, 2008 1 次提交
-
-
由 Peter Zijlstra 提交于
Some code paths exceed the current max lock depth (XFS), so increase this limit a bit. I looked at making this a dynamic allocated array, but we should not advocate insane lock depths, so stay with this as long as it works... Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 25 2月, 2008 1 次提交
-
-
由 Harvey Harrison 提交于
Avoids sparse warnings: kernel/sched.c:2170:17: warning: symbol 'schedule_tail' was not declared. Should it be static? Avoids the need for an external declaration in arch/um/process.c Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 14 2月, 2008 1 次提交
-
-
由 Harvey Harrison 提交于
FASTCALL() is always expanded to empty, remove it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 2月, 2008 2 次提交
-
-
由 Peter Zijlstra 提交于
Make the rt group scheduler compile time configurable. Keep it experimental for now. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Change the rt_ratio interface to rt_runtime_us, to match rt_period_us. This avoids picking a granularity for the ratio. Extend the /sys/kernel/uids/<uid>/ interface to allow setting the group's rt_runtime. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 2月, 2008 4 次提交
-
-
由 Pavel Emelyanov 提交于
There's only one caller left - the kill_pgrp one - so merge these two functions and forget the kill_pgrp_info one. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Reviewed-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
signal_struct->tsk points to the ->group_leader and thus we have the nasty code in de_thread() which has to change it and restart ->real_timer if the leader is changed. Use "struct pid *leader_pid" instead. This also allows us to kill now unneeded send_group_sig_info(). Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: NRoland McGrath <roland@redhat.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
pid_vnr returns the user space pid with respect to the pid namespace the struct pid was allocated in. What we want before we return a pid to user space is the user space pid with respect to the pid namespace of current. pid_vnr is a very nice optimization but because it isn't quite what we want it is easy to use pid_vnr at times when we aren't certain the struct pid was allocated in our pid namespace. Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c the parent process reported in the core dumps and the parent process in get_signal_to_deliver. So unless the performance impact is huge having an interface that does what we want instead of always what we want should be much more reliable and much less error prone. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Change set_special_pids() to work with struct pid, not pid_t from global name space. This again speedups and imho cleanups the code, also a preparation for the next patch. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 2月, 2008 1 次提交
-
-
由 Pavel Emelianov 提交于
Basic setup routines, the mm_struct has a pointer to the cgroup that it belongs to and the the page has a page_cgroup associated with it. Signed-off-by: NPavel Emelianov <xemul@openvz.org> Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 2月, 2008 1 次提交
-
-
由 Eric Dumazet 提交于
groups_sort() can be quite long if user loads a large gid table. This is because GROUP_AT(group_info, some_integer) uses an integer divide. So having to do XXX thousand divides during one syscall can lead to very high latencies. (NGROUPS_MAX=65536) In the past (25 Mar 2006), an analog problem was found in groups_search() (commit d74beb9f ) and at that time I changed some variables to unsigned int. I believe that a more generic fix is to make sure NGROUPS_PER_BLOCK is unsigned. Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 2月, 2008 4 次提交
-
-
由 Serge E. Hallyn 提交于
The capability bounding set is a set beyond which capabilities cannot grow. Currently cap_bset is per-system. It can be manipulated through sysctl, but only init can add capabilities. Root can remove capabilities. By default it includes all caps except CAP_SETPCAP. This patch makes the bounding set per-process when file capabilities are enabled. It is inherited at fork from parent. Noone can add elements, CAP_SETPCAP is required to remove them. One example use of this is to start a safer container. For instance, until device namespaces or per-container device whitelists are introduced, it is best to take CAP_MKNOD away from a container. The bounding set will not affect pP and pE immediately. It will only affect pP' and pE' after subsequent exec()s. It also does not affect pI, and exec() does not constrain pI'. So to really start a shell with no way of regain CAP_MKNOD, you would do prctl(PR_CAPBSET_DROP, CAP_MKNOD); cap_t cap = cap_get_proc(); cap_value_t caparray[1]; caparray[0] = CAP_MKNOD; cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP); cap_set_proc(cap); cap_free(cap); The following test program will get and set the bounding set (but not pI). For instance ./bset get (lists capabilities in bset) ./bset drop cap_net_raw (starts shell with new bset) (use capset, setuid binary, or binary with file capabilities to try to increase caps) ************************************************************ cap_bound.c ************************************************************ #include <sys/prctl.h> #include <linux/capability.h> #include <sys/types.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #ifndef PR_CAPBSET_READ #define PR_CAPBSET_READ 23 #endif #ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif int usage(char *me) { printf("Usage: %s get\n", me); printf(" %s drop <capability>\n", me); return 1; } #define numcaps 32 char *captable[numcaps] = { "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner", "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast", "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner", "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace", "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice", "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod", "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap" }; int getbcap(void) { int comma=0; unsigned long i; int ret; printf("i know of %d capabilities\n", numcaps); printf("capability bounding set:"); for (i=0; i<numcaps; i++) { ret = prctl(PR_CAPBSET_READ, i); if (ret < 0) perror("prctl"); else if (ret==1) printf("%s%s", (comma++) ? ", " : " ", captable[i]); } printf("\n"); return 0; } int capdrop(char *str) { unsigned long i; int found=0; for (i=0; i<numcaps; i++) { if (strcmp(captable[i], str) == 0) { found=1; break; } } if (!found) return 1; if (prctl(PR_CAPBSET_DROP, i)) { perror("prctl"); return 1; } return 0; } int main(int argc, char *argv[]) { if (argc<2) return usage(argv[0]); if (strcmp(argv[1], "get")==0) return getbcap(); if (strcmp(argv[1], "drop")!=0 || argc<3) return usage(argv[0]); if (capdrop(argv[2])) { printf("unknown capability\n"); return 1; } return execl("/bin/bash", "/bin/bash", NULL); } ************************************************************ [serue@us.ibm.com: fix typo] Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com> Signed-off-by: NAndrew G. Morgan <morgan@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Casey Schaufler <casey@schaufler-ca.com>a Signed-off-by: N"Serge E. Hallyn" <serue@us.ibm.com> Tested-by: NJiri Slaby <jirislaby@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
The following replaces the earlier patches sent. It should address David Rientjes's comments, and has been compile tested on all the architectures that it touches, save for parisc. For the /proc/<pid>/pagemap code[1], we need to able to query how much virtual address space a particular task has. The trick is that we do it through /proc and can't use TASK_SIZE since it references "current" on some arches. The process opening the /proc file might be a 32-bit process opening a 64-bit process's pagemap file. x86_64 already has a TASK_SIZE_OF() macro: #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_IA32)) ? IA32_PAGE_OFFSET : TASK_SIZE64) I'd like to have that for other architectures. So, add it for all the architectures that actually use "current" in their TASK_SIZE. For the others, just add a quick #define in sched.h to use plain old TASK_SIZE. 1. http://www.linuxworld.com/news/2007/042407-kernel.html - MIPS portion from Ralf Baechle <ralf@linux-mips.org> [akpm@linux-foundation.org: fix mips build] Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NRalf Baechle <ralf@linux-mips.org> Signed-off-by: NMatt Mackall <mpm@selenic.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
As Roland pointed out, we have the very old problem with exec. de_thread() sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then clears signal->flags. All signals (even fatal ones) sent in this window (which is not too small) will be lost. With this patch exec doesn't abuse SIGNAL_GROUP_EXIT. signal_group_exit(), the new helper, should be used to detect exit_group() or exec() in progress. It can have more users, but this patch does only strictly necessary changes. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robin Holt <holt@sgi.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
It was dumb to make get_task_comm() return void. Change it to return a pointer to the resulting output for caller convenience. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 2月, 2008 2 次提交
-
-
由 Eric Paris 提交于
In order to correlate audit records to an individual login add a session id. This is incremented every time a user logs in and is included in almost all messages which currently output the auid. The field is labeled ses= or oses= Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Al Viro 提交于
Keeping loginuid in audit_context is racy and results in messier code. Taken to task_struct, out of the way of ->audit_context changes. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 1月, 2008 1 次提交
-
-
由 Nick Piggin 提交于
The break_lock data structure and code for spinlocks is quite nasty. Not only does it double the size of a spinlock but it changes locking to a potentially less optimal trylock. Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a __raw_spin_is_contended that uses the lock data itself to determine whether there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is not set. Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to decouple it from the spinlock implementation, and make it typesafe (rwlocks do not have any need_lockbreak sites -- why do they even get bloated up with that break_lock then?). Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 28 1月, 2008 2 次提交
-
-
由 Jens Axboe 提交于
syslets (or other threads/processes that want io context sharing) can set this to enforce sharing of io context. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 26 1月, 2008 15 次提交
-
-
由 Arjan van de Ven 提交于
Right now, the linux kernel (with scheduler statistics enabled) keeps track of the maximum time a process is waiting to be scheduled. While the maximum is a very useful metric, tracking average and total is equally useful (at least for latencytop) to figure out the accumulated effect of scheduler delays. The accumulated effect is important to judge the performance impact of scheduler tuning/behavior. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
fix softlockup tunables signedness. mark tunables read-mostly. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Arjan van de Ven 提交于
LatencyTOP kernel infrastructure; it measures latencies in the scheduler and tracks it system wide and per process. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
We need to teach no_hz about the rt throttling because its tick driven. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Extend group scheduling to also cover the realtime classes. It uses the time limiting introduced by the previous patch to allow multiple realtime groups. The hard time limit is required to keep behaviour deterministic. The algorithms used make the realtime scheduler O(tg), linear scaling wrt the number of task groups. This is the worst case behaviour I can't seem to get out of, the avg. case of the algorithms can be improved, I focused on correctness and worst case. [ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Very simple time limit on the realtime scheduling classes. Allow the rq's realtime class to consume sched_rt_ratio of every sched_rt_period slice. If the class exceeds this quota the fair class will preempt the realtime class. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Use HR-timers (when available) to deliver an accurate preemption tick. The regular scheduler tick that runs at 1/HZ can be too coarse when nice level are used. The fairness system will still keep the cpu utilisation 'fair' by then delaying the task that got an excessive amount of CPU time but try to minimize this by delivering preemption points spot-on. The average frequency of this extra interrupt is sched_latency / nr_latency. Which need not be higher than 1/HZ, its just that the distribution within the sched_latency period is important. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Herbert Xu 提交于
Why do we even have cond_resched when real preemption is on? It seems to be a waste of space and time. remove cond_resched with CONFIG_PREEMPT on. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Introduce a new rlimit that allows the user to set a runtime timeout on real-time tasks their slice. Once this limit is exceeded the task will receive SIGXCPU. So it measures runtime since the last sleep. Input and ideas by Thomas Gleixner and Lennart Poettering. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> CC: Lennart Poettering <mzxreary@0pointer.de> CC: Michael Kerrisk <mtk.manpages@googlemail.com> CC: Ulrich Drepper <drepper@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Move the task_struct members specific to rt scheduling together. A future optimization could be to put sched_entity and sched_rt_entity into a union. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Paul E. McKenney 提交于
This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: NGautham R Shenoy <ego@in.ibm.com> Signed-off-by: NDipankar Sarma <dipankar@in.ibm.com> Signed-off-by: NPaul E. McKenney <paulmck@us.ibm.com> Reviewed-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Dmitry Adamushko found that the current implementation of the RT balancing code left out changes to the sched_setscheduler and rt_mutex_setprio. This patch addresses this issue by adding methods to the schedule classes to handle being switched out of (switched_from) and being switched into (switched_to) a sched_class. Also a method for changing of priorities is also added (prio_changed). This patch also removes some duplicate logic between rt_mutex_setprio and sched_setscheduler. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
To make the main sched.c code more agnostic to the schedule classes. Instead of having specific hooks in the schedule code for the RT class balancing. They are replaced with a pre_schedule, post_schedule and task_wake_up methods. These methods may be used by any of the classes but currently, only the sched_rt class implements them. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gregory Haskins 提交于
We add the notion of a root-domain which will be used later to rescope global variables to per-domain variables. Each exclusive cpuset essentially defines an island domain by fully partitioning the member cpus from any other cpuset. However, we currently still maintain some policy/state as global variables which transcend all cpusets. Consider, for instance, rt-overload state. Whenever a new exclusive cpuset is created, we also create a new root-domain object and move each cpu member to the root-domain's span. By default the system creates a single root-domain with all cpus as members (mimicking the global state we have today). We add some plumbing for storing class specific data in our root-domain. Whenever a RQ is switching root-domains (because of repartitioning) we give each sched_class the opportunity to remove any state from its old domain and add state to the new one. This logic doesn't have any clients yet but it will later in the series. Signed-off-by: NGregory Haskins <ghaskins@novell.com> CC: Christoph Lameter <clameter@sgi.com> CC: Paul Jackson <pj@sgi.com> CC: Simon Derr <simon.derr@bull.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-