- 07 3月, 2008 3 次提交
-
-
由 Miao Xie 提交于
Function sys_sched_rr_get_interval returns wrong time slice value for SCHED_FIFO tasks. The time slice for SCHED_FIFO tasks should be 0. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pavel Roskin 提交于
The API is trivial, and so is the implementation. Signed-off-by: NPavel Roskin <proski@gnu.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Kei Tokunaga reported an interactivity problem when moving tasks between control groups. Tasks would retain their old vruntime when moved between groups, this can cause funny lags. Re-set the vruntime on group move to fit within the new tree. Reported-by: NKei Tokunaga <tokunaga.keiich@jp.fujitsu.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 05 3月, 2008 1 次提交
-
-
由 Peter Zijlstra 提交于
The following commits cause a number of regressions: commit 58e2d4ca Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Date: Fri Jan 25 21:08:00 2008 +0100 sched: group scheduling, change how cpu load is calculated commit 6b2d7700 Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Date: Fri Jan 25 21:08:00 2008 +0100 sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups Namely: - very frequent wakeups on SMP, reported by PowerTop users. - cacheline trashing on (large) SMP - some latencies larger than 500ms While there is a mergeable patch to fix the latter, the former issues are not fixable in a manner suitable for .25 (we're at -rc3 now). Hence we revert them and try again in v2.6.26. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Tested-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 25 2月, 2008 2 次提交
-
-
由 Harvey Harrison 提交于
Unsigned long values are always assigned to switch_count, make it unsigned long. kernel/sched.c:3897:15: warning: incorrect type in assignment (different signedness) kernel/sched.c:3897:15: expected long *switch_count kernel/sched.c:3897:15: got unsigned long *<noident> kernel/sched.c:3921:16: warning: incorrect type in assignment (different signedness) kernel/sched.c:3921:16: expected long *switch_count kernel/sched.c:3921:16: got unsigned long *<noident> Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
do not call sched_clock() too early. Not only might rq->idle not be set up - but pure per-cpu data might not be accessible either. this solves an ia64 early bootup hang with CONFIG_PRINTK_TIME=y. Tested-by: NTony Luck <tony.luck@gmail.com> Acked-by: NTony Luck <tony.luck@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 24 2月, 2008 2 次提交
-
-
由 Linus Torvalds 提交于
Oleg Nesterov and others have pointed out that on some architectures, the traditional sequence of set_current_state(TASK_INTERRUPTIBLE); if (CONDITION) return; schedule(); is racy wrt another CPU doing CONDITION = 1; wake_up_process(p); because while set_current_state() has a memory barrier separating setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION variable, there is no such memory barrier on the wakeup side. Now, wake_up_process() does actually take a spinlock before it reads and sets the task state on the waking side, and on x86 (and many other architectures) that spinlock is in fact equivalent to a memory barrier, but that is not generally guaranteed. The write that sets CONDITION could move into the critical region protected by the runqueue spinlock. However, adding a smp_wmb() to before the spinlock should now order the writing of CONDITION wrt the lock itself, which in turn is ordered wrt the accesses within the spinlock (which includes the reading of the old state). This should thus close the race (which probably has never been seen in practice, but since smp_wmb() is a no-op on x86, it's not like this will make anything worse either on the most common architecture where the spinlock already gave the required protection). Acked-by: NOleg Nesterov <oleg@tv-sign.ru> Acked-by: NDmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Srinivasa Ds 提交于
Kprobes makes use of preempt_disable(),preempt_enable_noresched() and these functions inturn call add/sub_preempt_count(). So we need to refuse user from inserting probe in to these functions. This patch disallows user from probing add/sub_preempt_count(). Signed-off-by: NSrinivasa DS <srinivasa@in.ibm.com> Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 2月, 2008 7 次提交
-
-
由 Peter Zijlstra 提交于
Refuse to accept or create RT tasks in groups that can't run them. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Clean up some of the excessive ifdeffery introduces in the last patch. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Make the rt group scheduler compile time configurable. Keep it experimental for now. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Change the rt_ratio interface to rt_runtime_us, to match rt_period_us. This avoids picking a granularity for the ratio. Extend the /sys/kernel/uids/<uid>/ interface to allow setting the group's rt_runtime. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Steven mentioned the fun case where a lock holding task will be throttled. Simple fix: allow groups that have boosted tasks to run anyway. If a runnable task in a throttled group gets boosted the dequeue/enqueue done by rt_mutex_setprio() is enough to unthrottle the group. This is ofcourse not quite correct. Two possible ways forward are: - second prio array for boosted tasks - boost to a prio ceiling (this would also work for deadline scheduling) Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
lockdep spotted this bogus irq locking. normalize_rt_tasks() can be called from hardirq context through sysrq-n Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
On Mon, 2008-02-11 at 15:09 +0300, Denis V. Lunev wrote: > BUG: sleeping function called from invalid context > at /home/den/src/linux-netns26/kernel/mutex.c:209 > in_atomic():1, irqs_disabled():0 > no locks held by swapper/0. > Pid: 0, comm: swapper Not tainted 2.6.24 #304 > > Call Trace: > <IRQ> [<ffffffff80252d1e>] ? __debug_show_held_locks+0x15/0x27 > [<ffffffff8022c2a8>] __might_sleep+0xc0/0xdf > [<ffffffff8049f1df>] mutex_lock_nested+0x28/0x2a9 > [<ffffffff80231294>] sched_destroy_group+0x18/0xea > [<ffffffff8023e835>] sched_destroy_user+0xd/0xf > [<ffffffff8023e8c1>] free_uid+0x8a/0xab > [<ffffffff80233e24>] __put_task_struct+0x3f/0xd3 > [<ffffffff80236708>] delayed_put_task_struct+0x23/0x25 > [<ffffffff8026fda7>] __rcu_process_callbacks+0x8d/0x215 > [<ffffffff8026ff52>] rcu_process_callbacks+0x23/0x44 > [<ffffffff8023a2ae>] __do_softirq+0x79/0xf8 > [<ffffffff8020f8c3>] ? profile_pc+0x2a/0x67 > [<ffffffff8020d38c>] call_softirq+0x1c/0x30 > [<ffffffff8020f689>] do_softirq+0x61/0x9c > [<ffffffff8023a233>] irq_exit+0x51/0x53 > [<ffffffff8021bd1a>] smp_apic_timer_interrupt+0x77/0xad > [<ffffffff8020ce3b>] apic_timer_interrupt+0x6b/0x70 > <EOI> [<ffffffff8020b0dd>] ? default_idle+0x43/0x76 > [<ffffffff8020b0db>] ? default_idle+0x41/0x76 > [<ffffffff8020b09a>] ? default_idle+0x0/0x76 > [<ffffffff8020b186>] ? cpu_idle+0x76/0x98 separate the tg->shares protection from the task_group lock. Reported-by: NDenis V. Lunev <den@openvz.org> Tested-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 2月, 2008 1 次提交
-
-
由 Harvey Harrison 提交于
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 2月, 2008 1 次提交
-
-
由 Gerald Stralko 提交于
This removes the extra struct task_struct *p parameter in inc_nr_running and dec_nr_running functions. Signed-off by: Jerry Stralko <gerb.stralko@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 30 1月, 2008 1 次提交
-
-
由 Nick Piggin 提交于
The break_lock data structure and code for spinlocks is quite nasty. Not only does it double the size of a spinlock but it changes locking to a potentially less optimal trylock. Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a __raw_spin_is_contended that uses the lock data itself to determine whether there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is not set. Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to decouple it from the spinlock implementation, and make it typesafe (rwlocks do not have any need_lockbreak sites -- why do they even get bloated up with that break_lock then?). Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 26 1月, 2008 22 次提交
-
-
由 Nick Piggin 提交于
The attached patch is something really simple that can sometimes help in getting more info out of a hung system. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Guillaume Chazarain 提交于
We monitor clock overflows, let's also monitor clock underflows. Signed-off-by: NGuillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Guillaume Chazarain 提交于
sched: fix rq->clock warps on frequency changes Fix 2bacec8c (sched: touch softlockup watchdog after idling) that reintroduced warps on frequency changes. touch_softlockup_watchdog() calls __update_rq_clock that checks rq->clock for warps, so call it after adjusting rq->clock. Signed-off-by: NGuillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the !PREEMPT_BKL code. this removes 160 lines of legacy code. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
We need to teach no_hz about the rt throttling because its tick driven. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Extend group scheduling to also cover the realtime classes. It uses the time limiting introduced by the previous patch to allow multiple realtime groups. The hard time limit is required to keep behaviour deterministic. The algorithms used make the realtime scheduler O(tg), linear scaling wrt the number of task groups. This is the worst case behaviour I can't seem to get out of, the avg. case of the algorithms can be improved, I focused on correctness and worst case. [ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Very simple time limit on the realtime scheduling classes. Allow the rq's realtime class to consume sched_rt_ratio of every sched_rt_period slice. If the class exceeds this quota the fair class will preempt the realtime class. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Use HR-timers (when available) to deliver an accurate preemption tick. The regular scheduler tick that runs at 1/HZ can be too coarse when nice level are used. The fairness system will still keep the cpu utilisation 'fair' by then delaying the task that got an excessive amount of CPU time but try to minimize this by delivering preemption points spot-on. The average frequency of this extra interrupt is sched_latency / nr_latency. Which need not be higher than 1/HZ, its just that the distribution within the sched_latency period is important. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Herbert Xu 提交于
Why do we even have cond_resched when real preemption is on? It seems to be a waste of space and time. remove cond_resched with CONFIG_PREEMPT on. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
whitespace fixes. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Move the task_struct members specific to rt scheduling together. A future optimization could be to put sched_entity and sched_rt_entity into a union. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gregory Haskins 提交于
The baseline code statically builds the span maps when the domain is formed. Previous attempts at dynamically updating the maps caused a suspend-to-ram regression, which should now be fixed. Signed-off-by: NGregory Haskins <ghaskins@novell.com> CC: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
Dmitry Adamushko found that the current implementation of the RT balancing code left out changes to the sched_setscheduler and rt_mutex_setprio. This patch addresses this issue by adding methods to the schedule classes to handle being switched out of (switched_from) and being switched into (switched_to) a sched_class. Also a method for changing of priorities is also added (prio_changed). This patch also removes some duplicate logic between rt_mutex_setprio and sched_setscheduler. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
To make the main sched.c code more agnostic to the schedule classes. Instead of having specific hooks in the schedule code for the RT class balancing. They are replaced with a pre_schedule, post_schedule and task_wake_up methods. These methods may be used by any of the classes but currently, only the sched_rt class implements them. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Dmitry Adamushko 提交于
Clean-up try_to_wake_up(). Get rid of the 'new_cpu' variable in try_to_wake_up() [ that's, one #ifdef section less ]. Also remove a few redundant blank lines. Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
add credits for RT balancing improvements. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
style cleanup of various changes that were done recently. no code changed: text data bss dec hex filename 26399 2578 48 29025 7161 sched.o.before 26399 2578 48 29025 7161 sched.o.after Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove unused JIFFIES_TO_NS() macro. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gregory Haskins 提交于
We move the rt-overload data as the first global to per-domain reclassification. This limits the scope of overload related cache-line bouncing to stay with a specified partition instead of affecting all cpus in the system. Finally, we limit the scope of find_lowest_cpu searches to the domain instead of the entire system. Note that we would always respect domain boundaries even without this patch, but we first would scan potentially all cpus before whittling the list down. Now we can avoid looking at RQs that are out of scope, again reducing cache-line hits. Note: In some cases, task->cpus_allowed will effectively reduce our search to within our domain. However, I believe there are cases where the cpus_allowed mask may be all ones and therefore we err on the side of caution. If it can be optimized later, so be it. Signed-off-by: NGregory Haskins <ghaskins@novell.com> CC: Christoph Lameter <clameter@sgi.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gregory Haskins 提交于
We add the notion of a root-domain which will be used later to rescope global variables to per-domain variables. Each exclusive cpuset essentially defines an island domain by fully partitioning the member cpus from any other cpuset. However, we currently still maintain some policy/state as global variables which transcend all cpusets. Consider, for instance, rt-overload state. Whenever a new exclusive cpuset is created, we also create a new root-domain object and move each cpu member to the root-domain's span. By default the system creates a single root-domain with all cpus as members (mimicking the global state we have today). We add some plumbing for storing class specific data in our root-domain. Whenever a RQ is switching root-domains (because of repartitioning) we give each sched_class the opportunity to remove any state from its old domain and add state to the new one. This logic doesn't have any clients yet but it will later in the series. Signed-off-by: NGregory Haskins <ghaskins@novell.com> CC: Christoph Lameter <clameter@sgi.com> CC: Paul Jackson <pj@sgi.com> CC: Simon Derr <simon.derr@bull.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
rt-balance when creating new tasks. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Gregory Haskins 提交于
We have logic to detect whether the system has migratable tasks, but we are not using it when deciding whether to push tasks away. So we add support for considering this new information. Signed-off-by: NGregory Haskins <ghaskins@novell.com> Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-