- 25 3月, 2009 1 次提交
-
-
由 Luis Henriques 提交于
Impact: cleanup, new schedstat ABI Since they are used on in statistics and are always set to zero, the following fields from struct rq have been removed: yld_exp_empty, yld_act_empty and yld_both_empty. Both Sched Debug and SCHEDSTAT_VERSION versions has also been incremented since ABIs have been changed. The schedtop tool has been updated to properly handle new version of schedstat: http://rt.wiki.kernel.org/index.php/Schedtop_utilitySigned-off-by: NLuis Henriques <henrix@sapo.pt> Acked-by: NGregory Haskins <ghaskins@novell.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> LKML-Reference: <20090324221002.GA10061@hades.domain.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 17 3月, 2009 2 次提交
-
-
由 Luis Henriques 提交于
There were 3 invocations of task_hot() in can_migrate_task(). Replace these 3 invocations by only one invocation, cached in a local variable. Signed-off-by: NLuis Henriques <henrix@sapo.pt> LKML-Reference: <20090316195902.GA6197@hades.domain.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Luis Henriques 提交于
Fixed typos in function documentation. Signed-off-by: NLuis Henriques <henrix@sapo.pt> LKML-Reference: <20090316195809.GA6073@hades.domain.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 11 3月, 2009 1 次提交
-
-
由 Mike Galbraith 提交于
Impact: more precise avg_overlap metric - better load-balancing avg_overlap is used to measure the runtime overlap of the waker and wakee. However, when a process changes behaviour, eg a pipe becomes un-congested and we don't need to go to sleep after a wakeup for a while, the avg_overlap value grows stale. When running we use the avg runtime between preemption as a measure for avg_overlap since the amount of runtime can be correlated to cache footprint. The longer we run, the less likely we'll be wanting to be migrated to another CPU. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236709131.25234.576.camel@laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 3月, 2009 1 次提交
-
-
由 Peter Zijlstra 提交于
Impact: micro-optimization We can avoid the sched domain walk on try_to_wake_up() when we know there are no groups. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236603381.8389.455.camel@laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 06 3月, 2009 1 次提交
-
-
由 Lai Jiangshan 提交于
Impact: cleanup Use test_tsk_need_resched(), set_tsk_need_resched(), need_resched() instead of using TIF_NEED_RESCHED. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <49B10BA4.9070209@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 05 3月, 2009 1 次提交
-
-
由 Frederic Weisbecker 提交于
Impact: fix function graph trace hang / drop pointless softirq on UP While debugging a function graph trace hang on an old PII, I saw that it consumed most of its time on the timer interrupt. And the domain rebalancing softirq was the most concerned. The timer interrupt calls trigger_load_balance() which will decide if it is worth to schedule a rebalancing softirq. In case of builtin UP kernel, no problem arises because there is no domain question. In case of builtin SMP kernel running on an SMP box, still no problem, the softirq will be raised each time we reach the next_balance time. In case of builtin SMP kernel running on a UP box (most distros provide default SMP kernels, whatever the box you have), then the CPU is attached to the NULL sched domain. So a kind of unexpected behaviour happen: trigger_load_balance() -> raises the rebalancing softirq later on softirq: run_rebalance_domains() -> rebalance_domains() where the for_each_domain(cpu, sd) is not taken because of the NULL domain we are attached at. Which means rq->next_balance is never updated. So on the next timer tick, we will enter trigger_load_balance() which will always reschedule() the rebalacing softirq: if (time_after_eq(jiffies, rq->next_balance)) raise_softirq(SCHED_SOFTIRQ); So for each tick, we process this pointless softirq. This patch fixes it by checking if we are attached to the null domain before raising the softirq, another possible fix would be to set the maximal possible JIFFIES value to rq->next_balance if we are attached to the NULL domain. v2: build fix on UP Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 02 3月, 2009 1 次提交
-
-
由 Wang Chen 提交于
Impact: micro-optimization Parameter "prev" is not used really. Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 27 2月, 2009 1 次提交
-
-
由 Dhaval Giani 提交于
Impact: fix hung task with certain (non-default) rt-limit settings Corey Hickey reported that on using setuid to change the uid of a rt process, the process would be unkillable and not be running. This is because there was no rt runtime for that user group. Add in a check to see if a user can attach an rt task to its task group. On failure, return EINVAL, which is also returned in CONFIG_CGROUP_SCHED. Reported-by: NCorey Hickey <bugfood-ml@fatooh.org> Signed-off-by: NDhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 26 2月, 2009 2 次提交
-
-
由 Hiroshi Shimamoto 提交于
Impact: fix incorrect condition check No need to start rt bandwidth timer when rt bandwidth is disabled. If this timer starts, it may stop at sched_rt_period_timer() on the first time. Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Li Zefan 提交于
cpuacct_charge() is in fast-path, and checking of !cpuacct_susys.active always returns false after cpuacct has been initialized at system boot. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 16 2月, 2009 2 次提交
-
-
由 Américo Wang 提交于
#define TASK_NICE(p) PRIO_TO_NICE((p)->static_prio) So it's better to use TASK_NICE here. Signed-off-by: NWANG Cong <wangcong@zeuux.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Henrik Austad 提交于
Impact: struct rq size optimization The idle_at_tick in struct rq is only used in SMP settings and it does not make sense to have this in the rq in an UP setup. Signed-off-by: NHenrik Austad <henrik@austad.us> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 2月, 2009 1 次提交
-
-
由 Ingo Molnar 提交于
rq_attach_root() does a kfree() with the runqueue lock held. That's not a very wise move, fix it. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 11 2月, 2009 1 次提交
-
-
由 Peter Zijlstra 提交于
Intel reported a 10% regression (mysql+sysbench) on a 16-way machine with these patches: 1596e297: sched: symmetric sync vs avg_overlap d942fb6c: sched: fix sync wakeups Revert them. Reported-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Bisected-by: NLin Ming <ming.m.lin@intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 06 2月, 2009 1 次提交
-
-
由 Johannes Weiner 提交于
With exclusive waiters, every process woken up through the wait queue must ensure that the next waiter down the line is woken when it has finished. Interruptible waiters don't do that when aborting due to a signal. And if an aborting waiter is concurrently woken up through the waitqueue, noone will ever wake up the next waiter. This has been observed with __wait_on_bit_lock() used by lock_page_killable(): the first contender on the queue was aborting when the actual lock holder woke it up concurrently. The aborted contender didn't acquire the lock and therefor never did an unlock followed by waking up the next waiter. Add abort_exclusive_wait() which removes the process' wait descriptor from the waitqueue, iff still queued, or wakes up the next waiter otherwise. It does so under the waitqueue lock. Racing with a wake up means the aborting process is either already woken (removed from the queue) and will wake up the next waiter, or it will remove itself from the queue and the concurrent wake up will apply to the next waiter after it. Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and __wait_on_bit_lock() when they were interrupted by other means than a wake up through the queue. [akpm@linux-foundation.org: coding-style fixes] Reported-by: NChris Mason <chris.mason@oracle.com> Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Mentored-by: NOleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chuck Lever <cel@citi.umich.edu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> ["after some testing"] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 2月, 2009 1 次提交
-
-
由 Suresh Siddha 提交于
Christian Borntraeger reports: > After a logical cpu offline, even on a complete idle system, there > is one cpu with full ticks. It turns out that nohz.cpu_mask has the > the offlined cpu still set. > > In select_nohz_load_balancer() we check if the system is completely > idle to turn of load balancing. We compare cpu_online_map with > nohz.cpu_mask. Since cpu_online_map is updated on cpu unplug, > but nohz.cpu_mask is not, the check fails and the scheduler believes > that we need an "idle load balancer" even on a fully idle system. > Since the ilb cpu does not deactivate the timer tick this breaks NOHZ. Fix the select_nohz_load_balancer() to not set the nohz.cpu_mask while a cpu is going offline. Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 01 2月, 2009 2 次提交
-
-
由 Peter Zijlstra 提交于
Reinstate the weakening of the sync hint if set. This yields a more symmetric usage of avg_overlap. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Pawel Dziekonski reported that the openssl benchmark and his quantum chemistry application both show slowdowns due to the scheduler under-parallelizing execution. The reason are pipe wakeups still doing 'sync' wakeups which overrides the normal buddy wakeup logic - even if waker and wakee are loosely coupled. Fix an inversion of logic in the buddy wakeup code. Reported-by: NPawel Dziekonski <dzieko@gmail.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 15 1月, 2009 3 次提交
-
-
由 Peter Zijlstra 提交于
Increase the SCHED_IDLE weight from 2 to 3, this gives much more stable vruntime numbers. time advanced in 100ms: weight=2 64765.988352 67012.881408 88501.412352 weight=3 35496.181411 34130.971298 35497.411573 Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Impact: make rt-limit tunables work again Mark Glines reported: > I've got an issue on x86-64 where I can't configure the system to allow > RT tasks for a non-root user. > > In 2.6.26.5, I was able to do the following to set things up nicely: > echo 450000 >/sys/kernel/uids/0/cpu_rt_runtime > echo 450000 >/sys/kernel/uids/1000/cpu_rt_runtime > > Seems like every value I try to echo into the /sys files returns EINVAL. For UID grouping we initialize the root group with infinite bandwidth which by default is actually more than the global limit, therefore the bandwidth check always fails. Because the root group is a phantom group (for UID grouping) we cannot runtime adjust it, therefore we let it reflect the global bandwidth settings. Reported-by: NMark Glines <mark@glines.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Introduce a new avg_wakeup statistic. avg_wakeup is a measure of how frequently a task wakes up other tasks, it represents the average time between wakeups, with a limit of avg_runtime for when it doesn't wake up anybody. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 14 1月, 2009 4 次提交
-
-
由 Gregory Haskins 提交于
Ingo found a build error in the scheduler when RT_GROUP_SCHED was enabled, but SMP was not. This patch rearranges the code such that it is a little more streamlined and compiles under all permutations of SMP, UP and RT_GROUP_SCHED. It was boot tested on my 4-way x86_64 and it still passes preempt-test. Signed-off-by: NGregory Haskins <ghaskins@novell.com>
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
- 12 1月, 2009 1 次提交
-
-
由 Ingo Molnar 提交于
This reverts commit 7317d7b8. This has been reported (and bisected) by Alexey Zaytsev and Kamalesh Babulal to produce annoying warnings during bootup on both x86 and powerpc. kernel_locked() is not a valid test in IRQ context (we update the BKL's ->lock_depth and the preempt count separately and non-atomicalyy), so we cannot put it into the generic preempt debugging checks which can run in IRQ contexts too. Reported-and-bisected-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com> Reported-and-bisected-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 11 1月, 2009 2 次提交
-
-
由 Steven Noonan 提交于
Impact: build fix on certain configs Added 'double_rq_lock' forward declaration, allowing double_rq_lock to be used in _double_lock_balance(). Signed-off-by: NSteven Noonan <steven@uplinklabs.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Rusty Russell 提交于
Impact: fix panic on ia64 with NR_CPUS=1024 struct sched_domain is now a dangling structure; where we really want static ones, we need to use static_sched_domain. (As the FIXME in this file says, cpumask_var_t would be better, but this code is hairy enough without trying to add initialization code to the right places). Reported-by: NMike Travis <travis@sgi.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 1月, 2009 1 次提交
-
-
由 Peter Zijlstra 提交于
Vaidyanathan Srinivasan reported: > ============================================= > [ INFO: possible recursive locking detected ] > 2.6.28-autotest-tip-sv #1 > --------------------------------------------- > klogd/5062 is trying to acquire lock: > (&rq->lock){++..}, at: [<ffffffff8022aca2>] task_rq_lock+0x45/0x7e > > but task is already holding lock: > (&rq->lock){++..}, at: [<ffffffff805f7354>] schedule+0x158/0xa31 With sched_mc at 2. (it is default-off) Strictly speaking we'll not deadlock, because ttwu will not be able to place the migration task on our rq, but since the code can deal with both rqs getting unlocked, this seems the easiest way out. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 06 1月, 2009 2 次提交
-
-
由 Li Zefan 提交于
init_rootdomain() calls alloc_bootmem_cpumask_var() at system boot, so does cpupri_init(). Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Li Zefan 提交于
It's not the responsibility of init_rootdomain() to free root_domain allocated by alloc_rootdomain(). Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 05 1月, 2009 2 次提交
-
-
由 Li Zefan 提交于
- Make arch_reinit_sched_domains() static. It was exported to be used in s390, but now rebuild_sched_domains() is used instead. - Make it return void. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Li Zefan 提交于
Impact: cleanup The only caller is cpu_dev_init() which is marked as __init. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 04 1月, 2009 1 次提交
-
-
由 Mike Travis 提交于
Impact: prevents panic from stack overflow on numa-capable machines. Some of the "removal of stack hogs" changes in kernel/sched.c by using node_to_cpumask_ptr were undone by the early cpumask API updates, and causes a panic due to stack overflow. This patch undoes those changes by using cpumask_of_node() which returns a 'const struct cpumask *'. In addition, cpu_coregoup_map is replaced with cpu_coregroup_mask further reducing stack usage. (Both of these updates removed 9 FIXME's!) Also: Pick up some remaining changes from the old 'cpumask_t' functions to the new 'struct cpumask *' functions. Optimize memory traffic by allocating each percpu local_cpu_mask on the same node as the referring cpu. Signed-off-by: NMike Travis <travis@sgi.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 31 12月, 2008 2 次提交
-
-
由 Martin Schwidefsky 提交于
The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The utimescaled / stimescaled fields in the task structure and the global cpustat should be set on all architectures. On s390 the calls to account_user_time_scaled and account_system_time_scaled never have been added. In addition system time that is accounted as guest time to the user time of a process is accounted to the scaled system time instead of the scaled user time. To fix the bugs and to prevent future forgetfulness this patch merges account_system_time_scaled into account_system_time and account_user_time_scaled into account_user_time. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Michael Neuling <mikey@neuling.org> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 29 12月, 2008 3 次提交
-
-
由 Gregory Haskins 提交于
The RT scheduler employs a "push/pull" design to actively balance tasks within the system (on a per disjoint cpuset basis). When a task is awoken, it is immediately determined if there are any lower priority cpus which should be preempted. This is opposed to the way normal SCHED_OTHER tasks behave, which will wait for a periodic rebalancing operation to occur before spreading out load. When a particular RQ has more than 1 active RT task, it is said to be in an "overloaded" state. Once this occurs, the system enters the active balancing mode, where it will try to push the task away, or persuade a different cpu to pull it over. The system will stay in this state until the system falls back below the <= 1 queued RT task per RQ. However, the current implementation suffers from a limitation in the push logic. Once overloaded, all tasks (other than current) on the RQ are analyzed on every push operation, even if it was previously unpushable (due to affinity, etc). Whats more, the operation stops at the first task that is unpushable and will not look at items lower in the queue. This causes two problems: 1) We can have the same tasks analyzed over and over again during each push, which extends out the fast path in the scheduler for no gain. Consider a RQ that has dozens of tasks that are bound to a core. Each one of those tasks will be encountered and skipped for each push operation while they are queued. 2) There may be lower-priority tasks under the unpushable task that could have been successfully pushed, but will never be considered until either the unpushable task is cleared, or a pull operation succeeds. The net result is a potential latency source for mid priority tasks. This patch aims to rectify these two conditions by introducing a new priority sorted list: "pushable_tasks". A task is added to the list each time a task is activated or preempted. It is removed from the list any time it is deactivated, made current, or fails to push. This works because a task only needs to be attempted to push once. After an initial failure to push, the other cpus will eventually try to pull the task when the conditions are proper. This also solves the problem that we don't completely analyze all tasks due to encountering an unpushable tasks. Now every task will have a push attempted (when appropriate). This reduces latency both by shorting the critical section of the rq->lock for certain workloads, and by making sure the algorithm considers all eligible tasks in the system. [ rostedt: added a couple more BUG_ONs ] Signed-off-by: NGregory Haskins <ghaskins@novell.com> Acked-by: NSteven Rostedt <srostedt@redhat.com>
-
由 Gregory Haskins 提交于
We currently run class->post_schedule() outside of the rq->lock, which means that we need to test for the need to post_schedule outside of the lock to avoid a forced reacquistion. This is currently not a problem as we only look at rq->rt.overloaded. However, we want to enhance this going forward to look at more state to reduce the need to post_schedule to a bare minimum set. Therefore, we introduce a new member-func called needs_post_schedule() which tests for the post_schedule condtion without actually performing the work. Therefore it is safe to call this function before the rq->lock is released, because we are guaranteed not to drop the lock at an intermediate point (such as what post_schedule() may do). We will use this later in the series [ rostedt: removed paranoid BUG_ON ] Signed-off-by: NGregory Haskins <ghaskins@novell.com>
-
由 Gregory Haskins 提交于
double_lock balance() currently favors logically lower cpus since they often do not have to release their own lock to acquire a second lock. The result is that logically higher cpus can get starved when there is a lot of pressure on the RQs. This can result in higher latencies on higher cpu-ids. This patch makes the algorithm more fair by forcing all paths to have to release both locks before acquiring them again. Since callsites to double_lock_balance already consider it a potential preemption/reschedule point, they have the proper logic to recheck for atomicity violations. Signed-off-by: NGregory Haskins <ghaskins@novell.com>
-