- 06 2月, 2006 2 次提交
-
-
由 Chuck Ebbert 提交于
migration_cost prints after every CPU hotplug event. Make it print only once at boot. Signed-off-by: NChuck Ebbert <76306.1226@compuserve.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Eric Dumazet 提交于
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of cpudata, instead of allocating memory only for possible cpus. As a preparation for changing that, we need to convert various 0 -> NR_CPUS loops to use for_each_cpu(). (The above only applies to users of asm-generic/percpu.h. powerpc has gone it alone and is presently only allocating memory for present CPUs, so it's currently corrupting memory). Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@steeleye.com> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: Jens Axboe <axboe@suse.de> Cc: Anton Blanchard <anton@samba.org> Acked-by: NWilliam Irwin <wli@holomorphy.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 02 2月, 2006 1 次提交
-
-
由 Jack Steiner 提交于
Change sched_getaffinity() so that it returns a bitmap that indicates the legally schedulable cpus that a task is allowed to run on. Without this patch, if CONFIG_HOTPLUG_CPU is enabled, sched_getaffinity() unconditionally returns (at least on IA64) a mask with NR_CPUS bits set. This conveys no useful infornmation except for a kernel compile option. This fixes a breakage we obseved running recent kernels. We have MPI jobs that use sched_getaffinity() to determine where to place their threads. Placing them on non-existant cpus is problematic :-) Signed-off-by: NJack Steiner <steiner@sgi.com> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: Nathan Lynch <ntl@pobox.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 01 2月, 2006 1 次提交
-
-
由 Ingo Molnar 提交于
This reduces the amount of time the migration cost calculations cost during bootup. Based on numbers by Tony Luck <tony.luck@intel.com>. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 19 1月, 2006 1 次提交
-
-
由 Jason Baron 提交于
Currently, a negative policy argument passed into the 'sys_sched_setscheduler()' system call, will return with success. However, the manpage for 'sys_sched_setscheduler' says: EINVAL The scheduling policy is not one of the recognized policies, or the parameter p does not make sense for the policy. Signed-off-by: NJason Baron <jbaron@redhat.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 15 1月, 2006 2 次提交
-
-
由 Arjan van de Ven 提交于
Remove the "inline" keyword from a bunch of big functions in the kernel with the goal of shrinking it by 30kb to 40kb Signed-off-by: NArjan van de Ven <arjan@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu> Acked-by: NJeff Garzik <jgarzik@pobox.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
Add a new SCHED_BATCH (3) scheduling policy: such tasks are presumed CPU-intensive, and will acquire a constant +5 priority level penalty. Such policy is nice for workloads that are non-interactive, but which do not want to give up their nice levels. The policy is also useful for workloads that want a deterministic scheduling policy without interactivity causing extra preemptions (between that workload's tasks). Signed-off-by: NIngo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 13 1月, 2006 2 次提交
-
-
由 akpm@osdl.org 提交于
) From: Nick Piggin <nickpiggin@yahoo.com.au> Track the last waker CPU, and only consider wakeup-balancing if there's a match between current waker CPU and the previous waker CPU. This ensures that there is some correlation between two subsequent wakeup events before we move the task. Should help random-wakeup workloads on large SMP systems, by reducing the migration attempts by a factor of nr_cpus. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 akpm@osdl.org 提交于
) From: Ingo Molnar <mingo@elte.hu> This is the latest version of the scheduler cache-hot-auto-tune patch. The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this: - I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT distances 0 and 1, and on SMP distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more assymetric systems. [ People hacking systems that have assymetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but lets first see the problem systems, if they exist at all. Lets not overdesign. ] Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didnt set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem: - Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cachesizes. The code searches for the maximum migration cost, which occurs when the working set of the test-workload falls just below the 'effective cache size'. I.e. real-life optimized search is done for the maximum migration cost, between two real CPUs. This, amongst other things, has the positive effect hat if e.g. two CPUs share a L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that dont share any caches. (The reliable measurement of migration costs is tricky - see the source for details.) Furthermore i've added various boot-time options to override/tune migration behavior. Firstly, there's a blanket override for autodetection: migration_cost=1000,2000,3000 will override the depth 0/1/2 values with 1msec/2msec/3msec values. Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values: migration_factor=120 will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0. I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good: Dual Celeron (128K L2 cache): --------------------- migration cost matrix (max_cache_size: 131072, cpu: 467 MHz): --------------------- [00] [01] [00]: - 1.7(1) [01]: 1.7(1) - --------------------- cacheflush times [2]: 0.0 (0) 1.7 (1784008) --------------------- Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs. Dual HT P4 (512K L2 cache): --------------------- migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz): --------------------- [00] [01] [02] [03] [00]: - 0.4(1) 0.0(0) 0.4(1) [01]: 0.4(1) - 0.4(1) 0.0(0) [02]: 0.0(0) 0.4(1) - 0.4(1) [03]: 0.4(1) 0.0(0) 0.4(1) - --------------------- cacheflush times [2]: 0.0 (33900) 0.4 (448514) --------------------- Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs. 8-way P3/Xeon [2MB L2 cache]: --------------------- migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz): --------------------- [00] [01] [02] [03] [04] [05] [06] [07] [00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) [04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) [05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) [06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) [07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - --------------------- cacheflush times [2]: 0.0 (0) 19.2 (19281756) --------------------- This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Cc: <wilder@us.ibm.com> Signed-off-by: NJohn Hawkes <hawkes@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 1月, 2006 2 次提交
-
-
由 Andi Kleen 提交于
They are referred to often so avoid potential false sharing for them. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Randy.Dunlap 提交于
- Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 10 1月, 2006 1 次提交
-
-
由 Ingo Molnar 提交于
more mutex debugging: check for held locks during memory freeing, task exit, enable sysrq printouts, etc. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NArjan van de Ven <arjan@infradead.org>
-
- 09 1月, 2006 1 次提交
-
-
由 Ingo Molnar 提交于
RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked instead of tasklist_lock read-locked. This is a scalability improvement on SMP and a preemption-latency improvement under PREEMPT_RCU. Signed-off-by: NPaul E. McKenney <paulmck@us.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Acked-by: NWilliam Irwin <wli@holomorphy.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 14 11月, 2005 2 次提交
-
-
由 Al Viro 提交于
encapsulates the rest of arch-dependent operations with thread_info access. Two new helpers - setup_thread_stack() and end_of_stack(). For normal case the former consists of copying thread_info of parent to new thread_info and the latter returns pointer immediately past the end of thread_info. Signed-off-by: NAl Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: NRoman Zippel <zippel@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Al Viro 提交于
new helper - task_thread_info(task). On platforms that have thread_info allocated separately (i.e. in default case) it simply returns task->thread_info. m68k wants (and for good reasons) to embed its thread_info into task_struct. So it will (in later patch) have task_thread_info() of its own. For now we just add a macro for generic case and convert existing instances of its body in core kernel to uses of new macro. Obviously safe - all normal architectures get the same preprocessor output they used to get. Signed-off-by: NAl Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: NRoman Zippel <zippel@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 10 11月, 2005 1 次提交
-
-
由 Chen, Kenneth W 提交于
recalc_task_prio() is called from activate_task() to calculate dynamic priority and interactive credit for the activating task. For real-time scheduling process, all that dynamic calculation is thrown away at the end because rt priority is fixed. Patch to optimize recalc_task_prio() away for rt processes. Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: Nick Piggin <piggin@cyberone.com.au> Cc: Con Kolivas <kernel@kolivas.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 09 11月, 2005 7 次提交
-
-
由 Nick Piggin 提交于
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce confusion, and make their semantics rigid. Improves efficiency of resched_task and some cpu_idle routines. * In resched_task: - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held, and as we hold it during resched_task, then there is no need for an atomic test and set there. The only other time this should be set is when the task's quantum expires, in the timer interrupt - this is protected against because the rq lock is irq-safe. - If TIF_NEED_RESCHED is set, then we don't need to do anything. It won't get unset until the task get's schedule()d off. - If we are running on the same CPU as the task we resched, then set TIF_NEED_RESCHED and no further action is required. - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set after TIF_NEED_RESCHED has been set, then we need to send an IPI. Using these rules, we are able to remove the test and set operation in resched_task, and make clear the previously vague semantics of POLLING_NRFLAG. * In idle routines: - Enter cpu_idle with preempt disabled. When the need_resched() condition becomes true, explicitly call schedule(). This makes things a bit clearer (IMO), but haven't updated all architectures yet. - Many do a test and clear of TIF_NEED_RESCHED for some reason. According to the resched_task rules, this isn't needed (and actually breaks the assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock held). So remove that. Generally one less locked memory op when switching to the idle thread. - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner most polling idle loops. The above resched_task semantics allow it to be set until before the last time need_resched() is checked before going into a halt requiring interrupt wakeup. Many idle routines simply never enter such a halt, and so POLLING_NRFLAG can be always left set, completely eliminating resched IPIs when rescheduling the idle task. POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs. Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Con Kolivas <kernel@kolivas.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Con Kolivas 提交于
The intermittent scheduling of the migration thread at ultra high priority makes the smp nice handling see that runqueue as being heavily loaded. The migration thread itself actually handles the balancing so its influence on priority balancing should be ignored. Signed-off-by: NCon Kolivas <kernel@kolivas.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Con Kolivas 提交于
The priority biasing was off by mutliplying the total load by the total priority bias and this ruins the ratio of loads between runqueues. This patch should correct the ratios of loads between runqueues to be proportional to overall load. -2nd attempt. From: Dave Kleikamp <shaggy@austin.ibm.com> This patch fixes a divide-by-zero error that I hit on a two-way i386 machine. rq->nr_running is tested to be non-zero, but may change by the time it is used in the division. Saving the value to a local variable ensures that the same value that is checked is used in the division. Signed-off-by: NCon Kolivas <kernel@kolivas.org> Signed-off-by: NDave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Con Kolivas 提交于
To intensify the 'nice' support across physical cpus on SMP we can bias the loads on idle rebalancing. To prevent idle rebalance from trying to pull tasks from queues that appear heavily loaded we only bias the load if there is more than one task running. Add some minor micro-optimisations and have only one return from __source_load and __target_load functions. Fix the fact that target_load was not biased by priority when type == 0. Signed-off-by: NCon Kolivas <kernel@kolivas.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Con Kolivas 提交于
Real time tasks' effect on prio_bias should be based on their real time priority level instead of their static_prio which is based on nice. Signed-off-by: NCon Kolivas <kernel@kolivas.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Con Kolivas 提交于
prio_bias should only be adjusted in set_user_nice if p is actually currently queued. Signed-off-by: NCon Kolivas <kernel@kolivas.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Con Kolivas 提交于
This patch implements 'nice' support across physical cpus on SMP. It introduces an extra runqueue variable prio_bias which is the sum of the (inverted) static priorities of all the tasks on the runqueue. This is then used to bias busy rebalancing between runqueues to obtain good distribution of tasks of different nice values. By biasing the balancing only during busy rebalancing we can avoid having any significant loss of throughput by not affecting the carefully tuned idle balancing already in place. If all tasks are running at the same nice level this code should also have minimal effect. The code is optimised out in the !CONFIG_SMP case. Signed-off-by: NCon Kolivas <kernel@kolivas.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 07 11月, 2005 2 次提交
-
-
由 Adrian Bunk 提交于
I didn't find any possible modular usage in the kernel. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Heiko Carstens 提交于
Replace smp_processor_id() with any_online_cpu(cpu_online_map) in order to avoid lots of "BUG: using smp_processor_id() in preemptible [00000001] code:..." messages in case taking a cpu online fails. All the traces start at the last notifier_call_chain(...) in kernel/cpu.c. Since we hold the cpu_control semaphore it shouldn't be any problem to access cpu_online_map. The reason why cpu_up failed is simply that the cpu that was supposed to be taken online wasn't even there. That is because on s390 we never know when a new cpu comes and therefore cpu_possible_map consists of only ones and doesn't reflect reality. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 05 11月, 2005 1 次提交
-
-
由 Oleg Nesterov 提交于
Do not transfer remaining time slice to another cpu on process exit. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 31 10月, 2005 1 次提交
-
-
由 Paul Jackson 提交于
Simplify the UP (1 CPU) implementatin of set_cpus_allowed. The one CPU is hardcoded to be cpu 0 - so just test for that bit, and avoid having to pick up the cpu_online_map. Also, unexport cpu_online_map: it was only needed for set_cpus_allowed(). Signed-off-by: NPaul Jackson <pj@sgi.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 30 10月, 2005 1 次提交
-
-
由 Hugh Dickins 提交于
update_mem_hiwater has attracted various criticisms, in particular from those concerned with mm scalability. Originally it was called whenever rss or total_vm got raised. Then many of those callsites were replaced by a timer tick call from account_system_time. Now Frank van Maarseveen reports that to be found inadequate. How about this? Works for Frank. Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros update_hiwater_rss and update_hiwater_vm. Don't attempt to keep mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually by 1): those are hot paths. Do the opposite, update only when about to lower rss (usually by many), or just before final accounting in do_exit. Handle mm->hiwater_vm in the same way, though it's much less of an issue. Demand that whoever collects these hiwater statistics do the work of taking the maximum with rss or total_vm. And there has been no collector of these hiwater statistics in the tree. The new convention needs an example, so match Frank's usage by adding a VmPeak line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS (High-Water-Mark or High-Water-Memory). There was a particular anomaly during mremap move, that hiwater_vm might be captured too high. A fleeting such anomaly remains, but it's quickly corrected now, whereas before it would stick. What locking? None: if the app is racy then these statistics will be racy, it's not worth any overhead to make them exact. But whenever it suits, hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under page_table_lock (for now) or with preemption disabled (later on): without going to any trouble, minimize the time between reading current values and updating, to minimize those occasions when a racing thread bumps a count up and back down in between. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 27 10月, 2005 1 次提交
-
-
由 Andrew Morton 提交于
With CONFIG_SMP=n: *** Warning: "cpu_online_map" [drivers/firmware/dcdbas.ko] undefined! due to set_cpus_allowed(). Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 14 9月, 2005 1 次提交
-
-
由 Ingo Molnar 提交于
fix up the runqueue lock owner only if we truly did a context-switch with the runqueue lock held. Impacts ia64, mips, sparc64 and arm. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 9月, 2005 2 次提交
-
-
由 Linus Torvalds 提交于
..and only enable them for ia64. The functions are only valid when the whole system has been totally stopped and no scheduler activity is ongoing on any CPU, and interrupts are globally disabled. In other words, they aren't useful for anything else. So make sure that nobody can use them by mistake. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Keith Owens 提交于
Scheduler hooks to see/change which process is deemed to be on a cpu. Signed-off-by: NKeith Owens <kaos@sgi.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 11 9月, 2005 8 次提交
-
-
由 Siddha, Suresh B 提交于
Don't pull tasks from a group if that would cause the group's total load to drop below its total cpu_power (ie. cause the group to start going idle). Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NNick Piggin <npiggin@suse.de> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Siddha, Suresh B 提交于
Jack Steiner brought this issue at my OLS talk. Take a scenario where two tasks are pinned to two HT threads in a physical package. Idle packages in the system will keep kicking migration_thread on the busy package with out any success. We will run into similar scenarios in the presence of CMP/NUMA. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Renaud Lienhart 提交于
In sys_sched_yield(), we cache current->array in the "array" variable, thus there's no need to dereference "current" again later. Signed-Off-By: NRenaud Lienhart <renaud.lienhart@free.fr> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
If an idle sibling of an HT queue encounters a busy sibling, then make higher level load balancing of the non-idle variety. Performance of multiprocessor HT systems with low numbers of tasks (generally < number of virtual CPUs) can be significantly worse than the exact same workloads when running in non-HT mode. The reason is largely due to poor scheduling behaviour. This patch improves the situation, making the performance gap far less significant on one problematic test case (tbench). Signed-off-by: NNick Piggin <npiggin@suse.de> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
During periodic load balancing, don't hold this runqueue's lock while scanning remote runqueues, which can take a non trivial amount of time especially on very large systems. Holding the runqueue lock will only help to stabilise ->nr_running, however this doesn't do much to help because tasks being woken will simply get held up on the runqueue lock, so ->nr_running would not provide a really accurate picture of runqueue load in that case anyway. What's more, ->nr_running (and possibly the cpu_load averages) of remote runqueues won't be stable anyway, so load balancing is always an inexact operation. Signed-off-by: NNick Piggin <npiggin@suse.de> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Similarly to the earlier change in load_balance, only lock the runqueue in load_balance_newidle if the busiest queue found has a nr_running > 1. This will reduce frequency of expensive remote runqueue lock aquisitions in the schedule() path on some workloads. Signed-off-by: NNick Piggin <npiggin@suse.de> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
William Weston reported unusually high scheduling latencies on his x86 HT box, on the -RT kernel. I managed to reproduce it on my HT box and the latency tracer shows the incident in action: _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / du-2803 3Dnh2 0us : __trace_start_sched_wakeup (try_to_wake_up) .............................................................. ... we are running on CPU#3, PID 2778 gets woken to CPU#1: ... .............................................................. du-2803 3Dnh2 0us : __trace_start_sched_wakeup <<...>-2778> (73 1) du-2803 3Dnh2 0us : _raw_spin_unlock (try_to_wake_up) ................................................ ... still on CPU#3, we send an IPI to CPU#1: ... ................................................ du-2803 3Dnh1 0us : resched_task (try_to_wake_up) du-2803 3Dnh1 1us : smp_send_reschedule (try_to_wake_up) du-2803 3Dnh1 1us : send_IPI_mask_bitmask (smp_send_reschedule) du-2803 3Dnh1 2us : _raw_spin_unlock_irqrestore (try_to_wake_up) ............................................... ... 1 usec later, the IPI arrives on CPU#1: ... ............................................... <idle>-0 1Dnh. 2us : smp_reschedule_interrupt (c0100c5a 0 0) So far so good, this is the normal wakeup/preemption mechanism. But here comes the scheduler anomaly on CPU#1: <idle>-0 1Dnh. 2us : preempt_schedule_irq (need_resched) <idle>-0 1Dnh. 2us : preempt_schedule_irq (need_resched) <idle>-0 1Dnh. 3us : __schedule (preempt_schedule_irq) <idle>-0 1Dnh. 3us : profile_hit (__schedule) <idle>-0 1Dnh1 3us : sched_clock (__schedule) <idle>-0 1Dnh1 4us : _raw_spin_lock_irq (__schedule) <idle>-0 1Dnh1 4us : _raw_spin_lock_irqsave (__schedule) <idle>-0 1Dnh2 5us : _raw_spin_unlock (__schedule) <idle>-0 1Dnh1 5us : preempt_schedule (__schedule) <idle>-0 1Dnh1 6us : _raw_spin_lock (__schedule) <idle>-0 1Dnh2 6us : find_next_bit (__schedule) <idle>-0 1Dnh2 6us : _raw_spin_lock (__schedule) <idle>-0 1Dnh3 7us : find_next_bit (__schedule) <idle>-0 1Dnh3 7us : find_next_bit (__schedule) <idle>-0 1Dnh3 8us : _raw_spin_unlock (__schedule) <idle>-0 1Dnh2 8us : preempt_schedule (__schedule) <idle>-0 1Dnh2 8us : find_next_bit (__schedule) <idle>-0 1Dnh2 9us : trace_stop_sched_switched (__schedule) <idle>-0 1Dnh2 9us : _raw_spin_lock (trace_stop_sched_switched) <idle>-0 1Dnh3 10us : trace_stop_sched_switched <<...>-2778> (73 8c) <idle>-0 1Dnh3 10us : _raw_spin_unlock (trace_stop_sched_switched) <idle>-0 1Dnh1 10us : _raw_spin_unlock (__schedule) <idle>-0 1Dnh. 11us : local_irq_enable_noresched (preempt_schedule_irq) <idle>-0 1Dnh. 11us < (0) we didnt pick up pid 2778! It only gets scheduled much later: <...>-2778 1Dnh2 412us : __switch_to (__schedule) <...>-2778 1Dnh2 413us : __schedule <<idle>-0> (8c 73) <...>-2778 1Dnh2 413us : _raw_spin_unlock (__schedule) <...>-2778 1Dnh1 413us : trace_stop_sched_switched (__schedule) <...>-2778 1Dnh1 414us : _raw_spin_lock (trace_stop_sched_switched) <...>-2778 1Dnh2 414us : trace_stop_sched_switched <<...>-2778> (73 1) <...>-2778 1Dnh2 414us : _raw_spin_unlock (trace_stop_sched_switched) <...>-2778 1Dnh1 415us : trace_stop_sched_switched (__schedule) the reason for this anomaly is the following code in dependent_sleeper(): /* * If a user task with lower static priority than the * running task on the SMT sibling is trying to schedule, * delay it till there is proportionately less timeslice * left of the sibling task to prevent a lower priority * task from using an unfair proportion of the * physical cpu's resources. -ck */ [...] if (((smt_curr->time_slice * (100 - sd->per_cpu_gain) / 100) > task_timeslice(p))) ret = 1; Note that in contrast to the comment above, we dont actually do the check based on static priority, we do the check based on timeslices. But timeslices go up and down, and even highprio tasks can randomly have very low timeslices (just before their next refill) and can thus be judged as 'lowprio' by the above piece of code. This condition is clearly buggy. The correct test is to check for static_prio _and_ to check for the preemption priority. Even on different static priority levels, a higher-prio interactive task should not be delayed due to a higher-static-prio CPU hog. There is a symmetric bug in the 'kick SMT sibling' code of this function as well, which can be solved in a similar way. The patch below (against the current scheduler queue in -mm) fixes both bugs. I have build and boot-tested this on x86 SMT, and nice +20 tasks still get properly throttled - so the dependent-sleeper logic is still in action. btw., these bugs pessimised the SMT scheduler because the 'delay wakeup' property was applied too liberally, so this fix is likely a throughput improvement as well. I separated out a smt_slice() function to make the code easier to read. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
This patch implements a task state bit (TASK_NONINTERACTIVE), which can be used by blocking points to mark the task's wait as "non-interactive". This does not mean the task will be considered a CPU-hog - the wait will simply not have an effect on the waiting task's priority - positive or negative alike. Right now only pipe_wait() will make use of it, because it's a common source of not-so-interactive waits (kernel compilation jobs, etc.). Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-