- 07 1月, 2006 1 次提交
-
-
由 Christoph Lameter 提交于
Several counters already have the need to use 64 atomic variables on 64 bit platforms (see mm_counter_t in sched.h). We have to do ugly ifdefs to fall back to 32 bit atomic on 32 bit platforms. The VM statistics patch that I am working on will also make more extensive use of atomic64. This patch introduces a new type atomic_long_t by providing definitions in asm-generic/atomic.h that works similar to the c "long" type. Its 32 bits on 32 bit platforms and 64 bits on 64 bit platforms. Also cleans up the determination of the mm_counter_t in sched.h. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 29 11月, 2005 1 次提交
-
-
由 Ashok Raj 提交于
There are some callers in cpufreq hotplug notify path that the lowest function calls lock_cpu_hotplug(). The lock is already held during cpu_up() and cpu_down() calls when the notify calls are broadcast to registered clients. Ideally if possible, we could disable_preempt() at the highest caller and make sure we dont sleep in the path down in cpufreq->driver_target() calls but the calls are so intertwined and cumbersome to cleanup. Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in all places. - Removed export of cpucontrol semaphore and made it static. - removed explicit uses of up/down with lock_cpu_hotplug() so we can keep track of the the callers in same thread context and just keep refcounts without calling a down() that causes a deadlock. - Removed current_in_hotplug() uses - Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug() temporary workaround. Tested with insmod of cpufreq_stat.ko, and logical online/offline to make sure we dont have any hang situations. Signed-off-by: NAshok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 14 11月, 2005 4 次提交
-
-
由 Zach Brown 提交于
Sync iocbs have a life cycle that don't need a kioctx. Their retrying, if any, is done in the context of their owner who has allocated them on the stack. The sole user of a sync iocb's ctx reference was aio_complete() checking for an elevated iocb ref count that could never happen. No path which grabs an iocb ref has access to sync iocbs. If we were to implement sync iocb cancelation it would be done by the owner of the iocb using its on-stack reference. Removing this chunk from aio_complete allows us to remove the entire kioctx instance from mm_struct, reducing its size by a third. On a i386 testing box the slab size went from 768 to 504 bytes and from 5 to 8 per page. Signed-off-by: NZach Brown <zach.brown@oracle.com> Acked-by: NBenjamin LaHaise <bcrl@kvack.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Al Viro 提交于
a) in smp_lock.h #include of sched.h and spinlock.h moved under #ifdef CONFIG_LOCK_KERNEL. b) interrupt.h now explicitly pulls sched.h (not via smp_lock.h from hardirq.h as it used to) c) in three more places we need changes to compensate for (a) - one place in arch/sparc needs string.h now, hardirq.h needs forward declaration of task_struct and preempt.h needs direct include of thread_info.h. d) thread_info-related helpers in sched.h and thread_info.h put under ifndef __HAVE_THREAD_FUNCTIONS. Obviously safe. Signed-off-by: NAl Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: NRoman Zippel <zippel@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Al Viro 提交于
encapsulates the rest of arch-dependent operations with thread_info access. Two new helpers - setup_thread_stack() and end_of_stack(). For normal case the former consists of copying thread_info of parent to new thread_info and the latter returns pointer immediately past the end of thread_info. Signed-off-by: NAl Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: NRoman Zippel <zippel@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Al Viro 提交于
new helper - task_thread_info(task). On platforms that have thread_info allocated separately (i.e. in default case) it simply returns task->thread_info. m68k wants (and for good reasons) to embed its thread_info into task_struct. So it will (in later patch) have task_thread_info() of its own. For now we just add a macro for generic case and convert existing instances of its body in core kernel to uses of new macro. Obviously safe - all normal architectures get the same preprocessor output they used to get. Signed-off-by: NAl Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: NRoman Zippel <zippel@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 09 11月, 2005 1 次提交
-
-
由 Ashok Raj 提交于
When calling target drivers to set frequency, we take cpucontrol lock. When we modified the code to accomodate CPU hotplug, there was an attempt to take a double lock of cpucontrol leading to a deadlock. Since the current thread context is already holding the cpucontrol lock, we dont need to make another attempt to acquire it. Now we leave a trace in current->flags indicating current thread already is under cpucontrol lock held, so we dont attempt to do this another time. Thanks to Andrew Morton for the beating:-) From: Brice Goglin <Brice.Goglin@ens-lyon.org> Build fix (akpm: this patch is still unpleasant. Ashok continues to look for a cleaner solution, doesn't he? ;)) Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NBrice Goglin <Brice.Goglin@ens-lyon.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 31 10月, 2005 3 次提交
-
-
由 Oleg Nesterov 提交于
This patch simplifies some checks for magic siginfo values. It should not change the behaviour in any way. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
Simplify the UP (1 CPU) implementatin of set_cpus_allowed. The one CPU is hardcoded to be cpu 0 - so just test for that bit, and avoid having to pick up the cpu_online_map. Also, unexport cpu_online_map: it was only needed for set_cpus_allowed(). Signed-off-by: NPaul Jackson <pj@sgi.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
Overhaul cpuset locking. Replace single semaphore with two semaphores. The suggestion to use two locks was made by Roman Zippel. Both locks are global. Code that wants to modify cpusets must first acquire the exclusive manage_sem, which allows them read-only access to cpusets, and holds off other would-be modifiers. Before making actual changes, the second semaphore, callback_sem must be acquired as well. Code that needs only to query cpusets must acquire callback_sem, which is also a global exclusive lock. The earlier problems with double tripping are avoided, because it is allowed for holders of manage_sem to nest the second callback_sem lock, and only callback_sem is needed by code called from within __alloc_pages(), where the double tripping had been possible. This is not quite the same as a normal read/write semaphore, because obtaining read-only access with intent to change must hold off other such attempts, while allowing read-only access w/o such intention. Changing cpusets involves several related checks and changes, which must be done while allowing read-only queries (to avoid the double trip), but while ensuring nothing changes (holding off other would be modifiers.) This overhaul of cpuset locking also makes careful use of task_lock() to guard access to the task->cpuset pointer, closing a couple of race conditions noticed while reading this code (thanks, Roman). I've never seen these races fail in any use or test. See further the comments in the code. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 30 10月, 2005 4 次提交
-
-
由 Hugh Dickins 提交于
A couple of oddities were guarded by page_table_lock, no longer properly guarded when that is split. The mm_counters of file_rss and anon_rss: make those an atomic_t, or an atomic64_t if the architecture supports it, in such a case. Definitions by courtesy of Christoph Lameter: who spent considerable effort on more scalable ways of counting, but found insufficient benefit in practice. And adding an mm with swap to the mmlist for swapoff: the list is well- guarded by its own lock, but the list_empty check now has to be repeated inside it. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
Slight and timid rearrangement of mm_struct: hiwater_rss and hiwater_vm were tacked on the end, but it seems better to keep them near _file_rss, _anon_rss and total_vm, in the same cacheline on those arches verified. There are likely to be more profitable rearrangements, but less obvious (is it good or bad that saved_auxv[AT_VECTOR_SIZE] isolates cpu_vm_mask and context from many others?), needing serious instrumentation. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
update_mem_hiwater has attracted various criticisms, in particular from those concerned with mm scalability. Originally it was called whenever rss or total_vm got raised. Then many of those callsites were replaced by a timer tick call from account_system_time. Now Frank van Maarseveen reports that to be found inadequate. How about this? Works for Frank. Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros update_hiwater_rss and update_hiwater_vm. Don't attempt to keep mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually by 1): those are hot paths. Do the opposite, update only when about to lower rss (usually by many), or just before final accounting in do_exit. Handle mm->hiwater_vm in the same way, though it's much less of an issue. Demand that whoever collects these hiwater statistics do the work of taking the maximum with rss or total_vm. And there has been no collector of these hiwater statistics in the tree. The new convention needs an example, so match Frank's usage by adding a VmPeak line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS (High-Water-Mark or High-Water-Memory). There was a particular anomaly during mremap move, that hiwater_vm might be captured too high. A fleeting such anomaly remains, but it's quickly corrected now, whereas before it would stick. What locking? None: if the app is racy then these statistics will be racy, it's not worth any overhead to make them exact. But whenever it suits, hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under page_table_lock (for now) or with preemption disabled (later on): without going to any trouble, minimize the time between reading current values and updating, to minimize those occasions when a racing thread bumps a count up and back down in between. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
I was lazy when we added anon_rss, and chose to change as few places as possible. So currently each anonymous page has to be counted twice, in rss and in anon_rss. Which won't be so good if those are atomic counts in some configurations. Change that around: keep file_rss and anon_rss separately, and add them together (with get_mm_rss macro) when the total is needed - reading two atomics is much cheaper than updating two atomics. And update anon_rss upfront, typically in memory.c, not tucked away in page_add_anon_rmap. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 11 10月, 2005 1 次提交
-
-
由 Harald Welte 提交于
If a process issues an URB from userspace and (starts to) terminate before the URB comes back, we run into the issue described above. This is because the urb saves a pointer to "current" when it is posted to the device, but there's no guarantee that this pointer is still valid afterwards. In fact, there are three separate issues: 1) the pointer to "current" can become invalid, since the task could be completely gone when the URB completion comes back from the device. 2) Even if the saved task pointer is still pointing to a valid task_struct, task_struct->sighand could have gone meanwhile. 3) Even if the process is perfectly fine, permissions may have changed, and we can no longer send it a signal. So what we do instead, is to save the PID and uid's of the process, and introduce a new kill_proc_info_as_uid() function. Signed-off-by: NHarald Welte <laforge@gnumonks.org> [ Fixed up types and added symbol exports ] Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 30 9月, 2005 2 次提交
-
-
由 Linus Torvalds 提交于
Roland points out that the flags end up having non-obvious dependencies elsewhere, so revert aa55a086 and add some comments about why things are as they are. We'll just have to fix up the broken comparisons. Roland has a patch. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Oleg Nesterov 提交于
do_signal_stop: for_each_thread(t) { if (t->state < TASK_STOPPED) ++sig->group_stop_count; } However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads. See also wait_task_stopped(), which checks ->state > TASK_STOPPED. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> [ We really probably should always use the appropriate bitmasks to test task states, not do it like this. Using something like #define TASK_RUNNABLE (TASK_RUNNING | TASK_INTERRUPTIBLE | \ TASK_UNINTERRUPTIBLE | TASK_NONINTERACTIVE) and then doing "if (task->state & TASK_RUNNABLE)" or similar. But the ordering of the task states is historical, and keeping the ordering does make sense regardless. ] Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 13 9月, 2005 2 次提交
-
-
由 Andrew Morton 提交于
Explain the mysteries of set_current_state(). Quoth Linus: The scheduler itself never needs the memory barrier at all. The barrier is needed only if the user itself ends up testing some other thing afterwards, ie if you have set_process_state(TASK_INTERRUPTIBLE); if (still_need_to_sleep()) schedule(); then the "still_need_to_sleep()" thing may test flags and wakeup events, and then you _may_ want to (and often do) make sure that the write of TASK_INTERRUPTIBLE is serialized wrt the reads of any wakeup data (since the wakeup may have happened on another CPU). So the comment is somewhat wrong. We don't really _care_ whether the state propagates out to other CPU's since all of our actions are purely local, and there is nothing we do that is conditional on any other CPU: we're going to sleep unconditionally, and the scheduler only cares about _our_ state, not about somebody elses state. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
Optimize the deadlock avoidance check on the global cpuset semaphore cpuset_sem. Instead of adding a depth counter to the task struct of each task, rather just two words are enough, one to store the depth and the other the current cpuset_sem holder. Thanks to Nikita Danilov for the idea. Signed-off-by: NPaul Jackson <pj@sgi.com> [ We may want to change this further, but at least it's now a totally internal decision to the cpusets code ] Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 9月, 2005 1 次提交
-
-
由 Keith Owens 提交于
Scheduler hooks to see/change which process is deemed to be on a cpu. Signed-off-by: NKeith Owens <kaos@sgi.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 11 9月, 2005 3 次提交
-
-
由 Nishanth Aravamudan 提交于
Add schedule_timeout_{,un}interruptible() interfaces so that schedule_timeout() callers don't have to worry about forgetting to add the set_current_state() call beforehand. Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
This patch implements a task state bit (TASK_NONINTERACTIVE), which can be used by blocking points to mark the task's wait as "non-interactive". This does not mean the task will be considered a CPU-hog - the wait will simply not have an effect on the waiting task's priority - positive or negative alike. Right now only pipe_wait() will make use of it, because it's a common source of not-so-interactive waits (kernel compilation jobs, etc.). Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
The cpusets-formalize-intermediate-gfp_kernel-containment patch has a deadlock problem. This patch was part of a set of four patches to make more extensive use of the cpuset 'mem_exclusive' attribute to manage kernel GFP_KERNEL memory allocations and to constrain the out-of-memory (oom) killer. A task that is changing cpusets in particular ways on a system when it is very short of free memory could double trip over the global cpuset_sem semaphore (get the lock and then deadlock trying to get it again). The second attempt to get cpuset_sem would be in the routine cpuset_zone_allowed(). This was discovered by code inspection. I can not reproduce the problem except with an artifically hacked kernel and a specialized stress test. In real life you cannot hit this unless you are manipulating cpusets, and are very unlikely to hit it unless you are rapidly modifying cpusets on a memory tight system. Even then it would be a rare occurence. If you did hit it, the task double tripping over cpuset_sem would deadlock in the kernel, and any other task also trying to manipulate cpusets would deadlock there too, on cpuset_sem. Your batch manager would be wedged solid (if it was cpuset savvy), but classic Unix shells and utilities would work well enough to reboot the system. The unusual condition that led to this bug is that unlike most semaphores, cpuset_sem _can_ be acquired while in the page allocation code, when __alloc_pages() calls cpuset_zone_allowed. So it easy to mistakenly perform the following sequence: 1) task makes system call to alter a cpuset 2) take cpuset_sem 3) try to allocate memory 4) memory allocator, via cpuset_zone_allowed, trys to take cpuset_sem 5) deadlock The reason that this is not a serious bug for most users is that almost all calls to allocate memory don't require taking cpuset_sem. Only some code paths off the beaten track require taking cpuset_sem -- which is good. Taking a global semaphore on the main code path for allocating memory would not scale well. This patch fixes this deadlock by wrapping the up() and down() calls on cpuset_sem in kernel/cpuset.c with code that tracks the nesting depth of the current task on that semaphore, and only does the real down() if the task doesn't hold the lock already, and only does the real up() if the nesting depth (number of unmatched downs) is exactly one. The previous required use of refresh_mems(), anytime that the cpuset_sem semaphore was acquired and the code executed while holding that semaphore might try to allocate memory, is no longer required. Two refresh_mems() calls were removed thanks to this. This is a good change, as failing to get all the necessary refresh_mems() calls placed was a primary source of bugs in this cpuset code. The only remaining call to refresh_mems() is made while doing a memory allocation, if certain task memory placement data needs to be updated from its cpuset, due to the cpuset having been changed behind the tasks back. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 10 9月, 2005 1 次提交
-
-
由 Chen, Kenneth W 提交于
For architecture like ia64, the switch stack structure is fairly large (currently 528 bytes). For context switch intensive application, we found that significant amount of cache misses occurs in switch_to() function. The following patch adds a hook in the schedule() function to prefetch switch stack structure as soon as 'next' task is determined. This allows maximum overlap in prefetch cache lines for that structure. Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 08 9月, 2005 3 次提交
-
-
由 John Hawkes 提交于
Signed-off-by: NJohn Hawkes <hawkes@sgi.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 H. J. Lu 提交于
The size of auxiliary vector is fixed at 42 in linux/sched.h. But it isn't very obvious when looking at linux/elf.h. This patch adds AT_VECTOR_SIZE so that we can change it if necessary when a new vector is added. Because of include file ordering problems, doing this necessitated the extraction of the AT_* symbols into a standalone header file. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP. When enabled then per-CPU watchdog threads are started, which try to run once per second. If they get delayed for more than 10 seconds then a callback from the timer interrupt detects this condition and prints out a warning message and a stack dump (once per lockup incident). The feature is otherwise non-intrusive, it doesnt try to unlock the box in any way, it only gets the debug info out, automatically, and on all CPUs affected by the lockup. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com> Signed-Off-By: NMatthias Urlichs <smurf@smurf.noris.de> Signed-off-by: NRichard Purdie <rpurdie@rpsys.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 13 7月, 2005 1 次提交
-
-
由 Robert Love 提交于
inotify is intended to correct the deficiencies of dnotify, particularly its inability to scale and its terrible user interface: * dnotify requires the opening of one fd per each directory that you intend to watch. This quickly results in too many open files and pins removable media, preventing unmount. * dnotify is directory-based. You only learn about changes to directories. Sure, a change to a file in a directory affects the directory, but you are then forced to keep a cache of stat structures. * dnotify's interface to user-space is awful. Signals? inotify provides a more usable, simple, powerful solution to file change notification: * inotify's interface is a system call that returns a fd, not SIGIO. You get a single fd, which is select()-able. * inotify has an event that says "the filesystem that the item you were watching is on was unmounted." * inotify can watch directories or files. Inotify is currently used by Beagle (a desktop search infrastructure), Gamin (a FAM replacement), and other projects. See Documentation/filesystems/inotify.txt. Signed-off-by: NRobert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 28 6月, 2005 1 次提交
-
-
由 Jens Axboe 提交于
This updates the CFQ io scheduler to the new time sliced design (cfq v3). It provides full process fairness, while giving excellent aggregate system throughput even for many competing processes. It supports io priorities, either inherited from the cpu nice value or set directly with the ioprio_get/set syscalls. The latter closely mimic set/getpriority. This import is based on my latest from -mm. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 26 6月, 2005 7 次提交
-
-
由 Christoph Lameter 提交于
1. Establish a simple API for process freezing defined in linux/include/sched.h: frozen(process) Check for frozen process freezing(process) Check if a process is being frozen freeze(process) Tell a process to freeze (go to refrigerator) thaw_process(process) Restart process frozen_process(process) Process is frozen now 2. Remove all references to PF_FREEZE and PF_FROZEN from all kernel sources except sched.h 3. Fix numerous locations where try_to_freeze is manually done by a driver 4. Remove the argument that is no longer necessary from two function calls. 5. Some whitespace cleanup 6. Clear potential race in refrigerator (provides an open window of PF_FREEZE cleared before setting PF_FROZEN, recalc_sigpending does not check PF_FROZEN). This patch does not address the problem of freeze_processes() violating the rule that a task may only modify its own flags by setting PF_FREEZE. This is not clean in an SMP environment. freeze(process) is therefore not SMP safe! Signed-off-by: NChristoph Lameter <christoph@lameter.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Dinakar Guniguntala 提交于
The following patches add dynamic sched domains functionality that was extensively discussed on lkml and lse-tech. I would like to see this added to -mm o The main advantage with this feature is that it ensures that the scheduler load balacing code only balances against the cpus that are in the sched domain as defined by an exclusive cpuset and not all of the cpus in the system. This removes any overhead due to load balancing code trying to pull tasks outside of the cpu exclusive cpuset only to be prevented by the tasks' cpus_allowed mask. o cpu exclusive cpusets are useful for servers running orthogonal workloads such as RT applications requiring low latency and HPC applications that are throughput sensitive o It provides a new API partition_sched_domains in sched.c that makes dynamic sched domains possible. o cpu_exclusive cpusets sets are now associated with a sched domain. Which means that the users can dynamically modify the sched domains through the cpuset file system interface o ia64 sched domain code has been updated to support this feature as well o Currently, this does not support hotplug. (However some of my tests indicate hotplug+preempt is currently broken) o I have tested it extensively on x86. o This should have very minimal impact on performance as none of the fast paths are affected Signed-off-by: NDinakar Guniguntala <dino@in.ibm.com> Acked-by: NPaul Jackson <pj@sgi.com> Acked-by: NNick Piggin <nickpiggin@yahoo.com.au> Acked-by: NMatthew Dobson <colpatch@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Consolidate balance-on-exec with balance-on-fork. This is made easy by the sched-domains RCU patches. As well as the general goodness of code reduction, this allows the runqueues to be unlocked during balance-on-fork. schedstats is a problem. Maybe just have balance-on-event instead of distinguishing fork and exec? Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Instead of requiring architecture code to interact with the scheduler's locking implementation, provide a couple of defines that can be used by the architecture to request runqueue unlocked context switches, and ask for interrupts to be enabled over the context switch. Also replaces the "switch_lock" used by these architectures with an oncpu flag (note, not a potentially slow bitflag). This eliminates one bus locked memory operation when context switching, and simplifies the task_running function. Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Add SCHEDSTAT statistics for sched-balance-fork. Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Reimplement the balance on exec balancing to be sched-domains aware. Use this to also do balance on fork balancing. Make x86_64 do balance on fork over the NUMA domain. The problem that the non sched domains aware blancing became apparent on dual core, multi socket opterons. What we want is for the new tasks to be sent to a different socket, but more often than not, we would first load up our sibling core, or fill two cores of a single remote socket before selecting a new one. This gives large improvements to STREAM on such systems. Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Do CPU load averaging over a number of different intervals. Allow each interval to be chosen by sending a parameter to source_load and target_load. 0 is instantaneous, idx > 0 returns a decaying average with the most recent sample weighted at 2^(idx-1). To a maximum of 3 (could be easily increased). So generally a higher number will result in more conservative balancing. Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 24 6月, 2005 2 次提交
-
-
由 David Howells 提交于
The attached patch makes the following changes: (1) There's a new special key type called ".request_key_auth". This is an authorisation key for when one process requests a key and another process is started to construct it. This type of key cannot be created by the user; nor can it be requested by kernel services. Authorisation keys hold two references: (a) Each refers to a key being constructed. When the key being constructed is instantiated the authorisation key is revoked, rendering it of no further use. (b) The "authorising process". This is either: (i) the process that called request_key(), or: (ii) if the process that called request_key() itself had an authorisation key in its session keyring, then the authorising process referred to by that authorisation key will also be referred to by the new authorisation key. This means that the process that initiated a chain of key requests will authorise the lot of them, and will, by default, wind up with the keys obtained from them in its keyrings. (2) request_key() creates an authorisation key which is then passed to /sbin/request-key in as part of a new session keyring. (3) When request_key() is searching for a key to hand back to the caller, if it comes across an authorisation key in the session keyring of the calling process, it will also search the keyrings of the process specified therein and it will use the specified process's credentials (fsuid, fsgid, groups) to do that rather than the calling process's credentials. This allows a process started by /sbin/request-key to find keys belonging to the authorising process. (4) A key can be read, even if the process executing KEYCTL_READ doesn't have direct read or search permission if that key is contained within the keyrings of a process specified by an authorisation key found within the calling process's session keyring, and is searchable using the credentials of the authorising process. This allows a process started by /sbin/request-key to read keys belonging to the authorising process. (5) The magic KEY_SPEC_*_KEYRING key IDs when passed to KEYCTL_INSTANTIATE or KEYCTL_NEGATE will specify a keyring of the authorising process, rather than the process doing the instantiation. (6) One of the process keyrings can be nominated as the default to which request_key() should attach new keys if not otherwise specified. This is done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_* constants. The current setting can also be read using this call. (7) request_key() is partially interruptible. If it is waiting for another process to finish constructing a key, it can be interrupted. This permits a request-key cycle to be broken without recourse to rebooting. Signed-Off-By: NDavid Howells <dhowells@redhat.com> Signed-Off-By: NBenoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Alan Cox 提交于
Add a new `suid_dumpable' sysctl: This value can be used to query and set the core dump mode for setuid or otherwise protected/tainted binaries. The modes are 0 - (default) - traditional behaviour. Any process which has changed privilege levels or is execute only will not be dumped 1 - (debug) - all processes dump core when possible. The core dump is owned by the current user and no security is applied. This is intended for system debugging situations only. Ptrace is unchecked. 2 - (suidsafe) - any binary which normally would not be dumped is dumped readable by root only. This allows the end user to remove such a dump but not access it directly. For security reasons core dumps in this mode will not overwrite one another or other files. This mode is appropriate when adminstrators are attempting to debug problems in a normal environment. (akpm: > > +EXPORT_SYMBOL(suid_dumpable); > > EXPORT_SYMBOL_GPL? No problem to me. > > if (current->euid == current->uid && current->egid == current->gid) > > current->mm->dumpable = 1; > > Should this be SUID_DUMP_USER? Actually the feedback I had from last time was that the SUID_ defines should go because its clearer to follow the numbers. They can go everywhere (and there are lots of places where dumpable is tested/used as a bool in untouched code) > Maybe this should be renamed to `dump_policy' or something. Doing that > would help us catch any code which isn't using the #defines, too. Fair comment. The patch was designed to be easy to maintain for Red Hat rather than for merging. Changing that field would create a gigantic diff because it is used all over the place. ) Signed-off-by: NAlan Cox <alan@redhat.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 22 6月, 2005 1 次提交
-
-
由 Wolfgang Wander 提交于
Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: NWolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 06 5月, 2005 1 次提交
-
-
Add some comments about task->comm, to explain what it is near its definition and provide some important pointers to its uses. Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-