- 26 3月, 2006 24 次提交
-
-
由 Andrew Morton 提交于
text data bss dec hex filename before: 3490577 1322408 360000 5172985 4eeef9 vmlinux after: 3488027 1322496 360128 5170651 4ee5db vmlinux Cc: Paul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Jonathan Corbet 提交于
Documentation changes to help radix tree users avoid overrunning the tags array. RADIX_TREE_TAGS moves to linux/radix-tree.h and is now known as RADIX_TREE_MAX_TAGS (Nick Piggin's idea). Tag parameters are changed to unsigned, and some comments are updated. Signed-off-by: NJonathan Corbet <corbet@lwn.net> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
fls() takes an integer, so roundup_pow_of_two() is busted for ulongs larger than 2^32-1. Fix this by implementing and using fls_long(). (Why does roundup_pow_of_two() return a long?) (Why is roundup_pow_of_two() __attribute_const__ whereas long_log2() is __attribute_pure__?) (Why does long_log2() suck so much? Because we were missing fls_long()?) Cc: Roland Dreier <rdreier@cisco.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: John Hawkes <hawkes@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Chris Wright 提交于
Move capable() to kernel/capability.c and eliminate duplicate implementations. Add __capable() function which can be used to check for capabiilty of any process. Signed-off-by: NChris Wright <chrisw@sous-sol.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Adrian Bunk 提交于
This patch adds a proper prototype for setup_arch() in init.h. This patch is based on a patch by Ben Dooks <ben-linux@fluff.org>. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Davide Libenzi 提交于
Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: NDavide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
Uninline some massive IRQ migration functions. Put them in the new kernel/irq/migration.c. Cc: Andi Kleen <ak@muc.de> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Bjorn Helgaas 提交于
Remove the assumption that driver_register() returns the number of devices bound to the driver. In fact, it returns zero for success or a negative error value. zorro_module_init() used the device count to automatically unregister and unload drivers that found no devices. That might have worked at one time, but has been broken for some time because zorro_register_driver() returned either a negative error or a positive count (never zero). So it could only unregister on failure, when it's not needed anyway. This functionality could be resurrected in individual drivers by counting devices in their .probe() methods. Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Bjorn Helgaas 提交于
Remove the assumption that driver_register() returns the number of devices bound to the driver. In fact, it returns zero for success or a negative error value. dio_module_init() used the device count to automatically unregister and unload drivers that found no devices. That might have worked at one time, but has been broken for some time because dio_register_driver() returned either a negative error or a positive count (never zero). So it could only unregister on failure, when it's not needed anyway. This functionality could be resurrected in individual drivers by counting devices in their .probe() methods. Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com> Cc: Philip Blundell <philb@gnu.org> Cc: Jochen Friedrich <jochen@scram.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Vladimir V. Saveliev 提交于
Clean up several places where gcc issues warnings when -W is specified. Thanks to Neil for finding that. Signed-off-by: NVladimir V. Saveliev <vs@namesys.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: NHans Reiser <reiser@namesys.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rene Herman 提交于
Today I wondered about /dev/parport<n> after not seeing anything in drivers/parport register char-major-99. Having PP_MAJOR in include/linux/major.h would've allowed me to more quickly determine that it was the ppdev driver driving these. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Previous inotify work avoidance is good when inotify is completely unused, but it breaks down if even a single watch is in place anywhere in the system. Robin Holt notices that udev is one such culprit - it slows down a 512-thread application on a 512 CPU system from 6 seconds to 22 minutes. Solve this by adding a flag in the dentry that tells inotify whether or not its parent inode has a watch on it. Event queueing to parent will skip taking locks if this flag is cleared. Setting and clearing of this flag on all child dentries versus event delivery: this is no in terms of race cases, and that was shown to be equivalent to always performing the check. The essential behaviour is that activity occuring _after_ a watch has been added and _before_ it has been removed, will generate events. Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Adrian Bunk 提交于
param_array() in kernel/params.c can now become static. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rusty Russell 提交于
MODULE_PARM was actually breaking: recent gcc version optimize them out as unused. It's time to replace the last users, which are generally in the most unloved drivers anyway. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Thomas Koeller 提交于
Add a couple of 'const' qualifiers to the TTY flip buffer APIs, where appropriate. Signed-off-by: NThomas Koeller <thomas@koeller.dyndns.org> Acked-by: NAlan Cox <alan@redhat.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Oleg Drokin 提交于
Introduce FMODE_EXEC file flag, to indicate that file is being opened for execution. This is useful for distributed filesystems to maintain consistent behavior for returning ETXTBUSY when opening for write and execution happens on different nodes. akpm: Needed by Lustre at present. I assume their objective to to work towards being able to install Lustre on an unmodified distro kernel, which seems sane. It should have zero runtime cost. Trond and Chuck indicate that NFS4 can probably use this too, for the same thing. Steven says it's also on the GFS todo list. Signed-off-by: NOleg Drokin <green@linuxhacker.ru> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Chuck Lever <cel@citi.umich.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Alexander Zarochentzev 提交于
This patch fixes a bug in reiserfs truncate. A transaction might overflow when truncating long highly fragmented file. The fix is to split truncation into several transactions to avoid overflowing. Signed-off-by: NVladimir V. Saveliev <vs@namesys.com> Cc; Charles McColgan <cm@chuck.net> Cc: Alexander Zarochentsev <zam@namesys.com> Cc: Hans Reiser <reiser@namesys.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Adrian Bunk 提交于
There's no reason for iprune_mutex being global. Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Jan Kara 提交于
Remove unused quota flag. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
The kjournald timer is currently on the kernel thread's stack and the journal structure points at it. Save a pointer hop by moving the timer into the journal structure. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Pekka Enberg 提交于
As suggested by Eric Dumazet, optimize kzalloc() calls that pass a compile-time constant size. Please note that the patch increases kernel text slightly (~200 bytes for defconfig on x86). Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Pekka Enberg 提交于
Introduce a memory-zeroing variant of kmem_cache_alloc. The allocator already exits in XFS and there are potential users for it so this patch makes the allocator available for the general public. Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Al Viro 提交于
Implement /proc/slab_allocators. It produces output like: idr_layer_cache: 80 idr_pre_get+0x33/0x4e buffer_head: 2555 alloc_buffer_head+0x20/0x75 mm_struct: 9 mm_alloc+0x1e/0x42 mm_struct: 20 dup_mm+0x36/0x370 vm_area_struct: 384 dup_mm+0x18f/0x370 vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3 vm_area_struct: 1 split_vma+0x5a/0x10e vm_area_struct: 11 do_brk+0x206/0x2e2 vm_area_struct: 2 copy_vma+0xda/0x142 vm_area_struct: 9 setup_arg_pages+0x99/0x214 fs_cache: 8 copy_fs_struct+0x21/0x133 fs_cache: 29 copy_process+0xf38/0x10e3 files_cache: 30 alloc_files+0x1b/0xcf signal_cache: 81 copy_process+0xbaa/0x10e3 sighand_cache: 77 copy_process+0xe65/0x10e3 sighand_cache: 1 de_thread+0x4d/0x5f8 anon_vma: 241 anon_vma_prepare+0xd9/0xf3 size-2048: 1 add_sect_attrs+0x5f/0x145 size-2048: 2 journal_init_revoke+0x99/0x302 size-2048: 2 journal_init_revoke+0x137/0x302 size-2048: 2 journal_init_inode+0xf9/0x1c4 Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> DESC slab-leaks3-locking-fix EDESC From: Andrew Morton <akpm@osdl.org> Update for slab-remove-cachep-spinlock.patch Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Thomas Gleixner 提交于
alarm() calls the kernel with an unsigend int timeout in seconds. The value is stored in the tv_sec field of a struct timeval to setup the itimer. The tv_sec field of struct timeval is of type long, which causes the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX. Before the hrtimer merge (pre 2.6.16) such a negative value was converted to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's not clear whether this was intended or just happened to be done by the timeval_to_jiffies code. hrtimers expect a timeval in canonical form and treat a negative timeout as already expired. This breaks the legitimate usage of alarm() with a timeout value > INT_MAX seconds. For 32 bit machines it is therefor necessary to limit the internal seconds value to avoid API breakage. Instead of doing this in all implementations of sys_alarm the duplicated sys_alarm code is moved into a common function in itimer.c Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 25 3月, 2006 1 次提交
-
-
由 Andrew Victor 提交于
This patch adds support for the Ethernet controller integrated in the Atmel AT91RM9200 SoC processor. Changes since the previous submission (01/02/2006) are: - Make use of the clk.h clock infrastructure. - The multicast hash function is not crc32. [Patch by Pedro Perez] Signed-off-by: NAndrew Victor <andrew@sanpeople.com> Signed-off-by: NJeff Garzik <jeff@garzik.org>
-
- 24 3月, 2006 15 次提交
-
-
由 Davi Arnaut 提交于
This patch series creates a strndup_user() function to easy copying C strings from userspace. Also we avoid common pitfalls like userspace modifying the final \0 after the strlen_user(). Signed-off-by: NDavi Arnaut <davi.arnaut@gmail.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
Make the softlockup detector purely timer-interrupt driven, removing softirq-context (timer) dependencies. This means that if the softlockup watchdog triggers, it has truly observed a longer than 10 seconds scheduling delay of a SCHED_FIFO prio 99 task. (the patch also turns off the softlockup detector during the initial bootup phase and does small style fixes) Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Kumar Gala 提交于
In some embedded systems the IDE hardware interface may only support 16-bit or smaller accesses. Allow the interface to specify if this is the case and don't allow the drive or user to override the setting. Signed-off-by: NKumar Gala <galak@kernel.crashing.org> Acked-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Pierre Ossman 提交于
Class code and register definitions for the Secure Digital Host Controller standard. Signed-off-by: NPierre Ossman <drzeus@drzeus.cx> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
Pull the guts out of do_fsync() - we can use it elsewhere. Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
We need set_page_dirty() to return true if it actually transitioned the page from a clean to dirty state. This wasn't right in a couple of places. Do a kernel-wide audit, fix things up. This leaves open the possibility of returning a negative errno from set_page_dirty() sometime in the future. But we don't do that at present. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
Modify balance_dirty_pages_ratelimited() so that it can take a number-of-pages-which-I-just-dirtied argument. For msync(). Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Arthur Othieno 提交于
OPL3_HW_OPL3_PC98 #define isn't used anywhere; previously in sound/drivers/opl3/opl3_lib.c and sound/isa/cs423x/pc98.c, the latter of which went away with the rest of PC98 subarch. Signed-off-by: NArthur Othieno <apgo@patchbomb.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
Add two new linux-specific fadvise extensions(): LINUX_FADV_ASYNC_WRITE: start async writeout of any dirty pages between file offsets `offset' and `offset+len'. Any pages which are currently under writeout are skipped, whether or not they are dirty. LINUX_FADV_WRITE_WAIT: wait upon writeout of any dirty pages between file offsets `offset' and `offset+len'. By combining these two operations the application may do several things: LINUX_FADV_ASYNC_WRITE: push some or all of the dirty pages at the disk. LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE: push all of the currently dirty pages at the disk. LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE, LINUX_FADV_WRITE_WAIT: push all of the currently dirty pages at the disk, wait until they have been written. It should be noted that none of these operations write out the file's metadata. So unless the application is strictly performing overwrites of already-instantiated disk blocks, there are no guarantees here that the data will be available after a crash. To complete this suite of operations I guess we should have a "sync file metadata only" operation. This gives applications access to all the building blocks needed for all sorts of sync operations. But sync-metadata doesn't fit well with the fadvise() interface. Probably it should be a new syscall: sys_fmetadatasync(). The patch also diddles with the meaning of `endbyte' in sys_fadvise64_64(). It is made to represent that last affected byte in the file (ie: it is inclusive). Generally, all these byterange and pagerange functions are inclusive so we can easily represent EOF with -1. As Ulrich notes, these two functions are somewhat abusive of the fadvise() concept, which appears to be "set the future policy for this fd". But these commands are a perfect fit with the fadvise() impementation, and several of the existing fadvise() commands are synchronous and don't affect future policy either. I think we can live with the slight incongruity. Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Jan Beulich 提交于
Provide abstraction for generating type and size information of assembly routines and data, while permitting architectures to override these defaults. Signed-off-by: NJan Beulich <jbeulich@novell.com> Cc: "Russell King" <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "Andi Kleen" <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
The hooks in the slab cache allocator code path for support of NUMA mempolicies and cpuset memory spreading are in an important code path. Many systems will use neither feature. This patch optimizes those hooks down to a single check of some bits in the current tasks task_struct flags. For non NUMA systems, this hook and related code is already ifdef'd out. The optimization is done by using another task flag, set if the task is using a non-default NUMA mempolicy. Taking this flag bit along with the PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset memory spreading' patch set, one can check for the combination of any of these special case memory placement mechanisms with a single test of the current tasks task_struct flags. This patch also tightens up the code, to save a few bytes of kernel text space, and moves some of it out of line. Due to the nested inlines called from multiple places, we were ending up with three copies of this code, which once we get off the main code path (for local node allocation) seems a bit wasteful of instruction memory. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
Provide the slab cache infrastructure to support cpuset memory spreading. See the previous patches, cpuset_mem_spread, for an explanation of cpuset memory spreading. This patch provides a slab cache SLAB_MEM_SPREAD flag. If set in the kmem_cache_create() call defining a slab cache, then any task marked with the process state flag PF_MEMSPREAD will spread memory page allocations for that cache over all the allowed nodes, instead of preferring the local (faulting) node. On systems not configured with CONFIG_NUMA, this results in no change to the page allocation code path for slab caches. On systems with cpusets configured in the kernel, but the "memory_spread" cpuset option not enabled for the current tasks cpuset, this adds a call to a cpuset routine and failed bit test of the processor state flag PF_SPREAD_SLAB. For tasks so marked, a second inline test is done for the slab cache flag SLAB_MEM_SPREAD, and if that is set and if the allocation is not in_interrupt(), this adds a call to to a cpuset routine that computes which of the tasks mems_allowed nodes should be preferred for this allocation. ==> This patch adds another hook into the performance critical code path to allocating objects from the slab cache, in the ____cache_alloc() chunk, below. The next patch optimizes this hook, reducing the impact of the combined mempolicy plus memory spreading hooks on this critical code path to a single check against the tasks task_struct flags word. This patch provides the generic slab flags and logic needed to apply memory spreading to a particular slab. A subsequent patch will mark a few specific slab caches for this placement policy. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
Change the page cache allocation calls to support cpuset memory spreading. See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory spreading. On systems without cpusets configured in the kernel, this is no change. On systems with cpusets configured in the kernel, but the "memory_spread" cpuset option not enabled for the current tasks cpuset, this adds a call to a cpuset routine and failed bit test of the processor state flag PF_SPREAD_PAGE. On tasks in cpusets with "memory_spread" enabled, this adds a call to a cpuset routine that computes which of the tasks mems_allowed nodes should be preferred for this allocation. If memory spreading applies to a particular allocation, then any other NUMA mempolicy does not apply. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
This patch provides the implementation and cpuset interface for an alternative memory allocation policy that can be applied to certain kinds of memory allocations, such as the page cache (file system buffers) and some slab caches (such as inode caches). The policy is called "memory spreading." If enabled, it spreads out these kinds of memory allocations over all the nodes allowed to a task, instead of preferring to place them on the node where the task is executing. All other kinds of allocations, including anonymous pages for a tasks stack and data regions, are not affected by this policy choice, and continue to be allocated preferring the node local to execution, as modified by the NUMA mempolicy. There are two boolean flag files per cpuset that control where the kernel allocates pages for the file system buffers and related in kernel data structures. They are called 'memory_spread_page' and 'memory_spread_slab'. If the per-cpuset boolean flag file 'memory_spread_page' is set, then the kernel will spread the file system buffers (page cache) evenly over all the nodes that the faulting task is allowed to use, instead of preferring to put those pages on the node where the task is running. If the per-cpuset boolean flag file 'memory_spread_slab' is set, then the kernel will spread some file system related slab caches, such as for inodes and dentries evenly over all the nodes that the faulting task is allowed to use, instead of preferring to put those pages on the node where the task is running. The implementation is simple. Setting the cpuset flags 'memory_spread_page' or 'memory_spread_cache' turns on the per-process flags PF_SPREAD_PAGE or PF_SPREAD_SLAB, respectively, for each task that is in the cpuset or subsequently joins that cpuset. In subsequent patches, the page allocation calls for the affected page cache and slab caches are modified to perform an inline check for these flags, and if set, a call to a new routine cpuset_mem_spread_node() returns the node to prefer for the allocation. The cpuset_mem_spread_node() routine is also simple. It uses the value of a per-task rotor cpuset_mem_spread_rotor to select the next node in the current tasks mems_allowed to prefer for the allocation. This policy can provide substantial improvements for jobs that need to place thread local data on the corresponding node, but that need to access large file system data sets that need to be spread across the several nodes in the jobs cpuset in order to fit. Without this patch, especially for jobs that might have one thread reading in the data set, the memory allocation across the nodes in the jobs cpuset can become very uneven. A couple of Copyright year ranges are updated as well. And a couple of email addresses that can be found in the MAINTAINERS file are removed. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-