- 23 6月, 2006 22 次提交
-
-
由 Pekka Enberg 提交于
At present our slab debugging tells us that it detected a double-free or corruption - it does not distinguish between them. Sometimes it's useful to be able to differentiate between these two types of information. Add double-free detection to redzone verification when freeing an object. As explained by Manfred, when we are freeing an object, both redzones should be RED_ACTIVE. However, if both are RED_INACTIVE, we are trying to free an object that was already free'd. Signed-off-by: NManfred Spraul <manfred@colorfullife.com> Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hua Zhong 提交于
With likely/unlikely profiling on my not-so-busy-typical-developmentsystem there are 5k misses vs 2k hits. So I guess we should remove the unlikely. Signed-off-by: NHua Zhong <hzhong@gmail.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Add remap_vmalloc_range, vmalloc_user, and vmalloc_32_user so that drivers can have a nice interface for remapping vmalloc memory. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rafael J. Wysocki 提交于
Rework the swsusp's memory shrinker in the following way: - Simplify balance_pgdat() by removing all of the swsusp-related code from it. - Make shrink_all_memory() use shrink_slab() and a new function shrink_all_zones() which calls shrink_active_list() and shrink_inactive_list() directly for each zone in a way that's optimized for suspend. In shrink_all_memory() we try to free exactly as many pages as the caller asks for, preferably in one shot, starting from easier targets. If slab caches are huge, they are most likely to have enough pages to reclaim. The inactive lists are next (the zones with more inactive pages go first) etc. Each time shrink_all_memory() attempts to shrink the active and inactive lists for each zone in 5 passes. In the first pass, only the inactive lists are taken into consideration. In the next two passes the active lists are also shrunk, but mapped pages are not reclaimed. In the last two passes the active and inactive lists are shrunk and mapped pages are reclaimed as well. The aim of this is to alter the reclaim logic to choose the best pages to keep on resume and improve the responsiveness of the resumed system. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NCon Kolivas <kernel@kolivas.org> Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Hellwig 提交于
Use the _entry variant everywhere to clean the code up a tiny bit. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Hellwig 提交于
The last ifdef addition hit the ugliness treshold on this functions, so: - rename the variable i to nr_pages so it's somewhat descriptive - remove the addr variable and do the page_address call at the very end - instead of ifdef'ing the whole alloc_pages_node call just make the __GFP_COMP addition to flags conditional - rewrite the __GFP_COMP comment to make sense Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Chen, Kenneth W 提交于
Current hugetlb strict accounting for shared mapping always assume mapping starts at zero file offset and reserves pages between zero and size of the file. This assumption often reserves (or lock down) a lot more pages then necessary if application maps at none zero file offset. libhugetlbfs is one example that requires proper reservation on shared mapping starts at none zero offset. This patch extends the reservation and hugetlb strict accounting to support any arbitrary pair of (offset, len), resulting a much more robust and accurate scheme. More importantly, it won't lock down any hugetlb pages outside file mapping. Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Acked-by: NAdam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Dave Peterson 提交于
This fixes a few typos in the comments in mm/oom_kill.c. Signed-off-by: NDavid S. Peterson <dsp@llnl.gov> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 KAMEZAWA Hiroyuki 提交于
This patch adds panic_on_oom sysctl under sys.vm. When sysctl vm.panic_on_oom = 1, the kernel panics intead of killing rogue processes. And if vm.panic_on_oom is 0 the kernel will do oom_kill() in the same way as it does today. Of course, the default value is 0 and only root can modifies it. In general, oom_killer works well and kill rogue processes. So the whole system can survive. But there are environments where panic is preferable rather than kill some processes. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andy Whitcroft 提交于
We have architectures where the size of page_to_pfn and pfn_to_page are significant enough to overall image size that they wish to push them out of line. However, in the process we have grown a second copy of the implementation of each of these routines for each memory model. Share the implmentation exposing it either inline or out-of-line as required. Signed-off-by: NAndy Whitcroft <apw@shadowen.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Yasunori Goto 提交于
In current code, zonelist is considered to be build once, no modification. But MemoryHotplug can add new zone/pgdat. It must be updated. This patch modifies build_all_zonelists(). By this, build_all_zonelist() can reconfig pgdat's zonelists. To update them safety, this patch use stop_machine_run(). Other cpus don't touch among updating them by using it. In old version (V2 of node hotadd), kernel updated them after zone initialization. But present_page of its new zone is still 0, because online_page() is not called yet at this time. Build_zonelists() checks present_pages to find present zone. It was too early. So, I changed it after online_pages(). Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Yasunori Goto 提交于
Wait_table is initialized according to zone size at boot time. But, we cannot know the maixmum zone size when memory hotplug is enabled. It can be changed.... And resizing of wait_table is hard. So kernel allocate and initialzie wait_table as its maximum size. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Yasunori Goto 提交于
[PATCH] wait_table and zonelist initializing for memory hotadd: add return code for init_current_empty_zone When add_zone() is called against empty zone (not populated zone), we have to initialize the zone which didn't initialize at boot time. But, init_currently_empty_zone() may fail due to allocation of wait table. So, this patch is to catch its error code. Changes against wait_table is in the next patch. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Yasunori Goto 提交于
[PATCH] wait_table and zonelist initializing for memory hotadd: change to meminit for build_zonelist Change definitions of some functions and data from __init to __meminit. These functions and data can be used after bootup by this patch to be used for hot-add codes. Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Yasunori Goto 提交于
This is just to rename from wait_table_size() to wait_table_hash_nr_entries(). Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
Remove two unnecessary PageSwapCache checks. The page refcount is raised and therefore page migration cannot occur in both functions. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Pekka Enberg 提交于
Clean up slab allocator page mapping a bit. The memory allocated for a slab is physically contiguous so it is okay to assume struct pages are too so kill the long-standing comment. Furthermore, rename set_slab_attr to slab_map_pages and add a comment explaining why its needed. Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Pekka Enberg 提交于
Move alien object freeing to cache_free_alien() to reduce #ifdef clutter in __cache_free(). Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi> Acked-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
It is better to redo the complete fault if do_swap_page() finds that the page is not in PageSwapCache() because the page migration code may have replaced the swap pte already with a pte pointing to valid memory. do_swap_page() may interpret an invalid swap entry without this patch because we do not reload the pte if we are looping back. The page migration code may already have reused the swap entry referenced by our local swp_entry. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andy Whitcroft 提交于
The buddy allocator has a requirement that boundaries between contigious zones occur aligned with the the MAX_ORDER ranges. Where they do not we will incorrectly merge pages cross zone boundaries. This can lead to pages from the wrong zone being handed out. Originally the buddy allocator would check that buddies were in the same zone by referencing the zone start and end page frame numbers. This was removed as it became very expensive and the buddy allocator already made the assumption that zones boundaries were aligned. It is clear that not all configurations and architectures are honouring this alignment requirement. Therefore it seems safest to reintroduce support for non-aligned zone boundaries. This patch introduces a new check when considering a page a buddy it compares the zone_table index for the two pages and refuses to merge the pages where they do not match. The zone_table index is unique for each node/zone combination when FLATMEM/DISCONTIGMEM is enabled and for each section/zone combination when SPARSEMEM is enabled (a SPARSEMEM section is at least a MAX_ORDER size). Signed-off-by: NAndy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 David Howells 提交于
Give the statfs superblock operation a dentry pointer rather than a superblock pointer. This complements the get_sb() patch. That reduced the significance of sb->s_root, allowing NFS to place a fake root there. However, NFS does require a dentry to use as a target for the statfs operation. This permits the root in the vfsmount to be used instead. linux/mount.h has been added where necessary to make allyesconfig build successfully. Interest has also been expressed for use with the FUSE and XFS filesystems. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 David Howells 提交于
Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 13 6月, 2006 2 次提交
-
-
由 Sergey Vlasov 提交于
shmem_rmdir() must undo the increment of i_nlink done in shmem_get_inode() for directories, otherwise at least IN_DELETE_SELF inotify event generation is broken. Signed-off-by: NSergey Vlasov <vsu@altlinux.ru> Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Robin H. Johnson 提交于
I noticed a strange behavior in a tmpfs file system the other day, while building packages - occasionally, and seemingly at random, make decided to rebuild a target. However, only on tmpfs. A file would be created, and if checked, it had a sub-second timestamp. However, after an utimes related call where sub-seconds should be set, they were zeroed instead. In the case that a file was created, and utimes(...,NULL) was used on it in the same second, the timestamp on the file moved backwards. After some digging, I found that this was being caused by tmpfs not having a time granularity set, thus inheriting the default 1 second granularity. Hugh adds: yes, we missed tmpfs when the s_time_gran mods went into 2.6.11. Unfortunately, the granularity of CURRENT_TIME, often used in filesystems, does not match the default granularity set by alloc_super. A few more such discrepancies have been found, but this is the most important to fix now. Signed-off-by: NRobin H. Johnson <robbat2@gentoo.org> Acked-by: NAndi Kleen <ak@suse.de> Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 6月, 2006 1 次提交
-
-
由 Christoph Lameter 提交于
From: Christoph Lameter <clameter@sgi.com> Looks like a comma was left from the conversion from a struct to an assignment. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 03 6月, 2006 1 次提交
-
-
由 Ingo Molnar 提交于
mm/slab.c's offlab_limit logic is totally broken. Firstly, "offslab_limit" is a global variable while it should either be calculated in situ or should be passed in as a parameter. Secondly, the more serious problem with it is that the condition for calculating it: if (!(OFF_SLAB(sizes->cs_cachep))) { offslab_limit = sizes->cs_size - sizeof(struct slab); offslab_limit /= sizeof(kmem_bufctl_t); is in total disconnect with the condition that makes use of it: /* More than offslab_limit objects will cause problems */ if ((flags & CFLGS_OFF_SLAB) && num > offslab_limit) break; but due to offslab_limit being a global variable this breakage was hidden. Up until lockdep came along and perturbed the slab sizes sufficiently so that the first off-slab cache would still see a (non-calculated) zero value for offslab_limit and would panic with: kmem_cache_create: couldn't create cache size-512. Call Trace: [<ffffffff8020a5b9>] show_trace+0x96/0x1c8 [<ffffffff8020a8f0>] dump_stack+0x13/0x15 [<ffffffff8022994f>] panic+0x39/0x21a [<ffffffff80270814>] kmem_cache_create+0x5a0/0x5d0 [<ffffffff80aced62>] kmem_cache_init+0x193/0x379 [<ffffffff80abf779>] start_kernel+0x17f/0x218 [<ffffffff80abf263>] _sinittext+0x263/0x26a Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-512' Paolo Ornati's config on x86_64 managed to trigger it. The fix is to move the calculation to the place that makes use of it. This also makes slab.o 54 bytes smaller. Btw., the check itself is quite silly. Its intention is to test whether the number of objects per slab would be higher than the number of slab control pointers possible. In theory it could be triggered: if someone tried to allocate 4-byte objects cache and explicitly requested with CFLGS_OFF_SLAB. So i kept the check. Out of historic interest i checked how old this bug was and it's ancient, 10 years old! It is the oldest hidden and then truly triggering bugs i ever saw being fixed in the kernel! Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 01 6月, 2006 1 次提交
-
-
由 Yasunori Goto 提交于
From: Yasunori Goto <y-goto@jp.fujitsu.com> If hot-added memory's address is smaller than old area, spanned_pages will not be updated. It must be fixed. example) Old zone_start_pfn = 0x60000, and spanned_pages = 0x10000 Added new memory's start_pfn = 0x50000, and end_pfn = 0x60000 new spanned_pages will be still 0x10000 by old code. (It should be updated to 0x20000.) Because old_zone_end_pfn will be 0x70000, and end_pfn smaller than it. So, spanned_pages will not be updated. In current code, spanned_pages is updated only when end_pfn is updated. But, it should be updated by subtraction between bigger end_pfn and new zone_start_pfn. Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 22 5月, 2006 3 次提交
-
-
由 Bob Picco 提交于
Andy added code to buddy allocator which does not require the zone's endpoints to be aligned to MAX_ORDER. An issue is that the buddy allocator requires the node_mem_map's endpoints to be MAX_ORDER aligned. Otherwise __page_find_buddy could compute a buddy not in node_mem_map for partial MAX_ORDER regions at zone's endpoints. page_is_buddy will detect that these pages at endpoints are not PG_buddy (they were zeroed out by bootmem allocator and not part of zone). Of course the negative here is we could waste a little memory but the positive is eliminating all the old checks for zone boundary conditions. SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the logic either because the holes and endpoints are handled differently. This leaves checking alloc_remap and other arches which privately allocate for node_mem_map. Signed-off-by: NBob Picco <bob.picco@hp.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Paul Jackson 提交于
Fix a couple of infrequently encountered 'sleeping function called from invalid context' in the cpuset hooks in __alloc_pages. Could sleep while interrupts disabled. The routine cpuset_zone_allowed() is called by code in mm/page_alloc.c __alloc_pages() to determine if a zone is allowed in the current tasks cpuset. This routine can sleep, for certain GFP_KERNEL allocations, if the zone is on a memory node not allowed in the current cpuset, but might be allowed in a parent cpuset. But we can't sleep in __alloc_pages() if in interrupt, nor if called for a GFP_ATOMIC request (__GFP_WAIT not set in gfp_flags). The rule was intended to be: Don't call cpuset_zone_allowed() if you can't sleep, unless you pass in the __GFP_HARDWALL flag set in gfp_flag, which disables the code that might scan up ancestor cpusets and sleep. This rule was being violated in a couple of places, due to a bogus change made (by myself, pj) to __alloc_pages() as part of the November 2005 effort to cleanup its logic, and also due to a later fix to constrain which swap daemons were awoken. The bogus change can be seen at: http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html [PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags This was first noticed on a tight memory system, in code that was disabling interrupts and doing allocation requests with __GFP_WAIT not set, which resulted in __might_sleep() writing complaints to the log "Debug: sleeping function called ...", when the code in cpuset_zone_allowed() tried to take the callback_sem cpuset semaphore. We haven't seen a system hang on this 'might_sleep' yet, but we are at decent risk of seeing it fairly soon, especially since the additional cpuset_zone_allowed() check was added, conditioning wakeup_kswapd(), in March 2006. Special thanks to Dave Chinner, for figuring this out, and a tip of the hat to Nick Piggin who warned me of this back in Nov 2005, before I was ready to listen. Signed-off-by: NPaul Jackson <pj@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Mike Kravetz 提交于
A bad calculation/loop in __section_nr() could result in incorrect section information being put into sysfs memory entries. This primarily impacts memory add operations as the sysfs information is used while onlining new memory. Fix suggested by Dave Hansen. Note that the bug may not be obvious from the patch. It actually occurs in the function's return statement: return (root_nr * SECTIONS_PER_ROOT) + (ms - root); In the existing code, root_nr has already been multiplied by SECTIONS_PER_ROOT. Signed-off-by: NMike Kravetz <kravetz@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 16 5月, 2006 3 次提交
-
-
由 Roland Dreier 提交于
With CONFIG_NUMA set, kmem_cache_destroy() may fail and say "Can't free all objects." The problem is caused by sequences such as the following (suppose we are on a NUMA machine with two nodes, 0 and 1): * Allocate an object from cache on node 0. * Free the object on node 1. The object is put into node 1's alien array_cache for node 0. * Call kmem_cache_destroy(), which ultimately ends up in __cache_shrink(). * __cache_shrink() does drain_cpu_caches(), which loops through all nodes. For each node it drains the shared array_cache and then handles the alien array_cache for the other node. However this means that node 0's shared array_cache will be drained, and then node 1 will move the contents of its alien[0] array_cache into that same shared array_cache. node 0's shared array_cache is never looked at again, so the objects left there will appear to be in use when __cache_shrink() calls __node_shrink() for node 0. So __node_shrink() will return 1 and kmem_cache_destroy() will fail. This patch fixes this by having drain_cpu_caches() do drain_alien_cache() on every node before it does drain_array() on the nodes' shared array_caches. The problem was originally reported by Or Gerlitz <ogerlitz@voltaire.com>. Signed-off-by: NRoland Dreier <rolandd@cisco.com> Acked-by: NChristoph Lameter <clameter@sgi.com> Acked-by: NPekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Mike Kravetz 提交于
slab_is_available() indicates slab based allocators are available for use. SPARSEMEM code needs to know this as it can be called at various times during the boot process. Signed-off-by: NMike Kravetz <kravetz@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=6490, this function can experience overflows on 32-bit machines, causing our response to changed values of min_free_kbytes to go whacky. Fixing it efficiently is all too hard, so fix it with 64-bit math instead. Cc: Ake Sandgren <ake.sandgren@hpc2n.umu.se> Cc: Martin Bligh <mbligh@google.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 02 5月, 2006 3 次提交
-
-
由 Joel H Schopp 提交于
Based on an older patch from Mike Kravetz <kravetz@us.ibm.com> We need to have a mem_map for high addresses in order to make fops->no_page work on spufs mem and register files. So far, we have used the memory_present() function during early bootup, but that did not work when CONFIG_NUMA was enabled. We now use the __add_pages() function to add the mem_map when loading the spufs module, which is a lot nicer. Signed-off-by: NArnd Bergmann <arnd.bergmann@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Mike Kravetz 提交于
This patch fixes two bugs with the way sparsemem interacts with memory add. They are: - memory leak if memmap for section already exists - calling alloc_bootmem_node() after boot These bugs were discovered and a first cut at the fixes were provided by Arnd Bergmann <arnd@arndb.de> and Joel Schopp <jschopp@us.ibm.com>. Signed-off-by: NMike Kravetz <kravetz@us.ibm.com> Signed-off-by: NJoel Schopp <jschopp@austin.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Christoph Lameter 提交于
Currently we check PageDirty() in order to make the decision to swap out the page. However, the dirty information may be only be contained in the ptes pointing to the page. We need to first unmap the ptes before checking for PageDirty(). If unmap is successful then the page count of the page will also be decreased so that pageout() works properly. This is a fix necessary for 2.6.17. Without this fix we may migrate dirty pages for filesystems without migration functions. Filesystems may keep pointers to dirty pages. Migration of dirty pages can result in the filesystem keeping pointers to freed pages. Unmapping is currently not be separated out from removing all the references to a page and moving the mapping. Therefore try_to_unmap will be called again in migrate_page() if the writeout is successful. However, it wont do anything since the ptes are already removed. The coming updates to the page migration code will restructure the code so that this is no longer necessary. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 29 4月, 2006 1 次提交
-
-
由 shin, jacob 提交于
transfer_objects should only be called when all of the cpus in the node are online. CPU_DEAD notifier callback marks l3->shared to NULL. Signed-off-by: NJacob Shin <jacob.shin@amd.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 27 4月, 2006 1 次提交
-
-
由 Jens Axboe 提交于
find_get_pages_contig() will break out if we hit a hole in the page cache. From Andrew Morton, small modifications and documentation by me. Signed-off-by: NJens Axboe <axboe@suse.de>
-
- 26 4月, 2006 1 次提交
-
-
由 Chandra Seetharaman 提交于
Few of the notifier_chain_register() callers use __init in the definition of notifier_call. It is incorrect as the function definition should be available after the initializations (they do not unregister them during initializations). This patch fixes all such usages to _not_ have the notifier_call __init section. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 23 4月, 2006 1 次提交
-
-
由 Lee Schermerhorn 提交于
Basic problem: pages of a shared memory segment can only be migrated once. In 2.6.16 through 2.6.17-rc1, shared memory mappings do not have a migratepage address space op. Therefore, migrate_pages() falls back to default processing. In this path, it will try to pageout() dirty pages. Once a shared memory page has been migrated it becomes dirty, so migrate_pages() will try to page it out. However, because the page count is 3 [cache + current + pte], pageout() will return PAGE_KEEP because is_page_cache_freeable() returns false. This will abort all subsequent migrations. This patch adds a migratepage address space op to shared memory segments to avoid taking the default path. We use the "migrate_page()" function because it knows how to migrate dirty pages. This allows shared memory segment pages to migrate, subject to other conditions such as # pte's referencing the page [page_mapcount(page)], when requested. I think this is safe. If we're migrating a shared memory page, then we found the page via a page table, so it must be in memory. Can be verified with memtoy and the shmem-mbind-test script, both available at: http://free.linux.hp.com/~lts/Tools/Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-