- 03 4月, 2009 6 次提交
-
-
由 KAMEZAWA Hiroyuki 提交于
In following situation, with memory subsystem, /groupA use_hierarchy==1 /01 some tasks /02 some tasks /03 some tasks /04 empty When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim is triggered and the kernel walks tree under groupA. In this case, rmdir /groupA/04 fails with -EBUSY frequently because of temporal refcnt from the kernel. In general. cgroup can be rmdir'd if there are no children groups and no tasks. Frequent fails of rmdir() is not useful to users. (And the reason for -EBUSY is unknown to users.....in most cases) This patch tries to modify above behavior, by - retries if css_refcnt is got by someone. - add "return value" to pre_destroy() and allows subsystem to say "we're really busy!" Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jean Delvare 提交于
It is a fairly common operation to have a pointer to a work and to need a pointer to the delayed work it is contained in. In particular, all delayed works which want to rearm themselves will have to do that. So it would seem fair to offer a helper function for this operation. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NJean Delvare <khali@linux-fr.org> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Martin Schwidefsky 提交于
The calculation of the value nr in do_xip_mapping_read is incorrect. If the copy required more than one iteration in the do while loop the copies variable will be non-zero. The maximum length that may be passed to the call to copy_to_user(buf+copied, xip_mem+offset, nr) is len-copied but the check only compares against (nr > len). This bug is the cause for the heap corruption Carsten has been chasing for so long: *** glibc detected *** /bin/bash: free(): invalid next size (normal): 0x00000000800e39f0 *** ======= Backtrace: ========= /lib64/libc.so.6[0x200000b9b44] /lib64/libc.so.6(cfree+0x8e)[0x200000bdade] /bin/bash(free_buffered_stream+0x32)[0x80050e4e] /bin/bash(close_buffered_stream+0x1c)[0x80050ea4] /bin/bash(unset_bash_input+0x2a)[0x8001c366] /bin/bash(make_child+0x1d4)[0x8004115c] /bin/bash[0x8002fc3c] /bin/bash(execute_command_internal+0x656)[0x8003048e] /bin/bash(execute_command+0x5e)[0x80031e1e] /bin/bash(execute_command_internal+0x79a)[0x800305d2] /bin/bash(execute_command+0x5e)[0x80031e1e] /bin/bash(reader_loop+0x270)[0x8001efe0] /bin/bash(main+0x1328)[0x8001e960] /lib64/libc.so.6(__libc_start_main+0x100)[0x200000592a8] /bin/bash(clearerr+0x5e)[0x8001c092] With this bug fix the commit 0e4a9b59 "ext2/xip: refuse to change xip flag during remount with busy inodes" can be removed again. Cc: Carsten Otte <cotte@de.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: <stable@kernel.org> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Blanchard 提交于
Even though vmstat_work is marked deferrable, there are still benefits to aligning it. For certain applications we want to keep OS jitter as low as possible and aligning timers and work so they occur together can reduce their overall impact. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
Fix a number of issues with the per-MM VMA patch: (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on a NOMMU system with more than 2G pages. Makes no difference on a 32-bit system. (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value, lest it overflow. (3) Move the allocation of the vm_area_struct slab back for fork.c. (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs. (5) Use BUG_ON() rather than if () BUG(). (6) Make the default validate_nommu_regions() a static inline rather than a #define. (7) Make free_page_series()'s objection to pages with a refcount != 1 more informative. (8) Adjust the __put_nommu_region() banner comment to indicate that the semaphore must be held for writing. (9) Limit the number of warnings about munmaps of non-mmapped regions. Reported-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This fixes a build failure with generic debug pagealloc: mm/debug-pagealloc.c: In function 'set_page_poison': mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: In function 'clear_page_poison': mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: In function 'page_poison': mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: At top level: mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages' include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here mm/debug-pagealloc.c: In function 'kernel_map_pages': mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function) by fixing - debug_flags should be in struct page - define DEBUG_PAGEALLOC config option for all architectures Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reported-by: NAlexander Beregalov <a.beregalov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 4月, 2009 25 次提交
-
-
由 Hugh Dickins 提交于
Synopsis: if shmem_writepage calls swap_writepage directly, most shmem swap loads benefit, and a catastrophic interaction between SLUB and some flash storage is avoided. shmem_writepage() has always been peculiar in making no attempt to write: it has just transferred a shmem page from file cache to swap cache, then let that page make its way around the LRU again before being written and freed. The idea was that people use tmpfs because they want those pages to stay in RAM; so although we give it an overflow to swap, we should resist writing too soon, giving those pages a second chance before they can be reclaimed. That was always questionable, and I've toyed with this patch for years; but never had a clear justification to depart from the original design. It became more questionable in 2.6.28, when the split LRU patches classed shmem and tmpfs pages as SwapBacked rather than as file_cache: that in itself gives them more resistance to reclaim than normal file pages. I prepared this patch for 2.6.29, but the merge window arrived before I'd completed gathering statistics to justify sending it in. Then while comparing SLQB against SLUB, running SLUB on a laptop I'd habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping tests five times slower than SLAB or SLQB - other machines slower too, but nowhere near so bad. Simpler "cp -a" swapping tests showed the same. slub_max_order=0 brings sanity to all, but heavy swapping is too far from normal to justify such a tuning. The crucial factor on that laptop turns out to be that I'm using an SD card for swap. What happens is this: By default, SLUB uses order-2 pages for shmem_inode_cache (and many other fs inodes), so creating tmpfs files under memory pressure brings lumpy reclaim into play. One subpage of the order is chosen from the bottom of the LRU as usual, then the other three picked out from their random positions on the LRUs. In a tmpfs load, many of these pages will be ones which already passed through shmem_writepage, so already have swap allocated. And though their offsets on swap were probably allocated sequentially, now that the pages are picked off at random, their swap offsets are scattered. But the flash storage on the SD card is very sensitive to having its writes merged: once swap is written at scattered offsets, performance falls apart. Rotating disk seeks increase too, but less disastrously. So: stop giving shmem/tmpfs pages a second pass around the LRU, write them out to swap as soon as their swap has been allocated. It's surely possible to devise an artificial load which runs faster the old way, one whose sizing is such that the tmpfs pages on their second pass are the ones that are wanted again, and other pages not. But I've not yet found such a load: on all machines, under the loads I've tried, immediate swap_writepage speeds up shmem swapping: especially when using the SLUB allocator (and more effectively than slub_max_order=0), but also with the others; and it also reduces the variance between runs. How much faster varies widely: a factor of five is rare, 5% is common. One load which might have suffered: imagine a swapping shmem load in a limited mem_cgroup on a machine with plenty of memory. Before 2.6.29 the swapcache was not charged, and such a load would have run quickest with the shmem swapcache never written to swap. But now swapcache is charged, so even this load benefits from shmem_writepage directly to swap. Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h: it's silly because that will never get called; but refactoring shmem.c sensibly according to CONFIG_SWAP will be a separate task. Signed-off-by: NHugh Dickins <hugh@veritas.com> Acked-by: NPekka Enberg <penberg@cs.helsinki.fi> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
try_to_free_pages() is used for the direct reclaim of up to SWAP_CLUSTER_MAX pages when watermarks are low. The caller to alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to be used but this is not passed to try_to_free_pages(). This can lead to unnecessary reclaim of pages that are unusable by the caller and int the worst case lead to allocation failure as progress was not been make where it is needed. This patch passes the nodemask used for alloc_pages_nodemask() to try_to_free_pages(). Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
When a shrinker has a negative number of objects to delete, the symbol name of the shrinker should be printed, not shrink_slab. This also makes the error message slightly more informative. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
Make CONFIG_UNEVICTABLE_LRU available when CONFIG_MMU=n. There's no logical reason it shouldn't be available, and it can be used for ramfs. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Enrik Berkhan <Enrik.Berkhan@ge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
The mlock() facility does not exist for NOMMU since all mappings are effectively locked anyway, so we don't make the bits available when they're not useful. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Enrik Berkhan <Enrik.Berkhan@ge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
x86 has debug_kmap_atomic_prot() which is error checking function for kmap_atomic. It is usefull for the other architectures, although it needs CONFIG_TRACE_IRQFLAGS_SUPPORT. This patch exposes it to the other architectures. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nick Piggin 提交于
Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838 On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange way from the user's point of view: # echo 500 >/proc/sys/vm/dirty_writeback_centisecs # cat /proc/sys/vm/dirty_writeback_centisecs 499 So, we have 5000 jiffies converted to only 499 clock ticks and reported back. TICK_NSEC = 999848 ACTHZ = 256039 Keeping in-kernel variable in units passed from userspace will fix issue of course, but this probably won't be right for every sysctl. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and s390. This patch implements it for the rest of the architectures by filling the pages with poison byte patterns after free_pages() and verifying the poison patterns before alloc_pages(). This generic one cannot detect invalid page accesses immediately but invalid read access may cause invalid dereference by poisoned memory and invalid write access can be detected after a long delay. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Zefan 提交于
I notice there are many places doing copy_from_user() which follows kmalloc(): dst = kmalloc(len, GFP_KERNEL); if (!dst) return -ENOMEM; if (copy_from_user(dst, src, len)) { kfree(dst); return -EFAULT } memdup_user() is a wrapper of the above code. With this new function, we don't have to write 'len' twice, which can lead to typos/mistakes. It also produces smaller code and kernel text. A quick grep shows 250+ places where memdup_user() *may* be used. I'll prepare a patchset to do this conversion. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roel Kluin 提交于
chg is unsigned, so it cannot be less than 0. Also, since region_chg returns long, let vma_needs_reservation() forward this to alloc_huge_page(). Store it as long as well. all callers cast it to long anyway. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Adam Litke <agl@us.ibm.com> Cc: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
pagevec_swap_free() is now unused. Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
The pagevec_swap_free() at the end of shrink_active_list() was introduced in 68a22394 "vmscan: free swap space on swap-in/activation" when shrink_active_list() was still rotating referenced active pages. In 7e9cd484 "vmscan: fix pagecache reclaim referenced bit check" this was changed, the rotating removed but the pagevec_swap_free() after the rotation loop was forgotten, applying now to the pagevec of the deactivation loop instead. Now swap space is freed for deactivated pages. And only for those that happen to be on the pagevec after the deactivation loop. Complete 7e9cd484 and remove the rest of the swap freeing. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NRik van Riel <riel@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
In shrink_active_list() after the deactivation loop, we strip buffer heads from the potentially remaining pages in the pagevec. Currently, this drops the zone's lru lock for stripping, only to reacquire it again afterwards to update statistics. It is not necessary to strip the pages before updating the stats, so move the whole thing out of the protected region and save the extra locking. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NMinChan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Edward Shishkin 提交于
Add a helper function account_page_dirtied(). Use that from two callsites. reiser4 adds a function which adds a third callsite. Signed-off-by: Edward Shishkin<edward.shishkin@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
During page allocation, there are two stages of direct reclaim that are applied to each zone in the preferred list. The first stage using zone_reclaim() reclaims unmapped file backed pages and slab pages if over defined limits as these are cheaper to reclaim. The caller specifies the order of the target allocation but the scan control is not being correctly initialised. The impact is that the correct number of pages are being reclaimed but that lumpy reclaim is not being applied. This increases the chances of a full direct reclaim via try_to_free_pages() is required. This patch initialises the order field of the scan control as requested by the caller. [mel@csn.ul.ie: rewrote changelog] Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
At first look, mark_page_accessed() in follow_page() seems a bit strange. It seems pte_mkyoung() would be better consistent with other kernel code. However, it is intentional. The commit log said: ------------------------------------------------ commit 9e45f61d69be9024a2e6bef3831fb04d90fac7a8 Author: akpm <akpm> Date: Fri Aug 15 07:24:59 2003 +0000 [PATCH] Use mark_page_accessed() in follow_page() Touching a page via follow_page() counts as a reference so we should be either setting the referenced bit in the pte or running mark_page_accessed(). Altering the pte is tricky because we haven't implemented an atomic pte_mkyoung(). And mark_page_accessed() is better anyway because it has more aging state: it can move the page onto the active list. BKrev: 3f3c8acbplT8FbwBVGtth7QmnqWkIw ------------------------------------------------ The atomic issue is still true nowadays. adding comment help to understand code intention and it would be better. [akpm@linux-foundation.org: clarify text] Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NHugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
shrink_inactive_list() scans in sc->swap_cluster_max chunks until it hits the scan limit it was passed. shrink_inactive_list() { do { isolate_pages(swap_cluster_max) shrink_page_list() } while (nr_scanned < max_scan); } This assumes that swap_cluster_max is not bigger than the scan limit because the latter is checked only after at least one iteration. In shrink_all_memory() sc->swap_cluster_max is initialized to the overall reclaim goal in the beginning but not decreased while reclaim is making progress which leads to subsequent calls to shrink_inactive_list() reclaiming way too much in the one iteration that is done unconditionally. Set sc->swap_cluster_max always to the proper goal before doing shrink_all_zones() shrink_list() shrink_inactive_list(). While the current shrink_all_memory() happily reclaims more than actually requested, this patch fixes it to never exceed the goal: unpatched wanted=10000 reclaimed=13356 wanted=10000 reclaimed=19711 wanted=10000 reclaimed=10289 wanted=10000 reclaimed=17306 wanted=10000 reclaimed=10700 wanted=10000 reclaimed=10004 wanted=10000 reclaimed=13301 wanted=10000 reclaimed=10976 wanted=10000 reclaimed=10605 wanted=10000 reclaimed=10088 wanted=10000 reclaimed=15000 patched wanted=10000 reclaimed=10000 wanted=10000 reclaimed=9599 wanted=10000 reclaimed=8476 wanted=10000 reclaimed=8326 wanted=10000 reclaimed=10000 wanted=10000 reclaimed=10000 wanted=10000 reclaimed=9919 wanted=10000 reclaimed=10000 wanted=10000 reclaimed=10000 wanted=10000 reclaimed=10000 wanted=10000 reclaimed=10000 wanted=10000 reclaimed=9624 wanted=10000 reclaimed=10000 wanted=10000 reclaimed=10000 wanted=8500 reclaimed=8092 wanted=316 reclaimed=316 Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NMinChan Kim <minchan.kim@gmail.com> Acked-by: NNigel Cunningham <ncunningham@crca.org.au> Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 MinChan Kim 提交于
Commit a79311c1 "vmscan: bail out of direct reclaim after swap_cluster_max pages" moved the nr_reclaimed counter into the scan control to accumulate the number of all reclaimed pages in a reclaim invocation. shrink_all_memory() can use the same mechanism. it increase code consistency and redability. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NMinChan Kim <minchan.kim@gmail.com> Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
commit bf3f3bc5 (mm: don't mark_page_accessed in fault path) only remove the mark_page_accessed() in filemap_fault(). Therefore, swap-backed pages and file-backed pages have inconsistent behavior. mark_page_accessed() should be removed from do_swap_page(). Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
Impact: cleanup In almost cases, for_each_zone() is used with populated_zone(). It's because almost function doesn't need memoryless node information. Therefore, for_each_populated_zone() can help to make code simplify. This patch has no functional change. [akpm@linux-foundation.org: small cleanup] Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
sc.may_swap does not only influence reclaiming of anon pages but pages mapped into pagetables in general, which also includes mapped file pages. In shrink_page_list(): if (!sc->may_swap && page_mapped(page)) goto keep_locked; For anon pages, this makes sense as they are always mapped and reclaiming them always requires swapping. But mapped file pages are skipped here as well and it has nothing to do with swapping. The real effect of the knob is whether mapped pages are unmapped and reclaimed or not. Rename it to `may_unmap' to have its name match its actual meaning more precisely. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NMinChan Kim <minchan.kim@gmail.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cyrill Gorcunov 提交于
There is no need to call for int_sqrt if argument is 0. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 MinChan Kim 提交于
vmap's dirty_list is unused. It's for optimizing flushing. but Nick didn't write the code yet. so, we don't need it until time as it is needed. This patch removes vmap_block's dirty_list and codes related to it. Signed-off-by: NMinChan Kim <minchan.kim@gmail.com> Acked-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cyrill Gorcunov 提交于
In case if start_pfn overlap the upper bound no need to test end_pfn again since we have it already trimmed. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 3月, 2009 1 次提交
-
-
由 Ingo Molnar 提交于
Impact: build fix fix typo in mm/slob.c: mm/slob.c:469: error: ‘flags’ undeclared (first use in this function) mm/slob.c:469: error: (Each undeclared identifier is reported only once mm/slob.c:469: error: for each function it appears in.) Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090128135457.350751756@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 30 3月, 2009 2 次提交
-
-
由 Rusty Russell 提交于
Impact: cleanup Time to clean up remaining laggards using the old cpu_ functions. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Trond.Myklebust@netapp.com
-
由 Rusty Russell 提交于
Impact: cleanup (Thanks to Al Viro for reminding me of this, via Ingo) CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so: #define CPU_MASK_ALL (cpumask_t) { { ... } } Taking the address of such a temporary is questionable at best, unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added CPU_MASK_ALL_PTR: #define CPU_MASK_ALL_PTR (&CPU_MASK_ALL) Which formalizes this practice. One day gcc could bite us over this usage (though we seem to have gotten away with it so far). So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR with the modern "cpu_all_mask" (a real const struct cpumask *). Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NIngo Molnar <mingo@elte.hu> Reported-by: NAl Viro <viro@zeniv.linux.org.uk> Cc: Mike Travis <travis@sgi.com>
-
- 27 3月, 2009 1 次提交
-
-
由 Wu Fengguang 提交于
Enlarge default dirty ratios from 5/10 to 10/20. This fixes [Bug #12809] iozone regression with 2.6.29-rc6. The iozone benchmarks are performed on a 1200M file, with 8GB ram. iozone -i 0 -i 1 -i 2 -i 3 -i 4 -r 4k -s 64k -s 512m -s 1200m -b tmp.xls iozone -B -r 4k -s 64k -s 512m -s 1200m -b tmp.xls The performance regression is triggered by commit 1cf6e7d8(mm: task dirty accounting fix), which makes more correct/thorough dirty accounting. The default 5/10 dirty ratios were picked (a) with the old dirty logic and (b) largely at random and (c) designed to be aggressive. In particular, that (a) means that having fixed some of the dirty accounting, maybe the real bug is now that it was always too aggressive, just hidden by an accounting issue. The enlarged 10/20 dirty ratios are just about enough to fix the regression. [ We will have to look at how this affects the old fsync() latency issue, but that probably will need independent work. - Linus ] Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: N"Lin, Ming M" <ming.m.lin@intel.com> Tested-by: N"Lin, Ming M" <ming.m.lin@intel.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 3月, 2009 1 次提交
-
-
由 Jens Axboe 提交于
It really makes no sense to have it in readahead.c, so move it where it belongs. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 23 3月, 2009 2 次提交
-
-
由 Nick Piggin 提交于
Don't hold SLOB lock when freeing the page. Reduces lock hold width. See the following thread for discussion of the bug: http://marc.info/?l=linux-kernel&m=123709983214143&w=2Reported-by: NIngo Molnar <mingo@elte.hu> Acked-by: NMatt Mackall <mpm@selenic.com> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
-
由 Akinobu Mita 提交于
Use get_track() in set_track() Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
-
- 16 3月, 2009 1 次提交
-
-
由 Nicolas Pitre 提交于
Most ARM machines have a non IO coherent cache, meaning that the dma_map_*() set of functions must clean and/or invalidate the affected memory manually before DMA occurs. And because the majority of those machines have a VIVT cache, the cache maintenance operations must be performed using virtual addresses. When a highmem page is kunmap'd, its mapping (and cache) remains in place in case it is kmap'd again. However if dma_map_page() is then called with such a page, some cache maintenance on the remaining mapping must be performed. In that case, page_address(page) is non null and we can use that to synchronize the cache. It is unlikely but still possible for kmap() to race and recycle the virtual address obtained above, and use it for another page before some on-going cache invalidation loop in dma_map_page() is done. In that case, the new mapping could end up with dirty cache lines for another page, and the unsuspecting cache invalidation loop in dma_map_page() might simply discard those dirty cache lines resulting in data loss. For example, let's consider this sequence of events: - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page. --> - vaddr = page_address(page) is non null. In this case it is likely that the page has valid cache lines associated with vaddr. Remember that the cache is VIVT. --> for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32) invalidate_cache_line(i); *** preemption occurs in the middle of the loop above *** - kmap_high() is called for a different page. --> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps() is called. The pkmap_count value for the page passed to dma_map_page() above happens to be 1, so the page is unmapped. But prior to that, flush_cache_kmaps() cleared the cache for it. So far so good. - A fresh pkmap entry is assigned for this kmap request. The Murphy law says this pkmap entry will eventually happen to use the same vaddr as the one which used to belong to the other page being processed by dma_map_page() in the preempted thread above. - The kmap_high() caller start dirtying the cache using the just assigned virtual mapping for its page. *** the first thread is rescheduled *** - The for(...) loop is resumed, but now cached data belonging to a different physical page is being discarded ! And this is not only a preemption issue as ARM can be SMP as well, making the above scenario just as likely. Hence the need for some kind of pkmap page pinning which can be used in any context, primarily for the benefit of dma_map_page() on ARM. This provides the necessary interface to cope with the above issue if ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is unchanged. Signed-off-by: NNicolas Pitre <nico@marvell.com> Reviewed-by: NMinChan Kim <minchan.kim@gmail.com> Acked-by: NAndrew Morton <akpm@linux-foundation.org>
-
- 15 3月, 2009 1 次提交
-
-
由 Daisuke Nishimura 提交于
pgmoved should be cleared after updating recent_rotated. Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-