- 27 5月, 2011 3 次提交
-
-
由 KOSAKI Motohiro 提交于
The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor 'unsigned int'. This patch fixes such misuse. Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [ Changed to use a typedef - we'll extend it to cover more cases later, since there has been discussion about making it a 64-bit type.. - Linus ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Magenheimer 提交于
This fourth patch of eight in this cleancache series provides the core hooks in VFS for: initializing cleancache per filesystem; capturing clean pages reclaimed by page cache; attempting to get pages from cleancache before filesystem read; and ensuring coherency between pagecache, disk, and cleancache. Note that the placement of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic change was required due to a patchset in 2.6.39. All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become a check of a boolean global if CONFIG_CLEANCACHE is set but no cleancache "backend" has claimed cleancache_ops. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function] Signed-off-by: NChris Mason <chris.mason@oracle.com> Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: NJeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
-
由 Dan Magenheimer 提交于
This third patch of eight in this cleancache series provides the core code for cleancache that interfaces between the hooks in VFS and individual filesystems and a cleancache backend. It also includes build and config patches. Two new files are added: mm/cleancache.c and include/linux/cleancache.h. Note that CONFIG_CLEANCACHE can default to on; in systems that do not provide a cleancache backend, all hooks devolve to a simple check of a global enable flag, so performance impact should be negligible but can be reduced to zero impact if config'ed off. However for this first commit, it defaults to off. Details and a FAQ can be found in Documentation/vm/cleancache.txt Credits: Cleancache_ops design derived from Jeremy Fitzhardinge design for tmem [v8: dan.magenheimer@oracle.com: fix exportfs call affecting btrfs] [v8: akpm@linux-foundation.org: use static inline function, not macro] [v7: dan.magenheimer@oracle.com: cleanup sysfs and remove cleancache prefix] [v6: JBeulich@novell.com: robustly handle buggy fs encode_fh actor definition] [v5: jeremy@goop.org: clean up global usage and static var names] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] [v5: hch@infradead.org: cleaner non-global interface for ops registration] [v4: adilger@sun.com: interface must support exportfs FS's] [v4: hch@infradead.org: interface must support 64-bit FS on 32-bit kernel] [v3: akpm@linux-foundation.org: use one ops struct to avoid pointer hops] [v3: akpm@linux-foundation.org: document and ensure PageLocked reqts are met] [v3: ngupta@vflare.org: fix success/fail codes, change funcs to void] [v2: viro@ZenIV.linux.org.uk: use sane types] Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: NJeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NNitin Gupta <ngupta@vflare.org> Acked-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NAndreas Dilger <adilger@sun.com> Acked-by: NJan Beulich <JBeulich@novell.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com>
-
- 26 5月, 2011 1 次提交
-
-
由 Linus Torvalds 提交于
Commit a71ae47a ("slub: Fix double bit unlock in debug mode") removed the only goto to this label, resulting in mm/slub.c: In function '__slab_alloc': mm/slub.c:1834: warning: label 'unlock_out' defined but not used fixed trivially by the removal of the label itself too. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 5月, 2011 36 次提交
-
-
由 Bob Liu 提交于
Currently on nommu arch mmap(),mremap() and munmap() doesn't do page_align() which isn't consist with mmu arch and cause some issues. First, some drivers' mmap() function depends on vma->vm_end - vma->start is page aligned which is true on mmu arch but not on nommu. eg: uvc camera driver. Second munmap() may return -EINVAL[split file] error in cases when end is not page aligned(passed into from userspace) but vma->vm_end is aligned dure to split or driver's mmap() ops. Add page alignment to fix those issues. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NBob Liu <lliubbo@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Shaohua Li 提交于
The zone->lru_lock is heavily contented in workload where activate_page() is frequently used. We could do batch activate_page() to reduce the lock contention. The batched pages will be added into zone list when the pool is full or page reclaim is trying to drain them. For example, in a 4 socket 64 CPU system, create a sparse file and 64 processes, processes shared map to the file. Each process read access the whole file and then exit. The process exit will do unmap_vmas() and cause a lot of activate_page() call. In such workload, we saw about 58% total time reduction with below patch. Other workloads with a lot of activate_page also benefits a lot too. Andrew Morton suggested activate_page() and putback_lru_pages() should follow the same path to active pages, but this is hard to implement (see commit 7a608572 ("Revert "mm: batch activate_page() to reduce lock contention")). On the other hand, do we really need putback_lru_pages() to follow the same path? I tested several FIO/FFSB benchmark (about 20 scripts for each benchmark) in 3 machines here from 2 sockets to 4 sockets. My test doesn't show anything significant with/without below patch (there is slight difference but mostly some noise which we found even without below patch before). Below patch basically returns to the same as my first post. I tested some microbenchmarks: case-anon-cow-rand-mt 0.58% case-anon-cow-rand -3.30% case-anon-cow-seq-mt -0.51% case-anon-cow-seq -5.68% case-anon-r-rand-mt 0.23% case-anon-r-rand 0.81% case-anon-r-seq-mt -0.71% case-anon-r-seq -1.99% case-anon-rx-rand-mt 2.11% case-anon-rx-seq-mt 3.46% case-anon-w-rand-mt -0.03% case-anon-w-rand -0.50% case-anon-w-seq-mt -1.08% case-anon-w-seq -0.12% case-anon-wx-rand-mt -5.02% case-anon-wx-seq-mt -1.43% case-fork 1.65% case-fork-sleep -0.07% case-fork-withmem 1.39% case-hugetlb -0.59% case-lru-file-mmap-read-mt -0.54% case-lru-file-mmap-read 0.61% case-lru-file-mmap-read-rand -2.24% case-lru-file-readonce -0.64% case-lru-file-readtwice -11.69% case-lru-memcg -1.35% case-mmap-pread-rand-mt 1.88% case-mmap-pread-rand -15.26% case-mmap-pread-seq-mt 0.89% case-mmap-pread-seq -69.72% case-mmap-xread-rand-mt 0.71% case-mmap-xread-seq-mt 0.38% The most significent are: case-lru-file-readtwice -11.69% case-mmap-pread-rand -15.26% case-mmap-pread-seq -69.72% which use activate_page a lot. others are basically variations because each run has slightly difference. In UP case, 'size mm/swap.o' before the two patches: text data bss dec hex filename 6466 896 4 7366 1cc6 mm/swap.o after the two patches: text data bss dec hex filename 6343 896 4 7243 1c4b mm/swap.o Signed-off-by: NShaohua Li <shaohua.li@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Barry 提交于
I believe I found a problem in __alloc_pages_slowpath, which allows a process to get stuck endlessly looping, even when lots of memory is available. Running an I/O and memory intensive stress-test I see a 0-order page allocation with __GFP_IO and __GFP_WAIT, running on a system with very little free memory. Right about the same time that the stress-test gets killed by the OOM-killer, the utility trying to allocate memory gets stuck in __alloc_pages_slowpath even though most of the systems memory was freed by the oom-kill of the stress-test. The utility ends up looping from the rebalance label down through the wait_iff_congested continiously. Because order=0, __alloc_pages_direct_compact skips the call to get_page_from_freelist. Because all of the reclaimable memory on the system has already been reclaimed, __alloc_pages_direct_reclaim skips the call to get_page_from_freelist. Since there is no __GFP_FS flag, the block with __alloc_pages_may_oom is skipped. The loop hits the wait_iff_congested, then jumps back to rebalance without ever trying to get_page_from_freelist. This loop repeats infinitely. The test case is pretty pathological. Running a mix of I/O stress-tests that do a lot of fork() and consume all of the system memory, I can pretty reliably hit this on 600 nodes, in about 12 hours. 32GB/node. Signed-off-by: NAndrew Barry <abarry@cray.com> Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel<riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
The noswapaccount parameter has been deprecated since 2.6.38 without any complaints from users so we can remove it. swapaccount=0|1 can be used instead. As we are removing the parameter we can also clean up swapaccount because it doesn't have to accept an empty string anymore (to match noswapaccount) and so we can push = into __setup macro rather than checking "=1" resp. "=0" strings Signed-off-by: NMichal Hocko <mhocko@suse.cz> Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Wilson 提交于
Moving show_numa_map() from mempolicy.c to task_mmu.c solves several issues. - Having the show() operation "miles away" from the corresponding seq_file iteration operations is a maintenance burden. - The need to export ad hoc info like struct proc_maps_private is eliminated. - The implementation of show_numa_map() can be improved in a simple manner by cooperating with the other seq_file operations (start, stop, etc) -- something that would be messy to do without this change. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Wilson 提交于
This function has been superseded by gather_hugetbl_stats() and is no longer needed. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Wilson 提交于
Improve the prototype of gather_stats() to take a struct numa_maps as argument instead of a generic void *. Update all callers to make the required type explicit. Since gather_stats() is not needed before its definition and is scheduled to be moved out of mempolicy.c the declaration is removed as well. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Wilson 提交于
Mapping statistics in a NUMA environment is now computed using the generic walk_page_range() logic. Remove the old/equivalent functionality. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Wilson 提交于
Converting show_numa_map() to use the generic routine decouples the function from mempolicy.c, allowing it to be moved out of the mm subsystem and into fs/proc. Also, include KSM pages in /proc/pid/numa_maps statistics. The pagewalk logic implemented by check_pte_range() failed to account for such pages as they were not applicable to the page migration case. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Wilson 提交于
In commit 48fce342 ("mempolicies: unexport get_vma_policy()") get_vma_policy() was marked static as all clients were local to mempolicy.c. However, the decision to generate /proc/pid/numa_maps in the numa memory policy code and outside the procfs subsystem introduces an artificial interdependency between the two systems. Exporting get_vma_policy() once again is the first step to clean up this interdependency. Signed-off-by: NStephen Wilson <wilsons@start.ca> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Paris 提交于
Implement generic xattrs for tmpfs filesystems. The Feodra project, while trying to replace suid apps with file capabilities, realized that tmpfs, which is used on the build systems, does not support file capabilities and thus cannot be used to build packages which use file capabilities. Xattrs are also needed for overlayfs. The xattr interface is a bit odd. If a filesystem does not implement any {get,set,list}xattr functions the VFS will call into some random LSM hooks and the running LSM can then implement some method for handling xattrs. SELinux for example provides a method to support security.selinux but no other security.* xattrs. As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have xattr handler routines specifically to handle acls. Because of this tmpfs would loose the VFS/LSM helpers to support the running LSM. To make up for that tmpfs had stub functions that did nothing but call into the LSM hooks which implement the helpers. This new patch does not use the LSM fallback functions and instead just implements a native get/set/list xattr feature for the full security.* and trusted.* namespace like a normal filesystem. This means that tmpfs can now support both security.selinux and security.capability, which was not previously possible. The basic implementation is that I attach a: struct shmem_xattr { struct list_head list; /* anchored by shmem_inode_info->xattr_list */ char *name; size_t size; char value[0]; }; Into the struct shmem_inode_info for each xattr that is set. This implementation could easily support the user.* namespace as well, except some care needs to be taken to prevent large amounts of unswappable memory being allocated for unprivileged users. [mszeredi@suse.cz: new config option, suport trusted.*, support symlinks] Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Acked-by: NSerge Hallyn <serge.hallyn@ubuntu.com> Tested-by: NSerge Hallyn <serge.hallyn@ubuntu.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Acked-by: NHugh Dickins <hughd@google.com> Tested-by: NJordi Pujol <jordipujolp@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yinghai Lu 提交于
The bootmem wrapper with memblock supports top-down now, so we no longer need this trick. Signed-off-by: NYinghai LU <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
The page allocator will improperly return a page from ZONE_NORMAL even when __GFP_DMA is passed if CONFIG_ZONE_DMA is disabled. The caller expects DMA memory, perhaps for ISA devices with 16-bit address registers, and may get higher memory resulting in undefined behavior. This patch causes the page allocator to return NULL in such circumstances with a warning emitted to the kernel log on the first occurrence. Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Kiper 提交于
online_pages() is only compiled for CONFIG_MEMORY_HOTPLUG_SPARSE, so there is no need to support CONFIG_FLATMEM code within it. This patch removes code that is never used. Signed-off-by: NDaniel Kiper <dkiper@net-space.pl> Acked-by: NDave Hansen <dave@linux.vnet.ibm.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
It's pointless that deactive_page's operates on unevictable pages. This patch removes unnecessary overhead which might be a bit problem in case that there are many unevictable page in system(ex, mprotect workload) [akpm@linux-foundation.org: tidy up comment] Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel<riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Previously the mmap sequential readahead is triggered by updating ra->prev_pos on each page fault and compare it with current page offset. It costs dirtying the cache line on each _minor_ page fault. So remove the ra->prev_pos recording, and instead tag PG_readahead to trigger the possible sequential readahead. It's not only more simple, but also will work more reliably and reduce cache line bouncing on concurrent page faults on shared struct file. In the mosbench exim benchmark which does multi-threaded page faults on shared struct file, the ra->mmap_miss and ra->prev_pos updates are found to cause excessive cache line bouncing on tmpfs, which actually disabled readahead totally (shmem_backing_dev_info.ra_pages == 0). So remove the ra->prev_pos recording, and instead tag PG_readahead to trigger the possible sequential readahead. It's not only more simple, but also will work more reliably on concurrent reads on shared struct file. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Tested-by: NTim Chen <tim.c.chen@intel.com> Reported-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
The original INT_MAX is too large, reduce it to - avoid unnecessarily dirtying/bouncing the cache line - restore mmap read-around faster on changed access pattern Background: in the mosbench exim benchmark which does multi-threaded page faults on shared struct file, the ra->mmap_miss updates are found to cause excessive cache line bouncing on tmpfs. The ra state updates are needless for tmpfs because it actually disabled readahead totally (shmem_backing_dev_info.ra_pages == 0). Tested-by: NTim Chen <tim.c.chen@intel.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Reduce readahead overheads by returning early in do_sync_mmap_readahead(). tmpfs has ra_pages=0 and it can page fault really fast (not constraint by IO if not swapping). Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Tested-by: NTim Chen <tim.c.chen@intel.com> Reported-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ying Han 提交于
Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: NYing Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: NPavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ying Han 提交于
Consolidate the existing parameters to shrink_slab() into a new shrink_control struct. This is needed later to pass the same struct to shrinkers. Signed-off-by: NYing Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: NPavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations. readahead page allocations are completely optional. They are OK to fail and in particular shall not trigger OOM on themselves. Reported-by: NDave Young <hidave.darkstar@gmail.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: NPekka Enberg <penberg@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arve Hjønnevåg 提交于
This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even though it only had a few free pages. eg, On current ARM port, The kernel starts at offset 0x8000 to leave room for boot parameters, and the memory is freed later. This in turn caused no contiguous memory to be reserved and frequent kswapd wakeups that emptied the caches to get more contiguous memory. Unfortunatelly, ARM needs order-2 allocation for pgd (see arm/mm/pgd.c#pgd_alloc()). Therefore the issue is not minor nor easy avoidable. [kosaki.motohiro@jp.fujitsu.com: added some explanation] [kosaki.motohiro@jp.fujitsu.com: add !pfn_valid_within() to check] [minchan.kim@gmail.com: check end_pfn in pageblock_is_reserved] Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Signed-off-by: NArve Hjønnevåg <arve@android.com> Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NDave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
isolate_lru_page() must be called only with stable reference to the page, this is what is written in the comment above it, this is reasonable. current isolate_lru_page() users and its page extra reference sources: mm/huge_memory.c: __collapse_huge_page_isolate() - reference from pte mm/memcontrol.c: mem_cgroup_move_parent() - get_page_unless_zero() mem_cgroup_move_charge_pte_range() - reference from pte mm/memory-failure.c: soft_offline_page() - fixed, reference from get_any_page() delete_from_lru_cache() - reference from caller or get_page_unless_zero() [ seems like there bug, because __memory_failure() can call page_action() for hpages tail, but it is ok for isolate_lru_page(), tail getted and not in lru] mm/memory_hotplug.c: do_migrate_range() - fixed, get_page_unless_zero() mm/mempolicy.c: migrate_page_add() - reference from pte mm/migrate.c: do_move_page_to_node_array() - reference from follow_page() mlock.c: - various external references mm/vmscan.c: putback_lru_page() - reference from isolate_lru_page() It seems that all isolate_lru_page() users are ready now for this restriction. So, let's replace redundant get_page_unless_zero() with get_page() and add page initial reference count check with VM_BUG_ON() Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Drop first page reference only after calling isolate_lru_page() to keep page stable reference while isolating. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
isolate_lru_page() must be called only with stable reference to page. So, let's grab normal page reference. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
I was tracking down a page allocation failure that ended up in vmalloc(). Since vmalloc() uses 0-order pages, if somebody asks for an insane amount of memory, we'll still get a warning with "order:0" in it. That's not very useful. During recovery, vmalloc() also nicely frees all of the memory that it got up to the point of the failure. That is wonderful, but it also quickly hides any issues. We have a much different sitation if vmalloc() repeatedly fails 10GB in to: vmalloc(100 * 1<<30); versus repeatedly failing 4096 bytes in to a: vmalloc(8192); This patch will print out messages that look like this: [ 68.123503] vmalloc: allocation failure, allocated 66805763 of 13426688 bytes [ 68.124218] bash: page allocation failure: order:0, mode:0xd2 [ 68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e689-dirty #333 [ 68.125579] Call Trace: [ 68.125853] [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170 [ 68.126464] [<ffffffff8107e05c>] ? printk+0x6c/0x70 [ 68.126791] [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0 [ 68.127661] [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290 ... The 'order' variable is added for clarity when calling warn_alloc_failed() to avoid having an unexplained '0' as an argument. The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take us over 80 columns for the alloc_pages_node() call. If we are going to add a line, it might as well be one that makes the sucker easier to read. As a side issue, I also noticed that ctl_ioctl() does vmalloc() based solely on an unverified value passed in from userspace. Granted, it's under CAP_SYS_ADMIN, but it still frightens me a bit. Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
This originally started as a simple patch to give vmalloc() some more verbose output on failure on top of the plain page allocator messages. Johannes suggested that it might be nicer to lead with the vmalloc() info _before_ the page allocator messages. But, I do think there's a lot of value in what __alloc_pages_slowpath() does with its filtering and so forth. This patch creates a new function which other allocators can call instead of relying on the internal page allocator warnings. It also gives this function private rate-limiting which separates it from other printk_ratelimit() users. Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
cpumask_t is very big struct and cpu_vm_mask is placed wrong position. It might lead to reduce cache hit ratio. This patch has two change. 1) Move the place of cpumask into last of mm_struct. Because usually cpumask is accessed only front bits when the system has cpu-hotplug capability 2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory footprint if cpumask_size() will use nr_cpumask_bits properly in future. In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var. It may help to detect out of tree cpu_vm_mask users. This patch has no functional change. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
We don't need to hold the mmmap_sem through mem_cgroup_newpage_charge(), the mmap_sem is only hold for keeping the vma stable and we don't need the vma stable anymore after we return from alloc_hugepage_vma(). Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Some of these functions have grown beyond inline sanity, move them out-of-line. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Requested-by: NAndrew Morton <akpm@linux-foundation.org> Requested-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Optimize the page_lock_anon_vma() fast path to be one atomic op, instead of two. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Straightforward conversion of anon_vma->lock to a mutex. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NHugh Dickins <hughd@google.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Convert page_lock_anon_vma() over to use refcounts. This is done to prepare for the conversion of anon_vma from spinlock to mutex. Sadly this inceases the cost of page_lock_anon_vma() from one to two atomics, a follow up patch addresses this, lets keep that simple for now. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
A slightly more verbose comment to go along with the trickery in page_lock_anon_vma(). Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NHugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Its beyond ugly and gets in the way. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NHugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Namhyung Kim <namhyung@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NHugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-