1. 18 May 2009, 1 commit
    • mm, x86: remove MEMORY_HOTPLUG_RESERVE related code · 888a589f
      Authored by Yinghai Lu
      after:
      
       | commit b263295d
       | Author: Christoph Lameter <clameter@sgi.com>
       | Date:   Wed Jan 30 13:30:47 2008 +0100
       |
       |    x86: 64-bit, make sparsemem vmemmap the only memory model
      
      we don't have MEMORY_HOTPLUG_RESERVE anymore.
      
      Historically, x86-64 had an architecture-specific method for memory hotplug
      whereby it scanned the SRAT for physical memory ranges that could be
      potentially used for memory hot-add later. By reserving those ranges
      without physical memory, the memmap would be allocated and left dormant
      until needed. This depended on the DISCONTIG memory model, which has
      been removed, so the code implementing HOTPLUG_RESERVE is now dead.
      
      This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.
      
      (Changelog authored by Mel.)
      
      v2: updated the changelog and removed hotadd= from the documentation
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4A0C4910.7090508@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 15 May 2009, 1 commit
    • Revert "mm: add /proc controls for pdflush threads" · cd17cbfd
      Authored by Jens Axboe
      This reverts commit fafd688e.
      
      Work is progressing to switch away from pdflush as the process backing
      for flushing out dirty data. So it seems pointless to add more knobs
      to control pdflush threads. The original author of the patch did not
      have any specific use cases for adding the knobs, so we can easily
      revert this before 2.6.30 to avoid having to maintain this API
      forever.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  3. 13 May 2009, 1 commit
  4. 08 May 2009, 1 commit
  5. 07 May 2009, 5 commits
  6. 06 May 2009, 1 commit
    • Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions · a425a638
      Authored by Mel Gorman
      madvise(MADV_WILLNEED) forces page cache readahead on a range of memory
      backed by a file.  The assumption is made that the page required is
      order-0 and "normal" page cache.
      
      On hugetlbfs, this assumption is not true and order-0 pages are
      allocated and inserted into the hugetlbfs page cache.  This leaks
      hugetlbfs page reservations and can cause BUGs to trigger related to
      corrupted page tables.
      
      This patch causes MADV_WILLNEED to be ignored for hugetlbfs-backed
      regions.
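
      For illustration, the guard this implies is tiny (a sketch, assuming
      it sits near the top of madvise_willneed() in mm/madvise.c):

        /*
         * Readahead assumes order-0 page cache pages, which is not true
         * for hugetlbfs, so silently ignore the advice rather than
         * allocating order-0 pages into the hugetlbfs page cache.
         */
        if (vma->vm_flags & VM_HUGETLB)
                return 0;
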
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 May 2009, 6 commits
    • vmscan: avoid multiplication overflow in shrink_zone() · 8713e012
      Authored by Andrew Morton
      Local variable `scan' can overflow on zones which are larger than
      
      	(2G * 4k) / 100 = 80GB.
      
      Making it 64-bit on 64-bit platforms fixes that up.
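
      The pattern of the fix, roughly (variable names are illustrative,
      not necessarily the patch's):

        #include <linux/math64.h>

        /*
         * Before: unsigned long scan = nr_pages * percent / 100;
         * The product wraps once it exceeds a 32-bit range (around
         * 80GB worth of 4k pages, per the math above), so widen to
         * 64 bits before multiplying.
         */
        u64 scan = (u64)nr_pages * percent;
        scan = div_u64(scan, 100);
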
      
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix Committed_AS underflow on large NR_CPUS environment · 00a62ce9
      Authored by KOSAKI Motohiro
      The Committed_AS field can underflow in certain situations:
      
      >         # while true; do cat /proc/meminfo  | grep _AS; sleep 1; done | uniq -c
      >               1 Committed_AS: 18446744073709323392 kB
      >              11 Committed_AS: 18446744073709455488 kB
      >               6 Committed_AS:    35136 kB
      >               5 Committed_AS: 18446744073709454400 kB
      >               7 Committed_AS:    35904 kB
      >               3 Committed_AS: 18446744073709453248 kB
      >               2 Committed_AS:    34752 kB
      >               9 Committed_AS: 18446744073709453248 kB
      >               8 Committed_AS:    34752 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               7 Committed_AS: 18446744073709454080 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               5 Committed_AS: 18446744073709454080 kB
      >               6 Committed_AS: 18446744073709320960 kB
      
      This happens because NR_CPUS can be greater than 1000 and
      meminfo_proc_show() does not check for underflow.
      
      But a calculation proportional to NR_CPUS is not a good one: the
      likelihood of lock contention is proportional to the number of
      online CPUs, not the theoretical maximum (NR_CPUS).
      
      The current kernel has generic percpu_counter infrastructure, and
      using it is the right way: it simplifies the code, and
      percpu_counter_read_positive() cannot return an underflowed value.
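
      A sketch of the percpu_counter pattern this moves to (the counter
      name follows the description above; the exact shape is illustrative):

        #include <linux/percpu_counter.h>

        static struct percpu_counter vm_committed_as;

        /* Hot path: per-CPU batching, no global lock. */
        static void sketch_acct_memory(long pages)
        {
                percpu_counter_add(&vm_committed_as, pages);
        }

        /*
         * meminfo side: a transiently negative sum is clamped to 0,
         * so the wrapped 18446744073709... values cannot appear.
         */
        static unsigned long sketch_committed_as_kb(void)
        {
                return percpu_counter_read_positive(&vm_committed_as)
                        << (PAGE_SHIFT - 10);
        }
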
      Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[All kernel versions]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix mem_cgroup_shrink_usage() · ae3abae6
      Authored by Daisuke Nishimura
      Current mem_cgroup_shrink_usage() has two problems.
      
      1. It doesn't call mem_cgroup_out_of_memory and doesn't update
         last_oom_jiffies, so pagefault_out_of_memory invokes global OOM.
      
      2. Considering hierarchy, shrinking has to be done from the
         mem_over_limit, not from the memcg which the page would be charged to.
      
      mem_cgroup_try_charge_swapin() does all of these things properly, so we
      use it and call cancel_charge_swapin() when it succeeds.
      
      The name "shrink_usage" is not appropriate for this behavior, so we
      rename it as well.
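
      The resulting call pattern, roughly (a sketch based on the
      description above, not the verbatim patch):

        struct mem_cgroup *memcg;

        /*
         * Charges against mem_over_limit in the hierarchy and keeps
         * last_oom_jiffies up to date, unlike the old shrink path.
         */
        if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &memcg))
                return;         /* charge failed; OOM handled inside */

        /* We only wanted the reclaim side effects, not the charge. */
        mem_cgroup_cancel_charge_swapin(memcg);
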
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.cn>
      Cc: Paul Menage <menage@google.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: close page_mkwrite races · b827e496
      Authored by Nick Piggin
      Change page_mkwrite to allow implementations to return with the page
      locked, and also change its callers (in the page fault paths) to hold
      the lock until the page is marked dirty.  This allows the filesystem to have
      full control of page dirtying events coming from the VM.
      
      Rather than simply holding the page locked over the page_mkwrite call, we
      call page_mkwrite with the page unlocked and allow callers to return with
      it locked, so filesystems can avoid lock order reversal (LOR) conditions
      with the page lock.
      
      The problem with the current scheme is this: a filesystem that wants to
      associate some metadata with a page as long as the page is dirty, will
      perform this manipulation in its ->page_mkwrite.  It currently then must
      return with the page unlocked and may not hold any other locks (according
      to existing page_mkwrite convention).
      
      In this window, the VM could write out the page, clearing page-dirty.  The
      filesystem has no good way to detect that a dirty pte is about to be
      attached, so it will happily write out the page, at which point, the
      filesystem may manipulate the metadata to reflect that the page is no
      longer dirty.
      
      It is not always possible to perform the required metadata manipulation in
      ->set_page_dirty, because that function cannot block or fail.  The
      filesystem may need to allocate some data structure, for example.
      
      And the VM cannot mark the pte dirty before page_mkwrite, because
      page_mkwrite is allowed to fail, so we must not allow any window where the
      page could be written to if page_mkwrite does fail.
      
      This solution of holding the page locked over the 3 critical operations
      (page_mkwrite, setting the pte dirty, and finally setting the page dirty)
      closes out races nicely, preventing page cleaning for writeout from being
      initiated in that window.  This provides the filesystem with a strong
      synchronisation against the VM here.
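
      For illustration, a ->page_mkwrite implementation under the new
      convention might look like this (a sketch, assuming the vm_fault-based
      prototype and the VM_FAULT_LOCKED return code):

        static int example_page_mkwrite(struct vm_area_struct *vma,
                                        struct vm_fault *vmf)
        {
                struct page *page = vmf->page;

                lock_page(page);

                /* The page may have been truncated while we blocked. */
                if (page->mapping != vma->vm_file->f_mapping) {
                        unlock_page(page);
                        return VM_FAULT_NOPAGE;
                }

                /*
                 * Allocate and attach dirty-tracking metadata here;
                 * blocking and failing are both still allowed.
                 */

                /*
                 * Return with the page locked; the VM now holds the lock
                 * until the pte and the page have been marked dirty.
                 */
                return VM_FAULT_LOCKED;
        }
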
      
      - Sage needs this race closed for ceph filesystem.
      - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
      - I need it for fsblock.
      - I suspect other filesystems may need it too (eg. btrfs).
      - I have converted buffer.c to the new locking. Even simple block allocation
        under dirty pages might be susceptible to i_size changing under partial page
        at the end of file (we also have a buffer.c-side problem here, but it cannot
        be fixed properly without this patch).
      - Other filesystems (eg. NFS, maybe btrfs) will need to change their
        page_mkwrite functions themselves.
      
      [ This also moves page_mkwrite another step closer to fault, which should
        eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
        filesystem calldown and page lock/unlock cycle in __do_fault. ]
      
      [akpm@linux-foundation.org: fix derefs of NULL ->mapping]
      Cc: Sage Weil <sage@newdream.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix try_get_mem_cgroup_from_swapcache() · c0bd3f63
      Authored by Daisuke Nishimura
      This is a bugfix for commit 3c776e64
      ("memcg: charge swapcache to proper memcg").
      
      The Used bit of a swapcache page is stable under the page lock, but,
      considering move_account, pc->mem_cgroup is not.
      
      We need lock_page_cgroup() anyway.
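
      Roughly, the lookup has to take the page_cgroup lock around reading
      pc->mem_cgroup (a sketch of the shape; details are illustrative):

        struct page_cgroup *pc = lookup_page_cgroup(page);
        struct mem_cgroup *mem = NULL;

        /*
         * The page lock pins the Used bit but not pc->mem_cgroup,
         * which move_account can change under us.
         */
        lock_page_cgroup(pc);
        if (PageCgroupUsed(pc)) {
                mem = pc->mem_cgroup;
                if (mem && !css_tryget(&mem->css))
                        mem = NULL;
        }
        unlock_page_cgroup(pc);
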
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix pageref leak in do_swap_page() · bc43f75c
      Authored by Johannes Weiner
      By the time the memory cgroup code is notified about a swapin we
      already hold a reference on the fault page.
      
      If the cgroup callback fails, make sure to unlock AND release the page
      reference taken by lookup_swap_cache(), or we leak the reference.
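
      The corrected error path, in sketch form (label and variable names
      are illustrative):

        page = lookup_swap_cache(entry);        /* takes a page reference */

        if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) {
                ret = VM_FAULT_OOM;
                goto out_page;
        }

        /* normal completion elided */
        return 0;

        out_page:
                unlock_page(page);              /* we hold the page lock here */
                page_cache_release(page);       /* and the lookup's reference */
                return ret;
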
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 22 Apr 2009, 1 commit
  9. 19 Apr 2009, 1 commit
  10. 17 Apr 2009, 1 commit
  11. 16 Apr 2009, 1 commit
  12. 14 Apr 2009, 6 commits
  13. 07 Apr 2009, 3 commits
    • mm: add /proc controls for pdflush threads · fafd688e
      Authored by Peter W Morreale
      Add /proc entries to give the admin the ability to control the minimum and
      maximum number of pdflush threads.  This allows finer control of pdflush
      on both large and small machines.
      
      The rationale is simply one size does not fit all.  Admins on large and/or
      small systems may want to tune the min/max pdflush thread count to best
      suit their needs.  Right now the min/max is hardcoded to 2/8.  While
      probably a fair estimate for smaller machines, large machines with large
      numbers of CPUs and large numbers of filesystems/block devices may benefit
      from larger numbers of threads working on different block devices.
      
      Even if the background flushing algorithm is radically changed, it is
      still likely that multiple threads will be involved and admins would still
      desire finer control on the min/max other than to have to recompile the
      kernel.
      
      The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
      '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.
      
      The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
      the current value of nr_pdflush_threads_max.  This minimum is required
      since additional thread creation is performed in a pdflush thread itself.
      
      The minimum value for nr_pdflush_threads_max is the current value of
      nr_pdflush_threads_min and the maximum value can be 1000.
      
      Documentation/sysctl/vm.txt is also updated.
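
      The knobs are plain sysctl integers; wired up, they would look
      something like this (a sketch of kernel/sysctl.c entries circa
      2.6.30; exact field values are assumptions):

        static int one = 1;
        static int one_thousand = 1000;

        {
                .ctl_name       = CTL_UNNUMBERED,
                .procname       = "nr_pdflush_threads_min",
                .data           = &nr_pdflush_threads_min,
                .maxlen         = sizeof(nr_pdflush_threads_min),
                .mode           = 0644,
                .proc_handler   = &proc_dointvec_minmax,
                .extra1         = &one,                         /* floor: 1 */
                .extra2         = &nr_pdflush_threads_max,      /* ceiling: current max */
        },
        {
                .ctl_name       = CTL_UNNUMBERED,
                .procname       = "nr_pdflush_threads_max",
                .data           = &nr_pdflush_threads_max,
                .maxlen         = sizeof(nr_pdflush_threads_max),
                .mode           = 0644,
                .proc_handler   = &proc_dointvec_minmax,
                .extra1         = &nr_pdflush_threads_min,      /* floor: current min */
                .extra2         = &one_thousand,                /* ceiling: 1000 */
        },
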
      
      [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
      Signed-off-by: Peter W Morreale <pmorreale@novell.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix pdflush thread creation upper bound · a56ed663
      Authored by Peter W Morreale
      Fix a race on creating pdflush threads.  Without the patch, it is possible
      to create more than MAX_PDFLUSH_THREADS threads, and this has been
      observed in practice on IO loaded SMP machines.
      
      The fix involves moving the lock around to protect the check against the
      thread count and correctly dealing with thread creation failure.
      
      This fix also _mostly_ repairs a race condition on how quickly the threads
      are created.  The original intent was to create a pdflush thread (up to
      the max allowed) every second.  Without this patch it is possible to
      create NCPUS pdflush threads concurrently.  The 'mostly' caveat is because
      an assumption is made that thread creation will be successful.  If we fail
      to create the thread, the miss is not considered fatal.  (we will try
      again in 1 second)
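
      The shape of the fix, as described (a sketch; the lock and counter
      names follow mm/pdflush.c conventions but are illustrative here):

        spin_lock_irqsave(&pdflush_lock, flags);
        if (nr_pdflush_threads >= MAX_PDFLUSH_THREADS) {
                spin_unlock_irqrestore(&pdflush_lock, flags);
                return;
        }
        /*
         * Reserve the slot before dropping the lock, so concurrent
         * callers cannot all pass the check and overshoot the max.
         */
        nr_pdflush_threads++;
        spin_unlock_irqrestore(&pdflush_lock, flags);

        if (IS_ERR(kthread_run(pdflush, NULL, "pdflush"))) {
                /*
                 * Creation failed: give the slot back.  Not fatal;
                 * we simply try again in one second.
                 */
                spin_lock_irqsave(&pdflush_lock, flags);
                nr_pdflush_threads--;
                spin_unlock_irqrestore(&pdflush_lock, flags);
        }
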
      Signed-off-by: Peter W Morreale <pmorreale@novell.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • percpu: __percpu_depopulate_mask can take a const mask · 5d6700ea
      Authored by Stephen Rothwell
      This eliminates a compiler warning:
      
        mm/allocpercpu.c: In function 'free_percpu':
        mm/allocpercpu.c:146: warning: passing argument 2 of '__percpu_depopulate_mask' discards qualifiers from pointer target type
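
      The fix is simply const-correctness in the prototype (a sketch of
      the shape):

        /*
         * Argument 2 is only read, so it can be const-qualified;
         * callers may then pass const cpumasks without the compiler
         * warning about discarded qualifiers.
         */
        void __percpu_depopulate_mask(void *__pdata, const cpumask_t *mask);
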
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 06 Apr 2009, 1 commit
  15. 04 Apr 2009, 1 commit
  16. 03 Apr 2009, 9 commits