提交 · cc4a6851466039a8a688c843962a05689059ff3b · Linux-御风守护者 / linux

12 11月, 2009 1 次提交

page allocator: always wake kswapd when restarting an allocation attempt after... · cc4a6851

由 Mel Gorman 提交于 11月 11, 2009

page allocator: always wake kswapd when restarting an allocation attempt after direct reclaim failed

If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not.  Whether OOM is triggered or not, it may retry the
allocation afterwards.  In times past, this would always wake kswapd as
well but currently, kswapd is not woken up after direct reclaim fails.
For order-0 allocations, this makes little difference but if there is a
heavy mix of higher-order allocations that direct reclaim is failing for,
it might mean that kswapd is not rewoken for higher orders as much as it
did previously.

This patch wakes up kswapd when an allocation is being retried after a
direct reclaim failure.  It would be expected that kswapd is already
awake, but this has the effect of telling kswapd to reclaim at the higher
order as well.
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cc4a6851

10 11月, 2009 3 次提交

highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE · d4515646

由 Soeren Sandmann 提交于 10月 28, 2009

Previously calling debug_kmap_atomic() with these types would
cause spurious warnings.

(triggered by SysProf using perf events)
Signed-off-by: NSoeren Sandmann Pedersen <sandmann@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: a.p.zijlstra@chello.nl
Cc: <stable@kernel.org> # .31.x
LKML-Reference: <ye8vdhz8krw.fsf@camel23.daimi.au.dk>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d4515646

highmem: Fix race in debug_kmap_atomic() which could cause warn_count to underflow · 5ebd4c22

由 Soeren Sandmann 提交于 10月 28, 2009

debug_kmap_atomic() tries to prevent ever printing more than 10
warnings, but it does so by testing whether an unsigned integer
is equal to 0. However, if the warning is caused by a nested
IRQ, then this counter may underflow and the stream of warnings
will never end.

Fix that by using a signed integer instead.
Signed-off-by: NSoeren Sandmann Pedersen <sandmann@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: a.p.zijlstra@chello.nl
Cc: <stable@kernel.org> # .31.x
LKML-Reference: <ye8zl7b8ktj.fsf@camel23.daimi.au.dk>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5ebd4c22

ksm: cond_resched in unstable tree · d178f27f

由 Hugh Dickins 提交于 11月 09, 2009

KSM needs a cond_resched() for CONFIG_PREEMPT_NONE, in its unbounded
search of the unstable tree.  The stable tree cases already have one,
and originally there was one down inside get_user_pages();
but I missed it when I converted to follow_page() instead.
Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: NIzik Eidus <ieidus@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d178f27f

04 11月, 2009 1 次提交

backing-dev: bdi sb prune should be in the unregister path, not destroy · 8c4db335

由 Jens Axboe 提交于 11月 03, 2009

Commit 592b09a4 was different from
the tested path, in that it moved the bdi super_block prune from
unregister to destroy context. This doesn't fully fix the sync hang
bug on unexpected device removal, as need to prune the bdi cache
pointer before killing flusher thread.
Tested-by: NArtur Skawina <art.08.09@gmail.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

8c4db335

03 11月, 2009 1 次提交

mm: remove incorrect swap_count() from try_to_unuse() · 32c5fc10

由 Bo Liu 提交于 11月 02, 2009

In try_to_unuse(), swcount is a local copy of *swap_map, including the
SWAP_HAS_CACHE bit; but a wrong comparison against swap_count(*swap_map),
which masks off the SWAP_HAS_CACHE bit, succeeded where it should fail.

That had the effect of resetting the mm from which to start searching
for the next swap page, to an irrelevant mm instead of to an mm in which
this swap page had been found: which may increase search time by ~20%.
But we're used to swapoff being slow, so never noticed the slowdown.

Remove that one spurious use of swap_count(): Bo Liu thought it merely
redundant, Hugh rewrote the description since it was measurably wrong.
Signed-off-by: NBo Liu <bo-liu@hotmail.com>
Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

32c5fc10

01 11月, 2009 1 次提交

NOMMU: Don't pass NULL pointers to fput() in do_mmap_pgoff() · 89a86402

由 David Howells 提交于 10月 30, 2009

Don't pass NULL pointers to fput() in the error handling paths of the NOMMU
do_mmap_pgoff() as it can't handle it.

The following can be used as a test program:

	int main() { static long long a[1024 * 1024 * 20] = { 0 }; return a;}

Without the patch, the code oopses in atomic_long_dec_and_test() as called by
fput() after the kernel complains that it can't allocate that big a chunk of
memory.  With the patch, the kernel just complains about the allocation size
and then the program segfaults during execve() as execve() can't complete the
allocation of all the new ELF program segments.
Reported-by: NRobin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NRobin Getz <rgetz@blackfin.uclinux.org>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

89a86402

29 10月, 2009 10 次提交

mm: don't call pte_unmap() against an improper pte · c36987e2

由 Daisuke Nishimura 提交于 10月 26, 2009

There are some places where we do like:

	pte = pte_map();
	do {
		(do break in some conditions)
	} while (pte++, ...);
	pte_unmap(pte - 1);

But if the loop breaks at the first loop, pte_unmap() unmaps invalid pte.

This patch is a fix for this problem.
Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewd-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c36987e2

mm: fix sparsemem configuration · 1a83e175

由 Russell King 提交于 10月 26, 2009

Currently, sparsemem is only available if EXPERIMENTAL is enabled.
However, it hasn't ever been marked experimental.

It's been about four years since sparsemem was merged, and we have
platforms which depend on it; allow architectures to decide whether
sparsemem should be the default memory model.
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1a83e175

vmscan: order evictable rescue in LRU putback · 6a7b9548

由 Johannes Weiner 提交于 10月 26, 2009

Isolators putting a page back to the LRU do not hold the page lock, and if
the page is mlocked, another thread might munlock it concurrently.

Expecting this, the putback code re-checks the evictability of a page when
it just moved it to the unevictable list in order to correct its decision.

The problem, however, is that ordering is not garuanteed between setting
PG_lru when moving the page to the list and checking PG_mlocked
afterwards:

	#0:				#1

	spin_lock()
					if (TestClearPageMlocked())
					  if (PageLRU())
					    move to evictable list
	SetPageLRU()
	spin_unlock()
	if (!PageMlocked())
	  move to evictable list

The PageMlocked() check may get reordered before SetPageLRU() in #0,
resulting in #0 not moving the still mlocked page, and in #1 failing to
isolate and move the page as well.  The page is now stranded on the
unevictable list.

The race condition is very unlikely.  The consequence currently is one
page falling off the reclaim grid and eventually getting freed with
PG_unevictable set, which triggers a warning in the page allocator.

TestClearPageMlocked() in #1 already provides full memory barrier
semantics.

This patch adds an explicit full barrier to force ordering between
SetPageLRU() and PageMlocked() so that either one of the competitors
rescues the page.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6a7b9548

do_mbind(): fix memory leak · b05ca738

由 KOSAKI Motohiro 提交于 10月 26, 2009

If migrate_prep is failed, new variable is leaked.  This patch fixes it.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b05ca738

mbind(): fix leak of never putback pages · ab8a3e14

由 KOSAKI Motohiro 提交于 10月 26, 2009

If mbind() receives an invalid address, do_mbind leaks a page.  The
following test program detects this leak.

This patch fixes it.

migrate_efault.c
=======================================
 #include <numaif.h>
 #include <numa.h>
 #include <sys/mman.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <stdlib.h>
 #include <string.h>

static unsigned long pagesize;

static void* make_hole_mapping(void)
{

	void* addr;

	addr = mmap(NULL, pagesize*3, PROT_READ|PROT_WRITE,
		    MAP_ANON|MAP_PRIVATE, 0, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* make page populate */
	memset(addr, 0, pagesize*3);

	/* make memory hole */
	munmap(addr+pagesize, pagesize);

	return addr;
}

int main(int argc, char** argv)
{
	void* addr;
	int ch;
	int node;
	struct bitmask *nmask = numa_allocate_nodemask();
	int err;
	int node_set = 0;

	while ((ch = getopt(argc, argv, "n:")) != -1){
		switch (ch){
		case 'n':
			node = strtol(optarg, NULL, 0);
			numa_bitmask_setbit(nmask, node);
			node_set = 1;
			break;
		default:
			;
		}
	}
	argc -= optind;
	argv += optind;

	if (!node_set)
		numa_bitmask_setbit(nmask, 0);

	pagesize = getpagesize();

	addr = make_hole_mapping();

	err = mbind(addr, pagesize*3, MPOL_BIND, nmask->maskp, nmask->size, MPOL_MF_MOVE_ALL);
	if (err)
		perror("mbind ");

	return 0;
}
=======================================
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ab8a3e14

vmscan: limit VM_EXEC protection to file pages · 41e20983

由 Wu Fengguang 提交于 10月 26, 2009

It is possible to have !Anon but SwapBacked pages, and some apps could
create huge number of such pages with MAP_SHARED|MAP_ANONYMOUS.  These
pages go into the ANON lru list, and hence shall not be protected: we only
care mapped executable files.  Failing to do so may trigger OOM.
Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41e20983

revert "mm: oom analysis: add buffer cache information to show_free_areas()" · b76146ed

由 Andrew Morton 提交于 10月 26, 2009

Revert

    commit 71de1ccb
    Author:     KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    AuthorDate: Mon Sep 21 17:01:31 2009 -0700
    Commit:     Linus Torvalds <torvalds@linux-foundation.org>
    CommitDate: Tue Sep 22 07:17:27 2009 -0700

        mm: oom analysis: add buffer cache information to show_free_areas()

show_free_areas() is called during page allocation failures, and page
allocation failures can occur in any calling context.

But nr_blockdev_pages() takes VFS locks which should not be taken from
hard IRQ context (at least).  The result is lockdep warnings (and
deadlockability) during page allocation failures.

Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b76146ed

congestion_wait(): don't use WRITE · 58355c78

由 KOSAKI Motohiro 提交于 10月 26, 2009

commit 8aa7e847 (Fix congestion_wait() sync/async vs read/write
confusion) replace WRITE with BLK_RW_ASYNC.  Unfortunately, concurrent mm
development made the unchanged place accidentally.

This patch fixes it too.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

58355c78

hwpoison: fix oops on ksm pages · 92f7ba70

由 Hugh Dickins 提交于 10月 26, 2009

Memory failure on a KSM page currently oopses on its NULL anon_vma in
page_lock_anon_vma(): that may not be much worse than the consequence of
ignoring it, but it is better to be consistent with how ZERO_PAGE and
hugetlb pages and other awkward cases are treated. Just skip it.

We could fix it for 2.6.32 at the KSM end, by putting a dummy anon_vma
pointer in there; but that would get harder next time, when KSM will put a
pointer to something else there (and I'm not currently planning to do any
work to open that up to memory_failure). So I would prefer this simple
PageKsm test, until the other exceptions are handled.
Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92f7ba70

backing-dev: ensure that a removed bdi no longer has super_block referencing it · 592b09a4

由 Jens Axboe 提交于 10月 29, 2009

When the bdi is being removed, we have to ensure that no super_blocks
currently have that cached in sb->s_bdi. Normally this is ensured by
the sb having a longer life span than the bdi, but if the device is
suddenly yanked, we have to kill this reference. sb->s_bdi is pointed
to freed memory at that point.

This fixes a problem with sync(1) hanging when a USB stick is pulled
without cleanly umounting it first.
Reported-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

592b09a4

28 10月, 2009 1 次提交

percpu: allow pcpu_alloc() to be called with IRQs off · 403a91b1

由 Jiri Kosina 提交于 10月 29, 2009

pcpu_alloc() and pcpu_extend_area_map() perform a series of
spin_lock_irq()/spin_unlock_irq() calls, which make them unsafe
with respect to being called from contexts which have IRQs off.

This patch converts the code to perform save/restore of flags instead,
making pcpu_alloc() (or __alloc_percpu() respectively) to be called
from early kernel startup stage, where IRQs are off.

This is needed for proper initialization of per-cpu rq_weight data from
sched_init().

tj: added comment explaining why irqsave/restore is used in alloc path.
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NTejun Heo <tj@kernel.org>

403a91b1

27 10月, 2009 1 次提交

powerpc: Limit memory hotplug support to PPC64 Book-3S machines · ed84a07a

由 Kumar Gala 提交于 10月 16, 2009

Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ed84a07a

19 10月, 2009 4 次提交

HWPOISON: fix invalid page count in printk output · 7456b040

由 Wu Fengguang 提交于 10月 19, 2009

The madvise injector already holds a reference when passing in a page
to the memory-failure code. The code corrects for this additional reference
for its checks, but the final printk output didn't. Fix that.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndi Kleen <ak@linux.intel.com>

7456b040

HWPOISON: fix oops on ksm pages · 01e00f88

由 Hugh Dickins 提交于 10月 13, 2009

Memory failure on a KSM page currently oopses on its NULL anon_vma in
page_lock_anon_vma(): that may not be much worse than the consequence
of ignoring it, but it is better to be consistent with how ZERO_PAGE
and hugetlb pages and other awkward cases are treated. Just skip it.
Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: NAndi Kleen <ak@linux.intel.com>

01e00f88

HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page · 4779cb31

由 Andi Kleen 提交于 10月 14, 2009

When returning due to a poisoned page drop the page count.

It wasn't a fatal problem because noone cares about the page count
on a poisoned page (except when it wraps), but it's cleaner to fix it.

Pointed out by Linus.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>

4779cb31

HWPOISON: return early on non-LRU pages · e43c3afb

由 Wu Fengguang 提交于 9月 29, 2009

Right now we have some trouble with non atomic access
to page flags when locking the page. To plug this hole
for now, limit error recovery to LRU pages for now.

This could be better fixed by defining a suitable protocol,
but let's go this simple way for now

This avoids unnecessary races with __set_page_locked() and
__SetPageSlab*() and maybe more non-atomic page flag operations.

This loses isolated pages which are currently in page reclaim, but these
are relatively limited compared to the total memory.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
[AK: new description, bug fixes, cleanups]

e43c3afb

12 10月, 2009 2 次提交

percpu: fix compile warnings · 1a0c3298

由 Tejun Heo 提交于 10月 04, 2009

Fix the following two compile warnings which show up on i386.

mm/percpu.c:1873: warning: comparison of distinct pointer types lacks a cast
mm/percpu.c:1879: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'size_t'
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>

1a0c3298

headers: remove sched.h from interrupt.h · d43c36dc

由 Alexey Dobriyan 提交于 10月 07, 2009

After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>

d43c36dc

10 10月, 2009 2 次提交

kmemleak: Check for NULL pointer returned by create_object() · 0d5d1aad

由 Catalin Marinas 提交于 10月 09, 2009

This patch adds NULL pointer checking in the early_alloc() function.
Reported-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0d5d1aad

kmemleak: Use GFP_ATOMIC for early_alloc(). · c1bcd6b3

由 Tetsuo Handa 提交于 10月 09, 2009

We can't use GFP_KERNEL inside rcu_read_lock().
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c1bcd6b3

09 10月, 2009 2 次提交

writeback: kill space in debugfs item name · 961515f6

由 Wu Fengguang 提交于 10月 09, 2009

The space is not script friendly, kill it.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

961515f6

writeback: account IO throttling wait as iowait · d25105e8

由 Wu Fengguang 提交于 10月 09, 2009

It makes sense to do IOWAIT when someone is blocked
due to IO throttle, as suggested by Kame and Peter.

There is an old comment for not doing IOWAIT on throttle,
however it has been mismatching the code for a long time.

If we stop accounting IOWAIT for 2.6.32, it could be an
undesirable behavior change. So restore the io_schedule.

CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

d25105e8

08 10月, 2009 3 次提交

mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA · 2dca6999

由 David Miller 提交于 9月 21, 2009

When a vmalloc'd area is mmap'd into userspace, some kind of
co-ordination is necessary for this to work on platforms with cpu
D-caches which can have aliases.

Otherwise kernel side writes won't be seen properly in userspace
and vice versa.

If the kernel side mapping and the user side one have the same
alignment, modulo SHMLBA, this can work as long as VM_SHARED is
shared of VMA and for all current users this is true.  VM_SHARED
will force SHMLBA alignment of the user side mmap on platforms with
D-cache aliasing matters.

The bulk of this patch is just making it so that a specific
alignment can be passed down into __get_vm_area_node().  All
existing callers pass in '1' which preserves existing behavior.
vmalloc_user() gives SHMLBA for the alignment.

As a side effect this should get the video media drivers and other
vmalloc_user() users into more working shape on such systems.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200909211922.n8LJMYjw029425@imap1.linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2dca6999

mm: includecheck fix: vmalloc.c · 3700c155

由 Jaswinder Singh Rajput 提交于 10月 07, 2009

fix the following 'make includecheck' warning:

  mm/vmalloc.c: linux/highmem.h is included more than once.
Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3700c155

ksm: more on default values · c73602ad

由 Hugh Dickins 提交于 10月 07, 2009

Adjust the max_kernel_pages default to a quarter of totalram_pages,
instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from
highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be
pinning 32MB of lowmem with their rmap_items, so no need for the more
obscure calculation (nor for its own special init function).

There is no way for the user to switch KSM on if CONFIG_SYSFS is not
enabled, so in that case default run to KSM_RUN_MERGE.

Update KSM Documentation and Kconfig to reflect the new defaults.
Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c73602ad

02 10月, 2009 5 次提交

memcg: reduce check for softlimit excess · ef8745c1

由 KAMEZAWA Hiroyuki 提交于 10月 01, 2009

In charge/uncharge/reclaim path, usage_in_excess is calculated repeatedly
and it takes res_counter's spin_lock every time.

This patch removes unnecessary calls for res_count_soft_limit_excess.
Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef8745c1

memcg: some modification to softlimit under hierarchical memory reclaim. · 4e649152

由 KAMEZAWA Hiroyuki 提交于 10月 01, 2009

This patch clean up/fixes for memcg's uncharge soft limit path.

Problems:
  Now, res_counter_charge()/uncharge() handles softlimit information at
  charge/uncharge and softlimit-check is done when event counter per memcg
  goes over limit. Now, event counter per memcg is updated only when
  memory usage is over soft limit. Here, considering hierarchical memcg
  management, ancesotors should be taken care of.

  Now, ancerstors(hierarchy) are handled in charge() but not in uncharge().
  This is not good.

  Prolems:
  1. memcg's event counter incremented only when softlimit hits. That's bad.
     It makes event counter hard to be reused for other purpose.

  2. At uncharge, only the lowest level rescounter is handled. This is bug.
     Because ancesotor's event counter is not incremented, children should
     take care of them.

  3. res_counter_uncharge()'s 3rd argument is NULL in most case.
     ops under res_counter->lock should be small. No "if" sentense is better.

Fixes:
  * Removed soft_limit_xx poitner and checks in charge and uncharge.
    Do-check-only-when-necessary scheme works enough well without them.

  * make event-counter of memcg incremented at every charge/uncharge.
    (per-cpu area will be accessed soon anyway)

  * All ancestors are checked at soft-limit-check. This is necessary because
    ancesotor's event counter may never be modified. Then, they should be
    checked at the same time.
Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e649152

memcg: fix refcnt going negative · 26251eaf

由 KAMEZAWA Hiroyuki 提交于 10月 01, 2009

__mem_cgroup_largest_soft_limit_node() returns a mem_cgroup_per_zone "mz"
with incremnted mz->mem->css's refcnt.  Then, the caller of this function
has to call css_put(mz->mem->css).

But, mz can be !NULL even if "not found" i.e.  without css_get().  By
this, css->refcnt will go down to minus.

This may cause various things...one of results will be
initite-loop in css_tryget()  as this.

INFO: RCU detected CPU 0 stall (t=10000 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0:
<snip>

 <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
  [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
  [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
  [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
  [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
  [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
  [<ffffffff81063a26>] update_process_times+0x46/0x70
  [<ffffffff81085930>] tick_sched_timer+0x60/0x160
  [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
  [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
  [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
  [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
  [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
  [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
  <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
  [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
  [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
  [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
  [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
  [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60

Above shows CPU0 caught in css_tryget()'s inifinite loop because
of bad refcnt.

This is a fix to set mz=NULL at the top of retry path.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NPaul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

26251eaf

mm/rmap.c: fix comment · bf89c8c8

由 Huang Shijie 提交于 10月 01, 2009

The page_address_in_vma() is not only used in unuse_vma().
Signed-off-by: NHuang Shijie <shijie8@gmail.com>
Acked-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bf89c8c8

swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL · 3bd0f0c7

由 Suresh Jayaraman 提交于 9月 30, 2009

While testing Swap over NFS patchset, I noticed an oops that was triggered
during swapon. Investigating further, the NULL pointer deference is due to the
SSD device check/optimization in the swapon code that assumes s_bdev could never
be NULL.

inode->i_sb->s_bdev could be NULL in a few cases. For e.g. one such case is
loopback NFS mount, there could be others as well. Fix this by ensuring s_bdev
is not NULL before we try to deference s_bdev.
Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

3bd0f0c7

29 9月, 2009 3 次提交

percpu: make allocation failures more verbose · f2badb0c

由 Tejun Heo 提交于 9月 29, 2009

Warn and dump stack when percpu allocation fails.  percpu allocator is
still young and unchecked NULL percpu pointer usage can result in
random memory corruption when combined with the pointer shifting in
access macros.  Allocation failures should be rare and the warning
message will be disabled after certain times.
Signed-off-by: NTejun Heo <tj@kernel.org>

f2badb0c

percpu: make pcpu_setup_first_chunk() failures more verbose · 635b75fc

由 Tejun Heo 提交于 9月 24, 2009

The parameters to pcpu_setup_first_chunk() come from different sources
depending on architecture and can be quite complex.  The function runs
various sanity checks on the parameters and triggers BUG() if
something isn't right.  However, this is very early during the boot
and not reporting exactly what the problem is makes debugging even
harder.

Add PCPU_SETUP_BUG() macro which prints out enough information about
the parameters.  As the macro still puts separate BUG() for each
check, it won't lose any information even on the situations where only
the program counter can be retrieved.

While at it, also bump pcpu_dump_alloc_info() message to KERN_INFO so
that it's visible on the console if boot fails to complete.
Signed-off-by: NTejun Heo <tj@kernel.org>

635b75fc

percpu: make embedding first chunk allocator check vmalloc space size · 6ea529a2

由 Tejun Heo 提交于 9月 24, 2009

Embedding first chunk allocator maintains the distances between units
in the vmalloc area and thus needs vmalloc space to be larger than the
maximum distances between units; otherwise, it wouldn't be able to
create any dynamic chunks.  This patch makes the embedding first chunk
allocator check vmalloc space size and if the maximum distance between
units is larger than 75% of it, print warning and, if page mapping
allocator is available, fail initialization so that the system falls
back onto it.

This should work around percpu allocation failure problems on certain
sparc64 configurations where distances between NUMA nodes are larger
than the vmalloc area and makes percpu allocator more robust for
future configurations.
Signed-off-by: NTejun Heo <tj@kernel.org>

6ea529a2

Linux-御风守护者 / linux 与 Fork 源项目一致

Linux-御风守护者 / linux
与 Fork 源项目一致