1. 16 12月, 2009 5 次提交
    • L
      hugetlb: add nodemask arg to huge page alloc, free and surplus adjust functions · 6ae11b27
      Lee Schermerhorn 提交于
      In preparation for constraining huge page allocation and freeing by the
      controlling task's numa mempolicy, add a "nodes_allowed" nodemask pointer
      to the allocate, free and surplus adjustment functions.  For now, pass
      NULL to indicate default behavior--i.e., use node_online_map.  A
      subsqeuent patch will derive a non-default mask from the controlling
      task's numa mempolicy.
      
      Note that this method of updating the global hstate nr_hugepages under the
      constraint of a nodemask simplifies keeping the global state
      consistent--especially the number of persistent and surplus pages relative
      to reservations and overcommit limits.  There are undoubtedly other ways
      to do this, but this works for both interfaces: mempolicy and per node
      attributes.
      
      [rientjes@google.com: fix HIGHMEM compile error]
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NAndi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ae11b27
    • L
      hugetlb: rework hstate_next_node_* functions · 9a76db09
      Lee Schermerhorn 提交于
      Modify the hstate_next_node* functions to allow them to be called to
      obtain the "start_nid".  Then, whereas prior to this patch we
      unconditionally called hstate_next_node_to_{alloc|free}(), whether or not
      we successfully allocated/freed a huge page on the node, now we only call
      these functions on failure to alloc/free to advance to next allowed node.
      
      Factor out the next_node_allowed() function to handle wrap at end of
      node_online_map.  In this version, the allowed nodes include all of the
      online nodes.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NAndi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a76db09
    • K
      mm: move inc_zone_page_state(NR_ISOLATED) to just isolated place · 6d9c285a
      KOSAKI Motohiro 提交于
      Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
      in right after isolate_page().
      
      This patch does it.
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d9c285a
    • K
      mmap: don't return ENOMEM when mapcount is temporarily exceeded in munmap() · 659ace58
      KOSAKI Motohiro 提交于
      On ia64, the following test program exit abnormally, because glibc thread
      library called abort().
      
       ========================================================
       (gdb) bt
       #0  0xa000000000010620 in __kernel_syscall_via_break ()
       #1  0x20000000003208e0 in raise () from /lib/libc.so.6.1
       #2  0x2000000000324090 in abort () from /lib/libc.so.6.1
       #3  0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0
       #4  0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0
       #5  0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1
       ========================================================
      
      The fact is, glibc call munmap() when thread exitng time for freeing
      stack, and it assume munlock() never fail.  However, munmap() often make
      vma splitting and it with many mapcount make -ENOMEM.
      
      Oh well, that's crazy, because stack unmapping never increase mapcount.
      The maxcount exceeding is only temporary.  internal temporary exceeding
      shouldn't make ENOMEM.
      
      This patch does it.
      
       test_max_mapcount.c
       ==================================================================
        #include<stdio.h>
        #include<stdlib.h>
        #include<string.h>
        #include<pthread.h>
        #include<errno.h>
        #include<unistd.h>
      
        #define THREAD_NUM 30000
        #define MAL_SIZE (8*1024*1024)
      
       void *wait_thread(void *args)
       {
       	void *addr;
      
       	addr = malloc(MAL_SIZE);
       	sleep(10);
      
       	return NULL;
       }
      
       void *wait_thread2(void *args)
       {
       	sleep(60);
      
       	return NULL;
       }
      
       int main(int argc, char *argv[])
       {
       	int i;
       	pthread_t thread[THREAD_NUM], th;
       	int ret, count = 0;
       	pthread_attr_t attr;
      
       	ret = pthread_attr_init(&attr);
       	if(ret) {
       		perror("pthread_attr_init");
       	}
      
       	ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
       	if(ret) {
       		perror("pthread_attr_setdetachstate");
       	}
      
       	for (i = 0; i < THREAD_NUM; i++) {
       		ret = pthread_create(&th, &attr, wait_thread, NULL);
       		if(ret) {
       			fprintf(stderr, "[%d] ", count);
       			perror("pthread_create");
       		} else {
       			printf("[%d] create OK.\n", count);
       		}
       		count++;
      
       		ret = pthread_create(&thread[i], &attr, wait_thread2, NULL);
       		if(ret) {
       			fprintf(stderr, "[%d] ", count);
       			perror("pthread_create");
       		} else {
       			printf("[%d] create OK.\n", count);
       		}
       		count++;
       	}
      
       	sleep(3600);
       	return 0;
       }
       ==================================================================
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      659ace58
    • D
      oom: dump stack and VM state when oom killer panics · 1b604d75
      David Rientjes 提交于
      The oom killer header, including information such as the allocation order
      and gfp mask, current's cpuset and memory controller, call trace, and VM
      state information is currently only shown when the oom killer has selected
      a task to kill.
      
      This information is omitted, however, when the oom killer panics either
      because of panic_on_oom sysctl settings or when no killable task was
      found.  It is still relevant to know crucial pieces of information such as
      the allocation order and VM state when diagnosing such issues, especially
      at boot.
      
      This patch displays the oom killer header whenever it panics so that bug
      reports can include pertinent information to debug the issue, if possible.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b604d75
  2. 12 12月, 2009 1 次提交
    • H
      mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe · b9255850
      H. Peter Anvin 提交于
      Slightly adjust the logic for determining the size of the
      copy_form_user() in do_pages_stat(); with this change, gcc can see
      that the copying is safe.
      
      Without this, we get a build error for i386 allyesconfig:
      
      /home/hpa/kernel/linux-2.6-tip.urgent/arch/x86/include/asm/uaccess_32.h:213:
      error: call to ‘copy_from_user_overflow’ declared with attribute
      error: copy_from_user() buffer size is not provably correct
      
      Unlike an earlier patch from Arjan, this doesn't introduce new
      variables; merely reshuffles the compare so that gcc can see that an
      overflow cannot happen.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      LKML-Reference: <20090926205406.30d55b08@infradead.org>
      b9255850
  3. 11 12月, 2009 12 次提交
  4. 10 12月, 2009 1 次提交
  5. 06 12月, 2009 3 次提交
  6. 04 12月, 2009 2 次提交
  7. 03 12月, 2009 2 次提交
  8. 01 12月, 2009 1 次提交
    • P
      SLAB: Fix lockdep annotations for CPU hotplug · ce79ddc8
      Pekka Enberg 提交于
      As reported by Paul McKenney:
      
        I am seeing some lockdep complaints in rcutorture runs that include
        frequent CPU-hotplug operations.  The tests are otherwise successful.
        My first thought was to send a patch that gave each array_cache
        structure's ->lock field its own struct lock_class_key, but you already
        have a init_lock_keys() that seems to be intended to deal with this.
      
        ------------------------------------------------------------------------
      
        =============================================
        [ INFO: possible recursive locking detected ]
        2.6.32-rc4-autokern1 #1
        ---------------------------------------------
        syslogd/2908 is trying to acquire lock:
         (&nc->lock){..-...}, at: [<c0000000001407f4>] .kmem_cache_free+0x118/0x2d4
      
        but task is already holding lock:
         (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324
      
        other info that might help us debug this:
        3 locks held by syslogd/2908:
         #0:  (&u->readlock){+.+.+.}, at: [<c0000000004556f8>] .unix_dgram_recvmsg+0x70/0x338
         #1:  (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324
         #2:  (&parent->list_lock){-.-...}, at: [<c000000000140f64>] .__drain_alien_cache+0x50/0xb8
      
        stack backtrace:
        Call Trace:
        [c0000000e8ccafc0] [c0000000000101e4] .show_stack+0x70/0x184 (unreliable)
        [c0000000e8ccb070] [c0000000000afebc] .validate_chain+0x6ec/0xf58
        [c0000000e8ccb180] [c0000000000b0ff0] .__lock_acquire+0x8c8/0x974
        [c0000000e8ccb280] [c0000000000b2290] .lock_acquire+0x140/0x18c
        [c0000000e8ccb350] [c000000000468df0] ._spin_lock+0x48/0x70
        [c0000000e8ccb3e0] [c0000000001407f4] .kmem_cache_free+0x118/0x2d4
        [c0000000e8ccb4a0] [c000000000140b90] .free_block+0x130/0x1a8
        [c0000000e8ccb540] [c000000000140f94] .__drain_alien_cache+0x80/0xb8
        [c0000000e8ccb5e0] [c0000000001411e0] .kfree+0x214/0x324
        [c0000000e8ccb6a0] [c0000000003ca860] .skb_release_data+0xe8/0x104
        [c0000000e8ccb730] [c0000000003ca2ec] .__kfree_skb+0x20/0xd4
        [c0000000e8ccb7b0] [c0000000003cf2c8] .skb_free_datagram+0x1c/0x5c
        [c0000000e8ccb830] [c00000000045597c] .unix_dgram_recvmsg+0x2f4/0x338
        [c0000000e8ccb920] [c0000000003c0f14] .sock_recvmsg+0xf4/0x13c
        [c0000000e8ccbb30] [c0000000003c28ec] .SyS_recvfrom+0xb4/0x130
        [c0000000e8ccbcb0] [c0000000003bfb78] .sys_recv+0x18/0x2c
        [c0000000e8ccbd20] [c0000000003ed388] .compat_sys_recv+0x14/0x28
        [c0000000e8ccbd90] [c0000000003ee1bc] .compat_sys_socketcall+0x178/0x220
        [c0000000e8ccbe30] [c0000000000085d4] syscall_exit+0x0/0x40
      
      This patch fixes the issue by setting up lockdep annotations during CPU
      hotplug.
      Reported-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      ce79ddc8
  9. 29 11月, 2009 1 次提交
  10. 26 11月, 2009 2 次提交
    • I
      tracing: Fix kmem event exports · 4d795fb1
      Ingo Molnar 提交于
      Commit 53d0422c ("tracing: Convert some kmem events to DEFINE_EVENT")
      moved the kmem tracepoint creation from util.c to page_alloc.c,
      but forgot to move the exports.
      
      Move them back.
      
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4d795fb1
    • L
      tracing: Convert some kmem events to DEFINE_EVENT · 53d0422c
      Li Zefan 提交于
      Use DECLARE_EVENT_CLASS to remove duplicate code:
      
         text    data     bss     dec     hex filename
       333987   69800   27228  431015   693a7 mm/built-in.o.old
       330030   69800   27228  427058   68432 mm/built-in.o
      
      8 events are converted:
      
        kmem_alloc: kmalloc, kmem_cache_alloc
        kmem_alloc_node: kmalloc_node, kmem_cache_alloc_node
        kmem_free: kfree, kmem_cache_free
        mm_page: mm_page_alloc_zone_locked, mm_page_pcpu_drain
      
      No change in functionality.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      53d0422c
  11. 25 11月, 2009 1 次提交
    • V
      percpu: Fix kdump failure if booted with percpu_alloc=page · 3b034b0d
      Vivek Goyal 提交于
      o kdump functionality reserves a per cpu area at boot time and exports the
        physical address of that area to user space through sys interface. This
        area stores some dump related information like cpu register states etc
        at the time of crash.
      
      o We were assuming that per cpu area always come from linearly mapped meory
        region and using __pa() to determine physical address.
        With percpu_alloc=page, per cpu area can come from vmalloc region also and
        __pa() breaks.
      
      o This patch implments a new function to convert per cpu address to
        physical address.
      
      Before the patch, crash_notes addresses looked as follows.
      
      cpu0 60fffff49800
      cpu1 60fffff60800
      cpu2 60fffff77800
      
      These are bogus phsyical addresses.
      
      After the patch, address are following.
      
      cpu0 13eb44000
      cpu1 13eb43000
      cpu2 13eb42000
      cpu3 13eb41000
      
      These look fine. I got 4G of memory and /proc/iomem tell me following.
      
      100000000-13fffffff : System RAM
      
      tj: * added missing asm/io.h include reported by Stephen Rothwell
          * repositioned per_cpu_ptr_phys() in percpu.c and added comment.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      3b034b0d
  12. 18 11月, 2009 2 次提交
  13. 12 11月, 2009 5 次提交
  14. 10 11月, 2009 2 次提交