1. 25 5月, 2011 2 次提交
  2. 23 3月, 2011 1 次提交
  3. 05 3月, 2011 2 次提交
  4. 01 3月, 2011 1 次提交
  5. 26 2月, 2011 1 次提交
  6. 14 1月, 2011 5 次提交
  7. 03 12月, 2010 1 次提交
  8. 29 10月, 2010 1 次提交
    • E
      numa: fix slab_node(MPOL_BIND) · 800416f7
      Eric Dumazet 提交于
      When a node contains only HighMem memory, slab_node(MPOL_BIND)
      dereferences a NULL pointer.
      
      [ This code seems to go back all the way to commit 19770b32: "mm:
        filter based on a nodemask as well as a gfp_mask".  Which was back in
        April 2008, and it got merged into 2.6.26.  - Linus ]
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      800416f7
  9. 27 10月, 2010 2 次提交
  10. 10 8月, 2010 2 次提交
  11. 30 6月, 2010 1 次提交
  12. 26 5月, 2010 1 次提交
  13. 25 5月, 2010 10 次提交
    • G
      mm: consider the entire user address space during node migration · 6ec3a127
      Greg Thelen 提交于
      Use mm->task_size instead of TASK_SIZE to ensure that the entire user
      address space is migrated.  mm->task_size is independent of the calling
      task context.  TASK SIZE may be dependant on the address space size of the
      calling process.  Usage of TASK_SIZE can lead to partial address space
      migration if the calling process was 32 bit and the migrating process was
      64 bit.
      
      Here is the test script used on 64 system with a 32 bit echo process:
      
        mount -t cgroup none /cgroup -o cpuset
        cd /cgroup
      
        mkdir 0
        echo 1 > 0/cpuset.cpus
        echo 0 > 0/cpuset.mems
        echo 1 > 0/cpuset.memory_migrate
      
        mkdir 1
        echo 1 > 1/cpuset.cpus
        echo 1 > 1/cpuset.mems
        echo 1 > 1/cpuset.memory_migrate
      
        echo $$ > 0/tasks
        64_bit_process &
        pid=$!
      
        echo $pid > 1/tasks   # This does not migrate all process pages without
                              # this patch.  If 64 bit echo is used or this patch is
                              # applied, then the full address space of $pid is
                              # migrated.
      
      To check memory migration, I watched:
        grep MemUsed /sys/devices/system/node/node*/meminfo
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NChristoph Lameter <cl@linux-foundation.org>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ec3a127
    • M
      cpuset,mm: fix no node to alloc memory when changing cpuset's mems · c0ff7453
      Miao Xie 提交于
      Before applying this patch, cpuset updates task->mems_allowed and
      mempolicy by setting all new bits in the nodemask first, and clearing all
      old unallowed bits later.  But in the way, the allocator may find that
      there is no node to alloc memory.
      
      The reason is that cpuset rebinds the task's mempolicy, it cleans the
      nodes which the allocater can alloc pages on, for example:
      
      (mpol: mempolicy)
      	task1			task1's mpol	task2
      	alloc page		1
      	  alloc on node0? NO	1
      				1		change mems from 1 to 0
      				1		rebind task1's mpol
      				0-1		  set new bits
      				0	  	  clear disallowed bits
      	  alloc on node1? NO	0
      	  ...
      	can't alloc page
      	  goto oom
      
      This patch fixes this problem by expanding the nodes range first(set newly
      allowed bits) and shrink it lazily(clear newly disallowed bits).  So we
      use a variable to tell the write-side task that read-side task is reading
      nodemask, and the write-side task clears newly disallowed nodes after
      read-side task ends the current memory allocation.
      
      [akpm@linux-foundation.org: fix spello]
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Paul Menage <menage@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0ff7453
    • M
      mempolicy: restructure rebinding-mempolicy functions · 708c1bbc
      Miao Xie 提交于
      Nick Piggin reported that the allocator may see an empty nodemask when
      changing cpuset's mems[1].  It happens only on the kernel that do not do
      atomic nodemask_t stores.  (MAX_NUMNODES > BITS_PER_LONG)
      
      But I found that there is also a problem on the kernel that can do atomic
      nodemask_t stores.  The problem is that the allocator can't find a node to
      alloc page when changing cpuset's mems though there is a lot of free
      memory.  The reason is like this:
      
      (mpol: mempolicy)
      	task1			task1's mpol	task2
      	alloc page		1
      	  alloc on node0? NO	1
      				1		change mems from 1 to 0
      				1		rebind task1's mpol
      				0-1		  set new bits
      				0	  	  clear disallowed bits
      	  alloc on node1? NO	0
      	  ...
      	can't alloc page
      	  goto oom
      
      I can use the attached program reproduce it by the following step:
      
      # mkdir /dev/cpuset
      # mount -t cpuset cpuset /dev/cpuset
      # mkdir /dev/cpuset/1
      # echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
      # echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
      # echo $$ > /dev/cpuset/1/tasks
      # numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
         <nr_tasks> = max(nr_cpus - 1, 1)
      # killall -s SIGUSR1 cpuset_mem_hog
      # ./change_mems.sh
      
      several hours later, oom will happen though there is a lot of free memory.
      
      This patchset fixes this problem by expanding the nodes range first(set
      newly allowed bits) and shrink it lazily(clear newly disallowed bits).  So
      we use a variable to tell the write-side task that read-side task is
      reading nodemask, and the write-side task clears newly disallowed nodes
      after read-side task ends the current memory allocation.
      
      This patch:
      
      In order to fix no node to alloc memory, when we want to update mempolicy
      and mems_allowed, we expand the set of nodes first (set all the newly
      nodes) and shrink the set of nodes lazily(clean disallowed nodes), But the
      mempolicy's rebind functions may breaks the expanding.
      
      So we restructure the mempolicy's rebind functions and split the rebind
      work to two steps, just like the update of cpuset's mems: The 1st step:
      expand the set of the mempolicy's nodes.  The 2nd step: shrink the set of
      the mempolicy's nodes.  It is used when there is no real lock to protect
      the mempolicy in the read-side.  Otherwise we can do rebind work at once.
      
      In order to implement it, we define
      
      	enum mpol_rebind_step {
      		MPOL_REBIND_ONCE,
      		MPOL_REBIND_STEP1,
      		MPOL_REBIND_STEP2,
      		MPOL_REBIND_NSTEP,
      	};
      
      If the mempolicy needn't be updated by two steps, we can pass
      MPOL_REBIND_ONCE to the rebind functions.  Or we can pass
      MPOL_REBIND_STEP1 to do the first step of the rebind work and pass
      MPOL_REBIND_STEP2 to do the second step work.
      
      Besides that, it maybe long time between these two step and we have to
      release the lock that protects mempolicy and mems_allowed.  If we hold the
      lock once again, we must check whether the current mempolicy is under the
      rebinding (the first step has been done) or not, because the task may
      alloc a new mempolicy when we don't hold the lock.  So we defined the
      following flag to identify it:
      
      #define MPOL_F_REBINDING (1 << 2)
      
      The new functions will be used in the next patch.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Paul Menage <menage@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      708c1bbc
    • L
      mempolicy: factor mpol_shared_policy_init() return paths · 15d77835
      Lee Schermerhorn 提交于
      Factor out duplicate put/frees in mpol_shared_policy_init() to a common
      return path.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15d77835
    • L
      mempolicy: rename policy_types and cleanup initialization · 345ace9c
      Lee Schermerhorn 提交于
      Rename 'policy_types[]' to 'policy_modes[]' to better match the array
      contents.
      
      Use designated intializer syntax for policy_modes[].
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      345ace9c
    • L
      mempolicy: lose unnecessary loop variable in mpol_parse_str() · b4652e84
      Lee Schermerhorn 提交于
      We don't really need the extra variable 'i' in mpol_parse_str().  The only
      use is as the the loop variable.  Then, it's assigned to 'mode'.  Just use
      mode, and loose the 'uninitialized_var()' macro.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4652e84
    • L
      mempolicy: don't call mpol_set_nodemask() when no_context · e17f74af
      Lee Schermerhorn 提交于
      No need to call mpol_set_nodemask() when we have no context for the
      mempolicy.  This can occur when we're parsing a tmpfs 'mpol' mount option.
       Just save the raw nodemask in the mempolicy's w.user_nodemask member for
      use when a tmpfs/shmem file is created.  mpol_shared_policy_init() will
      "contextualize" the policy for the new file based on the creating task's
      context.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e17f74af
    • B
      mempolicy: remove redundant check · 19800502
      Bob Liu 提交于
      Lee's patch "mempolicy: use MPOL_PREFERRED for system-wide default policy"
      has made the MPOL_DEFAULT only used in the memory policy APIs.  So, no
      need to check in __mpol_equal also.  Also get rid of mpol_match_intent()
      and move its logic directly into __mpol_equal().
      Signed-off-by: NBob Liu <lliubbo@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19800502
    • B
      mempolicy: remove case MPOL_INTERLEAVE from policy_zonelist() · 6eb27e1f
      Bob Liu 提交于
      In policy_zonelist() mode MPOL_INTERLEAVE shouldn't happen, so fall
      through to BUG() instead of break to return.  I also fixed the comment.
      Signed-off-by: NBob Liu <lliubbo@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6eb27e1f
    • B
      mempolicy: remove redundant code · 6d556294
      Bob Liu 提交于
      1.  In funtion is_valid_nodemask(), varibable k will be inited to 0 in
         the following loop, needn't init to policy_zone anymore.
      
      2. (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES) has already defined
         to MPOL_MODE_FLAGS in mempolicy.h.
      Signed-off-by: NBob Liu <lliubbo@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d556294
  14. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  15. 25 3月, 2010 5 次提交
  16. 07 3月, 2010 2 次提交
    • K
      mm/mempolicy.c: fix indentation of the comments of do_migrate_pages · da0aa138
      KOSAKI Motohiro 提交于
      Currently, do_migrate_pages() have very long comment and this is not
      indent properly.  I often misunderstand it is function starting commnents
      and confused it.
      
      this patch fixes it.
      
      note: this patch doesn't break 80 column rule. I guess original
            author intended this indentaion, but an accident corrupted it.
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da0aa138
    • K
      mm: fix mbind vma merge problem · 9d8cebd4
      KOSAKI Motohiro 提交于
      Strangely, current mbind() doesn't merge vma with neighbor vma although it's possible.
      Unfortunately, many vma can reduce performance...
      
      This patch fixes it.
      
          reproduced program
          ----------------------------------------------------------------
           #include <numaif.h>
           #include <numa.h>
           #include <sys/mman.h>
           #include <stdio.h>
           #include <unistd.h>
           #include <stdlib.h>
           #include <string.h>
      
          static unsigned long pagesize;
      
          int main(int argc, char** argv)
          {
          	void* addr;
          	int ch;
          	int node;
          	struct bitmask *nmask = numa_allocate_nodemask();
          	int err;
          	int node_set = 0;
          	char buf[128];
      
          	while ((ch = getopt(argc, argv, "n:")) != -1){
          		switch (ch){
          		case 'n':
          			node = strtol(optarg, NULL, 0);
          			numa_bitmask_setbit(nmask, node);
          			node_set = 1;
          			break;
          		default:
          			;
          		}
          	}
          	argc -= optind;
          	argv += optind;
      
          	if (!node_set)
          		numa_bitmask_setbit(nmask, 0);
      
          	pagesize = getpagesize();
      
          	addr = mmap(NULL, pagesize*3, PROT_READ|PROT_WRITE,
          		    MAP_ANON|MAP_PRIVATE, 0, 0);
          	if (addr == MAP_FAILED)
          		perror("mmap "), exit(1);
      
          	fprintf(stderr, "pid = %d \n" "addr = %p\n", getpid(), addr);
      
          	/* make page populate */
          	memset(addr, 0, pagesize*3);
      
          	/* first mbind */
          	err = mbind(addr+pagesize, pagesize, MPOL_BIND, nmask->maskp,
          		    nmask->size, MPOL_MF_MOVE_ALL);
          	if (err)
          		error("mbind1 ");
      
          	/* second mbind */
          	err = mbind(addr, pagesize*3, MPOL_DEFAULT, NULL, 0, 0);
          	if (err)
          		error("mbind2 ");
      
          	sprintf(buf, "cat /proc/%d/maps", getpid());
          	system(buf);
      
          	return 0;
          }
          ----------------------------------------------------------------
      
      result without this patch
      
      	addr = 0x7fe26ef09000
      	[snip]
      	7fe26ef09000-7fe26ef0a000 rw-p 00000000 00:00 0
      	7fe26ef0a000-7fe26ef0b000 rw-p 00000000 00:00 0
      	7fe26ef0b000-7fe26ef0c000 rw-p 00000000 00:00 0
      	7fe26ef0c000-7fe26ef0d000 rw-p 00000000 00:00 0
      
      	=> 0x7fe26ef09000-0x7fe26ef0c000 have three vmas.
      
      result with this patch
      
      	addr = 0x7fc9ebc76000
      	[snip]
      	7fc9ebc76000-7fc9ebc7a000 rw-p 00000000 00:00 0
      	7fffbe690000-7fffbe6a5000 rw-p 00000000	00:00 0	[stack]
      
      	=> 0x7fc9ebc76000-0x7fc9ebc7a000 have only one vma.
      
      [minchan.kim@gmail.com: fix file offset passed to vma_merge()]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d8cebd4
  17. 04 3月, 2010 1 次提交
    • P
      rcu: Suppress __mpol_dup() false positive from RCU lockdep · 99ee4ca7
      Paul E. McKenney 提交于
      Common code is used during task creation and after the task has
      started running.  RCU protection is not needed during task
      creation because no other CPU has access to the
      under-construction task.  Provide the RCU protection anyway to
      suppress the false positive, as there does not appear to be a
      good way for the common code to recognize that the task is only
      accessible to the CPU creating it.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267667418-32233-2-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      99ee4ca7
  18. 16 12月, 2009 1 次提交
    • H
      ksm: memory hotremove migration only · 62b61f61
      Hugh Dickins 提交于
      The previous patch enables page migration of ksm pages, but that soon gets
      into trouble: not surprising, since we're using the ksm page lock to lock
      operations on its stable_node, but page migration switches the page whose
      lock is to be used for that.  Another layer of locking would fix it, but
      do we need that yet?
      
      Do we actually need page migration of ksm pages?  Yes, memory hotremove
      needs to offline sections of memory: and since we stopped allocating ksm
      pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
      candidates for migration.
      
      But KSM is currently unconscious of NUMA issues, happily merging pages
      from different NUMA nodes: at present the rule must be, not to use
      MADV_MERGEABLE where you care about NUMA.  So no, NUMA page migration of
      ksm pages does not make sense yet.
      
      So, to complete support for ksm swapping we need to make hotremove safe.
      ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
      release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE.  But if mapped pages
      are freed before migration reaches them, stable_nodes may be left still
      pointing to struct pages which have been removed from the system: the
      stable_node needs to identify a page by pfn rather than page pointer, then
      it can safely prune them when MEM_OFFLINE.
      
      And make NUMA migration skip PageKsm pages where it skips PageReserved.
      But it's only when we reach unmap_and_move() that the page lock is taken
      and we can be sure that raised pagecount has prevented a PageAnon from
      being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
      page when offlining (has sufficient locking) but reject it otherwise.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62b61f61