1. 22 3月, 2012 17 次提交
    • D
      mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() · 05af2e10
      David Rientjes 提交于
      sync_mm_rss() can only be used for current to avoid race conditions in
      iterating and clearing its per-task counters.  Remove the task argument
      for it and its helper function, __sync_task_rss_stat(), to avoid thinking
      it can be used safely for anything other than current.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05af2e10
    • D
      hugepages: fix use after free bug in "quota" handling · 90481622
      David Gibson 提交于
      hugetlbfs_{get,put}_quota() are badly named.  They don't interact with the
      general quota handling code, and they don't much resemble its behaviour.
      Rather than being about maintaining limits on on-disk block usage by
      particular users, they are instead about maintaining limits on in-memory
      page usage (including anonymous MAP_PRIVATE copied-on-write pages)
      associated with a particular hugetlbfs filesystem instance.
      
      Worse, they work by having callbacks to the hugetlbfs filesystem code from
      the low-level page handling code, in particular from free_huge_page().
      This is a layering violation of itself, but more importantly, if the
      kernel does a get_user_pages() on hugepages (which can happen from KVM
      amongst others), then the free_huge_page() can be delayed until after the
      associated inode has already been freed.  If an unmount occurs at the
      wrong time, even the hugetlbfs superblock where the "quota" limits are
      stored may have been freed.
      
      Andrew Barry proposed a patch to fix this by having hugepages, instead of
      storing a pointer to their address_space and reaching the superblock from
      there, had the hugepages store pointers directly to the superblock,
      bumping the reference count as appropriate to avoid it being freed.
      Andrew Morton rejected that version, however, on the grounds that it made
      the existing layering violation worse.
      
      This is a reworked version of Andrew's patch, which removes the extra, and
      some of the existing, layering violation.  It works by introducing the
      concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
      finite logical pool of hugepages to allocate from.  hugetlbfs now creates
      a subpool for each filesystem instance with a page limit set, and a
      pointer to the subpool gets added to each allocated hugepage, instead of
      the address_space pointer used now.  The subpool has its own lifetime and
      is only freed once all pages in it _and_ all other references to it (i.e.
      superblocks) are gone.
      
      subpools are optional - a NULL subpool pointer is taken by the code to
      mean that no subpool limits are in effect.
      
      Previous discussion of this bug found in:  "Fix refcounting in hugetlbfs
      quota handling.". See:  https://lkml.org/lkml/2011/8/11/28 or
      http://marc.info/?l=linux-mm&m=126928970510627&w=1
      
      v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
      alloc_huge_page() - since it already takes the vma, it is not necessary.
      Signed-off-by: NAndrew Barry <abarry@cray.com>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90481622
    • D
      hugetlb: cleanup hugetlb.h · a1d776ee
      David Gibson 提交于
      Make a couple of small cleanups to linux/include/hugetlb.h.  The
      set_file_hugepages() function, which was not used anywhere is removed,
      and the hugetlbfs_config and hugetlbfs_inode_info structures with its
      HUGETLBFS_I helper function are moved into inode.c, the only place they
      were used.
      
      These structures are really linked to the hugetlbfs filesystem
      specifically not to hugepage mm handling in general, so they belong in
      the filesystem code not in a generally available header.
      
      It would be nice to move the hugetlbfs_sb_info (superblock) structure in
      there as well, but it's currently needed in a number of places via the
      hstate_vma() and hstate_inode().
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Andrew Barry <abarry@cray.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a1d776ee
    • M
      cpuset: mm: reduce large amounts of memory barrier related damage v3 · cc9a6c87
      Mel Gorman 提交于
      Commit c0ff7453 ("cpuset,mm: fix no node to alloc memory when
      changing cpuset's mems") wins a super prize for the largest number of
      memory barriers entered into fast paths for one commit.
      
      [get|put]_mems_allowed is incredibly heavy with pairs of full memory
      barriers inserted into a number of hot paths.  This was detected while
      investigating at large page allocator slowdown introduced some time
      after 2.6.32.  The largest portion of this overhead was shown by
      oprofile to be at an mfence introduced by this commit into the page
      allocator hot path.
      
      For extra style points, the commit introduced the use of yield() in an
      implementation of what looks like a spinning mutex.
      
      This patch replaces the full memory barriers on both read and write
      sides with a sequence counter with just read barriers on the fast path
      side.  This is much cheaper on some architectures, including x86.  The
      main bulk of the patch is the retry logic if the nodemask changes in a
      manner that can cause a false failure.
      
      While updating the nodemask, a check is made to see if a false failure
      is a risk.  If it is, the sequence number gets bumped and parallel
      allocators will briefly stall while the nodemask update takes place.
      
      In a page fault test microbenchmark, oprofile samples from
      __alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
      actual results were
      
                                   3.3.0-rc3          3.3.0-rc3
                                   rc3-vanilla        nobarrier-v2r1
          Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
          Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
          Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
          Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
          Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
          Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
          Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
          Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
          Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
          Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
          Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
          Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
          Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
          Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
          Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
          MMTests Statistics: duration
          Sys Time Running Test (seconds)             135.68    132.17
          User+Sys Time Running Test (seconds)         164.2    160.13
          Total Elapsed Time (seconds)                123.46    120.87
      
      The overall improvement is small but the System CPU time is much
      improved and roughly in correlation to what oprofile reported (these
      performance figures are without profiling so skew is expected).  The
      actual number of page faults is noticeably improved.
      
      For benchmarks like kernel builds, the overall benefit is marginal but
      the system CPU time is slightly reduced.
      
      To test the actual bug the commit fixed I opened two terminals.  The
      first ran within a cpuset and continually ran a small program that
      faulted 100M of anonymous data.  In a second window, the nodemask of the
      cpuset was continually randomised in a loop.
      
      Without the commit, the program would fail every so often (usually
      within 10 seconds) and obviously with the commit everything worked fine.
      With this patch applied, it also worked fine so the fix should be
      functionally equivalent.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc9a6c87
    • D
      mm, memcg: pass charge order to oom killer · e845e199
      David Rientjes 提交于
      The oom killer typically displays the allocation order at the time of oom
      as a part of its diangostic messages (for global, cpuset, and mempolicy
      ooms).
      
      The memory controller may also pass the charge order to the oom killer so
      it can emit the same information.  This is useful in determining how large
      the memory allocation is that triggered the oom killer.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e845e199
    • K
      mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug · f0cb3c76
      Konstantin Khlebnikov 提交于
      This cpu hotplug hook was accidentally removed in commit 00a62ce9
      ("mm: fix Committed_AS underflow on large NR_CPUS environment")
      
      The visible effect of this accident: some pages are borrowed in per-cpu
      page-vectors.  Truncate can deal with it, but these pages cannot be
      reused while this cpu is offline.  So this is like a temporary memory
      leak.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0cb3c76
    • D
      thp: allow a hwpoisoned head page to be put back to LRU · 385de357
      Dean Nelson 提交于
      Andrea Arcangeli pointed out to me that a check in __memory_failure()
      which was intended to prevent THP tail pages from being checked for the
      absence of the PG_lru flag (something that is always the case), was also
      preventing THP head pages from being checked.
      
      A THP head page could actually benefit from the call to shake_page() by
      ending up being put back to a LRU, provided it had been waiting in a
      pagevec array.
      
      Andrea suggested that the "!PageTransCompound(p)" in the if-statement
      should be replaced by a "!PageTransTail(p)", thus allowing THP head pages
      to be checked and possibly shaken.
      Signed-off-by: NDean Nelson <dnelson@redhat.com>
      Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      385de357
    • D
      mm, oom: force oom kill on sysrq+f · 08ab9b10
      David Rientjes 提交于
      The oom killer chooses not to kill a thread if:
      
       - an eligible thread has already been oom killed and has yet to exit,
         and
      
       - an eligible thread is exiting but has yet to free all its memory and
         is not the thread attempting to currently allocate memory.
      
      SysRq+F manually invokes the global oom killer to kill a memory-hogging
      task.  This is normally done as a last resort to free memory when no
      progress is being made or to test the oom killer itself.
      
      For both uses, we always want to kill a thread and never defer.  This
      patch causes SysRq+F to always kill an eligible thread and can be used to
      force a kill even if another oom killed thread has failed to exit.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08ab9b10
    • S
      procfs: mark thread stack correctly in proc/<pid>/maps · b7643757
      Siddhesh Poyarekar 提交于
      Stack for a new thread is mapped by userspace code and passed via
      sys_clone.  This memory is currently seen as anonymous in
      /proc/<pid>/maps, which makes it difficult to ascertain which mappings
      are being used for thread stacks.  This patch uses the individual task
      stack pointers to determine which vmas are actually thread stacks.
      
      For a multithreaded program like the following:
      
      	#include <pthread.h>
      
      	void *thread_main(void *foo)
      	{
      		while(1);
      	}
      
      	int main()
      	{
      		pthread_t t;
      		pthread_create(&t, NULL, thread_main, NULL);
      		pthread_join(t, NULL);
      	}
      
      proc/PID/maps looks like the following:
      
          00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
          7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0
          7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
          7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
          7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
          7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
          7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
          ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
      
      Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since
      the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but
      that is not always a reliable way to find out which vma is a thread
      stack.  Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same
      content.
      
      With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
      as the task would see it' and hence, only the vma that that task uses as
      stack is marked as [stack].  All other 'stack' vmas are marked as
      anonymous memory.  /proc/PID/maps acts as a thread group level view,
      where all thread stack vmas are marked as [stack:TID] where TID is the
      process ID of the task that uses that vma as stack, while the process
      stack is marked as [stack].
      
      So /proc/PID/maps will look like this:
      
          00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
          7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
          7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
          7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
          7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
          7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
          7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
          ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
      
      Thus marking all vmas that are used as stacks by the threads in the
      thread group along with the process stack.  The task level maps will
      however like this:
      
          00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
          019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
          7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
          7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
          7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
          7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
          7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
          7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
          7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
          7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
          7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0
          7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
          ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
      
      where only the vma that is being used as a stack by *that* task is
      marked as [stack].
      
      Analogous changes have been made to /proc/PID/smaps,
      /proc/PID/numa_maps, /proc/PID/task/TID/smaps and
      /proc/PID/task/TID/numa_maps. Relevant snippets from smaps and
      numa_maps:
      
          [siddhesh@localhost ~ ]$ pgrep a.out
          1441
          [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack"
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack"
          7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack"
          7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
          [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack"
          7f8a44492000 default stack:1442 anon=2 dirty=2 N0=2
          7fff6273a000 default stack anon=3 dirty=3 N0=3
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack"
          7f8a44492000 default stack anon=2 dirty=2 N0=2
          [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack"
          7fff6273a000 default stack anon=3 dirty=3 N0=3
      
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: NSiddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7643757
    • X
      rmap: remove __anon_vma_link() declaration · 978ea78b
      Xiao Guangrong 提交于
      This declaration is not used anymore, remove it.
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      978ea78b
    • K
      mm: replace PAGE_MIGRATION with IS_ENABLED(CONFIG_MIGRATION) · ce1744f4
      Konstantin Khlebnikov 提交于
      Since commit 2a11c8ea ("kconfig: Introduce IS_ENABLED(),
      IS_BUILTIN() and IS_MODULE()") there is a generic grep-friendly method
      for checking config options in C expressions.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce1744f4
    • N
      pagemap: export KPF_THP · e873c49f
      Naoya Horiguchi 提交于
      This flag shows that a given page is a subpage of a transparent hugepage.
      It helps us debug and test the kernel by showing physical address of thp.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e873c49f
    • N
      thp: optimize away unnecessary page table locking · 025c5b24
      Naoya Horiguchi 提交于
      Currently when we check if we can handle thp as it is or we need to split
      it into regular sized pages, we hold page table lock prior to check
      whether a given pmd is mapping thp or not.  Because of this, when it's not
      "huge pmd" we suffer from unnecessary lock/unlock overhead.  To remove it,
      this patch introduces a optimized check function and replace several
      similar logics with it.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      025c5b24
    • R
      vmscan: only defer compaction for failed order and higher · aff62249
      Rik van Riel 提交于
      Currently a failed order-9 (transparent hugepage) compaction can lead to
      memory compaction being temporarily disabled for a memory zone.  Even if
      we only need compaction for an order 2 allocation, eg.  for jumbo frames
      networking.
      
      The fix is relatively straightforward: keep track of the highest order at
      which compaction is succeeding, and only defer compaction for orders at
      which compaction is failing.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aff62249
    • R
      vmscan: kswapd carefully call compaction · 7be62de9
      Rik van Riel 提交于
      With CONFIG_COMPACTION enabled, kswapd does not try to free contiguous
      free pages, even when it is woken for a higher order request.
      
      This could be bad for eg.  jumbo frame network allocations, which are done
      from interrupt context and cannot compact memory themselves.  Higher than
      before allocation failure rates in the network receive path have been
      observed in kernels with compaction enabled.
      
      Teach kswapd to defragment the memory zones in a node, but only if
      required and compaction is not deferred in a zone.
      
      [akpm@linux-foundation.org: reduce scope of zones_need_compaction]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7be62de9
    • R
      mm: make swapin readahead skip over holes · 67f96aa2
      Rik van Riel 提交于
      Ever since abandoning the virtual scan of processes, for scalability
      reasons, swap space has been a little more fragmented than before.  This
      can lead to the situation where a large memory user is killed, swap space
      ends up full of "holes" and swapin readahead is totally ineffective.
      
      On my home system, after killing a leaky firefox it took over an hour to
      page just under 2GB of memory back in, slowing the virtual machines down
      to a crawl.
      
      This patch makes swapin readahead simply skip over holes, instead of
      stopping at them.  This allows the system to swap things back in at rates
      of several MB/second, instead of a few hundred kB/second.
      
      The checks done in valid_swaphandles are already done in
      read_swap_cache_async as well, allowing us to remove a fair amount of
      code.
      
      [akpm@linux-foundation.org: fix it for page_cluster >= 32]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Adrian Drzewiecki <z@drze.net>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      67f96aa2
    • K
      mm: make get_mm_counter static-inline · 69c97823
      Konstantin Khlebnikov 提交于
      Make get_mm_counter() always static inline, it is simple enough for that.
      And remove unused set_mm_counter()
      
      bloat-o-meter:
      
      add/remove: 0/1 grow/shrink: 4/12 up/down: 99/-341 (-242)
      function                                     old     new   delta
      try_to_unmap_one                             886     952     +66
      sys_remap_file_pages                        1214    1230     +16
      dup_mm                                      1684    1700     +16
      do_exit                                     2277    2278      +1
      zap_page_range                               208     205      -3
      unmap_region                                 304     296      -8
      static.oom_kill_process                      554     546      -8
      try_to_unmap_file                           1716    1700     -16
      getrusage                                    925     909     -16
      flush_old_exec                              1704    1688     -16
      static.dump_header                           416     390     -26
      acct_update_integrals                        218     187     -31
      do_task_stat                                2986    2954     -32
      get_mm_counter                                34       -     -34
      xacct_add_tsk                                371     334     -37
      task_statm                                   172     118     -54
      task_mem                                     383     323     -60
      
      try_to_unmap_one() grows because update_hiwater_rss() now completely inline.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69c97823
  2. 20 3月, 2012 10 次提交
  3. 18 3月, 2012 1 次提交
  4. 17 3月, 2012 4 次提交
  5. 16 3月, 2012 3 次提交
  6. 15 3月, 2012 1 次提交
  7. 14 3月, 2012 3 次提交
  8. 13 3月, 2012 1 次提交