1. 23 Jun 2011, 1 commit
  2. 02 Jun 2011, 1 commit
  3. 30 May 2011, 1 commit
  4. 29 May 2011, 4 commits
    • mm: fix page_lock_anon_vma leaving mutex locked · eee0f252
      Hugh Dickins authored
      On one machine I've been getting hangs, a page fault's anon_vma_prepare()
      waiting in anon_vma_lock(), other processes waiting for that page's lock.
      
      This is a replay of last year's f1819427 "mm: fix hang on
      anon_vma->root->lock".
      
      The new page_lock_anon_vma() places too much faith in its refcount: when
      it has acquired the mutex_trylock(), it's possible that a racing task in
      anon_vma_alloc() has just reallocated the struct anon_vma, set refcount
      to 1, and is about to reset its anon_vma->root.
      
      Fix this by saving anon_vma->root, and relying on the usual page_mapped()
      check instead of a refcount check: if page is still mapped, the anon_vma
      is still ours; if page is not still mapped, we're no longer interested.
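
      A minimal sketch of that ordering, assuming a simplified helper name (the
      real function also keeps a slow path with the refcount and a sleeping
      lock, omitted here):

      struct anon_vma *page_lock_anon_vma_sketch(struct page *page)
      {
      	struct anon_vma *anon_vma, *root;
      	unsigned long mapping;

      	rcu_read_lock();
      	mapping = (unsigned long)ACCESS_ONCE(page->mapping);
      	if ((mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON ||
      	    !page_mapped(page))
      		goto out;

      	anon_vma = (struct anon_vma *)(mapping - PAGE_MAPPING_ANON);
      	root = ACCESS_ONCE(anon_vma->root);	/* save root before anything else */
      	if (!mutex_trylock(&root->mutex))
      		goto out;			/* sleeping slow path omitted here */

      	/* page_mapped(), not the refcount, decides whether anon_vma is still ours */
      	if (!page_mapped(page)) {
      		mutex_unlock(&root->mutex);
      		goto out;
      	}
      	rcu_read_unlock();
      	return anon_vma;
      out:
      	rcu_read_unlock();
      	return NULL;
      }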
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eee0f252
    • mm: fix kernel BUG at mm/rmap.c:1017! · 5dbe0af4
      Hugh Dickins authored
      I've hit the "address >= vma->vm_end" check in do_page_add_anon_rmap()
      just once.  The stack showed khugepaged allocation trying to compact
      pages: the call to page_add_anon_rmap() coming from remove_migration_pte().
      
      That path holds anon_vma lock, but does not hold mmap_sem: it can
      therefore race with a split_vma(), and in commit 5f70b962 "mmap:
      avoid unnecessary anon_vma lock" we just took away the anon_vma lock
      protection when adjusting vma->vm_end.
      
      I don't think that particular BUG_ON ever caught anything interesting, so
      it is better to replace it with a comment than to reinstate the anon_vma
      locking.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5dbe0af4
    • tmpfs: fix race between truncate and writepage · 826267cf
      Hugh Dickins authored
      While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
      (interruptibly), repeatedly failing to locate the owner of a 0xff entry in
      the swap_map.
      
      Although shmem_writepage() does give up when it sees that the incoming
      page index is beyond eof, there was still a window in which
      shmem_truncate_range() could come in between writepage's dropping of the
      lock and its updating of the swap_map, find the half-completed swap_map
      entry, and, in trying to free it, leave it in a state that
      swap_shmem_alloc() could not correct.
      
      Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
      of the different cases, but easiest to fix by moving swap_shmem_alloc()
      under cover of the lock.
      
      More interesting than the bug: it's been there since 2.6.33, why could
      I not see it with earlier kernels?  The mmotm of two weeks ago seems to
      have some magic for generating races, this is just one of three I found.
      
      With yesterday's git I first saw this in mainline, bisected in search of
      that magic, but the easy reproducibility evaporated.  Oh well, fix the bug.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      826267cf
    • Cache xattr security drop check for write v2 · 69b45732
      Andi Kleen authored
      Some recent benchmarking on btrfs showed that a major scaling bottleneck
      on large systems on btrfs is currently the xattr lookup on every write.
      
      Why xattr lookup on every write I hear you ask?
      
      write wants to drop suid and security-related xattrs that could set
      capabilities on executables.  To do that it currently looks up
      security.capability on EVERY write (even for non-executables) to decide
      whether to drop it or not.
      
      In btrfs this causes an additional tree walk, hitting some per-filesystem
      locks and showing quite bad scalability.  In a simple read workload on an
      8-socket system I saw over 90% of CPU time in spinlocks related to that.
      
      Chris Mason tells me this is also a problem in ext4, where it hits
      the global mbcache lock.
      
      This patch adds a simple per-inode flag to avoid this problem.  We do the
      lookup only once per file and, if there is no such xattr, cache the
      decision.  All xattr changes clear the flag.
      
      I also used the same flag to avoid the suid check, although
      that one is pretty cheap.
      
      A file system can also set this flag when it creates the inode, if it has
      a cheap way to do so.  This is done for some common file systems in
      follow-on patches.
      
      With this patch a major part of the lock contention disappears
      for btrfs. Some testing on smaller systems didn't show significant
      performance changes, but at least it helps the larger systems
      and is generally more efficient.
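
      A hedged sketch of the fast path, with an illustrative flag name (the
      actual flag and helper introduced by the patch may be named differently):

      static int file_remove_suid_cached(struct dentry *dentry)
      {
      	struct inode *inode = dentry->d_inode;

      	/* cached "nothing to drop here": skip the expensive xattr lookup */
      	if (inode->i_flags & S_NOSEC_CACHED)
      		return 0;

      	/* slow path: do the real suid/sgid + security.capability check;
      	 * if nothing had to be dropped, remember that: */
      	inode->i_flags |= S_NOSEC_CACHED;	/* cleared on any xattr change */
      	return 0;
      }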
      
      v2: Rename is_sgid. add file system helper.
      Cc: chris.mason@oracle.com
      Cc: josef@redhat.com
      Cc: viro@zeniv.linux.org.uk
      Cc: agruen@linbit.com
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      69b45732
  5. 28 May 2011, 1 commit
  6. 27 May 2011, 16 commits
    • memcg: add the pagefault count into memcg stats · 456f998e
      Ying Han authored
      Add two new stats to per-memcg memory.stat which track the number of page
      faults and the number of major page faults.
      
        "pgfault"
        "pgmajfault"
      
      They are different from the "pgpgin"/"pgpgout" stats, which count the
      number of pages charged to or uncharged from the cgroup and say nothing
      about reading or writing pages to disk.
      
      It is valuable to track these two stats both for measuring an
      application's performance and for gauging the efficiency of the kernel
      page reclaim path.  Counting page faults per process is useful, but we
      also need the aggregated value, since processes are monitored and
      controlled on a per-cgroup basis in memcg.
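
      A hedged sketch of the accounting step (the helper names here are
      illustrative); the fault handler charges the event to the memcg of the
      faulting mm:

      static void count_memcg_pgfault_sketch(struct mm_struct *mm, bool major)
      {
      	struct mem_cgroup *memcg;

      	rcu_read_lock();
      	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
      	if (memcg)
      		/* bumps the per-cpu event counter summed up in memory.stat */
      		memcg_event_inc_sketch(memcg, major ? PGMAJFAULT : PGFAULT);
      	rcu_read_unlock();
      }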
      
      Functional test: check the total number of pgfault/pgmajfault of all
      memcgs and compare with global vmstat value:
      
        $ cat /proc/vmstat | grep fault
        pgfault 1070751
        pgmajfault 553
      
        $ cat /dev/cgroup/memory.stat | grep fault
        pgfault 1071138
        pgmajfault 553
        total_pgfault 1071142
        total_pgmajfault 553
      
        $ cat /dev/cgroup/A/memory.stat | grep fault
        pgfault 199
        pgmajfault 0
        total_pgfault 199
        total_pgmajfault 0
      
      Performance test: run the page fault test (pft) with 16 threads, faulting
      in 15G of anon pages in a 16G container.  There is no regression noticed
      in "flt/cpu/s".
      
      Sample output from pft:
      
        TAG pft:anon-sys-default:
          Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
          15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260
      
        +-------------------------------------------------------------------------+
            N           Min           Max        Median           Avg        Stddev
        x  10     16682.962     17344.027     16913.524     16928.812      166.5362
        +  10     16695.568     16923.896     16820.604     16824.652     84.816568
        No difference proven at 95.0% confidence
      
      [akpm@linux-foundation.org: fix build]
      [hughd@google.com: shmem fix]
      Signed-off-by: Ying Han <yinghan@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      456f998e
    • memcg: add memory.numastat api for numa statistics · 406eb0c9
      Ying Han authored
      The new API exports numa_maps on a per-memcg basis.  It is a useful piece
      of information, showing the per-memcg page distribution across physical
      NUMA nodes.
      
      One of the use cases is evaluating application performance by combining
      this information with the cpu allocation of the application.
      
      The output of memory.numa_stat tries to follow a format similar to
      numa_maps:
      
        total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
        file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
        anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
        unevictable=<total unevictable pages> N0=<node 0 pages> N1=<node 1 pages> ...
      
      And we have per-node:
      
        total = file + anon + unevictable
      
        $ cat /dev/cgroup/memory/memory.numa_stat
        total=250020 N0=87620 N1=52367 N2=45298 N3=64735
        file=225232 N0=83402 N1=46160 N2=40522 N3=55148
        anon=21053 N0=3424 N1=6207 N2=4776 N3=6646
        unevictable=3735 N0=794 N1=0 N2=0 N3=2941
      Signed-off-by: Ying Han <yinghan@google.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      406eb0c9
    • memcg: rename mem_cgroup_zone_nr_pages() to mem_cgroup_zone_nr_lru_pages() · 1bac180b
      Ying Han authored
      The caller of this function has already been renamed to
      zone_nr_lru_pages(); this is just the matching fix-up in the memcg code.
      The current name is easily misread as the zone's total number of pages.
      Signed-off-by: Ying Han <yinghan@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1bac180b
    • memcg: remove unused retry signal from reclaim · 4fd14ebf
      Johannes Weiner authored
      If the memcg reclaim code detects the target memcg below its limit it
      exits and returns a guaranteed non-zero value so that the charge is
      retried.
      
      Nowadays, the charge side checks the memcg limit itself and does not rely
      on this non-zero return value trick.
      
      This patch removes it.  The reclaim code will now always return the true
      number of pages it reclaimed on its own.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Ying Han <yinghan@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4fd14ebf
    • memcg: fix get_scan_count() for small targets · 246e87a9
      KAMEZAWA Hiroyuki authored
      During memory reclaim we determine the number of pages to be scanned per
      zone as
      
      	(anon + file) >> priority.
      Assume
      	scan = (anon + file) >> priority.
      
      If scan < SWAP_CLUSTER_MAX, the scan will be skipped for this time and
      priority gets higher.  This has some problems.
      
        1. The priority is increased by 1 without any scan having taken place.
           To do a scan at this priority, the amount of pages would have to be
           larger than 512M.  If pages >> priority < SWAP_CLUSTER_MAX, the scan is
           recorded and batched later (but we lose one priority level).  If the
           memory size is below 16M, pages >> priority is 0 and there is no scan
           at DEF_PRIORITY, ever.
      
        2. If zone->all_unreclaimable == true, the zone is scanned only when
           priority == 0.  So x86's ZONE_DMA will never be recovered until the
           user of those pages frees memory by itself.
      
        3. With memcg, the memory limit can be small.  A small memcg very easily
           drops to priority < DEF_PRIORITY-2 and then needs to call
           wait_iff_congested().  To do a scan before priority=9, 64MB of memory
           would have to be in use.
      
      This patch therefore forces a scan of SWAP_CLUSTER_MAX pages when
      
        1. the target is small enough, and
        2. the reclaim is being done by kswapd or by memcg.
      
      Then we can avoid the rapid priority drop and may be able to recover
      all_unreclaimable in small zones.  The patch also removes nr_saved_scan,
      since scanning is now allowed at this priority even when pages >> priority
      is very small.
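
      A hedged sketch of the rounding rule (not the full get_scan_count()
      logic):

      static unsigned long nr_to_scan_sketch(unsigned long lru_pages, int priority,
      				       bool force_scan)	/* kswapd or memcg reclaim */
      {
      	unsigned long scan = lru_pages >> priority;

      	if (scan < SWAP_CLUSTER_MAX && force_scan)
      		scan = SWAP_CLUSTER_MAX;	/* scan something even for tiny targets */
      	return scan;
      }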
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Ying Han <yinghan@google.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      246e87a9
    • memcg: reclaim memory from nodes in round-robin order · 889976db
      Ying Han authored
      Presently, memory cgroup's direct reclaim frees memory from the current
      node.  But this has some problems.  Usually, when a set of threads works
      in a cooperative way, they tend to operate on the same node.  So if they
      hit their memcg limit, they will reclaim memory from themselves, damaging
      the active working set.
      
      For example, assume a 2-node system with Node 0 and Node 1, and a memcg
      with a 1G limit.  After some work, file cache remains and the usage is
      
         Node 0:  1M
         Node 1:  998M.
      
      If an application then runs on Node 0, it will eat its own foot (reclaim
      its own working set) before freeing the unnecessary file cache.
      
      This patch adds round-robin for NUMA and adds equal pressure to each node.
      When using cpuset's spread memory feature, this will work very well.
      
      But yes, a better algorithm is needed.
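
      A hedged sketch of the round-robin selection (field and helper names are
      illustrative):

      static int memcg_select_victim_node_sketch(struct mem_cgroup *memcg)
      {
      	int node = next_node(memcg->last_scanned_node, node_states[N_HIGH_MEMORY]);

      	if (node == MAX_NUMNODES)	/* wrapped: start over from the first node */
      		node = first_node(node_states[N_HIGH_MEMORY]);

      	memcg->last_scanned_node = node;
      	return node;			/* reclaim starts from this node this time */
      }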
      
      [akpm@linux-foundation.org: comment editing]
      [kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons]
      Signed-off-by: Ying Han <yinghan@google.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      889976db
    • memcg: move page-freeing code out of lock · 6a5b18d2
      Namhyung Kim authored
      Move the page-freeing code out of swap_cgroup_mutex in the hope that it
      could reduce a few of the theoretical contentions between swapons and/or
      swapoffs.
      
      This is just a cleanup, no functional changes.
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a5b18d2
    • memcg: fix off-by-one when calculating swap cgroup map length · 33278f7f
      Namhyung Kim authored
      It allocated one more page than necessary if @max_pages was a multiple of
      SC_PER_PAGE.
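
      A hedged illustration of the fix: round up only when there is a
      remainder, instead of always adding one page.

      static unsigned long swap_cgroup_map_pages_sketch(unsigned long max_pages)
      {
      	/* old: max_pages / SC_PER_PAGE + 1 -- one page too many when it divides evenly */
      	return DIV_ROUND_UP(max_pages, SC_PER_PAGE);
      }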
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33278f7f
    • memcg: mark init_section_page_cgroup() properly · 268433b8
      Namhyung Kim authored
      Commit ca371c0d ("memcg: fix page_cgroup fatal error in FLATMEM") removed
      the call to alloc_bootmem() from this function, so it can now be marked
      __meminit to reduce memory usage when MEMORY_HOTPLUG=n.
      
      Also, as the new helper function alloc_page_cgroup() is called only from
      this function, it should be marked the same way.
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      268433b8
    • memcg: remove pointless next_mz nullification in mem_cgroup_soft_limit_reclaim() · 39cc98f1
      Michal Hocko authored
      next_mz is assigned to NULL if __mem_cgroup_largest_soft_limit_node
      selects the same mz.  This doesn't make much sense, as we assign to the
      variable right at the start of the next loop iteration.
      
      The compiler will probably optimize this out, but it is a little confusing
      to read.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      39cc98f1
    • memcg: add the soft_limit reclaim in global direct reclaim. · d149e3b2
      Ying Han authored
      We recently added a change to global background reclaim which counts the
      return value of soft_limit reclaim.  Now this patch adds similar logic to
      global direct reclaim.
      
      We should skip scanning global LRU on shrink_zone if soft_limit reclaim
      does enough work.  This is the first step where we start with counting the
      nr_scanned and nr_reclaimed from soft_limit reclaim into global
      scan_control.
      Signed-off-by: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d149e3b2
    • memcg: count the soft_limit reclaim in global background reclaim · 0ae5e89c
      Ying Han authored
      The global kswapd scans per-zone LRU and reclaims pages regardless of the
      cgroup. It breaks memory isolation since one cgroup can end up reclaiming
      pages from another cgroup. Instead we should rely on memcg-aware target
      reclaim including per-memcg kswapd and soft_limit hierarchical reclaim under
      memory pressure.
      
      In the global background reclaim, we do soft reclaim before scanning the
      per-zone LRU. However, the return value is ignored. This patch is the first
      step to skip shrink_zone() if soft_limit reclaim does enough work.
      
      This is part of the effort to reduce reclaiming pages from the global LRU
      under memcg.  The per-memcg background reclaim patchset further enhances
      the per-cgroup targeted reclaim, for which I should have V4 posted shortly.
      
      Try running multiple memory-intensive workloads within separate memcgs.
      Watch the soft_steal counters in memory.stat:
      
        $ cat /dev/cgroup/A/memory.stat | grep 'soft'
        soft_steal 240000
        soft_scan 240000
        total_soft_steal 240000
        total_soft_scan 240000
      
      This patch:
      
      In the global background reclaim, we do soft reclaim before scanning the
      per-zone LRU.  However, the return value is ignored.
      
      We would like to skip shrink_zone() if soft_limit reclaim does enough
      work.  Also, we need to keep the memory pressure balanced across per-memcg
      zones, like the core VM logic does.  This patch is the first step: we
      start by counting the nr_scanned and nr_reclaimed from soft_limit reclaim
      into the global scan_control.
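
      A hedged sketch of how the counting could sit in the per-zone loop of the
      reclaim path (the exact signature of the soft-limit helper may differ):

      	unsigned long nr_soft_reclaimed;
      	unsigned long nr_soft_scanned = 0;

      	nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone, sc->order,
      							  sc->gfp_mask,
      							  &nr_soft_scanned);
      	sc->nr_reclaimed += nr_soft_reclaimed;	/* return value no longer ignored */
      	sc->nr_scanned   += nr_soft_scanned;	/* scanning is accounted as well */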
      Signed-off-by: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ae5e89c
    • cgroups: add per-thread subsystem callbacks · f780bdb7
      Ben Blum authored
      Add cgroup subsystem callbacks for per-thread attachment in atomic contexts
      
      Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
      for cgroups's subsystem interface.  Unlike can_attach and attach, these
      are for per-thread operations, to be called potentially many times when
      attaching an entire threadgroup.
      
      Also, the old "bool threadgroup" interface is removed, as replaced by
      this.  All subsystems are modified for the new interface - of note is
      cpuset, which requires from/to nodemasks for attach to be globally scoped
      (though per-cpuset would work too) to persist from its pre_attach to
      attach_task and attach.
      
      This is a pre-patch for cgroup-procs-writable.patch.
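
      A hedged sketch of the resulting subsystem interface (abbreviated to the
      callbacks relevant here; the real field order and signatures may differ
      slightly):

      struct cgroup_subsys_sketch {
      	/* whole-threadgroup decisions, as before */
      	int  (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
      			   struct task_struct *leader);
      	void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
      		       struct cgroup *old_cgrp, struct task_struct *leader);

      	/* new per-thread hooks, callable many times per threadgroup attach */
      	int  (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
      	void (*pre_attach)(struct cgroup *cgrp);
      	void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
      };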
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f780bdb7
    • mm: don't access vm_flags as 'int' · ca16d140
      KOSAKI Motohiro authored
      The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
      'unsigned int'. This patch fixes such misuse.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      [ Changed to use a typedef - we'll extend it to cover more cases
        later, since there has been discussion about making it a 64-bit
        type..                      - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca16d140
    • mm/fs: add hooks to support cleancache · c515e1fd
      Dan Magenheimer authored
      This fourth patch of eight in this cleancache series provides the
      core hooks in VFS for: initializing cleancache per filesystem;
      capturing clean pages reclaimed by page cache; attempting to get
      pages from cleancache before filesystem read; and ensuring coherency
      between pagecache, disk, and cleancache.  Note that the placement
      of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
      change was required due to a patchset in 2.6.39.
      
      All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
      a check of a boolean global if CONFIG_CLEANCACHE is set but no
      cleancache "backend" has claimed cleancache_ops.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
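
      A hedged sketch of the read-side hook (placement and helper name are
      illustrative; the real hooks sit in the VFS/filesystem read paths):

      static int read_page_maybe_cleancache(struct file *file, struct page *page)
      {
      	/* a cleancache hit fills the page and spares the disk I/O */
      	if (cleancache_get_page(page) == 0) {
      		SetPageUptodate(page);
      		unlock_page(page);
      		return 0;
      	}
      	/* miss: fall back to the filesystem's normal readpage */
      	return page->mapping->a_ops->readpage(file, page);
      }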
      
      [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      c515e1fd
    • mm: cleancache core ops functions and config · 077b1f83
      Dan Magenheimer authored
      This third patch of eight in this cleancache series provides
      the core code for cleancache that interfaces between the hooks in
      VFS and individual filesystems and a cleancache backend.  It also
      includes build and config patches.
      
      Two new files are added: mm/cleancache.c and include/linux/cleancache.h.
      
      Note that CONFIG_CLEANCACHE can default to on; in systems that do
      not provide a cleancache backend, all hooks devolve to a simple
      check of a global enable flag, so performance impact should
      be negligible but can be reduced to zero impact if config'ed off.
      However for this first commit, it defaults to off.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
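
      A hedged sketch of the backend interface (names are illustrative; the
      authoritative definition is in include/linux/cleancache.h and the FAQ):

      struct cleancache_ops_sketch {
      	int  (*init_fs)(size_t pagesize);
      	int  (*init_shared_fs)(char *uuid, size_t pagesize);
      	int  (*get_page)(int pool_id, struct cleancache_filekey key,
      			 pgoff_t index, struct page *page);
      	void (*put_page)(int pool_id, struct cleancache_filekey key,
      			 pgoff_t index, struct page *page);
      	void (*flush_page)(int pool_id, struct cleancache_filekey key,
      			   pgoff_t index);
      	void (*flush_inode)(int pool_id, struct cleancache_filekey key);
      	void (*flush_fs)(int pool_id);
      };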
      
      Credits: Cleancache_ops design derived from Jeremy Fitzhardinge
      design for tmem
      
      [v8: dan.magenheimer@oracle.com: fix exportfs call affecting btrfs]
      [v8: akpm@linux-foundation.org: use static inline function, not macro]
      [v7: dan.magenheimer@oracle.com: cleanup sysfs and remove cleancache prefix]
      [v6: JBeulich@novell.com: robustly handle buggy fs encode_fh actor definition]
      [v5: jeremy@goop.org: clean up global usage and static var names]
      [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
      [v5: hch@infradead.org: cleaner non-global interface for ops registration]
      [v4: adilger@sun.com: interface must support exportfs FS's]
      [v4: hch@infradead.org: interface must support 64-bit FS on 32-bit kernel]
      [v3: akpm@linux-foundation.org: use one ops struct to avoid pointer hops]
      [v3: akpm@linux-foundation.org: document and ensure PageLocked reqts are met]
      [v3: ngupta@vflare.org: fix success/fail codes, change funcs to void]
      [v2: viro@ZenIV.linux.org.uk: use sane types]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Nitin Gupta <ngupta@vflare.org>
      Acked-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: Andreas Dilger <adilger@sun.com>
      Acked-by: Jan Beulich <JBeulich@novell.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      077b1f83
  7. 26 May 2011, 2 commits
  8. 25 May 2011, 14 commits
    • nommu: add page alignment to mmap · f67d9b15
      Bob Liu authored
      Currently, on nommu arches mmap(), mremap() and munmap() don't do page
      alignment, which is not consistent with mmu arches and causes some issues.
      
      First, some drivers' mmap() functions depend on vma->vm_end - vma->vm_start
      being page aligned, which is true on mmu arches but not on nommu, e.g. the
      uvc camera driver.
      
      Second, munmap() may return an -EINVAL [split file] error in cases when the
      end passed in from userspace is not page aligned but vma->vm_end is
      aligned, due to a split or the driver's mmap() ops.
      
      Add page alignment to fix those issues.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Bob Liu <lliubbo@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f67d9b15
    • mm: batch activate_page() to reduce lock contention · eb709b0d
      Shaohua Li authored
      The zone->lru_lock is heavily contended in workloads where activate_page()
      is used frequently.  We can batch activate_page() calls to reduce the lock
      contention.  The batched pages are moved to the zone's active list when the
      pool is full or when page reclaim tries to drain them.
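
      A hedged sketch of the batching, close to but not necessarily identical
      to the patch: activations are staged in a per-CPU pagevec and moved to
      the active list in one lru_lock section when the pagevec fills.

      static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);

      void activate_page_batched(struct page *page)
      {
      	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
      		struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);

      		page_cache_get(page);
      		/* pool full: move the whole batch under zone->lru_lock once */
      		if (!pagevec_add(pvec, page))
      			pagevec_lru_move_fn(pvec, __activate_page, NULL);
      		put_cpu_var(activate_page_pvecs);
      	}
      }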
      
      For example, on a 4-socket, 64-CPU system, create a sparse file and 64
      processes which shared-map the file.  Each process reads the whole file
      and then exits.  The process exit does unmap_vmas() and causes a lot of
      activate_page() calls.  In such a workload, we saw about a 58% reduction
      in total time with the patch below.  Other workloads that call
      activate_page() a lot also benefit.
      
      Andrew Morton suggested activate_page() and putback_lru_pages() should
      follow the same path to active pages, but this is hard to implement (see
      commit 7a608572 ("Revert "mm: batch activate_page() to reduce lock
      contention")).  On the other hand, do we really need putback_lru_pages()
      to follow the same path?  I tested several FIO/FFSB benchmarks (about 20
      scripts for each benchmark) on 3 machines here, from 2 sockets to 4
      sockets.  My tests don't show anything significant with or without the
      patch below (there is a slight difference, but it is mostly noise that we
      saw even before this patch).  The patch below basically returns to the
      same approach as my first post.
      
      I tested some microbenchmarks:
        case-anon-cow-rand-mt         0.58%
        case-anon-cow-rand           -3.30%
        case-anon-cow-seq-mt         -0.51%
        case-anon-cow-seq            -5.68%
        case-anon-r-rand-mt           0.23%
        case-anon-r-rand              0.81%
        case-anon-r-seq-mt           -0.71%
        case-anon-r-seq              -1.99%
        case-anon-rx-rand-mt          2.11%
        case-anon-rx-seq-mt           3.46%
        case-anon-w-rand-mt          -0.03%
        case-anon-w-rand             -0.50%
        case-anon-w-seq-mt           -1.08%
        case-anon-w-seq              -0.12%
        case-anon-wx-rand-mt         -5.02%
        case-anon-wx-seq-mt          -1.43%
        case-fork                     1.65%
        case-fork-sleep              -0.07%
        case-fork-withmem             1.39%
        case-hugetlb                 -0.59%
        case-lru-file-mmap-read-mt   -0.54%
        case-lru-file-mmap-read       0.61%
        case-lru-file-mmap-read-rand -2.24%
        case-lru-file-readonce       -0.64%
        case-lru-file-readtwice     -11.69%
        case-lru-memcg               -1.35%
        case-mmap-pread-rand-mt       1.88%
        case-mmap-pread-rand        -15.26%
        case-mmap-pread-seq-mt        0.89%
        case-mmap-pread-seq         -69.72%
        case-mmap-xread-rand-mt       0.71%
        case-mmap-xread-seq-mt        0.38%
      
      The most significant are:
        case-lru-file-readtwice     -11.69%
        case-mmap-pread-rand        -15.26%
        case-mmap-pread-seq         -69.72%
      
      which use activate_page() a lot.  The others are basically in-run
      variation, because each run differs slightly.
      
      In UP case, 'size mm/swap.o'
      before the two patches:
         text    data     bss     dec     hex filename
         6466     896       4    7366    1cc6 mm/swap.o
      after the two patches:
         text    data     bss     dec     hex filename
         6343     896       4    7243    1c4b mm/swap.o
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb709b0d
    • mm/page_alloc.c: prevent unending loop in __alloc_pages_slowpath() · cfa54a0f
      Andrew Barry authored
      I believe I found a problem in __alloc_pages_slowpath, which allows a
      process to get stuck endlessly looping, even when lots of memory is
      available.
      
      Running an I/O and memory intensive stress-test I see a 0-order page
      allocation with __GFP_IO and __GFP_WAIT, running on a system with very
      little free memory.  Right about the same time that the stress-test gets
      killed by the OOM-killer, the utility trying to allocate memory gets stuck
      in __alloc_pages_slowpath even though most of the system's memory was freed
      by the oom-kill of the stress-test.
      
      The utility ends up looping from the rebalance label down through
      wait_iff_congested() continuously.  Because order=0,
      __alloc_pages_direct_compact skips the call to get_page_from_freelist.
      Because all of the reclaimable memory on the system has already been
      reclaimed, __alloc_pages_direct_reclaim skips the call to
      get_page_from_freelist.  Since there is no __GFP_FS flag, the block with
      __alloc_pages_may_oom is skipped.  The loop hits the wait_iff_congested,
      then jumps back to rebalance without ever trying to
      get_page_from_freelist.  This loop repeats infinitely.
      
      The test case is pretty pathological.  Running a mix of I/O stress-tests
      that do a lot of fork() and consume all of the system memory, I can pretty
      reliably hit this on 600 nodes, in about 12 hours.  32GB/node.
      Signed-off-by: Andrew Barry <abarry@cray.com>
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cfa54a0f
    • memsw: remove noswapaccount kernel parameter · a2c8990a
      Michal Hocko authored
      The noswapaccount parameter has been deprecated since 2.6.38 without any
      complaints from users so we can remove it.  swapaccount=0|1 can be used
      instead.
      
      As we are removing the parameter, we can also clean up swapaccount: it no
      longer has to accept an empty string (to match noswapaccount), so we can
      push the '=' into the __setup macro rather than checking for "=1" resp.
      "=0" strings.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a2c8990a
    • mm: proc: move show_numa_map() to fs/proc/task_mmu.c · f69ff943
      Stephen Wilson authored
      Moving show_numa_map() from mempolicy.c to task_mmu.c solves several
      issues.
      
        - Having the show() operation "miles away" from the corresponding
          seq_file iteration operations is a maintenance burden.
      
        - The need to export ad hoc info like struct proc_maps_private is
          eliminated.
      
        - The implementation of show_numa_map() can be improved in a simple
          manner by cooperating with the other seq_file operations (start,
          stop, etc) -- something that would be messy to do without this
          change.
      Signed-off-by: Stephen Wilson <wilsons@start.ca>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f69ff943
    • mm: remove check_huge_range() · 9840e372
      Stephen Wilson authored
      This function has been superseded by gather_hugetbl_stats() and is no
      longer needed.
      Signed-off-by: Stephen Wilson <wilsons@start.ca>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9840e372
    • mm: make gather_stats() type-safe and remove forward declaration · 722e2ee0
      Stephen Wilson authored
      Improve the prototype of gather_stats() to take a struct numa_maps as
      argument instead of a generic void *.  Update all callers to make the
      required type explicit.
      
      Since gather_stats() is not needed before its definition and is scheduled
      to be moved out of mempolicy.c the declaration is removed as well.
      Signed-off-by: Stephen Wilson <wilsons@start.ca>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      722e2ee0
    • mm: remove MPOL_MF_STATS · b1f72d18
      Stephen Wilson authored
      Mapping statistics in a NUMA environment is now computed using the generic
      walk_page_range() logic.  Remove the old/equivalent functionality.
      Signed-off-by: Stephen Wilson <wilsons@start.ca>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1f72d18
    • mm: use walk_page_range() instead of custom page table walking code · 29ea2f69
      Stephen Wilson authored
      Converting show_numa_map() to use the generic routine decouples the
      function from mempolicy.c, allowing it to be moved out of the mm subsystem
      and into fs/proc.
      
      Also, include KSM pages in /proc/pid/numa_maps statistics.  The pagewalk
      logic implemented by check_pte_range() failed to account for such pages as
      they were not applicable to the page migration case.
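
      A hedged sketch of the conversion (callback names are illustrative): the
      per-VMA statistics become a walk_page_range() callback instead of
      hand-rolled page table walking.

      static int gather_pte_stats_sketch(pmd_t *pmd, unsigned long addr,
      				   unsigned long end, struct mm_walk *walk)
      {
      	/* accumulate per-node counts into walk->private (a struct numa_maps) */
      	return 0;
      }

      static void show_numa_map_sketch(struct vm_area_struct *vma, struct numa_maps *md)
      {
      	struct mm_walk walk = {
      		.pmd_entry	= gather_pte_stats_sketch,
      		.private	= md,
      		.mm		= vma->vm_mm,
      	};

      	walk_page_range(vma->vm_start, vma->vm_end, &walk);
      }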
      Signed-off-by: Stephen Wilson <wilsons@start.ca>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29ea2f69
    • mm: export get_vma_policy() · d98f6cb6
      Stephen Wilson authored
      In commit 48fce342 ("mempolicies: unexport get_vma_policy()")
      get_vma_policy() was marked static as all clients were local to
      mempolicy.c.
      
      However, the decision to generate /proc/pid/numa_maps in the numa memory
      policy code and outside the procfs subsystem introduces an artificial
      interdependency between the two systems.  Exporting get_vma_policy() once
      again is the first step to clean up this interdependency.
      Signed-off-by: Stephen Wilson <wilsons@start.ca>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d98f6cb6
    • tmpfs: implement generic xattr support · b09e0fa4
      Eric Paris authored
      Implement generic xattrs for tmpfs filesystems.  The Fedora project, while
      trying to replace suid apps with file capabilities, realized that tmpfs,
      which is used on the build systems, does not support file capabilities and
      thus cannot be used to build packages which use file capabilities.  Xattrs
      are also needed for overlayfs.
      
      The xattr interface is a bit odd.  If a filesystem does not implement any
      {get,set,list}xattr functions the VFS will call into some random LSM hooks
      and the running LSM can then implement some method for handling xattrs.
      SELinux for example provides a method to support security.selinux but no
      other security.* xattrs.
      
      As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
      xattr handler routines specifically to handle acls.  Because of this tmpfs
      would lose the VFS/LSM helpers to support the running LSM.  To make up
      for that tmpfs had stub functions that did nothing but call into the LSM
      hooks which implement the helpers.
      
      This new patch does not use the LSM fallback functions and instead just
      implements a native get/set/list xattr feature for the full security.* and
      trusted.* namespace like a normal filesystem.  This means that tmpfs can
      now support both security.selinux and security.capability, which was not
      previously possible.
      
      The basic implementation is that I attach a:
      
      struct shmem_xattr {
      	struct list_head list; /* anchored by shmem_inode_info->xattr_list */
      	char *name;
      	size_t size;
      	char value[0];
      };
      
      Into the struct shmem_inode_info for each xattr that is set.  This
      implementation could easily support the user.* namespace as well, except
      some care needs to be taken to prevent large amounts of unswappable memory
      being allocated for unprivileged users.
      
      [mszeredi@suse.cz: new config option, support trusted.*, support symlinks]
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Hugh Dickins <hughd@google.com>
      Tested-by: Jordi Pujol <jordipujolp@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b09e0fa4
    • memblock/nobootmem: remove unneeded code from alloc_bootmem_node_high() · 4eb31707
      Yinghai Lu authored
      The bootmem wrapper with memblock supports top-down now, so we no longer
      need this trick.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4eb31707
    • mm: fail GFP_DMA allocations when ZONE_DMA is not configured · a197b59a
      David Rientjes authored
      The page allocator will improperly return a page from ZONE_NORMAL even
      when __GFP_DMA is passed if CONFIG_ZONE_DMA is disabled.  The caller
      expects DMA memory, perhaps for ISA devices with 16-bit address registers,
      and may get higher memory resulting in undefined behavior.
      
      This patch causes the page allocator to return NULL in such circumstances
      with a warning emitted to the kernel log on the first occurrence.
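
      A hedged sketch of the check (helper name illustrative):

      static inline bool gfp_dma_unsupported(gfp_t gfp_mask)
      {
      #ifndef CONFIG_ZONE_DMA
      	/* no zone can satisfy __GFP_DMA: warn once and make the caller fail */
      	if (WARN_ON_ONCE(gfp_mask & __GFP_DMA))
      		return true;
      #endif
      	return false;
      }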
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a197b59a
    • mm: remove dependency on CONFIG_FLATMEM from online_page() · a3bc42f5
      Daniel Kiper authored
      online_pages() is only compiled for CONFIG_MEMORY_HOTPLUG_SPARSE, so there
      is no need to support CONFIG_FLATMEM code within it.
      
      This patch removes code that is never used.
      Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
      Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3bc42f5