1. 12 12月, 2012 3 次提交
    • R
      mm: introduce compaction and migration for ballooned pages · bf6bddf1
      Rafael Aquini 提交于
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a guest,
      thus imposing performance penalties associated with the reduced number of
      transparent huge pages that could be used by the guest workload.
      
      This patch introduces the helper functions as well as the necessary changes
      to teach compaction and migration bits how to cope with pages which are
      part of a guest memory balloon, in order to make them movable by memory
      compaction procedures.
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bf6bddf1
    • R
      mm: adjust address_space_operations.migratepage() return code · 78bd5209
      Rafael Aquini 提交于
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a
      guest, thus imposing performance penalties associated with the reduced
      number of transparent huge pages that could be used by the guest workload.
      
      This patch-set follows the main idea discussed at 2012 LSFMMS session:
      "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
      to introduce the required changes to the virtio_balloon driver, as well as
      the changes to the core compaction & migration bits, in order to make
      those subsystems aware of ballooned pages and allow memory balloon pages
      become movable within a guest, thus avoiding the aforementioned
      fragmentation issue
      
      Following are numbers that prove this patch benefits on allowing
      compaction to be more effective at memory ballooned guests.
      
      Results for STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
      running on a 4gB RAM KVM guest which was ballooning 512mB RAM in 64mB
      chunks, at every minute (inflating/deflating), while test was running:
      
      ===BEGIN stress-highalloc
      
      STRESS-HIGHALLOC
                       highalloc-3.7     highalloc-3.7
                           rc4-clean         rc4-patch
      Pass 1          55.00 ( 0.00%)    62.00 ( 7.00%)
      Pass 2          54.00 ( 0.00%)    62.00 ( 8.00%)
      while Rested    75.00 ( 0.00%)    80.00 ( 5.00%)
      
      MMTests Statistics: duration
                       3.7         3.7
                 rc4-clean   rc4-patch
      User         1207.59     1207.46
      System       1300.55     1299.61
      Elapsed      2273.72     2157.06
      
      MMTests Statistics: vmstat
                                      3.7         3.7
                                rc4-clean   rc4-patch
      Page Ins                    3581516     2374368
      Page Outs                  11148692    10410332
      Swap Ins                         80          47
      Swap Outs                      3641         476
      Direct pages scanned          37978       33826
      Kswapd pages scanned        1828245     1342869
      Kswapd pages reclaimed      1710236     1304099
      Direct pages reclaimed        32207       31005
      Kswapd efficiency               93%         97%
      Kswapd velocity             804.077     622.546
      Direct efficiency               84%         91%
      Direct velocity              16.703      15.682
      Percentage direct scans          2%          2%
      Page writes by reclaim        79252        9704
      Page writes file              75611        9228
      Page writes anon               3641         476
      Page reclaim immediate        16764       11014
      Page rescued immediate            0           0
      Slabs scanned               2171904     2152448
      Direct inode steals             385        2261
      Kswapd inode steals          659137      609670
      Kswapd skipped wait               1          69
      THP fault alloc                 546         631
      THP collapse alloc              361         339
      THP splits                      259         263
      THP fault fallback               98          50
      THP collapse fail                20          17
      Compaction stalls               747         499
      Compaction success              244         145
      Compaction failures             503         354
      Compaction pages moved       370888      474837
      Compaction move failure       77378       65259
      
      ===END stress-highalloc
      
      This patch:
      
      Introduce MIGRATEPAGE_SUCCESS as the default return code for
      address_space_operations.migratepage() method and documents the expected
      return code for the same method in failure cases.
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      78bd5209
    • B
      mm: introduce mm_find_pmd() · 6219049a
      Bob Liu 提交于
      Several place need to find the pmd by(mm_struct, address), so introduce a
      function to simplify it.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NBob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Ni zhan Chen <nizhan.chen@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6219049a
  2. 01 8月, 2012 3 次提交
  3. 04 6月, 2012 1 次提交
    • H
      mm: fix warning in __set_page_dirty_nobuffers · 752dc185
      Hugh Dickins 提交于
      New tmpfs use of !PageUptodate pages for fallocate() is triggering the
      WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
      is called from migrate_page_copy() for compaction.
      
      It is anomalous that migration should use __set_page_dirty_nobuffers()
      on an address_space that does not participate in dirty and writeback
      accounting; and this has also been observed to insert surprising dirty
      tags into a tmpfs radix_tree, despite tmpfs not using tags at all.
      
      We should probably give migrate_page_copy() a better way to preserve the
      tag and migrate accounting info, when mapping_cap_account_dirty().  But
      that needs some more work: so in the interim, avoid the warning by using
      a simple SetPageDirty on PageSwapBacked pages.
      Reported-and-tested-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      752dc185
  4. 16 5月, 2012 1 次提交
  5. 26 4月, 2012 1 次提交
  6. 22 3月, 2012 1 次提交
    • C
      mm: fix move/migrate_pages() race on task struct · 3268c63e
      Christoph Lameter 提交于
      Migration functions perform the rcu_read_unlock too early.  As a result
      the task pointed to may change from under us.  This can result in an oops,
      as reported by Dave Hansen in https://lkml.org/lkml/2012/2/23/302.
      
      The following patch extend the period of the rcu_read_lock until after the
      permissions checks are done.  We also take a refcount so that the task
      reference is stable when calling security check functions and performing
      cpuset node validation (which takes a mutex).
      
      The refcount is dropped before actual page migration occurs so there is no
      change to the refcounts held during page migration.
      
      Also move the determination of the mm of the task struct to immediately
      before the do_migrate*() calls so that it is clear that we switch from
      handling the task during permission checks to the mm for the actual
      migration.  Since the determination is only done once and we then no
      longer use the task_struct we can be sure that we operate on a specific
      address space that will not change from under us.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Reported-by: NDave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3268c63e
  7. 06 3月, 2012 1 次提交
    • H
      memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Hugh Dickins 提交于
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7512102c
  8. 04 2月, 2012 1 次提交
  9. 13 1月, 2012 3 次提交
    • M
      mm: compaction: introduce sync-light migration for use by compaction · a6bc32b8
      Mel Gorman 提交于
      This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
      mode that avoids writing back pages to backing storage.  Async compaction
      maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
      For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
      used.
      
      This avoids sync compaction stalling for an excessive length of time,
      particularly when copying files to a USB stick where there might be a
      large number of dirty pages backed by a filesystem that does not support
      ->writepages.
      
      [aarcange@redhat.com: This patch is heavily based on Andrea's work]
      [akpm@linux-foundation.org: fix fs/nfs/write.c build]
      [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6bc32b8
    • M
      mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage · b969c4ab
      Mel Gorman 提交于
      Asynchronous compaction is used when allocating transparent hugepages to
      avoid blocking for long periods of time.  Due to reports of stalling,
      there was a debate on disabling synchronous compaction but this severely
      impacted allocation success rates.  Part of the reason was that many dirty
      pages are skipped in asynchronous compaction by the following check;
      
      	if (PageDirty(page) && !sync &&
      		mapping->a_ops->migratepage != migrate_page)
      			rc = -EBUSY;
      
      This skips over all mapping aops using buffer_migrate_page() even though
      it is possible to migrate some of these pages without blocking.  This
      patch updates the ->migratepage callback with a "sync" parameter.  It is
      the responsibility of the callback to fail gracefully if migration would
      block.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b969c4ab
    • K
      memcg: clear pc->mem_cgroup if necessary. · 4e5f01c2
      KAMEZAWA Hiroyuki 提交于
      This is a preparation before removing a flag PCG_ACCT_LRU in page_cgroup
      and reducing atomic ops/complexity in memcg LRU handling.
      
      In some cases, pages are added to lru before charge to memcg and pages
      are not classfied to memory cgroup at lru addtion.  Now, the lru where
      the page should be added is determined a bit in page_cgroup->flags and
      pc->mem_cgroup.  I'd like to remove the check of flag.
      
      To handle the case pc->mem_cgroup may contain stale pointers if pages
      are added to LRU before classification.  This patch resets
      pc->mem_cgroup to root_mem_cgroup before lru additions.
      
      [akpm@linux-foundation.org: fix CONFIG_CGROUP_MEM_CONT=n build]
      [hughd@google.com: fix CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n build]
      [akpm@linux-foundation.org: ksm.c needs memcontrol.h, per Michal]
      [hughd@google.com: stop oops in mem_cgroup_reset_owner()]
      [hughd@google.com: fix page migration to reset_owner]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e5f01c2
  10. 11 1月, 2012 3 次提交
  11. 09 12月, 2011 1 次提交
  12. 01 11月, 2011 1 次提交
  13. 31 10月, 2011 1 次提交
  14. 20 10月, 2011 1 次提交
    • H
      mm: fix race between mremap and removing migration entry · 486cf46f
      Hugh Dickins 提交于
      I don't usually pay much attention to the stale "? " addresses in
      stack backtraces, but this lucky report from Pawel Sikora hints that
      mremap's move_ptes() has inadequate locking against page migration.
      
       3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
       kernel BUG at include/linux/swapops.h:105!
       RIP: 0010:[<ffffffff81127b76>]  [<ffffffff81127b76>]
                             migration_entry_wait+0x156/0x160
        [<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
        [<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
        [<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
        [<ffffffff81102a31>] handle_mm_fault+0x181/0x310
        [<ffffffff81106097>] ? vma_adjust+0x537/0x570
        [<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
        [<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
        [<ffffffff81421d5f>] page_fault+0x1f/0x30
      
      mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
      and pagetable locks, were good enough before page migration (with its
      requirement that every migration entry be found) came in, and enough
      while migration always held mmap_sem; but not enough nowadays, when
      there's memory hotremove and compaction.
      
      The danger is that move_ptes() lets a migration entry dodge around
      behind remove_migration_pte()'s back, so it's in the old location when
      looking at the new, then in the new location when looking at the old.
      
      Either mremap's move_ptes() must additionally take anon_vma lock(), or
      migration's remove_migration_pte() must stop peeking for is_swap_entry()
      before it takes pagetable lock.
      
      Consensus chooses the latter: we prefer to add overhead to migration
      than to mremapping, which gets used by JVMs and by exec stack setup.
      Reported-and-tested-by: NPaweł Sikora <pluto@agmk.net>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      486cf46f
  15. 17 6月, 2011 1 次提交
  16. 25 5月, 2011 1 次提交
  17. 31 3月, 2011 1 次提交
  18. 24 3月, 2011 1 次提交
  19. 23 3月, 2011 3 次提交
  20. 26 2月, 2011 1 次提交
    • G
      mm: grab rcu read lock in move_pages() · a879bf58
      Greg Thelen 提交于
      The move_pages() usage of find_task_by_vpid() requires rcu_read_lock() to
      prevent free_pid() from reclaiming the pid.
      
      Without this patch, RCU warnings are printed in v2.6.38-rc4 move_pages()
      with:
      
        CONFIG_LOCKUP_DETECTOR=y
        CONFIG_PREEMPT=y
        CONFIG_LOCKDEP=y
        CONFIG_PROVE_LOCKING=y
        CONFIG_PROVE_RCU=y
      
      Previously, migrate_pages() went through a similar transformation
      replacing usage of tasklist_lock with rcu read lock:
      
        commit 55cfaa3c
        Author: Zeng Zhaoming <zengzm.kernel@gmail.com>
        Date:   Thu Dec 2 14:31:13 2010 -0800
      
            mm/mempolicy.c: add rcu read lock to protect pid structure
      
        commit 1e50df39
        Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
        Date:   Thu Jan 13 15:46:14 2011 -0800
      
            mempolicy: remove tasklist_lock from migrate_pages
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Zeng Zhaoming <zengzm.kernel@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a879bf58
  21. 03 2月, 2011 2 次提交
  22. 26 1月, 2011 1 次提交
  23. 14 1月, 2011 7 次提交
    • D
      memcg: fix memory migration of shmem swapcache · 50de1dd9
      Daisuke Nishimura 提交于
      In the current implementation mem_cgroup_end_migration() decides whether
      the page migration has succeeded or not by checking "oldpage->mapping".
      
      But if we are tring to migrate a shmem swapcache, the page->mapping of it
      is NULL from the begining, so the check would be invalid.  As a result,
      mem_cgroup_end_migration() assumes the migration has succeeded even if
      it's not, so "newpage" would be freed while it's not uncharged.
      
      This patch fixes it by passing mem_cgroup_end_migration() the result of
      the page migration.
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50de1dd9
    • H
      mm: fix hugepage migration · fd4a4663
      Hugh Dickins 提交于
      2.6.37 added an unmap_and_move_huge_page() for memory failure recovery,
      but its anon_vma handling was still based around the 2.6.35 conventions.
      Update it to use page_lock_anon_vma, get_anon_vma, page_unlock_anon_vma,
      drop_anon_vma in the same way as we're now changing unmap_and_move().
      
      I don't particularly like to propose this for stable when I've not seen
      its problems in practice nor tested the solution: but it's clearly out of
      synch at present.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: <stable@kernel.org> [2.6.37, 2.6.36]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd4a4663
    • H
      mm: fix migration hangs on anon_vma lock · 1ce82b69
      Hugh Dickins 提交于
      Increased usage of page migration in mmotm reveals that the anon_vma
      locking in unmap_and_move() has been deficient since 2.6.36 (or even
      earlier).  Review at the time of f1819427
      ("mm: fix hang on anon_vma->root->lock") missed the issue here: the
      anon_vma to which we get a reference may already have been freed back to
      its slab (it is in use when we check page_mapped, but that can change),
      and so its anon_vma->root may be switched at any moment by reuse in
      anon_vma_prepare.
      
      Perhaps we could fix that with a get_anon_vma_unless_zero(), but let's
      not: just rely on page_lock_anon_vma() to do all the hard thinking for us,
      then we don't need any rcu read locking over here.
      
      In removing the rcu_unlock label: since PageAnon is a bit in
      page->mapping, it's impossible for a !page->mapping page to be anon; but
      insert VM_BUG_ON in case the implementation ever changes.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NMel Gorman <mel@csn.ul.ie>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: <stable@kernel.org> [2.6.37, 2.6.36]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ce82b69
    • M
      mm: migration: use rcu_dereference_protected when dereferencing the radix tree... · 29c1f677
      Mel Gorman 提交于
      mm: migration: use rcu_dereference_protected when dereferencing the radix tree slot during file page migration
      
      migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for
      anonymous pages, as introduced by git commit
      989f89c5 ("fix rcu_read_lock() in page
      migraton").  The point of the RCU protection there is part of getting a
      stable reference to anon_vma and is only held for anon pages as file pages
      are locked which is sufficient protection against freeing.
      
      However, while a file page's mapping is being migrated, the radix tree is
      double checked to ensure it is the expected page.  This uses
      radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
      triggering the following warning.
      
      [  173.674290] ===================================================
      [  173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
      [  173.676016] ---------------------------------------------------
      [  173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
      [  173.676016]
      [  173.676016] other info that might help us debug this:
      [  173.676016]
      [  173.676016]
      [  173.676016] rcu_scheduler_active = 1, debug_locks = 0
      [  173.676016] 1 lock held by hugeadm/2899:
      [  173.676016]  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
      [  173.676016]
      [  173.676016] stack backtrace:
      [  173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
      [  173.676016] Call Trace:
      [  173.676016]  [<c128cc01>] ? printk+0x14/0x1b
      [  173.676016]  [<c1063502>] lockdep_rcu_dereference+0x7d/0x86
      [  173.676016]  [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab
      [  173.676016]  [<c10e41ad>] migrate_page+0x23/0x39
      [  173.676016]  [<c10e491b>] buffer_migrate_page+0x22/0x107
      [  173.676016]  [<c10e48f9>] ? buffer_migrate_page+0x0/0x107
      [  173.676016]  [<c10e425d>] move_to_new_page+0x9a/0x1ae
      [  173.676016]  [<c10e47e6>] migrate_pages+0x1e7/0x2fa
      
      This patch introduces radix_tree_deref_slot_protected() which calls
      rcu_dereference_protected().  Users of it must pass in the
      mapping->tree_lock that is protecting this dereference.  Holding the tree
      lock protects against parallel updaters of the radix tree meaning that
      rcu_dereference_protected is allowable.
      
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Milton Miller <miltonm@bga.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: <stable@kernel.org>		[2.6.37.early]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29c1f677
    • A
      thp: pmd_trans_huge migrate bugcheck · 500d65d4
      Andrea Arcangeli 提交于
      No pmd_trans_huge should ever materialize in migration ptes areas, because
      we split the hugepage before migration ptes are instantiated.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      500d65d4
    • M
      mm: migration: cleanup migrate_pages API by matching types for offlining and sync · 7f0f2496
      Mel Gorman 提交于
      With the introduction of the boolean sync parameter, the API looks a
      little inconsistent as offlining is still an int.  Convert offlining to a
      bool for the sake of being tidy.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f0f2496
    • M
      mm: migration: allow migration to operate asynchronously and avoid synchronous... · 77f1fe6b
      Mel Gorman 提交于
      mm: migration: allow migration to operate asynchronously and avoid synchronous compaction in the faster path
      
      Migration synchronously waits for writeback if the initial passes fails.
      Callers of memory compaction do not necessarily want this behaviour if the
      caller is latency sensitive or expects that synchronous migration is not
      going to have a significantly better success rate.
      
      This patch adds a sync parameter to migrate_pages() allowing the caller to
      indicate if wait_on_page_writeback() is allowed within migration or not.
      For reclaim/compaction, try_to_compact_pages() is first called
      asynchronously, direct reclaim runs and then try_to_compact_pages() is
      called synchronously as there is a greater expectation that it'll succeed.
      
      [akpm@linux-foundation.org: build/merge fix]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      77f1fe6b