1. 13 January 2012 (1 commit)
    • memcg: clear pc->mem_cgroup if necessary. · 4e5f01c2
      Authored by KAMEZAWA Hiroyuki
      This is a preparation before removing a flag PCG_ACCT_LRU in page_cgroup
      and reducing atomic ops/complexity in memcg LRU handling.
      
      In some cases, pages are added to the LRU before being charged to a
      memcg, so they are not classified to a memory cgroup at LRU addition.
      Currently, the LRU where the page should be added is determined by a bit
      in page_cgroup->flags and by pc->mem_cgroup.  I'd like to remove the
      check of the flag.
      
      To handle that case: pc->mem_cgroup may contain a stale pointer if pages
      are added to the LRU before classification.  This patch resets
      pc->mem_cgroup to root_mem_cgroup before LRU additions.
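
      In rough outline, the reset looks like this (a sketch of the idea, not
      the patch verbatim; helper names follow the memcg code of this era):

        /* Clear a stale memcg pointer before the page goes onto an LRU. */
        void mem_cgroup_reset_owner(struct page *page)
        {
                struct page_cgroup *pc;

                if (mem_cgroup_disabled())
                        return;

                pc = lookup_page_cgroup(page);
                /*
                 * Until the page is actually charged (PCG_USED set), its
                 * page_cgroup may still point at a previous owner's memcg;
                 * park it on root_mem_cgroup so LRU code never follows a
                 * stale pointer.
                 */
                if (!PageCgroupUsed(pc))
                        pc->mem_cgroup = root_mem_cgroup;
        }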
      
      [akpm@linux-foundation.org: fix CONFIG_CGROUP_MEM_CONT=n build]
      [hughd@google.com: fix CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n build]
      [akpm@linux-foundation.org: ksm.c needs memcontrol.h, per Michal]
      [hughd@google.com: stop oops in mem_cgroup_reset_owner()]
      [hughd@google.com: fix page migration to reset_owner]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 11 January 2012 (3 commits)
  3. 09 December 2011 (1 commit)
  4. 01 November 2011 (1 commit)
  5. 31 October 2011 (1 commit)
  6. 20 October 2011 (1 commit)
    • mm: fix race between mremap and removing migration entry · 486cf46f
      Authored by Hugh Dickins
      I don't usually pay much attention to the stale "? " addresses in
      stack backtraces, but this lucky report from Pawel Sikora hints that
      mremap's move_ptes() has inadequate locking against page migration.
      
       3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
       kernel BUG at include/linux/swapops.h:105!
       RIP: 0010:[<ffffffff81127b76>]  [<ffffffff81127b76>]
                             migration_entry_wait+0x156/0x160
        [<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
        [<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
        [<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
        [<ffffffff81102a31>] handle_mm_fault+0x181/0x310
        [<ffffffff81106097>] ? vma_adjust+0x537/0x570
        [<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
        [<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
        [<ffffffff81421d5f>] page_fault+0x1f/0x30
      
      mremap's down_write of mmap_sem, together with i_mmap_mutex or anon_vma lock,
      and pagetable locks, were good enough before page migration (with its
      requirement that every migration entry be found) came in, and enough
      while migration always held mmap_sem; but not enough nowadays, when
      there's memory hotremove and compaction.
      
      The danger is that move_ptes() lets a migration entry dodge around
      behind remove_migration_pte()'s back, so it's in the old location when
      looking at the new, then in the new location when looking at the old.
      
      Either mremap's move_ptes() must additionally take the anon_vma lock, or
      migration's remove_migration_pte() must stop peeking with is_swap_pte()
      before it takes the pagetable lock.
      
      Consensus chooses the latter: we prefer to add overhead to migration
      rather than to mremapping, which gets used by JVMs and by exec stack setup.
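
      A minimal sketch of the chosen fix (illustrative, not the exact diff):

        /* mm/migrate.c: remove_migration_pte(), after the fix */
        ptep = pte_offset_map(pmd, addr);
        /*
         * Peeking at *ptep without the page table lock is what let a
         * migration entry dodge around behind move_ptes(); take the lock
         * first, then test, so every migration entry is found.
         */
        ptl = pte_lockptr(mm, pmd);
        spin_lock(ptl);
        pte = *ptep;
        if (!is_swap_pte(pte))
                goto unlock;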
      Reported-and-tested-by: Paweł Sikora <pluto@agmk.net>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 17 June 2011 (1 commit)
  8. 25 May 2011 (1 commit)
  9. 31 March 2011 (1 commit)
  10. 24 March 2011 (1 commit)
  11. 23 March 2011 (3 commits)
  12. 26 February 2011 (1 commit)
    • mm: grab rcu read lock in move_pages() · a879bf58
      Authored by Greg Thelen
      The move_pages() usage of find_task_by_vpid() requires rcu_read_lock() to
      prevent free_pid() from reclaiming the pid.
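
      The fix brackets the lookup with the usual RCU idiom (a sketch; the
      surrounding error handling is condensed):

        rcu_read_lock();
        task = pid ? find_task_by_vpid(pid) : current;
        if (!task) {
                rcu_read_unlock();
                return -ESRCH;
        }
        /* Grab what we need from the task before dropping the RCU read
         * lock, so free_pid() cannot reclaim the pid underneath us. */
        mm = get_task_mm(task);
        rcu_read_unlock();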
      
      Without this patch, RCU warnings are printed in v2.6.38-rc4 move_pages()
      with:
      
        CONFIG_LOCKUP_DETECTOR=y
        CONFIG_PREEMPT=y
        CONFIG_LOCKDEP=y
        CONFIG_PROVE_LOCKING=y
        CONFIG_PROVE_RCU=y
      
      Previously, migrate_pages() went through a similar transformation
      replacing usage of tasklist_lock with rcu read lock:
      
        commit 55cfaa3c
        Author: Zeng Zhaoming <zengzm.kernel@gmail.com>
        Date:   Thu Dec 2 14:31:13 2010 -0800
      
            mm/mempolicy.c: add rcu read lock to protect pid structure
      
        commit 1e50df39
        Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
        Date:   Thu Jan 13 15:46:14 2011 -0800
      
            mempolicy: remove tasklist_lock from migrate_pages
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Zeng Zhaoming <zengzm.kernel@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 03 February 2011 (2 commits)
  14. 26 January 2011 (1 commit)
  15. 14 January 2011 (8 commits)
    • memcg: fix memory migration of shmem swapcache · 50de1dd9
      Authored by Daisuke Nishimura
      In the current implementation mem_cgroup_end_migration() decides whether
      the page migration has succeeded or not by checking "oldpage->mapping".
      
      But if we are trying to migrate a shmem swapcache page, its
      page->mapping is NULL from the beginning, so the check is invalid.  As a
      result, mem_cgroup_end_migration() assumes the migration has succeeded
      even when it hasn't, and "newpage" is freed without being uncharged.
      
      This patch fixes it by passing mem_cgroup_end_migration() the result of
      the page migration.
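
      In outline, the interface change looks like this (a sketch; the last
      line shows a typical caller):

        /* before: success was guessed from oldpage->mapping */
        void mem_cgroup_end_migration(struct mem_cgroup *memcg,
                                      struct page *oldpage,
                                      struct page *newpage);

        /* after: the caller passes the migration result explicitly */
        void mem_cgroup_end_migration(struct mem_cgroup *memcg,
                                      struct page *oldpage,
                                      struct page *newpage,
                                      bool migration_ok);

        /* e.g. in mm/migrate.c, unmap_and_move() */
        mem_cgroup_end_migration(mem, page, newpage, rc == 0);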
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix hugepage migration · fd4a4663
      Authored by Hugh Dickins
      2.6.37 added an unmap_and_move_huge_page() for memory failure recovery,
      but its anon_vma handling was still based around the 2.6.35 conventions.
      Update it to use page_lock_anon_vma, get_anon_vma, page_unlock_anon_vma,
      drop_anon_vma in the same way as we're now changing unmap_and_move().
      
      I don't particularly like to propose this for stable when I've not seen
      its problems in practice nor tested the solution: but it's clearly out
      of sync at present.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: <stable@kernel.org> [2.6.37, 2.6.36]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix migration hangs on anon_vma lock · 1ce82b69
      Authored by Hugh Dickins
      Increased usage of page migration in mmotm reveals that the anon_vma
      locking in unmap_and_move() has been deficient since 2.6.36 (or even
      earlier).  Review at the time of f1819427
      ("mm: fix hang on anon_vma->root->lock") missed the issue here: the
      anon_vma to which we get a reference may already have been freed back to
      its slab (it is in use when we check page_mapped, but that can change),
      and so its anon_vma->root may be switched at any moment by reuse in
      anon_vma_prepare.
      
      Perhaps we could fix that with a get_anon_vma_unless_zero(), but let's
      not: just rely on page_lock_anon_vma() to do all the hard thinking for
      us; then we don't need any RCU read locking over here.
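
      A simplified sketch of the resulting anon_vma handling in
      unmap_and_move() (the PageSwapCache special case is omitted):

        if (PageAnon(page)) {
                /*
                 * page_lock_anon_vma() takes the RCU read lock itself and
                 * re-checks the page under it, so it returns either a
                 * safely locked anon_vma or NULL -- no open-coded
                 * rcu_read_lock()/rcu_unlock label needed here.
                 */
                anon_vma = page_lock_anon_vma(page);
                if (anon_vma) {
                        get_anon_vma(anon_vma);    /* pin across the move */
                        page_unlock_anon_vma(anon_vma);
                } else {
                        /* unmapped in the meantime: nothing to migrate */
                        goto uncharge;             /* illustrative label */
                }
        }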
      
      In removing the rcu_unlock label: since PageAnon is a bit in
      page->mapping, it's impossible for a !page->mapping page to be anon; but
      insert a VM_BUG_ON in case the implementation ever changes.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: <stable@kernel.org> [2.6.37, 2.6.36]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: migration: use rcu_dereference_protected when dereferencing the radix tree slot during file page migration · 29c1f677
      Authored by Mel Gorman
      
      migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for
      anonymous pages, as introduced by git commit
      989f89c5 ("fix rcu_read_lock() in page
      migraton").  The RCU protection there is part of getting a stable
      reference to the anon_vma, and is held only for anon pages; file pages
      are locked, which is sufficient protection against freeing.
      
      However, while a file page's mapping is being migrated, the radix tree is
      double checked to ensure it is the expected page.  This uses
      radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
      triggering the following warning.
      
      [  173.674290] ===================================================
      [  173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
      [  173.676016] ---------------------------------------------------
      [  173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
      [  173.676016]
      [  173.676016] other info that might help us debug this:
      [  173.676016]
      [  173.676016]
      [  173.676016] rcu_scheduler_active = 1, debug_locks = 0
      [  173.676016] 1 lock held by hugeadm/2899:
      [  173.676016]  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
      [  173.676016]
      [  173.676016] stack backtrace:
      [  173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
      [  173.676016] Call Trace:
      [  173.676016]  [<c128cc01>] ? printk+0x14/0x1b
      [  173.676016]  [<c1063502>] lockdep_rcu_dereference+0x7d/0x86
      [  173.676016]  [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab
      [  173.676016]  [<c10e41ad>] migrate_page+0x23/0x39
      [  173.676016]  [<c10e491b>] buffer_migrate_page+0x22/0x107
      [  173.676016]  [<c10e48f9>] ? buffer_migrate_page+0x0/0x107
      [  173.676016]  [<c10e425d>] move_to_new_page+0x9a/0x1ae
      [  173.676016]  [<c10e47e6>] migrate_pages+0x1e7/0x2fa
      
      This patch introduces radix_tree_deref_slot_protected(), which calls
      rcu_dereference_protected().  Users of it must pass in the
      mapping->tree_lock that is protecting this dereference.  Holding the
      tree lock protects against parallel updaters of the radix tree, meaning
      that rcu_dereference_protected() is allowable.
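
      The helper and its use, in outline (close to the patch, but shown here
      as a sketch):

        /* include/linux/radix-tree.h */
        static inline void *radix_tree_deref_slot_protected(void **pslot,
                                                        spinlock_t *treelock)
        {
                /* Caller must hold treelock; lockdep verifies it. */
                return rcu_dereference_protected(*pslot,
                                                 lockdep_is_held(treelock));
        }

        /* mm/migrate.c: migrate_page_move_mapping(), under tree_lock */
        page = radix_tree_deref_slot_protected(pslot, &mapping->tree_lock);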
      
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Milton Miller <miltonm@bga.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: <stable@kernel.org>		[2.6.37.early]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: pmd_trans_huge migrate bugcheck · 500d65d4
      Authored by Andrea Arcangeli
      No pmd_trans_huge should ever materialize in migration-pte areas,
      because we split the hugepage before migration ptes are instantiated.
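
      The check itself is tiny; a sketch of the kind of assertion added (the
      exact placement in mm/migrate.c may differ):

        pmd = pmd_offset(pud, addr);
        /*
         * Hugepages are split before migration ptes are created, so a
         * huge pmd here would mean a missed split somewhere upstream.
         */
        VM_BUG_ON(pmd_trans_huge(*pmd));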
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: migration: cleanup migrate_pages API by matching types for offlining and sync · 7f0f2496
      Authored by Mel Gorman
      With the introduction of the boolean sync parameter, the API looks a
      little inconsistent as offlining is still an int.  Convert offlining to a
      bool for the sake of being tidy.
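
      The resulting signature, in outline (parameter names are illustrative):

        int migrate_pages(struct list_head *from,
                          new_page_t get_new_page, unsigned long private,
                          bool offlining, bool sync);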
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: migration: allow migration to operate asynchronously and avoid synchronous compaction in the faster path · 77f1fe6b
      Authored by Mel Gorman
      
      Migration synchronously waits for writeback if the initial passes fail.
      Callers of memory compaction do not necessarily want this behaviour if
      the caller is latency-sensitive or expects that synchronous migration is
      not going to have a significantly better success rate.
      
      This patch adds a sync parameter to migrate_pages() allowing the caller to
      indicate if wait_on_page_writeback() is allowed within migration or not.
      For reclaim/compaction, try_to_compact_pages() is first called
      asynchronously, direct reclaim runs and then try_to_compact_pages() is
      called synchronously as there is a greater expectation that it'll succeed.
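
      A sketch of the check this parameter gates inside migration (labels and
      context condensed):

        /* mm/migrate.c: unmap_and_move(..., bool sync) */
        if (PageWriteback(page)) {
                if (!sync)
                        goto uncharge;   /* async callers won't wait */
                wait_on_page_writeback(page);
        }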
      
      [akpm@linux-foundation.org: build/merge fix]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim · 3e7d3449
      Authored by Mel Gorman
      Lumpy reclaim is disruptive.  It reclaims a large number of pages and
      ignores the age of the pages it reclaims.  This can incur significant
      stalls and potentially increase the number of major faults.
      
      Compaction has reached the point where it is considered reasonably
      stable (meaning it has passed a lot of testing) and is a potential
      candidate for displacing lumpy reclaim.  This patch introduces an
      alternative to lumpy reclaim, available when compaction is enabled,
      called reclaim/compaction.  The basic operation is very simple: instead
      of selecting a contiguous range of pages to reclaim, a number of order-0
      pages are reclaimed, and compaction is then run by either kswapd
      (compact_zone_order()) or direct compaction
      (__alloc_pages_direct_compact()).
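
      Schematically (a pseudocode sketch; reclaim_some_order0_pages() and
      enough_free_for_compaction() are invented names for illustration):

        /* reclaim/compaction for a high-order allocation */
        while (!enough_free_for_compaction(zone, order))
                reclaim_some_order0_pages(zone);   /* respects LRU age */
        compact_zone_order(zone, order, gfp_mask); /* defragment in place */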
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: use conventional task_struct naming]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 23 December 2010 (1 commit)
  17. 27 October 2010 (3 commits)
    • mm: fix error reporting in move_pages() syscall · 70384dc6
      Authored by Gleb Natapov
      The vma returned by find_vma() does not necessarily include the target
      address.  If this happens, the code tries to follow a page outside of
      any vma and returns ENOENT instead of EFAULT.
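
      The standard guard (a sketch of the fix):

        vma = find_vma(mm, addr);
        /*
         * find_vma() returns the first vma ending above addr; it may also
         * start above addr, in which case addr sits in a hole and the
         * right answer is -EFAULT, not -ENOENT.
         */
        err = -EFAULT;
        if (!vma || addr < vma->vm_start)
                goto set_status;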
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: compaction: fix COMPACTPAGEFAILED counting · cf608ac1
      Authored by Minchan Kim
      Presently update_nr_listpages() doesn't have a role.  That's because the
      list passed in is always empty just after calling migrate_pages():
      migrate_pages() has cleaned pages which failed to migrate off the list
      before returning, ever since commit aaa994b3:
      
       [PATCH] page migration: handle freeing of pages in migrate_pages()
      
       Do not leave pages on the lists passed to migrate_pages().  Seems that we will
       not need any postprocessing of pages.  This will simplify the handling of
       pages by the callers of migrate_pages().
      
      At that time, we thought we wouldn't need any postprocessing of pages.
      But the situation has changed: compaction needs to know the number of
      pages that failed to migrate, for the COMPACTPAGEFAILED stat.

      This patch makes a new rule for callers of migrate_pages(): the caller
      must call putback_lru_pages().  So the caller now cleans up the list
      itself, and thereby gets a chance to postprocess the pages.  [suggested
      by Christoph Lameter]
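
      The caller-side rule, in outline (a sketch; new_page_fn and pagelist
      are illustrative names):

        nr_failed = migrate_pages(&pagelist, new_page_fn, private, 0);
        if (nr_failed > 0)
                count_vm_events(COMPACTPAGEFAILED, nr_failed);
        /*
         * New rule: migrate_pages() no longer frees the failures itself;
         * the caller puts them back, and may inspect them first.
         */
        putback_lru_pages(&pagelist);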
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Authored by Wu Fengguang
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip
      inodes on IO congestion; the latter would lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they
      are redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page-out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in the current code;
      b) its old semantic of "don't get stuck on request queues" is a misbehavior:
         it would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 11 October 2010 (1 commit)
    • Fix migration.c compilation on s390 · 3ef8fd7f
      Authored by Andi Kleen
      31bit s390 doesn't have huge pages and failed with:
      
      > mm/migrate.c: In function 'remove_migration_pte':
      > mm/migrate.c:143:3: error: implicit declaration of function 'pte_mkhuge'
      > mm/migrate.c:143:7: error: incompatible types when assigning to type 'pte_t' from type 'int'
      
      Put that code into an #ifdef.
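
      The shape of the fix (a sketch; this follows the usual pattern for
      such build fixes):

        /* mm/migrate.c: remove_migration_pte() */
        #ifdef CONFIG_HUGETLB_PAGE
                if (PageHuge(new))
                        pte = pte_mkhuge(pte);
        #endif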
      
      Reported by Heiko Carstens
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  19. 08 October 2010 (1 commit)
    • hugetlb: hugepage migration core · 290408d4
      Authored by Naoya Horiguchi
      This patch extends the page migration code to support hugepage
      migration.  One of the potential users of this feature is soft
      offlining, which is triggered by corrected memory errors (added by the
      next patch).
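
      The entry point added mirrors migrate_pages() (a sketch of the
      signature only; parameter names are illustrative):

        int migrate_huge_pages(struct list_head *from,
                               new_page_t get_new_page,
                               unsigned long private, int offlining);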
      
      Todo:
      - there are other users of page migration such as memory policy,
        memory hotplug and memory compaction.
        They are not ready for hugepage support for now.
      
      ChangeLog since v4:
      - define migrate_huge_pages()
      - remove changes on isolation/putback_lru_page()
      
      ChangeLog since v2:
      - refactor isolate/putback_lru_page() to handle hugepage
      - add comment about race on unmap_and_move_huge_page()
      
      ChangeLog since v1:
      - divide migration code path for hugepage
      - define routine checking migration swap entry for hugetlb
      - replace "goto" with "if/else" in remove_migration_pte()
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  20. 10 August 2010 (3 commits)
  21. 28 May 2010 (1 commit)
    • memcg: fix mis-accounting of file mapped racy with migration · ac39cf8c
      Authored by akpm@linux-foundation.org
      FILE_MAPPED per memcg of migrated file cache is not properly updated,
      because our hook in page_add_file_rmap() can't know against which memcg
      FILE_MAPPED should be accounted.
      
      Basically, this patch is for fixing the bug but includes some big changes
      to fix up other messes.
      
      Now, at migrating mapped file, events happen in following sequence.
      
       1. allocate a new page.
       2. get memcg of an old page.
       3. charge against the new page before migration. But at this point,
          there are no changes to the new page's page_cgroup, and no commit
          for the charge. (IOW, the PCG_USED bit is not set.)
       4. page migration replaces radix-tree, old-page and new-page.
       5. page migration remaps the new page if the old page was mapped.
       6. Here, the new page is unlocked.
       7. memcg commits the charge for newpage, Mark the new page's page_cgroup
          as PCG_USED.
      
      Because "commit" happens after page-remap, we can count FILE_MAPPED
      at "5", because we should avoid to trust page_cgroup->mem_cgroup.
      if PCG_USED bit is unset.
      (Note: memcg's LRU removal code does that but LRU-isolation logic is used
       for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is
       not on LRU or page_cgroup->mem_cgroup is NULL.)
      
      We can lose file_mapped accounting information at 5 because FILE_MAPPED
      is updated only when mapcount changes 0->1. So we should catch it.
      
      BTW, historically, the above implementation comes from the
      migration-failure handling of anonymous pages. Because we charge both
      the old page and the new page with mapcount=0, we can't catch
        - the page is really freed before remap.
        - migration fails but it's freed before remap
      or ..... corner cases.
      
      New migration sequence with memcg is:
      
       1. allocate a new page.
       2. mark PageCgroupMigration to the old page.
       3. charge against a new page onto the old page's memcg. (here, new page's pc
          is marked as PageCgroupUsed.)
       4. page migration replaces radix-tree, page table, etc...
       5. At remapping, the new page's page_cgroup is now marked as "USED".
          We can catch the 0->1 event and FILE_MAPPED will be properly
          updated.

          And after the page is unlocked, the SWAPOUT event, and the freeing
          of this page by unmap(), can be caught as well.
      
       7. Clear PageCgroupMigration of the old page.
      
      So, FILE_MAPPED will be correctly updated.
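
      In outline, the flag's lifecycle looks like this (a sketch; the helper
      names follow the page_cgroup flag conventions of this era):

        /* mem_cgroup_prepare_migration(): old page, before migration */
        lock_page_cgroup(pc);
        SetPageCgroupMigration(pc);
        unlock_page_cgroup(pc);

        /* mem_cgroup_end_migration(): after remap, success or failure */
        lock_page_cgroup(pc);
        ClearPageCgroupMigration(pc);
        unlock_page_cgroup(pc);
        /* only now may the old page be uncharged */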
      
      Then, what is the MIGRATION flag for?
        Without it, at migration failure, we may have to charge the old page
        again because it may have been fully unmapped. "Charging" means we have
        to dive into memory reclaim or something equally complicated. So, it's
        better to avoid charging it again. Before this patch, __commit_charge()
        was working for both the old and the new page and fixed everything up.
        But this technique had some racy conditions around FILE_MAPPED and
        SWAPOUT etc...
        Now, the kernel uses the MIGRATION flag and doesn't uncharge the old
        page until the end of migration.
      
      I hope this change will make memcg's page migration much simpler.  This
      page migration has caused several troubles; it is worth adding a flag
      for the simplification.
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 25 May 2010 (3 commits)