1. 13 6月, 2013 1 次提交
  2. 25 5月, 2013 1 次提交
    • L
      mm compaction: fix of improper cache flush in migration code · c2cc499c
      Leonid Yegoshin 提交于
      Page 'new' during MIGRATION can't be flushed with flush_cache_page().
      Using flush_cache_page(vma, addr, pfn) is justified only if the page is
      already placed in process page table, and that is done right after
      flush_cache_page().  But without it the arch function has no knowledge
      of process PTE and does nothing.
      
      Besides that, flush_cache_page() flushes an application cache page, but
      the kernel has a different page virtual address and dirtied it.
      
      Replace it with flush_dcache_page(new) which is the proper usage.
      
      The old page is flushed in try_to_unmap_one() before migration.
      
      This bug takes place in Sead3 board with M14Kc MIPS CPU without cache
      aliasing (but Harvard arch - separate I and D cache) in tight memory
      environment (128MB) each 1-3days on SOAK test.  It fails in cc1 during
      kernel build (SIGILL, SIGBUS, SIGSEG) if CONFIG_COMPACTION is switched
      ON.
      Signed-off-by: NLeonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Leonid Yegoshin <yegoshin@mips.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2cc499c
  3. 30 4月, 2013 2 次提交
  4. 24 2月, 2013 6 次提交
    • H
      mm: remove offlining arg to migrate_pages · 9c620e2b
      Hugh Dickins 提交于
      No functional change, but the only purpose of the offlining argument to
      migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a
      KSM page for memory hotremove (which took ksm_thread_mutex) but not for
      other callers.  Now all cases are safe, remove the arg.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c620e2b
    • H
      ksm: enable KSM page migration · b79bc0a0
      Hugh Dickins 提交于
      Migration of KSM pages is now safe: remove the PageKsm restrictions from
      mempolicy.c and migrate.c.
      
      But keep PageKsm out of __unmap_and_move()'s anon_vma contortions, which
      are irrelevant to KSM: it looks as if that code was preventing hotremove
      migration of KSM pages, unless they happened to be in swapcache.
      
      There is some question as to whether enforcing a NUMA mempolicy migration
      ought to migrate KSM pages, mapped into entirely unrelated processes; but
      moving page_mapcount > 1 is only permitted with MPOL_MF_MOVE_ALL anyway,
      and it seems reasonable to assume that you wouldn't set MADV_MERGEABLE on
      any area where this is a worry.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b79bc0a0
    • H
      ksm: make KSM page migration possible · c8d6553b
      Hugh Dickins 提交于
      KSM page migration is already supported in the case of memory hotremove,
      which takes the ksm_thread_mutex across all its migrations to keep life
      simple.
      
      But the new KSM NUMA merge_across_nodes knob introduces a problem, when
      it's set to non-default 0: if a KSM page is migrated to a different NUMA
      node, how do we migrate its stable node to the right tree?  And what if
      that collides with an existing stable node?
      
      So far there's no provision for that, and this patch does not attempt to
      deal with it either.  But how will I test a solution, when I don't know
      how to hotremove memory?  The best answer is to enable KSM page migration
      in all cases now, and test more common cases.  With THP and compaction
      added since KSM came in, page migration is now mainstream, and it's a
      shame that a KSM page can frustrate freeing a page block.
      
      Without worrying about merge_across_nodes 0 for now, this patch gets KSM
      page migration working reliably for default merge_across_nodes 1 (but
      leave the patch enabling it until near the end of the series).
      
      It's much simpler than I'd originally imagined, and does not require an
      additional tier of locking: page migration relies on the page lock, KSM
      page reclaim relies on the page lock, the page lock is enough for KSM page
      migration too.
      
      Almost all the care has to be in get_ksm_page(): that's the function which
      worries about when a stable node is stale and should be freed, now it also
      has to worry about the KSM page being migrated.
      
      The only new overhead is an additional put/get/lock/unlock_page when
      stable_tree_search() arrives at a matching node: to make sure migration
      respects the raised page count, and so does not migrate the page while
      we're busy with it here.  That's probably avoidable, either by changing
      internal interfaces from using kpage to stable_node, or by moving the
      ksm_migrate_page() callsite into a page_freeze_refs() section (even if not
      swapcache); but this works well, I've no urge to pull it apart now.
      
      (Descents of the stable tree may pass through nodes whose KSM pages are
      under migration: being unlocked, the raised page count does not prevent
      that, nor need it: it's safe to memcmp against either old or new page.)
      
      You might worry about mremap, and whether page migration's rmap_walk to
      remove migration entries will find all the KSM locations where it inserted
      earlier: that should already be handled, by the satisfyingly heavy hammer
      of move_vma()'s call to ksm_madvise(,,,MADV_UNMERGEABLE,).
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c8d6553b
    • M
      mm: rename page struct field helpers · 22b751c3
      Mel Gorman 提交于
      The function names page_xchg_last_nid(), page_last_nid() and
      reset_page_last_nid() were judged to be inconsistent so rename them to a
      struct_field_op style pattern.  As it looked jarring to have
      reset_page_mapcount() and page_nid_reset_last() beside each other in
      memmap_init_zone(), this patch also renames reset_page_mapcount() to
      page_mapcount_reset().  There are others like init_page_count() but as
      it is used throughout the arch code a rename would likely cause more
      conflicts than it is worth.
      
      [akpm@linux-foundation.org: fix zcache]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22b751c3
    • H
      mm: numa: cleanup flow of transhuge page migration · 340ef390
      Hugh Dickins 提交于
      When correcting commit 04fa5d6a ("mm: migrate: check page_count of
      THP before migrating") Hugh Dickins noted that the control flow for
      transhuge migration was difficult to follow.  Unconditionally calling
      put_page() in numamigrate_isolate_page() made the failure paths of both
      migrate_misplaced_transhuge_page() and migrate_misplaced_page() more
      complex that they should be.  Further, he was extremely wary that an
      unlock_page() should ever happen after a put_page() even if the
      put_page() should never be the final put_page.
      
      Hugh implemented the following cleanup to simplify the path by calling
      putback_lru_page() inside numamigrate_isolate_page() if it failed to
      isolate and always calling unlock_page() within
      migrate_misplaced_transhuge_page().
      
      There is no functional change after this patch is applied but the code
      is easier to follow and unlock_page() always happens before put_page().
      
      [mgorman@suse.de: changelog only]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      340ef390
    • M
      mm: numa: take THP into account when migrating pages for NUMA balancing · 3abef4e6
      Mel Gorman 提交于
      Wanpeng Li pointed out that numamigrate_isolate_page() assumes that only
      one base page is being migrated when in fact it can also be checking
      THP.
      
      The consequences are that a migration will be attempted when a target
      node is nearly full and fail later.  It's unlikely to be user-visible
      but it should be fixed.  While we are there, migrate_balanced_pgdat()
      should treat nr_migrate_pages as an unsigned long as it is treated as a
      watermark.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Suggested-by: NWanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3abef4e6
  5. 05 2月, 2013 1 次提交
  6. 12 1月, 2013 1 次提交
    • M
      mm: migrate: check page_count of THP before migrating · 04fa5d6a
      Mel Gorman 提交于
      Hugh Dickins pointed out that migrate_misplaced_transhuge_page() does
      not check page_count before migrating like base page migration and
      khugepage.  He could not see why this was safe and he is right.
      
      The potential impact of the bug is avoided due to the limitations of
      NUMA balancing.  The page_mapcount() check ensures that only a single
      address space is using this page and as THPs are typically private it
      should not be possible for another address space to fault it in
      parallel.  If the address space has one associated task then it's
      difficult to have both a GUP pin and be referencing the page at the same
      time.  If there are multiple tasks then a buggy scenario requires that
      another thread be accessing the page while the direct IO is in flight.
      This is dodgy behaviour as there is a possibility of corruption with or
      without THP migration.  It would be
      
      While we happen to be safe for the most part it is shoddy to depend on
      such "safety" so this patch checks the page count similar to anonymous
      pages.  Note that this does not mean that the page_mapcount() check can
      go away.  If we were to remove the page_mapcount() check the the THP
      would have to be unmapped from all referencing PTEs, replaced with
      migration PTEs and restored properly afterwards.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NHugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      04fa5d6a
  7. 18 12月, 2012 1 次提交
    • S
      mm,numa: fix update_mmu_cache_pmd call · ce4a9cc5
      Stephen Rothwell 提交于
      This build error is currently hidden by the fact that the x86
      implementation of 'update_mmu_cache_pmd()' is a macro that doesn't use
      its last argument, but commit b32967ff ("mm: numa: Add THP migration
      for the NUMA working set scanning fault case") introduced a call with
      the wrong third argument.
      
      In the akpm tree, it causes this build error:
      
        mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put':
        mm/migrate.c:1666:2: error: incompatible type for argument 3 of 'update_mmu_cache_pmd'
        arch/x86/include/asm/pgtable.h:792:20: note: expected 'struct pmd_t *' but argument is of type 'pmd_t'
      
      Fix it.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce4a9cc5
  8. 13 12月, 2012 1 次提交
  9. 12 12月, 2012 4 次提交
    • R
      mm: introduce putback_movable_pages() · 5733c7d1
      Rafael Aquini 提交于
      The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
      hacks around putback_lru_pages() in order to allow ballooned pages to be
      re-inserted on balloon page list as if a ballooned page was like a LRU page.
      
      As ballooned pages are not legitimate LRU pages, this patch introduces
      putback_movable_pages() to properly cope with cases where the isolated
      pageset contains ballooned pages and LRU pages, thus fixing the mentioned
      inelegant hack around putback_lru_pages().
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5733c7d1
    • R
      mm: introduce compaction and migration for ballooned pages · bf6bddf1
      Rafael Aquini 提交于
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a guest,
      thus imposing performance penalties associated with the reduced number of
      transparent huge pages that could be used by the guest workload.
      
      This patch introduces the helper functions as well as the necessary changes
      to teach compaction and migration bits how to cope with pages which are
      part of a guest memory balloon, in order to make them movable by memory
      compaction procedures.
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bf6bddf1
    • R
      mm: adjust address_space_operations.migratepage() return code · 78bd5209
      Rafael Aquini 提交于
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a
      guest, thus imposing performance penalties associated with the reduced
      number of transparent huge pages that could be used by the guest workload.
      
      This patch-set follows the main idea discussed at 2012 LSFMMS session:
      "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
      to introduce the required changes to the virtio_balloon driver, as well as
      the changes to the core compaction & migration bits, in order to make
      those subsystems aware of ballooned pages and allow memory balloon pages
      become movable within a guest, thus avoiding the aforementioned
      fragmentation issue
      
      Following are numbers that prove this patch benefits on allowing
      compaction to be more effective at memory ballooned guests.
      
      Results for STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
      running on a 4gB RAM KVM guest which was ballooning 512mB RAM in 64mB
      chunks, at every minute (inflating/deflating), while test was running:
      
      ===BEGIN stress-highalloc
      
      STRESS-HIGHALLOC
                       highalloc-3.7     highalloc-3.7
                           rc4-clean         rc4-patch
      Pass 1          55.00 ( 0.00%)    62.00 ( 7.00%)
      Pass 2          54.00 ( 0.00%)    62.00 ( 8.00%)
      while Rested    75.00 ( 0.00%)    80.00 ( 5.00%)
      
      MMTests Statistics: duration
                       3.7         3.7
                 rc4-clean   rc4-patch
      User         1207.59     1207.46
      System       1300.55     1299.61
      Elapsed      2273.72     2157.06
      
      MMTests Statistics: vmstat
                                      3.7         3.7
                                rc4-clean   rc4-patch
      Page Ins                    3581516     2374368
      Page Outs                  11148692    10410332
      Swap Ins                         80          47
      Swap Outs                      3641         476
      Direct pages scanned          37978       33826
      Kswapd pages scanned        1828245     1342869
      Kswapd pages reclaimed      1710236     1304099
      Direct pages reclaimed        32207       31005
      Kswapd efficiency               93%         97%
      Kswapd velocity             804.077     622.546
      Direct efficiency               84%         91%
      Direct velocity              16.703      15.682
      Percentage direct scans          2%          2%
      Page writes by reclaim        79252        9704
      Page writes file              75611        9228
      Page writes anon               3641         476
      Page reclaim immediate        16764       11014
      Page rescued immediate            0           0
      Slabs scanned               2171904     2152448
      Direct inode steals             385        2261
      Kswapd inode steals          659137      609670
      Kswapd skipped wait               1          69
      THP fault alloc                 546         631
      THP collapse alloc              361         339
      THP splits                      259         263
      THP fault fallback               98          50
      THP collapse fail                20          17
      Compaction stalls               747         499
      Compaction success              244         145
      Compaction failures             503         354
      Compaction pages moved       370888      474837
      Compaction move failure       77378       65259
      
      ===END stress-highalloc
      
      This patch:
      
      Introduce MIGRATEPAGE_SUCCESS as the default return code for
      address_space_operations.migratepage() method and documents the expected
      return code for the same method in failure cases.
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      78bd5209
    • B
      mm: introduce mm_find_pmd() · 6219049a
      Bob Liu 提交于
      Several place need to find the pmd by(mm_struct, address), so introduce a
      function to simplify it.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NBob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Ni zhan Chen <nizhan.chen@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6219049a
  10. 11 12月, 2012 13 次提交
    • I
      mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable · 4fc3f1d6
      Ingo Molnar 提交于
      rmap_walk_anon() and try_to_unmap_anon() appears to be too
      careful about locking the anon vma: while it needs protection
      against anon vma list modifications, it does not need exclusive
      access to the list itself.
      
      Transforming this exclusive lock to a read-locked rwsem removes
      a global lock from the hot path of page-migration intense
      threaded workloads which can cause pathological performance like
      this:
      
          96.43%        process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
                        |
                        --- perf_trace_sched_switch
                            __schedule
                            schedule
                            schedule_preempt_disabled
                            __mutex_lock_common.isra.6
                            __mutex_lock_slowpath
                            mutex_lock
                           |
                           |--50.61%-- rmap_walk
                           |          move_to_new_page
                           |          migrate_pages
                           |          migrate_misplaced_page
                           |          __do_numa_page.isra.69
                           |          handle_pte_fault
                           |          handle_mm_fault
                           |          __do_page_fault
                           |          do_page_fault
                           |          page_fault
                           |          __memset_sse2
                           |          |
                           |           --100.00%-- worker_thread
                           |                     |
                           |                      --100.00%-- start_thread
                           |
                            --49.39%-- page_lock_anon_vma
                                      try_to_unmap_anon
                                      try_to_unmap
                                      migrate_pages
                                      migrate_misplaced_page
                                      __do_numa_page.isra.69
                                      handle_pte_fault
                                      handle_mm_fault
                                      __do_page_fault
                                      do_page_fault
                                      page_fault
                                      __memset_sse2
                                      |
                                       --100.00%-- worker_thread
                                                 start_thread
      
      With this change applied the profile is now nicely flat
      and there's no anon-vma related scheduling/blocking.
      
      Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(),
      to make it clearer that it's an exclusive write-lock in
      that case - suggested by Rik van Riel.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Turner <pjt@google.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      4fc3f1d6
    • M
      mm: migrate: Account a transhuge page properly when rate limiting · d28d4335
      Mel Gorman 提交于
      If there is excessive migration due to NUMA balancing it gets rate
      limited. It does this by counting the number of pages it has migrated
      recently but counts a transhuge page as 1 page. Account for it properly.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      d28d4335
    • M
      mm: numa: Account for failed allocations and isolations as migration failures · 7548341b
      Mel Gorman 提交于
      Subject says it all. Allocation failures and a failure to isolate should
      be accounted as a migration failure. This is partially another
      difference between base page and transhuge page migration. A base page
      migration makes multiple attempts for these conditions before it would
      be accounted for as a failure.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      7548341b
    • M
      mm: numa: Add THP migration for the NUMA working set scanning fault case build fix · 220018d3
      Mel Gorman 提交于
      Commit "Add THP migration for the NUMA working set scanning fault case"
      breaks the build because HPAGE_PMD_SHIFT and HPAGE_PMD_MASK defined to
      explode without CONFIG_TRANSPARENT_HUGEPAGE:
      
      mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put':
      mm/migrate.c:1549: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
      mm/migrate.c:1564: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
      mm/migrate.c:1566: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
      mm/migrate.c:1573: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
      mm/migrate.c:1606: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
      mm/migrate.c:1648: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
      
      CONFIG_NUMA_BALANCING allows compilation without enabling transparent
      hugepages, so define the dummy function for such a configuration and only
      define migrate_misplaced_transhuge_page_put() when transparent hugepages
      are enabled.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      220018d3
    • M
      mm: numa: Add THP migration for the NUMA working set scanning fault case. · b32967ff
      Mel Gorman 提交于
      Note: This is very heavily based on a patch from Peter Zijlstra with
      	fixes from Ingo Molnar, Hugh Dickins and Johannes Weiner.  That patch
      	put a lot of migration logic into mm/huge_memory.c where it does
      	not belong. This version puts tries to share some of the migration
      	logic with migrate_misplaced_page.  However, it should be noted
      	that now migrate.c is doing more with the pagetable manipulation
      	than is preferred. The end result is barely recognisable so as
      	before, the signed-offs had to be removed but will be re-added if
      	the original authors are ok with it.
      
      Add THP migration for the NUMA working set scanning fault case.
      
      It uses the page lock to serialize. No migration pte dance is
      necessary because the pte is already unmapped when we decide
      to migrate.
      
      [dhillf@gmail.com: Fix memory leak on isolation failure]
      [dhillf@gmail.com: Fix transfer of last_nid information]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      b32967ff
    • H
      mm: numa: migrate: Set last_nid on newly allocated page · bac0382c
      Hillf Danton 提交于
      Pass last_nid from misplaced page to newly allocated migration target page.
      Signed-off-by: NHillf Danton <dhillf@gmail.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      bac0382c
    • M
      mm: numa: Rate limit setting of pte_numa if node is saturated · e14808b4
      Mel Gorman 提交于
      If there are a large number of NUMA hinting faults and all of them
      are resulting in migrations it may indicate that memory is just
      bouncing uselessly around. NUMA balancing cost is likely exceeding
      any benefit from locality. Rate limit the PTE updates if the node
      is migration rate-limited. As noted in the comments, this distorts
      the NUMA faulting statistics.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      e14808b4
    • M
      mm: numa: Rate limit the amount of memory that is migrated between nodes · a8f60772
      Mel Gorman 提交于
      NOTE: This is very heavily based on similar logic in autonuma. It should
      	be signed off by Andrea but because there was no standalone
      	patch and it's sufficiently different from what he did that
      	the signed-off is omitted. Will be added back if requested.
      
      If a large number of pages are misplaced then the memory bus can be
      saturated just migrating pages between nodes. This patch rate-limits
      the amount of memory that can be migrating between nodes.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      a8f60772
    • M
      mm: numa: Add pte updates, hinting and migration stats · 03c5a6e1
      Mel Gorman 提交于
      It is tricky to quantify the basic cost of automatic NUMA placement in a
      meaningful manner. This patch adds some vmstats that can be used as part
      of a basic costing model.
      
      u    = basic unit = sizeof(void *)
      Ca   = cost of struct page access = sizeof(struct page) / u
      Cpte = Cost PTE access = Ca
      Cupdate = Cost PTE update = (2 * Cpte) + (2 * Wlock)
      	where Cpte is incurred twice for a read and a write and Wlock
      	is a constant representing the cost of taking or releasing a
      	lock
      Cnumahint = Cost of a minor page fault = some high constant e.g. 1000
      Cpagerw = Cost to read or write a full page = Ca + PAGE_SIZE/u
      Ci = Cost of page isolation = Ca + Wi
      	where Wi is a constant that should reflect the approximate cost
      	of the locking operation
      Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma)
      	where Wnuma is the approximate NUMA factor. 1 is local. 1.2
      	would imply that remote accesses are 20% more expensive
      
      Balancing cost = Cpte * numa_pte_updates +
      		Cnumahint * numa_hint_faults +
      		Ci * numa_pages_migrated +
      		Cpagecopy * numa_pages_migrated
      
      Note that numa_pages_migrated is used as a measure of how many pages
      were isolated even though it would miss pages that failed to migrate. A
      vmstat counter could have been added for it but the isolation cost is
      pretty marginal in comparison to the overall cost so it seemed overkill.
      
      The ideal way to measure automatic placement benefit would be to count
      the number of remote accesses versus local accesses and do something like
      
      	benefit = (remote_accesses_before - remove_access_after) * Wnuma
      
      but the information is not readily available. As a workload converges, the
      expection would be that the number of remote numa hints would reduce to 0.
      
      	convergence = numa_hint_faults_local / numa_hint_faults
      		where this is measured for the last N number of
      		numa hints recorded. When the workload is fully
      		converged the value is 1.
      
      This can measure if the placement policy is converging and how fast it is
      doing it.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      03c5a6e1
    • M
      mm: migrate: Drop the misplaced pages reference count if the target node is full · 149c33e1
      Mel Gorman 提交于
      If we have to avoid migrating to a node that is nearly full, put page
      and return zero.
      Signed-off-by: NHillf Danton <dhillf@gmail.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      149c33e1
    • P
      mm: migrate: Introduce migrate_misplaced_page() · 7039e1db
      Peter Zijlstra 提交于
      Note: This was originally based on Peter's patch "mm/migrate: Introduce
      	migrate_misplaced_page()" but borrows extremely heavily from Andrea's
      	"autonuma: memory follows CPU algorithm and task/mm_autonuma stats
      	collection". The end result is barely recognisable so signed-offs
      	had to be dropped. If original authors are ok with it, I'll
      	re-add the signed-off-bys.
      
      Add migrate_misplaced_page() which deals with migrating pages from
      faults.
      Based-on-work-by: NLee Schermerhorn <Lee.Schermerhorn@hp.com>
      Based-on-work-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Based-on-work-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      7039e1db
    • M
      mm: migrate: Add a tracepoint for migrate_pages · 7b2a2d4a
      Mel Gorman 提交于
      The pgmigrate_success and pgmigrate_fail vmstat counters tells the user
      about migration activity but not the type or the reason. This patch adds
      a tracepoint to identify the type of page migration and why the page is
      being migrated.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      7b2a2d4a
    • M
      mm: compaction: Move migration fail/success stats to migrate.c · 5647bc29
      Mel Gorman 提交于
      The compact_pages_moved and compact_pagemigrate_failed events are
      convenient for determining if compaction is active and to what
      degree migration is succeeding but it's at the wrong level. Other
      users of migration may also want to know if migration is working
      properly and this will be particularly true for any automated
      NUMA migration. This patch moves the counters down to migration
      with the new events called pgmigrate_success and pgmigrate_fail.
      The compact_blocks_moved counter is removed because while it was
      useful for debugging initially, it's worthless now as no meaningful
      conclusions can be drawn from its value.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      5647bc29
  11. 01 8月, 2012 3 次提交
  12. 04 6月, 2012 1 次提交
    • H
      mm: fix warning in __set_page_dirty_nobuffers · 752dc185
      Hugh Dickins 提交于
      New tmpfs use of !PageUptodate pages for fallocate() is triggering the
      WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
      is called from migrate_page_copy() for compaction.
      
      It is anomalous that migration should use __set_page_dirty_nobuffers()
      on an address_space that does not participate in dirty and writeback
      accounting; and this has also been observed to insert surprising dirty
      tags into a tmpfs radix_tree, despite tmpfs not using tags at all.
      
      We should probably give migrate_page_copy() a better way to preserve the
      tag and migrate accounting info, when mapping_cap_account_dirty().  But
      that needs some more work: so in the interim, avoid the warning by using
      a simple SetPageDirty on PageSwapBacked pages.
      Reported-and-tested-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      752dc185
  13. 16 5月, 2012 1 次提交
  14. 26 4月, 2012 1 次提交
  15. 22 3月, 2012 1 次提交
    • C
      mm: fix move/migrate_pages() race on task struct · 3268c63e
      Christoph Lameter 提交于
      Migration functions perform the rcu_read_unlock too early.  As a result
      the task pointed to may change from under us.  This can result in an oops,
      as reported by Dave Hansen in https://lkml.org/lkml/2012/2/23/302.
      
      The following patch extend the period of the rcu_read_lock until after the
      permissions checks are done.  We also take a refcount so that the task
      reference is stable when calling security check functions and performing
      cpuset node validation (which takes a mutex).
      
      The refcount is dropped before actual page migration occurs so there is no
      change to the refcounts held during page migration.
      
      Also move the determination of the mm of the task struct to immediately
      before the do_migrate*() calls so that it is clear that we switch from
      handling the task during permission checks to the mm for the actual
      migration.  Since the determination is only done once and we then no
      longer use the task_struct we can be sure that we operate on a specific
      address space that will not change from under us.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Reported-by: NDave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3268c63e
  16. 06 3月, 2012 1 次提交
    • H
      memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Hugh Dickins 提交于
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7512102c
  17. 04 2月, 2012 1 次提交