1. 19 Dec 2013 (4 commits)
  2. 22 Nov 2013 (1 commit)
    • mm: thp: give transparent hugepage code a separate copy_page · 30b0a105
      Dave Hansen authored
      Right now, the migration code in migrate_page_copy() uses copy_huge_page()
      for hugetlbfs and thp pages:
      
             if (PageHuge(page) || PageTransHuge(page))
                      copy_huge_page(newpage, page);
      
      So, yay for code reuse.  But:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
      
      and a non-hugetlbfs page has no page_hstate().  This works 99% of the
      time because page_hstate() determines the hstate from the page order
      alone.  Since the page order of a THP page matches the default hugetlbfs
      page order, it works.
      
      But, if you change the default huge page size on the boot command-line
      (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate
      so page_hstate() returns null and copy_huge_page() oopses pretty fast
      since copy_huge_page() dereferences the hstate:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
              if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
        ...
      
      Mel noticed that the migration code is really the only user of these
      functions.  This moves all the copy code over to migrate.c and makes
      copy_huge_page() work for THP by checking for it explicitly.
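
      A minimal sketch of the resulting migrate.c helper (simplified; the
      pfn-based copy path for gigantic pages is omitted here):

        static void copy_huge_page(struct page *dst, struct page *src)
        {
            int i, nr_pages;

            if (PageHuge(src)) {
                /* hugetlbfs page: a valid hstate exists */
                nr_pages = pages_per_huge_page(page_hstate(src));
            } else {
                /* THP: no hstate, derive the size from the compound order */
                VM_BUG_ON(!PageTransHuge(src));
                nr_pages = hpage_nr_pages(src);
            }

            for (i = 0; i < nr_pages; i++) {
                cond_resched();
                copy_highpage(dst + i, src + i);
            }
        }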
      
      I believe the bug was introduced in commit b32967ff ("mm: numa: Add
      THP migration for the NUMA working set scanning fault case")
      
      [akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30b0a105
  3. 15 Nov 2013 (2 commits)
    • mm: convert the rest to new page table lock api · c4088ebd
      Kirill A. Shutemov authored
      Only trivial cases left. Let's convert them altogether.
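
      The conversion pattern is mechanical; roughly (illustrative, not a
      specific hunk from this commit):

        /* before: one lock for the whole mm */
        spin_lock(&mm->page_table_lock);
        /* ... operate on the pmd entry ... */
        spin_unlock(&mm->page_table_lock);

        /* after: the split lock covering this pmd's page table */
        spinlock_t *ptl = pmd_lock(mm, pmd);
        /* ... operate on the pmd entry ... */
        spin_unlock(ptl);
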
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4088ebd
    • mm, hugetlb: convert hugetlbfs to use split pmd lock · cb900f41
      Kirill A. Shutemov authored
      Hugetlb supports multiple page sizes. We use split lock only for PMD
      level, but not for PUD.
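
      A sketch of the selection logic this introduces (helper name as used by
      the series): PMD-sized hugepages use the split per-page-table lock,
      larger sizes keep the per-mm lock.

        static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                        struct mm_struct *mm, pte_t *pte)
        {
            if (huge_page_size(h) == PMD_SIZE)
                return pmd_lockptr(mm, (pmd_t *) pte);
            VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
            /* PUD-sized (and larger) hugepages keep mm->page_table_lock */
            return &mm->page_table_lock;
        }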
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb900f41
  4. 29 Oct 2013 (1 commit)
  5. 17 Oct 2013 (1 commit)
    • mm: migration: do not lose soft dirty bit if page is in migration state · c3d16e16
      Cyrill Gorcunov authored
      If page migration is turned on in config and the page is migrating, we
      may lose the soft dirty bit.  If fork and mprotect are called on
migrating pages (once migration is complete), the pages do not obtain the
      soft dirty bit in the corresponding pte entries.  Fix it by adding an
      appropriate test on swap entries.
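
      The fork-path part of the fix looks roughly like this (simplified from
      copy_one_pte(); soft-dirty helper names as in that era's kernel):

        if (is_write_migration_entry(entry) && is_cow_mapping(vm_flags)) {
            /*
             * COW mappings need the parent/child ptes set read-only;
             * carry the soft-dirty bit over into the rebuilt swap pte too.
             */
            make_migration_entry_read(&entry);
            pte = swp_entry_to_pte(entry);
            if (pte_swp_soft_dirty(*src_pte))
                pte = pte_swp_mksoft_dirty(pte);
            set_pte_at(src_mm, addr, src_pte, pte);
        }
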
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c3d16e16
  6. 09 Oct 2013 (5 commits)
    • mm: numa: Copy cpupid on page migration · 7851a45c
      Rik van Riel authored
After page migration, the new page has the cpupid unset. This makes
      every fault on a recently migrated page look like a first numa fault,
      leading to another page migration.
      
      Copying over the cpupid at page migration time should prevent erroneous
      migrations of recently migrated pages.
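
      The copy itself is small; roughly, in migrate_page_copy() (helper name
      per the cpu,pid rename in the commit below):

        int cpupid;

        /* transfer the last-fault cpupid from the old page to the new one */
        cpupid = page_cpupid_xchg_last(page, -1);
        page_cpupid_xchg_last(newpage, cpupid);
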
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-46-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7851a45c
    • mm: numa: Change page last {nid,pid} into {cpu,pid} · 90572890
      Peter Zijlstra authored
Change the per-page last fault tracking to use cpu,pid instead of
      nid,pid. This will allow us to try and look up the alternate task more
      easily. Note that even though it is the cpu that is stored in the page
      flags, the mpol_misplaced decision is still based on the node.
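
      For reference, the node is recovered from the stored cpu along these
      lines:

        /* The page flags now remember (cpu, pid); mpol_misplaced() still
         * compares against a node, derived from the remembered cpu. */
        static inline int cpupid_to_nid(int cpupid)
        {
            return cpu_to_node(cpupid_to_cpu(cpupid));
        }
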
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1381141781-10992-43-git-send-email-mgorman@suse.de
      [ Fixed build failure on 32-bit systems. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      90572890
    • sched/numa: Set preferred NUMA node based on number of private faults · b795854b
      Mel Gorman authored
      Ideally it would be possible to distinguish between NUMA hinting faults that
      are private to a task and those that are shared. If treated identically
      there is a risk that shared pages bounce between nodes depending on
      the order they are referenced by tasks. Ultimately what is desirable is
      that task private pages remain local to the task while shared pages are
      interleaved between sharing tasks running on different nodes to give good
      average performance. This is further complicated by THP as even
      applications that partition their data may not be partitioning on a huge
      page boundary.
      
      To start with, this patch assumes that multi-threaded or multi-process
      applications partition their data and that in general the private accesses
      are more important for cpu->memory locality in the general case. Also,
      no new infrastructure is required to treat private pages properly but
      interleaving for shared pages requires additional infrastructure.
      
      To detect private accesses the pid of the last accessing task is required
but the storage requirements are high.  This patch borrows heavily from
      Ingo Molnar's patch "numa, mm, sched: Implement last-CPU+PID hash tracking"
      to encode some bits from the last accessing task in the page flags as
      well as the node information. Collisions will occur but it is better than
      just depending on the node information. Node information is then used to
      determine if a page needs to migrate. The PID information is used to detect
      private/shared accesses. The preferred NUMA node is selected based on where
      the maximum number of approximately private faults were measured. Shared
      faults are not taken into consideration for a few reasons.
      
      First, if there are many tasks sharing the page then they'll all move
      towards the same node. The node will be compute overloaded and then
      scheduled away later only to bounce back again. Alternatively the shared
      tasks would just bounce around nodes because the fault information is
      effectively noise. Either way accounting for shared faults the same as
      private faults can result in lower performance overall.
      
      The second reason is based on a hypothetical workload that has a small
      number of very important, heavily accessed private pages but a large shared
      array. The shared array would dominate the number of faults and be selected
      as a preferred node even though it's the wrong decision.
      
      The third reason is that multiple threads in a process will race each
      other to fault the shared page making the fault information unreliable.
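
      The private-vs-shared test itself boils down to a pid comparison; a
      hypothetical helper to illustrate the idea (numa_fault_is_private() is
      not an upstream name, and the cpupid naming comes from a later patch in
      the series):

        static bool numa_fault_is_private(struct task_struct *p, int last_cpupid)
        {
            /* "private" if the pid bits remembered in the page's last-fault
             * field match the task that is faulting now */
            return cpupid_to_pid(last_cpupid) == (p->pid & LAST__PID_MASK);
        }
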
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      [ Fix compilation error when !NUMA_BALANCING. ]
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-30-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b795854b
    • mm: numa: Scan pages with elevated page_mapcount · 1bc115d8
      Mel Gorman authored
Currently automatic NUMA balancing is unable to distinguish between false
      shared versus private pages except by ignoring pages with an elevated
      page_mapcount entirely. This avoids shared pages bouncing between the
      nodes whose tasks are using them, but it means quite a lot of data is ignored.
      
      This patch kicks away the training wheels in preparation for the
      shared/private page identification added later in the series. The ordering
      is chosen so that the impact of the shared/private detection can be easily
      measured. Note that the patch does not migrate shared, file-backed pages
      within vmas marked VM_EXEC, as these are generally shared library pages.
      Migrating such pages is not beneficial as there is an expectation they are
      read-shared between caches, and iTLB and iCache pressure is generally low.
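
      The VM_EXEC exception is a single check in migrate_misplaced_page(),
      roughly:

        /*
         * Don't migrate file pages that are mapped in multiple processes
         * with execute permissions as they are probably shared libraries.
         */
        if (page_mapcount(page) != 1 && page_is_file_cache(page) &&
            (vma->vm_flags & VM_EXEC))
            goto out;
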
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-28-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1bc115d8
    • mm: Close races between THP migration and PMD numa clearing · a54a407f
      Mel Gorman authored
      THP migration uses the page lock to guard against parallel allocations
      but there are cases like this still open
      
        Task A					Task B
        ---------------------				---------------------
        do_huge_pmd_numa_page				do_huge_pmd_numa_page
        lock_page
        mpol_misplaced == -1
        unlock_page
        goto clear_pmdnuma
      						lock_page
      						mpol_misplaced == 2
      						migrate_misplaced_transhuge
        pmd = pmd_mknonnuma
        set_pmd_at
      
      During hours of testing, one crashed with weird errors and while I have
      no direct evidence, I suspect something like the race above happened.
      This patch extends the page lock to being held until the pmd_numa is
      cleared to prevent migration starting in parallel while the pmd_numa is
      being cleared. It also flushes the old pmd entry and orders pagetable
      insertion before rmap insertion.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a54a407f
  7. 01 Oct 2013 (1 commit)
    • mm: avoid reinserting isolated balloon pages into LRU lists · 117aad1e
      Rafael Aquini authored
      Isolated balloon pages can wrongly end up in LRU lists when
      migrate_pages() finishes its round without draining all the isolated
      page list.
      
      The same issue can happen when reclaim_clean_pages_from_list() tries to
      reclaim pages from an isolated page list, before migration, in the CMA
      path.  Such balloon page leak opens a race window against LRU lists
      shrinkers that leads us to the following kernel panic:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
        IP: [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
        PGD 3cda2067 PUD 3d713067 PMD 0
        Oops: 0000 [#1] SMP
        CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        RIP: shrink_page_list+0x24e/0x897
        RSP: 0000:ffff88003da499b8  EFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff88003e82bd60 RCX: 00000000000657d5
        RDX: 0000000000000000 RSI: 000000000000031f RDI: ffff88003e82bd40
        RBP: ffff88003da49ab0 R08: 0000000000000001 R09: 0000000081121a45
        R10: ffffffff81121a45 R11: ffff88003c4a9a28 R12: ffff88003e82bd40
        R13: ffff88003da0e800 R14: 0000000000000001 R15: ffff88003da49d58
        FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000067d9000 CR3: 000000003ace5000 CR4: 00000000000407b0
        Call Trace:
          shrink_inactive_list+0x240/0x3de
          shrink_lruvec+0x3e0/0x566
          __shrink_zone+0x94/0x178
          shrink_zone+0x3a/0x82
          balance_pgdat+0x32a/0x4c2
          kswapd+0x2f0/0x372
          kthread+0xa2/0xaa
          ret_from_fork+0x7c/0xb0
        Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
        RIP  [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
         RSP <ffff88003da499b8>
        CR2: 0000000000000028
        ---[ end trace 703d2451af6ffbfd ]---
        Kernel panic - not syncing: Fatal exception
      
      This patch fixes the issue, by assuring the proper tests are made at
      putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
      isolated balloon pages being wrongly reinserted in LRU lists.
      
      [akpm@linux-foundation.org: clarify awkward comment text]
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
      Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      117aad1e
  8. 12 Sep 2013 (5 commits)
    • mm: vmscan: fix do_try_to_free_pages() livelock · 6e543d57
      Lisa Du authored
This patch is based on KOSAKI's work and I add a little more description;
      please refer to https://lkml.org/lkml/2012/6/14/74.
      
      Currently, I found the system can enter a state where there are lots of free
      pages in a zone but only order-0 and order-1 pages, which means the zone is
      heavily fragmented; a high-order allocation can then cause a long stall
      (e.g. 60 seconds) in the direct reclaim path, especially in a no-swap,
      no-compaction environment.  This problem happened on v3.4, but the issue
      still seems to live in the current tree.  The reason is that
      do_try_to_free_pages() enters a livelock:
      
kswapd will go to sleep if the zones have been fully scanned and are still
      not balanced, as kswapd thinks there's little point trying all over again
      and wants to avoid an infinite loop.  Instead it changes the order from
      high-order to order-0, because kswapd thinks order-0 is the most important.
      Look at commit 73ce02e9 for the details.  If watermarks are ok, kswapd will
      go back to sleep and may leave zone->all_unreclaimable = 0.  It assumes
      high-order users can still perform direct reclaim if they wish.
      
      Direct reclaim continues to reclaim for a high order which is not a
      COSTLY_ORDER without invoking the oom-killer, until kswapd turns on
      zone->all_unreclaimable.  This is to avoid a too-early oom-kill.
      So direct reclaim depends on kswapd to break this loop.
      
In the worst case, direct reclaim may continue page reclaim forever while
      kswapd sleeps forever, until something like a watchdog detects it and
      finally kills the process, as described in:
      http://thread.gmane.org/gmane.linux.kernel.mm/103737
      
We can't turn on zone->all_unreclaimable from the direct reclaim path
      because the direct reclaim path doesn't take any lock, so that approach
      would be racy.  Thus this patch removes the zone->all_unreclaimable field
      completely and recalculates the zone's reclaimable state every time.
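
      The replacement check is recomputed on demand; roughly:

        /* A zone counts as reclaimable until it has been scanned more than
         * six times the number of pages it could still reclaim. */
        static bool zone_reclaimable(struct zone *zone)
        {
            return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
        }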
      
Note: we can't take the approach of having direct reclaim look at
      zone->pages_scanned directly while kswapd continues to use
      zone->all_unreclaimable, because that is racy.  Commit 929bea7c ("vmscan:
      all_unreclaimable() use zone->all_unreclaimable as a name") describes the
      details.
      
      [akpm@linux-foundation.org: uninline zone_reclaimable_pages() and zone_reclaimable()]
      Cc: Aaditya Kumar <aaditya.kumar.30@gmail.com>
      Cc: Ying Han <yinghan@google.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: Neil Zhang <zhangwm@marvell.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Lisa Du <cldu@marvell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6e543d57
    • mm: migrate: check movability of hugepage in unmap_and_move_huge_page() · 83467efb
      Naoya Horiguchi authored
Currently hugepage migration works well only for pmd-based hugepages
      (mainly due to lack of testing), so we had better not enable migration of
      other levels of hugepages until we are ready for it.
      
      Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
      a page table walk and check pud/pmd_huge() there, so they are safe.  But the
      other users (soft offline and memory hotremove) don't do this, so without
      this patch they can try to migrate unexpected types of hugepages.
      
      To prevent this, we introduce hugepage_migration_support() as an
      architecture-dependent check of whether hugepages are implemented on a pmd
      basis or not.  On some architectures multiple sizes of hugepages are
      available, so hugepage_migration_support() also checks the hugepage size.
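
      A sketch of the check, per the description above (assuming the
      pmd_huge_support() arch hook from the same series):

        static inline int hugepage_migration_support(struct hstate *h)
        {
            /* migratable only if the arch does pmd-based hugepages and
             * this hstate is the PMD-sized one */
            return pmd_huge_support() && (huge_page_shift(h) == PMD_SHIFT);
        }
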
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83467efb
    • mm: migrate: add hugepage migration code to move_pages() · e632a938
      Naoya Horiguchi authored
Extend move_pages() to handle vmas with VM_HUGETLB set.  We will be able to
      migrate hugepages with move_pages(2) after applying the enablement patch
      which comes later in this series.
      
      We avoid taking a refcount on tail pages of a hugepage, because unlike thp,
      a hugepage is not split and we need not care about races with splitting.
      
      Migration of larger hugepages (1GB for x86_64) is not enabled.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e632a938
    • mm: soft-offline: use migrate_pages() instead of migrate_huge_page() · b8ec1cee
      Naoya Horiguchi authored
Currently migrate_huge_page() takes a pointer to a hugepage to be migrated
      as an argument, instead of taking a pointer to the list of hugepages to be
      migrated.  This behavior was introduced in commit 189ebff2 ("hugetlb:
      simplify migrate_huge_page()"), and was OK because until now hugepage
      migration has been enabled only for soft-offlining, which migrates only one
      hugepage in a single call.
      
      But the situation will change in the later patches in this series, which
      enable other users of page migration to support hugepage migration.  They
      can kick off migration for both normal pages and hugepages in a single
      call, so we need to go back to the original implementation, which uses
      linked lists to collect the hugepages to be migrated.
      
      With this patch, soft_offline_huge_page() switches to using migrate_pages(),
      and migrate_huge_page() is no longer used, so let's remove it.
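
      The call site then becomes list-based; roughly (new_page being
      memory-failure.c's allocation callback, and the argument list following
      that era's migrate_pages() signature):

        LIST_HEAD(pagelist);

        list_move(&hpage->lru, &pagelist);
        ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
                            MIGRATE_SYNC, MR_MEMORY_FAILURE);
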
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b8ec1cee
    • mm: migrate: make core migration code aware of hugepage · 31caf665
      Naoya Horiguchi authored
      Currently hugepage migration is available only for soft offlining, but
      it's also useful for some other users of page migration (clearly because
      users of hugepage can enjoy the benefit of mempolicy and memory hotplug.)
      So this patchset tries to extend such users to support hugepage migration.
      
      The target of this patchset is to enable hugepage migration for NUMA
      related system calls (migrate_pages(2), move_pages(2), and mbind(2)), and
      memory hotplug.
      
      This patchset does not add hugepage migration for memory compaction,
      because users of memory compaction mainly expect to construct thp by
      arranging raw pages, and there's little or no need to compact hugepages.
CMA, another user of page migration, could benefit from hugepage
      migration, but it is not enabled for now (just because of lack of testing
      and expertise in CMA).
      
      Hugepage migration of non pmd-based hugepage (for example 1GB hugepage in
      x86_64, or hugepages in architectures like ia64) is not enabled for now
      (again, because of lack of testing.)
      
As for how these are achieved, I extended the API (migrate_pages()) to
      handle hugepages (patches 1 and 2) and adjusted the code of each caller to
      check and collect movable hugepages (patches 3-7).  The remaining 2 patches
      are miscellaneous ones to avoid unexpected behavior: patch 8 is about
      making sure that we only migrate pmd-based hugepages, and patch 9 is about
      choosing an appropriate zone for hugepage allocation.
      
      My test is mainly functional one, simply kicking hugepage migration via
      each entry point and confirm that migration is done correctly.  Test code
      is available here:
      
        git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git
      
      And I always run libhugetlbfs test when changing hugetlbfs's code.  With
      this patchset, no regression was found in the test.
      
      This patch (of 9):
      
Before enabling each user of page migration to support hugepages,
      this patch enables the list of pages for migration to link not only
      LRU pages, but also hugepages. As a result, putback_movable_pages()
      and migrate_pages() can handle both LRU pages and hugepages.
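
      On the putback side, the hugepage case is recognized explicitly; a
      sketch of the branch added to putback_movable_pages():

        if (unlikely(PageHuge(page))) {
            /* hugepages go back to the hugetlb active list, not the LRU */
            putback_active_hugepage(page);
            continue;
        }
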
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      31caf665
  9. 16 Jul 2013 (1 commit)
  10. 13 Jun 2013 (1 commit)
  11. 25 May 2013 (1 commit)
    • mm compaction: fix of improper cache flush in migration code · c2cc499c
      Leonid Yegoshin authored
      Page 'new' during MIGRATION can't be flushed with flush_cache_page().
      Using flush_cache_page(vma, addr, pfn) is justified only if the page is
      already placed in process page table, and that is done right after
      flush_cache_page().  But without it the arch function has no knowledge
      of process PTE and does nothing.
      
      Besides that, flush_cache_page() flushes an application cache page, but
      the kernel has a different page virtual address and dirtied it.
      
      Replace it with flush_dcache_page(new) which is the proper usage.
      
      The old page is flushed in try_to_unmap_one() before migration.
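
      The change in remove_migration_pte() is essentially (simplified):

        /* was: flush_cache_page(vma, addr, pfn) */
        flush_dcache_page(new);     /* flush the kernel's dirty copy */
        set_pte_at(mm, addr, ptep, pte);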
      
This bug takes place on a SEAD-3 board with an M14Kc MIPS CPU without cache
      aliasing (but a Harvard arch with separate I and D caches), in a tight
      memory environment (128MB), every 1-3 days on a SOAK test.  cc1 fails
      during a kernel build (SIGILL, SIGBUS, SIGSEGV) if CONFIG_COMPACTION is
      switched ON.
      Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Leonid Yegoshin <yegoshin@mips.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2cc499c
  12. 30 Apr 2013 (2 commits)
  13. 24 Feb 2013 (6 commits)
    • mm: remove offlining arg to migrate_pages · 9c620e2b
      Hugh Dickins authored
No functional change, but the only purpose of the offlining argument to
      migrate_pages() etc. was to ensure that __unmap_and_move() could migrate a
      KSM page for memory hotremove (which took ksm_thread_mutex) but not for
      other callers.  Now all cases are safe, so remove the arg.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c620e2b
    • ksm: enable KSM page migration · b79bc0a0
      Hugh Dickins authored
      Migration of KSM pages is now safe: remove the PageKsm restrictions from
      mempolicy.c and migrate.c.
      
      But keep PageKsm out of __unmap_and_move()'s anon_vma contortions, which
      are irrelevant to KSM: it looks as if that code was preventing hotremove
      migration of KSM pages, unless they happened to be in swapcache.
      
      There is some question as to whether enforcing a NUMA mempolicy migration
      ought to migrate KSM pages, mapped into entirely unrelated processes; but
      moving page_mapcount > 1 is only permitted with MPOL_MF_MOVE_ALL anyway,
      and it seems reasonable to assume that you wouldn't set MADV_MERGEABLE on
      any area where this is a worry.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b79bc0a0
    • ksm: make KSM page migration possible · c8d6553b
      Hugh Dickins authored
      KSM page migration is already supported in the case of memory hotremove,
      which takes the ksm_thread_mutex across all its migrations to keep life
      simple.
      
      But the new KSM NUMA merge_across_nodes knob introduces a problem, when
      it's set to non-default 0: if a KSM page is migrated to a different NUMA
      node, how do we migrate its stable node to the right tree?  And what if
      that collides with an existing stable node?
      
      So far there's no provision for that, and this patch does not attempt to
      deal with it either.  But how will I test a solution, when I don't know
      how to hotremove memory?  The best answer is to enable KSM page migration
      in all cases now, and test more common cases.  With THP and compaction
      added since KSM came in, page migration is now mainstream, and it's a
      shame that a KSM page can frustrate freeing a page block.
      
      Without worrying about merge_across_nodes 0 for now, this patch gets KSM
      page migration working reliably for default merge_across_nodes 1 (but
      leave the patch enabling it until near the end of the series).
      
      It's much simpler than I'd originally imagined, and does not require an
      additional tier of locking: page migration relies on the page lock, KSM
      page reclaim relies on the page lock, the page lock is enough for KSM page
      migration too.
      
      Almost all the care has to be in get_ksm_page(): that's the function which
      worries about when a stable node is stale and should be freed, now it also
      has to worry about the KSM page being migrated.
      
      The only new overhead is an additional put/get/lock/unlock_page when
      stable_tree_search() arrives at a matching node: to make sure migration
      respects the raised page count, and so does not migrate the page while
      we're busy with it here.  That's probably avoidable, either by changing
      internal interfaces from using kpage to stable_node, or by moving the
      ksm_migrate_page() callsite into a page_freeze_refs() section (even if not
      swapcache); but this works well, I've no urge to pull it apart now.
      
      (Descents of the stable tree may pass through nodes whose KSM pages are
      under migration: being unlocked, the raised page count does not prevent
      that, nor need it: it's safe to memcmp against either old or new page.)
      
      You might worry about mremap, and whether page migration's rmap_walk to
      remove migration entries will find all the KSM locations where it inserted
      earlier: that should already be handled, by the satisfyingly heavy hammer
      of move_vma()'s call to ksm_madvise(,,,MADV_UNMERGEABLE,).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8d6553b
    • mm: rename page struct field helpers · 22b751c3
      Mel Gorman authored
      The function names page_xchg_last_nid(), page_last_nid() and
      reset_page_last_nid() were judged to be inconsistent so rename them to a
      struct_field_op style pattern.  As it looked jarring to have
      reset_page_mapcount() and page_nid_reset_last() beside each other in
      memmap_init_zone(), this patch also renames reset_page_mapcount() to
      page_mapcount_reset().  There are others like init_page_count() but as
      it is used throughout the arch code a rename would likely cause more
      conflicts than it is worth.
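
      For reference, a summary of the renames (the first two names are inferred
      from the struct_field_op pattern described above; the last two are named
      in the text):

        /*
         * page_xchg_last_nid()   -> page_nid_xchg_last()
         * page_last_nid()        -> page_nid_last()
         * reset_page_last_nid()  -> page_nid_reset_last()
         * reset_page_mapcount()  -> page_mapcount_reset()
         */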
      
      [akpm@linux-foundation.org: fix zcache]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      22b751c3
    • mm: numa: cleanup flow of transhuge page migration · 340ef390
      Hugh Dickins authored
      When correcting commit 04fa5d6a ("mm: migrate: check page_count of
      THP before migrating") Hugh Dickins noted that the control flow for
      transhuge migration was difficult to follow.  Unconditionally calling
      put_page() in numamigrate_isolate_page() made the failure paths of both
      migrate_misplaced_transhuge_page() and migrate_misplaced_page() more
complex than they should be.  Further, he was extremely wary that an
      unlock_page() should ever happen after a put_page() even if the
      put_page() should never be the final put_page.
      
      Hugh implemented the following cleanup to simplify the path by calling
      putback_lru_page() inside numamigrate_isolate_page() if it failed to
      isolate and always calling unlock_page() within
      migrate_misplaced_transhuge_page().
      
      There is no functional change after this patch is applied but the code
      is easier to follow and unlock_page() always happens before put_page().
      
      [mgorman@suse.de: changelog only]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      340ef390
    • mm: numa: take THP into account when migrating pages for NUMA balancing · 3abef4e6
      Mel Gorman authored
      Wanpeng Li pointed out that numamigrate_isolate_page() assumes that only
      one base page is being migrated when in fact it can also be checking
      THP.
      
      The consequences are that a migration will be attempted when a target
      node is nearly full and fail later.  It's unlikely to be user-visible
      but it should be fixed.  While we are there, migrate_balanced_pgdat()
      should treat nr_migrate_pages as an unsigned long as it is treated as a
      watermark.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Suggested-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3abef4e6
  14. 05 Feb 2013 (1 commit)
  15. 12 Jan 2013 (1 commit)
    • mm: migrate: check page_count of THP before migrating · 04fa5d6a
      Mel Gorman authored
Hugh Dickins pointed out that migrate_misplaced_transhuge_page() does
      not check page_count before migrating, like base page migration and
      khugepaged do.  He could not see why this was safe and he is right.
      
      The potential impact of the bug is avoided due to the limitations of
      NUMA balancing.  The page_mapcount() check ensures that only a single
      address space is using this page and as THPs are typically private it
      should not be possible for another address space to fault it in
      parallel.  If the address space has one associated task then it's
      difficult to have both a GUP pin and be referencing the page at the same
      time.  If there are multiple tasks then a buggy scenario requires that
      another thread be accessing the page while the direct IO is in flight.
      This is dodgy behaviour as there is a possibility of corruption with or
      without THP migration.  It would be
      
      While we happen to be safe for the most part it is shoddy to depend on
      such "safety" so this patch checks the page count similar to anonymous
      pages.  Note that this does not mean that the page_mapcount() check can
go away.  If we were to remove the page_mapcount() check then the THP
      would have to be unmapped from all referencing PTEs, replaced with
      migration PTEs and restored properly afterwards.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04fa5d6a
  16. 18 Dec 2012 (1 commit)
    • mm,numa: fix update_mmu_cache_pmd call · ce4a9cc5
      Stephen Rothwell authored
      This build error is currently hidden by the fact that the x86
      implementation of 'update_mmu_cache_pmd()' is a macro that doesn't use
      its last argument, but commit b32967ff ("mm: numa: Add THP migration
      for the NUMA working set scanning fault case") introduced a call with
      the wrong third argument.
      
      In the akpm tree, it causes this build error:
      
        mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put':
        mm/migrate.c:1666:2: error: incompatible type for argument 3 of 'update_mmu_cache_pmd'
        arch/x86/include/asm/pgtable.h:792:20: note: expected 'struct pmd_t *' but argument is of type 'pmd_t'
      
      Fix it.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ce4a9cc5
  17. 13 Dec 2012 (1 commit)
  18. 12 Dec 2012 (4 commits)
    • mm: introduce putback_movable_pages() · 5733c7d1
      Rafael Aquini authored
The patch "mm: introduce compaction and migration for virtio ballooned pages"
      hacks around putback_lru_pages() in order to allow ballooned pages to be
      re-inserted on the balloon page list as if a ballooned page were an LRU page.
      
      As ballooned pages are not legitimate LRU pages, this patch introduces
      putback_movable_pages() to properly cope with cases where the isolated
      pageset contains ballooned pages and LRU pages, thus fixing the mentioned
      inelegant hack around putback_lru_pages().
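
      A sketch of the new helper (simplified from this patch; it leans on the
      balloon_compaction helpers introduced earlier in the series):

        void putback_movable_pages(struct list_head *l)
        {
            struct page *page, *page2;

            list_for_each_entry_safe(page, page2, l, lru) {
                list_del(&page->lru);
                dec_zone_page_state(page, NR_ISOLATED_ANON +
                                    page_is_file_cache(page));
                if (unlikely(balloon_page_movable(page)))
                    balloon_page_putback(page);
                else
                    putback_lru_page(page);
            }
        }
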
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5733c7d1
    • mm: introduce compaction and migration for ballooned pages · bf6bddf1
      Rafael Aquini authored
Memory fragmentation introduced by ballooning might significantly reduce
      the number of 2MB contiguous memory blocks that can be used within a guest,
      thus imposing performance penalties associated with the reduced number of
      transparent huge pages that could be used by the guest workload.
      
      This patch introduces the helper functions as well as the necessary changes
      to teach compaction and migration bits how to cope with pages which are
      part of a guest memory balloon, in order to make them movable by memory
      compaction procedures.
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bf6bddf1
    • mm: adjust address_space_operations.migratepage() return code · 78bd5209
      Rafael Aquini authored
Memory fragmentation introduced by ballooning might significantly reduce
      the number of 2MB contiguous memory blocks that can be used within a
      guest, thus imposing performance penalties associated with the reduced
      number of transparent huge pages that could be used by the guest workload.
      
This patch set follows the main idea discussed at the 2012 LSF/MM session
      "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/ --
      to introduce the required changes to the virtio_balloon driver, as well as
      the changes to the core compaction & migration bits, in order to make
      those subsystems aware of ballooned pages and allow memory balloon pages
      to become movable within a guest, thus avoiding the aforementioned
      fragmentation issue.
      
      Following are numbers that prove this patch benefits on allowing
      compaction to be more effective at memory ballooned guests.
      
      Results for STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
      running on a 4gB RAM KVM guest which was ballooning 512mB RAM in 64mB
      chunks, at every minute (inflating/deflating), while test was running:
      
      ===BEGIN stress-highalloc
      
      STRESS-HIGHALLOC
                       highalloc-3.7     highalloc-3.7
                           rc4-clean         rc4-patch
      Pass 1          55.00 ( 0.00%)    62.00 ( 7.00%)
      Pass 2          54.00 ( 0.00%)    62.00 ( 8.00%)
      while Rested    75.00 ( 0.00%)    80.00 ( 5.00%)
      
      MMTests Statistics: duration
                       3.7         3.7
                 rc4-clean   rc4-patch
      User         1207.59     1207.46
      System       1300.55     1299.61
      Elapsed      2273.72     2157.06
      
      MMTests Statistics: vmstat
                                      3.7         3.7
                                rc4-clean   rc4-patch
      Page Ins                    3581516     2374368
      Page Outs                  11148692    10410332
      Swap Ins                         80          47
      Swap Outs                      3641         476
      Direct pages scanned          37978       33826
      Kswapd pages scanned        1828245     1342869
      Kswapd pages reclaimed      1710236     1304099
      Direct pages reclaimed        32207       31005
      Kswapd efficiency               93%         97%
      Kswapd velocity             804.077     622.546
      Direct efficiency               84%         91%
      Direct velocity              16.703      15.682
      Percentage direct scans          2%          2%
      Page writes by reclaim        79252        9704
      Page writes file              75611        9228
      Page writes anon               3641         476
      Page reclaim immediate        16764       11014
      Page rescued immediate            0           0
      Slabs scanned               2171904     2152448
      Direct inode steals             385        2261
      Kswapd inode steals          659137      609670
      Kswapd skipped wait               1          69
      THP fault alloc                 546         631
      THP collapse alloc              361         339
      THP splits                      259         263
      THP fault fallback               98          50
      THP collapse fail                20          17
      Compaction stalls               747         499
      Compaction success              244         145
      Compaction failures             503         354
      Compaction pages moved       370888      474837
      Compaction move failure       77378       65259
      
      ===END stress-highalloc
      
      This patch:
      
Introduce MIGRATEPAGE_SUCCESS as the default return code for the
      address_space_operations.migratepage() method and document the expected
      return code for the same method in failure cases.
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78bd5209
    • mm: introduce mm_find_pmd() · 6219049a
      Bob Liu authored
Several places need to find the pmd by (mm_struct, address), so introduce a
      function to simplify it.
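
      Roughly the shape of the helper (pre-p4d page table layout of that era):

        pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
        {
            pgd_t *pgd;
            pud_t *pud;
            pmd_t *pmd = NULL;

            pgd = pgd_offset(mm, address);
            if (!pgd_present(*pgd))
                goto out;

            pud = pud_offset(pgd, address);
            if (!pud_present(*pud))
                goto out;

            pmd = pmd_offset(pud, address);
            if (!pmd_present(*pmd))
                pmd = NULL;
        out:
            return pmd;
        }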
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Bob Liu <lliubbo@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Ni zhan Chen <nizhan.chen@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6219049a
  19. 11 Dec 2012 (1 commit)
    • mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable · 4fc3f1d6
      Ingo Molnar authored
rmap_walk_anon() and try_to_unmap_anon() appear to be too
      careful about locking the anon vma: while they need protection
      against anon vma list modifications, they do not need exclusive
      access to the list itself.
      
      Transforming this exclusive lock to a read-locked rwsem removes
      a global lock from the hot path of page-migration-intensive
      threaded workloads, which can cause pathological performance like
      this:
      
          96.43%        process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
                        |
                        --- perf_trace_sched_switch
                            __schedule
                            schedule
                            schedule_preempt_disabled
                            __mutex_lock_common.isra.6
                            __mutex_lock_slowpath
                            mutex_lock
                           |
                           |--50.61%-- rmap_walk
                           |          move_to_new_page
                           |          migrate_pages
                           |          migrate_misplaced_page
                           |          __do_numa_page.isra.69
                           |          handle_pte_fault
                           |          handle_mm_fault
                           |          __do_page_fault
                           |          do_page_fault
                           |          page_fault
                           |          __memset_sse2
                           |          |
                           |           --100.00%-- worker_thread
                           |                     |
                           |                      --100.00%-- start_thread
                           |
                            --49.39%-- page_lock_anon_vma
                                      try_to_unmap_anon
                                      try_to_unmap
                                      migrate_pages
                                      migrate_misplaced_page
                                      __do_numa_page.isra.69
                                      handle_pte_fault
                                      handle_mm_fault
                                      __do_page_fault
                                      do_page_fault
                                      page_fault
                                      __memset_sse2
                                      |
                                       --100.00%-- worker_thread
                                                 start_thread
      
      With this change applied the profile is now nicely flat
      and there's no anon-vma related scheduling/blocking.
      
      Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(),
      to make it clearer that it's an exclusive write-lock in
      that case - suggested by Rik van Riel.
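
      The resulting locking pattern, per the rename (read side for the
      walkers, write side for list modifications):

        /* rmap walkers: shared access to the anon_vma chain */
        anon_vma_lock_read(anon_vma);
        /* ... walk the list / interval tree ... */
        anon_vma_unlock_read(anon_vma);

        /* list modifications (fork, unlink, etc.): exclusive access */
        anon_vma_lock_write(anon_vma);
        /* ... modify the chain ... */
        anon_vma_unlock_write(anon_vma);
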
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Turner <pjt@google.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      4fc3f1d6