1. 06 11月, 2015 1 次提交
  2. 09 9月, 2015 6 次提交
    • J
      mm/compaction: correct to flush migrated pages if pageblock skip happens · 1a16718c
      Joonsoo Kim 提交于
      We cache isolate_start_pfn before entering isolate_migratepages().  If
      pageblock is skipped in isolate_migratepages() due to whatever reason,
      cc->migrate_pfn can be far from isolate_start_pfn hence we flush pages
      that were freed.  For example, the following scenario can be possible:
      
      - assume order-9 compaction, pageblock order is 9
      - start_isolate_pfn is 0x200
      - isolate_migratepages()
        - skip a number of pageblocks
        - start to isolate from pfn 0x600
        - cc->migrate_pfn = 0x620
        - return
      - last_migrated_pfn is set to 0x200
      - check flushing condition
        - current_block_start is set to 0x600
        - last_migrated_pfn < current_block_start then do useless flush
      
      This wrong flush would not help the performance and success rate so this
      patch tries to fix it.  One simple way to know the exact position where
      we start to isolate migratable pages is that we cache it in
      isolate_migratepages() before entering actual isolation.  This patch
      implements that and fixes the problem.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a16718c
    • V
      mm, compaction: skip compound pages by order in free scanner · 9fcd6d2e
      Vlastimil Babka 提交于
      The compaction free scanner is looking for PageBuddy() pages and
      skipping all others.  For large compound pages such as THP or hugetlbfs,
      we can save a lot of iterations if we skip them at once using their
      compound_order().  This is generally unsafe and we can read a bogus
      value of order due to a race, but if we are careful, the only danger is
      skipping too much.
      
      When tested with stress-highalloc from mmtests on 4GB system with 1GB
      hugetlbfs pages, the vmstat compact_free_scanned count decreased by at
      least 15%.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9fcd6d2e
    • V
      mm, compaction: always skip all compound pages by order in migrate scanner · 29c0dde8
      Vlastimil Babka 提交于
      The compaction migrate scanner tries to skip THP pages by their order,
      to reduce number of iterations for pages it cannot isolate.  The check
      is only done if PageLRU() is true, which means it applies to THP pages,
      but not e.g.  hugetlbfs pages or any other non-LRU compound pages, which
      we have to iterate by base pages.
      
      This limitation comes from the assumption that it's only safe to read
      compound_order() when we have the zone's lru_lock and THP cannot be
      split under us.  But the only danger (after filtering out order values
      that are not below MAX_ORDER, to prevent overflows) is that we skip too
      much or too little after reading a bogus compound_order() due to a rare
      race.  This is the same reasoning as patch 99c0fd5e ("mm,
      compaction: skip buddy pages by their order in the migrate scanner")
      introduced for unsafely reading PageBuddy() order.
      
      After this patch, all pages are tested for PageCompound() and we skip
      them by compound_order().  The test is done after the test for
      balloon_page_movable() as we don't want to assume if balloon pages (or
      other pages with own isolation and migration implementation if a generic
      API gets implemented) are compound or not.
      
      When tested with stress-highalloc from mmtests on 4GB system with 1GB
      hugetlbfs pages, the vmstat compact_migrate_scanned count decreased by
      15%.
      
      [kirill.shutemov@linux.intel.com: change PageTransHuge checks to PageCompound for different series was squashed here]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29c0dde8
    • V
      mm, compaction: encapsulate resetting cached scanner positions · 02333641
      Vlastimil Babka 提交于
      Reseting the cached compaction scanner positions is now open-coded in
      __reset_isolation_suitable() and compact_finished().  Encapsulate the
      functionality in a new function reset_cached_positions().
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02333641
    • V
      mm, compaction: simplify handling restart position in free pages scanner · f5f61a32
      Vlastimil Babka 提交于
      Handling the position where compaction free scanner should restart
      (stored in cc->free_pfn) got more complex with commit e14c720e ("mm,
      compaction: remember position within pageblock in free pages scanner").
      Currently the position is updated in each loop iteration of
      isolate_freepages(), although it should be enough to update it only when
      breaking from the loop.  There's also an extra check outside the loop
      updates the position in case we have met the migration scanner.
      
      This can be simplified if we move the test for having isolated enough
      from the for-loop header next to the test for contention, and
      determining the restart position only in these cases.  We can reuse the
      isolate_start_pfn variable for this instead of setting cc->free_pfn
      directly.  Outside the loop, we can simply set cc->free_pfn to current
      value of isolate_start_pfn without any extra check.
      
      Also add a VM_BUG_ON to catch possible mistake in the future, in case we
      later add a new condition that terminates isolate_freepages_block()
      prematurely without also considering the condition in
      isolate_freepages().
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5f61a32
    • V
      mm, compaction: more robust check for scanners meeting · f2849aa0
      Vlastimil Babka 提交于
      Assorted compaction cleanups and optimizations.  The interesting patches
      are 4 and 5.  In 4, skipping of compound pages in single iteration is
      improved for migration scanner, so it works also for !PageLRU compound
      pages such as hugetlbfs, slab etc.  Patch 5 introduces this kind of
      skipping in the free scanner.  The trick is that we can read
      compound_order() without any protection, if we are careful to filter out
      values larger than MAX_ORDER.  The only danger is that we skip too much.
      The same trick was already used for reading the freepage order in the
      migrate scanner.
      
      To demonstrate improvements of Patches 4 and 5 I've run stress-highalloc
      from mmtests, set to simulate THP allocations (including __GFP_COMP) on
      a 4GB system where 1GB was occupied by hugetlbfs pages.  I'll include
      just the relevant stats:
      
                                     Patch 3     Patch 4     Patch 5
      
      Compaction stalls                 7523        7529        7515
      Compaction success                 323         304         322
      Compaction failures               7200        7224        7192
      Page migrate success            247778      264395      240737
      Page migrate failure             15358       33184       21621
      Compaction pages isolated       906928      980192      909983
      Compaction migrate scanned     2005277     1692805     1498800
      Compaction free scanned       13255284    11539986     9011276
      Compaction cost                    288         305         277
      
      With 5 iterations per patch, the results are still noisy, but we can see
      that Patch 4 does reduce migrate_scanned by 15% thanks to skipping the
      hugetlbfs pages at once.  Interestingly, free_scanned is also reduced
      and I have no idea why.  Patch 5 further reduces free_scanned as
      expected, by 15%.  Other stats are unaffected modulo noise.
      
      [1] https://lkml.org/lkml/2015/1/19/158
      
      This patch (of 5):
      
      Compaction should finish when the migration and free scanner meet, i.e.
      they reach the same pageblock.  Currently however, the test in
      compact_finished() simply just compares the exact pfns, which may yield
      a false negative when the free scanner position is in the middle of a
      pageblock and the migration scanner reaches the begining of the same
      pageblock.
      
      This hasn't been a problem until commit e14c720e ("mm, compaction:
      remember position within pageblock in free pages scanner") allowed the
      free scanner position to be in the middle of a pageblock between
      invocations.  The hot-fix 1d5bfe1f ("mm, compaction: prevent
      infinite loop in compact_zone") prevented the issue by adding a special
      check in the migration scanner to satisfy the current detection of
      scanners meeting.
      
      However, the proper fix is to make the detection more robust.  This
      patch introduces the compact_scanners_met() function that returns true
      when the free scanner position is in the same or lower pageblock than
      the migration scanner.  The special case in isolate_migratepages()
      introduced by 1d5bfe1f is removed.
      Suggested-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2849aa0
  3. 16 4月, 2015 3 次提交
  4. 15 4月, 2015 1 次提交
    • J
      mm/compaction: enhance compaction finish condition · 2149cdae
      Joonsoo Kim 提交于
      Compaction has anti fragmentation algorithm.  It is that freepage should
      be more than pageblock order to finish the compaction if we don't find any
      freepage in requested migratetype buddy list.  This is for mitigating
      fragmentation, but, there is a lack of migratetype consideration and it is
      too excessive compared to page allocator's anti fragmentation algorithm.
      
      Not considering migratetype would cause premature finish of compaction.
      For example, if allocation request is for unmovable migratetype, freepage
      with CMA migratetype doesn't help that allocation and compaction should
      not be stopped.  But, current logic regards this situation as compaction
      is no longer needed, so finish the compaction.
      
      Secondly, condition is too excessive compared to page allocator's logic.
      We can steal freepage from other migratetype and change pageblock
      migratetype on more relaxed conditions in page allocator.  This is
      designed to prevent fragmentation and we can use it here.  Imposing hard
      constraint only to the compaction doesn't help much in this case since
      page allocator would cause fragmentation again.
      
      To solve these problems, this patch borrows anti fragmentation logic from
      page allocator.  It will reduce premature compaction finish in some cases
      and reduce excessive compaction work.
      
      stress-highalloc test in mmtests with non movable order 7 allocation shows
      considerable increase of compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      31.82 : 42.20
      
      I tested it on non-reboot 5 runs stress-highalloc benchmark and found that
      there is no more degradation on allocation success rate than before.  That
      roughly means that this patch doesn't result in more fragmentations.
      
      Vlastimil suggests additional idea that we only test for fallbacks when
      migration scanner has scanned a whole pageblock.  It looked good for
      fragmentation because chance of stealing increase due to making more free
      pages in certain pageblock.  So, I tested it, but, it results in decreased
      compaction success rate, roughly 38.00.  I guess the reason that if system
      is low memory condition, watermark check could be failed due to not enough
      order 0 free page and so, sometimes, we can't reach a fallback check
      although migrate_pfn is aligned to pageblock_nr_pages.  I can insert code
      to cope with this situation but it makes code more complicated so I don't
      include his idea at this patch.
      
      [akpm@linux-foundation.org: fix CONFIG_CMA=n build]
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2149cdae
  5. 14 2月, 2015 1 次提交
    • A
      mm: page_alloc: add kasan hooks on alloc and free paths · b8c73fc2
      Andrey Ryabinin 提交于
      Add kernel address sanitizer hooks to mark allocated page's addresses as
      accessible in corresponding shadow region.  Mark freed pages as
      inaccessible.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: NAndrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8c73fc2
  6. 13 2月, 2015 3 次提交
    • H
      mm: fix negative nr_isolated counts · ff59909a
      Hugh Dickins 提交于
      The vmstat interfaces are good at hiding negative counts (at least when
      CONFIG_SMP); but if you peer behind the curtain, you find that
      nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
      more negative: so they can absorb larger and larger numbers of isolated
      pages, yet still appear to be zero.
      
      I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
      but I guess it's there for a good reason, in which case we ought to get
      too_many_isolated() working again.
      
      The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
      putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
      to call acct_isolated() to increment them.
      
      It is possible that the bug whcih this patch fixes could cause OOM kills
      when the system still has a lot of reclaimable page cache.
      
      Fixes: edc2ca61 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>	[3.18+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff59909a
    • J
      mm/compaction: stop the isolation when we isolate enough freepage · 932ff6bb
      Joonsoo Kim 提交于
      Currently, freepage isolation in one pageblock doesn't consider how many
      freepages we isolate. When I traced flow of compaction, compaction
      sometimes isolates more than 256 freepages to migrate just 32 pages.
      
      In this patch, freepage isolation is stopped at the point that we
      have more isolated freepage than isolated page for migration. This
      results in slowing down free page scanner and make compaction success
      rate higher.
      
      stress-highalloc test in mmtests with non movable order 7 allocation shows
      increase of compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      27.13 : 31.82
      
      pfn where both scanners meets on compaction complete
      (separate test due to enormous tracepoint buffer)
      (zone_start=4096, zone_end=1048576)
      586034 : 654378
      
      In fact, I didn't fully understand why this patch results in such good
      result. There was a guess that not used freepages are released to pcp list
      and on next compaction trial we won't isolate them again so compaction
      success rate would decrease. To prevent this effect, I tested with adding
      pcp drain code on release_freepages(), but, it has no good effect.
      
      Anyway, this patch reduces waste time to isolate unneeded freepages so
      seems reasonable.
      
      Vlastimil said:
      
      : I briefly tried it on top of the pivot-changing series and with order-9
      : allocations it reduced free page scanned counter by almost 10%.  No effect
      : on success rates (maybe because pivot changing already took care of the
      : scanners meeting problem) but the scanning reduction is good on its own.
      :
      : It also explains why e14c720e ("mm, compaction: remember position
      : within pageblock in free pages scanner") had less than expected
      : improvements.  It would only actually stop within pageblock in case of
      : async compaction detecting contention.  I guess that's also why the
      : infinite loop problem fixed by 1d5bfe1f affected so relatively few
      : people.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Tested-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      932ff6bb
    • J
      mm/compaction: fix wrong order check in compact_finished() · 372549c2
      Joonsoo Kim 提交于
      What we want to check here is whether there is highorder freepage in buddy
      list of other migratetype in order to steal it without fragmentation.
      But, current code just checks cc->order which means allocation request
      order.  So, this is wrong.
      
      Without this fix, non-movable synchronous compaction below pageblock order
      would not stopped until compaction is complete, because migratetype of
      most pageblocks are movable and high order freepage made by compaction is
      usually on movable type buddy list.
      
      There is some report related to this bug. See below link.
      
        http://www.spinics.net/lists/linux-mm/msg81666.html
      
      Although the issued system still has load spike comes from compaction,
      this makes that system completely stable and responsive according to his
      report.
      
      stress-highalloc test in mmtests with non movable order 7 allocation
      doesn't show any notable difference in allocation success rate, but, it
      shows more compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      18.47 : 28.94
      
      Fixes: 1fb3f8ca ("mm: compaction: capture a suitable high-order page immediately when it is made available")
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.7+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      372549c2
  7. 12 2月, 2015 5 次提交
  8. 11 12月, 2014 5 次提交
    • V
      mm, compaction: more focused lru and pcplists draining · fdaf7f5c
      Vlastimil Babka 提交于
      The goal of memory compaction is to create high-order freepages through
      page migration.  Page migration however puts pages on the per-cpu lru_add
      cache, which is later flushed to per-cpu pcplists, and only after pcplists
      are drained the pages can actually merge.  This can happen due to the
      per-cpu caches becoming full through further freeing, or explicitly.
      
      During direct compaction, it is useful to do the draining explicitly so
      that pages merge as soon as possible and compaction can detect success
      immediately and keep the latency impact at minimum.  However the current
      implementation is far from ideal.  Draining is done only in
      __alloc_pages_direct_compact(), after all zones were already compacted,
      and the decisions to continue or stop compaction in individual zones was
      done without the last batch of migrations being merged.  It is also
      missing the draining of lru_add cache before the pcplists.
      
      This patch moves the draining for direct compaction into compact_zone().
      It adds the missing lru_cache draining and uses the newly introduced
      single zone pcplists draining to reduce overhead and avoid impact on
      unrelated zones.  Draining is only performed when it can actually lead to
      merging of a page of desired order (passed by cc->order).  This means it
      is only done when migration occurred in the previously scanned cc->order
      aligned block(s) and the migration scanner is now pointing to the next
      cc->order aligned block.
      
      The patch has been tested with stress-highalloc benchmark from mmtests.
      Although overal allocation success rates of the benchmark were not
      affected, the number of detected compaction successes has doubled.  This
      suggests that allocations were previously successful due to implicit
      merging caused by background activity, making a later allocation attempt
      succeed immediately, but not attributing the success to compaction.  Since
      stress-highalloc always tries to allocate almost the whole memory, it
      cannot show the improvement in its reported success rate metric.  However
      after this patch, compaction should detect success and terminate earlier,
      reducing the direct compaction latencies in a real scenario.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fdaf7f5c
    • V
      mm, compaction: always update cached scanner positions · 6bace090
      Vlastimil Babka 提交于
      Compaction caches the migration and free scanner positions between
      compaction invocations, so that the whole zone gets eventually scanned and
      there is no bias towards the initial scanner positions at the
      beginning/end of the zone.
      
      The cached positions are continuously updated as scanners progress and the
      updating stops as soon as a page is successfully isolated.  The reasoning
      behind this is that a pageblock where isolation succeeded is likely to
      succeed again in near future and it should be worth revisiting it.
      
      However, the downside is that potentially many pages are rescanned without
      successful isolation.  At worst, there might be a page where isolation
      from LRU succeeds but migration fails (potentially always).  So upon
      encountering this page, cached position would always stop being updated
      for no good reason.  It might have been useful to let such page be
      rescanned with sync compaction after async one failed, but this is now
      handled by caching scanner position for async and sync mode separately
      since commit 35979ef3 ("mm, compaction: add per-zone migration pfn
      cache for async compaction").
      
      After this patch, cached positions are updated unconditionally.  In
      stress-highalloc benchmark, this has decreased the numbers of scanned
      pages by few percent, without affecting allocation success rates.
      
      To prevent free scanner from leaving free pages behind after they are
      returned due to page migration failure, the cached scanner pfn is changed
      to point to the pageblock of the returned free page with the highest pfn,
      before leaving compact_zone().
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6bace090
    • V
      mm, compaction: defer only on COMPACT_COMPLETE · f8669795
      Vlastimil Babka 提交于
      Deferred compaction is employed to avoid compacting zone where sync direct
      compaction has recently failed.  As such, it makes sense to only defer
      when a full zone was scanned, which is when compact_zone returns with
      COMPACT_COMPLETE.  It's less useful to defer when compact_zone returns
      with apparent success (COMPACT_PARTIAL), followed by a watermark check
      failure, which can happen due to parallel allocation activity.  It also
      does not make much sense to defer compaction which was completely skipped
      (COMPACT_SKIP) for being unsuitable in the first place.
      
      This patch therefore makes deferred compaction trigger only when
      COMPACT_COMPLETE is returned from compact_zone().  Results of
      stress-highalloc becnmark show the difference is within measurement error,
      so the issue is rather cosmetic.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8669795
    • V
      mm, compaction: simplify deferred compaction · 97d47a65
      Vlastimil Babka 提交于
      Since commit 53853e2d ("mm, compaction: defer each zone individually
      instead of preferred zone"), compaction is deferred for each zone where
      sync direct compaction fails, and reset where it succeeds.  However, it
      was observed that for DMA zone compaction often appeared to succeed
      while subsequent allocation attempt would not, due to different outcome
      of watermark check.
      
      In order to properly defer compaction in this zone, the candidate zone
      has to be passed back to __alloc_pages_direct_compact() and compaction
      deferred in the zone after the allocation attempt fails.
      
      The large source of mismatch between watermark check in compaction and
      allocation was the lack of alloc_flags and classzone_idx values in
      compaction, which has been fixed in the previous patch.  So with this
      problem fixed, we can simplify the code by removing the candidate_zone
      parameter and deferring in __alloc_pages_direct_compact().
      
      After this patch, the compaction activity during stress-highalloc
      benchmark is still somewhat increased, but it's negligible compared to the
      increase that occurred without the better watermark checking.  This
      suggests that it is still possible to apparently succeed in compaction but
      fail to allocate, possibly due to parallel allocation activity.
      
      [akpm@linux-foundation.org: fix build]
      Suggested-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97d47a65
    • V
      mm, compaction: pass classzone_idx and alloc_flags to watermark checking · ebff3980
      Vlastimil Babka 提交于
      Compaction relies on zone watermark checks for decisions such as if it's
      worth to start compacting in compaction_suitable() or whether compaction
      should stop in compact_finished().  The watermark checks take
      classzone_idx and alloc_flags parameters, which are related to the memory
      allocation request.  But from the context of compaction they are currently
      passed as 0, including the direct compaction which is invoked to satisfy
      the allocation request, and could therefore know the proper values.
      
      The lack of proper values can lead to mismatch between decisions taken
      during compaction and decisions related to the allocation request.  Lack
      of proper classzone_idx value means that lowmem_reserve is not taken into
      account.  This has manifested (during recent changes to deferred
      compaction) when DMA zone was used as fallback for preferred Normal zone.
      compaction_suitable() without proper classzone_idx would think that the
      watermarks are already satisfied, but watermark check in
      get_page_from_freelist() would fail.  Because of this problem, deferring
      compaction has extra complexity that can be removed in the following
      patch.
      
      The issue (not confirmed in practice) with missing alloc_flags is opposite
      in nature.  For allocations that include ALLOC_HIGH, ALLOC_HIGHER or
      ALLOC_CMA in alloc_flags (the last includes all MOVABLE allocations on
      CMA-enabled systems) the watermark checking in compaction with 0 passed
      will be stricter than in get_page_from_freelist().  In these cases
      compaction might be running for a longer time than is really needed.
      
      Another issue compaction_suitable() is that the check for "does the zone
      need compaction at all?" comes only after the check "does the zone have
      enough free free pages to succeed compaction".  The latter considers extra
      pages for migration and can therefore in some situations fail and return
      COMPACT_SKIPPED, although the high-order allocation would succeed and we
      should return COMPACT_PARTIAL.
      
      This patch fixes these problems by adding alloc_flags and classzone_idx to
      struct compact_control and related functions involved in direct compaction
      and watermark checking.  Where possible, all other callers of
      compaction_suitable() pass proper values where those are known.  This is
      currently limited to classzone_idx, which is sometimes known in kswapd
      context.  However, the direct reclaim callers should_continue_reclaim()
      and compaction_ready() do not currently know the proper values, so the
      coordination between reclaim and compaction may still not be as accurate
      as it could.  This can be fixed later, if it's shown to be an issue.
      
      Additionaly the checks in compact_suitable() are reordered to address the
      second issue described above.
      
      The effect of this patch should be slightly better high-order allocation
      success rates and/or less compaction overhead, depending on the type of
      allocations and presence of CMA.  It allows simplifying deferred
      compaction code in a followup patch.
      
      When testing with stress-highalloc, there was some slight improvement
      (which might be just due to variance) in success rates of non-THP-like
      allocations.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ebff3980
  9. 14 11月, 2014 2 次提交
    • V
      mm, compaction: prevent infinite loop in compact_zone · 1d5bfe1f
      Vlastimil Babka 提交于
      Several people have reported occasionally seeing processes stuck in
      compact_zone(), even triggering soft lockups, in 3.18-rc2+.
      
      Testing a revert of commit e14c720e ("mm, compaction: remember
      position within pageblock in free pages scanner") fixed the issue,
      although the stuck processes do not appear to involve the free scanner.
      
      Finally, by code inspection, the bug was found in isolate_migratepages()
      which uses a slightly different condition to detect if the migration and
      free scanners have met, than compact_finished().  That has not been a
      problem until commit e14c720e allowed the free scanner position
      between individual invocations to be in the middle of a pageblock.
      
      In a relatively rare case, the migration scanner position can end up at
      the beginning of a pageblock, with the free scanner position in the
      middle of the same pageblock.  If it's the migration scanner's turn,
      isolate_migratepages() exits immediately (without updating the
      position), while compact_finished() decides to continue compaction,
      resulting in a potentially infinite loop.  The system can recover only
      if another process creates enough high-order pages to make the watermark
      checks in compact_finished() pass.
      
      This patch fixes the immediate problem by bumping the migration
      scanner's position to meet the free scanner in isolate_migratepages(),
      when both are within the same pageblock.  This causes compact_finished()
      to terminate properly.  A more robust check in compact_finished() is
      planned as a cleanup for better future maintainability.
      
      Fixes: e14c720e ("mm, compaction: remember position within pageblock in free pages scanner)
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reported-by: NP. Christeas <xrg@linux.gr>
      Tested-by: NP. Christeas <xrg@linux.gr>
      Link: http://marc.info/?l=linux-mm&m=141508604232522&w=2Reported-by: NNorbert Preining <preining@logic.at>
      Tested-by: NNorbert Preining <preining@logic.at>
      Link: https://lkml.org/lkml/2014/11/4/904Reported-by: NPavel Machek <pavel@ucw.cz>
      Link: https://lkml.org/lkml/2014/11/7/164
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d5bfe1f
    • J
      mm/compaction: skip the range until proper target pageblock is met · 58420016
      Joonsoo Kim 提交于
      Commit 7d49d886 ("mm, compaction: reduce zone checking frequency in
      the migration scanner") has a side-effect that changes the iteration
      range calculation.  Before the change, block_end_pfn is calculated using
      start_pfn, but now it blindly adds pageblock_nr_pages to the previous
      value.
      
      This causes the problem that isolation_start_pfn is larger than
      block_end_pfn when we isolate the page with more than pageblock order.
      In this case, isolation would fail due to an invalid range parameter.
      
      To prevent this, this patch implements skipping the range until a proper
      target pageblock is met.  Without this patch, CMA with more than
      pageblock order always fails but with this patch it will succeed.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58420016
  10. 30 10月, 2014 1 次提交
    • J
      mm/compaction.c: avoid premature range skip in isolate_migratepages_range · 6ea41c0c
      Joonsoo Kim 提交于
      Commit edc2ca61 ("mm, compaction: move pageblock checks up from
      isolate_migratepages_range()") commonizes isolate_migratepages variants
      and make them use isolate_migratepages_block().
      
      isolate_migratepages_block() could stop the execution when enough pages
      are isolated, but, there is no code in isolate_migratepages_range() to
      handle this case.  In the result, even if isolate_migratepages_block()
      returns prematurely without checking all pages in the range,
      
      isolate_migratepages_block() is called repeately on the following
      pageblock and some pages in the previous range are skipped to check.
      Then, CMA is failed frequently due to this fact.
      
      To fix this problem, this patch let isolate_migratepages_range() know
      the situation that enough pages are isolated and stop the isolation in
      that case.
      
      Note that isolate_migratepages() has no such problem, because, it always
      stops the isolation after just one call of isolate_migratepages_block().
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ea41c0c
  11. 10 10月, 2014 12 次提交
    • K
      mm/balloon_compaction: redesign ballooned pages management · d6d86c0a
      Konstantin Khlebnikov 提交于
      Sasha Levin reported KASAN splash inside isolate_migratepages_range().
      Problem is in the function __is_movable_balloon_page() which tests
      AS_BALLOON_MAP in page->mapping->flags.  This function has no protection
      against anonymous pages.  As result it tried to check address space flags
      inside struct anon_vma.
      
      Further investigation shows more problems in current implementation:
      
      * Special branch in __unmap_and_move() never works:
        balloon_page_movable() checks page flags and page_count.  In
        __unmap_and_move() page is locked, reference counter is elevated, thus
        balloon_page_movable() always fails.  As a result execution goes to the
        normal migration path.  virtballoon_migratepage() returns
        MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
        move_to_new_page() thinks this is an error code and assigns
        newpage->mapping to NULL.  Newly migrated page lose connectivity with
        balloon an all ability for further migration.
      
      * lru_lock erroneously required in isolate_migratepages_range() for
        isolation ballooned page.  This function releases lru_lock periodically,
        this makes migration mostly impossible for some pages.
      
      * balloon_page_dequeue have a tight race with balloon_page_isolate:
        balloon_page_isolate could be executed in parallel with dequeue between
        picking page from list and locking page_lock.  Race is rare because they
        use trylock_page() for locking.
      
      This patch fixes all of them.
      
      Instead of fake mapping with special flag this patch uses special state of
      page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256.  Buddy allocator uses
      PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose.  Storing mark
      directly in struct page makes everything safer and easier.
      
      PagePrivate is used to mark pages present in page list (i.e.  not
      isolated, like PageLRU for normal pages).  It replaces special rules for
      reference counter and makes balloon migration similar to migration of
      normal pages.  This flag is protected by page_lock together with link to
      the balloon device.
      Signed-off-by: NKonstantin Khlebnikov <k.khlebnikov@samsung.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: <stable@vger.kernel.org>	[3.8+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d6d86c0a
    • X
      mm/compaction.c: fix warning of 'flags' may be used uninitialized · b8b2d825
      Xiubo Li 提交于
      C      mm/compaction.o
      mm/compaction.c: In function isolate_freepages_block:
      mm/compaction.c:364:37: warning: flags may be used uninitialized in this function [-Wmaybe-uninitialized]
             && compact_unlock_should_abort(&cc->zone->lock, flags,
                                           ^
      Signed-off-by: NXiubo Li <Li.Xiubo@freescale.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8b2d825
    • D
      mm, compaction: pass gfp mask to compact_control · 6d7ce559
      David Rientjes 提交于
      struct compact_control currently converts the gfp mask to a migratetype,
      but we need the entire gfp mask in a follow-up patch.
      
      Pass the entire gfp mask as part of struct compact_control.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d7ce559
    • D
      mm: rename allocflags_to_migratetype for clarity · 43e7a34d
      David Rientjes 提交于
      The page allocator has gfp flags (like __GFP_WAIT) and alloc flags (like
      ALLOC_CPUSET) that have separate semantics.
      
      The function allocflags_to_migratetype() actually takes gfp flags, not
      alloc flags, and returns a migratetype.  Rename it to
      gfpflags_to_migratetype().
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43e7a34d
    • V
      mm, compaction: skip buddy pages by their order in the migrate scanner · 99c0fd5e
      Vlastimil Babka 提交于
      The migration scanner skips PageBuddy pages, but does not consider their
      order as checking page_order() is generally unsafe without holding the
      zone->lock, and acquiring the lock just for the check wouldn't be a good
      tradeoff.
      
      Still, this could avoid some iterations over the rest of the buddy page,
      and if we are careful, the race window between PageBuddy() check and
      page_order() is small, and the worst thing that can happen is that we skip
      too much and miss some isolation candidates.  This is not that bad, as
      compaction can already fail for many other reasons like parallel
      allocations, and those have much larger race window.
      
      This patch therefore makes the migration scanner obtain the buddy page
      order and use it to skip the whole buddy page, if the order appears to be
      in the valid range.
      
      It's important that the page_order() is read only once, so that the value
      used in the checks and in the pfn calculation is the same.  But in theory
      the compiler can replace the local variable by multiple inlines of
      page_order().  Therefore, the patch introduces page_order_unsafe() that
      uses ACCESS_ONCE to prevent this.
      
      Testing with stress-highalloc from mmtests shows a 15% reduction in number
      of pages scanned by migration scanner.  The reduction is >60% with
      __GFP_NO_KSWAPD allocations, along with success rates better by few
      percent.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99c0fd5e
    • V
      mm, compaction: remember position within pageblock in free pages scanner · e14c720e
      Vlastimil Babka 提交于
      Unlike the migration scanner, the free scanner remembers the beginning of
      the last scanned pageblock in cc->free_pfn.  It might be therefore
      rescanning pages uselessly when called several times during single
      compaction.  This might have been useful when pages were returned to the
      buddy allocator after a failed migration, but this is no longer the case.
      
      This patch changes the meaning of cc->free_pfn so that if it points to a
      middle of a pageblock, that pageblock is scanned only from cc->free_pfn to
      the end.  isolate_freepages_block() will record the pfn of the last page
      it looked at, which is then used to update cc->free_pfn.
      
      In the mmtests stress-highalloc benchmark, this has resulted in lowering
      the ratio between pages scanned by both scanners, from 2.5 free pages per
      migrate page, to 2.25 free pages per migrate page, without affecting
      success rates.
      
      With __GFP_NO_KSWAPD allocations, this appears to result in a worse ratio
      (2.1 instead of 1.8), but page migration successes increased by 10%, so
      this could mean that more useful work can be done until need_resched()
      aborts this kind of compaction.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e14c720e
    • V
      mm, compaction: skip rechecks when lock was already held · 69b7189f
      Vlastimil Babka 提交于
      Compaction scanners try to lock zone locks as late as possible by checking
      many page or pageblock properties opportunistically without lock and
      skipping them if not unsuitable.  For pages that pass the initial checks,
      some properties have to be checked again safely under lock.  However, if
      the lock was already held from a previous iteration in the initial checks,
      the rechecks are unnecessary.
      
      This patch therefore skips the rechecks when the lock was already held.
      This is now possible to do, since we don't (potentially) drop and
      reacquire the lock between the initial checks and the safe rechecks
      anymore.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69b7189f
    • V
      mm, compaction: periodically drop lock and restore IRQs in scanners · 8b44d279
      Vlastimil Babka 提交于
      Compaction scanners regularly check for lock contention and need_resched()
      through the compact_checklock_irqsave() function.  However, if there is no
      contention, the lock can be held and IRQ disabled for potentially long
      time.
      
      This has been addressed by commit b2eef8c0 ("mm: compaction: minimise
      the time IRQs are disabled while isolating pages for migration") for the
      migration scanner.  However, the refactoring done by commit 2a1402aa
      ("mm: compaction: acquire the zone->lru_lock as late as possible") has
      changed the conditions so that the lock is dropped only when there's
      contention on the lock or need_resched() is true.  Also, need_resched() is
      checked only when the lock is already held.  The comment "give a chance to
      irqs before checking need_resched" is therefore misleading, as IRQs remain
      disabled when the check is done.
      
      This patch restores the behavior intended by commit b2eef8c0 and also
      tries to better balance and make more deterministic the time spent by
      checking for contention vs the time the scanners might run between the
      checks.  It also avoids situations where checking has not been done often
      enough before.  The result should be avoiding both too frequent and too
      infrequent contention checking, and especially the potentially
      long-running scans with IRQs disabled and no checking of need_resched() or
      for fatal signal pending, which can happen when many consecutive pages or
      pageblocks fail the preliminary tests and do not reach the later call site
      to compact_checklock_irqsave(), as explained below.
      
      Before the patch:
      
      In the migration scanner, compact_checklock_irqsave() was called each
      loop, if reached.  If not reached, some lower-frequency checking could
      still be done if the lock was already held, but this would not result in
      aborting contended async compaction until reaching
      compact_checklock_irqsave() or end of pageblock.  In the free scanner, it
      was similar but completely without the periodical checking, so lock can be
      potentially held until reaching the end of pageblock.
      
      After the patch, in both scanners:
      
      The periodical check is done as the first thing in the loop on each
      SWAP_CLUSTER_MAX aligned pfn, using the new compact_unlock_should_abort()
      function, which always unlocks the lock (if locked) and aborts async
      compaction if scheduling is needed.  It also aborts any type of compaction
      when a fatal signal is pending.
      
      The compact_checklock_irqsave() function is replaced with a slightly
      different compact_trylock_irqsave().  The biggest difference is that the
      function is not called at all if the lock is already held.  The periodical
      need_resched() checking is left solely to compact_unlock_should_abort().
      The lock contention avoidance for async compaction is achieved by the
      periodical unlock by compact_unlock_should_abort() and by using trylock in
      compact_trylock_irqsave() and aborting when trylock fails.  Sync
      compaction does not use trylock.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8b44d279
    • V
      mm, compaction: khugepaged should not give up due to need_resched() · 1f9efdef
      Vlastimil Babka 提交于
      Async compaction aborts when it detects zone lock contention or
      need_resched() is true.  David Rientjes has reported that in practice,
      most direct async compactions for THP allocation abort due to
      need_resched().  This means that a second direct compaction is never
      attempted, which might be OK for a page fault, but khugepaged is intended
      to attempt a sync compaction in such case and in these cases it won't.
      
      This patch replaces "bool contended" in compact_control with an int that
      distinguishes between aborting due to need_resched() and aborting due to
      lock contention.  This allows propagating the abort through all compaction
      functions as before, but passing the abort reason up to
      __alloc_pages_slowpath() which decides when to continue with direct
      reclaim and another compaction attempt.
      
      Another problem is that try_to_compact_pages() did not act upon the
      reported contention (both need_resched() or lock contention) immediately
      and would proceed with another zone from the zonelist.  When
      need_resched() is true, that means initializing another zone compaction,
      only to check again need_resched() in isolate_migratepages() and aborting.
       For zone lock contention, the unintended consequence is that the lock
      contended status reported back to the allocator is detrmined from the last
      zone where compaction was attempted, which is rather arbitrary.
      
      This patch fixes the problem in the following way:
      - async compaction of a zone aborting due to need_resched() or fatal signal
        pending means that further zones should not be tried. We report
        COMPACT_CONTENDED_SCHED to the allocator.
      - aborting zone compaction due to lock contention means we can still try
        another zone, since it has different set of locks. We report back
        COMPACT_CONTENDED_LOCK only if *all* zones where compaction was attempted,
        it was aborted due to lock contention.
      
      As a result of these fixes, khugepaged will proceed with second sync
      compaction as intended, when the preceding async compaction aborted due to
      need_resched().  Page fault compactions aborting due to need_resched()
      will spare some cycles previously wasted by initializing another zone
      compaction only to abort again.  Lock contention will be reported only
      when compaction in all zones aborted due to lock contention, and therefore
      it's not a good idea to try again after reclaim.
      
      In stress-highalloc from mmtests configured to use __GFP_NO_KSWAPD, this
      has improved number of THP collapse allocations by 10%, which shows
      positive effect on khugepaged.  The benchmark's success rates are
      unchanged as it is not recognized as khugepaged.  Numbers of compact_stall
      and compact_fail events have however decreased by 20%, with
      compact_success still a bit improved, which is good.  With benchmark
      configured not to use __GFP_NO_KSWAPD, there is 6% improvement in THP
      collapse allocations, and only slight improvement in stalls and failures.
      
      [akpm@linux-foundation.org: fix warnings]
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f9efdef
    • V
      mm, compaction: reduce zone checking frequency in the migration scanner · 7d49d886
      Vlastimil Babka 提交于
      The unification of the migrate and free scanner families of function has
      highlighted a difference in how the scanners ensure they only isolate
      pages of the intended zone.  This is important for taking zone lock or lru
      lock of the correct zone.  Due to nodes overlapping, it is however
      possible to encounter a different zone within the range of the zone being
      compacted.
      
      The free scanner, since its inception by commit 748446bb ("mm:
      compaction: memory compaction core"), has been checking the zone of the
      first valid page in a pageblock, and skipping the whole pageblock if the
      zone does not match.
      
      This checking was completely missing from the migration scanner at first,
      and later added by commit dc908600 ("mm: compaction: check for
      overlapping nodes during isolation for migration") in a reaction to a bug
      report.  But the zone comparison in migration scanner is done once per a
      single scanned page, which is more defensive and thus more costly than a
      check per pageblock.
      
      This patch unifies the checking done in both scanners to once per
      pageblock, through a new pageblock_pfn_to_page() function, which also
      includes pfn_valid() checks.  It is more defensive than the current free
      scanner checks, as it checks both the first and last page of the
      pageblock, but less defensive by the migration scanner per-page checks.
      It assumes that node overlapping may result (on some architecture) in a
      boundary between two nodes falling into the middle of a pageblock, but
      that there cannot be a node0 node1 node0 interleaving within a single
      pageblock.
      
      The result is more code being shared and a bit less per-page CPU cost in
      the migration scanner.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d49d886
    • V
      mm, compaction: move pageblock checks up from isolate_migratepages_range() · edc2ca61
      Vlastimil Babka 提交于
      isolate_migratepages_range() is the main function of the compaction
      scanner, called either on a single pageblock by isolate_migratepages()
      during regular compaction, or on an arbitrary range by CMA's
      __alloc_contig_migrate_range().  It currently perfoms two pageblock-wide
      compaction suitability checks, and because of the CMA callpath, it tracks
      if it crossed a pageblock boundary in order to repeat those checks.
      
      However, closer inspection shows that those checks are always true for CMA:
      - isolation_suitable() is true because CMA sets cc->ignore_skip_hint to true
      - migrate_async_suitable() check is skipped because CMA uses sync compaction
      
      We can therefore move the compaction-specific checks to
      isolate_migratepages() and simplify isolate_migratepages_range().
      Furthermore, we can mimic the freepage scanner family of functions, which
      has isolate_freepages_block() function called both by compaction from
      isolate_freepages() and by CMA from isolate_freepages_range(), where each
      use-case adds own specific glue code.  This allows further code
      simplification.
      
      Thus, we rename isolate_migratepages_range() to
      isolate_migratepages_block() and limit its functionality to a single
      pageblock (or its subset).  For CMA, a new different
      isolate_migratepages_range() is created as a CMA-specific wrapper for the
      _block() function.  The checks specific to compaction are moved to
      isolate_migratepages().  As part of the unification of these two families
      of functions, we remove the redundant zone parameter where applicable,
      since zone pointer is already passed in cc->zone.
      
      Furthermore, going back to compact_zone() and compact_finished() when
      pageblock is found unsuitable (now by isolate_migratepages()) is wasteful
      - the checks are meant to skip pageblocks quickly.  The patch therefore
      also introduces a simple loop into isolate_migratepages() so that it does
      not return immediately on failed pageblock checks, but keeps going until
      isolate_migratepages_range() gets called once.  Similarily to
      isolate_freepages(), the function periodically checks if it needs to
      reschedule or abort async compaction.
      
      [iamjoonsoo.kim@lge.com: fix isolated page counting bug in compaction]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      edc2ca61
    • V
      mm, compaction: do not recheck suitable_migration_target under lock · f8224aa5
      Vlastimil Babka 提交于
      isolate_freepages_block() rechecks if the pageblock is suitable to be a
      target for migration after it has taken the zone->lock.  However, the
      check has been optimized to occur only once per pageblock, and
      compact_checklock_irqsave() might be dropping and reacquiring lock, which
      means somebody else might have changed the pageblock's migratetype
      meanwhile.
      
      Furthermore, nothing prevents the migratetype to change right after
      isolate_freepages_block() has finished isolating.  Given how imperfect
      this is, it's simpler to just rely on the check done in
      isolate_freepages() without lock, and not pretend that the recheck under
      lock guarantees anything.  It is just a heuristic after all.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8224aa5