1. 14 11月, 2014 1 次提交
    • J
      mm/compaction: skip the range until proper target pageblock is met · 58420016
      Joonsoo Kim 提交于
      Commit 7d49d886 ("mm, compaction: reduce zone checking frequency in
      the migration scanner") has a side-effect that changes the iteration
      range calculation.  Before the change, block_end_pfn is calculated using
      start_pfn, but now it blindly adds pageblock_nr_pages to the previous
      value.
      
      This causes the problem that isolation_start_pfn is larger than
      block_end_pfn when we isolate the page with more than pageblock order.
      In this case, isolation would fail due to an invalid range parameter.
      
      To prevent this, this patch implements skipping the range until a proper
      target pageblock is met.  Without this patch, CMA with more than
      pageblock order always fails but with this patch it will succeed.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58420016
  2. 30 10月, 2014 1 次提交
    • J
      mm/compaction.c: avoid premature range skip in isolate_migratepages_range · 6ea41c0c
      Joonsoo Kim 提交于
      Commit edc2ca61 ("mm, compaction: move pageblock checks up from
      isolate_migratepages_range()") commonizes isolate_migratepages variants
      and make them use isolate_migratepages_block().
      
      isolate_migratepages_block() could stop the execution when enough pages
      are isolated, but, there is no code in isolate_migratepages_range() to
      handle this case.  In the result, even if isolate_migratepages_block()
      returns prematurely without checking all pages in the range,
      
      isolate_migratepages_block() is called repeately on the following
      pageblock and some pages in the previous range are skipped to check.
      Then, CMA is failed frequently due to this fact.
      
      To fix this problem, this patch let isolate_migratepages_range() know
      the situation that enough pages are isolated and stop the isolation in
      that case.
      
      Note that isolate_migratepages() has no such problem, because, it always
      stops the isolation after just one call of isolate_migratepages_block().
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ea41c0c
  3. 10 10月, 2014 13 次提交
    • K
      mm/balloon_compaction: redesign ballooned pages management · d6d86c0a
      Konstantin Khlebnikov 提交于
      Sasha Levin reported KASAN splash inside isolate_migratepages_range().
      Problem is in the function __is_movable_balloon_page() which tests
      AS_BALLOON_MAP in page->mapping->flags.  This function has no protection
      against anonymous pages.  As result it tried to check address space flags
      inside struct anon_vma.
      
      Further investigation shows more problems in current implementation:
      
      * Special branch in __unmap_and_move() never works:
        balloon_page_movable() checks page flags and page_count.  In
        __unmap_and_move() page is locked, reference counter is elevated, thus
        balloon_page_movable() always fails.  As a result execution goes to the
        normal migration path.  virtballoon_migratepage() returns
        MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
        move_to_new_page() thinks this is an error code and assigns
        newpage->mapping to NULL.  Newly migrated page lose connectivity with
        balloon an all ability for further migration.
      
      * lru_lock erroneously required in isolate_migratepages_range() for
        isolation ballooned page.  This function releases lru_lock periodically,
        this makes migration mostly impossible for some pages.
      
      * balloon_page_dequeue have a tight race with balloon_page_isolate:
        balloon_page_isolate could be executed in parallel with dequeue between
        picking page from list and locking page_lock.  Race is rare because they
        use trylock_page() for locking.
      
      This patch fixes all of them.
      
      Instead of fake mapping with special flag this patch uses special state of
      page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256.  Buddy allocator uses
      PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose.  Storing mark
      directly in struct page makes everything safer and easier.
      
      PagePrivate is used to mark pages present in page list (i.e.  not
      isolated, like PageLRU for normal pages).  It replaces special rules for
      reference counter and makes balloon migration similar to migration of
      normal pages.  This flag is protected by page_lock together with link to
      the balloon device.
      Signed-off-by: NKonstantin Khlebnikov <k.khlebnikov@samsung.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: <stable@vger.kernel.org>	[3.8+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d6d86c0a
    • X
      mm/compaction.c: fix warning of 'flags' may be used uninitialized · b8b2d825
      Xiubo Li 提交于
      C      mm/compaction.o
      mm/compaction.c: In function isolate_freepages_block:
      mm/compaction.c:364:37: warning: flags may be used uninitialized in this function [-Wmaybe-uninitialized]
             && compact_unlock_should_abort(&cc->zone->lock, flags,
                                           ^
      Signed-off-by: NXiubo Li <Li.Xiubo@freescale.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8b2d825
    • D
      mm, compaction: pass gfp mask to compact_control · 6d7ce559
      David Rientjes 提交于
      struct compact_control currently converts the gfp mask to a migratetype,
      but we need the entire gfp mask in a follow-up patch.
      
      Pass the entire gfp mask as part of struct compact_control.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d7ce559
    • D
      mm: rename allocflags_to_migratetype for clarity · 43e7a34d
      David Rientjes 提交于
      The page allocator has gfp flags (like __GFP_WAIT) and alloc flags (like
      ALLOC_CPUSET) that have separate semantics.
      
      The function allocflags_to_migratetype() actually takes gfp flags, not
      alloc flags, and returns a migratetype.  Rename it to
      gfpflags_to_migratetype().
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43e7a34d
    • V
      mm, compaction: skip buddy pages by their order in the migrate scanner · 99c0fd5e
      Vlastimil Babka 提交于
      The migration scanner skips PageBuddy pages, but does not consider their
      order as checking page_order() is generally unsafe without holding the
      zone->lock, and acquiring the lock just for the check wouldn't be a good
      tradeoff.
      
      Still, this could avoid some iterations over the rest of the buddy page,
      and if we are careful, the race window between PageBuddy() check and
      page_order() is small, and the worst thing that can happen is that we skip
      too much and miss some isolation candidates.  This is not that bad, as
      compaction can already fail for many other reasons like parallel
      allocations, and those have much larger race window.
      
      This patch therefore makes the migration scanner obtain the buddy page
      order and use it to skip the whole buddy page, if the order appears to be
      in the valid range.
      
      It's important that the page_order() is read only once, so that the value
      used in the checks and in the pfn calculation is the same.  But in theory
      the compiler can replace the local variable by multiple inlines of
      page_order().  Therefore, the patch introduces page_order_unsafe() that
      uses ACCESS_ONCE to prevent this.
      
      Testing with stress-highalloc from mmtests shows a 15% reduction in number
      of pages scanned by migration scanner.  The reduction is >60% with
      __GFP_NO_KSWAPD allocations, along with success rates better by few
      percent.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99c0fd5e
    • V
      mm, compaction: remember position within pageblock in free pages scanner · e14c720e
      Vlastimil Babka 提交于
      Unlike the migration scanner, the free scanner remembers the beginning of
      the last scanned pageblock in cc->free_pfn.  It might be therefore
      rescanning pages uselessly when called several times during single
      compaction.  This might have been useful when pages were returned to the
      buddy allocator after a failed migration, but this is no longer the case.
      
      This patch changes the meaning of cc->free_pfn so that if it points to a
      middle of a pageblock, that pageblock is scanned only from cc->free_pfn to
      the end.  isolate_freepages_block() will record the pfn of the last page
      it looked at, which is then used to update cc->free_pfn.
      
      In the mmtests stress-highalloc benchmark, this has resulted in lowering
      the ratio between pages scanned by both scanners, from 2.5 free pages per
      migrate page, to 2.25 free pages per migrate page, without affecting
      success rates.
      
      With __GFP_NO_KSWAPD allocations, this appears to result in a worse ratio
      (2.1 instead of 1.8), but page migration successes increased by 10%, so
      this could mean that more useful work can be done until need_resched()
      aborts this kind of compaction.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e14c720e
    • V
      mm, compaction: skip rechecks when lock was already held · 69b7189f
      Vlastimil Babka 提交于
      Compaction scanners try to lock zone locks as late as possible by checking
      many page or pageblock properties opportunistically without lock and
      skipping them if not unsuitable.  For pages that pass the initial checks,
      some properties have to be checked again safely under lock.  However, if
      the lock was already held from a previous iteration in the initial checks,
      the rechecks are unnecessary.
      
      This patch therefore skips the rechecks when the lock was already held.
      This is now possible to do, since we don't (potentially) drop and
      reacquire the lock between the initial checks and the safe rechecks
      anymore.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69b7189f
    • V
      mm, compaction: periodically drop lock and restore IRQs in scanners · 8b44d279
      Vlastimil Babka 提交于
      Compaction scanners regularly check for lock contention and need_resched()
      through the compact_checklock_irqsave() function.  However, if there is no
      contention, the lock can be held and IRQ disabled for potentially long
      time.
      
      This has been addressed by commit b2eef8c0 ("mm: compaction: minimise
      the time IRQs are disabled while isolating pages for migration") for the
      migration scanner.  However, the refactoring done by commit 2a1402aa
      ("mm: compaction: acquire the zone->lru_lock as late as possible") has
      changed the conditions so that the lock is dropped only when there's
      contention on the lock or need_resched() is true.  Also, need_resched() is
      checked only when the lock is already held.  The comment "give a chance to
      irqs before checking need_resched" is therefore misleading, as IRQs remain
      disabled when the check is done.
      
      This patch restores the behavior intended by commit b2eef8c0 and also
      tries to better balance and make more deterministic the time spent by
      checking for contention vs the time the scanners might run between the
      checks.  It also avoids situations where checking has not been done often
      enough before.  The result should be avoiding both too frequent and too
      infrequent contention checking, and especially the potentially
      long-running scans with IRQs disabled and no checking of need_resched() or
      for fatal signal pending, which can happen when many consecutive pages or
      pageblocks fail the preliminary tests and do not reach the later call site
      to compact_checklock_irqsave(), as explained below.
      
      Before the patch:
      
      In the migration scanner, compact_checklock_irqsave() was called each
      loop, if reached.  If not reached, some lower-frequency checking could
      still be done if the lock was already held, but this would not result in
      aborting contended async compaction until reaching
      compact_checklock_irqsave() or end of pageblock.  In the free scanner, it
      was similar but completely without the periodical checking, so lock can be
      potentially held until reaching the end of pageblock.
      
      After the patch, in both scanners:
      
      The periodical check is done as the first thing in the loop on each
      SWAP_CLUSTER_MAX aligned pfn, using the new compact_unlock_should_abort()
      function, which always unlocks the lock (if locked) and aborts async
      compaction if scheduling is needed.  It also aborts any type of compaction
      when a fatal signal is pending.
      
      The compact_checklock_irqsave() function is replaced with a slightly
      different compact_trylock_irqsave().  The biggest difference is that the
      function is not called at all if the lock is already held.  The periodical
      need_resched() checking is left solely to compact_unlock_should_abort().
      The lock contention avoidance for async compaction is achieved by the
      periodical unlock by compact_unlock_should_abort() and by using trylock in
      compact_trylock_irqsave() and aborting when trylock fails.  Sync
      compaction does not use trylock.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8b44d279
    • V
      mm, compaction: khugepaged should not give up due to need_resched() · 1f9efdef
      Vlastimil Babka 提交于
      Async compaction aborts when it detects zone lock contention or
      need_resched() is true.  David Rientjes has reported that in practice,
      most direct async compactions for THP allocation abort due to
      need_resched().  This means that a second direct compaction is never
      attempted, which might be OK for a page fault, but khugepaged is intended
      to attempt a sync compaction in such case and in these cases it won't.
      
      This patch replaces "bool contended" in compact_control with an int that
      distinguishes between aborting due to need_resched() and aborting due to
      lock contention.  This allows propagating the abort through all compaction
      functions as before, but passing the abort reason up to
      __alloc_pages_slowpath() which decides when to continue with direct
      reclaim and another compaction attempt.
      
      Another problem is that try_to_compact_pages() did not act upon the
      reported contention (both need_resched() or lock contention) immediately
      and would proceed with another zone from the zonelist.  When
      need_resched() is true, that means initializing another zone compaction,
      only to check again need_resched() in isolate_migratepages() and aborting.
       For zone lock contention, the unintended consequence is that the lock
      contended status reported back to the allocator is detrmined from the last
      zone where compaction was attempted, which is rather arbitrary.
      
      This patch fixes the problem in the following way:
      - async compaction of a zone aborting due to need_resched() or fatal signal
        pending means that further zones should not be tried. We report
        COMPACT_CONTENDED_SCHED to the allocator.
      - aborting zone compaction due to lock contention means we can still try
        another zone, since it has different set of locks. We report back
        COMPACT_CONTENDED_LOCK only if *all* zones where compaction was attempted,
        it was aborted due to lock contention.
      
      As a result of these fixes, khugepaged will proceed with second sync
      compaction as intended, when the preceding async compaction aborted due to
      need_resched().  Page fault compactions aborting due to need_resched()
      will spare some cycles previously wasted by initializing another zone
      compaction only to abort again.  Lock contention will be reported only
      when compaction in all zones aborted due to lock contention, and therefore
      it's not a good idea to try again after reclaim.
      
      In stress-highalloc from mmtests configured to use __GFP_NO_KSWAPD, this
      has improved number of THP collapse allocations by 10%, which shows
      positive effect on khugepaged.  The benchmark's success rates are
      unchanged as it is not recognized as khugepaged.  Numbers of compact_stall
      and compact_fail events have however decreased by 20%, with
      compact_success still a bit improved, which is good.  With benchmark
      configured not to use __GFP_NO_KSWAPD, there is 6% improvement in THP
      collapse allocations, and only slight improvement in stalls and failures.
      
      [akpm@linux-foundation.org: fix warnings]
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f9efdef
    • V
      mm, compaction: reduce zone checking frequency in the migration scanner · 7d49d886
      Vlastimil Babka 提交于
      The unification of the migrate and free scanner families of function has
      highlighted a difference in how the scanners ensure they only isolate
      pages of the intended zone.  This is important for taking zone lock or lru
      lock of the correct zone.  Due to nodes overlapping, it is however
      possible to encounter a different zone within the range of the zone being
      compacted.
      
      The free scanner, since its inception by commit 748446bb ("mm:
      compaction: memory compaction core"), has been checking the zone of the
      first valid page in a pageblock, and skipping the whole pageblock if the
      zone does not match.
      
      This checking was completely missing from the migration scanner at first,
      and later added by commit dc908600 ("mm: compaction: check for
      overlapping nodes during isolation for migration") in a reaction to a bug
      report.  But the zone comparison in migration scanner is done once per a
      single scanned page, which is more defensive and thus more costly than a
      check per pageblock.
      
      This patch unifies the checking done in both scanners to once per
      pageblock, through a new pageblock_pfn_to_page() function, which also
      includes pfn_valid() checks.  It is more defensive than the current free
      scanner checks, as it checks both the first and last page of the
      pageblock, but less defensive by the migration scanner per-page checks.
      It assumes that node overlapping may result (on some architecture) in a
      boundary between two nodes falling into the middle of a pageblock, but
      that there cannot be a node0 node1 node0 interleaving within a single
      pageblock.
      
      The result is more code being shared and a bit less per-page CPU cost in
      the migration scanner.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d49d886
    • V
      mm, compaction: move pageblock checks up from isolate_migratepages_range() · edc2ca61
      Vlastimil Babka 提交于
      isolate_migratepages_range() is the main function of the compaction
      scanner, called either on a single pageblock by isolate_migratepages()
      during regular compaction, or on an arbitrary range by CMA's
      __alloc_contig_migrate_range().  It currently perfoms two pageblock-wide
      compaction suitability checks, and because of the CMA callpath, it tracks
      if it crossed a pageblock boundary in order to repeat those checks.
      
      However, closer inspection shows that those checks are always true for CMA:
      - isolation_suitable() is true because CMA sets cc->ignore_skip_hint to true
      - migrate_async_suitable() check is skipped because CMA uses sync compaction
      
      We can therefore move the compaction-specific checks to
      isolate_migratepages() and simplify isolate_migratepages_range().
      Furthermore, we can mimic the freepage scanner family of functions, which
      has isolate_freepages_block() function called both by compaction from
      isolate_freepages() and by CMA from isolate_freepages_range(), where each
      use-case adds own specific glue code.  This allows further code
      simplification.
      
      Thus, we rename isolate_migratepages_range() to
      isolate_migratepages_block() and limit its functionality to a single
      pageblock (or its subset).  For CMA, a new different
      isolate_migratepages_range() is created as a CMA-specific wrapper for the
      _block() function.  The checks specific to compaction are moved to
      isolate_migratepages().  As part of the unification of these two families
      of functions, we remove the redundant zone parameter where applicable,
      since zone pointer is already passed in cc->zone.
      
      Furthermore, going back to compact_zone() and compact_finished() when
      pageblock is found unsuitable (now by isolate_migratepages()) is wasteful
      - the checks are meant to skip pageblocks quickly.  The patch therefore
      also introduces a simple loop into isolate_migratepages() so that it does
      not return immediately on failed pageblock checks, but keeps going until
      isolate_migratepages_range() gets called once.  Similarily to
      isolate_freepages(), the function periodically checks if it needs to
      reschedule or abort async compaction.
      
      [iamjoonsoo.kim@lge.com: fix isolated page counting bug in compaction]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      edc2ca61
    • V
      mm, compaction: do not recheck suitable_migration_target under lock · f8224aa5
      Vlastimil Babka 提交于
      isolate_freepages_block() rechecks if the pageblock is suitable to be a
      target for migration after it has taken the zone->lock.  However, the
      check has been optimized to occur only once per pageblock, and
      compact_checklock_irqsave() might be dropping and reacquiring lock, which
      means somebody else might have changed the pageblock's migratetype
      meanwhile.
      
      Furthermore, nothing prevents the migratetype to change right after
      isolate_freepages_block() has finished isolating.  Given how imperfect
      this is, it's simpler to just rely on the check done in
      isolate_freepages() without lock, and not pretend that the recheck under
      lock guarantees anything.  It is just a heuristic after all.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8224aa5
    • V
      mm, compaction: defer each zone individually instead of preferred zone · 53853e2d
      Vlastimil Babka 提交于
      When direct sync compaction is often unsuccessful, it may become deferred
      for some time to avoid further useless attempts, both sync and async.
      Successful high-order allocations un-defer compaction, while further
      unsuccessful compaction attempts prolong the compaction deferred period.
      
      Currently the checking and setting deferred status is performed only on
      the preferred zone of the allocation that invoked direct compaction.  But
      compaction itself is attempted on all eligible zones in the zonelist, so
      the behavior is suboptimal and may lead both to scenarios where 1)
      compaction is attempted uselessly, or 2) where it's not attempted despite
      good chances of succeeding, as shown on the examples below:
      
      1) A direct compaction with Normal preferred zone failed and set
         deferred compaction for the Normal zone.  Another unrelated direct
         compaction with DMA32 as preferred zone will attempt to compact DMA32
         zone even though the first compaction attempt also included DMA32 zone.
      
         In another scenario, compaction with Normal preferred zone failed to
         compact Normal zone, but succeeded in the DMA32 zone, so it will not
         defer compaction.  In the next attempt, it will try Normal zone which
         will fail again, instead of skipping Normal zone and trying DMA32
         directly.
      
      2) Kswapd will balance DMA32 zone and reset defer status based on
         watermarks looking good.  A direct compaction with preferred Normal
         zone will skip compaction of all zones including DMA32 because Normal
         was still deferred.  The allocation might have succeeded in DMA32, but
         won't.
      
      This patch makes compaction deferring work on individual zone basis
      instead of preferred zone.  For each zone, it checks compaction_deferred()
      to decide if the zone should be skipped.  If watermarks fail after
      compacting the zone, defer_compaction() is called.  The zone where
      watermarks passed can still be deferred when the allocation attempt is
      unsuccessful.  When allocation is successful, compaction_defer_reset() is
      called for the zone containing the allocated page.  This approach should
      approximate calling defer_compaction() only on zones where compaction was
      attempted and did not yield allocated page.  There might be corner cases
      but that is inevitable as long as the decision to stop compacting dues not
      guarantee that a page will be allocated.
      
      Due to a new COMPACT_DEFERRED return value, some functions relying
      implicitly on COMPACT_SKIPPED = 0 had to be updated, with comments made
      more accurate.  The did_some_progress output parameter of
      __alloc_pages_direct_compact() is removed completely, as the caller
      actually does not use it after compaction sets it - it is only considered
      when direct reclaim sets it.
      
      During testing on a two-node machine with a single very small Normal zone
      on node 1, this patch has improved success rates in stress-highalloc
      mmtests benchmark.  The success here were previously made worse by commit
      3a025760 ("mm: page_alloc: spill to remote nodes before waking
      kswapd") as kswapd was no longer resetting often enough the deferred
      compaction for the Normal zone, and DMA32 zones on both nodes were thus
      not considered for compaction.  On different machine, success rates were
      improved with __GFP_NO_KSWAPD allocations.
      
      [akpm@linux-foundation.org: fix CONFIG_COMPACTION=n build]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53853e2d
  4. 05 6月, 2014 10 次提交
    • V
      mm, compaction: properly signal and act upon lock and need_sched() contention · be976572
      Vlastimil Babka 提交于
      Compaction uses compact_checklock_irqsave() function to periodically check
      for lock contention and need_resched() to either abort async compaction,
      or to free the lock, schedule and retake the lock.  When aborting,
      cc->contended is set to signal the contended state to the caller.  Two
      problems have been identified in this mechanism.
      
      First, compaction also calls directly cond_resched() in both scanners when
      no lock is yet taken.  This call either does not abort async compaction,
      or set cc->contended appropriately.  This patch introduces a new
      compact_should_abort() function to achieve both.  In isolate_freepages(),
      the check frequency is reduced to once by SWAP_CLUSTER_MAX pageblocks to
      match what the migration scanner does in the preliminary page checks.  In
      case a pageblock is found suitable for calling isolate_freepages_block(),
      the checks within there are done on higher frequency.
      
      Second, isolate_freepages() does not check if isolate_freepages_block()
      aborted due to contention, and advances to the next pageblock.  This
      violates the principle of aborting on contention, and might result in
      pageblocks not being scanned completely, since the scanning cursor is
      advanced.  This problem has been noticed in the code by Joonsoo Kim when
      reviewing related patches.  This patch makes isolate_freepages_block()
      check the cc->contended flag and abort.
      
      In case isolate_freepages() has already isolated some pages before
      aborting due to contention, page migration will proceed, which is OK since
      we do not want to waste the work that has been done, and page migration
      has own checks for contention.  However, we do not want another isolation
      attempt by either of the scanners, so cc->contended flag check is added
      also to compaction_alloc() and compact_finished() to make sure compaction
      is aborted right after the migration.
      
      The outcome of the patch should be reduced lock contention by async
      compaction and lower latencies for higher-order allocations where direct
      compaction is involved.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Reported-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Tested-by: NShawn Guo <shawn.guo@linaro.org>
      Tested-by: NKevin Hilman <khilman@linaro.org>
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Tested-by: NFabio Estevam <fabio.estevam@freescale.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be976572
    • V
      mm/compaction: avoid rescanning pageblocks in isolate_freepages · e9ade569
      Vlastimil Babka 提交于
      The compaction free scanner in isolate_freepages() currently remembers PFN
      of the highest pageblock where it successfully isolates, to be used as the
      starting pageblock for the next invocation.  The rationale behind this is
      that page migration might return free pages to the allocator when
      migration fails and we don't want to skip them if the compaction
      continues.
      
      Since migration now returns free pages back to compaction code where they
      can be reused, this is no longer a concern.  This patch changes
      isolate_freepages() so that the PFN for restarting is updated with each
      pageblock where isolation is attempted.  Using stress-highalloc from
      mmtests, this resulted in 10% reduction of the pages scanned by the free
      scanner.
      
      Note that the somewhat similar functionality that records highest
      successful pageblock in zone->compact_cached_free_pfn, remains unchanged.
      This cache is used when the whole compaction is restarted, not for
      multiple invocations of the free scanner during single compaction.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9ade569
    • V
      mm/compaction: do not count migratepages when unnecessary · f8c9301f
      Vlastimil Babka 提交于
      During compaction, update_nr_listpages() has been used to count remaining
      non-migrated and free pages after a call to migrage_pages().  The
      freepages counting has become unneccessary, and it turns out that
      migratepages counting is also unnecessary in most cases.
      
      The only situation when it's needed to count cc->migratepages is when
      migrate_pages() returns with a negative error code.  Otherwise, the
      non-negative return value is the number of pages that were not migrated,
      which is exactly the count of remaining pages in the cc->migratepages
      list.
      
      Furthermore, any non-zero count is only interesting for the tracepoint of
      mm_compaction_migratepages events, because after that all remaining
      unmigrated pages are put back and their count is set to 0.
      
      This patch therefore removes update_nr_listpages() completely, and changes
      the tracepoint definition so that the manual counting is done only when
      the tracepoint is enabled, and only when migrate_pages() returns a
      negative error code.
      
      Furthermore, migrate_pages() and the tracepoints won't be called when
      there's nothing to migrate.  This potentially avoids some wasted cycles
      and reduces the volume of uninteresting mm_compaction_migratepages events
      where "nr_migrated=0 nr_failed=0".  In the stress-highalloc mmtest, this
      was about 75% of the events.  The mm_compaction_isolate_migratepages event
      is better for determining that nothing was isolated for migration, and
      this one was just duplicating the info.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8c9301f
    • D
      mm, compaction: terminate async compaction when rescheduling · aeef4b83
      David Rientjes 提交于
      Async compaction terminates prematurely when need_resched(), see
      compact_checklock_irqsave().  This can never trigger, however, if the
      cond_resched() in isolate_migratepages_range() always takes care of the
      scheduling.
      
      If the cond_resched() actually triggers, then terminate this pageblock
      scan for async compaction as well.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aeef4b83
    • D
      mm, compaction: embed migration mode in compact_control · e0b9daeb
      David Rientjes 提交于
      We're going to want to manipulate the migration mode for compaction in the
      page allocator, and currently compact_control's sync field is only a bool.
      
      Currently, we only do MIGRATE_ASYNC or MIGRATE_SYNC_LIGHT compaction
      depending on the value of this bool.  Convert the bool to enum
      migrate_mode and pass the migration mode in directly.  Later, we'll want
      to avoid MIGRATE_SYNC_LIGHT for thp allocations in the pagefault patch to
      avoid unnecessary latency.
      
      This also alters compaction triggered from sysfs, either for the entire
      system or for a node, to force MIGRATE_SYNC.
      
      [akpm@linux-foundation.org: fix build]
      [iamjoonsoo.kim@lge.com: use MIGRATE_SYNC in alloc_contig_range()]
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Suggested-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e0b9daeb
    • D
      mm, compaction: add per-zone migration pfn cache for async compaction · 35979ef3
      David Rientjes 提交于
      Each zone has a cached migration scanner pfn for memory compaction so that
      subsequent calls to memory compaction can start where the previous call
      left off.
      
      Currently, the compaction migration scanner only updates the per-zone
      cached pfn when pageblocks were not skipped for async compaction.  This
      creates a dependency on calling sync compaction to avoid having subsequent
      calls to async compaction from scanning an enormous amount of non-MOVABLE
      pageblocks each time it is called.  On large machines, this could be
      potentially very expensive.
      
      This patch adds a per-zone cached migration scanner pfn only for async
      compaction.  It is updated everytime a pageblock has been scanned in its
      entirety and when no pages from it were successfully isolated.  The cached
      migration scanner pfn for sync compaction is updated only when called for
      sync compaction.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35979ef3
    • D
      mm, compaction: return failed migration target pages back to freelist · d53aea3d
      David Rientjes 提交于
      Greg reported that he found isolated free pages were returned back to the
      VM rather than the compaction freelist.  This will cause holes behind the
      free scanner and cause it to reallocate additional memory if necessary
      later.
      
      He detected the problem at runtime seeing that ext4 metadata pages (esp
      the ones read by "sbi->s_group_desc[i] = sb_bread(sb, block)") were
      constantly visited by compaction calls of migrate_pages().  These pages
      had a non-zero b_count which caused fallback_migrate_page() ->
      try_to_release_page() -> try_to_free_buffers() to fail.
      
      Memory compaction works by having a "freeing scanner" scan from one end of
      a zone which isolates pages as migration targets while another "migrating
      scanner" scans from the other end of the same zone which isolates pages
      for migration.
      
      When page migration fails for an isolated page, the target page is
      returned to the system rather than the freelist built by the freeing
      scanner.  This may require the freeing scanner to continue scanning memory
      after suitable migration targets have already been returned to the system
      needlessly.
      
      This patch returns destination pages to the freeing scanner freelist when
      page migration fails.  This prevents unnecessary work done by the freeing
      scanner but also encourages memory to be as compacted as possible at the
      end of the zone.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d53aea3d
    • D
      mm, migration: add destination page freeing callback · 68711a74
      David Rientjes 提交于
      Memory migration uses a callback defined by the caller to determine how to
      allocate destination pages.  When migration fails for a source page,
      however, it frees the destination page back to the system.
      
      This patch adds a memory migration callback defined by the caller to
      determine how to free destination pages.  If a caller, such as memory
      compaction, builds its own freelist for migration targets, this can reuse
      already freed memory instead of scanning additional memory.
      
      If the caller provides a function to handle freeing of destination pages,
      it is called when page migration fails.  If the caller passes NULL then
      freeing back to the system will be handled as usual.  This patch
      introduces no functional change.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      68711a74
    • V
      mm/compaction: cleanup isolate_freepages() · c96b9e50
      Vlastimil Babka 提交于
      isolate_freepages() is currently somewhat hard to follow thanks to many
      looks like it is related to the 'low_pfn' variable, but in fact it is not.
      
      This patch renames the 'high_pfn' variable to a hopefully less confusing name,
      and slightly changes its handling without a functional change. A comment made
      obsolete by recent changes is also updated.
      
      [akpm@linux-foundation.org: comment fixes, per Minchan]
      [iamjoonsoo.kim@lge.com: cleanups]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c96b9e50
    • H
      mm/compaction: clean up unused code lines · 13fb44e4
      Heesub Shin 提交于
      Remove code lines currently not in use or never called.
      Signed-off-by: NHeesub Shin <heesub.shin@samsung.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13fb44e4
  5. 07 5月, 2014 1 次提交
    • V
      mm/compaction: make isolate_freepages start at pageblock boundary · 49e068f0
      Vlastimil Babka 提交于
      The compaction freepage scanner implementation in isolate_freepages()
      starts by taking the current cc->free_pfn value as the first pfn.  In a
      for loop, it scans from this first pfn to the end of the pageblock, and
      then subtracts pageblock_nr_pages from the first pfn to obtain the first
      pfn for the next for loop iteration.
      
      This means that when cc->free_pfn starts at offset X rather than being
      aligned on pageblock boundary, the scanner will start at offset X in all
      scanned pageblock, ignoring potentially many free pages.  Currently this
      can happen when
      
       a) zone's end pfn is not pageblock aligned, or
      
       b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
          enabled and a hole spanning the beginning of a pageblock
      
      This patch fixes the problem by aligning the initial pfn in
      isolate_freepages() to pageblock boundary.  This also permits replacing
      the end-of-pageblock alignment within the for loop with a simple
      pageblock_nr_pages increment.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reported-by: NHeesub Shin <heesub.shin@samsung.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49e068f0
  6. 08 4月, 2014 6 次提交
  7. 04 4月, 2014 3 次提交
  8. 11 3月, 2014 1 次提交
    • L
      mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block · 2af120bc
      Laura Abbott 提交于
      We received several reports of bad page state when freeing CMA pages
      previously allocated with alloc_contig_range:
      
          BUG: Bad page state in process Binder_A  pfn:63202
          page:d21130b0 count:0 mapcount:1 mapping:  (null) index:0x7dfbf
          page flags: 0x40080068(uptodate|lru|active|swapbacked)
      
      Based on the page state, it looks like the page was still in use.  The
      page flags do not make sense for the use case though.  Further debugging
      showed that despite alloc_contig_range returning success, at least one
      page in the range still remained in the buddy allocator.
      
      There is an issue with isolate_freepages_block.  In strict mode (which
      CMA uses), if any pages in the range cannot be isolated,
      isolate_freepages_block should return failure 0.  The current check
      keeps track of the total number of isolated pages and compares against
      the size of the range:
      
              if (strict && nr_strict_required > total_isolated)
                      total_isolated = 0;
      
      After taking the zone lock, if one of the pages in the range is not in
      the buddy allocator, we continue through the loop and do not increment
      total_isolated.  If in the last iteration of the loop we isolate more
      than one page (e.g.  last page needed is a higher order page), the check
      for total_isolated may pass and we fail to detect that a page was
      skipped.  The fix is to bail out if the loop immediately if we are in
      strict mode.  There's no benfit to continuing anyway since we need all
      pages to be isolated.  Additionally, drop the error checking based on
      nr_strict_required and just check the pfn ranges.  This matches with
      what isolate_freepages_range does.
      Signed-off-by: NLaura Abbott <lauraa@codeaurora.org>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2af120bc
  9. 24 1月, 2014 2 次提交
  10. 22 1月, 2014 2 次提交
    • V
      mm: compaction: reset scanner positions immediately when they meet · 55b7c4c9
      Vlastimil Babka 提交于
      Compaction used to start its migrate and free page scaners at the zone's
      lowest and highest pfn, respectively.  Later, caching was introduced to
      remember the scanners' progress across compaction attempts so that
      pageblocks are not re-scanned uselessly.  Additionally, pageblocks where
      isolation failed are marked to be quickly skipped when encountered again
      in future compactions.
      
      Currently, both the reset of cached pfn's and clearing of the pageblock
      skip information for a zone is done in __reset_isolation_suitable().
      This function gets called when:
      
       - compaction is restarting after being deferred
       - compact_blockskip_flush flag is set in compact_finished() when the scanners
         meet (and not again cleared when direct compaction succeeds in allocation)
         and kswapd acts upon this flag before going to sleep
      
      This behavior is suboptimal for several reasons:
      
       - when direct sync compaction is called after async compaction fails (in the
         allocation slowpath), it will effectively do nothing, unless kswapd
         happens to process the compact_blockskip_flush flag meanwhile. This is racy
         and goes against the purpose of sync compaction to more thoroughly retry
         the compaction of a zone where async compaction has failed.
         The restart-after-deferring path cannot help here as deferring happens only
         after the sync compaction fails. It is also done only for the preferred
         zone, while the compaction might be done for a fallback zone.
      
       - the mechanism of marking pageblock to be skipped has little value since the
         cached pfn's are reset only together with the pageblock skip flags. This
         effectively limits pageblock skip usage to parallel compactions.
      
      This patch changes compact_finished() so that cached pfn's are reset
      immediately when the scanners meet.  Clearing pageblock skip flags is
      unchanged, as well as the other situations where cached pfn's are reset.
      This allows the sync-after-async compaction to retry pageblocks not
      marked as skipped, such as blocks !MIGRATE_MOVABLE blocks that async
      compactions now skips without marking them.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      55b7c4c9
    • V
      mm: compaction: do not mark unmovable pageblocks as skipped in async compaction · 50b5b094
      Vlastimil Babka 提交于
      Compaction temporarily marks pageblocks where it fails to isolate pages
      as to-be-skipped in further compactions, in order to improve efficiency.
      One of the reasons to fail isolating pages is that isolation is not
      attempted in pageblocks that are not of MIGRATE_MOVABLE (or CMA) type.
      
      The problem is that blocks skipped due to not being MIGRATE_MOVABLE in
      async compaction become skipped due to the temporary mark also in future
      sync compaction.  Moreover, this may follow quite soon during
      __alloc_page_slowpath, without much time for kswapd to clear the
      pageblock skip marks.  This goes against the idea that sync compaction
      should try to scan these blocks more thoroughly than the async
      compaction.
      
      The fix is to ensure in async compaction that these !MIGRATE_MOVABLE
      blocks are not marked to be skipped.  Note this should not affect
      performance or locking impact of further async compactions, as skipping
      a block due to being !MIGRATE_MOVABLE is done soon after skipping a
      block marked to be skipped, both without locking.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50b5b094