• M
    mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT · 2876592f
    Mel Gorman 提交于
    should_continue_reclaim() for reclaim/compaction allows scanning to
    continue even if pages are not being reclaimed until the full list is
    scanned.  In terms of allocation success, this makes sense but potentially
    it introduces unwanted latency for high-order allocations such as
    transparent hugepages and network jumbo frames that would prefer to fail
    the allocation attempt and fallback to order-0 pages.  Worse, there is a
    potential that the full LRU scan will clear all the young bits, distort
    page aging information and potentially push pages into swap that would
    have otherwise remained resident.
    
    This patch will stop reclaim/compaction if no pages were reclaimed in the
    last SWAP_CLUSTER_MAX pages that were considered.  For allocations such as
    hugetlbfs that use __GFP_REPEAT and have fewer fallback options, the full
    LRU list may still be scanned.
    
    Order-0 allocation should not be affected because RECLAIM_MODE_COMPACTION
    is not set so the following avoids the gfp_mask being examined:
    
            if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
                    return false;
    
    A tool was developed based on ftrace that tracked the latency of
    high-order allocations while transparent hugepage support was enabled and
    three benchmarks were run.  The "fix-infinite" figures are 2.6.38-rc4 with
    Johannes's patch "vmscan: fix zone shrinking exit when scan work is done"
    applied.
    
      STREAM Highorder Allocation Latency Statistics
                     fix-infinite     break-early
      1 :: Count            10298           10229
      1 :: Min             0.4560          0.4640
      1 :: Mean            1.0589          1.0183
      1 :: Max            14.5990         11.7510
      1 :: Stddev          0.5208          0.4719
      2 :: Count                2               1
      2 :: Min             1.8610          3.7240
      2 :: Mean            3.4325          3.7240
      2 :: Max             5.0040          3.7240
      2 :: Stddev          1.5715          0.0000
      9 :: Count           111696          111694
      9 :: Min             0.5230          0.4110
      9 :: Mean           10.5831         10.5718
      9 :: Max            38.4480         43.2900
      9 :: Stddev          1.1147          1.1325
    
    Mean time for order-1 allocations is reduced.  order-2 looks increased but
    with so few allocations, it's not particularly significant.  THP mean
    allocation latency is also reduced.  That said, allocation time varies so
    significantly that the reductions are within noise.
    
    Max allocation time is reduced by a significant amount for low-order
    allocations but reduced for THP allocations which presumably are now
    breaking before reclaim has done enough work.
    
      SysBench Highorder Allocation Latency Statistics
                     fix-infinite     break-early
      1 :: Count            15745           15677
      1 :: Min             0.4250          0.4550
      1 :: Mean            1.1023          1.0810
      1 :: Max            14.4590         10.8220
      1 :: Stddev          0.5117          0.5100
      2 :: Count                1               1
      2 :: Min             3.0040          2.1530
      2 :: Mean            3.0040          2.1530
      2 :: Max             3.0040          2.1530
      2 :: Stddev          0.0000          0.0000
      9 :: Count             2017            1931
      9 :: Min             0.4980          0.7480
      9 :: Mean           10.4717         10.3840
      9 :: Max            24.9460         26.2500
      9 :: Stddev          1.1726          1.1966
    
    Again, mean time for order-1 allocations is reduced while order-2
    allocations are too few to draw conclusions from.  The mean time for THP
    allocations is also slightly reduced albeit the reductions are within
    varianes.
    
    Once again, our maximum allocation time is significantly reduced for
    low-order allocations and slightly increased for THP allocations.
    
      Anon stream mmap reference Highorder Allocation Latency Statistics
      1 :: Count             1376            1790
      1 :: Min             0.4940          0.5010
      1 :: Mean            1.0289          0.9732
      1 :: Max             6.2670          4.2540
      1 :: Stddev          0.4142          0.2785
      2 :: Count                1               -
      2 :: Min             1.9060               -
      2 :: Mean            1.9060               -
      2 :: Max             1.9060               -
      2 :: Stddev          0.0000               -
      9 :: Count            11266           11257
      9 :: Min             0.4990          0.4940
      9 :: Mean        27250.4669      24256.1919
      9 :: Max      11439211.0000    6008885.0000
      9 :: Stddev     226427.4624     186298.1430
    
    This benchmark creates one thread per CPU which references an amount of
    anonymous memory 1.5 times the size of physical RAM.  This pounds swap
    quite heavily and is intended to exercise THP a bit.
    
    Mean allocation time for order-1 is reduced as before.  It's also reduced
    for THP allocations but the variations here are pretty massive due to
    swap.  As before, maximum allocation times are significantly reduced.
    
    Overall, the patch reduces the mean and maximum allocation latencies for
    the smaller high-order allocations.  This was with Slab configured so it
    would be expected to be more significant with Slub which uses these size
    allocations more aggressively.
    
    The mean allocation times for THP allocations are also slightly reduced.
    The maximum latency was slightly increased as predicted by the comments
    due to reclaim/compaction breaking early.  However, workloads care more
    about the latency of lower-order allocations than THP so it's an
    acceptable trade-off.
    Signed-off-by: NMel Gorman <mel@csn.ul.ie>
    Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
    Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
    Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
    Acked-by: NRik van Riel <riel@redhat.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Kent Overstreet <kent.overstreet@gmail.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    2876592f
vmscan.c 92.5 KB