1. 09 10月, 2012 1 次提交
  2. 22 8月, 2012 1 次提交
    • M
      mm: compaction: Abort async compaction if locks are contended or taking too long · c67fe375
      Mel Gorman 提交于
      Jim Schutt reported a problem that pointed at compaction contending
      heavily on locks.  The workload is straight-forward and in his own words;
      
      	The systems in question have 24 SAS drives spread across 3 HBAs,
      	running 24 Ceph OSD instances, one per drive.  FWIW these servers
      	are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
      	Ceph Linux clients doing dd simultaneously to a Ceph file system
      	backed by 12 of these servers.
      
      Early in the test everything looks fine
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
        27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
        28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
         6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
        22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0
      
      and then it goes to pot
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        163  8          0     464308        576   36791368    0    0    11 22210  866  536   3 13  79  4  0
        207 14          0     917752        576   36181928    0    0   712 1345376 134598 47367   7 90   1  2  0
        123 12          0     685516        576   36296148    0    0   429 1386615 158494 60077   8 84   5  3  0
        123 12          0     598572        576   36333728    0    0  1107 1233281 147542 62351   7 84   5  4  0
        622  7          0     660768        576   36118264    0    0   557 1345548 151394 59353   7 85   4  3  0
        223 11          0     283960        576   36463868    0    0    46 1107160 121846 33006   6 93   1  1  0
      
      Note that system CPU usage is very high blocks being written out has
      dropped by 42%. He analysed this with perf and found
      
        perf record -g -a sleep 10
        perf report --sort symbol --call-graph fractal,5
          34.63%  [k] _raw_spin_lock_irqsave
                  |
                  |--97.30%-- isolate_freepages
                  |          compaction_alloc
                  |          unmap_and_move
                  |          migrate_pages
                  |          compact_zone
                  |          compact_zone_order
                  |          try_to_compact_pages
                  |          __alloc_pages_direct_compact
                  |          __alloc_pages_slowpath
                  |          __alloc_pages_nodemask
                  |          alloc_pages_vma
                  |          do_huge_pmd_anonymous_page
                  |          handle_mm_fault
                  |          do_page_fault
                  |          page_fault
                  |          |
                  |          |--87.39%-- skb_copy_datagram_iovec
                  |          |          tcp_recvmsg
                  |          |          inet_recvmsg
                  |          |          sock_recvmsg
                  |          |          sys_recvfrom
                  |          |          system_call
                  |          |          __recv
                  |          |          |
                  |          |           --100.00%-- (nil)
                  |          |
                  |           --12.61%-- memcpy
                   --2.70%-- [...]
      
      There was other data but primarily it is all showing that compaction is
      contended heavily on the zone->lock and zone->lru_lock.
      
      commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
      while isolating pages for migration] noted that it was possible for
      migration to hold the lru_lock for an excessive amount of time. Very
      broadly speaking this patch expands the concept.
      
      This patch introduces compact_checklock_irqsave() to check if a lock
      is contended or the process needs to be scheduled. If either condition
      is true then async compaction is aborted and the caller is informed.
      The page allocator will fail a THP allocation if compaction failed due
      to contention. This patch also introduces compact_trylock_irqsave()
      which will acquire the lock only if it is not contended and the process
      does not need to schedule.
      Reported-by: NJim Schutt <jaschut@sandia.gov>
      Tested-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c67fe375
  3. 01 8月, 2012 1 次提交
  4. 04 6月, 2012 1 次提交
  5. 30 5月, 2012 1 次提交
    • B
      mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks · 5ceb9ce6
      Bartlomiej Zolnierkiewicz 提交于
      When MIGRATE_UNMOVABLE pages are freed from MIGRATE_UNMOVABLE type
      pageblock (and some MIGRATE_MOVABLE pages are left in it) waiting until an
      allocation takes ownership of the block may take too long.  The type of
      the pageblock remains unchanged so the pageblock cannot be used as a
      migration target during compaction.
      
      Fix it by:
      
      * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE], and
        COMPACT_SYNC) and then converting sync field in struct compact_control
        to use it.
      
      * Adding nr_pageblocks_skipped field to struct compact_control and
        tracking how many destination pageblocks were of MIGRATE_UNMOVABLE type.
         If COMPACT_ASYNC_MOVABLE mode compaction ran fully in
        try_to_compact_pages() (COMPACT_COMPLETE) it implies that there is not a
        suitable page for allocation.  In this case then check how if there were
        enough MIGRATE_UNMOVABLE pageblocks to try a second pass in
        COMPACT_ASYNC_UNMOVABLE mode.
      
      * Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC and
        COMPACT_ASYNC_UNMOVABLE compaction modes) and building a count based on
        finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.  If all
        pages within the MIGRATE_UNMOVABLE pageblock are in one of those three
        sets change the whole pageblock type to MIGRATE_MOVABLE.
      
      My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
      131072 standard 4KiB pages in 'Normal' zone) is to:
      
      - allocate 120000 pages for kernel's usage
      - free every second page (60000 pages) of memory just allocated
      - allocate and use 60000 pages from user space
      - free remaining 60000 pages of kernel memory
        (now we have fragmented memory occupied mostly by user space pages)
      - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
      
      The results:
      - with compaction disabled I get 11 successful allocations
      - with compaction enabled - 14 successful allocations
      - with this patch I'm able to get all 100 successful allocations
      
      NOTE: If we can make kswapd aware of order-0 request during compaction, we
      can enhance kswapd with changing mode to COMPACT_ASYNC_FULL
      (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).  Please see the
      following thread:
      
      	http://marc.info/?l=linux-mm&m=133552069417068&w=2
      
      [minchan@kernel.org: minor cleanups]
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ceb9ce6
  6. 22 3月, 2012 2 次提交
    • R
      vmscan: only defer compaction for failed order and higher · aff62249
      Rik van Riel 提交于
      Currently a failed order-9 (transparent hugepage) compaction can lead to
      memory compaction being temporarily disabled for a memory zone.  Even if
      we only need compaction for an order 2 allocation, eg.  for jumbo frames
      networking.
      
      The fix is relatively straightforward: keep track of the highest order at
      which compaction is succeeding, and only defer compaction for orders at
      which compaction is failing.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aff62249
    • R
      vmscan: kswapd carefully call compaction · 7be62de9
      Rik van Riel 提交于
      With CONFIG_COMPACTION enabled, kswapd does not try to free contiguous
      free pages, even when it is woken for a higher order request.
      
      This could be bad for eg.  jumbo frame network allocations, which are done
      from interrupt context and cannot compact memory themselves.  Higher than
      before allocation failure rates in the network receive path have been
      observed in kernels with compaction enabled.
      
      Teach kswapd to defragment the memory zones in a node, but only if
      required and compaction is not deferred in a zone.
      
      [akpm@linux-foundation.org: reduce scope of zones_need_compaction]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7be62de9
  7. 01 11月, 2011 1 次提交
  8. 23 3月, 2011 1 次提交
  9. 14 1月, 2011 3 次提交
  10. 25 5月, 2010 6 次提交