- 16 Mar 2016: 1 commit

Committed by Joonsoo Kim
free_pfn and compact_cached_free_pfn are the pointers that remember the restart position of the freepage scanner. When they are reset or invalid, we set them to zone_end_pfn because the freepage scanner works in reverse direction. But because the zone range is defined as [zone_start_pfn, zone_end_pfn), zone_end_pfn is invalid to access. Therefore, we should not store it to free_pfn and compact_cached_free_pfn. Instead, we need to store zone_end_pfn - 1 to them.

There is one more thing we should consider. The freepage scanner scans in reverse, in pageblock units. If free_pfn and compact_cached_free_pfn are set to the middle of a pageblock, the scanner treats that situation as if it had already scanned the front part of the pageblock, so we lose the opportunity to scan there. To fix this up, this patch does round_down() to guarantee that the reset position will be pageblock aligned.

Note that thanks to the current pageblock_pfn_to_page() implementation, actual access to zone_end_pfn has not happened so far. But the following patch will change pageblock_pfn_to_page(), so this patch is needed from now on.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
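The reset rule described above can be illustrated with a small user-space C sketch. It is not the kernel code. PAGEBLOCK_ORDER is assumed to be 9 (as in the order-9 example used elsewhere on this page), and round_down_pow2() and free_scanner_reset_pfn() are hypothetical helpers that stand in for the kernel's round_down() on pageblock_nr_pages.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel definitions (assumed pageblock order 9). */
#define PAGEBLOCK_ORDER    9UL
#define PAGEBLOCK_NR_PAGES (1UL << PAGEBLOCK_ORDER)

/* Round x down to a multiple of align; align must be a power of two. */
static unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Restart position for the free scanner after a reset.
 * The zone spans [zone_start_pfn, zone_end_pfn), so zone_end_pfn itself is
 * not a valid pfn: take the last valid pfn and align it down to the start of
 * its pageblock, so that the whole last pageblock gets scanned.
 */
static unsigned long free_scanner_reset_pfn(unsigned long zone_end_pfn)
{
	return round_down_pow2(zone_end_pfn - 1, PAGEBLOCK_NR_PAGES);
}
```

For example, with zone_end=1048576 (the zone end quoted in a later commit on this page), the sketch restarts the free scanner at pfn 1048064. That is the first pfn of the zone's last pageblock, not the invalid pfn 1048576.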
- 15 Jan 2016: 2 commits

Committed by Joonsoo Kim
This patch uses is_via_compact_memory() to distinguish compaction triggered from sysfs or sysctl. It also reduces indentation around compaction_defer_reset() by filtering out these cases first, before checking the watermark. There is no functional change.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Yaowei Bai
sysctl_compaction_handler() is the handler function for the compact_memory tunable knob under /proc/sys/vm; add the missing knob name to its comment to make it more accurate. No functional change.

Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 06 Nov 2015: 3 commits

Committed by Vlastimil Babka
Compaction returns prematurely with COMPACT_PARTIAL when it is contended or has a fatal signal pending. This is OK for the callers, but might be misleading in the traces, as the usual reason to return COMPACT_PARTIAL is that we think the allocation should succeed. After this patch we distinguish the premature ending condition in the mm_compaction_finished and mm_compaction_end tracepoints.

The contended status covers the following reasons:

- lock contention or need_resched() detected in async compaction
- fatal signal pending
- too many pages isolated in the zone (only for async compaction)

Further distinguishing the exact reason seems unnecessary for now.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Vlastimil Babka
Some compaction tracepoints convert the integer return values to strings using the compaction_status_string array. This works for in-kernel printing, but not for userspace trace printing of a raw captured trace, such as via trace-cmd report. This patch converts the private array to appropriate tracepoint macros that result in proper userspace support.

trace-cmd output before:

    transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0 zone=ffffffff81815d7a order=9 ret=

after:

    transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0 zone=ffffffff81815d7a order=9 ret=partial

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
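The underlying problem is that the status names existed only in a private in-kernel array. A common C idiom for keeping an enum and its names in a single source of truth is one list macro expanded twice. It is sketched below as a user-space analogy only. The kernel-side tracepoint macros that actually export the names to tools like trace-cmd are not reproduced, and the status subset and all helper names here are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

/* Illustrative subset of compaction status codes (assumed, not the full list). */
#define COMPACTION_STATUS(X)             \
	X(COMPACT_DEFERRED,  "deferred")  \
	X(COMPACT_SKIPPED,   "skipped")   \
	X(COMPACT_CONTINUE,  "continue")  \
	X(COMPACT_PARTIAL,   "partial")   \
	X(COMPACT_COMPLETE,  "complete")  \
	X(COMPACT_CONTENDED, "contended")

/* First expansion: the enum values. */
#define AS_ENUM(name, str) name,
enum compact_status { COMPACTION_STATUS(AS_ENUM) COMPACT_NR_STATUS };
#undef AS_ENUM

/* Second expansion: the matching names, indexed by enum value. */
#define AS_STRING(name, str) [name] = str,
static const char *const compact_status_str[] = { COMPACTION_STATUS(AS_STRING) };
#undef AS_STRING

/* Value -> name, as a trace printer needs. */
static const char *compact_status_name(int status)
{
	if (status < 0 || status >= COMPACT_NR_STATUS)
		return "unknown";
	return compact_status_str[status];
}

/* Name -> value, as a trace parser might need; returns -1 if unknown. */
static int compact_status_from_name(const char *name)
{
	for (int i = 0; i < COMPACT_NR_STATUS; i++)
		if (strcmp(compact_status_str[i], name) == 0)
			return i;
	return -1;
}
```

Because both the enum and the string table come from the same list, adding a status in one place cannot leave the names out of sync.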
Committed by Yaowei Bai
Introduce the is_via_compact_memory() helper, indicating compaction via /proc/sys/vm/compact_memory, to improve readability. To catch this situation in __compaction_suitable, use order directly as the parameter instead of struct compact_control. This patch has no functional changes.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
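Below is a minimal sketch of the helper and of the early filtering described in these two commits. It assumes the convention that a whole-zone request from /proc/sys/vm/compact_memory is marked by order -1. The verdict enum and compaction_suitable_sketch() are hypothetical. The real watermark arithmetic and any fragmentation-index step are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: order -1 marks "compact the whole zone" via the sysctl. */
static inline bool is_via_compact_memory(int order)
{
	return order == -1;
}

/* Hypothetical, simplified verdicts for this sketch only. */
enum verdict { V_CONTINUE, V_PARTIAL, V_SKIPPED };

/*
 * Simplified shape of a suitability check: the explicit whole-zone request
 * is filtered out first, before any watermark-based decision is made.
 */
static enum verdict compaction_suitable_sketch(int order,
					       bool alloc_would_succeed,
					       bool enough_order0_free)
{
	if (is_via_compact_memory(order))
		return V_CONTINUE;   /* always compact on explicit request */
	if (alloc_would_succeed)
		return V_PARTIAL;    /* no compaction needed */
	if (!enough_order0_free)
		return V_SKIPPED;    /* too few free pages to migrate into */
	return V_CONTINUE;
}
```

Filtering the sysctl case first is what lets the remaining checks sit at a single indentation level.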
- 09 Sep 2015: 6 commits

Committed by Joonsoo Kim
We cache isolate_start_pfn before entering isolate_migratepages(). If a pageblock is skipped in isolate_migratepages() for whatever reason, cc->migrate_pfn can be far from isolate_start_pfn, and hence we flush pages that were freed. For example, the following scenario is possible:

- assume order-9 compaction, pageblock order is 9
- start_isolate_pfn is 0x200
- isolate_migratepages()
  - skip a number of pageblocks
  - start to isolate from pfn 0x600
  - cc->migrate_pfn = 0x620
  - return
- last_migrated_pfn is set to 0x200
- check flushing condition
  - current_block_start is set to 0x600
  - last_migrated_pfn < current_block_start, so we do a useless flush

This wrong flush helps neither performance nor success rate, so this patch tries to fix it. One simple way to know the exact position where we start to isolate migratable pages is to cache it in isolate_migratepages() before entering the actual isolation. This patch implements that and fixes the problem.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
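The flushing decision in the scenario above can be modelled in a few lines of C. This is a simplified sketch, not the kernel code. should_flush() is a hypothetical helper, and treating last_migrated_pfn == 0 as "nothing migrated yet" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Drain per-cpu caches only once the migration scanner has left the
 * cc->order-aligned block in which the last migration started.
 * In this sketch, last_migrated_pfn == 0 means "nothing migrated yet".
 */
static bool should_flush(unsigned long last_migrated_pfn,
			 unsigned long migrate_pfn, unsigned int order)
{
	/* Start of the order-aligned block the migration scanner is now in. */
	unsigned long current_block_start = migrate_pfn & ~((1UL << order) - 1);

	if (last_migrated_pfn == 0)
		return false;
	return last_migrated_pfn < current_block_start;
}
```

With the scenario's numbers, current_block_start is 0x600. The stale value 0x200 therefore triggers a flush. The value 0x600, where isolation actually started, correctly does not.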
Committed by Vlastimil Babka
The compaction free scanner is looking for PageBuddy() pages and skipping all others. For large compound pages such as THP or hugetlbfs, we can save a lot of iterations if we skip them at once using their compound_order(). This is generally unsafe and we can read a bogus value of order due to a race, but if we are careful, the only danger is skipping too much.

When tested with stress-highalloc from mmtests on a 4GB system with 1GB hugetlbfs pages, the vmstat compact_free_scanned count decreased by at least 15%.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Vlastimil Babka
The compaction migrate scanner tries to skip THP pages by their order, to reduce the number of iterations for pages it cannot isolate. The check is only done if PageLRU() is true, which means it applies to THP pages, but not e.g. to hugetlbfs pages or any other non-LRU compound pages, which we have to iterate by base pages.

This limitation comes from the assumption that it's only safe to read compound_order() when we have the zone's lru_lock and THP cannot be split under us. But the only danger (after filtering out order values that are not below MAX_ORDER, to prevent overflows) is that we skip too much or too little after reading a bogus compound_order() due to a rare race. This is the same reasoning that patch 99c0fd5e ("mm, compaction: skip buddy pages by their order in the migrate scanner") introduced for unsafely reading the PageBuddy() order.

After this patch, all pages are tested for PageCompound() and we skip them by compound_order(). The test is done after the test for balloon_page_movable(), as we don't want to make assumptions about whether balloon pages (or other pages with their own isolation and migration implementation, if a generic API gets implemented) are compound or not.

When tested with stress-highalloc from mmtests on a 4GB system with 1GB hugetlbfs pages, the vmstat compact_migrate_scanned count decreased by 15%.

[kirill.shutemov@linux.intel.com: change PageTransHuge checks to PageCompound for different series was squashed here]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
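The unlocked compound_order() trick used by this commit, and by the free-scanner commit above it, can be sketched as follows. MAX_ORDER = 11 is an assumed value. pfns_to_skip() and scan_iterations() are hypothetical helpers. The point is that a racy, possibly bogus order is checked against MAX_ORDER before it is used as a shift count, so the worst outcome is a wrong skip distance.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: MAX_ORDER of 11, a common default in kernels of this era. */
#define MAX_ORDER 11U

/*
 * How many pfns to advance past a page whose compound order was read
 * without locking. A racy read may return garbage, so orders not below
 * MAX_ORDER are ignored (step one base page); this also keeps the shift
 * below well-defined. Any remaining error only affects the skip distance.
 */
static unsigned long pfns_to_skip(unsigned int order)
{
	if (order >= MAX_ORDER)
		return 1;
	return 1UL << order;
}

/*
 * Count scanner iterations over nr_pfns pfns, where order_at[pfn] is the
 * order read at that pfn (0 for a non-compound page).
 */
static size_t scan_iterations(const unsigned int *order_at, size_t nr_pfns)
{
	size_t iterations = 0;

	for (size_t pfn = 0; pfn < nr_pfns; pfn += pfns_to_skip(order_at[pfn]))
		iterations++;
	return iterations;
}
```

For a range holding an order-2 page, an order-1 page and two base pages, the scanner visits 4 pfns instead of 8.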
Committed by Vlastimil Babka
Resetting the cached compaction scanner positions is now open-coded in __reset_isolation_suitable() and compact_finished(). Encapsulate the functionality in a new function, reset_cached_positions().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Vlastimil Babka
Handling the position where the compaction free scanner should restart (stored in cc->free_pfn) got more complex with commit e14c720e ("mm, compaction: remember position within pageblock in free pages scanner"). Currently the position is updated in each loop iteration of isolate_freepages(), although it should be enough to update it only when breaking from the loop. There's also an extra check outside the loop that updates the position in case we have met the migration scanner.

This can be simplified if we move the test for having isolated enough from the for-loop header next to the test for contention, and determine the restart position only in these cases. We can reuse the isolate_start_pfn variable for this instead of setting cc->free_pfn directly. Outside the loop, we can simply set cc->free_pfn to the current value of isolate_start_pfn without any extra check.

Also add a VM_BUG_ON to catch a possible mistake in the future, in case we later add a new condition that terminates isolate_freepages_block() prematurely without also considering the condition in isolate_freepages().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Vlastimil Babka
Assorted compaction cleanups and optimizations. The interesting patches are 4 and 5. In 4, skipping of compound pages in a single iteration is improved for the migration scanner, so it also works for !PageLRU compound pages such as hugetlbfs, slab etc. Patch 5 introduces this kind of skipping in the free scanner. The trick is that we can read compound_order() without any protection, if we are careful to filter out values larger than MAX_ORDER. The only danger is that we skip too much. The same trick was already used for reading the freepage order in the migrate scanner.

To demonstrate the improvements of Patches 4 and 5, I've run stress-highalloc from mmtests, set to simulate THP allocations (including __GFP_COMP) on a 4GB system where 1GB was occupied by hugetlbfs pages. I'll include just the relevant stats:

                                   Patch 3   Patch 4   Patch 5
    Compaction stalls                 7523      7529      7515
    Compaction success                 323       304       322
    Compaction failures               7200      7224      7192
    Page migrate success            247778    264395    240737
    Page migrate failure             15358     33184     21621
    Compaction pages isolated       906928    980192    909983
    Compaction migrate scanned     2005277   1692805   1498800
    Compaction free scanned       13255284  11539986   9011276
    Compaction cost                    288       305       277

With 5 iterations per patch, the results are still noisy, but we can see that Patch 4 does reduce migrate_scanned by 15% thanks to skipping the hugetlbfs pages at once. Interestingly, free_scanned is also reduced and I have no idea why. Patch 5 further reduces free_scanned as expected, by 15%. Other stats are unaffected modulo noise.

[1] https://lkml.org/lkml/2015/1/19/158

This patch (of 5):

Compaction should finish when the migration and free scanner meet, i.e. they reach the same pageblock. Currently, however, the test in compact_finished() simply compares the exact pfns, which may yield a false negative when the free scanner position is in the middle of a pageblock and the migration scanner reaches the beginning of the same pageblock.

This hasn't been a problem until commit e14c720e ("mm, compaction: remember position within pageblock in free pages scanner") allowed the free scanner position to be in the middle of a pageblock between invocations. The hot-fix 1d5bfe1f ("mm, compaction: prevent infinite loop in compact_zone") prevented the issue by adding a special check in the migration scanner to satisfy the current detection of scanners meeting.

However, the proper fix is to make the detection more robust. This patch introduces the compact_scanners_met() function that returns true when the free scanner position is in the same or lower pageblock than the migration scanner. The special case in isolate_migratepages() introduced by 1d5bfe1f is removed.

Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
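The difference between the old exact comparison and the pageblock-granular test introduced here can be shown with two small predicates. PAGEBLOCK_ORDER = 9 is an assumption, and both function names are hypothetical. The kernel's compact_scanners_met() works on struct compact_control rather than on raw pfns.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_ORDER 9U   /* assumption: order-9 pageblocks */

/* Old-style test: compare the exact pfns of the two scanners. */
static bool scanners_met_exact(unsigned long free_pfn, unsigned long migrate_pfn)
{
	return free_pfn <= migrate_pfn;
}

/*
 * Pageblock-granular test: the scanners have met once the free scanner is
 * in the same or a lower pageblock than the migration scanner.
 */
static bool compact_scanners_met_sketch(unsigned long free_pfn,
					unsigned long migrate_pfn)
{
	return (free_pfn >> PAGEBLOCK_ORDER) <= (migrate_pfn >> PAGEBLOCK_ORDER);
}
```

Take free_pfn = 0x610 and migrate_pfn = 0x600, which are both in pageblock 3. The exact test reports that the scanners have not met (the false negative described above), while the pageblock test correctly reports that they have.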
- 16 Apr 2015: 3 commits

Committed by Andrew Morton
mm/compaction.c:250:13: warning: 'suitable_migration_target' defined but not used [-Wunused-function]

Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Gioh Kim
When compaction is activated via /proc/sys/vm/compact_memory, it should scan the whole zone. And on some platforms, for instance ARM, the start_pfn of a zone is zero. Therefore the first attempt to compact via /proc doesn't work. It needs to reset the compaction scanner position first.

Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Eric B Munson
Currently, pages which are marked as unevictable are protected from compaction, but not from other types of migration. The POSIX real time extension explicitly states that mlock() will prevent a major page fault, but the spirit of this is that mlock() should give a process the ability to control sources of latency, including minor page faults. However, the mlock manpage only explicitly says that a locked page will not be written to swap, and this can cause some confusion. The compaction code today does not give a developer who wants to avoid swap but wants to have large contiguous areas available any method to achieve this state. This patch introduces a sysctl for controlling compaction behavior with respect to the unevictable LRU. Users who demand no page faults after a page is present can set compact_unevictable_allowed to 0, and users who need the large contiguous areas can enable compaction on locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a large number of 1MB files filled with random data. These maps are created locked and read only. Then every other mmap is unmapped and I attempt to allocate huge pages to the static huge page pool. When the compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages after fragmenting memory. When the value is set to 1, allocations succeed.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 15 Apr 2015: 1 commit

Committed by Joonsoo Kim
Compaction has an anti-fragmentation algorithm: if we don't find any free page in the buddy list of the requested migratetype, compaction finishes only when there is a free page of at least pageblock order. This is for mitigating fragmentation, but it lacks migratetype consideration and it is too excessive compared to the page allocator's anti-fragmentation algorithm.

Not considering migratetype can cause compaction to finish prematurely. For example, if the allocation request is for the unmovable migratetype, a free page with the CMA migratetype doesn't help that allocation, and compaction should not be stopped. But the current logic regards this situation as compaction no longer being needed, and so finishes the compaction.

Secondly, the condition is too excessive compared to the page allocator's logic. The page allocator can steal free pages from other migratetypes and change the pageblock migratetype under more relaxed conditions. This is designed to prevent fragmentation, and we can use it here. Imposing a hard constraint only on compaction doesn't help much in this case, since the page allocator would cause fragmentation again.

To solve these problems, this patch borrows the anti-fragmentation logic from the page allocator. It will reduce premature compaction finishes in some cases and reduce excessive compaction work.

The stress-highalloc test in mmtests with non-movable order-7 allocations shows a considerable increase in compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
31.82 : 42.20

I tested it with 5 runs of the stress-highalloc benchmark without rebooting and found that there is no more degradation in allocation success rate than before. That roughly means that this patch doesn't result in more fragmentation.

Vlastimil suggested an additional idea: only test for fallbacks when the migration scanner has scanned a whole pageblock. It looked good for fragmentation, because the chance of stealing increases due to making more free pages in a certain pageblock. So I tested it, but it resulted in a decreased compaction success rate, roughly 38.00. I guess the reason is that, if the system is in a low memory condition, the watermark check could fail due to not enough order-0 free pages, so sometimes we can't reach the fallback check although migrate_pfn is aligned to pageblock_nr_pages. I could insert code to cope with this situation, but it makes the code more complicated, so I don't include his idea in this patch.

[akpm@linux-foundation.org: fix CONFIG_CMA=n build]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 14 Feb 2015: 1 commit

Committed by Andrey Ryabinin
Add kernel address sanitizer hooks to mark allocated pages' addresses as accessible in the corresponding shadow region. Mark freed pages as inaccessible.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 13 Feb 2015: 3 commits

Committed by Hugh Dickins
The vmstat interfaces are good at hiding negative counts (at least when CONFIG_SMP); but if you peer behind the curtain, you find that nr_isolated_anon and nr_isolated_file soon go negative, and grow ever more negative: so they can absorb larger and larger numbers of isolated pages, yet still appear to be zero.

I'm happy to avoid a congestion_wait() when too_many_isolated() myself; but I guess it's there for a good reason, in which case we ought to get too_many_isolated() working again.

The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case: putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot to call acct_isolated() to increment them.

It is possible that the bug which this patch fixes could cause OOM kills when the system still has a lot of reclaimable page cache.

Fixes: edc2ca61 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org> [3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Joonsoo Kim
Currently, freepage isolation in one pageblock doesn't consider how many freepages we isolate. When I traced the flow of compaction, compaction sometimes isolated more than 256 freepages to migrate just 32 pages.

In this patch, freepage isolation is stopped at the point where we have more isolated freepages than pages isolated for migration. This slows down the free page scanner and makes the compaction success rate higher.

The stress-highalloc test in mmtests with non-movable order-7 allocations shows an increase in compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
27.13 : 31.82

pfn where both scanners meet on compaction complete
(separate test due to enormous tracepoint buffer)
(zone_start=4096, zone_end=1048576)
586034 : 654378

In fact, I didn't fully understand why this patch gives such a good result. There was a guess that unused freepages are released to the pcp list and on the next compaction trial we won't isolate them again, so the compaction success rate would decrease. To prevent this effect, I tested adding pcp drain code to release_freepages(), but it had no good effect.

Anyway, this patch reduces the time wasted isolating unneeded freepages, so it seems reasonable.

Vlastimil said:

: I briefly tried it on top of the pivot-changing series and with order-9
: allocations it reduced free page scanned counter by almost 10%. No effect
: on success rates (maybe because pivot changing already took care of the
: scanners meeting problem) but the scanning reduction is good on its own.
:
: It also explains why e14c720e ("mm, compaction: remember position
: within pageblock in free pages scanner") had less than expected
: improvements. It would only actually stop within pageblock in case of
: async compaction detecting contention. I guess that's also why the
: infinite loop problem fixed by 1d5bfe1f affected so relatively few
: people.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
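The effect of stopping free page isolation early can be illustrated with a toy model. isolate_free_sketch() is hypothetical. It treats each free page found as a buddy block of the given order, counts isolated base pages, and stops once nr_freepages >= nr_migratepages. The exact comparison is an assumption about the kernel's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Isolate free buddy pages in scan order. free_orders[i] is the order of
 * the i-th free page found. With stop_early, isolation stops as soon as the
 * isolated free pages cover the pages queued for migration.
 * Returns the number of base pages isolated.
 */
static unsigned long isolate_free_sketch(const unsigned int *free_orders,
					 size_t nr_found,
					 unsigned long nr_migratepages,
					 bool stop_early)
{
	unsigned long nr_freepages = 0;

	for (size_t i = 0; i < nr_found; i++) {
		if (stop_early && nr_freepages >= nr_migratepages)
			break;
		nr_freepages += 1UL << free_orders[i];
	}
	return nr_freepages;
}
```

Suppose eight order-3 free pages (64 base pages) are available and 32 pages are queued for migration. The unbounded loop isolates 64 pages, while the bounded one stops at 32. This mirrors, on a smaller scale, the over-isolation described in the commit message.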
Committed by Joonsoo Kim
What we want to check here is whether there is a high-order freepage in the buddy list of another migratetype, in order to steal it without fragmentation. But the current code just checks cc->order, which is the allocation request order. So this is wrong.

Without this fix, non-movable synchronous compaction below pageblock order would not stop until compaction is complete, because the migratetype of most pageblocks is movable and the high-order freepage made by compaction is usually on the movable-type buddy list.

There is a report related to this bug; see the link below.

http://www.spinics.net/lists/linux-mm/msg81666.html

Although the affected system still has load spikes coming from compaction, this fix makes that system completely stable and responsive, according to the reporter.

The stress-highalloc test in mmtests with non-movable order-7 allocations doesn't show any notable difference in allocation success rate, but it shows a higher compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
18.47 : 28.94

Fixes: 1fb3f8ca ("mm: compaction: capture a suitable high-order page immediately when it is made available")
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 12 Feb 2015: 5 commits

Committed by Joonsoo Kim
The compaction deferring logic is a heavy hammer that blocks the way to compaction. It doesn't consider the overall system state, so it could falsely prevent the user from doing compaction. In other words, even if the system has enough range of memory to compact, compaction would be skipped due to the deferring logic. This patch adds a new tracepoint to understand the work of the deferring logic. This will also help to check compaction success and failure.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Joonsoo Kim
It is not well analyzed when and why compaction starts or finishes, or doesn't. With these new tracepoints, we can know much more about the start/finish reasons of compaction. I found the following bug with these tracepoints:

http://www.spinics.net/lists/linux-mm/msg81582.html

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Joonsoo Kim
It'd be useful to know the current range where compaction works, for detailed analysis. With it, we can know the pageblock where we actually scan and isolate, and how many pages we try in that pageblock, and roughly guess why it doesn't become a freepage of pageblock order.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Joonsoo Kim
We now have a tracepoint for the begin event of compaction, and it prints the start position of both scanners, but the tracepoint for the end event of compaction doesn't print the finish position of both scanners. It'd also be useful to know the finish position of both scanners, so this patch adds it. It will help to find odd behavior or problems in the compaction internal logic.

Mode is also added to both the begin and end tracepoint output, since compaction behavior is quite different depending on the mode.

Lastly, the status format is changed to a string rather than a status number, for readability.

[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Vlastimil Babka
Expand the usage of the struct alloc_context introduced in the previous patch also to calling try_to_compact_pages(), to reduce the number of its parameters. Since the function is in a different compilation unit, we need to move the alloc_context definition into the shared mm/internal.h header.

With this change we get simpler code and small savings of code size and stack usage:

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27)
    function                                    old     new   delta
    __alloc_pages_direct_compact                283     256     -27

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13)
    function                                    old     new   delta
    try_to_compact_pages                        582     569     -13

Stack usage of __alloc_pages_direct_compact goes from 24 to none (per scripts/checkstack.pl).

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 11 Dec 2014: 5 commits

Committed by Vlastimil Babka
The goal of memory compaction is to create high-order freepages through page migration. Page migration however puts pages on the per-cpu lru_add cache, which is later flushed to per-cpu pcplists, and only after pcplists are drained the pages can actually merge. This can happen due to the per-cpu caches becoming full through further freeing, or explicitly. During direct compaction, it is useful to do the draining explicitly so that pages merge as soon as possible and compaction can detect success immediately and keep the latency impact at minimum. However the current implementation is far from ideal. Draining is done only in __alloc_pages_direct_compact(), after all zones were already compacted, and the decisions to continue or stop compaction in individual zones was done without the last batch of migrations being merged. It is also missing the draining of lru_add cache before the pcplists. This patch moves the draining for direct compaction into compact_zone(). It adds the missing lru_cache draining and uses the newly introduced single zone pcplists draining to reduce overhead and avoid impact on unrelated zones. Draining is only performed when it can actually lead to merging of a page of desired order (passed by cc->order). This means it is only done when migration occurred in the previously scanned cc->order aligned block(s) and the migration scanner is now pointing to the next cc->order aligned block. The patch has been tested with stress-highalloc benchmark from mmtests. Although overal allocation success rates of the benchmark were not affected, the number of detected compaction successes has doubled. This suggests that allocations were previously successful due to implicit merging caused by background activity, making a later allocation attempt succeed immediately, but not attributing the success to compaction. Since stress-highalloc always tries to allocate almost the whole memory, it cannot show the improvement in its reported success rate metric. 
However, after this patch, compaction should detect success and terminate earlier, reducing direct compaction latencies in a real scenario. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
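The draining condition described above can be expressed as a small predicate. This is a minimal user-space illustration, not the kernel code. The helper name should_drain() is an assumption made for this example, as is the convention that last_migrated_pfn == 0 means "nothing migrated since the last drain". The actual lru_add and pcplist draining calls are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should compact_zone() drain the lru_add cache and
 * this zone's pcplists now? last_migrated_pfn == 0 means nothing has been
 * migrated since the last drain (a convention assumed for this sketch).
 */
static bool should_drain(unsigned long last_migrated_pfn,
                         unsigned long migrate_pfn, unsigned int order)
{
	unsigned long current_block_start;

	/* Order-0 requests gain nothing from merging; no migration, no merge. */
	if (order == 0 || last_migrated_pfn == 0)
		return false;

	assert(order < 8 * sizeof(unsigned long));

	/* Start of the cc->order aligned block the scanner now points into. */
	current_block_start = migrate_pfn & ~((1UL << order) - 1);

	/* Drain only once the block(s) with migrated pages are finished. */
	return last_migrated_pfn < current_block_start;
}
```

For example, with order 9, a page migrated at pfn 100 triggers a drain once the migration scanner reaches pfn 512. It does not trigger one while the scanner is still inside the block starting at pfn 0.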
-
Committed by Vlastimil Babka
Compaction caches the migration and free scanner positions between compaction invocations, so that the whole zone eventually gets scanned and there is no bias towards the initial scanner positions at the beginning/end of the zone. The cached positions are continuously updated as the scanners progress, and the updating stops as soon as a page is successfully isolated. The reasoning behind this is that a pageblock where isolation succeeded is likely to succeed again in the near future and should be worth revisiting. However, the downside is that potentially many pages are rescanned without successful isolation. At worst, there might be a page where isolation from the LRU succeeds but migration fails (potentially always). Upon encountering such a page, the cached position would always stop being updated for no good reason. It might have been useful to let such a page be rescanned with sync compaction after async compaction failed, but this is now handled by caching the scanner position for async and sync mode separately since commit 35979ef3 ("mm, compaction: add per-zone migration pfn cache for async compaction"). After this patch, cached positions are updated unconditionally. In the stress-highalloc benchmark, this has decreased the number of scanned pages by a few percent, without affecting allocation success rates. To prevent the free scanner from leaving free pages behind after they are returned due to page migration failure, the cached scanner pfn is changed to point to the pageblock of the returned free page with the highest pfn, before leaving compact_zone().
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
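Below is a minimal sketch of the final adjustment of the cached free scanner position, assuming 512-page pageblocks. struct zone_model and update_cached_free_pfn() are hypothetical names. The point is only the arithmetic: take the highest returned pfn, align it down to its pageblock, and move the cached position up if needed, since the free scanner works downwards from the end of the zone.

```c
#include <assert.h>
#include <stddef.h>

#define PAGEBLOCK_NR_PAGES 512UL  /* assumption: order-9 pageblocks */

struct zone_model {               /* hypothetical, not struct zone */
	unsigned long compact_cached_free_pfn;
};

/*
 * Called with the pfns of unused free pages handed back to the allocator
 * when compaction ends. Point the cached free scanner position at the
 * pageblock holding the highest of them, so those pages are scanned again.
 */
static void update_cached_free_pfn(struct zone_model *zone,
                                   const unsigned long *returned_pfns, size_t n)
{
	unsigned long highest = 0;
	size_t i;

	if (n == 0)
		return;
	assert(returned_pfns != NULL);

	for (i = 0; i < n; i++)
		if (returned_pfns[i] > highest)
			highest = returned_pfns[i];

	/* Align down to the first pfn of that pageblock. */
	highest &= ~(PAGEBLOCK_NR_PAGES - 1);

	/* The free scanner moves downwards, so only ever move the cache up. */
	if (highest > zone->compact_cached_free_pfn)
		zone->compact_cached_free_pfn = highest;
}
```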
-
Committed by Vlastimil Babka
Deferred compaction is employed to avoid compacting a zone where sync direct compaction has recently failed. As such, it makes sense to only defer when a full zone was scanned, which is when compact_zone() returns with COMPACT_COMPLETE. It's less useful to defer when compact_zone() returns with apparent success (COMPACT_PARTIAL) followed by a watermark check failure, which can happen due to parallel allocation activity. It also does not make much sense to defer compaction that was completely skipped (COMPACT_SKIPPED) for being unsuitable in the first place. This patch therefore makes deferred compaction trigger only when COMPACT_COMPLETE is returned from compact_zone(). Results of the stress-highalloc benchmark show that the difference is within measurement error, so the issue is rather cosmetic. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
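The new deferral rule reduces to a single comparison. The sketch below is illustrative only. The enum is a local stand-in for the compaction result codes named in the text, and its values are arbitrary. Tying deferral to sync mode reflects the statement that deferral concerns sync direct compaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the result codes named in the text. */
enum compact_result_model {
	M_COMPACT_SKIPPED,   /* zone unsuitable, compaction not attempted */
	M_COMPACT_CONTINUE,  /* compaction should keep going */
	M_COMPACT_PARTIAL,   /* a suitable page appears to be available */
	M_COMPACT_COMPLETE,  /* whole zone scanned */
};

/* Should this zone be deferred after a direct compaction attempt? */
static bool should_defer_zone(enum compact_result_model status, bool sync_mode)
{
	/* Deferral concerns sync direct compaction only. */
	if (!sync_mode)
		return false;
	/* Only a full scan of the zone justifies backing off. */
	return status == M_COMPACT_COMPLETE;
}
```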
-
Committed by Vlastimil Babka
Since commit 53853e2d ("mm, compaction: defer each zone individually instead of preferred zone"), compaction is deferred for each zone where sync direct compaction fails, and reset where it succeeds. However, it was observed that for the DMA zone, compaction often appeared to succeed while the subsequent allocation attempt would not, due to a different outcome of the watermark check. In order to properly defer compaction in this zone, the candidate zone has to be passed back to __alloc_pages_direct_compact() and compaction deferred in the zone after the allocation attempt fails. The large source of mismatch between the watermark checks in compaction and allocation was the lack of alloc_flags and classzone_idx values in compaction, which has been fixed in the previous patch. So with this problem fixed, we can simplify the code by removing the candidate_zone parameter and deferring in __alloc_pages_direct_compact(). After this patch, the compaction activity during the stress-highalloc benchmark is still somewhat increased, but the increase is negligible compared to the one that occurred without the better watermark checking. This suggests that compaction can still apparently succeed while the allocation fails, possibly due to parallel allocation activity. [akpm@linux-foundation.org: fix build] Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
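The deferral itself behaves like a per-zone exponential backoff. The following user-space model is loosely patterned on how the kernel tracked this state at the time, as we understand it. That state consists of a "considered" counter, a defer shift capped at a maximum, and the lowest failed order. The struct, the model_-prefixed function names and the cap of 6 are assumptions for illustration. In terms of this patch, model_defer() corresponds to what runs after the allocation attempt fails, and model_defer_reset() to what runs after it succeeds.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_DEFER_SHIFT 6       /* assumption: cap on the backoff */

struct zone_defer {                   /* hypothetical per-zone state */
	unsigned int compact_considered;  /* attempts seen since last defer */
	unsigned int compact_defer_shift; /* skip up to 1 << shift attempts */
	int compact_order_failed;         /* lowest order that failed */
};

/* Allocation after compaction failed: back off exponentially. */
static void model_defer(struct zone_defer *z, int order)
{
	assert(order >= 0);
	z->compact_considered = 0;
	z->compact_defer_shift++;
	if (order < z->compact_order_failed)
		z->compact_order_failed = order;
	if (z->compact_defer_shift > MODEL_MAX_DEFER_SHIFT)
		z->compact_defer_shift = MODEL_MAX_DEFER_SHIFT;
}

/* True if compaction of this zone should be skipped for this attempt. */
static bool model_deferred(struct zone_defer *z, int order)
{
	unsigned int limit = 1U << z->compact_defer_shift;

	if (order < z->compact_order_failed)
		return false;          /* smaller orders have not failed yet */
	if (++z->compact_considered > limit)
		z->compact_considered = limit;
	return z->compact_considered < limit;
}

/* Allocation succeeded: forget the backoff for this order and below. */
static void model_defer_reset(struct zone_defer *z, int order)
{
	z->compact_considered = 0;
	z->compact_defer_shift = 0;
	if (order >= z->compact_order_failed)
		z->compact_order_failed = order + 1;
}
```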
-
Committed by Vlastimil Babka
Compaction relies on zone watermark checks for decisions such as whether it's worth starting to compact in compaction_suitable() or whether compaction should stop in compact_finished(). The watermark checks take classzone_idx and alloc_flags parameters, which are related to the memory allocation request. But in the context of compaction they are currently passed as 0. This includes direct compaction, which is invoked to satisfy the allocation request and could therefore know the proper values. The lack of proper values can lead to a mismatch between decisions taken during compaction and decisions related to the allocation request. Lack of a proper classzone_idx value means that lowmem_reserve is not taken into account. This has manifested (during recent changes to deferred compaction) when the DMA zone was used as a fallback for the preferred Normal zone. compaction_suitable() without a proper classzone_idx would think that the watermarks are already satisfied, but the watermark check in get_page_from_freelist() would fail. Because of this problem, deferring compaction has extra complexity that can be removed in the following patch. The issue (not confirmed in practice) with missing alloc_flags is opposite in nature. For allocations that include ALLOC_HIGH, ALLOC_HARDER or ALLOC_CMA in alloc_flags (the last includes all MOVABLE allocations on CMA-enabled systems), the watermark checking in compaction with 0 passed will be stricter than in get_page_from_freelist(). In these cases compaction might run for longer than is really needed. Another issue with compaction_suitable() is that the check "does the zone need compaction at all?" comes only after the check "does the zone have enough free pages to succeed compaction?". The latter considers extra pages for migration and can therefore in some situations fail and return COMPACT_SKIPPED, although the high-order allocation would succeed and we should return COMPACT_PARTIAL.
This patch fixes these problems by adding alloc_flags and classzone_idx to struct compact_control and the related functions involved in direct compaction and watermark checking. All other callers of compaction_suitable() pass proper values where those are known. This is currently limited to classzone_idx, which is sometimes known in kswapd context. However, the direct reclaim callers should_continue_reclaim() and compaction_ready() do not currently know the proper values, so the coordination between reclaim and compaction may still not be as accurate as it could be. This can be fixed later, if it's shown to be an issue. Additionally, the checks in compaction_suitable() are reordered to address the second issue described above. The effect of this patch should be slightly better high-order allocation success rates and/or less compaction overhead, depending on the type of allocations and the presence of CMA. It allows simplifying the deferred compaction code in a follow-up patch. When testing with stress-highalloc, there was some slight improvement (which might be just due to variance) in success rates of non-THP-like allocations. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
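How classzone_idx and alloc_flags change the outcome can be shown with a simplified watermark check, followed by the reordered suitability test. This is a sketch under several assumptions:

- The flag bit values, struct zone_wm, wmark_ok() and suitable() are hypothetical.
- Halving the minimum for ALLOC_HIGH and reducing it by a quarter for ALLOC_HARDER mirrors the page allocator's watermark logic as we recall it.
- The order_page_ok parameter replaces the per-order free list part of the real check.
- The 2 << order migration margin is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
#define M_ALLOC_HIGH   0x1
#define M_ALLOC_HARDER 0x2
#define M_ALLOC_CMA    0x4

#define M_NR_ZONES 3

struct zone_wm {                      /* hypothetical zone model */
	long free_pages;                  /* all free pages, CMA included */
	long free_cma_pages;
	long low_wmark;
	long lowmem_reserve[M_NR_ZONES];  /* indexed by classzone_idx */
};

enum suitability { M_SKIPPED, M_CONTINUE, M_PARTIAL };

/* Simplified watermark check; ignores per-order free list accounting. */
static bool wmark_ok(const struct zone_wm *z, unsigned int order, long mark,
                     int classzone_idx, int alloc_flags)
{
	long min = mark;
	long free = z->free_pages - ((1L << order) - 1);

	assert(classzone_idx >= 0 && classzone_idx < M_NR_ZONES);
	if (alloc_flags & M_ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & M_ALLOC_HARDER)
		min -= min / 4;
	if (!(alloc_flags & M_ALLOC_CMA))
		free -= z->free_cma_pages;    /* CMA pages not usable here */

	/* lowmem_reserve depends on which zone the request came from. */
	return free > min + z->lowmem_reserve[classzone_idx];
}

/*
 * Reordered suitability check. order_page_ok stands in for the per-order
 * part of the real check (is a free page of this order already there?).
 */
static enum suitability suitable(const struct zone_wm *z, unsigned int order,
                                 bool order_page_ok, int classzone_idx,
                                 int alloc_flags)
{
	/* First: does the zone need compaction at all? */
	if (order_page_ok &&
	    wmark_ok(z, order, z->low_wmark, classzone_idx, alloc_flags))
		return M_PARTIAL;

	/* Then: enough order-0 pages as migration targets (margin assumed). */
	if (!wmark_ok(z, 0, z->low_wmark + (2L << order), classzone_idx,
		      alloc_flags))
		return M_SKIPPED;

	return M_CONTINUE;
}
```

With 600 free pages, a low watermark of 50 and a free order-9 page already present, the reordered check returns M_PARTIAL. Checking the migration margin first would have returned M_SKIPPED, because 600 pages do not exceed 50 + 1024.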
-
- 14 Nov, 2014: 2 commits
-
-
Committed by Vlastimil Babka
Several people have reported occasionally seeing processes stuck in compact_zone(), even triggering soft lockups, in 3.18-rc2+. Testing a revert of commit e14c720e ("mm, compaction: remember position within pageblock in free pages scanner") fixed the issue, although the stuck processes do not appear to involve the free scanner. Finally, by code inspection, the bug was found in isolate_migratepages(). That function uses a slightly different condition than compact_finished() to detect whether the migration and free scanners have met. This had not been a problem until commit e14c720e allowed the free scanner position between individual invocations to be in the middle of a pageblock. In a relatively rare case, the migration scanner position can end up at the beginning of a pageblock, with the free scanner position in the middle of the same pageblock. If it's the migration scanner's turn, isolate_migratepages() exits immediately (without updating the position), while compact_finished() decides to continue compaction, resulting in a potentially infinite loop. The system can recover only if another process creates enough high-order pages to make the watermark checks in compact_finished() pass. This patch fixes the immediate problem by bumping the migration scanner's position to meet the free scanner in isolate_migratepages() when both are within the same pageblock. This causes compact_finished() to terminate properly. A more robust check in compact_finished() is planned as a cleanup for better future maintainability. Fixes: e14c720e ("mm, compaction: remember position within pageblock in free pages scanner") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: P. Christeas <xrg@linux.gr> Tested-by: P. Christeas <xrg@linux.gr> Link: http://marc.info/?l=linux-mm&m=141508604232522&w=2 Reported-by: Norbert Preining <preining@logic.at> Tested-by: Norbert Preining <preining@logic.at> Link: https://lkml.org/lkml/2014/11/4/904 Reported-by: Pavel Machek <pavel@ucw.cz> Link: https://lkml.org/lkml/2014/11/7/164 Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
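The mismatch between the two termination conditions can be reproduced in a few lines. In this sketch, struct scanners, migrate_step() and scanners_met() are hypothetical, and a pageblock is assumed to be 512 pages. migrate_step() only scans whole pageblocks that end at or before free_pfn, while scanners_met() uses a compact_finished()-style test. Suppose the migration scanner is at the start of a pageblock and the free scanner is in its middle. The unfixed step then makes no progress, so the scanners never meet. The fixed step bumps migrate_pfn to free_pfn instead.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 512UL   /* assumption: order-9 pageblocks */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct scanners {                  /* hypothetical subset of compact_control */
	unsigned long migrate_pfn;
	unsigned long free_pfn;
};

/* Termination test in the style of compact_finished(): scanners met. */
static bool scanners_met(const struct scanners *cc)
{
	return cc->free_pfn <= cc->migrate_pfn;
}

/* One migration scanner step; `fix` enables the bump from this patch. */
static void migrate_step(struct scanners *cc, bool fix)
{
	unsigned long end_pfn = ALIGN_UP(cc->migrate_pfn + 1, PAGEBLOCK_NR_PAGES);

	assert(cc->migrate_pfn <= cc->free_pfn);
	if (end_pfn <= cc->free_pfn) {
		cc->migrate_pfn = end_pfn;     /* scanned one whole pageblock */
		return;
	}
	/* Free scanner sits inside this pageblock. */
	if (fix)
		cc->migrate_pfn = cc->free_pfn;
	/* Without the fix: return without moving, and compaction loops. */
}
```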
-
Committed by Joonsoo Kim
Commit 7d49d886 ("mm, compaction: reduce zone checking frequency in the migration scanner") has a side effect that changes the iteration range calculation. Before the change, block_end_pfn was calculated from start_pfn, but now pageblock_nr_pages is blindly added to the previous value. As a result, isolation_start_pfn becomes larger than block_end_pfn when we isolate a page of more than pageblock order. In this case, isolation fails due to an invalid range parameter. To prevent this, this patch skips ahead until the proper target pageblock is reached. Without this patch, CMA allocations of more than pageblock order always fail; with this patch they succeed. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
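The iteration-range problem can be modeled by walking a pfn range where every step isolates one free page larger than a pageblock. The function below is a hypothetical sketch with 512-page pageblocks assumed, not the patched kernel loop. Without the fix, block_end_pfn advances by one pageblock per step while pfn advances by 1 << page_order, so the range [pfn, block_end_pfn) becomes empty. With the fix, block_end_pfn is re-derived from pfn.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 512UL      /* assumption: order-9 pageblocks */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * Walk [start_pfn, end_pfn) pageblock by pageblock, assuming every step
 * isolates one free page of page_order. Returns false where the real code
 * would hit an empty or inverted [pfn, block_end_pfn) range and fail.
 */
static bool isolate_range_model(unsigned long start_pfn, unsigned long end_pfn,
                                unsigned int page_order, bool fix)
{
	unsigned long pfn = start_pfn;
	unsigned long block_end_pfn = ALIGN_UP(start_pfn + 1, PAGEBLOCK_NR_PAGES);

	assert(start_pfn < end_pfn);

	for (; pfn < end_pfn; block_end_pfn += PAGEBLOCK_NR_PAGES) {
		if (block_end_pfn > end_pfn)
			block_end_pfn = end_pfn;

		/* The fix: if pfn ran past the block end, realign to pfn's block. */
		if (fix && pfn >= block_end_pfn) {
			block_end_pfn = ALIGN_UP(pfn + 1, PAGEBLOCK_NR_PAGES);
			if (block_end_pfn > end_pfn)
				block_end_pfn = end_pfn;
		}

		if (pfn >= block_end_pfn)
			return false;             /* invalid range: isolation fails */

		pfn += 1UL << page_order;     /* isolate one free page */
	}
	return true;
}
```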
-
- 30 Oct, 2014: 1 commit
-
-
Committed by Joonsoo Kim
Commit edc2ca61 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()") unifies the isolate_migratepages variants and makes them use isolate_migratepages_block(). isolate_migratepages_block() can stop early once enough pages are isolated, but isolate_migratepages_range() has no code to handle this case. As a result, even if isolate_migratepages_block() returns prematurely without checking all pages in the range, isolate_migratepages_block() is called repeatedly on the following pageblocks, and some pages in the previous range are never checked. Because of this, CMA fails frequently. To fix this problem, this patch lets isolate_migratepages_range() detect that enough pages have been isolated and stop the isolation in that case. Note that isolate_migratepages() has no such problem, because it always stops the isolation after just one call of isolate_migratepages_block(). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
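The skipped-pages effect can be shown with a toy model in which every page is isolatable. The names, the 32-page cap (standing in for the isolation batch limit) and the 512-page pageblock are assumptions. The caller loop only loosely resembles how CMA drives isolation and migration. Without the early exit, the range function runs past the point where the block scan stopped and returns the end of the range. The caller therefore never revisits the unscanned pages.

```c
#include <assert.h>
#include <stdbool.h>

#define CLUSTER_MAX 32UL   /* assumption: stands in for the batch limit */
#define BLOCK_PAGES 512UL  /* assumption: pageblock size in pages */

struct iso_state {         /* hypothetical subset of compact_control */
	unsigned long nr_migratepages;
	unsigned long scanned;
};

/* Per-pageblock scan; every page is assumed isolatable. Stops when full. */
static unsigned long isolate_block(struct iso_state *cc, unsigned long pfn,
                                   unsigned long end)
{
	for (; pfn < end; pfn++) {
		if (cc->nr_migratepages == CLUSTER_MAX)
			break;
		cc->nr_migratepages++;
		cc->scanned++;
	}
	return pfn;            /* first pfn not examined */
}

/* Range scan over pageblocks; `fix` adds the early exit from the patch. */
static unsigned long isolate_range(struct iso_state *cc, unsigned long pfn,
                                   unsigned long end_pfn, bool fix)
{
	unsigned long block_end_pfn;

	for (; pfn < end_pfn; pfn = block_end_pfn) {
		block_end_pfn = (pfn / BLOCK_PAGES + 1) * BLOCK_PAGES;
		if (block_end_pfn > end_pfn)
			block_end_pfn = end_pfn;

		pfn = isolate_block(cc, pfn, block_end_pfn);
		assert(cc->nr_migratepages <= CLUSTER_MAX);

		/* Stop when the list is full; report where scanning stopped. */
		if (fix && cc->nr_migratepages == CLUSTER_MAX)
			break;
	}
	return pfn;
}

/* Caller loop, loosely like CMA: isolate, migrate, resume at returned pfn. */
static unsigned long scan_all(unsigned long end_pfn, bool fix)
{
	struct iso_state cc = { 0, 0 };
	unsigned long pfn = 0;

	while (pfn < end_pfn) {
		pfn = isolate_range(&cc, pfn, end_pfn, fix);
		cc.nr_migratepages = 0;   /* pretend the pages were migrated */
	}
	return cc.scanned;
}
```

Over 1024 pages, the fixed version examines every page, while the unfixed one examines only the first 32.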
-
- 10 Oct, 2014: 7 commits
-
-
Committed by Konstantin Khlebnikov
Sasha Levin reported a KASAN splat inside isolate_migratepages_range(). The problem is in the function __is_movable_balloon_page(), which tests AS_BALLOON_MAP in page->mapping->flags. This function has no protection against anonymous pages. As a result, it tried to check address space flags inside struct anon_vma. Further investigation shows more problems in the current implementation:
* The special branch in __unmap_and_move() never works. balloon_page_movable() checks page flags and page_count. In __unmap_and_move() the page is locked and its reference counter is elevated, so balloon_page_movable() always fails. As a result, execution goes to the normal migration path. virtballoon_migratepage() returns MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS, so move_to_new_page() treats it as an error code and sets newpage->mapping to NULL. The newly migrated page loses its connection to the balloon and any ability to be migrated further.
* lru_lock is erroneously required in isolate_migratepages_range() for isolating a ballooned page. This function releases lru_lock periodically, which makes migration mostly impossible for some pages.
* balloon_page_dequeue() has a tight race with balloon_page_isolate(): balloon_page_isolate() could be executed in parallel with dequeue between picking the page from the list and locking page_lock. The race is rare because they use trylock_page() for locking.
This patch fixes all of them. Instead of a fake mapping with a special flag, this patch uses a special state of page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. The buddy allocator uses PAGE_BUDDY_MAPCOUNT_VALUE = -128 for a similar purpose. Storing the mark directly in struct page makes everything safer and easier. PagePrivate is used to mark pages present in the page list (i.e. not isolated, like PageLRU for normal pages). It replaces the special rules for the reference counter and makes balloon migration similar to migration of normal pages. This flag is protected by page_lock together with the link to the balloon device.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: <stable@vger.kernel.org> [3.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
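Marking balloon pages through _mapcount can be sketched with C11 atomics. struct page_model and the helper names are hypothetical stand-ins; the kernel works on struct page. The two constants are the values given above, and -1 is taken as the "no mappings" value of _mapcount. Unlike the old check, this test reads only the page itself and never dereferences page->mapping, so it is harmless on anonymous pages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_BUDDY_MAPCOUNT_VALUE   (-128)
#define PAGE_BALLOON_MAPCOUNT_VALUE (-256)

struct page_model {              /* hypothetical stand-in for struct page */
	atomic_int _mapcount;        /* -1: no mappings, no special state */
};

static bool page_is_balloon(struct page_model *page)
{
	return atomic_load(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE;
}

static void set_page_balloon(struct page_model *page)
{
	/* Only an unmapped page may be marked. */
	assert(atomic_load(&page->_mapcount) == -1);
	atomic_store(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE);
}

static void clear_page_balloon(struct page_model *page)
{
	assert(page_is_balloon(page));
	atomic_store(&page->_mapcount, -1);
}
```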
-
Committed by Xiubo Li
  CC      mm/compaction.o
mm/compaction.c: In function 'isolate_freepages_block':
mm/compaction.c:364:37: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
       && compact_unlock_should_abort(&cc->zone->lock, flags,
Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by David Rientjes
struct compact_control currently converts the gfp mask to a migratetype, but we need the entire gfp mask in a follow-up patch. Pass the entire gfp mask as part of struct compact_control. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by David Rientjes
The page allocator has gfp flags (like __GFP_WAIT) and alloc flags (like ALLOC_CPUSET) that have separate semantics. The function allocflags_to_migratetype() actually takes gfp flags, not alloc flags, and returns a migratetype. Rename it to gfpflags_to_migratetype(). Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
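What the renamed function does can be sketched as a mapping from the mobility bits in a gfp mask to a migratetype. This is a hypothetical model of the grouping-by-mobility logic of that era, not the kernel source. The flag bit values and the enum ordering are local to this sketch, and the kernel's handling of mobility grouping being disabled is omitted.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's gfp bit layout is different. */
#define M_GFP_MOVABLE     0x1u
#define M_GFP_RECLAIMABLE 0x2u

enum migratetype_model {             /* ordering local to this sketch */
	M_MIGRATE_UNMOVABLE = 0,
	M_MIGRATE_RECLAIMABLE = 1,
	M_MIGRATE_MOVABLE = 2,
};

/* Takes gfp flags (not ALLOC_* flags) and returns a migratetype. */
static int gfpflags_to_migratetype_model(unsigned int gfp_flags)
{
	/* Both mobility bits at once is treated as a caller bug here. */
	assert((gfp_flags & (M_GFP_MOVABLE | M_GFP_RECLAIMABLE)) !=
	       (M_GFP_MOVABLE | M_GFP_RECLAIMABLE));

	return (((gfp_flags & M_GFP_MOVABLE) != 0) << 1) |
	       ((gfp_flags & M_GFP_RECLAIMABLE) != 0);
}
```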
-
Committed by Vlastimil Babka
The migration scanner skips PageBuddy pages, but does not consider their order, as checking page_order() is generally unsafe without holding the zone->lock, and acquiring the lock just for the check wouldn't be a good tradeoff. Still, using the order could avoid some iterations over the rest of the buddy page. If we are careful, the race window between the PageBuddy() check and page_order() is small. The worst thing that can happen is that we skip too much and miss some isolation candidates. This is not that bad, as compaction can already fail for many other reasons like parallel allocations, and those have a much larger race window. This patch therefore makes the migration scanner obtain the buddy page order and use it to skip the whole buddy page, if the order appears to be in the valid range. It's important that page_order() is read only once, so that the value used in the checks and in the pfn calculation is the same. But in theory the compiler can replace the local variable with multiple inlined reads of page_order(). Therefore, the patch introduces page_order_unsafe(), which uses ACCESS_ONCE to prevent this. Testing with stress-highalloc from mmtests shows a 15% reduction in the number of pages scanned by the migration scanner. The reduction is >60% with __GFP_NO_KSWAPD allocations, along with success rates better by a few percent. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Vlastimil Babka
Unlike the migration scanner, the free scanner remembers the beginning of the last scanned pageblock in cc->free_pfn. It might therefore rescan pages uselessly when called several times during a single compaction. This might have been useful when pages were returned to the buddy allocator after a failed migration, but this is no longer the case. This patch changes the meaning of cc->free_pfn so that if it points to the middle of a pageblock, that pageblock is scanned only from cc->free_pfn to the end. isolate_freepages_block() will record the pfn of the last page it looked at, which is then used to update cc->free_pfn. In the mmtests stress-highalloc benchmark, this lowered the ratio of free pages scanned per migrate page scanned from 2.5 to 2.25, without affecting success rates. With __GFP_NO_KSWAPD allocations, this appears to result in a worse ratio (2.1 instead of 1.8), but page migration successes increased by 10%, so this could mean that more useful work can be done until need_resched() aborts this kind of compaction. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
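Resuming within a pageblock can be illustrated with a scan function that reports where it stopped. isolate_free_block() and its parameters are hypothetical, and is_free stands in for PageBuddy() tests. This sketch stores the next unexamined pfn rather than the last one looked at, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: scan [*start_pfn, end_pfn) for free pages and isolate
 * at most `want` of them. On return *start_pfn holds the pfn where scanning
 * stopped, so the next call resumes mid-pageblock instead of rescanning.
 */
static unsigned long isolate_free_block(const bool *is_free,
                                        unsigned long *start_pfn,
                                        unsigned long end_pfn,
                                        unsigned long want,
                                        unsigned long *looked_at)
{
	unsigned long pfn, isolated = 0;

	assert(*start_pfn <= end_pfn);
	for (pfn = *start_pfn; pfn < end_pfn; pfn++) {
		if (isolated == want)
			break;
		(*looked_at)++;
		if (is_free[pfn])        /* stands in for a PageBuddy() test */
			isolated++;
	}
	*start_pfn = pfn;            /* remembered position (cf. cc->free_pfn) */
	return isolated;
}
```

For example, take an 8-page block with free pages at the even pfns and a request for 2 pages per call. The first call stops at pfn 3, and the second call resumes there instead of looking at pfns 0 to 2 again.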
-
Committed by Vlastimil Babka
Compaction scanners try to take zone locks as late as possible by checking many page or pageblock properties opportunistically without the lock and skipping pages or pageblocks that are unsuitable. For pages that pass the initial checks, some properties have to be checked again safely under the lock. However, if the lock was already held from a previous iteration during the initial checks, the rechecks are unnecessary. This patch therefore skips the rechecks when the lock was already held. This is now possible because we no longer (potentially) drop and reacquire the lock between the initial checks and the safe rechecks. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
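The skip-the-recheck idea can be sketched with a flag that records whether the lock is held. struct scan_model and page_suitable() are hypothetical, and the boolean merely stands in for a spinlock. Unlike the real scanners, this model never drops the lock mid-scan, which matches the precondition stated above.

```c
#include <assert.h>
#include <stdbool.h>

struct scan_model {                 /* hypothetical scanner state */
	bool locked;                    /* stands in for holding the lock */
	unsigned long lock_acquisitions;
	unsigned long rechecks;
};

/* Hypothetical page property that could change while unlocked. */
static bool page_suitable(int page)
{
	return page % 2 == 0;
}

static unsigned long scan_pages(struct scan_model *s, const int *pages,
                                unsigned long n)
{
	unsigned long i, isolated = 0;

	assert(!s->locked);
	for (i = 0; i < n; i++) {
		/* Opportunistic check; only reliable if the lock is held. */
		if (!page_suitable(pages[i]))
			continue;

		if (!s->locked) {
			s->locked = true;            /* "spin_lock()" */
			s->lock_acquisitions++;
			/* Lock just taken: repeat the check safely. */
			s->rechecks++;
			if (!page_suitable(pages[i]))
				continue;
		}
		/* Either rechecked under the lock or checked while holding it. */
		isolated++;
	}
	s->locked = false;                   /* "spin_unlock()" */
	return isolated;
}
```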
-