    mm: disable LRU pagevec during the migration temporarily · d479960e
    Committed by Minchan Kim
    An LRU pagevec holds a reference on each of its pages until the pagevec
    is drained.  This can prevent migration, because the page's refcount is
    then higher than the migration logic expects.  To mitigate the issue,
    callers of migrate_pages drain the LRU pagevecs via migrate_prep or
    lru_add_drain_all before calling migrate_pages.
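
    To illustrate the refcount problem, here is a minimal userspace model in
    C.  It is not kernel code: struct page, struct pagevec, PAGEVEC_SIZE,
    pagevec_drain(), lru_cache_add_model() and try_migrate() are simplified
    stand-ins assumed for illustration, and migration is reduced to a single
    check that the refcount equals the number of expected owners.

```c
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
#define PAGEVEC_SIZE 15 /* assumed batch size */

struct page {
    int refcount; /* models the page's reference count */
};

struct pagevec {
    unsigned int nr;
    struct page *pages[PAGEVEC_SIZE];
};

/* Hand every cached page to the LRU and drop the reference the cache held. */
static void pagevec_drain(struct pagevec *pvec)
{
    for (unsigned int i = 0; i < pvec->nr; i++)
        pvec->pages[i]->refcount--;
    pvec->nr = 0;
}

/* Batch a page for LRU insertion; the cache pins it with an extra reference. */
static void lru_cache_add_model(struct pagevec *pvec, struct page *page)
{
    page->refcount++;
    pvec->pages[pvec->nr++] = page;
    if (pvec->nr == PAGEVEC_SIZE)
        pagevec_drain(pvec);
}

/* Migration only proceeds if nobody beyond the expected owners holds a ref. */
static bool try_migrate(const struct page *page, int expected_refs)
{
    return page->refcount == expected_refs;
}
```

    A page whose owner holds one reference reaches refcount 2 once it is
    batched, so try_migrate(&page, 1) fails until pagevec_drain() drops the
    cache's reference.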
    
    However, that is not enough: pages that enter a pagevec after the
    draining call can still stay there and keep preventing page migration.
    Since some callers of migrate_pages have retry logic with LRU draining,
    the page would likely migrate on the next attempt.  This is still
    fragile, though, because it doesn't close the fundamental race between
    new LRU pages entering the pagevec and migration, so a migration failure
    can ultimately cause a contiguous memory allocation to fail.
    
    To close the race, this patch disables the LRU caches (i.e., pagevecs)
    while migration is in progress, until it is done.
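
    The race and the fix can be modelled the same way.  In this hypothetical
    sketch a single global batch stands in for the per-CPU pagevecs, and
    lru_disable()/lru_enable() are illustrative names (the upstream
    counterparts are lru_cache_disable() and lru_cache_enable() in
    mm/swap.c).  While the disable count is non-zero, every added page is
    flushed to the LRU immediately, so nothing can linger in the cache.

```c
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
#define PAGEVEC_SIZE 15 /* assumed batch size */

struct page { int refcount; bool on_lru; };
struct pagevec { unsigned int nr; struct page *pages[PAGEVEC_SIZE]; };

static struct pagevec lru_pvec;  /* one batch standing in for per-CPU pagevecs */
static int lru_disable_count;    /* > 0 while some migration is in progress */

/* Move cached pages to the LRU and drop the cache's references. */
static void lru_drain(void)
{
    for (unsigned int i = 0; i < lru_pvec.nr; i++) {
        lru_pvec.pages[i]->on_lru = true;
        lru_pvec.pages[i]->refcount--;
    }
    lru_pvec.nr = 0;
}

static void lru_add(struct page *page)
{
    page->refcount++;
    lru_pvec.pages[lru_pvec.nr++] = page;
    /* Flush at once when the batch is full or caching is disabled. */
    if (lru_pvec.nr == PAGEVEC_SIZE || lru_disable_count > 0)
        lru_drain();
}

static void lru_disable(void)
{
    lru_disable_count++; /* the kernel uses an atomic counter here */
    lru_drain();         /* flush anything cached before the disable */
}

static void lru_enable(void)
{
    lru_disable_count--;
}

static bool try_migrate(const struct page *page, int expected_refs)
{
    return page->refcount == expected_refs;
}
```

    A page added after a plain drain keeps an extra reference until the
    next drain, while a page added between lru_disable() and lru_enable()
    goes straight to the LRU and is immediately migratable.  The model is
    single-threaded; the kernel's cross-CPU ordering is not reproduced.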
    
    Since the problem is really hard to reproduce, I measured how many times
    migrate_pages retried in force mode (which is effectively a fallback to
    a sync migration) using the debug code below.
    
    int migrate_pages(struct list_head *from, new_page_t get_new_page,
    			..
    			..
    
      /* debug only: pass > 2 means the retry is running in force mode */
      if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
             printk(KERN_ERR "pfn 0x%lx rc %d\n", page_to_pfn(page), rc);
             dump_page(page, "fail to migrate");
      }
    
    The test repeatedly launched Android apps while a CMA allocation ran in
    the background every five seconds.  The total CMA allocation count was
    about 500 during the test.  With this patch, the dump_page count dropped
    from 400 to 30.
    
    The new interface is also useful for memory hotplug, which currently
    drains the LRU pcp caches after each migration failure.  This is rather
    suboptimal, as it disrupts everything else running during the
    operation.  With the new interface the drain happens only once.  This is
    also in line with the pcp allocator caches, which are likewise disabled
    during offlining.
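
    The difference for memory hotplug can be sketched as a cost model that
    only counts system-wide drains.  All names here are hypothetical, and
    migrate_pass() simply assumes the range migrates on a given pass
    regardless of draining, which is a deliberate simplification.

```c
#include <stdbool.h>

/* Hypothetical cost model: count system-wide drains (each disturbs every CPU). */
static int drain_all_calls;

static void drain_all_cpus(void) { drain_all_calls++; }

/* Assumption: a pass succeeds once it reaches pass number `succeed_on` (0-based). */
static bool migrate_pass(int pass, int succeed_on) { return pass >= succeed_on; }

/* Old scheme: drain the LRU caches again after every failed pass. */
static int offline_drain_per_failure(int succeed_on)
{
    drain_all_calls = 0;
    int pass = 0;
    while (!migrate_pass(pass, succeed_on)) {
        drain_all_cpus(); /* disturbs every CPU on each failure */
        pass++;
    }
    return drain_all_calls;
}

/* New scheme: disable the caches (one drain) for the whole operation. */
static int offline_with_lru_disabled(int succeed_on)
{
    drain_all_calls = 0;
    drain_all_cpus(); /* models the drain performed when disabling */
    int pass = 0;
    while (!migrate_pass(pass, succeed_on))
        pass++;       /* retry without further system-wide drains */
    /* re-enabling the caches needs no drain */
    return drain_all_calls;
}
```

    With success on the fifth pass (succeed_on = 4), the per-failure scheme
    drains four times and the disable scheme once; if the first pass
    succeeds, the disable scheme still pays its single up-front drain.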
    
    Link: https://lkml.kernel.org/r/20210319175127.886124-1-minchan@kernel.org
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: John Dias <joaodias@google.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Oliver Sang <oliver.sang@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>