• V
    mm, page_alloc: disable pcplists during memory offline · ec6e8c7e
    Vlastimil Babka 提交于
    Memory offlining relies on page isolation to guarantee a forward progress
    because pages cannot be reused while they are isolated.  But the page
    isolation itself doesn't prevent from races while freed pages are stored
    on pcp lists and thus can be reused.  This can be worked around by
    repeated draining of pcplists, as done by commit 96831826
    ("mm/memory_hotplug: drain per-cpu pages again during memory offline").
    
    David and Michal would prefer that this race was closed in a way that
    callers of page isolation who need stronger guarantees don't need to
    repeatedly drain.  David suggested disabling pcplists usage completely
    during page isolation, instead of repeatedly draining them.
    
    To achieve this without adding special cases in alloc/free fastpath, we
    can use the same approach as boot pagesets - when pcp->high is 0, any
    pcplist addition will be immediately flushed.
    
    The race can thus be closed by setting pcp->high to 0 and draining
    pcplists once, before calling start_isolate_page_range().  The draining
    will serialize after processes that already disabled interrupts and read
    the old value of pcp->high in free_unref_page_commit(), and processes that
    have not yet disabled interrupts, will observe pcp->high == 0 when they
    are rescheduled, and skip pcplists.  This guarantees no stray pages on
    pcplists in zones where isolation happens.
    
    This patch thus adds zone_pcp_disable() and zone_pcp_enable() functions
    that page isolation users can call before start_isolate_page_range() and
    after unisolating (or offlining) the isolated pages.
    
    Also, drain_all_pages() is optimized to only execute on cpus where
    pcplists are not empty.  The check can however race with a free to pcplist
    that has not yet increased the pcp->count from 0 to 1.  Thus make the
    drain optionally skip the racy check and drain on all cpus, and use this
    option in zone_pcp_disable().
    
    As we have to avoid external updates to high and batch while pcplists are
    disabled, we take pcp_batch_high_lock in zone_pcp_disable() and release it
    in zone_pcp_enable().  This also synchronizes multiple users of
    zone_pcp_disable()/enable().
    
    Currently the only user of this functionality is offline_pages().
    
    [vbabka@suse.cz: add comment, per David]
      Link: https://lkml.kernel.org/r/527480ef-ed72-e1c1-52a0-1c5b0113df45@suse.cz
    
    Link: https://lkml.kernel.org/r/20201111092812.11329-8-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
    Suggested-by: NDavid Hildenbrand <david@redhat.com>
    Suggested-by: NMichal Hocko <mhocko@suse.com>
    Reviewed-by: NOscar Salvador <osalvador@suse.de>
    Reviewed-by: NDavid Hildenbrand <david@redhat.com>
    Acked-by: NMichal Hocko <mhocko@suse.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    ec6e8c7e
memory_hotplug.c 48.7 KB