• M
    mm, page_alloc: do not wake kswapd with zone lock held · ba16c9c8
    Mel Gorman 提交于
    to #28825456
    
    commit 73444bc4d8f92e46a20cb6bd3342fc2ea75c6787 upstream.
    
    syzbot reported the following regression in the latest merge window and
    it was confirmed by Qian Cai that a similar bug was visible from a
    different context.
    
      ======================================================
      WARNING: possible circular locking dependency detected
      4.20.0+ #297 Not tainted
      ------------------------------------------------------
      syz-executor0/8529 is trying to acquire lock:
      000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
      __wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120
    
      but task is already holding lock:
      000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
      include/linux/spinlock.h:329 [inline]
      000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
      mm/page_alloc.c:2548 [inline]
      000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
      mm/page_alloc.c:3021 [inline]
      000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
      mm/page_alloc.c:3050 [inline]
      000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
      mm/page_alloc.c:3072 [inline]
      000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
      get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491
    
    It appears to be a false positive in that the only way the lock ordering
    should be inverted is if kswapd is waking itself and the wakeup
    allocates debugging objects which should already be allocated if it's
    kswapd doing the waking.  Nevertheless, the possibility exists and so
    it's best to avoid the problem.
    
    This patch flags a zone as needing a kswapd using the, surprisingly,
    unused zone flag field.  The flag is read without the lock held to do
    the wakeup.  It's possible that the flag setting context is not the same
    as the flag clearing context or for small races to occur.  However, each
    race possibility is harmless and there is no visible degredation in
    fragmentation treatment.
    
    While zone->flag could have continued to be unused, there is potential
    for moving some existing fields into the flags field instead.
    Particularly read-mostly ones like zone->initialized and
    zone->contiguous.
    
    Link: http://lkml.kernel.org/r/20190103225712.GJ31517@techsingularity.net
    Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
    Reported-by: syzbot+93d94a001cfbce9e60e1@syzkaller.appspotmail.com
    Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
    Acked-by: NVlastimil Babka <vbabka@suse.cz>
    Tested-by: NQian Cai <cai@lca.pw>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Michal Hocko <mhocko@suse.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    
    Conflicts:
    	include/linux/mmzone.h
    Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
    Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
    ba16c9c8
page_alloc.c 233.6 KB