    swapfile: fix soft lockup in scan_swap_map_slots · b2b6c3df
    Committed by Chen Wandun
    mainline inclusion
    from mainline-v6.1-rc7
    commit de1ccfb6
    category: bugfix
    bugzilla: https://gitee.com/openeuler/kernel/issues/I645DG
    CVE: NA
    
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de1ccfb648243a031cfbdc2d5571dfdaf5023106
    
    --------------------------------
    
    A soft lockup occurs while scanning for free swap slots under heavy memory
    pressure.  The test scenario is: 64 CPU cores, 64GB memory, and 28 zram
    devices, each with a disksize of 50MB.
    
    LATENCY_LIMIT is used to prevent soft lockups in scan_swap_map_slots(), but
    the real number of loop iterations can exceed LATENCY_LIMIT because the
    code repeatedly does "goto checks" and "goto scan" without decreasing
    latency_ration.
    
    In order to fix it, decrease latency_ration in advance.
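
The effect of moving the decrement can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the names longest_run_without_yield, looks_free and decrement_first are hypothetical, the locked re-check under "checks" is assumed to always fail so that control returns to "scan", and the place where cond_resched() would run is marked only by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match LATENCY_LIMIT in mm/swapfile.c (256 there). */
#define LATENCY_LIMIT 256

/*
 * Hypothetical model of the scan loop in scan_swap_map_slots().
 *
 * looks_free[i] == true means the lockless peek at slot i says "maybe
 * usable", which corresponds to "goto checks". The model assumes the
 * locked re-check then fails, so the code does "goto scan" and moves on.
 *
 * Returns the longest run of iterations executed without reaching the
 * point where the kernel would call cond_resched().
 */
size_t longest_run_without_yield(const bool *looks_free, size_t nslots,
                                 bool decrement_first)
{
    int latency_ration = LATENCY_LIMIT;
    size_t run = 0, worst = 0;

    assert(looks_free != NULL || nslots == 0);

    for (size_t offset = 0; offset < nslots; offset++) {
        if (++run > worst)
            worst = run;

        /* Ordering after the fix: charge the budget before anything else. */
        if (decrement_first && --latency_ration < 0) {
            /* cond_resched() would be called here */
            latency_ration = LATENCY_LIMIT;
            run = 0;
        }

        /* "goto checks", re-check fails, "goto scan": next offset. */
        if (looks_free[offset])
            continue;

        /* Ordering before the fix: only reached when the slot looked busy. */
        if (!decrement_first && --latency_ration < 0) {
            /* cond_resched() would be called here */
            latency_ration = LATENCY_LIMIT;
            run = 0;
        }
    }
    return worst;
}
```

Under these assumptions, if every slot looks free, the old ordering never reaches the yield point and the function returns nslots, while the new ordering returns LATENCY_LIMIT + 1 regardless of how many slots are scanned.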
    
    There is also a suspicious place in get_swap_pages() that may cause soft
    lockups.  In this function, the "goto start_over" may result in
    continuous scanning of the swap partition.  If there is no cond_resched()
    in scan_swap_map_slots(), it would cause a soft lockup (I am not sure
    about this).
    
    WARN: soft lockup - CPU#11 stuck for 11s! [kswapd0:466]
    CPU: 11 PID: 466 Comm: kswapd0 Kdump: loaded Tainted: G
    dump_backtrace+0x0/0x1e4
    show_stack+0x20/0x2c
    dump_stack+0xd8/0x140
    watchdog_print_info+0x48/0x54
    watchdog_process_before_softlockup+0x98/0xa0
    watchdog_timer_fn+0x1ac/0x2d0
    __hrtimer_run_queues+0xb0/0x130
    hrtimer_interrupt+0x13c/0x3c0
    arch_timer_handler_virt+0x3c/0x50
    handle_percpu_devid_irq+0x90/0x1f4
    __handle_domain_irq+0x84/0x100
    gic_handle_irq+0x88/0x2b0
    el1_irq+0xb8/0x140
    scan_swap_map_slots+0x678/0x890
    get_swap_pages+0x29c/0x440
    get_swap_page+0x120/0x2e0
    add_to_swap+0x20/0x9c
    shrink_page_list+0x5d0/0x152c
    shrink_inactive_list+0x16c/0x500
    shrink_lruvec+0x270/0x304
    
    WARN: soft lockup - CPU#32 stuck for 11s! [stress-ng:309915]
    watchdog_timer_fn+0x1ac/0x2d0
    __run_hrtimer+0x98/0x2a0
    __hrtimer_run_queues+0xb0/0x130
    hrtimer_interrupt+0x13c/0x3c0
    arch_timer_handler_virt+0x3c/0x50
    handle_percpu_devid_irq+0x90/0x1f4
    __handle_domain_irq+0x84/0x100
    gic_handle_irq+0x88/0x2b0
    el1_irq+0xb8/0x140
    get_swap_pages+0x1e8/0x440
    get_swap_page+0x1c8/0x2e0
    add_to_swap+0x20/0x9c
    shrink_page_list+0x5d0/0x152c
    reclaim_pages+0x160/0x310
    madvise_cold_or_pageout_pte_range+0x7bc/0xe3c
    walk_pmd_range.isra.0+0xac/0x22c
    walk_pud_range+0xfc/0x1c0
    walk_pgd_range+0x158/0x1b0
    __walk_page_range+0x64/0x100
    walk_page_range+0x104/0x150
    
    Link: https://lkml.kernel.org/r/20221118133850.3360369-1-chenwandun@huawei.com
    Fixes: 048c27fd ("[PATCH] swap: scan_swap_map latency breaks")
    Signed-off-by: Chen Wandun <chenwandun@huawei.com>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Cc: <xialonglong1@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Conflicts:
    	mm/swapfile.c
    Signed-off-by: Chen Wandun <chenwandun@huawei.com>
    Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
swapfile.c 97.5 KB