    xfs: log worker needs to start before intent/unlink recovery · e5870eee
    Committed by Dave Chinner
    mainline inclusion
    from mainline-v5.17-rc6
    commit a9a4bc8c
    category: bugfix
    bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
    CVE: NA
    
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9a4bc8c76d747aa40b30e2dfc176c781f353a08
    
    --------------------------------
    
    After 963 iterations of generic/530, it deadlocked during recovery
    on a pinned inode cluster buffer like so:
    
    XFS (pmem1): Starting recovery (logdev: internal)
    INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
          Not tainted 5.17.0-rc6-dgc+ #975
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
    Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
    Call Trace:
     <TASK>
     __schedule+0x30d/0x9e0
     schedule+0x55/0xd0
     schedule_timeout+0x114/0x160
     __down+0x99/0xf0
     down+0x5e/0x70
     xfs_buf_lock+0x36/0xf0
     xfs_buf_find+0x418/0x850
     xfs_buf_get_map+0x47/0x380
     xfs_buf_read_map+0x54/0x240
     xfs_trans_read_buf_map+0x1bd/0x490
     xfs_imap_to_bp+0x4f/0x70
     xfs_iunlink_map_ino+0x66/0xd0
     xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
     xfs_iunlink_remove_inode+0xf2/0x1d0
     xfs_inactive_ifree+0x1a3/0x900
     xfs_inode_unlink+0xcc/0x210
     xfs_inodegc_worker+0x1ac/0x2f0
     process_one_work+0x1ac/0x390
     worker_thread+0x56/0x3c0
     kthread+0xf6/0x120
     ret_from_fork+0x1f/0x30
     </TASK>
    task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
    Call Trace:
     <TASK>
     __schedule+0x30d/0x9e0
     schedule+0x55/0xd0
     schedule_timeout+0x114/0x160
     __down+0x99/0xf0
     down+0x5e/0x70
     xfs_buf_lock+0x36/0xf0
     xfs_buf_find+0x418/0x850
     xfs_buf_get_map+0x47/0x380
     xfs_buf_read_map+0x54/0x240
     xfs_trans_read_buf_map+0x1bd/0x490
     xfs_imap_to_bp+0x4f/0x70
     xfs_iget+0x300/0xb40
     xlog_recover_process_one_iunlink+0x4c/0x170
     xlog_recover_process_iunlinks.isra.0+0xee/0x130
     xlog_recover_finish+0x57/0x110
     xfs_log_mount_finish+0xfc/0x1e0
     xfs_mountfs+0x540/0x910
     xfs_fs_fill_super+0x495/0x850
     get_tree_bdev+0x171/0x270
     xfs_fs_get_tree+0x15/0x20
     vfs_get_tree+0x24/0xc0
     path_mount+0x304/0xba0
     __x64_sys_mount+0x108/0x140
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
     </TASK>
    task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
    Call Trace:
     <TASK>
     __schedule+0x30d/0x9e0
     schedule+0x55/0xd0
     io_schedule+0x4b/0x80
     xfs_buf_wait_unpin+0x9e/0xf0
     __xfs_buf_submit+0x14a/0x230
     xfs_buf_delwri_submit_buffers+0x107/0x280
     xfs_buf_delwri_submit_nowait+0x10/0x20
     xfsaild+0x27e/0x9d0
     kthread+0xf6/0x120
     ret_from_fork+0x1f/0x30
    
    We have the mount process waiting on an inode cluster buffer read,
    inodegc doing unlink waiting on the same inode cluster buffer, and
    the AIL push thread blocked in writeback waiting for the inode
    cluster buffer to become unpinned.
    
    What has happened here is that the AIL push thread has raced with
    the inodegc process modifying, committing and pinning the inode
    cluster buffer in xfs_buf_delwri_submit_buffers(), here:
    
    	blk_start_plug(&plug);
    	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
    		if (!wait_list) {
    			if (xfs_buf_ispinned(bp)) {
    				pinned++;
    				continue;
    			}
    Here >>>>>>
    			if (!xfs_buf_trylock(bp))
    				continue;
    
    Basically, the AIL has found the buffer wasn't pinned and got the
    lock without blocking, but then the buffer was pinned. This implies
    the processing here was pre-empted between the pin check and the
    lock, because the pin count can only be increased while holding the
    buffer locked. Hence when it has gone to submit the IO, it has
    blocked waiting for the buffer to be unpinned.
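
    To make the race window concrete, here is an interleaving sketch.
    This is exposition only, not verbatim kernel code, though the named
    helpers and the b_pin_count field are the real xfs_buf primitives:

    	/*
    	 * AIL push thread                 inodegc transaction commit
    	 * (xfs_buf_delwri_submit_buffers)
    	 * ------------------------------------------------------------
    	 * xfs_buf_ispinned(bp) -> false
    	 *                                 xfs_buf_lock(bp)
    	 *                                 modify + commit the inode
    	 *                                 cluster buffer, pinning it
    	 *                                 (b_pin_count goes 0 -> 1)
    	 *                                 xfs_buf_unlock(bp)
    	 * xfs_buf_trylock(bp) -> success
    	 * __xfs_buf_submit(bp)
    	 *   xfs_buf_wait_unpin(bp)        blocks: only a log force
    	 *                                 will now drop the pin count
    	 */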
    
    With all executing threads now waiting on the buffer to be unpinned,
    we normally get out of situations like this via the background log
    worker issuing a log force which will unpin stuck buffers like
    this. But at this point in recovery, we haven't started the log
    worker. In fact, the first thing we do after processing intents and
    unlinked inodes is *start the log worker*. IOWs, we start it too
    late to have it break deadlocks like this.
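
    For reference, a simplified sketch of what that worker does once it
    is running, paraphrased from xfs_log_worker() in this file; the log
    covering branch that syncs the superblock instead is elided here:

    	static void
    	xfs_log_worker(
    		struct work_struct	*work)
    	{
    		struct xlog		*log = container_of(to_delayed_work(work),
    						struct xlog, l_work);
    		struct xfs_mount	*mp = log->l_mp;

    		/*
    		 * Force the log. This writes out pinned log items and
    		 * unpins their buffers - exactly what breaks deadlocks
    		 * like the one above.
    		 */
    		xfs_log_force(mp, 0);

    		/* Start pushing all the metadata that is currently dirty. */
    		xfs_ail_push_all(mp->m_ail);

    		/* Queue ourselves up again for the next period. */
    		xfs_log_work_queue(mp);
    	}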
    
    Avoid this and any other similar deadlock vectors in intent and
    unlinked inode recovery by starting the log worker before we recover
    intents and unlinked inodes. This part of recovery runs as though
    the filesystem is fully active, so we really should have the same
    infrastructure running as we normally do at runtime.
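
    The shape of the change in xfs_log_mount_finish() is sketched below.
    This is a reconstruction from the description above, so the context
    lines may not match the committed hunks exactly: the worker is
    queued before recovery runs instead of after it completes.

    	 	mp->m_super->s_flags |= SB_ACTIVE;
    	+	xfs_log_work_queue(mp);
    	 	if (xlog_recovery_needed(log))
    	 		error = xlog_recover_finish(log);
    	 	mp->m_super->s_flags &= ~SB_ACTIVE;
    	...
    	-	if (!error)
    	-		xfs_log_work_queue(mp);
    	 	return error;
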
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Long Li <leo.lilong@huawei.com>
    Reviewed-by: Yang Erkun <yangerkun@huawei.com>
    Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
    Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>