    xfs: flush inodegc workqueue tasks before cancel · d3611530
    Committed by Brian Foster
    mainline inclusion
    from mainline-v5.16-rc5
    commit 6191cf3a
    category: bugfix
    bugzilla: 187526, https://gitee.com/openeuler/kernel/issues/I6WKVJ
    
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6191cf3ad59fda5901160633fef8e41b064a5246
    
    --------------------------------
    
    The xfs_inodegc_stop() helper performs a high level flush of pending
    work on the percpu queues and then runs a cancel_work_sync() on each
    of the percpu work tasks to ensure all work has completed before
    returning.  While cancel_work_sync() waits for wq tasks to complete,
    it does not guarantee work tasks have started. This means that the
    _stop() helper can queue and instantly cancel a wq task without
    having completed the associated work. This can be observed by
    tracepoint inspection of a simple "rm -f <file>; fsfreeze -f <mnt>"
    test:
    
    	xfs_destroy_inode: ... ino 0x83 ...
    	xfs_inode_set_need_inactive: ... ino 0x83 ...
    	xfs_inodegc_stop: ...
    	...
    	xfs_inodegc_start: ...
    	xfs_inodegc_worker: ...
    	xfs_inode_inactivating: ... ino 0x83 ...
    
    The first few lines show that the inode is removed and need inactive
    state set, but the inactivation work has not completed before the
    inodegc mechanism stops. The inactivation doesn't actually occur
    until the fs is unfrozen and the gc mechanism starts back up. Note
    that this test requires fsfreeze to reproduce because xfs_freeze
    indirectly invokes xfs_fs_statfs(), which calls xfs_inodegc_flush().
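
    For reference, the pre-patch _stop() path looked roughly like the sketch
    below. The helper and field names (xfs_clear_inodegc_enabled(),
    xfs_inodegc_queue_all(), the per-cpu gc->work item) are taken from the
    upstream code of that period and are shown here only to illustrate the
    ordering problem:

    	void
    	xfs_inodegc_stop(
    		struct xfs_mount	*mp)
    	{
    		struct xfs_inodegc	*gc;
    		int			cpu;

    		/* Clear the enabled flag so no new work gets scheduled. */
    		if (!xfs_clear_inodegc_enabled(mp))
    			return;

    		/* High level flush: queue any pending per-cpu work items. */
    		xfs_inodegc_queue_all(mp);

    		/*
    		 * Racy: cancel_work_sync() can steal an item that was queued
    		 * above but has not started running, so the queued
    		 * inactivation never actually happens.
    		 */
    		for_each_online_cpu(cpu) {
    			gc = per_cpu_ptr(mp->m_inodegc, cpu);
    			cancel_work_sync(&gc->work);
    		}
    	}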
    
    When this occurs, the workqueue try_to_grab_pending() logic first
    tries to steal the pending bit, which does not succeed because the
    bit has been set by queue_work_on(). Subsequently, it checks for
    association of a pool workqueue from the work item under the pool
    lock. This association is set at the point a work item is queued and
    cleared when dequeued for processing. If the association exists, the
    work item is removed from the queue and cancel_work_sync() returns
    true. If the pwq association is cleared, the remove attempt assumes
    the task is busy and retries (eventually returning false to the
    caller after waiting for the work task to complete).
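
    In other words, cancel_work_sync() on a still-queued item removes it
    without ever running it, while flush_work() (or drain_workqueue() for a
    whole queue) waits for it to execute. A minimal, hypothetical module
    sketch, unrelated to the patch itself, that contrasts the two patterns:

    	#include <linux/module.h>
    	#include <linux/workqueue.h>

    	static struct workqueue_struct *demo_wq;
    	static bool work_ran;

    	static void demo_work_fn(struct work_struct *work)
    	{
    		work_ran = true;	/* stands in for inode inactivation */
    	}
    	static DECLARE_WORK(demo_work, demo_work_fn);

    	static int __init demo_init(void)
    	{
    		demo_wq = alloc_workqueue("demo_wq", 0, 0);
    		if (!demo_wq)
    			return -ENOMEM;

    		/*
    		 * Racy pattern: if the worker has not started yet, the
    		 * pending bit/pwq association lets cancel_work_sync() steal
    		 * the item, so demo_work_fn() may never run.
    		 */
    		queue_work(demo_wq, &demo_work);
    		cancel_work_sync(&demo_work);
    		pr_info("after cancel: work_ran=%d\n", work_ran);

    		/*
    		 * Ordered pattern: flushing (or draining the whole queue)
    		 * waits until the queued item has actually executed.
    		 */
    		queue_work(demo_wq, &demo_work);
    		flush_work(&demo_work);		/* or drain_workqueue(demo_wq) */
    		pr_info("after flush: work_ran=%d\n", work_ran);
    		return 0;
    	}

    	static void __exit demo_exit(void)
    	{
    		destroy_workqueue(demo_wq);
    	}

    	module_init(demo_init);
    	module_exit(demo_exit);
    	MODULE_LICENSE("GPL");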
    
    To avoid this race, we can flush each work item explicitly before
    cancel. However, since the _queue_all() already schedules each
    underlying work item, the workqueue level helpers are sufficient to
    achieve the same ordering effect. E.g., the inodegc enabled flag
    prevents scheduling any further work in the _stop() case. Use the
    drain_workqueue() helper in this particular case to make the intent
    a bit more self-explanatory.
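
    With that change, the _stop() path reduces to roughly the following
    sketch (again using the upstream names of the time, e.g.
    mp->m_inodegc_wq):

    	void
    	xfs_inodegc_stop(
    		struct xfs_mount	*mp)
    	{
    		if (!xfs_clear_inodegc_enabled(mp))
    			return;

    		/*
    		 * Queue all pending per-cpu work, then wait for every queued
    		 * item to actually run; the cleared enabled flag prevents new
    		 * work from being scheduled while we drain.
    		 */
    		xfs_inodegc_queue_all(mp);
    		drain_workqueue(mp->m_inodegc_wq);
    	}

    Note that drain_workqueue() only tolerates chained queueing from the
    draining work items themselves, which matches the expectation here that
    the cleared enabled flag stops any further scheduling.
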
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
    Reviewed-by: Yang Erkun <yangerkun@huawei.com>
    Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>