1. 12 Apr, 2023 4 commits
    • Z
      fs/xfs: convert comma to semicolon · b53af989
      Committed by Zheng Yongjun
      mainline inclusion
      from mainline-v5.10-rc5
      commit 1189686e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1189686e5440041057f8cc21a7c1d13bb6642cb9
      
      --------------------------------
      
      Replace a comma between expression statements by a semicolon.
      Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
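A minimal sketch of why a comma between expression statements usually compiles and behaves the same as a semicolon, and where it can mislead. The `struct pair` type and all function names are hypothetical illustrations and are not taken from the XFS source touched by this commit.

```c
#include <assert.h>

/* Hypothetical type for illustration only. */
struct pair {
	int a;
	int b;
};

/* Comma form: a single expression statement; both assignments still run. */
void init_with_comma(struct pair *p)
{
	p->a = 1,
	p->b = 2;
}

/* Semicolon form: two statements with the same effect and clearer intent. */
void init_with_semicolon(struct pair *p)
{
	p->a = 1;
	p->b = 2;
}

/*
 * Where the comma matters: under an unbraced if, both assignments belong
 * to the if body, because the comma joins them into one statement.
 */
void guarded_with_comma(struct pair *p, int cond)
{
	if (cond)
		p->a = 1,
		p->b = 2;	/* still guarded by cond */
}

/* Both unguarded forms leave the struct in the same state. */
void check_forms_agree(void)
{
	struct pair x = {0, 0}, y = {0, 0};

	init_with_comma(&x);
	init_with_semicolon(&y);
	assert(x.a == y.a && x.b == y.b);
}
```

Calling `guarded_with_comma()` with `cond` set to 0 leaves both fields untouched, which is the kind of surprise the semicolon form avoids.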
    • D
      xfs: xfs_ail_push_all_sync() stalls when racing with updates · 6d1cae97
      Committed by Dave Chinner
      mainline inclusion
      from mainline-v5.17-rc6
      commit 941fbdfd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=941fbdfd6dd0f1d7961c28123b5460912f678cb5
      
      --------------------------------
      
      xfs_ail_push_all_sync() has a loop like this:
      
      while max_ail_lsn {
      	prepare_to_wait(ail_empty)
      	target = max_ail_lsn
      	wake_up(ail_task);
      	schedule()
      }
      
      Which is designed to sleep until the AIL is emptied. When
      xfs_ail_update_finish() moves the tail of the log, it does:
      
      	if (list_empty(&ailp->ail_head))
      		wake_up_all(&ailp->ail_empty);
      
      So it will only wake up the sync push waiter when the AIL goes
      empty. If, by the time the push waiter has woken, the AIL has more
      in it, it will reset the target, wake the push task and go back to
      sleep.
      
      The problem here is that if the AIL is having items added to it
      when xfs_ail_push_all_sync() is called, then they may get inserted
      into the AIL at a LSN higher than the target LSN. At this point,
      xfsaild_push() will see that the target is X, the item LSNs are
      (X+N) and skip over them, hence never pushing them out.
      
      The result of this is that the AIL will not get emptied by the AIL push
      thread, hence xfs_ail_update_finish() will never see the AIL being
      empty even if it moves the tail. Hence xfs_ail_push_all_sync() never
      gets woken and hence cannot update the push target to capture the
      items beyond the current target on the LSN.
      
      This is a TOCTOU type of issue so the way to avoid it is to not
      use the push target at all for sync pushes. We know that a sync push
      is being requested by the fact the ail_empty wait queue is active,
      hence the xfsaild can just set the target to max_ail_lsn on every
      push that we see the wait queue active. Hence we no longer will
      leave items on the AIL that are beyond the LSN sampled at the start
      of a sync push.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
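The stall described in this commit can be modelled in user space. The sketch below is a toy, not kernel code: `struct toy_ail`, `toy_push()` and `toy_race()` are hypothetical names, the AIL is reduced to a small array of LSNs, and the `sync_waiter` flag stands in for the "ail_empty wait queue is active" condition mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AIL_MAX 8

/* Toy AIL: an array of item LSNs; 0 marks an empty slot. */
struct toy_ail {
	unsigned long lsn[AIL_MAX];
	unsigned long target;	/* push target sampled by the sync waiter */
	bool sync_waiter;	/* models an active ail_empty wait queue */
};

unsigned long toy_max_lsn(const struct toy_ail *ail)
{
	unsigned long max = 0;

	for (size_t i = 0; i < AIL_MAX; i++)
		if (ail->lsn[i] > max)
			max = ail->lsn[i];
	return max;
}

/*
 * One push pass: remove every item at or below the effective target.
 * With fixed == true and a sync waiter present, the stored target is
 * ignored and the current maximum LSN is used instead.
 * Returns the number of items left behind.
 */
size_t toy_push(struct toy_ail *ail, bool fixed)
{
	unsigned long target = ail->target;
	size_t left = 0;

	if (fixed && ail->sync_waiter)
		target = toy_max_lsn(ail);

	for (size_t i = 0; i < AIL_MAX; i++) {
		if (ail->lsn[i] == 0)
			continue;
		if (ail->lsn[i] <= target)
			ail->lsn[i] = 0;	/* pushed out */
		else
			left++;			/* skipped: beyond target */
	}
	return left;
}

/* The race: an item is inserted at X+N after the waiter sampled X. */
size_t toy_race(bool fixed)
{
	struct toy_ail ail = { .lsn = {10, 20}, .sync_waiter = true };

	ail.target = toy_max_lsn(&ail);	/* waiter samples X = 20 */
	assert(ail.target == 20);
	ail.lsn[2] = 25;		/* racing insert at X+N */
	return toy_push(&ail, fixed);
}
```

In this model `toy_race(false)` leaves one item behind, so the list never becomes empty and the waiter would never be woken, while `toy_race(true)` drains the list in a single pass.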
    • D
      xfs: check buffer pin state after locking in delwri_submit · 9781b974
      Committed by Dave Chinner
      mainline inclusion
      from mainline-v5.17-rc6
      commit dbd0f529
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dbd0f5299302f8506637592e2373891a748c6990
      
      --------------------------------
      
      AIL flushing can get stuck here:
      
      [316649.005769] INFO: task xfsaild/pmem1:324525 blocked for more than 123 seconds.
      [316649.007807]       Not tainted 5.17.0-rc6-dgc+ #975
      [316649.009186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [316649.011720] task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
      [316649.014112] Call Trace:
      [316649.014841]  <TASK>
      [316649.015492]  __schedule+0x30d/0x9e0
      [316649.017745]  schedule+0x55/0xd0
      [316649.018681]  io_schedule+0x4b/0x80
      [316649.019683]  xfs_buf_wait_unpin+0x9e/0xf0
      [316649.021850]  __xfs_buf_submit+0x14a/0x230
      [316649.023033]  xfs_buf_delwri_submit_buffers+0x107/0x280
      [316649.024511]  xfs_buf_delwri_submit_nowait+0x10/0x20
      [316649.025931]  xfsaild+0x27e/0x9d0
      [316649.028283]  kthread+0xf6/0x120
      [316649.030602]  ret_from_fork+0x1f/0x30
      
      in the situation where flushing gets preempted between the unpin
      check and the buffer trylock under nowait conditions:
      
      	blk_start_plug(&plug);
      	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
      		if (!wait_list) {
      			if (xfs_buf_ispinned(bp)) {
      				pinned++;
      				continue;
      			}
      Here >>>>>>
      			if (!xfs_buf_trylock(bp))
      				continue;
      
      This means submission is stuck until something else triggers a log
      force to unpin the buffer.
      
      To get onto the delwri list to begin with, the buffer pin state has
      already been checked, and hence it's relatively rare we get a race
      between flushing and encountering a pinned buffer in delwri
      submission to begin with. Further, to increase the pin count the
      buffer has to be locked, so the only way we can hit this race
      without failing the trylock is to be preempted between the pincount
      check seeing zero and the trylock being run.
      
      Hence to avoid this problem, just invert the order of trylock vs
      pin check. We shouldn't hit that many pinned buffers here, so
      optimising away the trylock for pinned buffers should not matter for
      performance at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
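The effect of inverting the trylock and the pin check can be illustrated with a single-threaded model in which the "preemption" is an explicit call placed in the race window. Everything here is a hypothetical sketch: `struct toy_buf`, `toy_racer_pins()` and the two `toy_submit_*()` functions are not kernel APIs, and the only rule borrowed from the commit message is that the pin count can be raised only while the buffer lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy buffer: a lock flag and a pin count. */
struct toy_buf {
	bool locked;
	int pin_count;
};

enum toy_result { TOY_SKIPPED, TOY_SUBMITTED, TOY_WOULD_BLOCK };

bool toy_trylock(struct toy_buf *bp)
{
	if (bp->locked)
		return false;
	bp->locked = true;
	return true;
}

void toy_unlock(struct toy_buf *bp)
{
	assert(bp->locked);
	bp->locked = false;
}

/*
 * Models another task modifying, committing and pinning the buffer.
 * Pinning needs the buffer lock, so it fails while the lock is held.
 */
bool toy_racer_pins(struct toy_buf *bp)
{
	if (!toy_trylock(bp))
		return false;
	bp->pin_count++;
	toy_unlock(bp);
	return true;
}

/* Old order: pin check, then the race window, then trylock. */
enum toy_result toy_submit_old(struct toy_buf *bp)
{
	if (bp->pin_count > 0)
		return TOY_SKIPPED;
	toy_racer_pins(bp);		/* preempted here: racer pins the buffer */
	if (!toy_trylock(bp))
		return TOY_SKIPPED;
	/* Submission of a pinned buffer waits for an unpin. */
	return bp->pin_count > 0 ? TOY_WOULD_BLOCK : TOY_SUBMITTED;
}

/* New order: trylock first, then check the pin state under the lock. */
enum toy_result toy_submit_new(struct toy_buf *bp)
{
	if (!toy_trylock(bp))
		return TOY_SKIPPED;
	toy_racer_pins(bp);		/* same window: racer cannot take the lock */
	if (bp->pin_count > 0) {
		toy_unlock(bp);
		return TOY_SKIPPED;
	}
	return TOY_SUBMITTED;
}
```

With an unlocked, unpinned buffer, the old order ends in `TOY_WOULD_BLOCK` and the new order ends in `TOY_SUBMITTED`. A buffer that is already pinned is skipped and unlocked again under the new order.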
    • D
      xfs: log worker needs to start before intent/unlink recovery · 74d73186
      Committed by Dave Chinner
      mainline inclusion
      from mainline-v5.17-rc6
      commit a9a4bc8c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9a4bc8c76d747aa40b30e2dfc176c781f353a08
      
      --------------------------------
      
      After 963 iterations of generic/530, it deadlocked during recovery
      on a pinned inode cluster buffer like so:
      
      XFS (pmem1): Starting recovery (logdev: internal)
      INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
            Not tainted 5.17.0-rc6-dgc+ #975
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
      Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       schedule_timeout+0x114/0x160
       __down+0x99/0xf0
       down+0x5e/0x70
       xfs_buf_lock+0x36/0xf0
       xfs_buf_find+0x418/0x850
       xfs_buf_get_map+0x47/0x380
       xfs_buf_read_map+0x54/0x240
       xfs_trans_read_buf_map+0x1bd/0x490
       xfs_imap_to_bp+0x4f/0x70
       xfs_iunlink_map_ino+0x66/0xd0
       xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
       xfs_iunlink_remove_inode+0xf2/0x1d0
       xfs_inactive_ifree+0x1a3/0x900
       xfs_inode_unlink+0xcc/0x210
       xfs_inodegc_worker+0x1ac/0x2f0
       process_one_work+0x1ac/0x390
       worker_thread+0x56/0x3c0
       kthread+0xf6/0x120
       ret_from_fork+0x1f/0x30
       </TASK>
      task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       schedule_timeout+0x114/0x160
       __down+0x99/0xf0
       down+0x5e/0x70
       xfs_buf_lock+0x36/0xf0
       xfs_buf_find+0x418/0x850
       xfs_buf_get_map+0x47/0x380
       xfs_buf_read_map+0x54/0x240
       xfs_trans_read_buf_map+0x1bd/0x490
       xfs_imap_to_bp+0x4f/0x70
       xfs_iget+0x300/0xb40
       xlog_recover_process_one_iunlink+0x4c/0x170
       xlog_recover_process_iunlinks.isra.0+0xee/0x130
       xlog_recover_finish+0x57/0x110
       xfs_log_mount_finish+0xfc/0x1e0
       xfs_mountfs+0x540/0x910
       xfs_fs_fill_super+0x495/0x850
       get_tree_bdev+0x171/0x270
       xfs_fs_get_tree+0x15/0x20
       vfs_get_tree+0x24/0xc0
       path_mount+0x304/0xba0
       __x64_sys_mount+0x108/0x140
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
       </TASK>
      task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       io_schedule+0x4b/0x80
       xfs_buf_wait_unpin+0x9e/0xf0
       __xfs_buf_submit+0x14a/0x230
       xfs_buf_delwri_submit_buffers+0x107/0x280
       xfs_buf_delwri_submit_nowait+0x10/0x20
       xfsaild+0x27e/0x9d0
       kthread+0xf6/0x120
       ret_from_fork+0x1f/0x30
      
      We have the mount process waiting on an inode cluster buffer read,
      inodegc doing unlink waiting on the same inode cluster buffer, and
      the AIL push thread blocked in writeback waiting for the inode
      cluster buffer to become unpinned.
      
      What has happened here is that the AIL push thread has raced with
      the inodegc process modifying, committing and pinning the inode
      cluster buffer in xfs_buf_delwri_submit_buffers() here:
      
      	blk_start_plug(&plug);
      	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
      		if (!wait_list) {
      			if (xfs_buf_ispinned(bp)) {
      				pinned++;
      				continue;
      			}
      Here >>>>>>
      			if (!xfs_buf_trylock(bp))
      				continue;
      
      Basically, the AIL has found the buffer wasn't pinned and got the
      lock without blocking, but then the buffer was pinned. This implies
      the processing here was pre-empted between the pin check and the
      lock, because the pin count can only be increased while holding the
      buffer locked. Hence when it has gone to submit the IO, it has
      blocked waiting for the buffer to be unpinned.
      
      With all executing threads now waiting on the buffer to be unpinned,
      we normally get out of situations like this via the background log
      worker issuing a log force which will unpin stuck buffers like
      this. But at this point in recovery, we haven't started the log
      worker. In fact, the first thing we do after processing intents and
      unlinked inodes is *start the log worker*. IOWs, we start it too
      late to have it break deadlocks like this.
      
      Avoid this and any other similar deadlock vectors in intent and
      unlinked inode recovery by starting the log worker before we recover
      intents and unlinked inodes. This part of recovery runs as though
      the filesystem is fully active, so we really should have the same
      infrastructure running as we normally do at runtime.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
  2. 04 Apr, 2023 19 commits
  3. 03 Apr, 2023 3 commits
  4. 01 Apr, 2023 1 commit
  5. 31 Mar, 2023 1 commit
  6. 29 Mar, 2023 12 commits