1. 12 4月, 2023 6 次提交
    • D
      xfs: convert XFS_IFORK_PTR to a static inline helper · 34ba4153
      Darrick J. Wong 提交于
      mainline inclusion
      from mainline-v5.19-rc5
      commit 732436ef
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=732436ef916b4f338d672ea56accfdb11e8d0732
      
      --------------------------------
      
      We're about to make this logic do a bit more, so convert the macro to a
      static inline function for better typechecking and fewer shouty macros.
      No functional changes here.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_bmap.c
      	fs/xfs/libxfs/xfs_bmap_btree.c
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/scrub/bmap.c
      	fs/xfs/scrub/symlink.c
      	fs/xfs/xfs_inode.c
      	fs/xfs/xfs_ioctl.c
      	fs/xfs/xfs_qm.c
      	fs/xfs/xfs_reflink.c
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      34ba4153
    • B
      xfs: don't reuse busy extents on extent trim · 63c2dc44
      Brian Foster 提交于
      mainline inclusion
      from mainline-v5.11-rc4
      commit 06058bc4
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=06058bc40534530e617e5623775c53bb24f032cb
      
      --------------------------------
      
      Freed extents are marked busy from the point the freeing transaction
      commits until the associated CIL context is checkpointed to the log.
      This prevents reuse and overwrite of recently freed blocks before
      the changes are committed to disk, which can lead to corruption
      after a crash. The exception to this rule is that metadata
      allocation is allowed to reuse busy extents because metadata changes
      are also logged.
      
      As of commit 97d3ac75 ("xfs: exact busy extent tracking"), XFS
      has allowed modification or complete invalidation of outstanding
      busy extents for metadata allocations. This implementation assumes
      that use of the associated extent is imminent, which is not always
      the case. For example, the trimmed extent might not satisfy the
      minimum length of the allocation request, or the allocation
      algorithm might be involved in a search for the optimal result based
      on locality.
      
      generic/019 reproduces a corruption caused by this scenario. First,
      a metadata block (usually a bmbt or symlink block) is freed from an
      inode. A subsequent bmbt split on an unrelated inode attempts a near
      mode allocation request that invalidates the busy block during the
      search, but does not ultimately allocate it. Due to the busy state
      invalidation, the block is no longer considered busy to subsequent
      allocation. A direct I/O write request immediately allocates the
      block and writes to it. Finally, the filesystem crashes while in a
      state where the initial metadata block free had not committed to the
      on-disk log. After recovery, the original metadata block is in its
      original location as expected, but has been corrupted by the
      aforementioned dio.
      
      This demonstrates that it is fundamentally unsafe to modify busy
      extent state for extents that are not guaranteed to be allocated.
      This applies to pretty much all of the code paths that currently
      trim busy extents for one reason or another. Therefore to address
      this problem, drop the reuse mechanism from the busy extent trim
      path. This code already knows how to return partial non-busy ranges
      of the targeted free extent and higher level code tracks the busy
      state of the allocation attempt. If a block allocation fails where
      one or more candidate extents is busy, we force the log and retry
      the allocation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NYang Erkun <yangerkun@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      63c2dc44
    • Z
      fs/xfs: convert comma to semicolon · b53af989
      Zheng Yongjun 提交于
      mainline inclusion
      from mainline-v5.10-rc5
      commit 1189686e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1189686e5440041057f8cc21a7c1d13bb6642cb9
      
      --------------------------------
      
      Replace a comma between expression statements by a semicolon.
      Signed-off-by: NZheng Yongjun <zhengyongjun3@huawei.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NYang Erkun <yangerkun@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      b53af989
    • D
      xfs: xfs_ail_push_all_sync() stalls when racing with updates · 6d1cae97
      Dave Chinner 提交于
      mainline inclusion
      from mainline-v5.17-rc6
      commit 941fbdfd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=941fbdfd6dd0f1d7961c28123b5460912f678cb5
      
      --------------------------------
      
      xfs_ail_push_all_sync() has a loop like this:
      
      while max_ail_lsn {
      	prepare_to_wait(ail_empty)
      	target = max_ail_lsn
      	wake_up(ail_task);
      	schedule()
      }
      
      Which is designed to sleep until the AIL is emptied. When
      xfs_ail_update_finish() moves the tail of the log, it does:
      
      	if (list_empty(&ailp->ail_head))
      		wake_up_all(&ailp->ail_empty);
      
      So it will only wake up the sync push waiter when the AIL goes
      empty. If, by the time the push waiter has woken, the AIL has more
      in it, it will reset the target, wake the push task and go back to
      sleep.
      
      The problem here is that if the AIL is having items added to it
      when xfs_ail_push_all_sync() is called, then they may get inserted
      into the AIL at a LSN higher than the target LSN. At this point,
      xfsaild_push() will see that the target is X, the item LSNs are
      (X+N) and skip over them, hence never pushing the out.
      
      The result of this the AIL will not get emptied by the AIL push
      thread, hence xfs_ail_finish_update() will never see the AIL being
      empty even if it moves the tail. Hence xfs_ail_push_all_sync() never
      gets woken and hence cannot update the push target to capture the
      items beyond the current target on the LSN.
      
      This is a TOCTOU type of issue so the way to avoid it is to not
      use the push target at all for sync pushes. We know that a sync push
      is being requested by the fact the ail_empty wait queue is active,
      hence the xfsaild can just set the target to max_ail_lsn on every
      push that we see the wait queue active. Hence we no longer will
      leave items on the AIL that are beyond the LSN sampled at the start
      of a sync push.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NYang Erkun <yangerkun@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      6d1cae97
    • D
      xfs: check buffer pin state after locking in delwri_submit · 9781b974
      Dave Chinner 提交于
      mainline inclusion
      from mainline-v5.17-rc6
      commit dbd0f529
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dbd0f5299302f8506637592e2373891a748c6990
      
      --------------------------------
      
      AIL flushing can get stuck here:
      
      [316649.005769] INFO: task xfsaild/pmem1:324525 blocked for more than 123 seconds.
      [316649.007807]       Not tainted 5.17.0-rc6-dgc+ #975
      [316649.009186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [316649.011720] task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
      [316649.014112] Call Trace:
      [316649.014841]  <TASK>
      [316649.015492]  __schedule+0x30d/0x9e0
      [316649.017745]  schedule+0x55/0xd0
      [316649.018681]  io_schedule+0x4b/0x80
      [316649.019683]  xfs_buf_wait_unpin+0x9e/0xf0
      [316649.021850]  __xfs_buf_submit+0x14a/0x230
      [316649.023033]  xfs_buf_delwri_submit_buffers+0x107/0x280
      [316649.024511]  xfs_buf_delwri_submit_nowait+0x10/0x20
      [316649.025931]  xfsaild+0x27e/0x9d0
      [316649.028283]  kthread+0xf6/0x120
      [316649.030602]  ret_from_fork+0x1f/0x30
      
      in the situation where flushing gets preempted between the unpin
      check and the buffer trylock under nowait conditions:
      
      	blk_start_plug(&plug);
      	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
      		if (!wait_list) {
      			if (xfs_buf_ispinned(bp)) {
      				pinned++;
      				continue;
      			}
      Here >>>>>>
      			if (!xfs_buf_trylock(bp))
      				continue;
      
      This means submission is stuck until something else triggers a log
      force to unpin the buffer.
      
      To get onto the delwri list to begin with, the buffer pin state has
      already been checked, and hence it's relatively rare we get a race
      between flushing and encountering a pinned buffer in delwri
      submission to begin with. Further, to increase the pin count the
      buffer has to be locked, so the only way we can hit this race
      without failing the trylock is to be preempted between the pincount
      check seeing zero and the trylock being run.
      
      Hence to avoid this problem, just invert the order of trylock vs
      pin check. We shouldn't hit that many pinned buffers here, so
      optimising away the trylock for pinned buffers should not matter for
      performance at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NYang Erkun <yangerkun@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      9781b974
    • D
      xfs: log worker needs to start before intent/unlink recovery · 74d73186
      Dave Chinner 提交于
      mainline inclusion
      from mainline-v5.17-rc6
      commit a9a4bc8c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9a4bc8c76d747aa40b30e2dfc176c781f353a08
      
      --------------------------------
      
      After 963 iterations of generic/530, it deadlocked during recovery
      on a pinned inode cluster buffer like so:
      
      XFS (pmem1): Starting recovery (logdev: internal)
      INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
            Not tainted 5.17.0-rc6-dgc+ #975
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
      Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       schedule_timeout+0x114/0x160
       __down+0x99/0xf0
       down+0x5e/0x70
       xfs_buf_lock+0x36/0xf0
       xfs_buf_find+0x418/0x850
       xfs_buf_get_map+0x47/0x380
       xfs_buf_read_map+0x54/0x240
       xfs_trans_read_buf_map+0x1bd/0x490
       xfs_imap_to_bp+0x4f/0x70
       xfs_iunlink_map_ino+0x66/0xd0
       xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
       xfs_iunlink_remove_inode+0xf2/0x1d0
       xfs_inactive_ifree+0x1a3/0x900
       xfs_inode_unlink+0xcc/0x210
       xfs_inodegc_worker+0x1ac/0x2f0
       process_one_work+0x1ac/0x390
       worker_thread+0x56/0x3c0
       kthread+0xf6/0x120
       ret_from_fork+0x1f/0x30
       </TASK>
      task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       schedule_timeout+0x114/0x160
       __down+0x99/0xf0
       down+0x5e/0x70
       xfs_buf_lock+0x36/0xf0
       xfs_buf_find+0x418/0x850
       xfs_buf_get_map+0x47/0x380
       xfs_buf_read_map+0x54/0x240
       xfs_trans_read_buf_map+0x1bd/0x490
       xfs_imap_to_bp+0x4f/0x70
       xfs_iget+0x300/0xb40
       xlog_recover_process_one_iunlink+0x4c/0x170
       xlog_recover_process_iunlinks.isra.0+0xee/0x130
       xlog_recover_finish+0x57/0x110
       xfs_log_mount_finish+0xfc/0x1e0
       xfs_mountfs+0x540/0x910
       xfs_fs_fill_super+0x495/0x850
       get_tree_bdev+0x171/0x270
       xfs_fs_get_tree+0x15/0x20
       vfs_get_tree+0x24/0xc0
       path_mount+0x304/0xba0
       __x64_sys_mount+0x108/0x140
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
       </TASK>
      task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       io_schedule+0x4b/0x80
       xfs_buf_wait_unpin+0x9e/0xf0
       __xfs_buf_submit+0x14a/0x230
       xfs_buf_delwri_submit_buffers+0x107/0x280
       xfs_buf_delwri_submit_nowait+0x10/0x20
       xfsaild+0x27e/0x9d0
       kthread+0xf6/0x120
       ret_from_fork+0x1f/0x30
      
      We have the mount process waiting on an inode cluster buffer read,
      inodegc doing unlink waiting on the same inode cluster buffer, and
      the AIL push thread blocked in writeback waiting for the inode
      cluster buffer to become unpinned.
      
      What has happened here is that the AIL push thread has raced with
      the inodegc process modifying, committing and pinning the inode
      cluster buffer here in xfs_buf_delwri_submit_buffers() here:
      
      	blk_start_plug(&plug);
      	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
      		if (!wait_list) {
      			if (xfs_buf_ispinned(bp)) {
      				pinned++;
      				continue;
      			}
      Here >>>>>>
      			if (!xfs_buf_trylock(bp))
      				continue;
      
      Basically, the AIL has found the buffer wasn't pinned and got the
      lock without blocking, but then the buffer was pinned. This implies
      the processing here was pre-empted between the pin check and the
      lock, because the pin count can only be increased while holding the
      buffer locked. Hence when it has gone to submit the IO, it has
      blocked waiting for the buffer to be unpinned.
      
      With all executing threads now waiting on the buffer to be unpinned,
      we normally get out of situations like this via the background log
      worker issuing a log force which will unpinned stuck buffers like
      this. But at this point in recovery, we haven't started the log
      worker. In fact, the first thing we do after processing intents and
      unlinked inodes is *start the log worker*. IOWs, we start it too
      late to have it break deadlocks like this.
      
      Avoid this and any other similar deadlock vectors in intent and
      unlinked inode recovery by starting the log worker before we recover
      intents and unlinked inodes. This part of recovery runs as though
      the filesystem is fully active, so we really should have the same
      infrastructure running as we normally do at runtime.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NChandan Babu R <chandan.babu@oracle.com>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NYang Erkun <yangerkun@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      74d73186
  2. 04 4月, 2023 19 次提交
  3. 03 4月, 2023 3 次提交
  4. 01 4月, 2023 1 次提交
  5. 31 3月, 2023 1 次提交
  6. 29 3月, 2023 10 次提交