1. 29 June 2023, 1 commit
    • xfs: fix mounting failed caused by sequencing problem in the log records · dba19fb8
      yangerkun authored
      Offering: HULK
      hulk inclusion
      category: bugfix
      bugzilla: 188870, https://gitee.com/openeuler/kernel/issues/I76JSK
      
      --------------------------------
      
      During testing of growfs + power-off, we encountered a mount failure.
      The specific call stack is as follows:
      
      [584505.210179] XFS (loop0): xfs_buf_find: daddr 0x6d6002 out of range,
      EOFS 0x6d6000
      ...
      [584505.210739] Call Trace:
      [584505.210776]  xfs_buf_get_map+0x44/0x230 [xfs]
      [584505.210780]  ? trace_event_buffer_commit+0x57/0x140
      [584505.210818]  xfs_buf_read_map+0x54/0x280 [xfs]
      [584505.210858]  ? xlog_recover_items_pass2+0x53/0xb0 [xfs]
      [584505.210899]  xlog_recover_buf_commit_pass2+0x112/0x440 [xfs]
      [584505.210939]  ? xlog_recover_items_pass2+0x53/0xb0 [xfs]
      [584505.210980]  xlog_recover_items_pass2+0x53/0xb0 [xfs]
      [584505.211020]  xlog_recover_commit_trans+0x2ca/0x320 [xfs]
      [584505.211061]  xlog_recovery_process_trans+0xc6/0xf0 [xfs]
      [584505.211101]  xlog_recover_process_data+0x9e/0x110 [xfs]
      [584505.211141]  xlog_do_recovery_pass+0x3b4/0x5c0 [xfs]
      [584505.211181]  xlog_do_log_recovery+0x5e/0x80 [xfs]
      [584505.211223]  xlog_do_recover+0x33/0x1a0 [xfs]
      [584505.211262]  xlog_recover+0xd7/0x170 [xfs]
      [584505.211303]  xfs_log_mount+0x217/0x2b0 [xfs]
      [584505.211341]  xfs_mountfs+0x3da/0x870 [xfs]
      [584505.211384]  xfs_fc_fill_super+0x3fa/0x7a0 [xfs]
      [584505.211428]  ? xfs_setup_devices+0x80/0x80 [xfs]
      [584505.211432]  get_tree_bdev+0x16f/0x260
      [584505.211434]  vfs_get_tree+0x25/0xc0
      [584505.211436]  do_new_mount+0x156/0x1b0
      [584505.211438]  __se_sys_mount+0x165/0x1d0
      [584505.211440]  do_syscall_64+0x33/0x40
      [584505.211442]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      After analyzing the log records, we have discovered the following
      content:
      
      ============================================================================
      cycle: 173  version: 2    lsn: 173,2742 tail_lsn: 173,1243
      length of Log Record: 25600 prev offset: 2702   num ops: 258
      uuid: fb958458-48a3-4c76-ae23-7a1cf3053065   format: little endian linux
      h_size: 32768
      ----------------------------------------------------------------------------
      ...
      ----------------------------------------------------------------------------
      Oper (100): tid: 1c010724  len: 24  clientid: TRANS  flags: none
      BUF:  #regs: 2   start blkno: 7168002 (0x6d6002)  len: 1  bmap size: 1
      flags: 0x3800
      Oper (101): tid: 1c010724  len: 128  clientid: TRANS  flags: none
      AGI Buffer: XAGI
      ver: 1  seq#: 28  len: 2048  cnt: 0  root: 3
      level: 1  free#: 0x0  newino: 0x140
      bucket[0 - 3]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
      bucket[4 - 7]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
      bucket[8 - 11]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
      bucket[12 - 15]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
      bucket[16 - 19]: 0xffffffff
      ----------------------------------------------------------------------------
      ...
      ----------------------------------------------------------------------------
      Oper (108): tid: 1c010724  len: 24  clientid: TRANS  flags: none
      BUF:  #regs: 2   start blkno: 0 (0x0)  len: 1  bmap size: 1  flags:
      0x9000
      Oper (109): tid: 1c010724  len: 384  clientid: TRANS  flags: none
      SUPER BLOCK Buffer:
      icount: 6360863066640355328  ifree: 898048  fdblks: 0  frext: 0
      ----------------------------------------------------------------------------
      ...
      
      We found that in the log records, the transaction that modifies the newly
      grown block appears before the growfs transaction that updates the
      superblock, so the block is still beyond the end of the filesystem when
      the buffer is replayed and verification fails.
      
      We need to ensure that, when replaying the log, transactions related to
      the superblock are replayed first.
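      
      The ordering requirement can be illustrated with a small user-space
      sketch (this is only an illustration, not the kernel patch; all names
      below are made up): recovered buffer items that target the superblock
      are replayed before any buffer that may lie beyond the old end of the
      filesystem.
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      struct recovered_buf {
              unsigned long long daddr;       /* start block of the logged buffer */
              const char        *desc;        /* what the logged region contains */
      };
      
      /* Stable partition: superblock buffers first, original order otherwise. */
      static void replay_sb_first(struct recovered_buf *items, size_t n)
      {
              struct recovered_buf *tmp = malloc(n * sizeof(*tmp));
              size_t i, out = 0;
      
              if (!tmp)
                      return;
              for (i = 0; i < n; i++)         /* pass 1: superblock buffers */
                      if (items[i].daddr == 0)
                              tmp[out++] = items[i];
              for (i = 0; i < n; i++)         /* pass 2: everything else */
                      if (items[i].daddr != 0)
                              tmp[out++] = items[i];
              memcpy(items, tmp, n * sizeof(*tmp));
              free(tmp);
      }
      
      int main(void)
      {
              struct recovered_buf log[] = {
                      { 0x6d6002, "AGI beyond the old EOFS (growfs area)" },
                      { 0x0,      "superblock carrying the new dblocks count" },
              };
              size_t i;
      
              replay_sb_first(log, 2);
              for (i = 0; i < 2; i++)         /* superblock is now replayed first */
                      printf("replay daddr 0x%llx: %s\n", log[i].daddr, log[i].desc);
              return 0;
      }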
      Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
  2. 07 June 2023, 2 commits
    • xfs: log shutdown triggers should only shut down the log · 50092d28
      Dave Chinner authored
      mainline inclusion
      from mainline-v5.17-rc6
      commit b5f17bec
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I76JSK
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b5f17bec1213a3ed2f4d79ad4c566e00cabe2a9b
      
      --------------------------------
      
      We've got a mess on our hands.
      
      1. xfs_trans_commit() cannot cancel transactions because the mount is
      shut down - that causes dirty, aborted, unlogged log items to sit
      unpinned in memory and potentially get written to disk before the
      log is shut down. Hence xfs_trans_commit() can only abort
      transactions when xlog_is_shutdown() is true.
      
      2. xfs_force_shutdown() is used in places to cause the current
      modification to be aborted via xfs_trans_commit() because it may be
      impractical or impossible to cancel the transaction directly, and
      hence xfs_trans_commit() must cancel transactions when
      xfs_is_shutdown() is true in this situation. But we can't do that
      because of #1.
      
      3. Log IO errors cause log shutdowns by calling xfs_force_shutdown()
      to shut down the mount and then the log from log IO completion.
      
      4. xfs_force_shutdown() can result in a log force being issued,
      which has to wait for log IO completion before it will mark the log
      as shut down. If #3 races with some other shutdown trigger that runs
      a log force, we rely on xfs_force_shutdown() silently ignoring #3
      and avoiding shutting down the log until the failed log force
      completes.
      
      5. To ensure #2 always works, we have to ensure that
      xfs_force_shutdown() does not return until the log is shut down.
      But in the case of #4, this will result in a deadlock because the
      log IO completion will block waiting for a log force to complete
      which is blocked waiting for log IO to complete....
      
      So the very first thing we have to do here to untangle this mess is
      dissociate log shutdown triggers from mount shutdowns. We already
      have xlog_force_shutdown(), which will atomically transition the
      log to a shutdown state. Due to internal asserts it cannot be called
      multiple times, but that was done simply because the only place that
      could call it was xfs_do_force_shutdown() (i.e. the mount shutdown!)
      and that could only call it once and once only.  So the first thing
      we do is remove the asserts.
      
      We then convert all the internal log shutdown triggers to call
      xlog_force_shutdown() directly instead of xfs_force_shutdown(). This
      allows the log shutdown triggers to shut down the log without
      needing to care about mount based shutdown constraints. This means
      we shut down the log independently of the mount and the mount may
      not notice this until its next attempt to read or modify metadata.
      At that point (e.g. xfs_trans_commit()) it will see that the log is
      shut down, error out and shut down the mount.
      
      To ensure that all the unmount behaviours and asserts track
      correctly as a result of a log shutdown, propagate the shutdown up
      to the mount if it is not already set. This keeps the mount and log
      state in sync, and saves a huge amount of hassle where code fails
      because of a log shutdown but only checks for mount shutdowns and
      hence ends up doing the wrong thing. Cleaning up that mess is
      an exercise for another day.
      
      This enables us to address the other problems noted above in
      followup patches.
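      
      A rough, self-contained sketch of the log/mount split described above
      (illustrative names only, not the kernel code): a log-internal trigger
      only marks the log shut down, and the mount picks that up and shuts
      itself down the next time a transaction commit notices the dead log.
      
      #include <stdbool.h>
      #include <stdio.h>
      
      struct sketch_log   { bool shutdown; };
      struct sketch_mount { bool shutdown; struct sketch_log *log; };
      
      /* Log-internal trigger: touches only log state, no mount constraints. */
      static void sketch_xlog_force_shutdown(struct sketch_log *log)
      {
              log->shutdown = true;
      }
      
      /* Transaction commit: sees the dead log and propagates up to the mount. */
      static int sketch_trans_commit(struct sketch_mount *mp)
      {
              if (mp->log->shutdown) {
                      if (!mp->shutdown)
                              mp->shutdown = true;    /* keep mount and log in sync */
                      return -5;                      /* -EIO */
              }
              return 0;
      }
      
      int main(void)
      {
              struct sketch_log log = { .shutdown = false };
              struct sketch_mount mp = { .shutdown = false, .log = &log };
      
              sketch_xlog_force_shutdown(&log);       /* e.g. a log write I/O error */
              printf("commit returned %d, mount shutdown = %d\n",
                     sketch_trans_commit(&mp), mp.shutdown);
              return 0;
      }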
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
    • xfs: shutdown in intent recovery has non-intent items in the AIL · ce25eef7
      Dave Chinner authored
      mainline inclusion
      from mainline-v5.17-rc6
      commit ab9c81ef
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I76JSK
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9c81ef321f90dd208b1d4809c196c2794e4b15
      
      --------------------------------
      
      generic/388 triggered a failure in RUI recovery due to a corrupted
      btree record and the system then locked up hard due to a subsequent
      assert failure while holding a spinlock cancelling intents:
      
       XFS (pmem1): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_trans.c:964).  Shutting down filesystem.
       XFS (pmem1): Please unmount the filesystem and rectify the problem(s)
       XFS: Assertion failed: !xlog_item_is_intent(lip), file: fs/xfs/xfs_log_recover.c, line: 2632
       Call Trace:
        <TASK>
        xlog_recover_cancel_intents.isra.0+0xd1/0x120
        xlog_recover_finish+0xb9/0x110
        xfs_log_mount_finish+0x15a/0x1e0
        xfs_mountfs+0x540/0x910
        xfs_fs_fill_super+0x476/0x830
        get_tree_bdev+0x171/0x270
        ? xfs_init_fs_context+0x1e0/0x1e0
        xfs_fs_get_tree+0x15/0x20
        vfs_get_tree+0x24/0xc0
        path_mount+0x304/0xba0
        ? putname+0x55/0x60
        __x64_sys_mount+0x108/0x140
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Essentially, there's dirty metadata in the AIL from intent recovery
      transactions, so when we go to cancel the remaining intents we assume
      that all objects after the first non-intent log item in the AIL are
      not intents.
      
      This is not true. Intent recovery can log new intents to continue
      the operations the original intent could not complete in a single
      transaction. The new intents are committed before they are deferred,
      which means if the CIL commits in the background they will get
      inserted into the AIL at the head.
      
      Hence if we shut down the filesystem while processing intent
      recovery, the AIL may have new intents active at the current head.
      Hence this check:
      
                      /*
                       * We're done when we see something other than an intent.
                       * There should be no intents left in the AIL now.
                       */
                      if (!xlog_item_is_intent(lip)) {
                              for (; lip; lip = xfs_trans_ail_cursor_next(ailp, &cur))
                                      ASSERT(!xlog_item_is_intent(lip));
                              break;
                      }
      
      in both xlog_recover_process_intents() and
      xlog_recover_cancel_intents() is simply not valid. It was valid back
      when we only had EFI/EFD intents and didn't chain intents, but it
      hasn't been valid ever since intent recovery could create and commit
      new intents.
      
      Given that crashing the mount task like this pretty much prevents
      diagnosing what went wrong that led to the initial failure that
      triggered intent cancellation, just remove the checks altogether.
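      
      The following user-space sketch (illustrative names only) shows why the
      removed assertion no longer holds: once recovery itself can commit new
      intents while earlier recovery transactions have left dirty non-intent
      items in the AIL, an intent can legitimately appear after a non-intent
      item.
      
      #include <stdbool.h>
      #include <stdio.h>
      
      struct ail_item { const char *name; bool is_intent; };
      
      int main(void)
      {
              /* AIL contents seen when shutting down during intent recovery. */
              struct ail_item ail[] = {
                      { "recovered RUI",                  true  },
                      { "dirty AGF buffer",               false },  /* recovery trans */
                      { "new RUI logged during recovery", true  },  /* breaks the assert */
              };
              bool seen_non_intent = false;
              size_t i;
      
              for (i = 0; i < sizeof(ail) / sizeof(ail[0]); i++) {
                      if (!ail[i].is_intent)
                              seen_non_intent = true;
                      else if (seen_non_intent)
                              printf("intent \"%s\" follows a non-intent item\n",
                                     ail[i].name);
              }
              return 0;
      }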
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
  3. 06 June 2023, 4 commits
  4. 19 April 2023, 2 commits
  5. 21 November 2022, 2 commits
    • xfs: flush inode gc workqueue before clearing agi bucket · ef4894f0
      Zhang Yi authored
      mainline inclusion
      from mainline-v5.19-rc2
      commit 04a98a03
      category: bugfix
      bugzilla: 187526, https://gitee.com/openeuler/kernel/issues/I4KIAO
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=04a98a036cf8b810dda172a9dcfcbd783bf63655
      
      --------------------------------
      
      In the procedure of recovering AGI unlinked lists, if something bad
      happens to one of the unlinked inodes in the bucket list, we call
      xlog_recover_clear_agi_bucket() to clear the whole unlinked bucket list,
      not just the unlinked inodes after the bad one. If we have already added
      some inodes to the gc workqueue before the bad inode in the list, we
      could get the error below when freeing those inodes, and finally fail to
      complete the log recovery procedure.
      
       XFS (ram0): Internal error xfs_iunlink_remove at line 2456 of file
       fs/xfs/xfs_inode.c.  Caller xfs_ifree+0xb0/0x360 [xfs]
      
      The problem is that xlog_recover_clear_agi_bucket() clears the bucket
      list, so the gc worker fails the agino check in xfs_verify_agino(). Fix
      this by flushing the gc workqueue before clearing the bucket.
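      
      The fix is roughly the following (a heavily abridged sketch of the
      iunlink recovery error path, not the literal diff; argument lists are
      approximate): drain the inodegc queues before the bucket they reference
      is cleared.
      
              if (error) {
                      /*
                       * Inodes already queued for background inactivation still
                       * expect the unlinked bucket list to be intact, so flush
                       * the inodegc workqueue before clearing the bucket.
                       */
                      xfs_inodegc_flush(mp);
                      xlog_recover_clear_agi_bucket(mp, agno, bucket);
              }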
      
      Fixes: ab23a776 ("xfs: per-cpu deferred inode inactivation queues")
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • xfs: only run COW extent recovery when there are no live extents · 8efeef76
      Darrick J. Wong authored
      mainline inclusion
      from mainline-v5.16-rc5
      commit 7993f1a4
      category: bugfix
      bugzilla: 186901, https://gitee.com/openeuler/kernel/issues/I4KIAO
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7993f1a431bc5271369d359941485a9340658ac3
      
      --------------------------------
      
      As part of multiple customer escalations due to file data corruption
      after copy on write operations, I wrote some fstests that use fsstress
      to hammer on COW to shake things loose.  Regrettably, I caught some
      filesystem shutdowns due to incorrect rmap operations with the following
      loop:
      
      mount <filesystem>				# (0)
      fsstress <run only readonly ops> &		# (1)
      while true; do
      	fsstress <run all ops>
      	mount -o remount,ro			# (2)
      	fsstress <run only readonly ops>
      	mount -o remount,rw			# (3)
      done
      
      When (2) happens, notice that (1) is still running.  xfs_remount_ro will
      call xfs_blockgc_stop to walk the inode cache to free all the COW
      extents, but the blockgc mechanism races with (1)'s reader threads to
      take IOLOCKs and loses, which means that it doesn't clean them all out.
      Call such a file (A).
      
      When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
      walks the ondisk refcount btree and frees any COW extent that it finds.
      This function does not check the inode cache, which means that the incore
      COW fork of inode (A) is now inconsistent with the ondisk metadata.  If
      one of those former COW extents is allocated and mapped into another
      file (B) and someone triggers a COW to the stale reservation in (A), A's
      dirty data will be written into (B) and once that's done, those blocks
      will be transferred to (A)'s data fork without bumping the refcount.
      
      The results are catastrophic -- file (B) and the refcount btree are now
      corrupt.  In the first patch, we fixed the race condition in (2) so that
      (A) will always flush the COW fork.  In this second patch, we move the
      _recover_cow call to the initial mount call in (0) for safety.
      
      As mentioned previously, xfs_reflink_recover_cow walks the refcount
      btree looking for COW staging extents, and frees them.  This was
      intended to be run at mount time (when we know there are no live inodes)
      to clean up any leftover staging events that may have been left behind
      during an unclean shutdown.  As a time "optimization" for readonly
      mounts, we deferred this to the ro->rw transition, not realizing that
      any failure to clean all COW forks during a rw->ro transition would
      result in catastrophic corruption.
      
      Therefore, remove this optimization and only run the recovery routine
      when we're guaranteed not to have any COW staging extents anywhere,
      which means we always run this at mount time.  While we're at it, move
      the callsite to xfs_log_mount_finish because any refcount btree
      expansion (however unlikely given that we're removing records from the
      right side of the index) must be fed by a per-AG reservation, which
      doesn't exist in its current location.
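      
      Abridged sketch of the destination (error handling and the surrounding
      recovery steps are omitted, and the exact placement within the function
      is approximate): the COW recovery call now lives in
      xfs_log_mount_finish(), where no live COW reservations can exist yet.
      
      int
      xfs_log_mount_finish(
              struct xfs_mount        *mp)
      {
              int                     error;
      
              /* ... finish intent recovery, push the AIL, etc. ... */
      
              /* Free leftover CoW staging extents while nothing can use them. */
              error = xfs_reflink_recover_cow(mp);
              if (error)
                      return error;
      
              /* ... */
              return 0;
      }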
      
      Fixes: 174edb0e ("xfs: store in-progress CoW allocations in the refcount btree")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  6. 09 March 2022, 3 commits
  7. 07 January 2022, 1 commit
    • xfs: per-cpu deferred inode inactivation queues · 705fccfb
      Dave Chinner authored
      mainline inclusion
      from mainline-v5.14-rc4
      commit ab23a776
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab23a7768739a23d21d8a16ca37dff96b1ca957a
      
      -------------------------------------------------
      
      Move inode inactivation to background work contexts so that it no
      longer runs in the context that releases the final reference to an
      inode. This will allow process work that ends up blocking on
      inactivation to continue doing work while the filesystem processes
      the inactivation in the background.
      
      A typical demonstration of this is unlinking an inode with lots of
      extents. The extents are removed during inactivation, so this blocks
      the process that unlinked the inode from the directory structure. By
      moving the inactivation to the background process, the userspace
      application can keep working (e.g. unlinking the next inode in the
      directory) while the inactivation work on the previous inode is
      done by a different CPU.
      
      The implementation of the queue is relatively simple. We use a
      per-cpu lockless linked list (llist) to queue inodes for
      inactivation without requiring serialisation mechanisms, and a work
      item to allow the queue to be processed by a CPU bound worker
      thread. We also keep a count of the queue depth so that we can
      trigger work after a number of deferred inactivations have been
      queued.
      
      The use of a bound workqueue with a single work depth allows the
      workqueue to run one work item per CPU. We queue the work item on
      the CPU we are currently running on, and so this essentially gives
      us affine per-cpu worker threads for the per-cpu queues. This
      maintains the effective CPU affinity that occurs within XFS at the
      AG level due to all objects in a directory being local to an AG.
      Hence inactivation work tends to run on the same CPU that last
      accessed all the objects that inactivation accesses and this
      maintains hot CPU caches for unlink workloads.
      
      A depth of 32 inodes was chosen to match the number of inodes in an
      inode cluster buffer. This hopefully allows sequential
      allocation/unlink behaviours to defer inactivation of all the
      inodes in a single cluster buffer at a time, further helping
      maintain hot CPU and buffer cache accesses while running
      inactivations.
      
      A hard per-cpu queue throttle of 256 inodes has been set to avoid
      runaway queuing when inodes that take a long time to inactivate are
      being processed. For example, when unlinking inodes with large
      numbers of extents that can take a lot of processing to free.
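      
      An abridged sketch of the queueing side (the structures and names below
      are illustrative stand-ins, not the real XFS ones; only the
      llist/workqueue/percpu calls are actual kernel interfaces): each CPU
      gets a lockless list plus a work item, and the work is kicked on the
      local CPU once a batch of inodes has been queued.
      
      #include <linux/llist.h>
      #include <linux/percpu.h>
      #include <linux/smp.h>
      #include <linux/workqueue.h>
      
      #define SKETCH_GC_BATCH         32      /* inodes per inode cluster buffer */
      
      struct sketch_inodegc {
              struct llist_head       list;   /* lockless per-cpu inactivation queue */
              struct work_struct      work;   /* run by a CPU-bound worker thread */
              unsigned int            items;  /* queue depth, local CPU only */
      };
      
      struct sketch_inode {
              struct llist_node       gclist;
      };
      
      static void sketch_inodegc_queue(struct sketch_inodegc __percpu *pcpu_gc,
                                       struct workqueue_struct *wq,
                                       struct sketch_inode *ip)
      {
              struct sketch_inodegc *gc = get_cpu_ptr(pcpu_gc);
      
              llist_add(&ip->gclist, &gc->list);
              if (++gc->items >= SKETCH_GC_BATCH)
                      queue_work_on(smp_processor_id(), wq, &gc->work);
              put_cpu_ptr(pcpu_gc);
              /* the worker drains gc->list and resets gc->items */
      }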
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [djwong: tweak comments and tracepoints, convert opflags to state bits]
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  8. 27 December 2021, 4 commits
  9. 22 October 2020, 1 commit
    • xfs: cancel intents immediately if process_intents fails · 2e76f188
      Darrick J. Wong authored
      If processing recovered log intent items fails, we need to cancel all
      the unprocessed recovered items immediately so that a subsequent AIL
      push in the bail out path won't get wedged on the pinned intent items
      that didn't get processed.
      
      This can happen if the log contains (1) an intent that gets and releases
      an inode, (2) an intent that cannot be recovered successfully, and (3)
      some third intent item.  When recovery of (2) fails, we leave (3) pinned
      in memory.  Inode reclamation is called in the error-out path of
      xfs_mountfs before xfs_log_cancel_mount.  Reclamation calls
      xfs_ail_push_all_sync, which gets stuck waiting for (3).
      
      Therefore, call xlog_recover_cancel_intents if _process_intents fails.
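      
      Abridged sketch of the resulting error path in xlog_recover_finish()
      (surrounding code and declarations omitted):
      
              error = xlog_recover_process_intents(log);
              if (error) {
                      /*
                       * Cancel whatever recovered intents are still pending so
                       * the bail-out AIL push cannot get wedged on pinned
                       * intent items.
                       */
                      xlog_recover_cancel_intents(log);
                      xfs_alert(log->l_mp, "Failed to recover intents");
                      return error;
              }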
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
  10. 07 October 2020, 5 commits
    • xfs: fix an incore inode UAF in xfs_bui_recover · ff4ab5e0
      Darrick J. Wong authored
      In xfs_bui_item_recover, there exists a use-after-free bug with regards
      to the inode that is involved in the bmap replay operation.  If the
      mapping operation does not complete, we call xfs_bmap_unmap_extent to
      create a deferred op to finish the unmapping work, and we retain a
      pointer to the incore inode.
      
      Unfortunately, the very next thing we do is commit the transaction and
      drop the inode.  If reclaim tears down the inode before we try to finish
      the defer ops, we dereference garbage and blow up.  Therefore, create a
      way to join inodes to the defer ops freezer so that we can maintain the
      xfs_inode reference until we're done with the inode.
      
      Note: This imposes the requirement that there be enough memory to keep
      every incore inode in memory throughout recovery.
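      
      Conceptually, the capture object now holds the inode alongside the
      deferred ops. A sketch of the idea (the structure and field names below
      are invented, not the real layout):
      
      struct sketch_defer_capture {
              struct list_head        dfc_list;       /* chain of captured dfops */
              struct list_head        dfc_dfops;      /* the deferred work itself */
              struct xfs_inode        *dfc_ip;        /* reference grabbed at capture
                                                         time and dropped only after
                                                         the chain is finished, so
                                                         reclaim cannot free it */
      };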
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: xfs_defer_capture should absorb remaining transaction reservation · 929b92f6
      Darrick J. Wong authored
      When xfs_defer_capture extracts the deferred ops and transaction state
      from a transaction, it should record the transaction reservation type
      from the old transaction so that when we continue the dfops chain, we
      still use the same reservation parameters.
      
      Doing this means that the log item recovery functions get to determine
      the transaction reservation instead of abusing tr_itruncate in yet
      another part of xfs.
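      
      A sketch of what gets recorded (the helper name is invented; this assumes
      the reservation is rebuilt from the old transaction's log reservation
      fields rather than copied wholesale):
      
      static void sketch_capture_trans_res(struct xfs_trans_res *tres,
                                           const struct xfs_trans *tp)
      {
              /*
               * Rebuild the original (permanent) log reservation for the
               * continuation transaction instead of reusing tr_itruncate.
               */
              tres->tr_logres   = tp->t_log_res;
              tres->tr_logcount = tp->t_log_count;
              tres->tr_logflags = XFS_TRANS_PERM_LOG_RES;
      }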
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: xfs_defer_capture should absorb remaining block reservations · 4f9a60c4
      Darrick J. Wong authored
      When xfs_defer_capture extracts the deferred ops and transaction state
      from a transaction, it should record the remaining block reservations so
      that when we continue the dfops chain, we can reserve the same number of
      blocks to use.  We capture the reservations for both data and realtime
      volumes.
      
      This adds the requirement that every log intent item recovery function
      must be careful to reserve enough blocks to handle both itself and all
      defer ops that it can queue.  On the other hand, this enables us to do
      away with the handwaving block estimation nonsense that was going on in
      xlog_finish_defer_ops.
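      
      A sketch of the block reservation capture (helper and structure names are
      invented; t_blk_res/t_rtx_res and their _used counterparts are the
      xfs_trans fields being read): whatever the old transaction had reserved
      but not yet used carries over to the continuation.
      
      struct sketch_capture_res {
              unsigned int    dfc_blkres;     /* remaining data device blocks */
              unsigned int    dfc_rtxres;     /* remaining realtime extents */
      };
      
      static void sketch_capture_block_res(struct sketch_capture_res *dfc,
                                           const struct xfs_trans *tp)
      {
              dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
              dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used;
      }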
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • xfs: proper replay of deferred ops queued during log recovery · e6fff81e
      Darrick J. Wong authored
      When we replay unfinished intent items that have been recovered from the
      log, it's possible that the replay will cause the creation of more
      deferred work items.  As outlined in commit 50995582 ("xfs: log
      recovery should replay deferred ops in order"), later work items have an
      implicit ordering dependency on earlier work items.  Therefore, recovery
      must replay the items (both recovered and created) in the same order
      that they would have been during normal operation.
      
      For log recovery, we enforce this ordering by using an empty transaction
      to collect deferred ops that get created in the process of recovering a
      log intent item to prevent them from being committed before the rest of
      the recovered intent items.  After we finish committing all the
      recovered log items, we allocate a transaction with an enormous block
      reservation, splice our huge list of created deferred ops into that
      transaction, and commit it, thereby finishing all those ops.
      
      This is /really/ hokey -- it's the one place in XFS where we allow
      nested transactions; the splicing of the defer ops list is inelegant
      and has to be done twice per recovery function; and the broken way we
      handle inode pointers and block reservations cause subtle use-after-free
      and allocator problems that will be fixed by this patch and the two
      patches after it.
      
      Therefore, replace the hokey empty transaction with a structure designed
      to capture each chain of deferred ops that are created as part of
      recovering a single unfinished log intent.  Finally, refactor the loop
      that replays those chains to do so using one transaction per chain.
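      
      The shape of the new replay loop, as a sketch (helper, structure and
      field names are invented and mp/capture_list are assumed from the
      surrounding function; xfs_trans_alloc() and list_for_each_entry_safe()
      are real interfaces): one transaction per captured chain, allocated with
      the reservations recorded at capture time.
      
              struct sketch_capture   *dfc, *next;
              struct xfs_trans        *tp;
              int                     error = 0;
      
              list_for_each_entry_safe(dfc, next, &capture_list, dfc_list) {
                      error = xfs_trans_alloc(mp, &dfc->dfc_tres, dfc->dfc_blkres,
                                              dfc->dfc_rtxres, 0, &tp);
                      if (error)
                              break;
      
                      /* Splice this chain's deferred ops into tp and run them. */
                      error = sketch_finish_one_chain(tp, dfc);
                      list_del_init(&dfc->dfc_list);
                      sketch_capture_free(mp, dfc);
                      if (error)
                              break;
              }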
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: remove XFS_LI_RECOVERED · 901219bb
      Darrick J. Wong authored
      The ->iop_recover method of a log intent item removes the recovered
      intent item from the AIL by logging an intent done item and committing
      the transaction, so it's superfluous to have this flag check.  Nothing
      else uses it, so get rid of the flag entirely.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  11. 26 September 2020, 1 commit
  12. 24 September 2020, 1 commit
  13. 23 September 2020, 1 commit
  14. 16 September 2020, 5 commits
  15. 07 September 2020, 1 commit
  16. 05 August 2020, 1 commit
  17. 07 July 2020, 1 commit
  18. 08 May 2020, 4 commits