1. 07 Oct 2020, 1 commit
  2. 23 Sep 2020, 1 commit
    • xfs: log new intent items created as part of finishing recovered intent items · 93293bcb
      Authored by Darrick J. Wong
      During a code inspection, I found a serious bug in the log intent item
      recovery code when an intent item cannot complete all the work and
      decides to requeue itself to get that done.  When this happens, the
      item recovery creates a new incore deferred op representing the
      remaining work and attaches it to the transaction that it allocated.  At
      the end of _item_recover, it moves the entire chain of deferred ops to
      the dummy parent_tp that xlog_recover_process_intents passed to it, but
      fails to log a new intent item for the remaining work before committing
      the transaction for the single unit of work.
      
      xlog_finish_defer_ops logs those new intent items once recovery has
      finished dealing with the intent items that it recovered, but this isn't
      sufficient.  If the log is forced to disk after a recovered log item
      decides to requeue itself and the system goes down before we call
      xlog_finish_defer_ops, the second log recovery will never see the new
      intent item and therefore has no idea that there was more work to do.
      It will finish recovery leaving the filesystem in a corrupted state.
      
      The same logic applies to /any/ deferred ops added during intent item
      recovery, not just the one handling the remaining work.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
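The crash window described in 93293bcb can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: the log is a plain list of `(kind, ident, units)` tuples, and the names `recover`, `recover_until_idle`, and `crash_then_remount` are hypothetical. It assumes each recovered intent can complete just one unit of work per pass and requeues the remainder.

```python
def recover(log, work_done, log_requeue_in_commit, crash_after_first=False):
    """One toy recovery pass over `log`, the only state that survives a crash.

    Records are (kind, ident, units) tuples. Each recovered intent can do
    just one unit of work per pass and requeues whatever is left.
    Returns the updated count of work units applied to the filesystem.
    """
    finished = {ident for kind, ident, _ in log if kind == "done"}
    pending = [(ident, units) for kind, ident, units in log
               if kind == "intent" and ident not in finished]
    next_id = max(ident for _, ident, _ in log) + 1
    incore = []  # requeued work that exists only in memory

    for ident, units in pending:
        commit = [("done", ident, 0)]  # one transaction per unit of work
        work_done += 1
        if units > 1:
            requeued = ("intent", next_id, units - 1)
            next_id += 1
            if log_requeue_in_commit:
                commit.append(requeued)  # fixed: persists with the commit
            else:
                incore.append(requeued)  # buggy: remembered in memory only
        log.extend(commit)
        if crash_after_first:
            return work_done  # crash: everything in `incore` is lost
    log.extend(incore)  # late logging, after all recovered intents are handled
    return work_done


def recover_until_idle(log, work_done, log_requeue_in_commit):
    """Rerun recovery passes until a pass finds nothing left to do."""
    while True:
        before = work_done
        work_done = recover(log, work_done, log_requeue_in_commit)
        if work_done == before:
            return work_done


def crash_then_remount(log_requeue_in_commit):
    """Crash right after the first commit, then recover again from the log."""
    log = [("intent", 1, 3)]  # three units of work were promised
    done = recover(log, 0, log_requeue_in_commit, crash_after_first=True)
    return recover_until_idle(log, done, log_requeue_in_commit)
```

With late logging, `crash_then_remount(False)` completes only one of the three promised units, because the second recovery finds a done item and no follow-up intent. With the requeued intent logged in the same commit, `crash_then_remount(True)` completes all three.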
  3. 20 May 2020, 1 commit
    • xfs: use ordered buffers to initialize dquot buffers during quotacheck · 78bba5c8
      Authored by Darrick J. Wong
      While QAing the new xfs_repair quotacheck code, I uncovered a quota
      corruption bug resulting from a bad interaction between dquot buffer
      initialization and quotacheck.  The bug can be reproduced with the
      following sequence:
      
      # mkfs.xfs -f /dev/sdf
      # mount /dev/sdf /opt -o usrquota
      # su nobody -s /bin/bash -c 'touch /opt/barf'
      # sync
      # xfs_quota -x -c 'report -ahi' /opt
      User quota on /opt (/dev/sdf)
                              Inodes
      User ID      Used   Soft   Hard Warn/Grace
      ---------- ---------------------------------
      root            3      0      0  00 [------]
      nobody          1      0      0  00 [------]
      
      # xfs_io -x -c 'shutdown' /opt
      # umount /opt
      # mount /dev/sdf /opt -o usrquota
      # touch /opt/man2
      # xfs_quota -x -c 'report -ahi' /opt
      User quota on /opt (/dev/sdf)
                              Inodes
      User ID      Used   Soft   Hard Warn/Grace
      ---------- ---------------------------------
      root            1      0      0  00 [------]
      nobody          1      0      0  00 [------]
      
      # umount /opt
      
      Notice how the initial quotacheck set the root dquot icount to 3
      (rootino, rbmino, rsumino), but after shutdown -> remount -> recovery,
      xfs_quota reports that the root dquot has only 1 icount.  We haven't
      deleted anything from the filesystem, which means that quota is now
      under-counting.  This behavior is not limited to icount or the root
      dquot, but this is the shortest reproducer.
      
      I traced the cause of this discrepancy to the way that we handle ondisk
      dquot updates during quotacheck vs. regular fs activity.  Normally, when
      we allocate a disk block for a dquot, we log the buffer as a regular
      (dquot) buffer.  Subsequent updates to the dquots backed by that block
      are done via separate dquot log item updates, which means that they
      depend on the logged buffer update being written to disk before the
      dquot items.  Because individual dquots have their own LSN fields, that
      initial dquot buffer must always be recovered.
      
      However, the story changes for quotacheck, which can cause dquot block
      allocations but persists the final dquot counter values via a delwri
      list.  Because recovery doesn't gate dquot buffer replay on an LSN, this
      means that the initial dquot buffer can be replayed over the (newer)
      contents that were delwritten at the end of quotacheck.  In effect, this
      re-initializes the dquot counters after they've been updated.  If the
      log does not contain any other dquot items to recover, the obsolete
      dquot contents will not be corrected by log recovery.
      
      Because quotacheck uses a transaction to log the setting of the CHKD
      flags in the superblock, we skip quotacheck during the second mount
      call, which allows the incorrect icount to remain.
      
      Fix this by changing the ondisk dquot initialization function to use
      ordered buffers to write out fresh dquot blocks if it detects that we're
      running quotacheck.  If the system goes down before quotacheck can
      complete, the CHKD flags will not be set in the superblock and the next
      mount will run quotacheck again, which can fix uninitialized dquot
      buffers.  This requires amending the defer code to maintaine ordered
      buffer state across defer rolls for the sake of the dquot allocation
      code.
      
      For regular operations we preserve the current behavior since the dquot
      items require properly initialized ondisk dquot records.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
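The replay problem in 78bba5c8 can be reduced to a few lines. The following Python sketch is a deliberately simplified model, not the kernel's behavior in detail: `disk` and `log` are plain containers, the record kinds `"dquot_buf"` and `"sb_chkd"` are made-up labels, and an ordered buffer is modeled simply as "written to disk without its contents being logged".

```python
def quotacheck_crash_recover(use_ordered_buffer):
    """Run a toy quotacheck, crash, then replay the log onto the disk."""
    disk = {}  # on-disk dquot block: owner -> inode count
    log = []   # records that log recovery will replay after the crash

    # Allocating the dquot block initializes every counter to zero.
    fresh = {"root": 0, "nobody": 0}
    if use_ordered_buffer:
        disk.update(fresh)  # ordered: hits disk, contents are not logged
    else:
        log.append(("dquot_buf", dict(fresh)))  # regular logged buffer

    # Quotacheck logs the CHKD flag, then delwrites the final counters.
    log.append(("sb_chkd", True))
    disk.update({"root": 3, "nobody": 1})

    # Crash and recovery: buffer replay is not gated on an LSN, so a
    # logged init buffer overwrites whatever is on disk.
    chkd = False
    for kind, payload in log:
        if kind == "dquot_buf":
            disk.update(payload)
        elif kind == "sb_chkd":
            chkd = payload
    return disk, chkd
```

In the logged-buffer case the counters come back as zero while CHKD is set, so the next mount skips quotacheck and the stale values stay. In the ordered-buffer case recovery has no buffer image to replay and the delwritten counters survive.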
  4. 05 May 2020, 6 commits
  5. 27 Aug 2019, 1 commit
  6. 29 Jun 2019, 1 commit
  7. 30 Apr 2019, 1 commit
    • xfs: always rejoin held resources during defer roll · 710d707d
      Authored by Darrick J. Wong
      During testing of xfs/141 on a V4 filesystem, I observed some
      inconsistent behavior with regards to resources that are held (i.e.
      remain locked) across a defer roll.  The transaction roll always gives
      the defer roll function a new transaction, even if committing the old
      transaction fails.  However, the defer roll function only rejoins the
      held resources if the transaction commit succeeded.  This means that
      callers of defer roll have to figure out whether the held resources are
      attached to the transaction being passed back.
      
      Worse yet, if the defer roll was part of a defer finish call, we have a
      third possibility: the defer finish could pass back a dirty transaction
      with dirty held resources and an error code.
      
      The only sane way to handle all of these scenarios is to require that
      the code that held the resource either cancel the transaction before
      unlocking and releasing the resources, or use functions that detach
      resources from a transaction properly (e.g.  xfs_trans_brelse) if they
      need to drop the reference before committing or cancelling the
      transaction.
      
      In order to make this so, change the defer roll code to join held
      resources to the new transaction unconditionally and fix all the bhold
      callers to release the held buffers correctly.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
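The ambiguity that 710d707d removes can be sketched as follows. This Python model is illustrative only: `Trans` and `defer_roll` are hypothetical stand-ins, held resources are plain strings, and the error value is an arbitrary label.

```python
class Trans:
    """Toy transaction that only tracks which resources are joined to it."""

    def __init__(self):
        self.joined = set()


def defer_roll(held, commit_ok, always_rejoin):
    """Roll a transaction: the caller always gets a new transaction back.

    `held` is the set of resources that stay locked across the roll.
    Returns (new_tp, error).
    """
    new_tp = Trans()  # handed back even when committing the old one fails
    if commit_ok or always_rejoin:
        new_tp.joined |= held
    error = None if commit_ok else "commit failed"
    return new_tp, error
```

With `always_rejoin=False` (the old behavior) a failed commit returns a transaction that does not own the held resources, so the caller has to work out which cleanup path applies. With `always_rejoin=True` the held resources are attached in both cases and cancelling the returned transaction is always the right cleanup.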
  8. 13 Dec 2018, 2 commits
  9. 03 Aug 2018, 11 commits
  10. 27 Jul 2018, 6 commits
    • xfs: bypass final dfops roll in trans commit path · b277c37f
      Authored by Brian Foster
      Once xfs_defer_finish() has completed all deferred operations, it
      checks the dirty state of the transaction and rolls it once more to
      return a clean transaction for the caller. This is primarily to cover
      the case where repeated xfs_defer_finish() calls are made in a loop
      and we need to make sure that the caller starts the next iteration
      with a clean transaction. Otherwise we risk transaction reservation
      overrun.
      
      This final transaction roll is not required in the transaction
      commit path, however, because the transaction is immediately
      committed and freed after dfops completion. Refactor the final roll
      into a separate helper such that we can avoid it in the transaction
      commit path.  Lift the dfops reset as well so dfops remains valid
      until after the last call to xfs_defer_trans_roll(). The reset is
      also unnecessary in the transaction commit path because the
      transaction is about to complete.
      
      This eliminates unnecessary regrants of transactions where the
      associated transaction roll can be replaced by a transaction commit.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
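The split between finishing with and without the final roll, as described in b277c37f, can be modeled in a few lines. This is a Python sketch under simplifying assumptions: one roll per deferred item, a boolean dirty flag, and hypothetical function names that only mirror the roles described in the commit message.

```python
class Trans:
    """Toy transaction: a queue of deferred items, a dirty flag, a roll count."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.dirty = False
        self.rolls = 0

    def roll(self):
        """Commit the current state and continue with a clean reservation."""
        self.rolls += 1
        self.dirty = False


def defer_finish_noroll(tp):
    """Process every deferred item; may leave the transaction dirty."""
    while tp.pending:
        tp.roll()          # fresh reservation before finishing the next item
        tp.pending.pop(0)  # finishing the item logs changes...
        tp.dirty = True    # ...which dirties the transaction


def defer_finish(tp):
    """Finish deferred items and hand back a clean transaction."""
    defer_finish_noroll(tp)
    if tp.dirty:
        tp.roll()  # final roll so a looping caller starts clean


def trans_commit(tp):
    """Commit path: the transaction ends here, so skip the final roll."""
    defer_finish_noroll(tp)
    tp.dirty = False  # the commit itself writes out the remaining changes
```

For two deferred items, `defer_finish` rolls three times and returns a clean transaction, while `trans_commit` rolls only twice because the commit takes the place of the last roll.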
    • xfs: drop unnecessary xfs_defer_finish() dfops parameter · 9e28a242
      Authored by Brian Foster
      Every caller of xfs_defer_finish() now passes the transaction and
      its associated ->t_dfops. The xfs_defer_ops parameter is therefore
      no longer necessary and can be removed.
      
      Since most xfs_defer_finish() callers also have to consider
      xfs_defer_cancel() on error, update the latter to also receive the
      transaction for consistency. The log recovery code contains an
      outlier case that cancels a dfops directly without an available
      transaction. Retain an internal wrapper to support this outlier case
      for the time being.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: support embedded dfops in transaction · e021a2e5
      Authored by Brian Foster
      The dfops structure used by multi-transaction operations is
      typically stored on the stack and carried around by the associated
      transaction. The lifecycle of dfops does not quite match that of the
      transaction, but they are tightly related in that the former depends
      on the latter.
      
      The relationship of these objects is tight enough that we can avoid
      the cumbersome boilerplate code required in most cases to manage
      them separately by just embedding an xfs_defer_ops in the
      transaction itself. This means that a transaction allocation returns
      with an initialized dfops, a transaction commit finishes pending
      deferred items before the tx commit, a transaction cancel cancels
      the dfops before the transaction and a transaction dup operation
      transfers the current dfops state to the new transaction.
      
      The dup operation is slightly complicated by the fact that we can no
      longer just copy a dfops pointer from the old transaction to the new
      transaction. This is solved through a dfops move helper that
      transfers the pending items and other dfops state across the
      transactions. This also requires that transaction rolling code
      always refer to the transaction for the current dfops reference.
      
      Finally, to facilitate incremental conversion to the internal dfops
      and continue to support the current external dfops mode of
      operation, create the new ->t_dfops_internal field with a layer of
      indirection. On allocation, ->t_dfops points to the internal dfops.
      This state is overridden by callers who re-init a local dfops on the
      transaction. Once ->t_dfops is overridden, the external dfops
      reference is maintained as the transaction rolls.
      
      This patch adds the fundamental ability to support an internal
      dfops. All codepaths that perform deferred processing continue to
      override the internal dfops until they are converted over in
      subsequent patches.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
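The indirection introduced in e021a2e5 can be pictured with a small Python model. It is a sketch, not the kernel data structures: the attribute names `t_dfops` and `t_dfops_internal` follow the commit message, while `DeferOps`, `Trans.dup`, and the `low_space` flag are simplified, hypothetical stand-ins.

```python
class DeferOps:
    """Toy deferred-ops state: pending items plus one mode flag."""

    def __init__(self):
        self.pending = []
        self.low_space = False


class Trans:
    """Toy transaction with an embedded dfops and a pointer to the active one."""

    def __init__(self):
        self.t_dfops_internal = DeferOps()
        self.t_dfops = self.t_dfops_internal  # internal dfops by default

    def dup(self):
        """Create the next transaction in a roll, carrying dfops state over."""
        ntp = Trans()
        if self.t_dfops is self.t_dfops_internal:
            # Embedded dfops cannot be shared by pointer, so move its state.
            ntp.t_dfops.pending = self.t_dfops.pending
            ntp.t_dfops.low_space = self.t_dfops.low_space
            self.t_dfops.pending = []
        else:
            # A caller overrode t_dfops with an external dfops: keep the
            # same external reference across the roll.
            ntp.t_dfops = self.t_dfops
        return ntp
```

After `dup`, code that cached the old dfops reference would be looking at an emptied structure in the embedded case, which is why rolling code has to re-read `t_dfops` from the current transaction.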
    • xfs: reset dfops to initial state after finish · 509308b4
      Authored by Brian Foster
      xfs_defer_init() is currently used in two particular situations. The
      first and most obvious case is raw initialization of an
      xfs_defer_ops struct. The other case is partial reinit of
      xfs_defer_ops on reuse due to iteration.
      
      Most instances of the first case will be replaced by a single init
      of a dfops embedded in the transaction. Init calls are still
      technically required for the second case because the dfops may have
      low space mode enabled or have joined items that need to be reset
      before the dfops should be reused.
      
      Since the current dfops usage expects either a final transaction
      commit after xfs_defer_finish() or xfs_defer_init() if dfops is to
      be reused, we can shift some of the init logic into
      xfs_defer_finish() such that the latter returns with a reinitialized
      dfops. This eliminates the second dependency noted above such that a
      dfops is immediately ready for reuse after an xfs_defer_finish()
      without the need to change any calling code.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove unused deferred ops committed field · 83200bfa
      Authored by Brian Foster
      dop_committed is set when deferred item processing rolls the
      transaction at least once, but is only ever accessed in tracepoints.
      The transaction roll/commit events are already available via
      independent tracepoints, so remove the otherwise unused field.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: make deferred processing safe for embedded dfops · 03f4e4b2
      Authored by Brian Foster
      xfs_defer_finish() has a couple quirks that are not safe with
      respect to the upcoming internal dfops functionality. First,
      xfs_defer_finish() attaches the passed in dfops structure to
      ->t_dfops and caches and restores the original value. Second, it
      continues to use the initial dfops reference before and after the
      transaction roll.
      
      These behaviors assume that dop is an independent memory allocation
      from the transaction itself, which may not always be true once
      transactions begin to use an embedded dfops structure. In the latter
      model, dfops processing creates a new xfs_defer_ops structure with
      each transaction and the associated state is migrated across to the
      new transaction.
      
      Fix up xfs_defer_finish() to handle the possibility of the current
      dfops changing after a transaction roll. Since ->t_dfops is used
      unconditionally in this path, it is no longer necessary to
      attach/restore ->t_dfops and pass it explicitly down to
      xfs_defer_trans_roll(). Update dop in the latter function and the
      caller to ensure that it always refers to the current dfops
      structure.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  11. 24 Jul 2018, 1 commit
    • xfs: return from _defer_finish with a clean transaction · 81b549aa
      Authored by Darrick J. Wong
      The following assertion was seen on generic/051:
      
      XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
      ------------[ cut here ]------------
      kernel BUG at fs/xfs/xfs_message.c:102!
      invalid opcode: 0000 [#1] SMP PTI
      CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
      RIP: 0010:assfail+0x23/0x30
      Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa
      RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
      RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
      RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
      R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
      FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
      Call Trace:
       xfs_defer_init+0xff/0x160
       xfs_reflink_remap_extent+0x31b/0xa00
       xfs_reflink_remap_blocks+0xec/0x4a0
       xfs_reflink_remap_range+0x3a1/0x650
       xfs_file_dedupe_range+0x39/0x50
       vfs_dedupe_file_range+0x218/0x260
       do_vfs_ioctl+0x262/0x6a0
       ? __se_sys_newfstat+0x3c/0x60
       ksys_ioctl+0x35/0x60
       __x64_sys_ioctl+0x11/0x20
       do_syscall_64+0x4b/0x190
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The root cause of the assertion failure is that xfs_defer_finish doesn't
      roll the transaction after processing all the deferred items.  Therefore
      it returns a dirty transaction to the caller, which leaves the caller at
      risk of exceeding the transaction reservation if it logs more items.
      
      Brian Foster's patchset to move the defer_ops firstblock into the
      transaction requires t_firstblock == NULLFSBLOCK upon defer_ops
      initialization, which is how this was noticed at all.
      Reported-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  12. 12 Jul 2018, 3 commits
  13. 07 Jun 2018, 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Authored by Dave Chinner
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  14. 10 May 2018, 2 commits
  15. 15 Dec 2017, 1 commit
  16. 02 Sep 2017, 1 commit