1. 15 Apr, 2019 1 commit
  2. 21 Nov, 2018 1 commit
  3. 29 Sep, 2018 2 commits
    • B
      xfs: refactor xfs_buf_log_item reference count handling · 95808459
      Committed by Brian Foster
      The xfs_buf_log_item structure has a reference counter with slightly
      tricky semantics. In the common case, a buffer is logged and
      committed in a transaction, committed to the on-disk log (added to
      the AIL) and then finally written back and removed from the AIL. The
      bli refcount covers two potentially overlapping timeframes:
      
       1. the bli is held in an active transaction
       2. the bli is pinned by the log
      
      The caveat to this approach is that the reference counter does not
      purely dictate the lifetime of the bli. IOW, when a dirty buffer is
      physically logged and unpinned, the bli refcount may go to zero as
      the log item is inserted into the AIL. Only once the buffer is
      written back can the bli finally be freed.
      
      The above semantics mean that it is not enough for the various
      refcount decrementing contexts to release the bli on decrement to
      zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and
      unpin (->iop_unpin()) must all drop the associated reference and
      make additional checks to determine if the current context is
      responsible for freeing the item.
      
      For example, if a transaction holds but does not dirty a particular
      bli, the commit may drop the refcount to zero. If the bli itself is
      clean, it is also not AIL resident and must be freed at this time.
      The same is true for xfs_trans_brelse(). If the transaction dirties
      a bli and then aborts or an unpin results in an abort due to a log
      I/O error, the last reference count holder is expected to explicitly
      remove the item from the AIL and release it (since an abort means
      filesystem shutdown and metadata writeback will never occur).
      
      This leads to fairly complex checks being replicated in a few
      different places. Since ->iop_unlock() and xfs_trans_brelse() are
      nearly identical, refactor the logic into a common helper that
      implements and documents the semantics in one place. This patch does
      not change behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
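
The release rules described in this commit message can be sketched as a single decision function. This is a minimal model under stated assumptions, not the kernel code: `struct bli_model` and `bli_put()` are hypothetical names, and the AIL and the buffer are reduced to boolean flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the xfs_buf_log_item state. */
struct bli_model {
    int  refcount;  /* references held by transactions and by the log */
    bool dirty;     /* item carries logged changes */
    bool in_ail;    /* item is resident in the AIL */
    bool freed;     /* set when this model "frees" the item */
};

/*
 * Drop one reference. Returns true if the calling context ended up
 * freeing the item. 'aborted' models a transaction abort, which per the
 * commit message implies filesystem shutdown.
 */
static bool bli_put(struct bli_model *bli, bool aborted)
{
    assert(bli->refcount > 0);

    if (--bli->refcount > 0)
        return false;       /* another context still holds the item */

    if (aborted) {
        /* Writeback will never occur: leave the AIL, then free. */
        bli->in_ail = false;
        bli->freed = true;
        return true;
    }

    if (bli->dirty)
        return false;       /* dirty item lives on until writeback */

    /* Clean item: it is not AIL resident, so it is freed here. */
    bli->freed = true;
    return true;
}
```

The point of the sketch is that reaching a refcount of zero is necessary but not sufficient for freeing the item; the dirty and aborted states decide the rest.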
    • B
      xfs: don't unlock invalidated buf on aborted tx commit · d9183105
      Committed by Brian Foster
      xfstests generic/388,475 occasionally reproduce assertion failures
      in xfs_buf_item_unpin() when the final bli reference is dropped on
      an invalidated buffer and the buffer is not locked as it is expected
      to be. Invalidated buffers should remain locked on transaction
      commit until the final unpin, at which point the buffer is removed
      from the AIL and the bli is freed since stale buffers are not
      written back.
      
      The assert failures are associated with filesystem shutdown,
      typically due to log I/O errors injected by the test. The
      problematic situation can occur if the shutdown happens to cause a
      race between an active transaction that has invalidated a particular
      buffer and an I/O error on a log buffer that contains the bli
      associated with the same (now stale) buffer.
      
      Both transaction and log contexts acquire a bli reference. If the
      transaction has already invalidated the buffer by the time the I/O
      error occurs and ends up aborting due to shutdown, the transaction
      and log hold the last two references to a stale bli. If the
      transaction cancel occurs first, it treats the buffer as non-stale
      due to the aborted state: the bli reference is dropped and the
      buffer is released/unlocked. The log buffer I/O error handling
      eventually calls into xfs_buf_item_unpin(), drops the final
      reference to the bli and treats it as stale. The buffer wasn't left
      locked by xfs_buf_item_unlock(), however, so the assert fails and
      the buffer is double unlocked. The latter problem is mitigated by
      the fact that the fs is shut down and no further damage is possible.
      
      ->iop_unlock() of an invalidated buffer should behave consistently
      with respect to the bli refcount, regardless of aborted state. If
      the refcount remains elevated on commit, we know the bli is awaiting
      an unpin (since it can't be in another transaction) and will be
      handled appropriately on log buffer completion. If the final bli
      reference of an invalidated buffer is dropped in ->iop_unlock(), we
      can assume the transaction has aborted because invalidation implies
      a dirty transaction. In the non-abort case, the log would have
      acquired a bli reference in ->iop_pin() and prevented bli release at
      ->iop_unlock() time. In the abort case the item must be freed and
      buffer unlocked because it wasn't pinned by the log.
      
      Rework xfs_buf_item_unlock() to simplify the currently circuitous
      and duplicate logic and leave invalidated buffers locked based on
      bli refcount, regardless of aborted state. This ensures that a
      pinned, stale buffer is always found locked when eventually
      unpinned.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
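
The locking rule for invalidated buffers can be illustrated with a small model. It is only a sketch: `struct buf_model`, `model_item_unlock()` and `model_item_unpin()` are hypothetical names, and the free decision for non-stale items is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a buffer and its log item; not kernel structures. */
struct buf_model {
    int  bli_refcount;  /* references from the transaction and the log */
    bool stale;         /* buffer was invalidated by the transaction */
    bool locked;        /* buffer lock is currently held */
    bool bli_freed;     /* log item has been released */
};

/* Models the commit/cancel path: drop the transaction's reference. */
static void model_item_unlock(struct buf_model *bp)
{
    assert(bp->locked && bp->bli_refcount > 0);
    bool last = (--bp->bli_refcount == 0);

    if (bp->stale) {
        if (!last)
            return;         /* still pinned: stay locked for the unpin */
        /* Final reference on a stale item: the transaction aborted. */
        bp->bli_freed = true;
    }
    bp->locked = false;     /* free decision for non-stale items omitted */
}

/* Models the log side: drop the log's reference on unpin. */
static void model_item_unpin(struct buf_model *bp)
{
    assert(bp->bli_refcount > 0);
    bool last = (--bp->bli_refcount == 0);

    if (last && bp->stale) {
        assert(bp->locked); /* the expectation that used to be violated */
        bp->bli_freed = true;
        bp->locked = false;
    }
}
```

In this model the decision depends only on the stale flag and the refcount, never on an aborted flag, which mirrors the behavior the commit message argues for.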
  4. 09 Jun, 2018 1 commit
  5. 07 Jun, 2018 1 commit
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Committed by Dave Chinner
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  6. 10 May, 2018 3 commits
  7. 12 Mar, 2018 1 commit
  8. 29 Jan, 2018 3 commits
  9. 02 Sep, 2017 4 commits
  10. 23 Aug, 2017 2 commits
  11. 19 Jun, 2017 1 commit
    • B
      xfs: remove bli from AIL before release on transaction abort · 3d4b4a3e
      Committed by Brian Foster
      When a buffer is modified, logged and committed, it ultimately ends
      up sitting on the AIL with a dirty bli waiting for metadata
      writeback. If another transaction locks and invalidates the buffer
      (freeing an inode chunk, for example) in the meantime, the bli is
      flagged as stale, the dirty state is cleared and the bli remains in
      the AIL.
      
      If a shutdown occurs before the transaction that has invalidated the
      buffer is committed, the transaction is ultimately aborted. The log
      items are flagged as such and ->iop_unlock() handles the aborted
      items. Because the bli is clean (due to the invalidation),
      ->iop_unlock() unconditionally releases it. The log item may still
      reside in the AIL, however, which means the I/O completion handler
      may still run and attempt to access it. This results in assert
      failure due to the release of the bli while still present in the AIL
      and a subsequent NULL dereference and panic in the buffer I/O
      completion handling. This can be reproduced by running generic/388
      in repetition.
      
      To avoid this problem, update xfs_buf_item_unlock() to first check
      whether the bli is aborted and if so, remove it from the AIL before
      it is released. This ensures that the bli is no longer accessed
      during the shutdown sequence after it has been freed.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  12. 04 Feb, 2017 1 commit
  13. 14 Sep, 2016 2 commits
    • E
      xfs: normalize "infinite" retries in error configs · 77169812
      Committed by Eric Sandeen
      As it stands today, the "fail immediately" vs. "retry forever"
      values for max_retries and retry_timeout_seconds in the xfs metadata
      error configurations are not consistent.
      
      A retry_timeout_seconds of 0 means "retry forever," but a
      max_retries of 0 means "fail immediately."
      
      retry_timeout_seconds < 0 is disallowed, while max_retries == -1
      means "retry forever."
      
      Make this consistent across the error configs, such that a value of
      0 means "fail immediately" (i.e. wait 0 seconds, or retry 0 times),
      and a value of -1 always means "retry forever."
      
      This makes retry_timeout a signed long to accommodate the -1, even
      though it stores jiffies.  Given our limit of a 1 day maximum
      timeout, this should be sufficient even at much higher HZ values
      than we have available today.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
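
The normalized semantics (0 means "fail immediately", -1 means "retry forever") can be sketched as one predicate. The names `RETRY_FOREVER`, `struct err_cfg_model` and `should_retry()` are illustrative assumptions, not the kernel's identifiers, and time is modeled as a plain tick count.

```c
#include <assert.h>
#include <stdbool.h>

#define RETRY_FOREVER (-1)  /* hypothetical name for the "retry forever" value */

/* Hypothetical error configuration: both limits use the same convention. */
struct err_cfg_model {
    int  max_retries;    /* 0 = fail immediately, -1 = retry forever */
    long retry_timeout;  /* in ticks; 0 = fail immediately, -1 = forever */
};

/*
 * Decide whether a failed metadata write should be retried, given how many
 * retries were already made and how many ticks have elapsed since the first
 * failure.
 */
static bool should_retry(const struct err_cfg_model *cfg,
                         int retries_so_far, long elapsed)
{
    assert(retries_so_far >= 0 && elapsed >= 0);

    if (cfg->max_retries != RETRY_FOREVER &&
        retries_so_far >= cfg->max_retries)
        return false;   /* retry budget used up (0 means no retries) */

    if (cfg->retry_timeout != RETRY_FOREVER &&
        elapsed >= cfg->retry_timeout)
        return false;   /* time budget used up (0 means no waiting) */

    return true;
}
```

With a signed `long` holding the timeout, a one-day limit is 86400 × HZ ticks, which stays below 2^31 − 1 for HZ values up to roughly 24,855 even where `long` is 32 bits wide.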
    • X
      xfs: fix signed integer overflow · 79c350e4
      Committed by Xie XiuQi
      Use 1U for unsigned int to avoid an overflow warning from UBSAN.
      
      [   31.910858] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:889:25
      [   31.911252] signed integer overflow:
      [   31.911478] -2147483648 - 1 cannot be represented in type 'int'
      [   31.911846] CPU: 1 PID: 1011 Comm: tuned Tainted: G    B          ---- -------   3.10.0-327.28.3.el7.x86_64 #1
      [   31.911857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011
      [   31.911866]  1ffff1004069cd3b 0000000076bec3fd ffff8802034e69a0 ffffffff81ee3140
      [   31.911883]  ffff8802034e69b8 ffffffff81ee31fd ffffffffa0ad79e0 ffff8802034e6b20
      [   31.911898]  ffffffff81ee46e2 0000002d515470c0 0000000000000001 0000000041b58ab3
      [   31.911913] Call Trace:
      [   31.911932]  [<ffffffff81ee3140>] dump_stack+0x1e/0x20
      [   31.911947]  [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55
      [   31.911964]  [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215
      [   31.912083]  [<ffffffff81ee4798>] __ubsan_handle_sub_overflow+0x2a/0x31
      [   31.912204]  [<ffffffffa08676fb>] xfs_buf_item_log+0x34b/0x3f0 [xfs]
      [   31.912314]  [<ffffffffa0880490>] xfs_trans_log_buf+0x120/0x260 [xfs]
      [   31.912402]  [<ffffffffa079a890>] xfs_btree_log_recs+0x80/0xc0 [xfs]
      [   31.912490]  [<ffffffffa07a29f8>] xfs_btree_delrec+0x11a8/0x2d50 [xfs]
      [   31.913589]  [<ffffffffa07a86f9>] xfs_btree_delete+0xc9/0x260 [xfs]
      [   31.913762]  [<ffffffffa075b5cf>] xfs_free_ag_extent+0x63f/0xe20 [xfs]
      [   31.914339]  [<ffffffffa075ec0f>] xfs_free_extent+0x2af/0x3e0 [xfs]
      [   31.914641]  [<ffffffffa0801b2b>] xfs_bmap_finish+0x32b/0x4b0 [xfs]
      [   31.914841]  [<ffffffffa083c2e7>] xfs_itruncate_extents+0x3b7/0x740 [xfs]
      [   31.915216]  [<ffffffffa08342fa>] xfs_setattr_size+0x60a/0x860 [xfs]
      [   31.915471]  [<ffffffffa08345ea>] xfs_vn_setattr+0x9a/0xe0 [xfs]
      [   31.915590]  [<ffffffff8149ad38>] notify_change+0x5c8/0x8a0
      [   31.915607]  [<ffffffff81450f22>] do_truncate+0x122/0x1d0
      [   31.915640]  [<ffffffff8147beee>] do_last+0x15de/0x2c80
      [   31.915707]  [<ffffffff8147d777>] path_openat+0x1e7/0xcc0
      [   31.915802]  [<ffffffff81480824>] do_filp_open+0xa4/0x160
      [   31.915848]  [<ffffffff81453127>] do_sys_open+0x1b7/0x3f0
      [   31.915879]  [<ffffffff81453392>] SyS_open+0x32/0x40
      [   31.915897]  [<ffffffff81f08989>] system_call_fastpath+0x16/0x1b
      
      [  240.086809] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:866:34
      [  240.086820] signed integer overflow:
      [  240.086830] -2147483648 - 1 cannot be represented in type 'int'
      [  240.086846] CPU: 1 PID: 12969 Comm: rm Tainted: G    B          ---- -------   3.10.0-327.28.3.el7.x86_64 #1
      [  240.086857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011
      [  240.086868]  1ffff10040491def 00000000e2ea59c1 ffff88020248ef40 ffffffff81ee3140
      [  240.086885]  ffff88020248ef58 ffffffff81ee31fd ffffffffa0ad79e0 ffff88020248f0c0
      [  240.086901]  ffffffff81ee46e2 0000002d02488000 0000000000000001 0000000041b58ab3
      [  240.086915] Call Trace:
      [  240.086938]  [<ffffffff81ee3140>] dump_stack+0x1e/0x20
      [  240.086953]  [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55
      [  240.086971]  [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215
      ...
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
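
The class of bug reported by UBSAN here can be shown with a bit-mask helper. The exact expression in xfs_buf_item.c is not quoted in the commit message, so `low_mask()` below is a hypothetical example of the pattern, assuming a 32-bit `int`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a mask with the low 'nbits' bits set, for 0 <= nbits < 32.
 *
 * Writing this as (1 << nbits) - 1 uses signed int arithmetic: for
 * nbits == 31 the shift produces the most negative int, and subtracting 1
 * from it is the "-2147483648 - 1" overflow shown in the report above.
 * Using 1U keeps both the shift and the subtraction unsigned, where the
 * result is well defined.
 */
static uint32_t low_mask(unsigned int nbits)
{
    assert(nbits < 32);
    return (1U << nbits) - 1;
}
```

For example, `low_mask(31)` evaluates to `0x7fffffff` without any signed arithmetic taking place.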
  14. 22 Jul, 2016 1 commit
    • D
      xfs: allocate log vector buffers outside CIL context lock · b1c5ebb2
      Committed by Dave Chinner
      One of the problems we currently have with delayed logging is that
      under serious memory pressure we can deadlock memory reclaim. This
      occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
      inodes and issues a log force to unpin inodes that are dirty in the
      CIL.
      
      The CIL is pushed, but this will only occur once it gets the CIL
      context lock to ensure that all committing transactions are complete
      and no new transactions start being committed to the CIL while the
      push switches to a new context.
      
      The deadlock occurs when the CIL context lock is held by a
      committing process that is doing memory allocation for log vector
      buffers, and that allocation is then blocked on memory reclaim
      making progress. Memory reclaim, however, is blocked waiting for
      a log force to make progress, and so we effectively deadlock at this
      point.
      
      To solve this problem, we have to move the CIL log vector buffer
      allocation outside of the context lock so that memory reclaim can
      always make progress when it needs to force the log. The problem
      with doing this is that a CIL push can take place while we are
      determining if we need to allocate a new log vector buffer for
      an item and hence the current log vector may go away without
      warning. That means we cannot rely on the existing log vector being
      present when we finally grab the context lock and so we must have a
      replacement buffer ready to go at all times.
      
      To ensure this, introduce a "shadow log vector" buffer that is
      always guaranteed to be present when we gain the CIL context lock
      and format the item. This shadow buffer may or may not be used
      during the formatting, but if the log item does not have an existing
      log vector buffer or that buffer is too small for the new
      modifications, we swap it for the new shadow buffer and format
      the modifications into that new log vector buffer.
      
      The result of this is that for any object we modify more than once
      in a given CIL checkpoint, we double the memory required
      to track dirty regions in the log. For single modifications then
      we consume the shadow log vector we allocate on commit, and that gets
      consumed by the checkpoint. However, if we make multiple
      modifications, then the second transaction commit will allocate a
      shadow log vector and hence we will end up with double the memory
      usage as only one of the log vectors is consumed by the CIL
      checkpoint. The remaining shadow vector will be freed when the log
      item is freed.
      
      This can probably be optimised in future - access to the shadow log
      vector is serialised by the object lock (as opposed to the active
      log vector, which is controlled by the CIL context lock) and so we
      can probably free shadow log vector from some objects when the log
      item is marked clean on removal from the AIL.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
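
The two-phase scheme described above (allocate outside the lock, swap under the lock) might look roughly like the following. All names are hypothetical, the context lock itself is not modeled, and handing the displaced buffer back to the caller is an assumption of this sketch; the commit message does not say how the old buffer is disposed of.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical log vector buffer: a size header followed by payload. */
struct lv_model {
    size_t size;
    char   data[];
};

/* Hypothetical log item with an active and a shadow buffer. */
struct item_model {
    struct lv_model *lv;      /* active buffer, guarded by the context lock */
    struct lv_model *shadow;  /* spare buffer, guarded by the object lock */
};

/* Phase 1, outside the context lock: allocation may block on reclaim. */
static int alloc_shadow(struct item_model *it, size_t need)
{
    if (it->shadow && it->shadow->size >= need)
        return 0;                       /* existing shadow is big enough */
    free(it->shadow);
    it->shadow = malloc(sizeof(*it->shadow) + need);
    if (!it->shadow)
        return -1;
    it->shadow->size = need;
    return 0;
}

/*
 * Phase 2, under the context lock: no allocation happens here. Returns the
 * buffer to format into; a displaced active buffer is passed back through
 * 'displaced' so the caller can free it after dropping the lock.
 */
static struct lv_model *pick_format_buffer(struct item_model *it, size_t need,
                                           struct lv_model **displaced)
{
    *displaced = NULL;
    if (it->lv && it->lv->size >= need)
        return it->lv;                  /* reuse; the shadow stays allocated */

    assert(it->shadow && it->shadow->size >= need);
    *displaced = it->lv;
    it->lv = it->shadow;                /* shadow becomes the active buffer */
    it->shadow = NULL;
    return it->lv;
}
```

Note how a second modification that fits the active buffer leaves both buffers allocated, which is the doubled memory usage the message describes.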
  15. 20 Jul, 2016 2 commits
  16. 01 Jun, 2016 1 commit
    • B
      xfs: fix broken multi-fsb buffer logging · a3916e52
      Committed by Brian Foster
      Multi-block buffers are logged based on buffer offset in
      xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
      the buffer and marks the associated range to be logged in the
      xfs_buf_log_format bitmap for that mapping. This code is broken,
      however, in that it marks the actual buffer offsets of the associated
      range in each bitmap rather than shifting to the byte range for that
      particular mapping.
      
      For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
      of the second mapping in the buffer. This means byte 0 of the second log
      format bitmap should be tagged as dirty. Instead, the current code marks
      byte offset 4096 of the second log format bitmap, which is invalid and
      potentially out of range of the mapping.
      
      As a result of this, the log item format code invoked at transaction
      commit time is not able to correctly identify what parts of the
      buffer to copy into log vectors. This can lead to NULL log vector
      pointer dereferences in CIL push context if the item format code was not
      able to locate any dirty ranges at all. This crash has been reproduced
      on a 4k FSB filesystem using 16k directory blocks where an unlink
      operation happened not to log anything in the first block of the
      mapping. The logged offsets were all over 4k, marked as such in the
      subsequent log format mappings, and thus left the transaction with an
      xfs_log_item that is marked DIRTY but without any logged regions.
      
      Further, even when the logged regions are marked correctly in the buffer
      log format bitmaps, the format code doesn't copy the correct ranges of
      the buffer into the log. This means that any logged region beyond the
      first block of a multi-block buffer is subject to corruption after a
      crash and log recovery sequence. This is due to a failure to convert the
      mapping bm_len field from basic blocks to bytes in the buffer offset
      tracking code in xfs_buf_item_format().
      
      Update xfs_buf_item_log() to convert buffer offsets to segment relative
      offsets when logging multi-block buffers. This ensures that the modified
      regions of a buffer are logged correctly and avoids the aforementioned
      crash. Also update xfs_buf_item_format() to correctly track the source
      offset into the buffer for the log vector formatting code. This ensures
      that the correct data is copied into the log.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
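
The offset conversion at the heart of this fix can be sketched as follows. This is an illustrative model, not the kernel implementation: `split_logged_range()` and `struct seg_range` are hypothetical, mapping lengths are given in 512-byte basic blocks, and every mapping is assumed to be non-empty.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BBSHIFT 9  /* a basic block is 512 bytes in this model */

/* Per-mapping result: the dirty byte range relative to that mapping. */
struct seg_range {
    int      dirty;
    unsigned first;
    unsigned last;
};

/*
 * Translate the logged byte range [first, last] of a whole buffer into
 * ranges relative to each of its 'nmaps' mappings.
 */
static void split_logged_range(const unsigned *map_len_bb, size_t nmaps,
                               unsigned first, unsigned last,
                               struct seg_range *out)
{
    unsigned start = 0;  /* buffer offset where the current mapping begins */

    assert(first <= last);
    for (size_t i = 0; i < nmaps; i++) {
        /* Convert the mapping length from basic blocks to bytes. */
        unsigned len = map_len_bb[i] << MODEL_BBSHIFT;
        unsigned end = start + len - 1;

        out[i].dirty = 0;
        out[i].first = out[i].last = 0;
        if (first <= end && last >= start) {
            /* Clamp to this mapping, then shift to mapping-relative bytes. */
            out[i].dirty = 1;
            out[i].first = (first > start ? first : start) - start;
            out[i].last  = (last < end ? last : end) - start;
        }
        start += len;
    }
}
```

With two mappings of 8 basic blocks (4096 bytes) each, logging buffer offset 4096 marks byte 0 of the second mapping, which matches the example in the message.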
  17. 18 May, 2016 3 commits
  18. 10 Feb, 2016 3 commits
  19. 25 Aug, 2015 1 commit
  20. 19 Aug, 2015 1 commit
  21. 24 Feb, 2015 1 commit
  22. 22 Jan, 2015 1 commit
  23. 24 Dec, 2014 1 commit
  24. 28 Nov, 2014 1 commit
  25. 02 Oct, 2014 1 commit
    • D
      xfs: introduce xfs_buf_submit[_wait] · 595bff75
      Committed by Dave Chinner
      There is a lot of cookie-cutter code that looks like:
      
      	if (shutdown)
      		handle buffer error
      	xfs_buf_iorequest(bp)
      	error = xfs_buf_iowait(bp)
      	if (error)
      		handle buffer error
      
      spread through XFS. There's significant complexity now in
      xfs_buf_iorequest() to specifically handle this sort of synchronous
      IO pattern, but there's all sorts of nasty surprises in different
      error handling code dependent on who owns the buffer references and
      the locks.
      
      Pull this pattern into a single helper, where we can hide all the
      synchronous IO warts and hence make the error handling for all the
      callers much saner. This removes the need for a special extra
      reference to protect IO completion processing, as we can now hold a
      single reference across dispatch and waiting, simplifying the sync
      IO semantics and error handling.
      
      In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
      make it explicitly handle only asynchronous IO. This forces all users
      to be switched specifically to one interface or the other and
      removes any ambiguity between how the interfaces are to be used. It
      also means that xfs_buf_iowait() goes away.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
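
The shape of the consolidated helper can be sketched with a toy model. `struct io_buf_model` and `model_submit_wait()` are hypothetical; the actual dispatch and wait are replaced by a preset result, so this shows only the control flow and the single reference held across both steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer model; the real I/O is replaced by a preset result. */
struct io_buf_model {
    bool shutdown;  /* filesystem has been shut down */
    int  io_error;  /* result the simulated I/O reports (0 or -errno) */
    int  refs;      /* reference count on the buffer */
};

/*
 * Synchronous submit: check for shutdown, dispatch, wait, and return the
 * result, so callers need only a single error check.
 */
static int model_submit_wait(struct io_buf_model *bp)
{
    assert(bp->refs > 0);       /* the caller holds its own reference */

    if (bp->shutdown)
        return -EIO;

    bp->refs++;                 /* one reference spans dispatch and wait */
    int error = bp->io_error;   /* stands in for dispatching and waiting */
    bp->refs--;

    return error;
}
```

A caller then reduces the cookie-cutter pattern to `error = model_submit_wait(bp); if (error) ...`, with the shutdown case folded into the same error path.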