1. 27 3月, 2020 2 次提交
  2. 23 3月, 2020 3 次提交
  3. 11 11月, 2019 1 次提交
  4. 22 10月, 2019 1 次提交
  5. 27 8月, 2019 1 次提交
  6. 29 6月, 2019 6 次提交
  7. 15 4月, 2019 1 次提交
    • B
      xfs: wake commit waiters on CIL abort before log item abort · 545aa41f
      Brian Foster 提交于
      XFS shutdown deadlocks have been reproduced by fstest generic/475.
      The deadlock signature involves log I/O completion running error
      handling to abort logged items and waiting for an inode cluster
      buffer lock in the buffer item unpin handler. The buffer lock is
      held by xfsaild attempting to flush an inode. The buffer happens to
      be pinned and so xfs_iflush() triggers an async log force to begin
      work required to get it unpinned. The log force is blocked waiting
      on the commit completion, which never occurs and thus leaves the
      filesystem deadlocked.
      
      The root problem is that aborted log I/O completion pots commit
      completion behind callback completion, which is unexpected for async
      log forces. Under normal running conditions, an async log force
      returns to the caller once the CIL ctx has been formatted/submitted
      and the commit completion event triggered at the tail end of
      xlog_cil_push(). If the filesystem has shutdown, however, we rely on
      xlog_cil_committed() to trigger the completion event and it happens
      to do so after running log item unpin callbacks. This makes it
      unsafe to invoke an async log force from contexts that hold locks
      that might also be required in log completion processing.
      
      To address this problem, wake commit completion waiters before
      aborting log items in the log I/O completion handler. This ensures
      that an async log force will not deadlock on held locks if the
      filesystem happens to shutdown. Note that it is still unsafe to
      issue a sync log force while holding such locks because a sync log
      force explicitly waits on the force completion, which occurs after
      log I/O completion processing.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      545aa41f
  8. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  9. 10 5月, 2018 2 次提交
  10. 10 4月, 2018 1 次提交
  11. 12 3月, 2018 1 次提交
    • D
      xfs: fall back to vmalloc when allocation log vector buffers · cb0a8d23
      Dave Chinner 提交于
      When using large directory blocks, we regularly see memory
      allocations of >64k being made for the shadow log vector buffer.
      When we are under memory pressure, kmalloc() may not be able to find
      contiguous memory chunks large enough to satisfy these allocations
      easily, and if memory is fragmented we can potentially stall here.
      
      TO avoid this problem, switch the log vector buffer allocation to
      use kmem_alloc_large(). This will allow failed allocations to fall
      back to vmalloc and so remove the dependency on large contiguous
      regions of memory being available. This should prevent slowdowns
      and potential stalls when memory is low and/or fragmented.
      Signed-Off-By: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      cb0a8d23
  12. 05 8月, 2017 1 次提交
  13. 19 6月, 2017 4 次提交
    • S
      xfs: remove lsn relevant fields from xfs_trans structure and its users · f990fc5a
      Shan Hai 提交于
      The t_lsn is not used anymore and the t_commit_lsn is used as a tmp
      storage for the checkpoint sequence number only in the current code.
      
      And the start/commit lsn are tracked as a transaction group tag in
      the xfs_cil_ctx instead of a single transaction, so remove them from
      the xfs_trans structure and their users to match with the design.
      Signed-off-by: NShan Hai <shan.hai@oracle.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f990fc5a
    • B
      xfs: dump transaction usage details on log reservation overrun · d4ca1d55
      Brian Foster 提交于
      If a transaction log reservation overrun occurs, the ticket data
      associated with the reservation is dumped in xfs_log_commit_cil().
      This occurs long after the transaction items and details have been
      removed from the transaction and effectively lost. This limited set
      of ticket data provides very little information to support debugging
      transaction overruns based on the typical report.
      
      To improve transaction log reservation overrun reporting, create a
      helper to dump transaction details such as log items, log vector
      data, etc., as well as the underlying ticket data for the
      transaction. Move the overrun detection from xfs_log_commit_cil() to
      xlog_cil_insert_items() so it occurs prior to migration of the
      logged items to the CIL. Call the new helper such that it is able to
      dump this transaction data before it is lost.
      
      Also, warn on overrun to provide callstack context for the offending
      transaction and include a few additional messages from
      xlog_cil_insert_items() to display the reservation consumed locally
      for overhead such as log vector headers, split region headers and
      the context ticket. This provides a complete general breakdown of
      the reservation consumption of a transaction when/if it happens to
      overrun the reservation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d4ca1d55
    • B
      xfs: refactor xlog_cil_insert_items() to facilitate transaction dump · e2f23426
      Brian Foster 提交于
      Transaction reservation overrun detection currently occurs too late
      to print useful information about the offending transaction.
      Ideally, the transaction data is printed before the associated log
      items are moved from the transaction to the CIL, which occurs in
      xlog_cil_insert_items(), such that details of the items logged by
      the transaction are available for analysis.
      
      Refactor xlog_cil_insert_items() to facilitate moving tx overrun
      detection to this function. Update the function to track each bit of
      extra log reservation stolen from the transaction (i.e., such as for
      the CIL context ticket) and perform the log item migration as the
      last operation before the CIL lock is released. This creates a
      context where the transaction reservation consumption has been fully
      calculated when the log items are moved to the CIL. This patch makes
      no functional changes.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e2f23426
    • B
      xfs: separate shutdown from ticket reservation print helper · 7d2d5653
      Brian Foster 提交于
      xlog_print_tic_res() pre-dates delayed logging and the committed
      items list (CIL) and thus retains some factoring warts, such as hard
      coded function names in the output and the fact that it induces a
      shutdown.
      
      In preparation for more detailed logging of regular transaction
      overrun situations, refactor xlog_print_tic_res() to be slightly
      more generic. Reword some of the warning messages and pull the
      shutdown into the callers.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7d2d5653
  14. 10 2月, 2017 1 次提交
  15. 22 7月, 2016 1 次提交
    • D
      xfs: allocate log vector buffers outside CIL context lock · b1c5ebb2
      Dave Chinner 提交于
      One of the problems we currently have with delayed logging is that
      under serious memory pressure we can deadlock memory reclaim. THis
      occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
      inodes and issues a log force to unpin inodes that are dirty in the
      CIL.
      
      The CIL is pushed, but this will only occur once it gets the CIL
      context lock to ensure that all committing transactions are complete
      and no new transactions start being committed to the CIL while the
      push switches to a new context.
      
      The deadlock occurs when the CIL context lock is held by a
      committing process that is doing memory allocation for log vector
      buffers, and that allocation is then blocked on memory reclaim
      making progress. Memory reclaim, however, is blocked waiting for
      a log force to make progress, and so we effectively deadlock at this
      point.
      
      To solve this problem, we have to move the CIL log vector buffer
      allocation outside of the context lock so that memory reclaim can
      always make progress when it needs to force the log. The problem
      with doing this is that a CIL push can take place while we are
      determining if we need to allocate a new log vector buffer for
      an item and hence the current log vector may go away without
      warning. That means we canot rely on the existing log vector being
      present when we finally grab the context lock and so we must have a
      replacement buffer ready to go at all times.
      
      To ensure this, introduce a "shadow log vector" buffer that is
      always guaranteed to be present when we gain the CIL context lock
      and format the item. This shadow buffer may or may not be used
      during the formatting, but if the log item does not have an existing
      log vector buffer or that buffer is too small for the new
      modifications, we swap it for the new shadow buffer and format
      the modifications into that new log vector buffer.
      
      The result of this is that for any object we modify more than once
      in a given CIL checkpoint, we double the memory required
      to track dirty regions in the log. For single modifications then
      we consume the shadow log vectorwe allocate on commit, and that gets
      consumed by the checkpoint. However, if we make multiple
      modifications, then the second transaction commit will allocate a
      shadow log vector and hence we will end up with double the memory
      usage as only one of the log vectors is consumed by the CIL
      checkpoint. The remaining shadow vector will be freed when th elog
      item is freed.
      
      This can probably be optimised in future - access to the shadow log
      vector is serialised by the object lock (as opposited to the active
      log vector, which is controlled by the CIL context lock) and so we
      can probably free shadow log vector from some objects when the log
      item is marked clean on removal from the AIL.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b1c5ebb2
  16. 06 4月, 2016 1 次提交
  17. 29 7月, 2015 1 次提交
    • B
      xfs: close xc_cil list_empty() races with cil commit sequence · 4703da7b
      Brian Foster 提交于
      We have seen somewhat rare reports of the following assert from
      xlog_cil_push_background() failing during ltp tests or somewhat
      innocuous desktop root fs workloads (e.g., virt operations, initramfs
      construction):
      
          ASSERT(!list_empty(&cil->xc_cil));
      
      The reasoning behind the assert is that the transaction has inserted
      items to the CIL and hit background push codepath all with
      cil->xc_ctx_lock held for reading. This locks out background commit from
      emptying the CIL, which acquires the lock for writing. Therefore, the
      reasoning is that the items previously inserted in the CIL should still
      be present.
      
      The cil->xc_ctx_lock read lock is not sufficient to protect the xc_cil
      list, however, due to how CIL insertion is handled.
      xlog_cil_insert_items() inserts and reorders the dirty transaction items
      to the tail of the CIL under xc_cil_lock. It uses list_move_tail() to
      achieve insertion and reordering in the same block of code. This
      function removes and reinserts an item to the tail of the list. If a
      transaction commits an item that was already logged and thus already
      resides in the CIL, and said item is the sole item on the list, the
      removal and reinsertion creates a temporary state where the list is
      actually empty.
      
      This state is not valid and thus should never be observed by concurrent
      transaction commit-side checks in the circumstances outlined above. We
      do not want to acquire the xc_cil_lock in all of these instances as it
      was previously removed and replaced with a separate push lock for
      performance reasons. Therefore, close any races with list_empty() on the
      insertion side by ensuring that the list is never in a transient empty
      state.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4703da7b
  18. 04 6月, 2015 3 次提交
  19. 28 11月, 2014 2 次提交
  20. 23 9月, 2014 1 次提交
    • D
      xfs: xlog_cil_force_lsn doesn't always wait correctly · 8af3dcd3
      Dave Chinner 提交于
      When running a tight mount/unmount loop on an older kernel, RedHat
      QE found that unmount would occasionally hang in
      xfs_buf_unpin_wait() on the superblock buffer. Tracing and other
      debug work by Eric Sandeen indicated that it was hanging on the
      writing of the superblock during unmount immediately after logging
      the superblock counters in a synchronous transaction. Further debug
      indicated that the synchronous transaction was not waiting for
      completion correctly, and we narrowed it down to
      xlog_cil_force_lsn() returning NULLCOMMITLSN and hence not pushing
      the transaction in the iclog buffer to disk correctly.
      
      While this unmount superblock write code is now very different in
      mainline kernels, the xlog_cil_force_lsn() code is identical, and it
      was bisected to the backport of commit f876e446 ("xfs: always do log
      forces via the workqueue"). This commit made the CIL push
      asynchronous for log forces and hence exposed a race condition that
      couldn't occur on a synchronous push.
      
      Essentially, the xlog_cil_force_lsn() relied implicitly on the fact
      that the sequence push would be complete by the time
      xlog_cil_push_now() returned, resulting in the context being pushed
      being in the committing list. When it was made asynchronous, it was
      recognised that there was a race condition in detecting whether an
      asynchronous push has started or not and code was added to handle
      it.
      
      Unfortunately, the fix was not quite right and left a race condition
      where it it would detect an empty CIL while a push was in progress
      before the context had been added to the committing list. This was
      incorrectly seen as a "nothing to do" condition and so would tell
      xfs_log_force_lsn() that there is nothing to wait for, and hence it
      would push the iclogbufs in memory.
      
      The fix is simple, but explaining the logic and the race condition
      is a lot more complex. The fix is to add the context to the
      committing list before we start emptying the CIL. This allows us to
      detect the difference between an empty "do nothing" push and a push
      that has not started by adding a discrete "emptying the CIL" state
      to avoid the transient, incorrect "empty" condition that the
      (unchanged) waiting code was seeing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8af3dcd3
  21. 24 7月, 2014 1 次提交
    • M
      xfs: fix cil push sequence after log recovery · 5c18717e
      Mark Tinguely 提交于
      When the CIL checkpoint is fully written to the log, the LSN of the checkpoint
      commit record is written into the CIL context structure. This allows log force
      waiters to correctly detect when the checkpoint they are waiting on have been
      fully written into the log buffers.
      
      However, the initial context after mount is initialised with a non-zero commit
      LSN, so appears to waiters as though it is complete even though it may not have
      even been pushed, let alone written to the log buffers. Hence a log force
      immediately after a filesystem is mounted may not behave correctly, nor does
      commit record ordering if multiple CIL pushes interleave immediately after
      mount.
      
      To fix this, make sure the initial context commit LSN is not touched until the
      first checkpointis actually pushed.
      
      [dchinner: rewrite commit message]
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5c18717e
  22. 25 6月, 2014 1 次提交
    • D
      xfs: global error sign conversion · 2451337d
      Dave Chinner 提交于
      Convert all the errors the core XFs code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2451337d
  23. 22 6月, 2014 1 次提交
  24. 20 5月, 2014 1 次提交
    • D
      xfs: log vector rounding leaks log space · 110dc24a
      Dave Chinner 提交于
      The addition of direct formatting of log items into the CIL
      linear buffer added alignment restrictions that the start of each
      vector needed to be 64 bit aligned. Hence padding was added in
      xlog_finish_iovec() to round up the vector length to ensure the next
      vector started with the correct alignment.
      
      This adds a small number of bytes to the size of
      the linear buffer that is otherwise unused. The issue is that we
      then use the linear buffer size to determine the log space used by
      the log item, and this includes the unused space. Hence when we
      account for space used by the log item, it's more than is actually
      written into the iclogs, and hence we slowly leak this space.
      
      This results on log hangs when reserving space, with threads getting
      stuck with these stack traces:
      
      Call Trace:
      [<ffffffff81d15989>] schedule+0x29/0x70
      [<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
      [<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
      [<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
      [<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
      .....
      
      The 4 bytes is significant. Brain Foster did all the hard work in
      tracking down a reproducable leak to inode chunk allocation (it went
      away with the ikeep mount option). His rough numbers were that
      creating 50,000 inodes leaked 11 log blocks. This turns out to be
      roughly 800 inode chunks or 1600 inode cluster buffers. That
      works out at roughly 4 bytes per cluster buffer logged, and at that
      I started looking for a 4 byte leak in the buffer logging code.
      
      What I found was that a struct xfs_buf_log_format structure for an
      inode cluster buffer is 28 bytes in length. This gets rounded up to
      32 bytes, but the vector length remains 28 bytes. Hence the CIL
      ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
      for that vector rather than 28 bytes which are written into the log.
      
      The fix for this problem is to separately track the bytes used by
      the log vectors in the item and use that instead of the buffer
      length when accounting for the log space that will be used by the
      formatted log item.
      
      Again, thanks to Brian Foster for doing all the hard work and long
      hours to isolate this leak and make finding the bug relatively
      simple.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      110dc24a
  25. 07 5月, 2014 1 次提交
    • D
      xfs: don't sleep in xlog_cil_force_lsn on shutdown · ac983517
      Dave Chinner 提交于
      Reports of a shutdown hang when fsyncing a directory have surfaced,
      such as this:
      
      [ 3663.394472] Call Trace:
      [ 3663.397199]  [<ffffffff815f1889>] schedule+0x29/0x70
      [ 3663.402743]  [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
      [ 3663.416249]  [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
      [ 3663.429271]  [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
      [ 3663.435873]  [<ffffffff811df8c5>] do_fsync+0x65/0xa0
      [ 3663.441408]  [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
      [ 3663.447043]  [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
      
      If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
      will never wake waiters on the current push sequence number, so
      anything waiting in xlog_cil_force_lsn() for that push sequence
      number to come up will not get woken and hence stall the shutdown.
      
      Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
      the push abort handling, in the log shutdown code when waking all
      waiters, and adding a shutdown check in the sequence completion wait
      loops to ensure they abort when a wakeup due to a shutdown occurs.
      Reported-by: NBoris Ranto <branto@redhat.com>
      Reported-by: NEric Sandeen <esandeen@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ac983517