1. 23 Aug, 2017 1 commit
  2. 19 Jun, 2017 1 commit
    • xfs: remove bli from AIL before release on transaction abort · 3d4b4a3e
      Committed by Brian Foster
      When a buffer is modified, logged and committed, it ultimately ends
      up sitting on the AIL with a dirty bli waiting for metadata
      writeback. If another transaction locks and invalidates the buffer
      (freeing an inode chunk, for example) in the meantime, the bli is
      flagged as stale, the dirty state is cleared and the bli remains in
      the AIL.
      
      If a shutdown occurs before the transaction that has invalidated the
      buffer is committed, the transaction is ultimately aborted. The log
      items are flagged as such and ->iop_unlock() handles the aborted
      items. Because the bli is clean (due to the invalidation),
      ->iop_unlock() unconditionally releases it. The log item may still
      reside in the AIL, however, which means the I/O completion handler
      may still run and attempt to access it. This results in assert
      failure due to the release of the bli while still present in the AIL
      and a subsequent NULL dereference and panic in the buffer I/O
      completion handling. This can be reproduced by running generic/388
      in repetition.
      
      To avoid this problem, update xfs_buf_item_unlock() to first check
      whether the bli is aborted and if so, remove it from the AIL before
      it is released. This ensures that the bli is no longer accessed
      during the shutdown sequence after it has been freed.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
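The ordering described in this commit (take an aborted item off the AIL first, only then free it) can be illustrated with a toy model. This is a minimal sketch, not the kernel code: `toy_item`, `toy_ail`, `toy_ail_remove()` and `toy_item_unlock()` are hypothetical names, and the AIL is reduced to a single slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for a buffer log item and the AIL; all names are hypothetical. */
struct toy_item {
	bool in_ail;	/* item is still tracked by the AIL */
	bool dirty;	/* item has changes awaiting writeback */
	bool aborted;	/* owning transaction was aborted */
};

struct toy_ail {
	struct toy_item *slot;	/* single-entry AIL, enough for the sketch */
};

static void toy_ail_remove(struct toy_ail *ail, struct toy_item *item)
{
	if (item->in_ail && ail->slot == item)
		ail->slot = NULL;
	item->in_ail = false;
}

/* Returns true when the item was freed, false when it must be kept. */
static bool toy_item_unlock(struct toy_ail *ail, struct toy_item *item)
{
	if (item->dirty && !item->aborted)
		return false;	/* still needed for writeback */

	/* On abort, drop the AIL reference before the item goes away. */
	if (item->aborted)
		toy_ail_remove(ail, item);

	assert(!item->in_ail);	/* never free an item the AIL can still reach */
	free(item);
	return true;
}
```

A clean, stale item that is still in the AIL and belongs to an aborted transaction is removed from the AIL before `free()`, so nothing that later walks the AIL can reach freed memory.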
  3. 04 Feb, 2017 1 commit
  4. 14 Sep, 2016 2 commits
    • xfs: normalize "infinite" retries in error configs · 77169812
      Committed by Eric Sandeen
      As it stands today, the "fail immediately" vs. "retry forever"
      values for max_retries and retry_timeout_seconds in the xfs metadata
      error configurations are not consistent.
      
      A retry_timeout_seconds of 0 means "retry forever," but a
      max_retries of 0 means "fail immediately."
      
      retry_timeout_seconds < 0 is disallowed, while max_retries == -1
      means "retry forever."
      
      Make this consistent across the error configs, such that a value of
      0 means "fail immediately" (i.e. wait 0 seconds, or retry 0 times),
      and a value of -1 always means "retry forever."
      
      This makes retry_timeout a signed long to accommodate the -1, even
      though it stores jiffies.  Given our limit of a 1 day maximum
      timeout, this should be sufficient even at much higher HZ values
      than we have available today.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
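The normalized semantics (0 means "fail immediately", -1 means "retry forever") can be sketched as a single decision function. This is an illustration under stated assumptions rather than the kernel implementation: `RETRY_FOREVER`, `struct error_cfg` and `should_give_up()` are hypothetical names, and the timeout is kept in plain seconds instead of jiffies.

```c
#include <assert.h>
#include <stdbool.h>

#define RETRY_FOREVER	(-1)	/* hypothetical name for the "infinite" value */

/* Hypothetical error configuration; the timeout is in seconds here. */
struct error_cfg {
	int	max_retries;	/* -1: forever, 0: fail immediately */
	long	retry_timeout;	/* -1: forever, 0: fail immediately */
};

/*
 * Decide whether to stop retrying a failed write.
 * retries_done: retries already performed; elapsed: seconds since first failure.
 */
static bool should_give_up(const struct error_cfg *cfg, int retries_done,
			   long elapsed)
{
	if (cfg->max_retries != RETRY_FOREVER &&
	    retries_done >= cfg->max_retries)
		return true;
	if (cfg->retry_timeout != RETRY_FOREVER &&
	    elapsed >= cfg->retry_timeout)
		return true;
	return false;
}
```

Because both fields share the same sentinel, the timeout has to be a signed type, which is the reason the commit gives for making retry_timeout a signed long.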
    • xfs: fix signed integer overflow · 79c350e4
      Committed by Xie XiuQi
      Use 1U for unsigned int to avoid an overflow warning from UBSAN.
      
      [   31.910858] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:889:25
      [   31.911252] signed integer overflow:
      [   31.911478] -2147483648 - 1 cannot be represented in type 'int'
      [   31.911846] CPU: 1 PID: 1011 Comm: tuned Tainted: G    B          ---- -------   3.10.0-327.28.3.el7.x86_64 #1
      [   31.911857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011
      [   31.911866]  1ffff1004069cd3b 0000000076bec3fd ffff8802034e69a0 ffffffff81ee3140
      [   31.911883]  ffff8802034e69b8 ffffffff81ee31fd ffffffffa0ad79e0 ffff8802034e6b20
      [   31.911898]  ffffffff81ee46e2 0000002d515470c0 0000000000000001 0000000041b58ab3
      [   31.911913] Call Trace:
      [   31.911932]  [<ffffffff81ee3140>] dump_stack+0x1e/0x20
      [   31.911947]  [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55
      [   31.911964]  [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215
      [   31.912083]  [<ffffffff81ee4798>] __ubsan_handle_sub_overflow+0x2a/0x31
      [   31.912204]  [<ffffffffa08676fb>] xfs_buf_item_log+0x34b/0x3f0 [xfs]
      [   31.912314]  [<ffffffffa0880490>] xfs_trans_log_buf+0x120/0x260 [xfs]
      [   31.912402]  [<ffffffffa079a890>] xfs_btree_log_recs+0x80/0xc0 [xfs]
      [   31.912490]  [<ffffffffa07a29f8>] xfs_btree_delrec+0x11a8/0x2d50 [xfs]
      [   31.913589]  [<ffffffffa07a86f9>] xfs_btree_delete+0xc9/0x260 [xfs]
      [   31.913762]  [<ffffffffa075b5cf>] xfs_free_ag_extent+0x63f/0xe20 [xfs]
      [   31.914339]  [<ffffffffa075ec0f>] xfs_free_extent+0x2af/0x3e0 [xfs]
      [   31.914641]  [<ffffffffa0801b2b>] xfs_bmap_finish+0x32b/0x4b0 [xfs]
      [   31.914841]  [<ffffffffa083c2e7>] xfs_itruncate_extents+0x3b7/0x740 [xfs]
      [   31.915216]  [<ffffffffa08342fa>] xfs_setattr_size+0x60a/0x860 [xfs]
      [   31.915471]  [<ffffffffa08345ea>] xfs_vn_setattr+0x9a/0xe0 [xfs]
      [   31.915590]  [<ffffffff8149ad38>] notify_change+0x5c8/0x8a0
      [   31.915607]  [<ffffffff81450f22>] do_truncate+0x122/0x1d0
      [   31.915640]  [<ffffffff8147beee>] do_last+0x15de/0x2c80
      [   31.915707]  [<ffffffff8147d777>] path_openat+0x1e7/0xcc0
      [   31.915802]  [<ffffffff81480824>] do_filp_open+0xa4/0x160
      [   31.915848]  [<ffffffff81453127>] do_sys_open+0x1b7/0x3f0
      [   31.915879]  [<ffffffff81453392>] SyS_open+0x32/0x40
      [   31.915897]  [<ffffffff81f08989>] system_call_fastpath+0x16/0x1b
      
      [  240.086809] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:866:34
      [  240.086820] signed integer overflow:
      [  240.086830] -2147483648 - 1 cannot be represented in type 'int'
      [  240.086846] CPU: 1 PID: 12969 Comm: rm Tainted: G    B          ---- -------   3.10.0-327.28.3.el7.x86_64 #1
      [  240.086857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011
      [  240.086868]  1ffff10040491def 00000000e2ea59c1 ffff88020248ef40 ffffffff81ee3140
      [  240.086885]  ffff88020248ef58 ffffffff81ee31fd ffffffffa0ad79e0 ffff88020248f0c0
      [  240.086901]  ffffffff81ee46e2 0000002d02488000 0000000000000001 0000000041b58ab3
      [  240.086915] Call Trace:
      [  240.086938]  [<ffffffff81ee3140>] dump_stack+0x1e/0x20
      [  240.086953]  [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55
      [  240.086971]  [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215
      ...
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
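The reported arithmetic, `-2147483648 - 1`, is what a signed `(1 << 31) - 1` evaluates to on the way to building a bit mask. A minimal sketch of the unsigned alternative follows; `bit_range_mask()` is a hypothetical helper written for this illustration, not the function touched by the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a 32-bit mask with `nbits` consecutive bits set, starting at bit
 * `first`. Using 1U keeps the shift and the subtraction in unsigned
 * arithmetic, where wraparound is well defined.
 */
static uint32_t bit_range_mask(unsigned int first, unsigned int nbits)
{
	assert(first < 32 && nbits >= 1 && nbits <= 32 - first);

	if (nbits == 32)
		return UINT32_MAX;	/* shifting by the full width is undefined */

	return ((1U << nbits) - 1U) << first;
}
```

With `nbits == 31`, the intermediate value `1U << 31` is `0x80000000` and subtracting one yields `0x7fffffff`, with no signed overflow involved.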
  5. 22 Jul, 2016 1 commit
    • xfs: allocate log vector buffers outside CIL context lock · b1c5ebb2
      Committed by Dave Chinner
      One of the problems we currently have with delayed logging is that
      under serious memory pressure we can deadlock memory reclaim. This
      occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
      inodes and issues a log force to unpin inodes that are dirty in the
      CIL.
      
      The CIL is pushed, but this will only occur once it gets the CIL
      context lock to ensure that all committing transactions are complete
      and no new transactions start being committed to the CIL while the
      push switches to a new context.
      
      The deadlock occurs when the CIL context lock is held by a
      committing process that is doing memory allocation for log vector
      buffers, and that allocation is then blocked on memory reclaim
      making progress. Memory reclaim, however, is blocked waiting for
      a log force to make progress, and so we effectively deadlock at this
      point.
      
      To solve this problem, we have to move the CIL log vector buffer
      allocation outside of the context lock so that memory reclaim can
      always make progress when it needs to force the log. The problem
      with doing this is that a CIL push can take place while we are
      determining if we need to allocate a new log vector buffer for
      an item and hence the current log vector may go away without
      warning. That means we cannot rely on the existing log vector being
      present when we finally grab the context lock and so we must have a
      replacement buffer ready to go at all times.
      
      To ensure this, introduce a "shadow log vector" buffer that is
      always guaranteed to be present when we gain the CIL context lock
      and format the item. This shadow buffer may or may not be used
      during the formatting, but if the log item does not have an existing
      log vector buffer or that buffer is too small for the new
      modifications, we swap it for the new shadow buffer and format
      the modifications into that new log vector buffer.
      
      The result of this is that for any object we modify more than once
      in a given CIL checkpoint, we double the memory required
      to track dirty regions in the log. For single modifications then
      we consume the shadow log vector we allocate on commit, and that gets
      consumed by the checkpoint. However, if we make multiple
      modifications, then the second transaction commit will allocate a
      shadow log vector and hence we will end up with double the memory
      usage as only one of the log vectors is consumed by the CIL
      checkpoint. The remaining shadow vector will be freed when the log
      item is freed.
      
      This can probably be optimised in future - access to the shadow log
      vector is serialised by the object lock (as opposed to the active
      log vector, which is controlled by the CIL context lock) and so we
      can probably free shadow log vector from some objects when the log
      item is marked clean on removal from the AIL.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
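The two-phase scheme in this commit (allocate a shadow buffer before taking the lock, then swap it in under the lock only if needed) can be sketched with a toy model. Everything here is an assumption made for illustration: `toy_lv`, `toy_item`, `toy_alloc_shadow()` and `toy_pick_buffer()` are hypothetical, locking is omitted, and the replaced buffer is freed on the spot, which simplifies how its lifetime is really managed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy log vector: only its capacity matters for the sketch. */
struct toy_lv {
	size_t size;
};

struct toy_item {
	struct toy_lv *active;	/* would be protected by the context lock */
	struct toy_lv *shadow;	/* would be protected by the object lock */
};

/* Phase 1, before the context lock: make sure a large enough shadow exists. */
static int toy_alloc_shadow(struct toy_item *item, size_t need)
{
	struct toy_lv *lv;

	if (item->shadow && item->shadow->size >= need)
		return 0;

	lv = malloc(sizeof(*lv));	/* may block on reclaim; no lock held */
	if (!lv)
		return -ENOMEM;
	lv->size = need;
	free(item->shadow);
	item->shadow = lv;
	return 0;
}

/* Phase 2, under the context lock: never allocates, only reuses or swaps. */
static struct toy_lv *toy_pick_buffer(struct toy_item *item, size_t need)
{
	struct toy_lv *old;

	if (item->active && item->active->size >= need)
		return item->active;	/* shadow stays around unused */

	old = item->active;
	item->active = item->shadow;
	item->shadow = NULL;
	free(old);
	return item->active;
}
```

When a second modification fits in the active buffer, the freshly allocated shadow is left unused, which mirrors the doubled memory footprint the commit message describes.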
  6. 20 Jul, 2016 2 commits
  7. 01 Jun, 2016 1 commit
    • xfs: fix broken multi-fsb buffer logging · a3916e52
      Committed by Brian Foster
      Multi-block buffers are logged based on buffer offset in
      xfs_trans_log_buf(). xfs_buf_item_log() ultimately walks each mapping in
      the buffer and marks the associated range to be logged in the
      xfs_buf_log_format bitmap for that mapping. This code is broken,
      however, in that it marks the actual buffer offsets of the associated
      range in each bitmap rather than shifting to the byte range for that
      particular mapping.
      
      For example, on a 4k fsb fs, buffer offset 4096 refers to the first byte
      of the second mapping in the buffer. This means byte 0 of the second log
      format bitmap should be tagged as dirty. Instead, the current code marks
      byte offset 4096 of the second log format bitmap, which is invalid and
      potentially out of range of the mapping.
      
      As a result of this, the log item format code invoked at transaction
      commit time is not able to correctly identify what parts of the
      buffer to copy into log vectors. This can lead to NULL log vector
      pointer dereferences in CIL push context if the item format code was not
      able to locate any dirty ranges at all. This crash has been reproduced
      on a 4k FSB filesystem using 16k directory blocks where an unlink
      operation happened not to log anything in the first block of the
      mapping. The logged offsets were all over 4k, marked as such in the
      subsequent log format mappings, and thus left the transaction with an
      xfs_log_item that is marked DIRTY but without any logged regions.
      
      Further, even when the logged regions are marked correctly in the buffer
      log format bitmaps, the format code doesn't copy the correct ranges of
      the buffer into the log. This means that any logged region beyond the
      first block of a multi-block buffer is subject to corruption after a
      crash and log recovery sequence. This is due to a failure to convert the
      mapping bm_len field from basic blocks to bytes in the buffer offset
      tracking code in xfs_buf_item_format().
      
      Update xfs_buf_item_log() to convert buffer offsets to segment relative
      offsets when logging multi-block buffers. This ensures that the modified
      regions of a buffer are logged correctly and avoids the aforementioned
      crash. Also update xfs_buf_item_format() to correctly track the source
      offset into the buffer for the log vector formatting code. This ensures
      that the correct data is copied into the log.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
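The conversion from buffer offsets to segment-relative offsets can be shown with a small standalone function. This is a sketch under assumptions, not the kernel routine: `seg_range` and `split_logged_range()` are hypothetical, segment lengths are given in bytes, and the logged range is an inclusive pair of byte offsets into the whole buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Dirty byte range inside one segment, relative to that segment's start. */
struct seg_range {
	int	dirty;	/* non-zero if the logged range touches this segment */
	size_t	first;	/* first dirty byte, segment relative */
	size_t	last;	/* last dirty byte, segment relative */
};

/*
 * Split the buffer-wide logged range [first, last] across the segments of a
 * multi-segment buffer, shifting each piece to a segment-relative offset.
 */
static void split_logged_range(const size_t *seg_len, size_t nsegs,
			       size_t first, size_t last,
			       struct seg_range *out)
{
	size_t start = 0;	/* buffer offset where the current segment begins */

	assert(first <= last);
	for (size_t i = 0; i < nsegs; i++) {
		size_t end;

		assert(seg_len[i] > 0);
		end = start + seg_len[i] - 1;	/* last byte of this segment */

		out[i].dirty = 0;
		out[i].first = 0;
		out[i].last = 0;
		if (first <= end && last >= start) {
			size_t lo = first > start ? first : start;
			size_t hi = last < end ? last : end;

			out[i].dirty = 1;
			out[i].first = lo - start;	/* shift into the segment */
			out[i].last = hi - start;
		}
		start += seg_len[i];
	}
}
```

With four 4096-byte segments, buffer offset 4096 lands on byte 0 of the second segment, matching the example in the commit message.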
  8. 18 May, 2016 3 commits
  9. 10 Feb, 2016 3 commits
  10. 25 Aug, 2015 1 commit
  11. 19 Aug, 2015 1 commit
  12. 24 Feb, 2015 1 commit
  13. 22 Jan, 2015 1 commit
  14. 24 Dec, 2014 1 commit
  15. 28 Nov, 2014 1 commit
  16. 02 Oct, 2014 2 commits
    • xfs: introduce xfs_buf_submit[_wait] · 595bff75
      Committed by Dave Chinner
      There is a lot of cookie-cutter code that looks like:
      
      	if (shutdown)
      		handle buffer error
      	xfs_buf_iorequest(bp)
      	error = xfs_buf_iowait(bp)
      	if (error)
      		handle buffer error
      
      spread through XFS. There's significant complexity now in
      xfs_buf_iorequest() to specifically handle this sort of synchronous
      IO pattern, but there's all sorts of nasty surprises in different
      error handling code dependent on who owns the buffer references and
      the locks.
      
      Pull this pattern into a single helper, where we can hide all the
      synchronous IO warts and hence make the error handling for all the
      callers much saner. This removes the need for a special extra
      reference to protect IO completion processing, as we can now hold a
      single reference across dispatch and waiting, simplifying the sync
      IO semantics and error handling.
      
      In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
      make it explicitly handle only asynchronous IO. This forces all users
      to be switched specifically to one interface or the other and
      removes any ambiguity between how the interfaces are to be used. It
      also means that xfs_buf_iowait() goes away.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality · e8aaba9a
      Committed by Dave Chinner
      We do some work in xfs_buf_ioend, and some work in
      xfs_buf_iodone_work, but much of that functionality is the same.
      This work can all be done in a single function, leaving
      xfs_buf_iodone just a wrapper to determine if we should execute it
      by workqueue or directly. Hence rename xfs_buf_iodone_work to
      xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
      need async processing.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  17. 23 Sep, 2014 1 commit
  18. 25 Jun, 2014 1 commit
    • xfs: global error sign conversion · 2451337d
      Committed by Dave Chinner
      Convert all the errors in the core XFS code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
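The negative-errno convention this commit adopts can be illustrated in a few lines of standalone C. The helpers `check_block_count()` and `grow_by()` are hypothetical and exist only to show an error being returned as a negative value and passed up the call chain without any sign flip.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical low-level check: returns 0 on success, a negative errno on failure. */
static int check_block_count(long nblocks)
{
	if (nblocks <= 0)
		return -EINVAL;
	return 0;
}

/* Hypothetical caller: forwards the error unchanged to its own caller. */
static int grow_by(long nblocks, long *total)
{
	int error = check_block_count(nblocks);

	if (error)
		return error;	/* already negative, no conversion needed */

	*total += nblocks;
	return 0;
}
```

Since every layer uses the same sign, the boundary code no longer needs to negate return values.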
  19. 06 Jun, 2014 1 commit
  20. 14 Apr, 2014 1 commit
  21. 07 Feb, 2014 1 commit
  22. 17 Dec, 2013 1 commit
    • xfs: abort metadata writeback on permanent errors · ac8809f9
      Committed by Dave Chinner
      If we are doing async writeback of metadata, we can get write errors
      but have nobody to report them to. At the moment, we simply attempt
      to reissue the write from io completion in the hope that it's a
      transient error.
      
      When it's not a transient error, the buffer is stuck forever in
      this loop, and we cannot break out of it. Eventually, unmount will
      hang because the AIL cannot be emptied and everything goes downhill
      from there.
      
      To solve this problem, only retry the write IO once before aborting
      it. We don't throw the buffer away because some transient errors can
      last minutes (e.g.  FC path failover) or even hours (thin
      provisioned devices that have run out of backing space) before they
      go away. Hence we really want to keep trying until we can't try any
      more.
      
      Because the buffer was not cleaned, however, it does not get removed
      from the AIL and hence the next pass across the AIL will start IO on
      it again. As such, we still get the "retry forever" semantics that
      we currently have, but we allow other access to the buffer in the
      mean time. Meanwhile the filesystem can continue to modify the
      buffer and relog it, so the IO errors won't hang the log or the
      filesystem.
      
      Now when we are pushing the AIL, we can see all these "permanent IO
      error" buffers and we can issue a warning about failures before we
      retry the IO. We can also catch these buffers when unmounting and
      issue a corruption warning, too.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  23. 13 Dec, 2013 3 commits
  24. 31 Oct, 2013 1 commit
  25. 24 Oct, 2013 1 commit
    • xfs: decouple log and transaction headers · 239880ef
      Committed by Dave Chinner
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  26. 25 Sep, 2013 1 commit
  27. 11 Sep, 2013 1 commit
    • xfs: aborted buf items can be in the AIL. · 46f9d2eb
      Committed by Dave Chinner
      Saw this on generic/270 after a DQALLOC transaction overrun
      shutdown:
      
      XFS: Assertion failed: !(bip->bli_item.li_flags & XFS_LI_IN_AIL), file: fs/xfs/xfs_buf_item.c, line: 952
      .....
       xfs_buf_item_relse+0x4f/0xd0
       xfs_buf_item_unlock+0x1b4/0x1e0
       xfs_trans_free_items+0x7d/0xb0
       xfs_trans_cancel+0x13c/0x1b0
       xfs_symlink+0x37e/0xa60
      ....
      
      when a transaction abort occurred.
      
      If we are aborting a transaction and trigger this code path, then
      the item may be dirty. If the item is dirty, then it may be in the
      AIL. Hence if we are aborting, we need to check if the item is in
      the AIL and remove it before freeing it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  28. 16 Aug, 2013 1 commit
  29. 14 Aug, 2013 1 commit
  30. 28 Jun, 2013 2 commits
    • xfs: Use inode create transaction · ddf6ad01
      Committed by Dave Chinner
      Replace the use of buffer based logging of inode initialisation
      with the new logical form that describes the range to be initialised
      in recovery. We continue to "log" the inode buffers to push them
      into the AIL and ensure that the inode create transaction is not
      removed from the log before the inode buffers are written to disk.
      
      Update the transaction identifier and reservations to match the
      changed implementation.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: Introduce an ordered buffer item · 5f6bed76
      Committed by Dave Chinner
      If we have a buffer that we have modified but we do not wish to
      physically log in a transaction (e.g. we've logged a logical
      change), we still need to ensure that transactional integrity is
      maintained. Hence we must not move the tail of the log past the
      transaction that the buffer is associated with before the buffer is
      written to disk.
      
      This means these special buffers still need to be included in the
      transaction and added to the AIL just like a normal buffer, but we
      do not want the modifications to the buffer written into the
      transaction. IOWs, what we want is an "ordered buffer" that
      maintains the same transactional life cycle as a physically logged
      buffer, just without the transcribing of the modifications to the
      log.
      
      Hence we need to flag the buffer as an "ordered buffer" to avoid
      including it in vector size calculations or formatting during the
      transaction. Once the transaction is committed, the buffer appears
      for all intents to be the same as a physically logged buffer as it
      transitions through the log and AIL.
      
      Relogging will also work just fine for such an ordered buffer - the
      logical transaction will be replayed before the subsequent
      modifications that relog the buffer, so everything will be
      reconstructed correctly by recovery.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>