1. 15 2月, 2013 1 次提交
    • B
      xfs: recheck buffer pinned status after push trylock failure · 5337fe9b
      Brian Foster 提交于
      The buffer pinned check and trylock sequence in xfs_buf_item_push()
      can race with an active transaction on marking the buffer pinned.
      This can result in the buffer becoming pinned and stale after the
      initial check and the trylock failure, but before the check in
      xfs_buf_trylock() that issues a log force. If the log force is
      issued from this context, a spinlock recursion occurs on xa_lock.
      
      Prepare xfs_buf_item_push() to handle the race by detecting a
      pinned buffer after the trylock failure so xfsaild issues a log
      force from a safe context. This, along with various previous fixes,
      renders the log force in xfs_buf_trylock() redundant.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      5337fe9b
  2. 29 1月, 2013 1 次提交
    • D
      xfs: fix shutdown hang on invalid inode during create · 9f87832a
      Dave Chinner 提交于
      When the new inode verify in xfs_iread() fails, the create
      transaction is aborted and a shutdown occurs. The subsequent unmount
      then hangs in xfs_wait_buftarg() on a buffer that has an elevated
      hold count. Debug showed that it was an AGI buffer getting stuck:
      
      [   22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      
      The trace of this buffer leading up to the shutdown (trimmed for
      brevity) looks like:
      
      xfs_buf_init:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
      xfs_buf_iorequest:   bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
      xfs_buf_iowait:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_ioerror:     bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
      xfs_buf_iodone:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
      xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_trans_brelse:    bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_buf_item_relse:  bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_trylock:     bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
      xfs_buf_find:        bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_trans_log_buf:   bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      
      And that is the AGI buffer from cold cache read into memory to
      transaction abort. You can see at transaction abort the bli is dirty
      and only has a single reference. The item is not pinned, and it's
      not in the AIL. Hence the only reference to it is this transaction.
      
      The problem is that the xfs_buf_item_unlock() call is dropping the
      last reference to the xfs_buf_log_item attached to the buffer (which
      holds a reference to the buffer), but it is not freeing the
      xfs_buf_log_item. Hence nothing will ever release the buffer, and
      the unmount hangs waiting for this reference to go away.
      
      The fix is simple - xfs_buf_item_unlock needs to detect the last
      reference going away in this case and free the xfs_buf_log_item to
      release the reference it holds on the buffer.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9f87832a
  3. 26 1月, 2013 1 次提交
    • D
      xfs: fix shutdown hang on invalid inode during create · 3b19034d
      Dave Chinner 提交于
      When the new inode verify in xfs_iread() fails, the create
      transaction is aborted and a shutdown occurs. The subsequent unmount
      then hangs in xfs_wait_buftarg() on a buffer that has an elevated
      hold count. Debug showed that it was an AGI buffer getting stuck:
      
      [   22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      
      The trace of this buffer leading up to the shutdown (trimmed for
      brevity) looks like:
      
      xfs_buf_init:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
      xfs_buf_iorequest:   bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
      xfs_buf_iowait:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_ioerror:     bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
      xfs_buf_iodone:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
      xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_trans_brelse:    bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_buf_item_relse:  bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_trylock:     bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
      xfs_buf_find:        bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_trans_log_buf:   bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      
      And that is the AGI buffer from cold cache read into memory to
      transaction abort. You can see at transaction abort the bli is dirty
      and only has a single reference. The item is not pinned, and it's
      not in the AIL. Hence the only reference to it is this transaction.
      
      The problem is that the xfs_buf_item_unlock() call is dropping the
      last reference to the xfs_buf_log_item attached to the buffer (which
      holds a reference to the buffer), but it is not freeing the
      xfs_buf_log_item. Hence nothing will ever release the buffer, and
      the unmount hangs waiting for this reference to go away.
      
      The fix is simple - xfs_buf_item_unlock needs to detect the last
      reference going away in this case and free the xfs_buf_log_item to
      release the reference it holds on the buffer.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3b19034d
  4. 17 1月, 2013 3 次提交
  5. 18 12月, 2012 4 次提交
  6. 09 11月, 2012 1 次提交
    • D
      xfs: fix buffer shudown reference count mismatch · 03b1293e
      Dave Chinner 提交于
      When we shut down the filesystem, we have to unpin and free all the
      buffers currently active in the CIL. To do this we unpin and remove
      them in one operation as a result of a failed iclogbuf write. For
      buffers, we do this removal via a simultated IO completion of after
      marking the buffer stale.
      
      At the time we do this, we have two references to the buffer - the
      active LRU reference and the buf log item.  The LRU reference is
      removed by marking the buffer stale, and the active CIL reference is
      by the xfs_buf_iodone() callback that is run by
      xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
      callback).
      
      However, ioend processing requires one more reference - that of the
      IO that it is completing. We don't have this reference, so we free
      the buffer prematurely and use it after it is freed. For buffers
      marked with XBF_ASYNC, this leads to assert failures in
      xfs_buf_rele() on debug kernels because the b_hold count is zero.
      
      Fix this by making sure we take the necessary IO reference before
      starting IO completion processing on the stale buffer, and set the
      XBF_ASYNC flag to ensure that IO completion processing removes all
      the active references from the buffer to ensure it is fully torn
      down.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      03b1293e
  7. 08 11月, 2012 1 次提交
    • D
      xfs: fix buffer shudown reference count mismatch · 137fff09
      Dave Chinner 提交于
      When we shut down the filesystem, we have to unpin and free all the
      buffers currently active in the CIL. To do this we unpin and remove
      them in one operation as a result of a failed iclogbuf write. For
      buffers, we do this removal via a simultated IO completion of after
      marking the buffer stale.
      
      At the time we do this, we have two references to the buffer - the
      active LRU reference and the buf log item.  The LRU reference is
      removed by marking the buffer stale, and the active CIL reference is
      by the xfs_buf_iodone() callback that is run by
      xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
      callback).
      
      However, ioend processing requires one more reference - that of the
      IO that it is completing. We don't have this reference, so we free
      the buffer prematurely and use it after it is freed. For buffers
      marked with XBF_ASYNC, this leads to assert failures in
      xfs_buf_rele() on debug kernels because the b_hold count is zero.
      
      Fix this by making sure we take the necessary IO reference before
      starting IO completion processing on the stale buffer, and set the
      XBF_ASYNC flag to ensure that IO completion processing removes all
      the active references from the buffer to ensure it is fully torn
      down.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      137fff09
  8. 14 7月, 2012 2 次提交
  9. 02 7月, 2012 2 次提交
    • D
      xfs: support discontiguous buffers in the xfs_buf_log_item · 372cc85e
      Dave Chinner 提交于
      discontigous buffer in separate buffer format structures. This means log
      recovery will recover all the changes on a per segment basis without
      requiring any knowledge of the fact that it was logged from a
      compound buffer.
      
      To do this, we need to be able to determine what buffer segment any
      given offset into the compound buffer sits over. This enables us to
      translate the dirty bitmap in the number of separate buffer format
      structures required.
      
      We also need to be able to determine the number of bitmap elements
      that a given buffer segment has, as this determines the size of the
      buffer format structure. Hence we need to be able to determine the
      both the start offset into the buffer and the length of a given
      segment to be able to calculate this.
      
      With this information, we can preallocate, build and format the
      correct log vector array for each segment in a compound buffer to
      appear exactly the same as individually logged buffers in the log.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      372cc85e
    • D
      xfs: struct xfs_buf_log_format isn't variable sized. · 77c1a08f
      Dave Chinner 提交于
      The struct xfs_buf_log_format wants to think the dirty bitmap is
      variable sized.  In fact, it is variable size on disk simply due to
      the way we map it from the in-memory structure, but we still just
      use a fixed size memory allocation for the in-memory structure.
      
      Hence it makes no sense to set the function up as a variable sized
      structure when we already know it's maximum size, and we always
      allocate it as such. Simplify the structure by making the dirty
      bitmap a fixed sized array and just using the size of the structure
      for the allocation size.
      
      This will make it much simpler to allocate and manipulate an array
      of format structures for discontiguous buffer support.
      
      The previous struct xfs_buf_log_item size according to
      /proc/slabinfo was 224 bytes. pahole doesn't give the same size
      because of the variable size definition. With this modification,
      pahole reports the same as /proc/slabinfo:
      
      	/* size: 224, cachelines: 4, members: 6 */
      
      Because the xfs_buf_log_item size is now determined by the maximum
      supported block size we introduce a dependency on xfs_alloc_btree.h.
      Avoid this dependency by moving the idefines for the maximum block
      sizes supported to xfs_types.h with all the other max/min type
      defines to avoid any new dependencies.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      77c1a08f
  10. 15 5月, 2012 5 次提交
    • D
      xfs: move xfsagino_t to xfs_types.h · 60a34607
      Dave Chinner 提交于
      Untangle the header file includes a bit by moving the definition of
      xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
      xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
      xfs_ag.h.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      60a34607
    • D
      xfs: use blocks for storing the desired IO size · aa0e8833
      Dave Chinner 提交于
      Now that we pass block counts everywhere, and index buffers by block
      number and length in units of blocks, convert the desired IO size
      into block counts rather than bytes. Convert the code to use block
      counts, and those that need byte counts get converted at the time of
      use.
      
      Rename the b_desired_count variable to something closer to it's
      purpose - b_io_length - as it is only used to specify the length of
      an IO for a subset of the buffer.  The only time this is used is for
      log IO - both writing iclogs and during log recovery. In all other
      cases, the b_io_length matches b_length, and hence a lot of code
      confuses the two. e.g. the buf item code uses the io count
      exclusively when it should be using the buffer length. Fix these
      apprpriately as they are found.
      
      Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers
      around the desired IO length. They only serve to make the code
      shouty loud, don't actually add any real value, and are often used
      incorrectly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      aa0e8833
    • D
      xfs: pass shutdown method into xfs_trans_ail_delete_bulk · 04913fdd
      Dave Chinner 提交于
      xfs_trans_ail_delete_bulk() can be called from different contexts so
      if the item is not in the AIL we need different shutdown for each
      context.  Pass in the shutdown method needed so the correct action
      can be taken.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      04913fdd
    • C
      xfs: on-stack delayed write buffer lists · 43ff2122
      Christoph Hellwig 提交于
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
      	more importably we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straight forward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwi_submit or
      xfs_buf_delwi_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      43ff2122
    • C
      xfs: do not add buffers to the delwri queue until pushed · 960c60af
      Christoph Hellwig 提交于
      Instead of adding buffers to the delwri list as soon as they are logged,
      even if they can't be written until commited because they are pinned
      defer adding them to the delwri list until xfsaild pushes them.  This
      makes the code more similar to other log items and prepares for writing
      buffers directly from xfsaild.
      
      The complication here is that we need to fail buffers that were added
      but not logged yet in xfs_buf_item_unpin, borrowing code from
      xfs_bioerror.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      960c60af
  11. 09 11月, 2011 1 次提交
  12. 12 10月, 2011 5 次提交
  13. 11 8月, 2011 1 次提交
  14. 26 7月, 2011 8 次提交
  15. 13 7月, 2011 3 次提交
  16. 08 7月, 2011 1 次提交
新手
引导
客服 返回
顶部