1. 07 Apr, 2009 7 commits
    • xfs: block callers of xfs_flush_inodes() correctly · e43afd72
      Dave Chinner committed
      xfs_flush_inodes() currently uses a magic timeout to wait for
      some inodes to be flushed before returning. This isn't
      really reliable but used to be the best that could be done
      due to deadlock potential of waiting for the entire flush.
      
      Now that the inode flush is safe to execute while we hold page
      and inode locks, we can wait for all the inodes to flush
      synchronously. Convert the wait mechanism to a completion
      to do this efficiently. This should remove all remaining
      spurious ENOSPC errors from the delayed allocation reservation
      path.
      
      This is extracted almost line for line from a larger patch
      from Mikulas Patocka.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
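The change from a fixed timeout to a completion can be illustrated in userspace. The following is a minimal sketch, assuming POSIX threads; every `toy_*` name is hypothetical and is not the kernel completion API or the XFS code. The caller blocks until the worker signals that all work is done, instead of sleeping for a guessed interval.

```c
#include <pthread.h>

/* Toy userspace stand-in for a completion (hypothetical names). */
struct toy_completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;
};

void toy_init_completion(struct toy_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* Mark the work finished and wake every waiter. */
void toy_complete(struct toy_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Block until toy_complete() has been called; no timeout is involved. */
void toy_wait_for_completion(struct toy_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

struct toy_flush_work {
    struct toy_completion done;
    int ninodes;   /* number of pretend inodes to flush */
    int flushed;   /* number flushed so far */
};

/* Worker: "flush" every inode, then signal the completion. */
void *toy_flush_worker(void *arg)
{
    struct toy_flush_work *w = arg;

    for (int i = 0; i < w->ninodes; i++)
        w->flushed++;
    toy_complete(&w->done);
    return NULL;
}

/* Start the flush and wait for all of it; returns inodes flushed or -1. */
int toy_flush_inodes_and_wait(int ninodes)
{
    struct toy_flush_work w = { .ninodes = ninodes, .flushed = 0 };
    pthread_t tid;
    int flushed = -1;

    toy_init_completion(&w.done);
    if (pthread_create(&tid, NULL, toy_flush_worker, &w) == 0) {
        toy_wait_for_completion(&w.done);
        pthread_join(tid, NULL);
        flushed = w.flushed;
    }
    pthread_mutex_destroy(&w.done.lock);
    pthread_cond_destroy(&w.done.cond);
    return flushed;
}
```

Because the waiter returns only after the worker has signalled, the caller never observes a partially flushed state, which is the property a timeout cannot guarantee.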
    • xfs: make inode flush at ENOSPC synchronous · 5825294e
      Dave Chinner committed
      When we are writing to a single file and hit ENOSPC, we trigger a background
      flush of the inode and try again.  Because we hold page locks and the iolock,
      the flush won't proceed until after we release these locks. This occurs once
      we've given up and ENOSPC has been reported. Hence if this one is the only
      dirty inode in the system, we'll get an ENOSPC prematurely.
      
      To fix this, remove the async flush from the allocation routines and move
      it to the top of the write path where we can do a synchronous flush
      and retry the write. Only retry once, as a second ENOSPC indicates
      that we really are out of space.
      
      This avoids a page cache deadlock when trying to do this flush synchronously
      in the allocation layer that was identified by Mikulas Patocka.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
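The "flush synchronously, then retry exactly once" pattern described in this commit can be sketched as follows. This is a minimal illustration, assuming a toy filesystem model; `write_with_enospc_retry` and the `toy_*` helpers are hypothetical names and do not reproduce the XFS write path.

```c
#include <errno.h>

typedef int  (*write_fn)(void *ctx);   /* returns 0 or a negative errno */
typedef void (*flush_fn)(void *ctx);   /* synchronous flush */

/*
 * Try the write; on -ENOSPC do one synchronous flush and retry once.
 * A second -ENOSPC is returned to the caller as a genuine failure.
 */
int write_with_enospc_retry(write_fn do_write, flush_fn sync_flush, void *ctx)
{
    int ret = do_write(ctx);

    if (ret != -ENOSPC)
        return ret;
    sync_flush(ctx);
    return do_write(ctx);
}

/* Toy model: space that only becomes usable after a flush. */
struct toy_fs {
    long free_blocks;   /* blocks available right now */
    long reclaimable;   /* blocks released by a flush */
    long need;          /* blocks the write requires */
    int  flushes;       /* number of flushes performed */
};

int toy_write(void *ctx)
{
    struct toy_fs *fs = ctx;

    if (fs->free_blocks < fs->need)
        return -ENOSPC;
    fs->free_blocks -= fs->need;
    return 0;
}

void toy_flush(void *ctx)
{
    struct toy_fs *fs = ctx;

    fs->free_blocks += fs->reclaimable;
    fs->reclaimable = 0;
    fs->flushes++;
}
```

In this model the flush runs at the top level, outside `toy_write`, mirroring the idea of retrying from a point where the caller no longer holds the locks that would stall the flush.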
    • xfs: use xfs_sync_inodes() for device flushing · a8d770d9
      Dave Chinner committed
      Currently xfs_device_flush calls sync_blockdev() which is
      a no-op for XFS as all its metadata is held in a different
      address space to the one sync_blockdev() works on.
      
      Call xfs_sync_inodes() instead to flush all the delayed
      allocation blocks out. To do this as efficiently as possible,
      do it via two passes - one to do an async flush of all the
      dirty blocks and a second to wait for all the IO to complete.
      This requires some modification to the xfs_sync_inodes_ag()
      flush code to do efficiently.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: inform the xfsaild of the push target before sleeping · 9d7fef74
      Dave Chinner committed
      When trying to reserve log space, we find the amount of space
      we need, then go to sleep waiting for space. When we are
      woken, we try to push the tail of the log forward to make
      sure we have space available.
      
      Unfortunately, this means that if there is no space available and
      everyone who needs space goes to sleep, there is no-one left to push
      the tail of the log to make space available. Once we have a thread
      waiting for space to become available, the others queue up behind
      it in a FIFO, and none of them push the tail of the log.
      
      This can result in everyone going to sleep in xlog_grant_log_space()
      if the first sleeper races with the last I/O that moves the tail
      of the log forward. With no further I/O to move the tail of the log,
      there is nothing to wake the sleepers and hence all transactions
      just stop.
      
      Fix this by making sure the xfsaild will create enough space for the
      transaction that is about to sleep by moving the push target far
      enough forwards to ensure that the current process will have
      enough space available when it is woken. That is, we push the
      AIL before we go to sleep.
      
      Because we've inserted the log ticket into the queue before we've
      pushed and gone to sleep, subsequent transactions will wait behind
      this one. Hence we are guaranteed to have space available when we
      are woken.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
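The difference between sleeping first and pushing first can be shown with a small single-threaded model. This is only a sketch under simplifying assumptions: the log is reduced to a size and a used count, the background pusher is an ordinary function call, and all `toy_*` names are hypothetical rather than taken from the XFS log code.

```c
/* Toy log: space is "used" until a background pusher frees it. */
struct toy_log {
    long size;         /* total log space */
    long used;         /* space currently pinned in the log */
    long push_target;  /* space the pusher has been asked to free */
};

long toy_log_free(const struct toy_log *log)
{
    return log->size - log->used;
}

/* Tell the pusher how much space the sleeper is short by. */
void toy_push_before_sleep(struct toy_log *log, long need)
{
    long deficit = need - toy_log_free(log);

    if (deficit > log->push_target)
        log->push_target = deficit;
}

/* Background pusher: frees only what it has been asked to free. */
void toy_pusher_run(struct toy_log *log)
{
    long n = log->push_target < log->used ? log->push_target : log->used;

    log->used -= n;
    log->push_target = 0;
}

/*
 * Returns 1 if the reservation is granted, 0 if the caller must sleep.
 * With push_first set, the push target is raised before sleeping.
 */
int toy_try_reserve(struct toy_log *log, long need, int push_first)
{
    if (toy_log_free(log) >= need) {
        log->used += need;
        return 1;
    }
    if (push_first)
        toy_push_before_sleep(log, need);
    return 0;
}
```

With `push_first` set to 0, a sleeper leaves the pusher with no target, so running the pusher frees nothing and the next attempt fails again. With `push_first` set to 1, the pusher frees exactly the deficit and the retry succeeds.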
    • xfs: prevent unwritten extent conversion from blocking I/O completion · c626d174
      Dave Chinner committed
      Unwritten extent conversion can recurse back into the filesystem due
      to memory allocation. Memory reclaim requires I/O completions to be
      processed to allow the callers to make progress. If the I/O
      completion workqueue thread is doing the recursion, then we have a
      deadlock situation.
      
      Move unwritten extent completion into its own workqueue so it
      doesn't block I/O completions for normal delayed allocation or
      overwrite data.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix double free of inode · 705db3fd
      Dave Chinner committed
      If we fail to initialise the VFS inode in inode_init_always(),
      it will call ->delete_inode internally resulting in the inode being
      freed. Hence we need to delay the call to inode_init_always()
      until after the XFS inode is sufficiently set up to handle a
      call to ->delete_inode, and then if that fails do not touch
      the inode again at all as it has been freed.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: validate log feature fields correctly · a6cb767e
      Dave Chinner committed
      If the large log sector size feature bit is set in the
      superblock by accident (say, by disk corruption), then the
      fields that are now considered valid are not checked on
      production kernels. The checks are present only as ASSERT
      statements, so they cause a panic on a debug kernel.
      
      Change this so that the fields are validity checked if
      the feature bit is set and abort the log mount if the
      fields do not contain valid values.
      Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
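The idea of replacing debug-only assertions with a runtime check that fails the mount can be sketched as below. This is a hedged illustration only: the structure, the feature bit, the bounds (512-byte to 32 KiB sectors) and the `-EINVAL` return value are assumptions made for the example and are not the XFS on-disk format or the actual patch.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_FEAT_SECTOR  0x1u  /* hypothetical "large log sector" bit */
#define TOY_MIN_SECTLOG  9     /* assumed lower bound: 512-byte sectors */
#define TOY_MAX_SECTLOG  15    /* assumed upper bound: 32 KiB sectors */

struct toy_sb {
    uint32_t features;    /* feature bits read from disk */
    uint8_t  logsectlog;  /* log2 of the log sector size */
};

/*
 * Validate the sector size field whenever the feature bit is set.
 * Returns 0 and stores the sector size on success, or -EINVAL so the
 * caller can abort the log mount instead of trusting a corrupt field.
 */
int toy_validate_log_sector(const struct toy_sb *sb, unsigned *sector_size)
{
    if (!(sb->features & TOY_FEAT_SECTOR)) {
        /* Feature not in use: fall back to the default sector size. */
        *sector_size = 1u << TOY_MIN_SECTLOG;
        return 0;
    }
    if (sb->logsectlog < TOY_MIN_SECTLOG || sb->logsectlog > TOY_MAX_SECTLOG)
        return -EINVAL;
    *sector_size = 1u << sb->logsectlog;
    return 0;
}
```

Unlike an assertion, the check is compiled into every build, so a corrupted field produces an error return on production kernels rather than being silently accepted.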
  2. 31 Mar, 2009 33 commits